Thursday, June 21, 2018

Don't hire a DevOps coach - hire a DevOps change agent

Each day I get between three and five unsolicited emails from staffing firms who have spotted that I have "DevOps" in my LinkedIn profile or Dice/Indeed/etc. resume, and who are trying to fill a "DevOps coach" position.

But when I ask a few questions, I realize that they are really trying to find a team coach, and that what they actually want is what I would call a "tool jockey". There are lots of people who have learned some of the tools that are associated with DevOps today - AWS EC2 and S3, or maybe Azure or GCE, Chef or Puppet, Docker, maybe Lambda - and who know a scripting language or two.

That's not a DevOps coach.

A coach is an expert: not someone who is new to this stuff, but someone who has used a range of tools, so that they are more than a one-trick pony. They have also been around since before DevOps - long before - so they have perspective and remember why DevOps came about. Otherwise, they don't really understand the problem that DevOps is trying to solve.

In a shift to continuous delivery and other DevOps practices, it is absolutely essential to have an experienced person guiding it. There are too many ways to get into serious trouble. I have seen things completely collapse from the weight of bad decisions, in the form of unmaintainable and brittle automation.

If you need tool engineers, hire them. But don't call them DevOps coaches. Get a real DevOps coach.

A very important reality is this: Very smart people just out of college who know the latest tools can rapidly create a mountain of unmaintainable code and "paint you into a corner" so that there is no way out.

The choices that are made are very important. Should we use a cloud framework? Should we eliminate our middle tier? Should we use Ruby, Java, or JavaScript for our back end? Should we have a DevOps team? Should we have a QA function? How should Security work with our teams? Should teams manage their own environments and deployments? Should they support their own apps? Should we have project managers? It goes on and on.

A tool jockey will not know where to even start with these questions. An experienced DevOps coach will.

Wednesday, April 18, 2018

Creating a lightweight Java microservice

In my recent post Creating a recommender microservice in Java 9, I lamented that if you build your Java app using Maven, it will include the entirety of all of the projects that your app uses—typically hundreds of Jars—making your microservice upwards of a Gb in size—not very “micro”. To solve that problem, one needs a tool that scans the code and removes the unused parts. There actually is a Maven plugin that tries to do that, called shade, but shade does not provide the control that is needed, especially if your app uses reflection somewhere within some library, which most do.

In this article I am going to show how to solve this for Java 8 and earlier. Java 9 and later support Java modules, including for the Java runtime, and that is a very different build process.

To reduce the Jar file footprint, I created a tool that I call jarcon, for Jar Consolidator. Jarcon uses a library known as the Class Dependency Analyzer (CDA), to perform the actual class dependency analysis: my tool merely wraps that functionality in a set of functions that let us call CDA from a command line, assemble just the needed classes, and write them all to a single Jar file, which we can then deploy.

Note that while Maven is the standard build tool for Java apps, I always wrap my Maven calls in a makefile. I do that because when creating a microservice, I need to call command-line tools such as Docker, Compose, Kubernetes, and many other things, and make is the most general-purpose tool on Unix/Linux systems. It is also language agnostic, and many of my projects are multi-language, especially when I use machine learning components.

To call jarcon to consolidate your Jar files, this is the basic syntax:
java -cp tools-classpath \
    com.cliffberg.jarcon.JarConsolidator \
    your-app-classpath \
    output-jar-name \
    jar-manifest-version \
    jar-manifest-name


Here is a sample makefile snippet that calls jarcon to consolidate all of my projectʼs Jar files into a single Jar file, containing only the classes that are actually used:

consolidate:
    java -cp $(JARCON_ROOT):$(CDA_ROOT)/lib/* \
        com.cliffberg.jarcon.JarConsolidator \
        --verbose \
        "$(IMAGEBUILDDIR)/$(APP_JAR_NAME):$(IMAGEBUILDDIR)/jars/*" \
        scaledmarkets.recommenders.mahout.UserSimilarityRecommender \
        $(ALL_JARS_NAME) \
        "1.0.0" "Cliff Berg"


The above example produces a 6.5Mb Jar file containing 3837 classes. This is in contrast to the 97.7Mb collection of 114 Jar files that would be included in the container image if jarcon were not used.

The components of the microservice container image are,
  1. Our application Jars, and the Jars used by our application.
  2. Java runtime.
  3. Base OS.
In the example above, we have compressed #1 from 97.7Mb down to 6.5Mb, but the Java runtime still consumes many tens of Mb. The OS can vary a great deal: if we use, say, Centos, we are talking about 300Mb just for the OS. If instead we use Alpine Linux, then #3 is only about 20Mb. That leaves the Java runtime. To solve that we need the Java module system, which requires Java 9 or later. Java 9 also requires some different considerations for Maven. I will leave that for a future article.

Saturday, February 24, 2018

A deep learning DevOps pipeline

Many organizations today are using deep learning and other machine learning techniques to analyze customer behavior, recommend products to customers, detect fraud and other patterns, and generally use data to improve their business.
Still more organizations are dabbling with machine learning, but have difficulty moving those experiments into the mainstream of their IT processes: that is, how do they create a machine learning DevOps “pipeline”?

The difficulty is that machine learning programs do not implement business logic as testable units, and so “code coverage” cannot be measured in the normal way, since neural networks logically consist of “neurons”—aka “nodes”—instead of lines of functional code; nor is it possible to deterministically define “correct” outputs for some of the input cases. Worse, creating test data can be laborious, since a proper test data set for a machine learning application generally consists of 20% of the size of data used to train the application—which is often tens or hundreds of thousands of cases. A final challenge—and this is arguably the worst problem of all—is that neural networks often work fine for “normal” data but can produce very wrong results if the data is slightly off: identifying these “slightly off” cases can be difficult. To summarize, the problems pertain to,
  1. Measuring coverage.
  2. Generating test data.
  3. Identifying apparently normal test cases that generate incorrect results.
In a paper last September, four researchers (Kexin Pei, Yinzhi Cao, Junfeng Yang, and Suman Jana) explained how they solved these problems by treating them as an optimization problem. We have used their techniques to create a DevOps style automated testing pipeline. For the purpose of this article, I will confine the discussion to multilayer—so-called “deep”—neural networks.


Developing a deep neural network is an iterative process: one must first analyze the data to decide how to organize it, for example whether it should be clustered, categorized, or additional attributes derived, whether time or spatial correlations need to be accounted for (e.g., through convolution). After that, the neural network architecture must be chosen: how many layers, and the layer sizes, as well as the way in which the network learns (e.g., through back propagation). This process takes quite a long time, with each iteration measured by the success rate of the network when tested with an independent data set.

Thus, developing a neural network model is an exploratory process. However, many real neural network applications entail multiple networks, and often functional software code as well, making the networks part of a larger application system, with an entire team working together. In addition, before a change to a network can be tested, the modified network must be trained, and training is extremely compute intensive, often requiring hours of GPU time. These factors make an automated testing “pipeline” useful.

In the figure above, step 1 represents the process of network adjustment: think of it as a developer or data scientist making a change to the network to see if the change improves performance. Most likely the developer will test the change locally, using a local GPU; but that test is probably only cursory, possibly using a somewhat simplified network architecture, or trained with a small dataset. To do a real test, a much higher capacity run must be performed. The change must also be tested in the context of the entire application. To enable these things, the programmer saves their changes to a shared model definition repository, which is then accessed by an automated testing process.

What is unique about our process is that in step 2 we use the algorithm developed by Pei et al. to generate additional test cases. These additional test cases are derived by analyzing the network, and finding cases that produce anomalous results. We then execute the test suite (step 3), including the additional test cases, and we record which neural nodes were activated, producing a coverage metric (step 4). Finally, in step 5 we execute the test cases again (probably in parallel with the first execution), but using an independent network implementation: our expectation is that the results will be extremely similar, within a certain percentage, say one percent. Any differences are examined manually to determine which is the "correct" result, which is then recorded and checked against the next time around. This process avoids having to manually inspect and label a large number of test cases.
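To make the coverage metric of step 4 concrete, here is a minimal Java sketch of neuron coverage, assuming the test harness records each neuron's activation value while the test suite runs (the class and data representation are illustrative, not part of our pipeline code):

// Illustrative only: assumes activations[layer][neuron] holds the activation
// values observed while running the test suite.
public final class NeuronCoverage {

    /** Fraction of neurons whose activation exceeded the given threshold. */
    public static double coverage(double[][] activations, double threshold) {
        int total = 0;
        int activated = 0;
        for (double[] layer : activations) {
            for (double value : layer) {
                total++;
                if (value > threshold) {
                    activated++;   // this neuron counts as "covered"
                }
            }
        }
        return total == 0 ? 0.0 : (double) activated / total;
    }

    public static void main(String[] args) {
        double[][] activations = {
            {0.9, 0.1, 0.0},
            {0.7, 0.4}
        };
        System.out.println(coverage(activations, 0.5));  // prints 0.4
    }
}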

We are still in the early days of using this process, but we have found that it dramatically improves the overall process of test case management, test case completeness assessment, and also reduces the turnaround time for testing model changes.

Tuesday, February 6, 2018

Low code and DevOps

So-called low-code and no-code frameworks enable non-programmers to create business applications without having to know programming languages such as Java, Python, and C Sharp. People sometimes ask me if this means that DevOps is not relevant.

The amount of code is not the issue. The issues are,
  1. Do your apps interact?
  2. Do your apps share data stores?
  3. Do your apps change frequently?
  4. Do you have multiple app teams?
  5. Do you have severe lead time pressure?
  6. Do you have very high pressure that things work?
  7. Do you have large scale usage?
If any of these are true, then you begin to have a need for DevOps, and the more that are true, the greater the case for DevOps.

Whether your code is created via a drag-and-drop GUI, or through careful hand coding of Java, C Sharp, or some other language does not matter. Low-code platforms provide a runtime platform, which can be a SaaS service or an on-premises server, and so it is little different from a PaaS arrangement that is either in a cloud or on premises. The question of whether it is low-code or coded is irrelevant.

To illustrate, consider a low-code application, developed using a low-code tool such as Appian. Two of the Appian app types are site and Web API. Suppose that we create one of each: a site, and a separate Web API which the site uses. If we assume that our user base is a few thousand users, and that they work 9-5 five days a week, then we do not need 24x7 availability and so we can use weekends to perform upgrades, and it also means that we do not need to worry about scale, because handling a few thousand users will be pretty easy, and so we do not need a sophisticated scaling architecture. So far, it sounds like we do not need DevOps.

Now consider what happens when we add a few more apps, and a few more teams. Suppose some of those other apps use our Web API, and suppose our app uses some of their Web APIs. Also suppose that we need a new feature in our app, and it requires that one of the other Web APIs add a new method. Now suppose that we want to be able to turn around new features in two week sprints, so that every two weeks we have a deployable new release, which we will deploy on the weekend. Do we need DevOps?

We do, if we want to stay sane. In particular, we will want to have,
  • Continuous automated unit level integration testing for each team.
  • Automated regression tests against all of the Web APIs (a minimal example is sketched just after this list).
  • The ability of each team to perform automated integration tests that verify inter-dependent API changes and app changes.
  • The ability to stand up full stack test environments on demand, so that the above tests can be run whenever needed, without having to wait for an environment.
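As a minimal sketch of the second bullet above (an automated regression test against a Web API), a JUnit test such as the following can run against an on-demand test environment regardless of whether the API was built with a low-code tool or hand-written code; the endpoint and system property names here are hypothetical:

import java.net.HttpURLConnection;
import java.net.URL;
import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class WebApiRegressionTest {

    // Hypothetical: the pipeline points this at whatever environment it stood up.
    private static final String BASE_URL =
        System.getProperty("api.base.url", "http://localhost:8080");

    @Test
    public void healthEndpointReturns200() throws Exception {
        HttpURLConnection conn =
            (HttpURLConnection) new URL(BASE_URL + "/api/v1/health").openConnection();
        conn.setRequestMethod("GET");
        assertEquals(200, conn.getResponseCode());
    }
}
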
This is starting to sound a lot like DevOps - and it is. At this point, we are well on our way to fully automated continuous delivery pipelines: we are doing DevOps, and the fact that code is created by drag-and-drop tools does not matter one bit, except that it means that our developers are more productive and hence we probably have an even higher rate of feature development - making the case for DevOps even greater.

What if non-programmers are creating their own apps? In that case, the question is, are they impacting each other? For example, are they modifying database schemas, or data structures in NoSQL stores? If so, then you will be in serious trouble if you do not have DevOps practices in place. Are those non-programmers writing Web APIs? If so, then you have the same considerations.

It is an integration question: if you have lots of things, and they need to work together, and you cannot afford to take your time about it, then you need DevOps.

Saturday, December 16, 2017

Creating a recommender microservice in Java 9

Even if you don't know what a recommender is, you no doubt have used one: if you have bought products online, most sites now recommend purchases based on your prior purchases. For example, if you use Netflix, they will recommend movies and TV shows that you might like. Recommenders are very important for businesses because they stimulate more sales. They also help consumers by suggesting things that the consumer might like. Most people view that in a positive way, in contrast to unrelated ads that show up while the user is viewing content.

Traditional recommenders

Traditional recommenders are based on statistical algorithms. These fall into two categories: (1) user similarity: algorithms that compare you to other users and guess that you might like what those other users like, and (2) item similarity: algorithms that compare catalog items, and guess that if you liked one item, you might like other similar items. Thus, in the first type of algorithm, the task is to compare you to other customers, usually based on your purchase history: customers are clustered or categorized. In the other approach, purchase items are compared and categorized. Combining these two techniques is especially powerful: it turns out that similar users can be used to recommend an array of similar items. There are more sophisticated approaches as well, including "singular value decomposition" (SVD), which reduces the dimensionality of the item space by mathematically grouping items that are dependent on each other. There are also neural network algorithms. In this article I will focus on a user similarity approach, since what I most want to demonstrate is the microservice-oriented implementation of a recommender. (In a future article, I will show how this approach can be applied to a neural network based recommender.)

Traditional recommenders use a purely statistical approach. For example, under an item similarity approach, a customer who purchases an item with categories “book”, “adventure”, and “historical” would be assumed to be potentially interested in other books that are also categorized as “adventure” and “historical”. In practice, matches are graded based on a similarity metric, which is actually a distance in the multi-dimensional category space: those items that are the shortest “distance” from a purchase item are deemed to be most similar.

For a user similarity approach, customers are profiled based on the categories of their purchases, and matched up with other customers who are most similar. The purchases of those other similar customers are then recommended to the current customer. Again, similarity is measured using “distance” in category space.

The challenge of building a traditional recommender therefore reduces to three aspects: (1) finding or creating a statistical library that can categorize items or users and measure metrics like “distance” between them, (2) finding a data processing system that can handle the volume of users and catalog items, and (3) tying those things together. In the open source world, the tools most commonly used for these tasks are the Lucene, Mahout, and SOLR packages from Apache.

The basic user similarity recommendation algorithm

I am focusing on user similarity because it is really the first step for a recommender: once you figure out the kinds of things someone might like, item similarity is then applicable. Item similarity alone is kind of limited because it has no way of expanding someone’s exposure to other kinds of items that they might like.

The basic algorithm for a user similarity recommender is as follows:
Neighborhood analysis:
Given user U,
For every other user Uother,
1. Compute a similarity S between U and Uother.
2. Retain the top users, ranked by similarity, as a neighborhood N.

Then,
User similarity, by item, scoped to the neighborhood:
For every item I that a user in N has a preference for, but for which U has no preference yet,
1. Find every user Un in N that has a preference for I.
2. Incorporate each such Un's preference for I, weighted by S, into a running average.
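As an illustration only (this is plain Java, not Mahout code), here is a minimal sketch of the weighted-average step, assuming we already have the neighborhood similarities and each neighbor's preference for item I in simple maps:

import java.util.Map;

public final class WeightedPreferenceEstimate {

    /**
     * Estimate user U's preference for item I as the similarity-weighted
     * average of the preferences of U's neighbors who have a preference for I.
     */
    public static double estimate(Map<String, Double> similarityByNeighbor,
                                  Map<String, Double> preferenceForItem) {
        double weightedSum = 0.0;
        double similaritySum = 0.0;
        for (Map.Entry<String, Double> e : similarityByNeighbor.entrySet()) {
            Double pref = preferenceForItem.get(e.getKey());
            if (pref != null) {                       // neighbor has a preference for I
                weightedSum += e.getValue() * pref;   // weight the preference by similarity S
                similaritySum += e.getValue();
            }
        }
        return similaritySum == 0.0 ? 0.0 : weightedSum / similaritySum;
    }

    public static void main(String[] args) {
        Map<String, Double> sims = Map.of("alice", 0.9, "bob", 0.5);
        Map<String, Double> prefs = Map.of("alice", 4.0, "bob", 2.0);
        System.out.println(estimate(sims, prefs));  // (0.9*4.0 + 0.5*2.0) / 1.4, about 3.29
    }
}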

Below we will see what this looks like with Mahout’s APIs.

Search versus another real time algorithm

Many recommenders (most?) today rely on a two-step process, whereby a batch job (for example, using Mahout or Spark) pre-computes a set of similarities, and then indexes that via a search engine. The search engine is then used as the real time algorithm to fetch recommended (similar) items.

I am not going to use that approach here. The reason is that this article is really preparing for the next article on the recommender topic, which will use a neural network based algorithm, such that a pre-trained neural network is used to make real time recommendations. That is a much more powerful approach, and in this article I am laying the groundwork for that.

Why a microservice?

A microservice is a lightweight Web service designed to be massively scalable and easy to maintain. The microservice approach is highly applicable to a recommender:
  1. Multiple recommenders can be tried and compared, being selected merely by their URL.
  2. The approach can handle large volumes of requests, because microservices are stateless and so can be replicated, leveraging container technology or serverless technology.
  3. The recommenders can be fine-tuned frequently, being re-deployed each time with minimal complexity.
In the interest of scalability, it is desirable to keep the footprint of a microservice small. Also, many organizations have in-house expertise in Java, and so if Java can be used, that widens the scope of who can work on the recommender.

The design

There are many technology stack choices for a Java Web application. The most popular is Spring Boot, and it is an excellent framework. However, I will use SparkJava (not to be confused with Apache Spark) because it is extremely lightweight, and because it also has a wonderfully understandable API. Note that Spring Boot has things that SparkJava does not, such as a persistence framework, but for our machine learning microservice, the mathematical framework we will be using (Mahout—see below) has persistence, so that's covered. We are also going to address scaling in a particular way that suits the unique needs of a recommender, which must perform very heavy offline data processing, so we would not be using Spring Boot's scaling mechanisms anyway.

To give you an idea of how simple it is to use SparkJava to create a Web service, all I had to do is add this to the main method:
spark.Spark.port(8080);
spark.Spark.get("/recommend", "application/json",
    (spark.Request request, spark.Response response) -> {
        ...handler code...
        // return any POJO message object
        return new MyResponseObject(...results...);
    },
    new JsonTransformer());  // converts the returned POJO to JSON

static class JsonTransformer implements spark.ResponseTransformer {
    private Gson gson = new Gson();  // com.google.gson.Gson
    @Override
    public String render(Object responseObject) {
        return gson.toJson(responseObject);
    }
}
No weird XML jazz, no creating some funky directory structure: just call a few intuitive methods.

In order to build a recommender, one also has to decide on how to perform the statistical calculations. Again, there are many choices there, including writing your own. I chose the Apache Mahout framework because it is rich and powerful. The downside is that its documentation is fragmented and incomplete: if you use Mahout, expect to have to dig around to find things (for example, the API docs for MySQLJDBCDataModel are not with the other API docs), and expect to have to look at the framework's source code on GitHub. Most (but not all) of the APIs can be found here, but the API docs also do not tell you much—they are full of the notoriously unhelpful kinds of programmer comments such as "getValue() - gets the value". Then again, it is open source, so it cannot be expected to be as well documented as, say, AWS's APIs.

I also chose MySQL, because it is simple to set up and many people are familiar with it, and because Mahout has a driver for MySQL, so use of something like Hibernate is not necessary. (Mahout supports other database types as well, including some NoSQL databases.)

Creating a recommender with Mahout

Creating a recommender with Mahout is actually pretty simple. Consider the code below.
// Compare users by the covariance of their item preferences.
UserSimilarity similarity = new PearsonCorrelationSimilarity(this.model);
// Neighborhood = all users whose similarity to the target user is at least 0.1.
UserNeighborhood neighborhood =
    new ThresholdUserNeighborhood(0.1, similarity, this.model);
UserBasedRecommender recommender =
    new GenericUserBasedRecommender(this.model, neighborhood, similarity);
// Recommend 2 items for the user whose ID is 2.
List<RecommendedItem> recommendations = recommender.recommend(2, 2);
This instantiates a recommender given a model, and implements the algorithms shown earlier. More on the model in a moment.  Right now, notice the choice of similarity algorithm: PearsonCorrelationSimilarity. That similarity algorithm measures the covariance between each user’s item preferences. Some common alternative approaches are cosine similarity and Euclidean similarity: these are geometric approaches based on the idea that similar items will be “close together” in item space. The range of similarity algorithms supported by Mahout can be found here:
http://apache.github.io/mahout/0.10.1/docs/mahout-mr/org/apache/mahout/cf/taste/impl/similarity/package-frame.html
Note also the use of ThresholdUserNeighborhood. This selects users who are within a certain similarity range of each other: an alternative neighborhood algorithm is NearestNUserNeighborhood, which selects the nearest set (in similarity) of a specified number of users.
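For example, a minimal sketch of that alternative (the neighborhood size of 10 is arbitrary):

// Alternative: use the 10 most similar users instead of a similarity threshold.
UserNeighborhood neighborhood =
    new NearestNUserNeighborhood(10, similarity, this.model);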

Back to the model: To create a model, you need to prepare your reference data: the recommender does statistical analysis on this data to compare users, based on their preferences. For example, I created a model as follows:
// Each row is [ user-id, item-id, preference ].
File csvFile = new File("TestBasic.csv");
PrintWriter pw = new PrintWriter(csvFile);
Object[][] data = {
    {1,100,3.5},
    ...
    {10,101,2.8}
};
printData(pw, data);  // helper that writes the rows as CSV lines
pw.close();
this.model =
    new org.apache.mahout.cf.taste.impl.model.file.FileDataModel(csvFile);
Note that above, I used a file-based model, whereas I will use MySQL for the full example.

The data rows consist of values for [ user-id, item-id, preference ]. The first two fields are obvious; “preference” is a float value from 1 through 5 that indicates the user’s preference for the item, where 1 is the lowest preference.

To use a recommender, you have to first prepare it by allowing it to analyze the data. Thus, the general pattern for a recommender has two parts: a data analysis part, and a usage part. The analysis is usually done offline on a recurring basis, for example nightly or weekly. That part is not a microservice: it is a heavy-duty data processing program, typically running on a cluster computing platform such as Hadoop. In the sample code above, only the following line should be in the microservice:
recommendations = recommender.recommend(2, 2);
All of the lines that precede it perform the preparatory analysis: these would not be called as a microservice.

In the sample microservice that I show here, the preparatory analysis steps are performed when the microservice starts.
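To tie the two parts together, the handler body might look roughly like this (a sketch only: the query parameter names and the wiring of the recommender variable are illustrative, not taken from the sample code):

// Assumes 'recommender' was prepared at startup and is in scope, and that
// JsonTransformer is the ResponseTransformer shown earlier.
spark.Spark.get("/recommend", "application/json", (request, response) -> {
    long userId = Long.parseLong(request.queryParams("userId"));   // hypothetical parameter name
    int howMany = Integer.parseInt(request.queryParams("count"));  // hypothetical parameter name
    List<RecommendedItem> recommendations = recommender.recommend(userId, howMany);
    return recommendations;  // serialized to JSON by JsonTransformer
}, new JsonTransformer());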

Persisting the analyzed model—or not

For a trivial recommender, you can simply perform the analysis calculations in the main method of your application, and then publish a web service in that main method that can then use the trained model. However, that is not a very scalable approach: each deployed instance of your recommender would need to go through the data analysis.

To avoid that, you can separate the data analysis into a separate program (as shown in the architecture shown earlier: see the “Data Preparation” block), and persist the trained model to a distributed file system, such as Hadoop’s HDFS. Each microservice instance can then simply load the trained model at startup.

Mahout refers to the ability to persist a trained model as a persistence strategy. Unfortunately, at present the SVDRecommender is the only recommender in Mahout that has implemented a persistence strategy. An SVD recommender is an important class of recommender, based on a mathematical technique for identifying redundant degrees of freedom in the data and collapsing them out, so that one ends up with a more compact model. This is highly applicable for product recommenders when the product catalog is large. The mathematics for performing SVD are time intensive, and that is why a persistence strategy was implemented for the SVDRecommender. The others need one too, however.
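For completeness, here is a minimal sketch of what that looks like for SVDRecommender; the factorizer parameters and the shared file path are illustrative, and the constructor details can vary by Mahout version:

import java.io.File;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.svd.ALSWRFactorizer;
import org.apache.mahout.cf.taste.impl.recommender.svd.FilePersistenceStrategy;
import org.apache.mahout.cf.taste.impl.recommender.svd.SVDRecommender;
import org.apache.mahout.cf.taste.model.DataModel;

public class SvdPersistenceExample {
    public static void main(String[] args) throws Exception {
        DataModel model = new FileDataModel(new File("TestBasic.csv"));
        // The factorization is written to this file after it is computed, and
        // re-loaded on startup if the file already exists, so a newly launched
        // instance can skip the expensive SVD computation.
        FilePersistenceStrategy persistence =
            new FilePersistenceStrategy(new File("/shared/svd-factorization.bin"));
        SVDRecommender recommender = new SVDRecommender(
            model,
            new ALSWRFactorizer(model, 10, 0.05, 20),  // 10 features, lambda 0.05, 20 iterations
            persistence);
        System.out.println(recommender.recommend(2, 2));
    }
}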

Mahout also has an API called Refreshable: All of the Mahout Recommender classes (package org.apache.mahout.cf.taste.impl.recommender) implement the Refreshable interface, providing them with the refresh(Collection<Refreshable>) method. It is intended for updating models on the fly. It could in theory be used for injecting pre-analyzed matrices, but the current implementations do not support that: they re-compute the entire model based on updated source data. Thus, each newly launched container will have to go through a model data prep computation.

It’s not that bad: the model data prep calculations don’t take very long; SVD computation is an exception, and that is why it has a special persistence implementation. So all we need to do is call a recommender’s refresh() method with a null argument, and it will purge all derived objects, as well as the cache maintained by the MySQLJDBCDataModel class, causing it to lazily re-load data as needed to re-compute all derived objects. Newly launched containers containing the recommender will compute the derived objects from scratch.
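For example, the refresh could be exposed through a simple endpoint (a sketch; the path name is hypothetical):

spark.Spark.post("/refresh", (request, response) -> {
    // Passing null purges all derived objects (and the data model's cache);
    // they are lazily recomputed on the next recommendation request.
    recommender.refresh(null);
    return "refreshed";
});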

Packaging as a container

To package the microservice as a container image, you must create a Dockerfile. (See here.)

If you create the container image based on Centos 7 and include JDK 8, the resulting image is 776MB. That’s not a very “micro” microservice. Much of the size comes from the Centos 7 Linux distro that we added to the container image. If you build from Alpine Linux and include only JRE 8, the resulting image is 468MB. The remaining size is due to two things: (1) all of the 113 third party Java jars that Maven thinks need to be added to the image, and (2) the size of the JRE.

It is a certainty that the application does not actually need all 113 jars. Maven determines what is needed based on the dependencies declared in pom files. However, the actual number of dependent jars is usually much smaller: your code usually only calls a small percent of the methods of each dependent project, and many projects have multiple jar files.

What we need is a Java method traverser, that can remove uncalled Java methods from the class files and then package all that as a single jar. I do not know of such a tool, however. The tool would also have to allow one to manually add method signatures for those that are known to be called by reflection. Or, if the tool works at runtime instead of through static analysis, it could gather that information automatically. Test tools such as Cobertura instrument the class files and track which code gets used at runtime: such a tool could easily track which methods never get called and then strip those from the class files—including those in the third party JARs. I wish someone would write a tool like that—I don't have time to do so—we could have Java applications that are 20Mb instead of 200Mb.

I should mention that Java 9 introduces a very useful new feature for minimizing the footprint of a deployed Java application: the module system (formerly known as “Jigsaw”). The module system makes it possible to deploy only the pieces of the JDK runtime that are needed by an application, greatly reducing the deployed footprint. I did not use the module feature for this demonstration because it would not have made much of a difference: it might have saved 10Mb in Java standard modules, but none of the required external JARs are packaged as modules at this point, so our footprint would still be essentially what it is without modules.

Launching the container

In the sample code I launch MySQL and the recommender container with the Docker Compose tool, using these two Compose files:
test-bdd/docker-compose-mysql.yml
docker-compose.yml

I used Compose because it is a great tool for development. Normally you would launch the container using an orchestration system such as Kubernetes or native cloud infrastructure that provides elastic scaling. I will not go into that here because it is beyond the scope of this article, which is really about creating a recommender.

Full code

The full code for this sample microservice can be found in github at https://github.com/ScaledMarkets/recommender-tfidf

Sunday, August 27, 2017

SAFe is invaluable in a large organization DevOps or Agile transition

Don’t “do” Agile: instead, “be” Agile.

That’s the refrain of most Agile consultants when you ask them, How do I make Agile work in my large company?

The problem is, while the advice is correct, it is not very helpful. People need tangible, actionable advice. If you try to change their culture first, it won’t work: change management experts will tell you that culture is the hardest thing to change, and while behavior is a result of culture, rapid cultural change in an organization only follows directed behavioral change.

When SAFe came out circa 2011, it was met with scorn by much of the Agile community. One reason for this is that SAFe looked like a waterfall plan, with everything laid out. Agilists had learned that one cannot be so prescriptive, and so they were immediately suspicious of any big plan or all-encompassing methodology. If they had read through the SAFe documentation, however, they would have seen that it was really just a model—a way of looking at the problem. SAFe makes it very clear that contextual judgment should be applied.

Flash forward to today, and there is still some derision of SAFe in some elements of the Agile community. However, I have found that derision to come mostly from those who don’t understand SAFe. On the other hand, SAFe has helped many large organizations to get a handle on Agile.

There are risks with applying SAFe. Just as those who denounced SAFe had feared, rigid application of SAFe can be very damaging. For example, SAFe presumes that one has Agile technical practices well in hand—a cross-functional team is not possible unless you have automated integration tests. But if one applies SAFe thoughtfully, it solves some big problems that team-level Agile methodologies such as Scrum do not address or even acknowledge. SAFe provides a model for thinking about what is missing from team-level Agile: i.e., it provides ideas for "How do I make Agile work in my large company?", and therefore helps to define the discussions around what needs to change, and those discussions get you on the path to "being Agile".

I have been doing Agile transformations for a long time. My own former company Digital Focus adopted eXtreme Programming (XP) in 2000. Since then I have been a principal thought leader or consultant in seven other Agile and DevOps transformations in very large organizations. What I have seen is that there is an enormous disconnect between the executive management of most non-IT companies and the realities that the IT groups experience. In short, the execs simply don't have a clue how software is built or tested, or how one should manage those processes. They delegate it to "IT", and don't get involved. That used to work for waterfall, but it does not work for Agile: in the words of Gary Gruver, author of Leading the Transformation, "Executives can't just manage this transformation with metrics." (Kindle Locations 343-344) Today's successful executives recognize that their technology platforms are strategic parts of their business and they make it their business to learn about those platforms. Leaders such as Jeff Bezos and Elon Musk epitomize this.

SAFe helps with that: it provides a model that identifies all of the many functions that must change to enable Agile to work—including the business functions.

The main fear of Agilists about SAFe is legitimate, however: Agile transformation is not a process change: it is a transformation of how people think about their work, and it requires a huge infusion of knowledge from the outside. It is primarily a training, mentoring, and collaborative growth activity. Thus, one cannot simply “implement” SAFe, or any organization-wide Agile process. One has to proactively grow into it. SAFe helps because it provides a taxonomy and defines the landscape of things that need to change: with SAFe, it is obvious that Agile is not just an “IT thing”, or confined to the software teams—it is organization-level in scope.

Effective Agile or DevOps transformation is not a process rollout. Effective change of that depth and magnitude requires working with managers and staff, on an individual basis, to help them to understand what Agile means for their job function, and think through how to change their processes. They need to own the change of their own processes. It takes time: as I said, it is a growth process.

Saturday, August 26, 2017

Companies waste so much money in the cloud

Cloud services are what made DevOps possible: cloud services make it possible to dynamically provision infrastructure and services, via scripts, and that is what makes continuous delivery possible. Cloud services make it possible to dynamically scale, and that is what makes it possible for organizations to economically deploy Internet-scale applications without having to operate their own data center.

Still, any tool can be misused. Cloud services enable one to scale economically, adjusting application size based on demand; but cloud services can also be overused, resulting in huge unnecessary costs. We have identified some of the ways this typically happens, with the goal of helping people right-size their cloud deployments.

Pay close attention to your development services and instances

Software development teams generally do not pay for their cloud accounts, and so they don’t have an incentive to manage the cost of those accounts. For example, it is not unusual for development teams to create VMs and leave them running, resulting in a large set of abandoned running VMs, running up the bill.

To avoid misuse of cloud services by development teams, you need someone who proactively manages those resources. Don’t take away the ability of teams to create resources in their cloud accounts; but instead, have someone who keeps track of what the programmers are doing, and what they are using the cloud resources for. That person should continually peruse resource usage, and ask, “What is this for? Do we need this?” Also, set up standards that make that job easier: e.g., establish a rule that all VMs should be tagged with their purpose and, if the VMs exist in a multi-project environment, also tag them with their project code - which helps to perform accounting. (Scaled Markets includes setting up such a tagging scheme as part of its Continuous Delivery Accelerator offering - see Accelerators.)

Beware third party services

Today, it is possible to outsource much of one's application's services, using so-called platform-as-a-service (PaaS) services, either provided by the cloud provider or by a third party. For example, instead of creating monitoring, one can use third party monitoring services, to verify that one's app is still up. While this is convenient, it generally costs money for each check that gets performed, based on the frequency - so your app now has another taxi meter running, eating into your profits. As another example, most cloud providers now provide scalable messaging services that can feed your applications with event messages from IoT devices or from other applications. While the scalability of those services is a powerful value proposition, they generally are paid for per message - and so the bill can be quite large. Some PaaS services are priced based on their footprint, rather than on their usage: Azure's HDInsight service is an example of that: you pay for an HDInsight cluster a fixed amount per hour, regardless of whether you are using it or not. Also, since it takes quite a while to start up an HDInsight cluster, it is not something that you can just create on demand.

Design your apps for local testability

In a recent talk by Yunong Xiao of Netflix, about Netflix’s move to a PaaS model for the services that it provides to Netflix integrators such as Roku, Yunong mentions that they invest heavily in the ability of developers to “mock” components so that developers can perform integration testing (using mocks) before committing their code. This is something that is often lost on corporate teams that try to adopt DevOps: they find that their applications are not amenable to be stood up in a test mode in local integration test environments. In other words, their applications have not been “designed for testability”.

In order to be able to “shift left” one’s integration testing, so that it can be done upstream - ideally on developer laptops - prior to full scale integration testing by a QA team, it is necessary to design an application so that it can be stood up in a small footprint test mode. That’s what Yunong really means by “mocks”. For example, consider Apache Spark. Spark is used by Web services that need to implement extremely high volume stream processing. However, in order for a programmer to test Spark code, they don’t have to stand up a full scale Spark cluster: they can run Spark on their laptop in “standalone” mode - even in a Docker container (that is how we normally do it). One can also set the memory used by Spark: for production, one would set it to multiple Gigabytes, but for local integration testing, one can set it at something like 300M.
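For instance, here is a minimal sketch of standing up Spark in local mode for an integration test; the app name and thread count are arbitrary, and note that in local mode the driver memory is set when the JVM is launched (for example via spark-submit --driver-memory 300m), not in code:

import org.apache.spark.sql.SparkSession;

public class LocalSparkSmokeTest {
    public static void main(String[] args) {
        // "local[2]" runs Spark inside this JVM with two worker threads:
        // no cluster, no cloud bill, suitable for a laptop or a Docker container.
        SparkSession spark = SparkSession.builder()
            .appName("local-integration-test")
            .master("local[2]")
            .config("spark.ui.enabled", "false")  // keep the test footprint small
            .getOrCreate();

        long count = spark.range(1000).count();  // trivial job to prove the engine works
        System.out.println("count = " + count);
        spark.stop();
    }
}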

Use a cloud framework that can run locally

It is absolutely critical that when the architecture of a cloud-based application is conceived, that the architects and planners think through how it will be tested, and the costs involved. Again, this is design-for-testability.

For example, if your application uses PaaS services that are native to your cloud provider, can you run those services locally (in laptops or on-premises VMs)? - or are those PaaS services tied to the cloud - and therefore require you to pay for their use in each and every test environment?

This is a critical consideration, because in a continuous delivery process, testing is not only continuous, but it is also done with a lot of parallelism. The parallelism occurs in two ways: (1) an integration test environment is often spun up whenever code is committed - often at a team or even single developer level - and, (2) regression test suites are often broken up into subsets that are run in parallel in order to reduce run time to a few hours or less. If you are paying for services being used by each test environment, you are going to be paying an awful lot of money.

That's why frameworks such as Kubernetes, OpenShift, and Cloud Foundry let you create small local container clusters on a laptop, and there are also local "mini cloud" tools that can run on a laptop, so that you can first debug your application locally, and then deploy to a real cloud using, say, Terraform. Lattice is one such "local cloud" system: in other words, you write your infrastructure code using a cloud-agnostic tool such as Terraform (which has a driver for Lattice), test it in your local cloud, and then deploy to a real cloud.

Why do this? It is so that individual developers can perform integration testing - before they commit their code changes. This is true "shift left" testing, and the result is very high developer throughput and a very short feature lead time. This is becoming standard DevOps practice today. However, for your developers to be able to do this, they need to be able to run the cloud services locally, and ideally for free. If you think that your application is too complex to ever run locally, think again: a typical laptop can run 20 containers without any problem. If you can't run your integration tests locally, it is very likely because your apps were not designed to be able to run locally in a test mode - and you can fix that with some re-engineering, and save a lot of money.

Summary

Cloud services have enormous value for deploying flexible and scalable applications and services. However, it is essential to consider how those will be tested, and the costs involved. Applications must be designed to be testable; if you don’t carefully design a testing approach that economically uses your cloud services, and an application architecture that makes it possible to test economically (e.g., locally), then the cloud services can rapidly become a financial burden.

Cliff & Phil