Saturday, December 16, 2017

Creating a recommender microservice in Java 9

Even if you don't know what a recommender is, you have no doubt used one: if you have bought products online, most sites now recommend purchases based on your prior purchases. For example, if you use Netflix, it will recommend movies and TV shows that you might like. Recommenders are important to businesses because they stimulate more sales. They also help consumers by suggesting things that the consumer might like. Most people view that positively, in contrast to unrelated ads that show up while the user is viewing content.

Traditional recommenders

Traditional recommenders are based on statistical algorithms. These fall into two categories: (1) user similarity: algorithms that compare you to other users and guess that you might like what those other users like, and (2) item similarity: algorithms that compare catalog items and guess that if you liked one item, you might like other similar items. Thus, in the first type of algorithm, the task is to compare you to other customers, usually based on your purchase history: customers are clustered or categorized. In the other approach, purchase items are compared and categorized. Combining these two techniques is especially powerful: it turns out that similar users can be used to recommend an array of similar items. There are more sophisticated approaches as well, including “singular value decomposition” (SVD), which reduces the dimensionality of the item space by mathematically grouping items that are dependent on each other. There are also neural network algorithms. In this article I will focus on a user similarity approach, since what I most want to demonstrate is the microservice-oriented implementation of a recommender. (In a future article, I will show how this approach can be applied to a neural network based recommender.)

Traditional recommenders use a purely statistical approach. For example, under an item similarity approach, a customer who purchases an item with categories “book”, “adventure”, and “historical” would be assumed to be potentially interested in other books that are also categorized as “adventure” and “historical”. In practice, matches are graded based on a similarity metric, which is actually a distance in the multi-dimensional category space: those items that are the shortest “distance” from a purchase item are deemed to be most similar.

For a user similarity approach, customers are profiled based on the categories of their purchases, and matched up with other customers who are most similar. The purchases of those other similar customers are then recommended to the current customer. Again, similarity is measured using “distance” in category space.
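To make the idea of “distance” concrete, here is a small illustrative example (it is not part of the recommender's code) that measures the Euclidean distance between items represented as weight vectors in a three-category space; the categories and weights are made up, and the same idea applies whether the vectors represent items or user profiles:
// Illustrative only: "distance" between two items in a simple category space.
// Each item is a vector of category weights, e.g. {book, adventure, historical}.
public final class CategoryDistance {

    // Euclidean distance: smaller means "more similar".
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[] purchased  = {1.0, 1.0, 1.0};  // book, adventure, historical
        double[] candidateA = {1.0, 1.0, 0.0};  // book, adventure
        double[] candidateB = {1.0, 0.0, 0.0};  // book only
        // candidateA is "closer" to the purchased item, so it would rank higher.
        System.out.println(distance(purchased, candidateA)); // 1.0
        System.out.println(distance(purchased, candidateB)); // ~1.414
    }
}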

The challenge of building a traditional recommender therefore reduces to three aspects: (1) finding or creating a statistical library that can categorize items or users and measure metrics like “distance” between them, (2) finding a data processing system that can handle the volume of users and catalog items, and (3) tying those things together. In the open source world, the tools most commonly used for these tasks are the Lucene, Mahout, and SOLR packages from Apache.

The basic user similarity recommendation algorithm

I am focusing on user similarity because it is really the first step for a recommender: once you figure out the kinds of things someone might like, item similarity is then applicable. Item similarity alone is kind of limited because it has no way of expanding someone’s exposure to other kinds of items that they might like.

The basic algorithm for a user similarity recommender is as follows:
Neighborhood analysis:
Given user U,
For every other user Uother,
1. Compute a similarity S between U and Uother.
2. Retain the top users, ranked by similarity, as a neighborhood N.

Then,
User similarity, by item, scoped to the neighborhood:
For every item I that a user in N has a preference for, but for which U has no preference yet,
1. For every user Un in N that has a preference for I,
2. Incorporate Un’s preference for I, weighted by its similarity S, into a running average.
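To make the scoring step concrete, here is a rough plain-Java sketch of it, using simple maps of preference data; the method and parameter names are made up:
import java.util.HashMap;
import java.util.Map;

// Sketch of the neighborhood scoring step. 'prefs' maps each userID to that
// user's (itemID -> preference) map; 'similarityToU' maps each neighborhood
// userID to its similarity S with user U.
static Map<Long, Double> estimatePreferences(long u,
        Map<Long, Map<Long, Double>> prefs,
        Map<Long, Double> similarityToU) {
    Map<Long, Double> weightedSums = new HashMap<>();
    Map<Long, Double> weightTotals = new HashMap<>();
    for (Map.Entry<Long, Double> neighbor : similarityToU.entrySet()) {
        long un = neighbor.getKey();
        double s = neighbor.getValue();
        for (Map.Entry<Long, Double> pref : prefs.get(un).entrySet()) {
            long item = pref.getKey();
            if (prefs.get(u).containsKey(item)) {
                continue;  // U already has a preference for this item
            }
            weightedSums.merge(item, s * pref.getValue(), Double::sum);
            weightTotals.merge(item, Math.abs(s), Double::sum);
        }
    }
    // The weighted running average: estimated preference of U for each candidate item.
    Map<Long, Double> estimates = new HashMap<>();
    weightedSums.forEach((item, sum) -> estimates.put(item, sum / weightTotals.get(item)));
    return estimates;
}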

Below we will see what this looks like with Mahout’s APIs.

Search versus another real time algorithm

Many recommenders (most?) today rely on a two-step process, whereby a batch job (for example, using Mahout or Spark) pre-computes a set of similarities, and then indexes that via a search engine. The search engine is then used as the real time algorithm to fetch recommended (similar) items.

I am not going to use that approach here. The reason is that this article is really preparing for the next article on the recommender topic, which will use a neural network based algorithm, such that a pre-trained neural network is used to make real time recommendations. That is a much more powerful approach, and in this article I am laying the groundwork for that.

Why a microservice?

A microservice is a lightweight Web service designed to be massively scalable and easy to maintain. The microservice approach is highly applicable to a recommender:
  1. Multiple recommenders can be tried and compared, being selected merely by their URL.
  2. The approach can handle large volumes of requests, because microservices are stateless and so can be replicated, leveraging container technology or serverless technology.
  3. The recommenders can be fine-tuned frequently, being re-deployed each time with minimal complexity.
In the interest of scalability, it is desirable to keep the footprint of a microservice small. Also, many organizations have in-house expertise in Java, and so if Java can be used, that widens the scope of who can work on the recommender.

The design

There are many technology stack choices for a Java Web application. The most popular is Spring Boot, and it is an excellent framework. However, I will use SparkJava (not to be confused with Apache Spark) because it is extremely lightweight, and because it has a wonderfully understandable API. Note that Spring Boot has things that SparkJava does not, such as a persistence framework, but for our machine learning microservice, the mathematical framework we will be using (Mahout—see below) has persistence, so that’s covered. We are also going to address scaling in a way that suits the unique needs of a recommender, which must perform very heavy offline data processing, so we would not be using Spring Boot’s scaling mechanisms either.

To give you an idea of how simple it is to use SparkJava to create a Web service, all I had to do was add the following to the application's main class:
import com.google.gson.Gson;
import spark.Request;
import spark.Response;

...

// In main: set the port and register a GET route that returns JSON.
spark.Spark.port(8080);
spark.Spark.get("/recommend", "application/json", (Request request, Response response) -> {
    ...handler code...
    // return any POJO message object
    return new MyResponseObject(...results...);
}, new JsonTransformer());

// Alongside main: converts whatever POJO the route returns into JSON.
static class JsonTransformer implements spark.ResponseTransformer {
    private Gson gson = new Gson();
    public String render(Object responseObject) {
        return gson.toJson(responseObject);
    }
}
No weird XML jazz, no creating some funky directory structure: just call a few intuitive methods.

In order to build a recommender, one also has to decide on how to perform the statistical calculations. Again, there are many choices, including writing your own. I chose the Apache Mahout framework because it is rich and powerful. The downside is that its documentation is fragmented and incomplete: if you use Mahout, expect to have to dig around to find things (for example, the API docs for MySQLJDBCDataModel are not with the other API docs), and expect to have to look at the framework’s source code on GitHub. Most (but not all) of the APIs can be found here, but the API docs also do not tell you much—they are full of the notoriously unhelpful kind of programmer comment such as “getValue() - gets the value”. Then again, it is open source, so it cannot be expected to be as well documented as, say, AWS’s APIs.

I also chose MySQL, because it is simple to set up and many people are familiar with it, and because Mahout has a driver for MySQL, so use of something like Hibernate is not necessary. (Mahout supports other database types as well, including some NoSQL databases.)
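For reference, here is a sketch of how the MySQL-backed model can be constructed; it assumes Mahout's MySQLJDBCDataModel and the Connector/J MysqlDataSource, and relies on Mahout's default table and column names (taste_preferences, user_id, item_id, preference), which I believe can be overridden via another constructor:
import javax.sql.DataSource;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.impl.model.jdbc.MySQLJDBCDataModel;
import com.mysql.jdbc.jdbc2.optional.MysqlDataSource;  // Connector/J 5.x; newer versions use com.mysql.cj.jdbc.MysqlDataSource

...

// Point a Connector/J DataSource at the MySQL instance (connection values are illustrative).
MysqlDataSource dataSource = new MysqlDataSource();
dataSource.setServerName("localhost");
dataSource.setDatabaseName("recommender");
dataSource.setUser("recommender");
dataSource.setPassword("secret");
// Mahout reads preferences from the taste_preferences table by default.
DataModel model = new MySQLJDBCDataModel(dataSource);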

Creating a recommender with Mahout

Creating a recommender with Mahout is actually pretty simple. Consider the code below.
UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
UserNeighborhood neighborhood =
    new ThresholdUserNeighborhood(0.1, similarity, model);
UserBasedRecommender recommender =
    new GenericUserBasedRecommender(model, neighborhood, similarity);
// Recommend 2 items for the user whose ID is 2:
recommendations = recommender.recommend(2, 2);
This instantiates a recommender given a model, and implements the algorithms shown earlier. More on the model in a moment. Right now, notice the choice of similarity algorithm: PearsonCorrelationSimilarity. That similarity algorithm measures the correlation between two users’ item preferences. Some common alternative approaches are cosine similarity and Euclidean similarity: these are geometric approaches based on the idea that similar users (or items) will be “close together” in preference space. The range of similarity algorithms supported by Mahout can be found here:
http://apache.github.io/mahout/0.10.1/docs/mahout-mr/org/apache/mahout/cf/taste/impl/similarity/package-frame.html
Note also the use of ThresholdUserNeighborhood. This selects the users whose similarity to the given user exceeds a threshold; an alternative neighborhood algorithm is NearestNUserNeighborhood, which selects a specified number of users who are nearest in similarity.
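For example, swapping in a Euclidean distance similarity and a nearest-N neighborhood is just a matter of changing two constructor calls (a sketch, using the same model as above):
UserSimilarity similarity = new EuclideanDistanceSimilarity(model);
// Keep the 25 most similar users, rather than everyone above a similarity threshold:
UserNeighborhood neighborhood = new NearestNUserNeighborhood(25, similarity, model);
UserBasedRecommender recommender =
    new GenericUserBasedRecommender(model, neighborhood, similarity);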

Back to the model: To create a model, you need to prepare your reference data: the recommender does statistical analysis on this data to compare users, based on their preferences. For example, I created a model as follows:
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;

...

File csvFile = new File("TestBasic.csv");
PrintWriter pw = new PrintWriter(csvFile);
Object[][] data = {
    {1,100,3.5},
    ...
    {10,101,2.8}
};
printData(pw, data);
pw.close();
this.model = new FileDataModel(csvFile);
Note that above, I used a file-based model, whereas I will use MySQL for the full example.
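The printData helper is not shown above; a minimal version simply writes each row as a comma-separated line:
// Minimal sketch of the printData helper used above: one line per row, in the
// userID,itemID,preference format that FileDataModel expects.
static void printData(java.io.PrintWriter pw, Object[][] data) {
    for (Object[] row : data) {
        pw.println(row[0] + "," + row[1] + "," + row[2]);
    }
}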

The data rows consist of values for [ user-id, item-id, preference ]. The first two fields are obvious; “preference” is a float value from 1 through 5 that indicates the user’s preference for the item, where 1 is the lowest preference.

To use a recommender, you have to first prepare it by allowing it to analyze the data. Thus, the general pattern for a recommender has two parts: a data analysis part, and a usage part. The analysis is usually done offline on a recurring basis, for example nightly or weekly. That part is not a microservice: it is a heavy-duty data processing program, typically running on a cluster computing platform such as Hadoop. In the sample code above, only the following line should be in the microservice:
recommendations = recommender.recommend(2, 2);
All of the lines that precede it perform the preparatory analysis: these would not be called as a microservice.

In the sample microservice that I show here, the preparatory analysis steps are performed when the microservice starts.
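Putting the pieces together, the /recommend handler then just calls the prepared recommender. Here is a sketch of what that might look like; the query parameter names are illustrative and error handling is omitted:
import java.util.List;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;

...

// 'recommender' is the UserBasedRecommender built when the service started.
spark.Spark.get("/recommend", "application/json", (request, response) -> {
    long userId = Long.parseLong(request.queryParams("userId"));
    int howMany = Integer.parseInt(request.queryParams("howMany"));
    List<RecommendedItem> items = recommender.recommend(userId, howMany);
    return items;  // serialized to JSON by the JsonTransformer shown earlier
}, new JsonTransformer());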

Persisting the analyzed model—or not

For a trivial recommender, you can simply perform the analysis calculations in the main method of your application, and then publish a web service in that main method that can then use the trained model. However, that is not a very scalable approach: each deployed instance of your recommender would need to go through the data analysis.

To avoid that, you can separate the data analysis into a separate program (the “Data Preparation” block in the architecture described earlier), and persist the trained model to a distributed file system, such as Hadoop’s HDFS. Each microservice instance can then simply load the trained model at startup.

Mahout refers to the ability to persist a trained model as a persistence strategy. Unfortunately, at present the SVDRecommender is the only recommender in Mahout that has implemented a persistence strategy. An SVD recommender is an important class of recommender, based on a mathematical technique for identifying redundant degrees of freedom in the data and collapsing them out, so that one ends up with a more compact model. This is highly applicable for product recommenders when the product catalog is large. The mathematics for performing SVD are time intensive, and that is why a persistence strategy was implemented for the SVDRecommender. The others need one too, however.
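For completeness, here is a sketch of how an SVD recommender with a file-based persistence strategy can be set up; the constructor choices reflect my reading of the Mahout javadocs, and the factorizer parameters (number of features, lambda, iterations) are arbitrary illustrative values:
import java.io.File;
import org.apache.mahout.cf.taste.impl.recommender.svd.ALSWRFactorizer;
import org.apache.mahout.cf.taste.impl.recommender.svd.FilePersistenceStrategy;
import org.apache.mahout.cf.taste.impl.recommender.svd.SVDRecommender;

...

// Persist the computed factorization to a (shared) file so that a newly started
// instance can load it instead of re-computing it.
FilePersistenceStrategy persistence =
    new FilePersistenceStrategy(new File("/shared/factorization.bin"));
// 10 latent features, lambda 0.05, 10 iterations: arbitrary illustrative values.
ALSWRFactorizer factorizer = new ALSWRFactorizer(model, 10, 0.05, 10);
SVDRecommender svdRecommender = new SVDRecommender(model, factorizer, persistence);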

Mahout also has an API called Refreshable: All of the Mahout Recommender classes (package org.apache.mahout.cf.taste.impl.recommender) implement the Refreshable interface, providing them with the refresh(Collection<Refreshable>) method. It is intended for updating models on the fly. It could in theory be used for injecting pre-analyzed matrices, but the current implementations do not support that: they re-compute the entire model based on updated source data. Thus, each newly launched container will have to go through a model data prep computation.

It’s not that bad: the model data prep calculations don’t take very long; SVD computation is an exception, and that is why it has a special persistence implementation. So all we need to do is call a recommender’s refresh() method with a null argument, and it will purge all derived objects, as well as the cache maintained by the MySQLJDBCDataModel class, causing it to lazily re-load data as needed to re-compute all derived objects. Newly launched containers containing the recommender will compute the derived objects from scratch.
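In the sample service, that can be as simple as scheduling a periodic call to refresh(); for example (a sketch):
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

...

// Once a day, purge the derived objects (and the JDBC data model's cache) so that
// they are lazily re-computed from the latest source data.
ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
scheduler.scheduleAtFixedRate(() -> recommender.refresh(null), 24, 24, TimeUnit.HOURS);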

Packaging as a container

To package the microservice as a container image, you must create a Dockerfile. (See here.)

If you create the container image based on Centos 7 and include JDK 8, the resulting image is 776MB. That’s not a very “micro” microservice. Much of the size comes from the Centos 7 Linux distro that we added to the container image. If you build from Alpine Linux and include only JRE 8, the resulting image is 468MB. The remaining size is due to two things: (1) all of the 113 third party Java jars that Maven thinks need to be added to the image, and (2) the size of the JRE.

It is a certainty that the application does not actually need all 113 jars. Maven determines what is needed based on the dependencies declared in pom files. However, the actual number of dependent jars is usually much smaller: your code usually calls only a small percentage of the methods of each dependent project, and many projects have multiple jar files.

What we need is a Java method traverser that can remove uncalled Java methods from the class files and then package what remains as a single jar. I do not know of such a tool, however. The tool would also have to allow one to manually add method signatures for those that are known to be called by reflection. Or, if the tool works at runtime instead of through static analysis, it could gather that information automatically. Test tools such as Cobertura instrument the class files and track which code gets used at runtime: such a tool could easily track which methods never get called and then strip those from the class files—including those in the third party JARs. I wish someone would write a tool like that—I don’t have time to do so—then we could have Java applications that are 20MB instead of 200MB.

I should mention that Java 9 introduces a very useful new feature for minimizing the footprint of a deployed Java application: the module system (formerly known as “Jigsaw”). The module system makes it possible to deploy only the pieces of the JDK runtime that are needed by an application, greatly reducing the deployed footprint. I did not use the module feature for this demonstration because it would not have made much of a difference: it might have saved 10MB in Java standard modules, but none of the required external JARs are packaged as modules at this point, so our footprint would still be essentially what it is without modules.
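For reference, modularizing the service itself would only require a small module descriptor; the module name and requires clauses below are illustrative:
// Hypothetical module descriptor (module-info.java) for the microservice; the
// third-party JARs would be consumed as automatic modules rather than real ones.
module com.example.recommender {
    requires java.sql;      // JDBC, used by the MySQL-backed data model
    requires java.logging;
}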

Launching the container

In the sample code I launch MySQL and the recommender container with the Docker Compose tool, using these two Compose files:
test-bdd/docker-compose-mysql.yml
docker-compose.yml

I used Compose because it is a great tool for development. Normally you would launch the container using an orchestration system such as Kubernetes or native cloud infrastructure that provides elastic scaling. I will not go into that here because it is beyond the scope of this article, which is really about creating a recommender.

Full code

The full code for this sample microservice can be found on GitHub at https://github.com/ScaledMarkets/recommender-tfidf

Sunday, August 27, 2017

SAFe is invaluable in a large organization DevOps or Agile transition

Don’t “do” Agile: instead, “be” Agile.

That’s the refrain of most Agile consultants when you ask them, How do I make Agile work in my large company?

The problem is, while the advice is correct, it is not very helpful. People need tangible, actionable advice. If you try to change their culture first, it won’t work: change management experts will tell you that culture is the hardest thing to change, and while behavior is a result of culture, rapid cultural change in an organization only follows directed behavioral change.

When SAFe came out circa 2011, it was met with scorn by much of the Agile community. One reason for this is that SAFe looked like a waterfall plan, with everything laid out. Agilists had learned that one cannot be so prescriptive, and so they were immediately suspicious of any big plan or all-encompassing methodology. If they had read through the SAFe documentation, however, they would have seen that it was really just a model—a way of looking at the problem. SAFe makes it very clear that contextual judgment should be applied.

Flash forward to today, and there is still some derision of SAFe in some elements of the Agile community. However, I have found that derision to come mostly from those who don’t understand SAFe. On the other hand, SAFe has helped many large organizations to get a handle on Agile.

There are risks with applying SAFe. Just as those who denounced SAFe had feared, rigid application of SAFe can be very damaging. For example, SAFe presumes that one has Agile technical practices well in hand—a cross-functional team is not possible unless you have automated integration tests. But if one applies SAFe thoughtfully, it solves some big problems that team-level Agile methodologies such as Scrum do not address or even acknowledge. SAFe provides a model for thinking about what is missing from team-level Agile: i.e., it provides ideas for “How do I make Agile work in my large company?”, and therefore helps to define the discussions around what needs to change, and those discussions get you on the path to “being Agile”.

I have been doing Agile transformations for a long time. My own former company Digital Focus adopted eXtreme Programming (XP) in 2000. Since then I have been a principal thought leader or consultant in seven other Agile and DevOps transformations in very large organizations. What I have seen is that there is an enormous disconnect between the executive management of most non-IT companies and the realities that the IT groups experience. In short, the execs simply don’t have a clue how software is built or tested, or how one should manage those processes. They delegate it to “IT”, and don’t get involved. That used to work for waterfall, but it does not work for Agile. As Gary Gruver, author of Leading the Transformation, puts it, “Executives can’t just manage this transformation with metrics.” (Kindle Locations 343-344) Today’s successful executives recognize that their technology platforms are strategic parts of their business and they make it their business to learn about those platforms. Leaders such as Jeff Bezos and Elon Musk epitomize this.

SAFe helps with that: it provides a model that identifies all of the many functions that must change to enable Agile to work—including the business functions.

The main fear of Agilists about SAFe is legitimate, however: Agile transformation is not a process change: it is a transformation of how people think about their work, and it requires a huge infusion of knowledge from the outside. It is primarily a training, mentoring, and collaborative growth activity. Thus, one cannot simply “implement” SAFe, or any organization-wide Agile process. One has to proactively grow into it. SAFe helps because it provides a taxonomy and defines the landscape of things that need to change: with SAFe, it is obvious that Agile is not just an “IT thing”, or confined to the software teams—it is organization-level in scope.

Effective Agile or DevOps transformation is not a process rollout. Effective change of that depth and magnitude requires working with managers and staff, on an individual basis, to help them to understand what Agile means for their job function, and think through how to change their processes. They need to own the change of their own processes. It takes time: as I said, it is a growth process.

Saturday, August 26, 2017

Companies waste so much money in the cloud

Cloud services are what made DevOps possible: cloud services make it possible to dynamically provision infrastructure and services, via scripts, and that is what makes continuous delivery possible. Cloud services make it possible to dynamically scale, and that is what makes it possible for organizations to economically deploy Internet-scale applications without having to operate their own data center.

Still, any tool can be misused. Cloud services enable one to scale economically, adjusting application size based on demand; but cloud services can also be overused, resulting in huge unnecessary costs. We have identified some of the ways this typically happens, with the goal of helping people right-size their cloud deployments.

Pay close attention to your development services and instances

Software development teams generally do not pay for their cloud accounts, and so they don’t have an incentive to manage the cost of those accounts. For example, it is not unusual for development teams to create VMs and leave them running, resulting in a large set of abandoned running VMs, running up the bill.

To avoid misuse of cloud services by development teams, you need someone who proactively manages those resources. Don’t take away the ability of teams to create resources in their cloud accounts; but instead, have someone who keeps track of what the programmers are doing, and what they are using the cloud resources for. That person should continually peruse resource usage, and ask, “What is this for? Do we need this?” Also, set up standards that make that job easier: e.g., establish a rule that all VMs should be tagged with their purpose and, if the VMs exist in a multi-project environment, also tag them with their project code - which helps to perform accounting. (Scaled Markets includes setting up such a tagging scheme as part of its Continuous Delivery Accelerator offering - see Accelerators.)

Beware third party services

Today, it is possible to outsource much of one’s application’s services, using so-called platform-as-a-service (PaaS) services, either provided by the cloud provider or by a third party. For example, instead of creating monitoring, one can use third party monitoring services to verify that one’s app is still up. While this is convenient, it generally costs money for each check that gets performed, based on the frequency - so your app now has another taxi meter running, eating into your profits. As another example, most cloud providers now provide scalable messaging services that can feed your applications with event messages from IoT devices or from other applications. While the scalability of those services is a powerful value proposition, they generally are paid for per message - and so the bill can be quite large. Some PaaS services are priced based on their footprint, rather than on their usage: Azure’s HDInsight service is an example of that: you pay a fixed amount per hour for an HDInsight cluster, regardless of whether you are using it or not. Also, since it takes quite a while to start up an HDInsight cluster, it is not something that you can just create on demand.

Design your apps for local testability

In a recent talk about Netflix’s move to a PaaS model for the services that it provides to Netflix integrators such as Roku, Yunong Xiao of Netflix mentions that they invest heavily in the ability of developers to “mock” components so that developers can perform integration testing (using mocks) before committing their code. This is something that is often lost on corporate teams that try to adopt DevOps: they find that their applications are not amenable to being stood up in a test mode in local integration test environments. In other words, their applications have not been “designed for testability”.

In order to be able to “shift left” one’s integration testing, so that it can be done upstream - ideally on developer laptops - prior to full scale integration testing by a QA team, it is necessary to design an application so that it can be stood up in a small footprint test mode. That’s what Yunong really means by “mocks”. For example, consider Apache Spark. Spark is used by Web services that need to implement extremely high volume stream processing. However, in order for a programmer to test Spark code, they don’t have to stand up a full scale Spark cluster: they can run Spark on their laptop in “standalone” mode - even in a Docker container (that is how we normally do it). One can also set the memory used by Spark: for production, one would set it to multiple Gigabytes, but for local integration testing, one can set it at something like 300M.
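For example, an integration test can stand up Spark directly in its setup code; the following is a sketch (the input path is made up, and for local runs the driver memory is effectively just the test JVM's heap, e.g. a small -Xmx):
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

...

// Stand up Spark in-process, in local mode, with two worker threads; no cluster needed.
SparkSession spark = SparkSession.builder()
    .appName("local-integration-test")
    .master("local[2]")
    .getOrCreate();
Dataset<Row> events = spark.read().json("src/test/resources/sample-events.json");
// ... run the logic under test against 'events' and assert on the results ...
spark.stop();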

Use a cloud framework that can run locally

It is absolutely critical that, when the architecture of a cloud-based application is conceived, the architects and planners think through how it will be tested, and the costs involved. Again, this is design-for-testability.

For example, if your application uses PaaS services that are native to your cloud provider, can you run those services locally (in laptops or on-premises VMs)? - or are those PaaS services tied to the cloud - and therefore require you to pay for their use in each and every test environment?

This is a critical consideration, because in a continuous delivery process, testing is not only continuous, but it is also done with a lot of parallelism. The parallelism occurs in two ways: (1) an integration test environment is often spun up whenever code is committed - often at a team or even single developer level - and (2) regression test suites are often broken up into subsets that are run in parallel in order to reduce run time to a few hours or less. If you are paying for services being used by each test environment, you are going to be paying an awful lot of money.

That’s why frameworks such as Kubernetes, OpenShift, and Cloud Foundry let you create small local container clusters on a laptop, and there are also local “mini cloud” tools that can run on a laptop, so that you can first debug your application locally, and then deploy to a real cloud using, say, Terraform. Lattice is one such “local cloud” system: in other words, you write your infrastructure code using a cloud-agnostic tool such as Terraform (which has a driver for Lattice), test it in your local cloud, and then deploy to a real cloud.

Why do this? It is so that individual developers can perform integration testing - before they commit their code changes. This is true “shift left” testing, and the result is very high developer throughput and a very short feature lead time. This is becoming standard DevOps practice today. However, for your developers to be able to do this, they need to be able to run the cloud services locally, and ideally for free. If you think that your application is too complex to ever run locally, think again: a typical laptop can run 20 containers without any problem. If you can’t run your integration tests locally, it is very likely because your apps were not designed to be able to run locally in a test mode - and you can fix that with some re-engineering, and save a lot of money.

Summary

Cloud services have enormous value for deploying flexible and scalable applications and services. However, it is essential to consider how those will be tested, and the costs involved. Applications must be designed to be testable; if you don’t carefully design a testing approach that economically uses your cloud services, and an application architecture that makes it possible to test economically (e.g., locally), then the cloud services can rapidly become a financial burden.

Cliff & Phil

Friday, July 28, 2017

Why DevOps implies microservices

The microservice design approach pertains to architecture, whereas DevOps pertains to how you build, test, and deliver—these should therefore be separate considerations, right?

No. Back in the 1980s I worked in the microchip design and testing field, and it was essential—even back then—to design chips in a way that made them testable: "design-for-testability" was—and is to this day—a major consideration for design. In fact, a good percentage of a chip's circuitry was there only to enable the chip to be tested.

This is not new, and designing software for testability is even more important today than it was only a decade ago, because today, if you want to "shift left" your integration testing, you will need to be able to create small-footprint transient integration test environments. That means that you are deploying your apps again and again, and if your apps are big and monolithic, then not only will deployment take a long time, but the apps will also have a large footprint, which translates into money.

Consider, for example, that one of your application components is a large SOA based system that provides an "information layer", and that this SOA layer can only be deployed as a single unit—i.e., all-or-nothing. If one of your teams is building an app that uses that SOA layer, then in order for the team to create an integration test environment on demand, they will have to dynamically deploy the entire SOA based system into their environment.

Why do they need to do this to test? They need to because they want to test changes to their app, and not disturb anyone else. That's why they need a test environment of their own. It only needs to be their own for the duration of their test. They can't use a shared test instance of the SOA layer, because the SOA layer contains databases, and therefore is stateful, and so the tests change the SOA layer. (Even worse, the features being tested might even include code changes to the SOA layer.) Any change to the SOA layer would affect other teams that are using that SOA layer for their testing, and that is why a test environment must be exclusive to the test agent—it cannot be shared (at least for the duration of the test run). In other words, a test environment needs to be isolated.

Many DevOps teams achieve isolation by spinning up an environment on demand and deploying all of the various application components to that environment, running the tests, and then destroying the environment. Thus, if one or more of the components is a large, monolithic system, it is difficult and expensive to use a dynamic environment testing strategy. This is a very significant handicap.

Microservices are small, independently deployable components, and so when one needs to perform an integration test that involves microservices, one can select only those needed for the test, deploy them, run the tests, and then destroy. Of course, one must consider dependencies among the microservices: that is part of the test planning that goes into the development of a feature or story. Essentially, one must design what the "test bench" should be for the system-under-test—i.e., what other components are needed besides the component that has been modified and is being tested, to enable the test to be performed.

The small footprint and independent deployability of microservices is therefore a major enabler for shift-left integration testing. If one has monolithic components, then shifting integration testing left, to the team (or to the individual developer) is very difficult, and one usually has to fall back to a shared integration test process whereby all components are deployed at regular intervals into that environment and tests are run. In such a process, there are usually a lot of failed tests—the test run does not stay "green", and so determining the cause of failures is difficult. Also, in that approach, it is necessary to use "feature toggles" to turn off features that are not yet complete, since incompletely integrated features will cause other tests to fail that used to pass, putting those features in doubt. Using feature toggles is complicated, and it is a messy situation. That is why DevOps teams try to "shift left" and perform early integration testing before they make a feature visible to the downstream test pipeline; but to do that, your components need to be small enough to be able to deploy them frequently into local integration test environments.


Saturday, June 24, 2017

Why DEV, INT, and QA environments get in the way of DevOps

Organizations that develop software often have static development and test environments, with names like "DEV", "INT", "QA", and so on. These are usually statically provisioned environments. They fit the "servers as pets" side of the "pets versus cattle" paradigm.

DevOps does not work that way, and if you continue to use names like that, you will find it very hard to transition to a DevOps approach. The modern paradigm is to think of servers as something that gets created on demand. In fact, if you use "platform-as-a-service" or "serverless" services from your cloud provider, you don't even worry about servers (or containers)—the cloud handles that behind the scenes for you, and all you worry about is your application.

The problem is, many organizations have deeply embedded their internal "DEV", "INT", "QA" terminology into their governance processes and their software development and testing methodologies, and getting them out is extremely difficult. Thus, if one wants to, say, stand up an environment on demand for testing, whenever needed, people in the organization ask, "Are you doing that in INT, or in QA?" Such a question makes no sense, because in a modern development pipeline there is no INT or QA environment: environments are created on demand. You might characterize a test suite as an "INT" or "QA" test suite, and you might even characterize the environment template for that test as an "INT" or "QA" environment template, but the environment itself does not exist until you deploy to it for testing. Also, you should not leave test environments sitting around after you have used them: doing that is expensive and wasteful.

All this sounds pretty harmless, but it is not—far from it. Static test environments really cannot support the modern version of Agile processes, which rely on a feature-based approach to development. Consider the gold standard of a traditional Agile development process: continuous integration, aka "CI". CI requires that the CI job—perhaps defined in a Jenkins server—has an environment that it alone has control of; otherwise, the tests that the CI job runs will not be reproducible: if someone else is doing things in that environment (e.g., doing manual testing, which leaves behind database data), then a CI test might pass one time, but run again it might fail—even if the application code didn't change. The CI process becomes untrustworthy.

Historically, development teams have run only their unit tests in their CI environment. That's fine for a single-team project; but things get sketchy really fast if you have multiple teams that are building a large system that has many different components. For integration testing of how the many components work together, Agile projects have historically used a single project-wide statically provisioned integration test environment. Because there was only one such environment, access to it had to be scheduled, or restricted to the "integration test team". If you think about that, it is really a waterfall process, wrapped around all of the project teams' individual Agile processes. Integration testing done this way is single-threaded. So when a programmer makes a change, they can't tell if they have broken something in the system as a whole until the integration test team runs the integration tests. If those tests are manual, then it might be days (or weeks, if integration test deployment is manual) before the programmer gets feedback. That's not very Agile.

This was as good as we could do until cloud computing came along. Clouds made it possible to stand up entire test environments on demand. That made it possible to "shift left" the integration testing—to move it into the CI cycle (or even better, to the individual programmer), so that the team's Jenkins job runs integration tests. To be clear, in this approach each team's Jenkins job deploys the entire system (isolated from any other team's) and then runs system-wide integration tests against that. (See this article for more information on how continuous delivery methods revise traditional Agile practices.)

This is where the DEV, INT, and QA thinking gets in the way. If the organization has a QA group, they will want a "QA environment" where they can run their integration tests. But you say, "We create that on demand", and then they get confused, because tradition has linked three things that are actually independent: (1) a class of tests (integration tests), (2) a place where they get executed; and (3) who creates or performs those tests.

It gets worse. Because people equate these things in their thinking, they can't get their head around the idea that DevOps teams don't "push code to an environment", and for that reason, they also can't understand how DevOps teams can perform integration tests on features that have not yet been merged into the main codebase. This is because static environment thinking says that everyone has to "put their code somewhere", and that implies that all the features for the release are present in the code—otherwise, many tests will not pass. But DevOps teams are usually set up as "feature teams", meaning that they work on a cross-component feature at a time, modifying any of the system's components, and they integration test that cross-component feature. That integration testing happens before they merge the feature's code changes into the shared development code branch for each component. Thus, they don't "put their code into an environment". Rather, they integration test a set of feature changes—spanning multiple components—and then merge the changes. Then, if another team wants to integration test (either another development team, or a QA team), they pull the latest merged code, and they will obtain only completed (working) features.

Sometimes I hear cloud vendors speak in terms of "creating your INT and QA environments", and when I ask them about it, they say that they are trying to "bridge to something the customer understands". However, that is holding those customers back. The DEV, INT, QA terminology and thinking is a major impediment to letting go of the static paradigm, and letting that go is foundational for understanding DevOps. Start to think of "INT" and "QA" as kinds of environment—not as specific environments that are sitting there, waiting for you. Even better, start to think in terms of kinds of tests that you need to run, and the environment configurations that you need for each of those, and then create a template or script to provision each of those types of environment on demand. See this article series for more on defining a testing strategy.

Saturday, January 21, 2017

Inserting DevOps Into a Not-Very-Agile Organization

This is an account of a recent experience at a very large company that has a mix of traditional (waterfall) and some partly Agile IT projects, and a tiny bit of DevOps here and there. Tricia Ratliff and I were asked to take a new project and help the project's manager to “do Agile right”—meaning use DevOps and anything else that makes sense.

It worked. In two months, a team that had almost no Agile experience and no knowledge of any automation tools was able to,
  1. Learn Acceptance Test-Driven Development (ATDD) and the associated tools (Cucumber, Selenium).
  2. Learn to use virtualization and Linux (Docker) containers, both on their laptops and in our data center.
  3. Learn to use OpenShift—Red Hat's enhanced version of the Kubernetes container orchestration framework.
  4. Learn a NoSQL database (Cassandra—the project has since switched to MongoDB), having had no prior experience with NoSQL databases.
  5. Get productive and start delivering features in two-week Agile iterations, using a fully “left-shifted” testing process in which developers get all integration tests to pass before they merge their code into the main development branch.
All team members received all training: e.g., analysts learned all of the same tools that testers and developers learned, and everyone received access to all repositories and servers (TeamForge, Jenkins, OpenShift, and our container image registry). This was actually very important, because it established very early a relationship between the analysts, testers and developers, since the developers helped the analysts and testers to set up and start using the tools, including git, GitEye, Eclipse, etc. That relationship is what makes things work so well on this team.

The Benefits We Have Seen

One of the benefits we have seen is that the team is not dependent on anyone or anything outside our team except for our git server (TeamForge). For example, our OpenShift cluster and Jenkins server were down for two weeks a while back, but the team was able to continue working and delivering completed stories, because they were able to perform integration tests locally on their laptops.

Another benefit that we have seen is that work is going faster and faster, as the team becomes more proficient in using the tools and refines its process. It is almost impossible to compare the productivity of different Agile teams, but I personally estimate that this team is at least two times as productive as the other non-DevOps “Agile” team that I am working with at the same customer location—even though this team's application is more difficult to test because it has a UI, whereas the other team's application does not. (I actually think that the DevOps team might be three or four times as productive as other teams at this client.)

What makes the team so productive is that the team is able to work in a “red-green cycle” (see here, and here), whereby they code, test, code, test, code, and test until the tests pass, and then move on to the next Agile story. The red-green cycle is known to the Test-Driven Development (TDD) community, but we use it for integration testing, locally (on our laptops). It is made possible by the use of the ATDD process, in which automated tests are written before the associated application code. (See here for a comparison of TDD and ATDD.) I will note that the team's application is a Web application, with automated tests written for each bit of UI functionality before the associated application code is written.
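As a concrete illustration (this is not the team's actual code), an ATDD step definition ties a Cucumber scenario step to Selenium calls that drive the UI; the page elements and URL below are made up:
import cucumber.api.java.en.Given;
import cucumber.api.java.en.Then;
import cucumber.api.java.en.When;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import static org.junit.Assert.assertTrue;

// Illustrative Cucumber step definitions that drive the UI through Selenium,
// written before the corresponding application code exists (so they fail "red" first).
public class LoginSteps {
    private final WebDriver driver = new ChromeDriver();

    @Given("^the user is on the login page$")
    public void theUserIsOnTheLoginPage() {
        driver.get("http://localhost:8080/login");  // hypothetical local test URL
    }

    @When("^the user logs in as \"([^\"]*)\"$")
    public void theUserLogsInAs(String userName) {
        driver.findElement(By.id("username")).sendKeys(userName);
        driver.findElement(By.id("loginButton")).click();
    }

    @Then("^the dashboard is displayed$")
    public void theDashboardIsDisplayed() {
        assertTrue(driver.findElement(By.id("dashboard")).isDisplayed());
        driver.quit();
    }
}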

Figure 1: The red-green cycle of test-first development.
This is important: the ability of developers to use a red-green cycle is a game changer, in terms of productivity, code quality, and code maintainability; and we are doing this for integration testing—shifting it “left” into a red-green cycle on the developer's laptop.

How We Did It

The shift to proper Agile and DevOps is not a small adjustment or substitution. It is a very large set of interdependent changes, involving new methods, new mindsets, and new tools. More on this later.

It is important to emphasize that there is no single one way to do DevOps. By its nature, all of this is contextual. It is not possible to define a “standard process” for how teams should provision, develop/test and deploy. However, it is possible to devise common patterns that can be re-used, as long as a team feels free to adjust the pattern. If you want to have a learning organization—and having a learning organization is essential for effective Agile and DevOps—then teams need to have the ability to experiment and to tailor their process.

We did not use “baby steps” in our adoption of DevOps for this project. Rather, we undertook to change everything from the outset, since ours was a green field project, and we did not want to entrench legacy practices. We used an Agile coach (Tricia) and DevOps coach (me) to explain the new techniques to the team and make a compelling case for each engineering practice, but nothing was mandated. The coaches facilitated discussions on the various topics, and the team developed its own processes as a group.

By far the most impactful thing we did was to establish a complete end-to-end testing strategy from the outset. We included the entire team and all extended team members in the development of this strategy. The team consisted of all of the developers and testers (including technical lead and test lead), as well as the team's manager. The extended team members who we included were the team's application architect, an expert from the organization's test automation group, and testing managers representing integration testing and user acceptance testing. (We would have liked to include someone from the infrastructure side, but at that time it was not clear who that should be.) Devising the testing strategy began with a four-hour session at a whiteboard. The output of that session was a table of testing categories, with columns as follows: (1) category of test, (2) who writes the tests, (3) where/when the tests will be run, and (4) how we plan to measure sufficiency or “coverage” for the tests. (The format of this planning artifact was based on prior work of mine and others, which is described in this four-part article.)

We did not try to fill in all of the cells in our testing strategy table in our first session, but having the rows defined provided the foundation for our DevOps “pipeline”. Some of the testing categories that we included in our table were (a) unit tests, (b) behavioral integration tests, (c) end-to-end use case level tests, (d) failure recovery tests, (e) performance and stress tests, (f) security tests and scans, (g) browser compatibility tests, and (h) exploratory tests. Our “shift left” testing philosophy dictated that we run every kind of test as early as possible—on our laptops if it makes sense—rather than waiting for it to be run only downstream by a Jenkins job. In practice, we run the unit tests, behavioral integration tests, use case level tests, and basic security scans on our laptops before committing code to the main branch, which triggers a Jenkins “continuous integration” (CI) job that re-runs the unit tests, builds a container image, deploys a system instance (service/pod) to our OpenShift cluster, and then re-runs the behavioral and end-to-end tests against that system instance. Our Jenkins CI job therefore verifies that the system, as deployed, will work the same as the system that we tested on our laptops, because we use the same OpenShift template for deploying a test instance as we use for deploying a production instance. We plan to add Jenkins jobs for non-functional testing such as performance testing, deep security scanning and automated penetration testing: these will each begin by deploying a system instance to test, and will end by destroying that system instance: thus, no system instance is re-used across multiple tests, and we can perform any of these tests in parallel.

The issue of running tests locally is an important one. Too often DevOps is depicted or described as a sequence of tests run by a server such as Jenkins. However, that model re-creates batch processing, whereby developers submit jobs and wait. Shifting left is about avoiding the waiting: it is about creating a red-green cycle for each developer. Thus, real DevOps is actually about shortening that sequence of Jenkins jobs—perhaps even eliminating it. In an ideal DevOps process, there would be no Jenkins jobs: those jobs are a necessary evil, and they exist only because some tests are impractical to perform on one's laptop.

We spent two months preparing for our first Agile development iteration. During the two month startup period we requested tools, defined our work process, met with business and technical stakeholders, and received training. Most meetings involved the entire team, although there were also many one-on-one behind-the-scenes meetings with various stakeholders to talk through concerns. We arranged for an all-day all-team hands-on training on OpenShift from Red Hat, and we also arranged for a four-hour hands-on Cucumber and Selenium training session from an internal test automation group. Both of those training sessions were essential, and we saw an immense jump in overall understanding among the team after each of those training sessions.

However, it was not all smooth.

Obstacles We Overcame

We encountered many institutional obstacles. Our project manager was committed to using a DevOps approach, and was key because he was always ready to escalate an issue and get blockers removed. One of the myths in the Agile community is that project managers are not needed in Agile projects; yet in my experience a project manager is extremely important if the setting is a large IT organization with a lot of centralization, because a project manager has indisputable authority within the organization and a person with authority is needed to advocate effectively for the team.

One mistake we made is that we did not make it clear to the Testing resources unit that the test programmers would need to arrive at the same time as our developers. The late arrival of the test programmers was a significant problem during the initial iterations, because when they arrived they were not up to speed on how we were doing things or on the application stories that the team and Product Owner had collaboratively developed. This problem was exacerbated by the fact that the new arrivals could not participate in the actual work until they had access to the team’s source code repo, but per organization policy they could not be granted access to that until they had had corporate git training, and there was no git training scheduled until the next month. Fortunately the project manager was eventually able to escalate this issue and get the test programmers git access, but during iteration one the application developers wrote most of the automated tests because the test programmers did not have git access.

Bureaucratic obstacles like this were the norm, and so it was extremely important that the team coaches and project manager stay on top of what was in the way each day, and escalate issues to the appropriate manager, explaining that in an Agile project such as ours, a two week delay was an entire iteration, and so obstacles had to be removed in hours or days—not weeks. Some of the functional areas that we had extended conversations with were application architecture, systems engineering, data architecture, the infrastructure engineering team, and others, because not only did each of these functions have authority over different aspects of our project pipeline, but we needed their help to enable us to connect our application into the enterprise infrastructure.

For example, I recall explaining to the assigned infrastructure engineer that we did not need “an environment”, nor did we need for him to install an application server for us, since we were going to receive access to a cluster that was being created, into which we would dynamically create containers built from base images that contain all of the components that we need (such as an application server), and it took three conversations before he grasped that. I still have trouble explaining to people in the organization that this team does not have any “environments” because we use dynamic environment provisioning via an OpenShift cluster. This has presented communication challenges and policy confusion because much of the organization's procedures and governance rules are phrased in terms of control of environment types, such as the “integration test environment” and the “user acceptance environment”. For us, however, an environment does not exist until we perform a deployment, and then after we run the tests, we destroy the environment to free up the resources.

Some of the tools that we needed were not available right away, because various enterprise groups were busy setting them up or testing them for security, so team members had to download them at home to try them out. These external groups included (1) the new OpenShift cluster team, (2) the Infrastructure Engineering team which was creating the JBoss base image that we were to use; (3) the team that manages the internal software library repository; and (4) the team that was setting up the container image registry (Artifactory) that we were to use. Once they understood what we were trying to do, all of these groups did their best to get us what we needed, but it took time since these things were being done for the first time.

Another significant challenge was that the organization requires its software developers to work on its “production” network, using “production” laptops—a practice that makes it very difficult for software engineers who need to be able to download new tools on a frequent basis to try them out: one is not allowed to download software from the Internet into the production network. (I have advocated for having a less secure sandbox network that all developers can access via a remote desktop.) It also turned out that the laptops, which use Windows, have a browser security tool called Bromium which uses virtualization, and it was preventing our team from being able to run virtual machines on their laptops—something that is required to be able to run Linux containers under Windows. We had to have Bromium technical support spend several weeks at our site, coming up with a configuration that would allow our team to run virtual machines. Diagnosing that problem and arranging for the solution delayed the team's learning about Linux containers and OpenShift by almost two months. (Also, using Windows laptops for building software that will be deployed in Linux containers makes absolutely no sense, and has the side effect that developers do not become familiar with the target environment—Linux).

DevOps Is Not a Small Change

I mentioned earlier that the shift to proper Agile and DevOps is not a small adjustment or substitution. This is interesting because we asked for some changes to the IT process controls in order to enable us to do things the way we wanted to do them. From the point of view of the controls group, our process was almost the same as the standard process, but that is only because the IT control view of a software development process is so disconnected from the reality: our process is vastly different from how other Agile teams at this client work. Some of the differences are,
  1. We created an integrated automated testing strategy without handoffs: other Agile projects at this customer have a sequential testing process, with different teams responsible for each stage.
  2. We use what is called a “build once, deploy many” approach, whereby we build a deployable image and store that, and that is deployed automatically to successive test environments for automated testing—many times per day.
  3. Developer laptops are their “DEV” environment—we do not have a shared “DEV” environment as other Agile teams at this client do. Thus, each developer tests with their own full stack, including their own test database instance, eliminating interference between testing by developers.
  4. We replaced manual integration testing with an industry standard “continuous integration” (CI) automated testing approach. This is unusual for this client.
  5. Analysts and testers write automated test specs using the well known Cucumber tool—they don’t run tests. The analysts focus mainly on the logical functions of the application, and the testers focus on nuances such as error cases and data requirements, and encode their understanding in the Cucumber test specs (feature files). This is widely known in the industry as Acceptance Test-Driven Development (ATDD).
  6. Test programmers and developers write automated tests that implement the test specs.
  7. Team members are allowed to switch roles, and one of our analysts has written test code. The only restriction that we have is that if you have written the test code for a story, someone else must write the application code for that story. This ensures that for a story to pass its tests, two different people must have the same understanding of the story's requirements.
  8. Our test lead supervises this process to make sure that this rule is followed, and that tests are written in a timely manner. Our Scrum Master makes sure that test coding and app coding tasks are maintained on our physical planning wall, as well as in our Agile tool (VersionOne).
  9. As a result of the above, a developer never waits for a tester: tests are written and developers run the tests—not the testers. Thus, a developer codes a story, runs tests, fixes the defects, re-runs the tests, and so on until there are no defects. In other words, we use a red/green cycle.
  10. The organization's “User Acceptance Testing” (UAT) team is a testing partner—not a handoff. They provided two members of our team, and those testers write our end-to-end use case level tests, which get run alongside all of our other tests.
  11. UAT adds its tests to the project repo, and so the developers can run the UAT tests locally on their laptops and in their CI environment—there is no delay.
  12. The development team writes its own CI build scripts. Other projects at this client typically receive their Jenkins build scripts from an integration team.
  13. The development team writes the production deployment OpenShift templates and scripts, tests those, and uses those to deploy to each test environment.
  14. All functional tests are run in each environment type (laptop, CI).
  15. Developers run security scans and code quality scans locally before they check in their code, and review the scan results produced by the CI build as an integrated task for the development of each story.
  16. All of our CI test environments are created from scratch for each test run, using the OpenShift template that the team coded. Thus, CI tests are always “clean”.
  17. The CI test database is cleared and data is loaded from scratch prior to each behavioral test case. Thus, there is never an issue with data left over from a prior test, or with a tester waiting for someone to be done testing a database table. (Eventually we plan to integrate the database into our OpenShift pod configuration, so that a fresh database will be deployed each time we deploy the application for a test run, but we are waiting for an approved container image for our database.)
  18. We don't have a DBA on our team, because we are using a NoSQL database, which has no schema. We work with an analyst who maintains a logical data model, but the model is maintained during development—not ahead of time. Our CI tests are all regression tests, so any breaks caused by changes to the data model are caught immediately.
  19. We have a very minimal need for “defect management”, because most tests are automated, and so most test results are visible in a dashboard in our Jenkins server. A developer does not check in a story’s code until it passes all of the behavioral tests locally for that story, and all affected tests are still passing. However, our exploratory testing is manual (by definition), and we record issues found during those test sessions. Exploratory testing is for the purpose of discovering things that the test designers did not anticipate, as well as for assessing the overall usability of the application.
  20. We don’t accept any story as done unless it has zero known defects. Thus, our build images that are marked as deployable typically have zero known defects.
  21. Everyone on the team (analyst, test programmer, developer, coaches, Scrum Master, PM) has write access to all of the team’s tools and repositories (git repos, images, Jenkins project, OpenShift project, VersionOne project, Sharepoint project), and everyone received training in all of those tools. However, we feel we have devised a secure process that leverages the built-in secure change history that these tools provide.
  22. Our Product Owner reviews test scenarios as written in the “Cucumber” test specs, to ensure that they meet the intent of the associated Agile story's acceptance criteria. To do this, the Product Owner accesses test specs from the git source code control system. There are no spreadsheets—all artifacts are “executable”. To learn how to do this, the Product Owner—who is a manager who works in a business area—attended git training.
  23. Our iterations are two weeks—automated testing makes a three week iteration unnecessary: three weeks is the norm for other Agile teams at this client. Yet, our team seems to produce significantly more completed work per iteration than other teams.
  24. We only need minimal data item level tests, since our acceptance test driven process, for which we measure coverage, actually covers data items.
  25. We revise governance artifacts during each iteration, so that they are always up to date.
These are not small changes, and when you put them all together, it is huge; yet, we were able to implement all of this in a very short period of time, starting with a team that had no experience with any of these techniques. It is also important to note that we did not receive “exceptions” for our processes—we have official approval for everything we did. This shows that it can be done, and that adopting these approaches does not have to be gradual: but to do it requires a commitment from the project's management, as well as the insertion of people who know how these techniques work.