Saturday, June 24, 2017

Why DEV, INT, and QA environments get in the way of DevOps

Organizations that develop software often have static development and test environments, with names like "DEV", "INT", "QA", and so on. These environments are usually provisioned once and then kept running indefinitely. They fit the paradigm of "servers as pets" rather than "servers as cattle".

DevOps does not work that way, and if you continue to use names like that, you will find it very hard to transition to a DevOps approach. The modern paradigm is to think of servers as things that get created on demand. In fact, if you use "platform-as-a-service" or "serverless" services from your cloud provider, you don't even worry about servers (or containers): the cloud handles that behind the scenes for you, and all you worry about is your application.

The problem is, many organizations have deeply embedded their internal "DEV", "INT", "QA" terminology into their governance processes and their software development and testing methodologies, and getting it out is extremely difficult. Thus, if one wants to, say, stand up an environment on demand for testing, whenever needed, people in the organization ask, "Are you doing that in INT, or in QA?" Such a question makes no sense, because in a modern development pipeline there is no INT or QA environment: environments are created on demand. You might characterize a test suite as an "INT" or "QA" test suite, and you might even characterize the environment template for that test as an "INT" or "QA" environment template, but the environment itself does not exist until you deploy to it for testing. Also, you should not leave test environments sitting around after you have used them: doing that is expensive and wasteful.

All this sounds pretty harmless, but it is not. Far from it. Static test environments really cannot support the modern version of Agile processes, which rely on a feature-based approach to development. Consider the gold standard of a traditional Agile development process: continuous integration, aka "CI". CI requires that the CI job (perhaps defined in a Jenkins server) has an environment that it alone controls; otherwise, the tests that the CI job runs will not be reproducible. If someone else is doing things in that environment (e.g., doing manual testing, which leaves behind database data), then a CI test might pass one time and fail the next, even if the application code didn't change. The CI process becomes untrustworthy.
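To make the reproducibility problem concrete, here is a minimal Python sketch. The check function and the lists standing in for a database are hypothetical, made up for illustration; no real CI tool or database is involved.

```python
def signup_creates_exactly_one_customer(database):
    """Hypothetical CI check: after one signup, the database holds one customer."""
    database.append("customer created by CI test")
    return len(database) == 1


# Shared static environment: someone's manual testing left a row behind.
shared_database = ["customer left over from manual testing"]
result_in_shared_env = signup_creates_exactly_one_customer(shared_database)

# Environment that the CI job alone controls: it starts clean on every run.
results_in_isolated_envs = [
    signup_creates_exactly_one_customer([]) for _ in range(3)
]

print(result_in_shared_env)      # False, although the application code is unchanged
print(results_in_isolated_envs)  # [True, True, True]
```

The check itself never changes between runs; only the state of the environment does, which is why the shared environment makes the result untrustworthy.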

Historically, development teams have run only their unit tests in their CI environment. That's fine for a single-team project; but things get sketchy really fast if you have multiple teams that are building a large system that has many different components. For integration testing of how the many components work together, Agile projects have historically used a single project-wide statically provisioned integration test environment. Because there was only one such environment, access to it had to be scheduled, or restricted to the "integration test team". If you think about that, it is really a waterfall process, wrapped around all of the project teams' individual Agile processes. Integration testing done this way is single-threaded. So when a programmer makes a change, they can't tell if they have broken something in the system as a whole until the integration test team runs the integration tests. If those tests are manual, then it might be days (or weeks, if integration test deployment is manual) before the programmer gets feedback. That's not very Agile.

This was as good as we could do until cloud computing came along. Clouds made it possible to stand up entire test environments on demand. That made it possible to "shift left" the integration testing—to move it into the CI cycle (or even better, to the individual programmer), so that the team's Jenkins job runs integration tests. To be clear, in this approach each team's Jenkins job deploys the entire system (isolated from any other team's) and then runs system-wide integration tests against that. (See this article for more information on how continuous delivery methods revise traditional Agile practices.)
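The create, test, destroy cycle described above might be sketched roughly as follows. This is only an illustration: the `provision` and `destroy` functions and the in-memory `ACTIVE_ENVIRONMENTS` dictionary are hypothetical stand-ins for whatever provisioning tooling your cloud provider offers.

```python
import contextlib
import uuid

# Hypothetical in-memory stand-in for a cloud account. A real pipeline would
# call its provider's provisioning tooling instead of editing a dictionary.
ACTIVE_ENVIRONMENTS = {}


def provision(template_name):
    """Create a brand-new, uniquely named environment from a template."""
    env_id = f"{template_name}-{uuid.uuid4().hex[:8]}"
    ACTIVE_ENVIRONMENTS[env_id] = {"template": template_name, "database": []}
    return env_id


def destroy(env_id):
    """Tear the environment down so nothing is left running (or billed)."""
    del ACTIVE_ENVIRONMENTS[env_id]


@contextlib.contextmanager
def ephemeral_environment(template_name):
    """Provision an environment for one test run and always destroy it afterwards."""
    env_id = provision(template_name)
    try:
        yield env_id
    finally:
        destroy(env_id)


def run_integration_tests(env_id):
    """Hypothetical test run: relies on the environment starting out empty."""
    database = ACTIVE_ENVIRONMENTS[env_id]["database"]
    starts_clean = database == []
    database.append("row written by integration test")
    return starts_clean


# Each team's CI job gets its own isolated copy of the whole system.
with ephemeral_environment("integration") as env_id:
    passed = run_integration_tests(env_id)

print(passed)                    # True
print(len(ACTIVE_ENVIRONMENTS))  # 0, because the environment was destroyed
```

Because the environment is destroyed in a `finally` clause, it goes away even when the tests raise an error, so nothing is left sitting around between runs.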

This is where the DEV, INT, and QA thinking gets in the way. If the organization has a QA group, they will want a "QA environment" where they can run their integration tests. But you say, "We create that on demand", and then they get confused, because tradition has linked three things that are actually independent: (1) a class of tests (integration tests); (2) a place where they get executed; and (3) who creates or performs those tests.

It gets worse. Because people equate these things in their thinking, they can't get their head around the idea that DevOps teams don't "push code to an environment", and for that reason, they also can't understand how DevOps teams can perform integration tests on features that have not yet been merged into the main codebase. This is because static environment thinking says that everyone has to "put their code somewhere", and that implies that all the features for the release are present in the code; otherwise, many tests will not pass. But DevOps teams are usually set up as "feature teams", meaning that they work on one cross-component feature at a time, modifying any of the system's components, and they integration test that cross-component feature. That integration testing happens before they merge the feature's code changes into the shared development code branch for each component. Thus, they don't "put their code into an environment". Rather, they integration test a set of feature changes, spanning multiple components, and then merge the changes. Then, if another team wants to integration test (either another development team, or a QA team), they pull the latest merged code, and they will obtain only completed (working) features.
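One way to picture the test-before-merge flow is the rough sketch below. All of the names are hypothetical: the component names, version labels, and the `run_tests` callback are invented for illustration, and a real team would do this with its own version control and pipeline tooling.

```python
# Hypothetical versions currently merged on each component's shared branch.
merged_versions = {"web": "web@12", "orders": "orders@30", "billing": "billing@7"}

# Hypothetical unmerged changes for one feature that spans two components.
feature_changes = {"web": "web@gift-cards", "billing": "billing@gift-cards"}


def system_under_test(merged, changes):
    """The whole system: merged code everywhere, overlaid with the feature's changes."""
    return {**merged, **changes}


def merge_if_green(merged, changes, run_tests):
    """Merge the feature's changes only if the integration tests pass against them."""
    candidate = system_under_test(merged, changes)
    if run_tests(candidate):
        return candidate
    return dict(merged)  # tests failed: the shared branches stay untouched


# A stand-in for deploying the candidate on demand and running integration tests.
after_merge = merge_if_green(merged_versions, feature_changes, lambda system: True)
print(after_merge["web"])     # web@gift-cards
print(after_merge["orders"])  # orders@30
```

Anyone who pulls the merged versions afterwards sees only features that already passed integration testing, which is the point the paragraph above makes.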

Sometimes I hear cloud vendors speak in terms of "creating your INT and QA environments", and when I ask them about it, they say that they are trying to "bridge to something the customer understands". However, that is holding those customers back. The DEV, INT, QA terminology and thinking is a major impediment to letting go of the static paradigm, and letting that go is foundational for understanding DevOps. Start to think of "INT" and "QA" as kinds of environment—not as specific environments that are sitting there, waiting for you. Even better, start to think in terms of kinds of tests that you need to run, and the environment configurations that you need for each of those, and then create a template or script to provision each of those types of environment on demand. See this article series for more on defining a testing strategy.
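As a rough illustration of treating "INT" and "QA" as kinds of environment rather than as places, consider the sketch below. The template fields, component names, and test suite names are all hypothetical, and the `instantiate` function only builds a description; a real script would pass that description to your provisioning tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvironmentTemplate:
    """A kind of environment: a recipe, not a running thing."""
    kind: str
    components: tuple
    test_suite: str


# Hypothetical templates; "INT" and "QA" name kinds of environment here.
TEMPLATES = {
    "INT": EnvironmentTemplate("INT", ("web", "orders", "database"), "integration"),
    "QA": EnvironmentTemplate("QA", ("web", "orders", "billing", "database"), "acceptance"),
}


def instantiate(kind, run_id):
    """Describe a fresh environment of the given kind for one test run."""
    template = TEMPLATES[kind]
    return {
        "name": f"{kind.lower()}-{run_id}",
        "components": list(template.components),
        "test_suite": template.test_suite,
    }


# Two teams can each have their own "INT" environment at the same time.
team_a_env = instantiate("INT", "team-a-build-41")
team_b_env = instantiate("INT", "team-b-build-17")
print(team_a_env["name"])  # int-team-a-build-41
print(team_b_env["name"])  # int-team-b-build-17
```

With this framing, "Are you doing that in INT, or in QA?" becomes "Which template did you instantiate?", and any number of environments of each kind can exist at once.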