Sunday, August 27, 2017

SAFe is invaluable in a large organization’s DevOps or Agile transition

Don’t “do” Agile: instead, “be” Agile.

That’s the refrain of most Agile consultants when you ask them, “How do I make Agile work in my large company?”

The problem is, while the advice is correct, it is not very helpful. People need tangible, actionable advice. If you try to change their culture first, it won’t work: change management experts will tell you that culture is the hardest thing to change, and while behavior is a result of culture, rapid cultural change in an organization only follows directed behavioral change.

When SAFe came out circa 2011, it was met with scorn by much of the Agile community. One reason was that SAFe looked like a waterfall plan, with everything laid out. Agilists had learned that one cannot be so prescriptive, and so they were immediately suspicious of any big plan or all-encompassing methodology. If they had read through the SAFe documentation, however, they would have seen that it is really just a model: a way of looking at the problem. SAFe makes it very clear that contextual judgment should be applied.

Fast forward to today, and there is still some derision of SAFe in parts of the Agile community. However, I have found that derision to come mostly from those who do not understand SAFe. On the other hand, SAFe has helped many large organizations get a handle on Agile.

There are risks in applying SAFe. Just as those who denounced SAFe had feared, rigid application of SAFe can be very damaging. For example, SAFe presumes that one has Agile technical practices well in hand: a cross-functional team is not possible unless you have automated integration tests. But if one applies SAFe thoughtfully, it solves some big problems that team-level Agile methodologies such as Scrum do not address or even acknowledge. SAFe provides a model for thinking about what is missing from team-level Agile: that is, it provides ideas for answering “How do I make Agile work in my large company?”, and it therefore helps to frame the discussions around what needs to change, and those discussions get you on the path to “being Agile”.

I have been doing Agile transformations for a long time. My own former company, Digital Focus, adopted eXtreme Programming (XP) in 2000. Since then I have been a principal thought leader or consultant in seven other Agile and DevOps transformations in very large organizations. What I have seen is that there is an enormous disconnect between the executive management of most non-IT companies and the realities that the IT groups experience. In short, the execs simply don’t have a clue how software is built or tested, or how one should manage those processes. They delegate it to “IT”, and don’t get involved. That used to work for waterfall, but it does not work for Agile: as Gary Gruver, author of Leading the Transformation, puts it, “Executives can’t just manage this transformation with metrics.” (Kindle Locations 343-344) Today’s successful executives recognize that their technology platforms are strategic parts of their business, and they make it their business to learn about those platforms. Leaders such as Jeff Bezos and Elon Musk epitomize this.

SAFe helps with that: it provides a model that identifies all of the many functions that must change to enable Agile to work—including the business functions.

The main fear that Agilists have about SAFe is legitimate, however: Agile transformation is not a process change. It is a transformation of how people think about their work, and it requires a huge infusion of knowledge from the outside. It is primarily a training, mentoring, and collaborative growth activity. Thus, one cannot simply “implement” SAFe, or any organization-wide Agile process; one has to proactively grow into it. SAFe helps because it provides a taxonomy and defines the landscape of things that need to change: with SAFe, it is obvious that Agile is not just an “IT thing” confined to the software teams; it is organization-wide in scope.

Effective Agile or DevOps transformation is not a process rollout. Effective change of that depth and magnitude requires working with managers and staff, on an individual basis, to help them understand what Agile means for their job function and think through how to change their processes. They need to own the change of their own processes. It takes time: as I said, it is a growth process.

Saturday, August 26, 2017

Companies waste so much money in the cloud

Cloud services are what made DevOps possible: they allow infrastructure and services to be provisioned dynamically, via scripts, and that is what makes continuous delivery possible. They also allow applications to scale dynamically, and that is what makes it possible for organizations to economically deploy Internet-scale applications without having to operate their own data centers.

Still, any tool can be misused. Cloud services enable one to scale economically, adjusting application size based on demand; but cloud services can also be overused, resulting in huge unnecessary costs. We have identified some of the ways this typically happens, with the goal of helping people right-size their cloud deployments.

Pay close attention to your development services and instances

Software development teams generally do not pay for their cloud accounts, and so they have no incentive to manage the cost of those accounts. For example, it is not unusual for development teams to create VMs and leave them running, resulting in a large set of abandoned VMs that continue to run up the bill.

To avoid misuse of cloud services by development teams, you need someone who proactively manages those resources. Don’t take away the ability of teams to create resources in their cloud accounts; instead, have someone who keeps track of what the programmers are doing and what they are using the cloud resources for. That person should continually review resource usage and ask, “What is this for? Do we need this?” Also, set up standards that make that job easier: e.g., establish a rule that all VMs must be tagged with their purpose and, if the VMs exist in a multi-project environment, also tagged with their project code, which helps with cost accounting. (Scaled Markets includes setting up such a tagging scheme as part of its Continuous Delivery Accelerator offering - see Accelerators.)
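
As a sketch of what that review might look like, here is a minimal Python script, assuming AWS and the boto3 library; the required tag names (“Purpose”, “ProjectCode”) are illustrative, not a standard:

```python
# Minimal sketch, assuming AWS and boto3: list running EC2 instances that
# are missing the tags our (hypothetical) tagging standard requires, so
# that a reviewer can follow up on each one.
import boto3

REQUIRED_TAGS = {"Purpose", "ProjectCode"}  # illustrative tag names

ec2 = boto3.client("ec2")
paginator = ec2.get_paginator("describe_instances")

for page in paginator.paginate(
    Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
):
    for reservation in page["Reservations"]:
        for instance in reservation["Instances"]:
            tags = {t["Key"] for t in instance.get("Tags", [])}
            missing = REQUIRED_TAGS - tags
            if missing:
                print(f"{instance['InstanceId']}: missing tags {sorted(missing)}")
```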

Beware third-party services

Today, it is possible to outsource many of an application’s services using so-called platform-as-a-service (PaaS) offerings, provided either by the cloud provider or by a third party. For example, instead of building your own monitoring, one can use a third-party monitoring service to verify that one’s app is still up. While this is convenient, it generally costs money for each check that gets performed, based on the frequency - so your app now has another taxi meter running, eating into your profits. As another example, most cloud providers now offer scalable messaging services that can feed your applications with event messages from IoT devices or from other applications. While the scalability of those services is a powerful value proposition, they are generally paid for per message - and so the bill can be quite large. Some PaaS services are priced based on their footprint rather than on their usage. Azure’s HDInsight service is an example: you pay a fixed amount per hour for an HDInsight cluster, regardless of whether you are using it or not. Also, since it takes quite a while to start up an HDInsight cluster, it is not something that you can just create on demand.
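
To make the “taxi meter” concrete, here is a back-of-the-envelope calculation in Python; the message rate and unit price are made-up illustrative numbers, not any provider’s actual pricing:

```python
# Back-of-the-envelope estimate of per-message pricing. Both numbers are
# made up for illustration; substitute your provider's actual rates.
messages_per_second = 5_000
price_per_million = 0.50  # dollars per million messages (hypothetical)

messages_per_month = messages_per_second * 60 * 60 * 24 * 30
monthly_cost = messages_per_month / 1_000_000 * price_per_million
print(f"{messages_per_month:,} messages/month -> ${monthly_cost:,.2f}/month")
# Prints: 12,960,000,000 messages/month -> $6,480.00/month
```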

Design your apps for local testability

In a recent talk about Netflix’s move to a PaaS model for the services that it provides to Netflix integrators such as Roku, Yunong Xiao of Netflix mentioned that they invest heavily in the ability of developers to “mock” components, so that developers can perform integration testing (using mocks) before committing their code. This is something that is often lost on corporate teams that try to adopt DevOps: they find that their applications cannot be stood up in a test mode in local integration test environments. In other words, their applications have not been “designed for testability”.
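
As an illustration of the idea (not Netflix’s actual code), here is a minimal Python sketch using the standard library’s unittest.mock; RecommendationClient and build_home_page are hypothetical names:

```python
# Minimal sketch of mocking a downstream dependency so an integration test
# can run locally, before commit, with no network calls. The names are
# hypothetical, not Netflix's code.
from unittest import TestCase, main
from unittest.mock import Mock


class RecommendationClient:
    """Hypothetical client that calls a remote service in production."""
    def top_titles(self, user_id):
        raise NotImplementedError("remote call; replaced by a mock in tests")


def build_home_page(user_id, client):
    """Code under test: builds a page from the recommendation service."""
    titles = client.top_titles(user_id)
    return {"user": user_id, "rows": [{"title": t} for t in titles]}


class HomePageTest(TestCase):
    def test_home_page_uses_recommendations(self):
        mock_client = Mock(spec=RecommendationClient)
        mock_client.top_titles.return_value = ["Title A", "Title B"]
        page = build_home_page("user-1", mock_client)
        self.assertEqual(len(page["rows"]), 2)
        mock_client.top_titles.assert_called_once_with("user-1")


if __name__ == "__main__":
    main()
```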

In order to “shift left” one’s integration testing, so that it can be done upstream - ideally on developer laptops - prior to full-scale integration testing by a QA team, it is necessary to design an application so that it can be stood up in a small-footprint test mode. That’s what Yunong really means by “mocks”. For example, consider Apache Spark. Spark is used by Web services that need to implement extremely high-volume stream processing. However, in order to test Spark code, a programmer does not have to stand up a full-scale Spark cluster: they can run Spark on their laptop in “standalone” mode - even in a Docker container (that is how we normally do it). One can also set the memory used by Spark: for production, one would set it to multiple gigabytes, but for local integration testing, one can set it to a few hundred megabytes.
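
For instance, here is a minimal PySpark sketch, assuming the pyspark package is installed; note that recent Spark versions enforce a minimum driver memory of roughly 450 MB, so the sketch uses 512m:

```python
# Minimal sketch, assuming the pyspark package: the same Spark code that a
# production cluster would run, but in local mode with a small memory cap.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("local[*]")                     # run on all local cores, no cluster
    .config("spark.driver.memory", "512m")  # small footprint for laptop testing
    .appName("local-integration-test")
    .getOrCreate()
)

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
assert df.count() == 2  # trivial stand-in for a real integration test

spark.stop()
```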

Use a cloud framework that can run locally

It is absolutely critical that, when the architecture of a cloud-based application is conceived, the architects and planners think through how it will be tested and what the costs will be. Again, this is design-for-testability.

For example, if your application uses PaaS services that are native to your cloud provider, can you run those services locally (on laptops or on-premises VMs)? Or are those PaaS services tied to the cloud, and therefore require you to pay for their use in each and every test environment?
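
One way to keep the local option open is to put provider services behind an interface that the application owns. Here is a minimal Python sketch (the names are hypothetical), in which local integration tests use a free in-memory implementation while production would plug in a provider-backed one:

```python
# Minimal sketch of keeping a cloud PaaS service behind an interface the
# application owns, so the app can be stood up locally for free.
from abc import ABC, abstractmethod
from collections import deque
from typing import Optional


class MessageQueue(ABC):
    """The interface the application owns; providers plug in behind it."""

    @abstractmethod
    def send(self, message: str) -> None: ...

    @abstractmethod
    def receive(self) -> Optional[str]: ...


class InMemoryQueue(MessageQueue):
    """Free, local stand-in for integration tests on a laptop. A
    provider-backed implementation (wrapping a managed messaging
    service) would implement the same interface for production."""

    def __init__(self) -> None:
        self._messages = deque()

    def send(self, message: str) -> None:
        self._messages.append(message)

    def receive(self) -> Optional[str]:
        return self._messages.popleft() if self._messages else None


def process_events(queue: MessageQueue) -> int:
    """Application code depends only on the interface, not the provider."""
    count = 0
    while queue.receive() is not None:
        count += 1
    return count


queue = InMemoryQueue()
queue.send("device-42: temperature=71")
assert process_events(queue) == 1
```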

This is a critical consideration, because in a continuous delivery process, testing is not only continuous, but is also done with a lot of parallelism. The parallelism occurs in two ways: (1) an integration test environment is often spun up whenever code is committed - often at a team or even single-developer level - and (2) regression test suites are often broken up into subsets that are run in parallel in order to reduce run time to a few hours or less. If you are paying for services being used by each test environment, you are going to be paying an awful lot of money.
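
As a sketch of the second kind of parallelism, here is one simple way to shard a regression suite deterministically across CI workers; the test names and sharding rule are illustrative:

```python
# Minimal sketch: split a regression suite into shards so that each CI
# worker runs only its slice, cutting wall-clock run time.
import sys


def shard(tests, shard_index, shard_count):
    """Deterministically assign each test to exactly one shard."""
    return [t for i, t in enumerate(sorted(tests)) if i % shard_count == shard_index]


# Placeholder test names; a real pipeline would collect these from the suite.
all_tests = [f"regression_test_{n:03d}" for n in range(100)]

shard_index, shard_count = int(sys.argv[1]), int(sys.argv[2])
for test in shard(all_tests, shard_index, shard_count):
    print(test)  # in real use, these names are handed to the test runner
```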

That’s why frameworks such as Kubernetes, OpenShift, and Cloud Foundry let you create small local container clusters on a laptop, and there are also local “mini cloud” tools that run on a laptop, so that you can first debug your application locally and then deploy to a real cloud using, say, Terraform. Lattice is one such “local cloud” system: you write your infrastructure code using a cloud-agnostic tool such as Terraform (which has a driver for Lattice), test it against your local cloud, and then deploy to a real cloud.

Why do this? It is so that individual developers can perform integration testing - before they commit their code changes. This is true “shift left” testing, and the result is very high developer throughput and a very short feature lead time. This is becoming standard DevOps practice today. However, for your developers to be able to do this, they need to be able to run the cloud services locally, and ideally for free. If you think that your application is too complex to ever run locally, think again: a typical laptop can run 20 containers without any problem. If you can’t run your integration tests locally, it is very likely because your apps were not designed to run locally in a test mode - and you can fix that with some re-engineering, and save a lot of money.

Summary

Cloud services have enormous value for deploying flexible and scalable applications and services. However, it is essential to consider how those applications will be tested, and the costs involved. Applications must be designed to be testable: if you don’t carefully design a testing approach that economically uses your cloud services, and an application architecture that makes it possible to test economically (e.g., locally), then cloud services can rapidly become a financial burden.

Cliff & Phil