Saturday, February 24, 2018

A deep learning DevOps pipeline

Many organizations today use deep learning and other machine learning techniques to analyze customer behavior, recommend products, detect fraud and other patterns, and generally use data to improve their business.
Many more are dabbling with machine learning but have difficulty moving those experiments into the mainstream of their IT processes: that is, how do they create a machine learning DevOps “pipeline”?

The difficulty is that machine learning programs do not implement business logic as testable units, so “code coverage” cannot be measured in the normal way: a neural network logically consists of “neurons” (aka “nodes”) rather than lines of functional code. Nor is it possible to deterministically define “correct” outputs for some of the input cases. Worse, creating test data can be laborious, since a proper test data set for a machine learning application is generally about 20% of the size of the training data, which often amounts to tens or hundreds of thousands of cases. A final challenge, and arguably the worst problem of all, is that neural networks often work fine on “normal” data but can produce very wrong results when the data is slightly off, and identifying these “slightly off” cases can be difficult. To summarize, the problems are:
  1. Measuring coverage.
  2. Generating test data.
  3. Identifying apparently normal test cases that generate incorrect results.
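To make the coverage problem concrete, here is a minimal sketch, in Python with NumPy, of the neuron-coverage idea: a neuron counts as covered if at least one test input activates it above a threshold. The toy network, weights, and threshold are illustrative assumptions, not our production setup.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def neuron_coverage(weights, biases, test_inputs, threshold=0.0):
    """Fraction of neurons activated above `threshold` by at least
    one input in the test suite (a simplified coverage metric)."""
    activated = [np.zeros(b.shape, dtype=bool) for b in biases]
    for x in test_inputs:
        a = x
        for i, (W, b) in enumerate(zip(weights, biases)):
            a = relu(W @ a + b)          # layer activations
            activated[i] |= (a > threshold)  # remember any neuron that fired
    covered = sum(int(layer.sum()) for layer in activated)
    total = sum(layer.size for layer in activated)
    return covered / total

# Toy example: a two-layer network with random weights and 100 random inputs.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(8, 4)), rng.normal(size=(3, 8))]
biases = [np.zeros(8), np.zeros(3)]
tests = [rng.normal(size=4) for _ in range(100)]
print(f"neuron coverage: {neuron_coverage(weights, biases, tests):.0%}")
```

The same bookkeeping generalizes to a real framework by instrumenting each layer’s activations during test execution.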
In a paper last September, four researchers (Kexin Pei, Yinzhi Cao, Junfeng Yang, and Suman Jana) explained how they solved these problems by treating them as an optimization problem. We have used their techniques to create a DevOps-style automated testing pipeline. For the purpose of this article, I will confine the discussion to multilayer (so-called “deep”) neural networks.


Developing a deep neural network is an iterative process. One must first analyze the data to decide how to organize it: for example, whether it should be clustered or categorized, whether additional attributes should be derived, and whether time or spatial correlations need to be accounted for (e.g., through convolution). After that, the neural network architecture must be chosen: how many layers, the size of each layer, and the way in which the network learns (e.g., through back propagation). This process takes quite a long time, with each iteration measured by the success rate of the network when tested against an independent data set.
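To give a sense of what one such iteration looks like, here is a minimal sketch of a candidate architecture in Keras; the layer count, layer sizes, and optimizer are illustrative assumptions that would be revisited on each iteration, not our actual model.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# One candidate architecture in the iterative search: the number of
# layers, the layer widths, and the activations are all choices to revisit.
model = Sequential([
    Dense(128, activation="relu", input_shape=(64,)),
    Dense(64, activation="relu"),
    Dense(10, activation="softmax"),
])

# Back propagation with a chosen optimizer and loss; each change here
# means retraining and re-measuring accuracy on an independent data set.
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```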

Thus, developing a neural network model is an exploratory process. However, many real neural network applications entail multiple networks, and often functional software code as well, making the networks part of a larger application system built by an entire team working together. In addition, before a change to a network can be tested, the modified network must be retrained, and training is extremely compute intensive, often requiring hours of GPU time. These factors make an automated testing “pipeline” useful.

In the figure above, step 1 represents the process of network adjustment: think of it as a developer or data scientist making a change to the network to see if the change improves performance. Most likely the developer will test the change locally, using a local GPU; but that test is probably only cursory, possibly using a simplified network architecture or a small training dataset. To do a real test, a much higher-capacity run must be performed, and the change must also be tested in the context of the entire application. To enable this, the developer saves their changes to a shared model definition repository, which is then accessed by an automated testing process.
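As a rough sketch of what that automated process might look like: the repository URL, file layout, and acceptance threshold below are hypothetical, and the training step is stubbed out, since that is where the hours of GPU time go.

```python
import subprocess

MODEL_REPO = "git@example.com:team/model-definitions.git"  # hypothetical
MIN_ACCURACY = 0.95  # illustrative acceptance threshold

def train_and_evaluate(model_dir: str) -> float:
    """Stub for the full-capacity GPU training run; returns the
    success rate measured against the independent test data set."""
    raise NotImplementedError  # hours of GPU time go here

def run_pipeline() -> None:
    # Triggered whenever a change lands in the shared model repository.
    subprocess.run(["git", "clone", MODEL_REPO, "model"], check=True)
    accuracy = train_and_evaluate("model")
    if accuracy < MIN_ACCURACY:
        raise SystemExit(f"accuracy {accuracy:.3f} is below {MIN_ACCURACY}")
```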

What is unique about our process is that in step 2 we use the algorithm developed by Pei et al. to generate additional test cases. These additional test cases are derived by analyzing the network and finding inputs that produce anomalous results. We then execute the test suite (step 3), including the additional test cases, and record which neural nodes were activated, producing a coverage metric (step 4). Finally, in step 5 we execute the test cases again (probably in parallel with the first execution), but using an independent network implementation. Our expectation is that the results will be extremely similar, within a certain percentage, say one percent; any differences are examined manually to determine which is the “correct” result, which is then recorded and checked against on subsequent runs. This process avoids having to manually inspect and label a large number of test cases.
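The differential check in step 5 is straightforward to express. Here is a minimal sketch in NumPy, using an illustrative one-percent tolerance: any test case on which the two independent implementations disagree by more than the tolerance is flagged for manual review.

```python
import numpy as np

TOLERANCE = 0.01  # "extremely similar, within ... say one percent"

def differential_test(primary_outputs, reference_outputs):
    """Compare per-case output vectors from two independent network
    implementations; return the indices of test cases that disagree
    beyond the tolerance, which then go to a human for labeling."""
    primary = np.asarray(primary_outputs)
    reference = np.asarray(reference_outputs)
    relative_diff = np.abs(primary - reference) / (np.abs(reference) + 1e-9)
    return np.where(relative_diff.max(axis=1) > TOLERANCE)[0]

# Example: case 1 differs by well over one percent and is flagged.
a = np.array([[0.90, 0.10], [0.60, 0.40]])
b = np.array([[0.90, 0.10], [0.70, 0.30]])
print(differential_test(a, b))  # -> [1]
```

Only the flagged cases need human inspection, which is what keeps the manual labeling effort small.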

We are still in the early days of using this process, but we have found that it dramatically improves test case management and test case completeness assessment, and reduces the turnaround time for testing model changes.

Tuesday, February 6, 2018

Low code and DevOps

So-called low-code and no-code frameworks enable non-programmers to create business applications without having to know programming languages such as Java, Python, and C#. People sometimes ask me if this means that DevOps is not relevant.

The amount of code is not the issue. The issues are:
  1. Do your apps interact?
  2. Do your apps share data stores?
  3. Do your apps change frequently?
  4. Do you have multiple app teams?
  5. Do you have severe lead time pressure?
  6. Do you have high pressure for things to work correctly?
  7. Do you have large scale usage?
If any of these are true, then you begin to have a need for DevOps, and the more that are true, the greater the case for DevOps.

Whether your code is created via a drag-and-drop GUI or through careful hand coding of Java, C#, or some other language does not matter. Low-code platforms provide a runtime platform, which can be a SaaS service or an on-premises server, so the arrangement is little different from a PaaS that is either in a cloud or on premises. The question of whether the application is low-code or hand-coded is irrelevant.

To illustrate, consider a low-code application developed using a tool such as Appian. Two of the Appian app types are site and Web API. Suppose that we create one of each: a site, and a separate Web API which the site uses. If our user base is a few thousand users who work 9-5, five days a week, then we do not need 24x7 availability, so we can use weekends to perform upgrades; nor do we need to worry about scale, because handling a few thousand users is easy and does not require a sophisticated scaling architecture. So far, it sounds like we do not need DevOps.

Now consider what happens when we add a few more apps, and a few more teams. Suppose some of those other apps use our Web API, and our app uses some of theirs. Also suppose that we need a new feature in our app, and it requires that one of the other Web APIs add a new method. Finally, suppose that we want to turn around new features in two-week sprints, so that every two weeks we have a deployable new release, which we will deploy on the weekend. Do we need DevOps?

We do, if we want to stay sane. In particular, we will want to have:
  • Continuous automated unit level integration testing for each team.
  • Automated regression tests against all of the Web APIs.
  • The ability of each team to perform automated integration tests that verify inter-dependent API changes and app changes.
  • The ability to stand up full stack test environments on demand, so that the above tests can be run whenever needed, without having to wait for an environment.
This is starting to sound a lot like DevOps - and it is. At this point, we are well on our way to fully automated continuous delivery pipelines: we are doing DevOps, and the fact that the code is created by drag-and-drop tools does not matter one bit, except that it makes our developers more productive, so we probably have an even higher rate of feature development - making the case for DevOps even greater.
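As an example of the second bullet above, an automated regression test against one of the Web APIs might look like the following minimal sketch, using pytest and requests; the test environment URL, endpoint, and response fields are hypothetical.

```python
import requests

BASE_URL = "https://test-env.example.com/api"  # hypothetical test environment

def test_order_api_contract():
    """Regression test for a Web API that other teams' apps depend on:
    a new release must not break the existing response contract."""
    response = requests.get(f"{BASE_URL}/orders/1234", timeout=5)
    assert response.status_code == 200
    body = response.json()
    # Fields that dependent apps rely on must survive every release.
    for field in ("orderId", "status", "lastUpdated"):
        assert field in body
```

Run against a full-stack test environment stood up on demand, a suite of such tests catches contract breaks well before the weekend deployment.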

What if non-programmers are creating their own apps? In that case, the question is: are they impacting each other? For example, are they modifying database schemas, or data structures in NoSQL stores? If so, then you will be in serious trouble if you do not have DevOps practices in place. Are those non-programmers writing Web APIs? If so, then you have the same considerations.

It is an integration question: if you have lots of things, and they need to work together, and you cannot afford to take your time about it, then you need DevOps.