A common paradigm in software engineering is to make progress continuously rather than discretely. By this, I mean that given an ideal, most would prefer to make small edits that bring them closer to that ideal rather than one large change all at once. This means that the workflow for change looks like the following:

  1. There is a realization that something is locally wrong.
  2. The root of the problem is identified.
  3. A change is implemented that fixes the problem.

This approach has its merits. After all, for most systems, it’s often unclear what the eventual state of the system should look like. It’s easier to chip away at known ugly parts of the system, since knowing what it shouldn’t look like amounts, somewhat, to knowing what it should look like.

But on reflection, we often find pieces of software in utter disarray. Reasonable questions such as “How did we get here?” end with answers like “We made the best call at every turn, but somehow we ended up in a suboptimal state.” To be clear, it’s not always as if corners were cut to ship a feature. Sometimes, it’s a partial refactor; other times, it’s a feature slapped on top of infrastructure not designed to support it. At the time the shots were called, these were always the right business decisions, but they somehow converged to a less-than-optimal state.

So what happened?

A Machine Learning Analogy

To take an analogy from machine learning, if we think of a program as something solving a problem, we can also think of it as minimizing some objective function. For example, the objective might be to minimize the manual labor required.

In this analogy, making small changes to a system, one at a time, amounts to doing gradient descent on a somewhat unknown objective function. Concretely, the three steps above would look like the following (a small sketch in code follows the list):

  1. There is a realization that the program is not producing output in a way that minimizes the objective function. Following the example above, it might be that manual intervention is higher than it needs to be.
  2. The immediate cause is identified. In the example above, that might be something as simple as realizing that automation which used to work has since broken.
  3. A software engineer implements a fix. In the example above, that might mean repairing that automation.
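
To make the analogy concrete, here is a minimal sketch of one pass through these steps as a single gradient-descent update. The “manual work” objective and every name in it are invented purely for illustration; no real system, library, or API is assumed.

    # A toy objective invented for illustration: "manual work" as a function of
    # some tunable aspect of the system. None of these names come from a real
    # system; the point is only how the three steps map onto one update.

    def manual_work(x: float) -> float:
        # Hypothetical objective: how much manual labor the system requires.
        return (x - 3.0) ** 2 + 1.0

    def gradient(f, x: float, eps: float = 1e-6) -> float:
        # Central-difference estimate of the derivative of f at x.
        return (f(x + eps) - f(x - eps)) / (2 * eps)

    x = 0.0          # current state of the system
    step_size = 0.1  # a small, incremental change

    for _ in range(100):
        loss = manual_work(x)            # step 1: notice the objective is too high
        grad = gradient(manual_work, x)  # step 2: identify the immediate cause
        x -= step_size * grad            # step 3: implement a small fix

    print(x, manual_work(x))  # ends up near x = 3.0, the minimum of this toy objective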

Before we continue, let’s address a question: why was this described as a “somewhat” unknown objective function? This is largely a criticism of step 2. When problems are discovered, it’s often the case that everyone is acutely aware of the immediate problem, but the broader or deeper problem is far less obvious. In this case, we can say that the objective function is locally obvious, but globally unknown.

We know how this ends in machine learning. If a learning algorithm is only allowed to explore a local space, it will only ever converge within the confines of that space. In our setting, this means that we will only solve the problems we identify. In such a case, the natural conclusion is to consider a wider range of problems than the one at hand. That’s perhaps an obvious statement, but a deeper example is more interesting.

There is a more insidious variation that I think is also more prevalent. Consider the case where the global objective function, rather than only the local one, is identified in step 2. Furthermore, assume that the global minimum is not close to the existing state; that is, there exists a local minimum that is closer to the existing state than the global minimum is, and moving from the existing state to the global minimum requires performing worse on the objective function before performing better.
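
To make this scenario concrete, here is a minimal sketch using a made-up one-dimensional objective with exactly this shape: a shallow well near the existing state and a deeper well farther away, separated by a barrier. The function, constants, and starting point are all invented for illustration.

    # A toy one-dimensional objective with two "wells": a shallow local minimum
    # near x = +1 and a deeper, global minimum near x = -1, separated by a
    # barrier near x = 0. The function is invented purely for illustration.

    def objective(x: float) -> float:
        return (x * x - 1.0) ** 2 + 0.2 * x

    def gradient(x: float) -> float:
        # Analytic derivative of the objective above.
        return 4.0 * x * (x * x - 1.0) + 0.2

    x = 2.0           # the "existing state", inside the basin of the local minimum
    step_size = 0.01  # small, incremental changes only

    for _ in range(10_000):
        x -= step_size * gradient(x)

    print(round(x, 2), round(objective(x), 2))
    # -> roughly 0.97 0.2, the local minimum.
    # The global minimum sits near x = -1.02 with an objective of about -0.2,
    # but reaching it from here means climbing over the barrier near x = 0,
    # where the objective is about 1.0: performing worse before performing
    # better. Steps that only ever decrease the objective will never cross it.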

In such a case, we are faced with two possibilities: go for the local optimum, which requires a smaller amount of work; or go for the global optimum, which requires much more planning and takes on much more risk, but may actually get us to the global minimum.

Most would choose the first option: go for the local optimum. We call this “incremental improvement.” But few make the connection that, under such circumstances, they can never “incrementally improve” their way to the global optimum; every change leaves them in a better state than before, yet some pain points will remain that cannot be fixed.

So What To Do

It’s a curious trade-off between optima and work that we have to strike. On one hand, we have the comfort of our own optimum “well,” where we know which problems we have yet to solve and which are unsolvable. On the other hand, we have the ideal “global minimum well,” where some dragons undoubtedly lurk, accompanied by the hope of a brighter future. I don’t have a silver bullet as to which to choose, but I do have some thoughts, followed by recommendations:

Thoughts:

  • I think if software ever hits the point of “How did we get here?”, some horrible mistake was made. It means that someone either forgot the difference between a local and a global optimum, or assumed that the optimum they were optimizing for was the global one.
  • This means that it’s quite important to understand the landscape before making changes. As they say, a month in the laboratory can often save an hour in the library!
  • Expanding on the above, sometimes the correct choice is to optimize locally even if the eventual plan is to make a discrete jump toward the global optimum; it just so happens that accumulating more pain points informs you about the global objective function.