Programming breakthroughs we need

I feel like we need a few breakthroughs to drastically change how we develop software. And when I say breakthrough, I mean a huge breakthrough: something like "structured programming", which completely changed how we think about programming. Here are some observations and ideas about that.

Writing glue code and boilerplate is a waste

Most code I write doesn't do anything interesting; it's either boilerplate or glue for connecting subsystems together. It feels like this kind of code was already written many times before and will be written many times in the future. So why should I even write it again?

Well, the problem is that the code is different enough that I usually can't use the existing code as it is; I have to modify it. And this comes with a lot of additional baggage in the form of the whole software lifecycle - now I can't just use the code, I have to write it, test it, build some automation, deploy...

To be more specific - for most web projects I work on, I have a very similar YAML file for CI, a Dockerfile, some script to compress images, some script to run migrations, some boilerplate for routing and authentication, some language/framework setup like package.json or Cargo.toml, deployment config, 3rd party integrations and so on.

And then, usually the biggest problem is an enormous amount of CRUD boilerplate that looks very similar in each project, but it's nevertheless different in important details. This is typical for web projects, but other kinds of software have a similar thing going on.

Why not use a framework?

Some projects try to package this into a single framework, but this approach doesn't always work either, because it necessarily introduces new generic mechanisms and complexity, often hiding important details you care about. Because these things require a good amount of customization, such a framework often introduces its own kind of glue and boilerplate.

It feels like there's an unexplored dimension of abstraction here. Something where generics, interfaces and higher order functions are too static and too low level as primitives. Something where it's really difficult to pinpoint what exactly the repeated pattern is and how to exploit it. Instead of a generic framework that can do everything, I'd like an efficient way to do something specific.

A good hint is the recent GitHub Copilot development. It can easily generate a lot of common boilerplate code which has these "same, but different" properties I describe. Nevertheless, I think we need something more powerful, which brings me to the next topic.

Editing code in general doesn't work well

It feels like the way we program is just not optimal. Changing the program is the most common thing we need to do in our work; it's the reason our job even exists. And yet, most of a programmer's time is spent reading the code or planning how to change it.

Not only that - many activities we do are done only to avoid changing the code prematurely or to make the change less risky. That's why we read code so much: we need to understand it really well to change it correctly. That's why we do grooming, code review and automated testing.

Program is not a text

The question is - if most of the work we want to do is about changing existing code, then why is the system not optimized for change by default? The code we write is optimized for reading and storing the source text.

We spend endless amounts of time bikeshedding the right syntax, indentation level, tabs vs spaces, or where to put code in the structure of files, but this all feels just pointless - these are all properties of text, but the text is just a tool to manipulate some abstract model of the program.

Program is a model

I imagine this model as a relational database - you have tables like structs, fields, functions, arguments and relationships between them. When you think about it this way, it becomes clear that using textual source code is a really inefficient way to manipulate this model. It's very error-prone and requires tons of additional processing.
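To make the "relational database" picture a bit more concrete, here is a rough sketch using Python's built-in sqlite3 module. The schema is purely hypothetical: every table and column name is made up for illustration, and a real program model would need far more than this (bodies, types as first-class rows, modules and so on).

```python
import sqlite3

# Hypothetical schema: all table and column names are invented for this sketch.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE structs   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE fields    (id INTEGER PRIMARY KEY,
                        struct_id INTEGER REFERENCES structs(id),
                        name TEXT NOT NULL, type TEXT NOT NULL);
CREATE TABLE functions (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                        return_type TEXT);
CREATE TABLE arguments (id INTEGER PRIMARY KEY,
                        function_id INTEGER REFERENCES functions(id),
                        position INTEGER NOT NULL,
                        name TEXT NOT NULL, type TEXT NOT NULL);
""")

# A tiny "program": one struct and two functions.
db.execute("INSERT INTO structs VALUES (1, 'Point')")
db.executemany("INSERT INTO fields VALUES (?, ?, ?, ?)",
               [(1, 1, "x", "f64"), (2, 1, "y", "f64")])
db.executemany("INSERT INTO functions VALUES (?, ?, ?)",
               [(1, "length", "f64"), (2, "greet", None)])
db.executemany("INSERT INTO arguments VALUES (?, ?, ?, ?, ?)",
               [(1, 1, 0, "p", "Point"), (2, 2, 0, "name", "String")])

# "Which functions take a Point?" becomes a join instead of a text search.
takes_point = [row[0] for row in db.execute("""
    SELECT f.name
    FROM functions f
    JOIN arguments a ON a.function_id = f.id
    WHERE a.type = 'Point'
    ORDER BY f.name
""")]
print(takes_point)
```

The point is only that questions about the program turn into ordinary queries once the structure is stored as data rather than recovered from text.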

I first started thinking about this when refactoring Zebra, transforming C2Rust output to safe Rust. I like using IntelliJ, because it helps a lot with editing code by automated refactoring, but for this one, it wasn't enough. I needed something much more powerful.

IntelliJ helps you with things like "extract parameter" or "inline function", but I wanted to do something like this: for each function that touches a given global variable, extract that global into a parameter that is &mut if the function needs to mutate it or & if it only reads it, then move the global into the main function as a local variable. And also, do it for all globals in this file.

Or even better, I want to specify a goal like "I want this function to not take this parameter" and let the system figure out how to transform the program to achieve this goal. I can imagine a system that can combine small transforms into larger ones and use some AI magic to figure out how to compose them to achieve the specified goal.
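As a toy illustration of composing small transforms toward a goal, here is a sketch that uses plain breadth-first search as a stand-in for the "AI magic". Everything in it is hypothetical: the `Fn` model, the two transforms (`inline_default`, `drop_param`) and the `plan` function are invented for this example, and a real system would work on a much richer model with a much smarter search.

```python
from collections import deque
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Fn:
    """Toy function model: its parameters and which of them the body reads."""
    params: frozenset
    used: frozenset


def inline_default(fn, p):
    """Replace reads of p in the body with a constant; only applies if p is read."""
    return replace(fn, used=fn.used - {p}) if p in fn.used else None


def drop_param(fn, p):
    """Remove p from the signature; only applies once the body no longer reads it."""
    if p in fn.params and p not in fn.used:
        return replace(fn, params=fn.params - {p})
    return None


TRANSFORMS = {"inline_default": inline_default, "drop_param": drop_param}


def plan(start, goal, max_steps=5):
    """Breadth-first search for the shortest transform sequence reaching the goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        fn, steps = queue.popleft()
        if goal(fn):
            return steps
        if len(steps) == max_steps:
            continue
        for name, transform in TRANSFORMS.items():
            for p in sorted(fn.params):
                nxt = transform(fn, p)
                if nxt is not None and nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + [(name, p)]))
    return None


start = Fn(params=frozenset({"path", "verbose"}),
           used=frozenset({"path", "verbose"}))

# Goal: "I want this function to not take the `verbose` parameter".
steps = plan(start, goal=lambda fn: "verbose" not in fn.params)
print(steps)
```

Neither transform can remove a parameter that is still read, so the search has to discover that the reads must be inlined first. That ordering is the kind of thing I'd like the system to figure out for me.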

Do we need a language?

Notice that none of the things I want are related to a specific programming language. I don't even care about what the text looks like, where the files are located, how imports work, what the order of parameters is, where to put braces or if braces even exist. None of that is important, so why do we even bother? Why not jump right into the meat of the matter?

Also notice how writing the refactoring as a query over the model is actually not that difficult. I can imagine how I would write an SQL query like that in a few lines. On the other hand, writing an automated refactoring system in IntelliJ or VSCode sounds like a lifetime problem, and it's kinda unsolvable.
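Here is roughly what I have in mind for the "globals to parameters" refactoring, written as a query with Python's sqlite3 module. The schema (`functions`, `globals`, `accesses`) is hypothetical and invented for this sketch. It only computes the plan for functions that touch a global directly; callers that merely pass the value along, and the actual rewrite of bodies and call sites, are left out.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE functions (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE globals   (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                        type TEXT NOT NULL);
-- one row per place where a function body reads or writes a global
CREATE TABLE accesses  (function_id INTEGER, global_id INTEGER,
                        kind TEXT CHECK (kind IN ('read', 'write')));
""")
db.executemany("INSERT INTO functions VALUES (?, ?)",
               [(1, "main"), (2, "tick"), (3, "report")])
db.execute("INSERT INTO globals VALUES (1, 'COUNTER', 'u32')")
db.executemany("INSERT INTO accesses VALUES (?, ?, ?)",
               [(2, 1, "read"), (2, 1, "write"), (3, 1, "read")])

# For every (function, global) pair: &mut if any access writes, & otherwise.
plan = db.execute("""
    SELECT f.name,
           lower(g.name),
           (CASE WHEN MAX(a.kind = 'write') = 1 THEN '&mut ' ELSE '&' END)
               || g.type
    FROM accesses a
    JOIN functions f ON f.id = a.function_id
    JOIN globals   g ON g.id = a.global_id
    GROUP BY f.id, g.id
    ORDER BY f.name
""").fetchall()

for func, param, ty in plan:
    print(f"fn {func}(..., {param}: {ty})")
```

Because nothing in the query names a particular global, "do it for all globals in this file" comes for free: it is the same query over more rows.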

Why? Well, because we treat programs as text, so we inherit a lot of complexity connected to that. We have to worry about importing, formatting, filesystem, type inference, macros and more. All of this is accidental complexity, related to how the input text is mapped to the program model.

If we instead focused on building the right model, we could better optimize that model for editing and the text could be just a view of that model. If the text is just a view, it doesn't matter how it's written. Let everybody customize it the way they want. I don't care if you put the opening brace on a new line, I don't even want to care.
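A minimal sketch of "text as a view": the same model object rendered under two personal style settings. The `Function` and `Style` classes and the `render` function are hypothetical names made up for this example, and the output syntax is only loosely Rust-like.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Function:
    """The model: no whitespace, no brace placement, just structure."""
    name: str
    params: tuple[str, ...]
    body: tuple[str, ...]


@dataclass(frozen=True)
class Style:
    """A reader's personal preferences; they never touch the model."""
    brace_on_new_line: bool = False
    indent: str = "    "


def render(fn: Function, style: Style) -> str:
    header = f"fn {fn.name}({', '.join(fn.params)})"
    opening = [header, "{"] if style.brace_on_new_line else [header + " {"]
    lines = opening + [style.indent + stmt for stmt in fn.body] + ["}"]
    return "\n".join(lines)


add = Function("add", ("a: i32", "b: i32"), ("return a + b;",))

print(render(add, Style()))
print(render(add, Style(brace_on_new_line=True, indent="\t")))
```

Two people can look at `add` with different styles and there is nothing to argue about, because neither view is what gets stored.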

As far as I understand, this is what the Dion project tries to explore. I'm pretty excited to see what comes out of that.

Testing and Correctness

Testing is related to all of this. Here's a bold statement:

Software testing doesn't work

No matter how hard we try, it just sucks. Writing tests is time-consuming, usually doesn't scale, and it easily creates tight coupling with implementation, which makes change difficult. Tests also frequently don't test what we care about and require a lot of tooling and process to make them useful. The only thing worse than testing is not testing, but testing is not much better.

Programmers can't even agree on basic things like when to test, how to test, if unit tests are any good or if TDD is the only true way to build software. This is a good hint that we haven't found the right way yet. I think we need a breakthrough that will just end this nonsense debate for good.

Whenever someone points out a problem with unit/integration/e2e testing, there's always an army of people who respond with "you just don't do it right". Same with TDD - every complaint about TDD is accompanied by an "if done right" comment. If that's the case, and testing really requires so much experience and careful work to "be done right", then is it even worth it? Why don't we find some better way to test?

I want simpler testing

We have some promising ideas. Some examples include strong type systems, fuzzing, snapshot tests and sanitizers. Those all seem on the right track, because they allow us to cover a whole dimension of tests with a single tool.
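To show what "a whole dimension of tests by a single tool" can look like, here is a small hand-rolled randomized round-trip check using only the standard library. The run-length `encode`/`decode` pair and the `check_roundtrip` helper are hypothetical examples; in practice you would likely reach for an existing fuzzer or property-testing library instead.

```python
import random


def encode(s: str) -> list[tuple[str, int]]:
    """Run-length encode a string into (character, count) pairs."""
    runs: list[tuple[str, int]] = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def decode(runs: list[tuple[str, int]]) -> str:
    """Inverse of encode."""
    return "".join(ch * n for ch, n in runs)


def check_roundtrip(trials: int = 1000, seed: int = 0) -> int:
    """Check decode(encode(s)) == s on many random short strings."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    for _ in range(trials):
        length = rng.randrange(0, 20)
        # A tiny alphabet makes long runs (the interesting case) likely.
        s = "".join(rng.choice("ab") for _ in range(length))
        assert decode(encode(s)) == s, f"round trip failed for {s!r}"
    return trials


print(check_roundtrip(), "random inputs passed")
```

One stated property replaces a pile of hand-picked cases, which is the kind of leverage I'm after.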

What I miss is some generic mechanism to make testing super cheap and effective. That's why I don't think unit testing or any kind of system where you write tests manually is on the right track. Those approaches work and there's something to them, but they just fundamentally don't scale. This week I reviewed code that had 90 lines of test code to test a one-liner with 2 test cases. That's not very effective.

Notice that if we change our programming model the way I described in the section above, we could find some completely new approaches to testing. Maybe you could write tests as queries that would test a whole set of possible programs, not only the current version of your program at the moment. But I don't think that's necessary. Some approaches already work well today, and it's only a matter of better integration, which comes with time.

What is the vision?

There's a common thread in all the points above. All major breakthroughs require a shift in perspective. Structured programming completely changed how we write code. I think we need that. Structured programming was a shift in how we look at program structure. Here, the shift is related to program change over time.

The whole agile revolution pushes us in this direction - our process is based on feedback and fast iteration. We need to change things all the time and experiment with them. That's why we need tools that allow us to change stuff all the time. We have some - we have Git, database migrations, Terraform, CI systems, cloud... But many of our practices are still holding us back. We are limited by the size of our changes.

If you imagine all the breakthroughs above combined, what world do you see? I see a world where we can develop programs fast, where changing requirements are even more welcome than in the Agile Manifesto, and where programming is based on change even more than before.

Refactoring doesn't even need to exist as a separate concept or activity - programming itself is that. We regularly change the whole program every day. Migrating to a new database? Payment provider? Frontend framework? You can just do that, without designing an interface abstraction for it, ever. You don't need UI mockups and prototypes - you just program the real thing, because it's that simple. If the user doesn't like it, you can completely restructure it easily.

Taking this even further, what if the whole concept of a programming language doesn't even make sense anymore? You just have the program model, and it doesn't matter how you render it. Maybe even semantics doesn't matter - you can write custom rules as queries for the model. What if the "programming language" is just a set of tables and queries you pick however you want?
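As a sketch of what a "custom rule as a query" might look like, here is an architectural rule expressed against a hypothetical call-graph model in sqlite3. The tables, the module names and the rule itself ("only the storage module may call execute_sql") are all invented for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE functions (id INTEGER PRIMARY KEY,
                        module TEXT NOT NULL, name TEXT NOT NULL);
CREATE TABLE calls     (caller_id INTEGER, callee_id INTEGER);
""")
db.executemany("INSERT INTO functions VALUES (?, ?, ?)", [
    (1, "storage", "execute_sql"),
    (2, "storage", "load_user"),
    (3, "web", "show_profile"),
    (4, "web", "debug_dump"),
])
db.executemany("INSERT INTO calls VALUES (?, ?)", [
    (2, 1),  # storage.load_user -> storage.execute_sql  (allowed)
    (3, 2),  # web.show_profile  -> storage.load_user    (allowed)
    (4, 1),  # web.debug_dump    -> storage.execute_sql  (violation)
])

# The rule: any row returned by this query is a violation.
RULE_RAW_SQL_ONLY_IN_STORAGE = """
    SELECT caller.module || '.' || caller.name
    FROM calls
    JOIN functions caller ON caller.id = calls.caller_id
    JOIN functions callee ON callee.id = calls.callee_id
    WHERE callee.name = 'execute_sql'
      AND caller.module <> 'storage'
"""

violations = [row[0] for row in db.execute(RULE_RAW_SQL_ONLY_IN_STORAGE)]
print(violations)
```

A rule like this says nothing about the current text of the program, so it keeps holding (or failing) across every future version of the model, which is also why it doubles as a kind of test.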

Closing thoughts

Will this vision pan out? I don't know... maybe. But I think we're heading in this direction already. It won't be an overnight revolution; even structured programming wasn't. Many of these ideas are not explored enough yet, and some of them might even be impossible to implement, who knows. Testing is the most developed out of these, I'd say. Even though it's still far behind what I want, I think most of the building blocks are there.

For some ideas, I don't even know where to start. I can imagine model-based programming quite easily, but tackling the boilerplate/glue problem seems more difficult. Do we need something like a common protocol? Or do we just generate the boilerplate with AI? Do we have some tests and use AI to generate code until the tests pass? And if we use model-based programming, what does glue code for it even look like? Is it still the same issue?

Let's see if this wishlist materializes in the next few decades.

13 August 2022