
Yoyo Code
Matyáš Racek's blog


The key insight for programming with LLMs effectively

I keep reading various takes on LLMs, and I kinda roll my eyes pretty often. It seems like there are two crowds talking past each other.

One is the "maximalist, embrace exponentials, all my code is written by AI" kind of group.

The other is the "what are you talking about? AI produces slop, it can't even do the basic tasks I do" crowd.

At first glance, it might seem weird that such opposite takes can exist at the same time, but I'm not that surprised, actually.

In my experience, LLM usefulness heavily depends on the task, and there's a massive chasm between tasks where it's a good fit and tasks where it's a bad one.

I can definitely imagine people working only on one side of this chasm and being perplexed by the other side.


Part of it is also a skill, though, so I want to write down what I think is the source of this discrepancy, and maybe it helps you figure out if you're missing out.

The core idea is simple. LLMs are token prediction machines. Modern models are a lot more sophisticated, but the core of the technology is still probabilistic token prediction.

This means that the more predictable the task is, the better an LLM will perform.

That's it.

Some obvious examples are:

  • translating JavaScript to TypeScript
  • converting require to import
  • generating a test
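The require-to-import case is almost purely mechanical, which is exactly why it's such a good fit. As a sketch (a hypothetical, deliberately minimal regex-based helper, not a real codemod), the kind of rewrite involved looks like:

```typescript
// Sketch of the mechanical rewrite an LLM performs when converting
// CommonJS requires to ES module imports. Only two simple shapes handled.
function requireToImport(line: string): string {
  // const foo = require("bar");  ->  import foo from "bar";
  const plain = line.match(/^const (\w+) = require\("([^"]+)"\);$/);
  if (plain) return `import ${plain[1]} from "${plain[2]}";`;

  // const { a, b } = require("bar");  ->  import { a, b } from "bar";
  const destructured = line.match(/^const \{ ([^}]+) \} = require\("([^"]+)"\);$/);
  if (destructured) return `import { ${destructured[1]} } from "${destructured[2]}";`;

  return line; // leave anything else untouched
}

console.log(requireToImport('const fs = require("fs");'));
// import fs from "fs";
```

A real migration has plenty of edge cases (dynamic requires, re-exports, interop defaults), but the bulk of the diff is this predictable.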

Some of these are predictable from training data.

Some of these become predictable if you give enough instructions.

Some of these are inherently predictable.

And some of these get predictable with better models, because models are fine-tuned to better predict what we actually want from them.


The trick that everyone forgets to mention is the skill of figuring out how to turn generic tasks into something predictable enough for an LLM to handle. For example, if you want to generate a test, giving the model a list of steps helps.
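To make that concrete, here's a sketch of what step-driven test generation might produce. Assume a prompt like "test parseAmount: 1. build the input, 2. call the function, 3. assert the result and the error case" (parseAmount is a hypothetical function, inlined here so the example is self-contained; node:assert stands in for a test framework):

```typescript
import { strictEqual, throws } from "node:assert";

// Hypothetical function under test: parses a decimal-comma amount.
function parseAmount(input: string): number {
  const n = Number(input.replace(",", "."));
  if (Number.isNaN(n)) throw new Error(`not a number: ${input}`);
  return n;
}

// The generated test follows the prompt's steps almost one-to-one:
const input = "3,50";              // step 1: build the input
const result = parseAmount(input); // step 2: call the function
strictEqual(result, 3.5);          // step 3: assert the result...
throws(() => parseAmount("abc"));  // ...and the error case
```

The step list removes most of the model's freedom, which is precisely what makes the output predictable.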

If a task isn't really predictable, there's a related skill of finding creative ways to exploit this predictability property: using an LLM for some subtask you wouldn't otherwise bother with, but which saves you time when you have an LLM available.

A typical example is generating test data. It often follows some predictable structure that you don't want to write by hand, but it's also diverse enough that it's not simple to generate with a script.
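A sketch of what this looks like in practice (the User shape here is hypothetical): you pin down the rigid, predictable structure, and the LLM fills in varied, realistic values that a script would struggle to produce:

```typescript
// The structure is rigid and predictable...
interface User {
  id: number;
  name: string;
  email: string;
  locale: string;
}

// ...but the values benefit from the diversity an LLM provides for free:
// realistic names, plausible emails, varied locales.
const users: User[] = [
  { id: 1, name: "Jana Nováková", email: "jana.novakova@example.com", locale: "cs-CZ" },
  { id: 2, name: "Diego Fernández", email: "diego.f@example.org", locale: "es-AR" },
  { id: 3, name: "Mei Tanaka", email: "mei.tanaka@example.net", locale: "ja-JP" },
];

console.log(users.length); // 3
```

A faker-style script can approximate this, but the LLM version needs no setup and adapts instantly when the schema changes.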

The rabbit hole of these techniques goes pretty deep, but they usually exploit this core predictability property. Tool calls, DSLs, tests, skills, and context protocols are all techniques that add information to make the result more predictable from the LLM's point of view.


One related skill I find useful is steering LLMs in a direction that is easy to verify. For example, I tasked an agent with migrating a codebase from JavaScript to TypeScript. This is usually pretty tedious work. It requires some thinking, but a big part of it is very predictable. The problem is that the agent started to fix type errors by changing the runtime code (adding type guards, null checks, etc.).

This is a case of a common complaint about LLM-assisted coding: it changes your work from "fun coding work" to "annoying review work." Verifying that all these runtime changes don't break something is tedious and to some extent nullifies the benefit of using the tool. In this case, it's simpler to give the LLM guidance like: ignore type errors, do the initial pass, and I'll do the rest. This way, all the changes are limited to imports, file extensions, and type annotations, which makes it trivial to check that runtime behavior is unchanged.


To get back to the original example, it should be pretty clear where the stark contrast is coming from. Some projects just have way less of this predictability to exploit.

I think this is pretty typical for mature codebases. They are big and diverse, many common patterns are abstracted into project-specific abstractions, and most of the work is either debugging obscure, hard-to-track-down issues or trying to figure out how to make a few surgical cuts to squeeze a new feature into all the existing constraints, many of which are not obvious.

I can relate. After my initial excitement from fixing all the low-hanging fruit in our issue tracker, I had to come to terms with the sad reality that most of our other issues are just not easily fixable with LLMs. Once the low-hanging fruit is gone, most of the work falls back to normal, or to figuring out how to transform the problem into something an LLM can help with.


Models will get better, but I think this problem is pretty fundamental. A program is a semantic compression of the problem space it tries to tackle. As the program gets older and this approximation approaches the optimum, most changes encode properties of the problem space that were previously not known. If something is a predictable part of the problem space, then it'll probably already be in the program. Changes become less predictable over time.