<aside> 📢 TLDR: Models are going to keep getting better and replace a lot of jobs, possibly including software engineers. This page goes through many, many research papers backing this claim.
</aside>
GitHub Copilot and similar systems are using software engineers to train their models, with the eventual goal of replacing anyone who writes code.
This is a big claim, but there is currently no research indicating that it is impossible. I am usually not a fan of claims that hinge on a lack of evidence, but the trends in research, token cost, and their effect on model performance lead me to believe that software engineering will fundamentally change. Even if “collective intelligence” systems don’t replace humans altogether, Copilot-like systems will be doing most of the work, only asking for human feedback on things they are unsure about.
Humans hunting for food get a rush of dopamine when they have a successful hunt: nutrition is consumed, and the human feels happy that they ate. The human is more likely to hunt for food in the future because it worked this time. Another time, the human successfully kills an animal and takes it back to camp to put on the fire, but accidentally gets burned by touching a hot rock next to the fire. This hurts: the pain receptors in the human’s arm activate and send pain signals to the brain. The human remembers this interaction and learns not to touch the hot stone again.
This is reinforcement learning: behaviours with positive outcomes earn positive rewards and become more likely in the future, while behaviours with negative outcomes earn negative rewards and become less likely. This is how humans learn, and it is also how advanced systems like ChatGPT learn.
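To make the loop concrete, here is a minimal toy sketch in Python. It is a simple value-update rule of my own, not anything taken from ChatGPT’s or Copilot’s actual training code: actions that earn positive reward become more likely to be chosen, and actions that earn negative reward become less likely.

```python
import random

# Toy reinforcement learning: the agent keeps a value estimate for each
# action and nudges it toward the rewards it actually receives.
actions = ["hunt", "touch_hot_stone"]
values = {a: 0.0 for a in actions}   # learned value of each action
learning_rate = 0.1
epsilon = 0.2                        # chance of exploring a random action

def reward(action: str) -> float:
    # Hypothetical environment: hunting feeds you (+1), the hot stone burns (-1).
    return 1.0 if action == "hunt" else -1.0

for step in range(1000):
    # Explore occasionally, otherwise pick the action with the highest value.
    if random.random() < epsilon:
        action = random.choice(actions)
    else:
        action = max(actions, key=values.get)
    r = reward(action)
    # Move the value estimate toward the observed reward.
    values[action] += learning_rate * (r - values[action])

print(values)  # "hunt" ends up near +1, "touch_hot_stone" near -1
```

After a few hundred steps the agent almost always hunts and almost never touches the stone, which is exactly the behaviour shift described above.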
I couldn’t find any explicit reference to reinforcement learning being used to train systems like GitHub Copilot or DIDACT, Google’s internal model, but I am going to assume it is, based on the available public information about the two systems.
In the pre-Copilot days, humans wrote all the code themselves: the software engineer would think about a problem, meet with their team, gather requirements, sit down to write the code, test it, document it, and then release it. This isolated approach to problem-solving meant that the same problems were often solved over and over again.

Software engineering before GitHub Copilot
With the introduction of GitHub Copilot in 2021, the software engineering paradigm changed; all of a sudden, everyone writing code was connected. When you write code, the system takes all the information associated with it: the code you wrote, the buttons you pressed, whether you accepted the suggested code from Copilot, lint errors, whether the runs failed or passed, and human comments on repositories.
The models aren’t just trained on “everyone else’s code” but on a rich representation of everything that’s going on: every developer interaction with the code, every pause before accepting a code snippet, whether that snippet was edited after being generated, and so on. Fundamentally, this is not as simple as people realize. The code isn’t just a regurgitation of someone else’s code learned through next-token prediction (so-called “distribution matching”); the model gets to guess what it thinks is correct and, just like our prehistoric human, learns from the feedback it receives.
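Neither GitHub nor Google publishes the exact telemetry schema, so the following is purely a hypothetical sketch of what a single logged interaction could look like; every field name here is my own invention, chosen to mirror the signals listed above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CompletionEvent:
    """One developer interaction with a suggested completion.

    Purely illustrative: the real systems' telemetry is not public,
    so every field is an assumption about what *could* be logged.
    """
    suggestion: str                # code the model proposed
    accepted: bool                 # did the developer accept it?
    time_to_decision_ms: int       # pause before accepting or rejecting
    edited_after_accept: bool      # was the snippet modified afterwards?
    lint_errors: int               # lint errors introduced by the snippet
    tests_passed: Optional[bool]   # CI outcome, if a run happened
    survived_days: Optional[int]   # how long the code stuck around
```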
If we apply this flow of information to the post-Copilot systems, we end up with a setup where the code-generating agent gets feedback on every suggestion it makes; this comes in the form of metrics like code acceptance, code editing after acceptance, how long generated code sticks around, and whether the generated code passes tests. In this model, software engineers act as reward functions: they are supervisors for the agent, and the agent learns a policy that maximizes that reward through repeated interactions with its environment.
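As a sketch only (any real reward signal inside these systems is not public), the “software engineer as reward function” idea could be written as a scalar reward computed from exactly those signals. The weights below are invented for illustration; the point is that the engineer’s everyday actions push the reward up or down.

```python
from typing import Optional

def reward_from_feedback(accepted: bool,
                         edited_after_accept: bool,
                         tests_passed: Optional[bool],
                         survived_days: Optional[int]) -> float:
    """Collapse a developer's reaction to one suggestion into a scalar reward.

    All weights are made up for illustration; acceptance, post-edit
    behaviour, test results, and code longevity each nudge the reward.
    """
    r = 1.0 if accepted else -0.5
    if accepted and edited_after_accept:
        r -= 0.3          # accepted but needed rework
    if tests_passed is True:
        r += 0.5
    elif tests_passed is False:
        r -= 0.5
    if survived_days is not None and survived_days > 30:
        r += 0.2          # code that sticks around is a good sign
    return r

# Example: a suggestion that was accepted, left untouched, and passed CI.
print(reward_from_feedback(accepted=True, edited_after_accept=False,
                           tests_passed=True, survived_days=90))  # 1.7
```

In a real pipeline, a reward like this would then feed back into the model the same way the toy value update did earlier: suggestions that engineers accept and keep become more likely, suggestions that get rejected or reworked become less likely.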

Software engineers as reward functions reinforcing behavior of the Copilot system
This method has proved extremely successful. For example, in Google’s recent blog post update they announced that their code-generating model reached a code acceptance rate of 37%, with 50% of all code (measured in characters) now written by the model itself.
This lines up with the broader trend of models approaching or surpassing human performance on benchmark after benchmark.