AI Learning vs Configuration: The Illusion of Memory
The Trap of Conversational Intuition
Why do we so easily attribute memory to a system that resets after every response? Humans interact and learn through conversation; it is familiar to the point of being invisible. Unsurprisingly, our conversational intuition shapes how we use intelligent tools. Because the tool uses language in a conversational style, we apply our intuition about conversations. Most notably, we expect the people we speak to will remember at least some of the interaction forever. Chatting with an LLM feels like a simple turn-based chat, but there is more going on behind the scenes.
There are several steps that are mostly hidden by the client you use.
*** Start Conversation ***
[user] Explain SQL injection to me with examples.
[agent] SQL injection is what happens ...
[user] How do I prevent this?
[agent] The short answer is: never let user input ...
*** End Conversation ***

Note that in the conversation above, the second user question does not mention the topic again; it just refers to “this”. This illusion of memory breaks down when you start a new conversation and try to continue the old conversation:
*** Start Conversation ***
[user] How do I prevent this?
[agent] I'm missing the *this*. The universe is vast, but your problem is probably smaller than the galaxy.
*** End Conversation ***

At its core, LLM inference is stateless.
Learning:
Learning results in acquiring new, or modifying existing, knowledge, behaviours, skills, values, or preferences. It often involves permanent changes in the structure of the processing unit (brain or neural network).
The Hidden Conversation
The client program handles conversation state, not the LLM. A chat turn (inference request) from the client to the LLM contains much more state than the text you just typed. Most importantly, it logically contains ALL the text of the conversation so far, including the responses from the agent. As the conversation gets longer, the context gets larger too. But here is the sticking point: models have an upper limit on input (and output) size. This has been one of the main areas of technological improvement: models supporting larger context sizes, and caching strategies that avoid reprocessing the parts of the context that have not changed. All of this supports longer conversations, more complex instructions and more capable outputs. But notice that a “new conversation” starts with a blank context.
It gets more interesting, though, because a “blank context” does not mean an empty context. There is an implicit section in every conversation state that adds behaviour instructions. This system prompt can be as simple as “You are a helpful assistant”, leaving the output largely determined by the training data of the LLM. It can also be very elaborate, enforcing structured output, creating a “personality”, implementing semantic guardrails, limiting scope, and more. The system prompt goes into the context with every message sent to the LLM.
For commercial systems, you usually cannot change the system prompt, and this gives the provider control over how the output is shaped.
Tool calling is another common hidden section added by the client.
The theoretical structure of your second LLM chat request looks like this:
[system] You are a helpful assistant
[tools] { ... }
[user] Explain SQL injection to me with examples.
[agent] SQL injection is what happens ...
[user] How do I prevent this?

Notice how a lot of this is repetition and lives in the limited size of the context.
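To make the hidden bookkeeping concrete, here is a minimal sketch of what a chat client does, written in Python. The `call_llm` function is a hypothetical placeholder for whichever inference API you use; the point is that every turn rebuilds and resends the full message list, because the model itself keeps nothing between calls.

```python
# Minimal sketch of client-side conversation state. call_llm() is a
# hypothetical stand-in for your provider's API; the message format
# mirrors the structure shown above.
SYSTEM_PROMPT = {"role": "system", "content": "You are a helpful assistant"}

def call_llm(messages: list[dict]) -> str:
    # Placeholder: swap in your provider's API call. A canned reply keeps
    # the sketch runnable without a network dependency.
    return f"(model reply based on {len(messages)} context messages)"

def chat_turn(history: list[dict], user_text: str) -> list[dict]:
    history = history + [{"role": "user", "content": user_text}]
    # The FULL context is sent every turn: system prompt plus all prior turns.
    reply = call_llm([SYSTEM_PROMPT] + history)
    return history + [{"role": "assistant", "content": reply}]

history: list[dict] = []   # a "new conversation" is just a blank context
history = chat_turn(history, "Explain SQL injection to me with examples.")
history = chat_turn(history, "How do I prevent this?")  # "this" resolves only
                                                        # because turn one is resent
```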
Context is a limited resource and can fill up remarkably quickly if not managed well. With current architectures, computation and especially memory requirements increase rapidly with context size. The token count, which is usually the unit you are charged for, includes the full context and all its hidden sections too. This means that systems that use a very large context can get costly.
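As a rough illustration, the sketch below counts the tokens in a full request using the tiktoken library. This is an assumption on my part: providers count tokens in their own ways, and real requests add per-message framing, but the principle is the same either way: the hidden sections are billed along with your question.

```python
import tiktoken  # OpenAI's tokeniser; other providers count slightly differently

enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(messages: list[dict]) -> int:
    # Ignores per-message framing overhead; good enough for a rough estimate.
    return sum(len(enc.encode(m["content"])) for m in messages)

messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Explain SQL injection to me with examples."},
    {"role": "assistant", "content": "SQL injection is what happens ..."},
    {"role": "user", "content": "How do I prevent this?"},
]
# The system prompt and every earlier turn are part of the billed input,
# not just the latest question.
print(f"Input tokens this turn: ~{count_tokens(messages)}")
```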
The other issue is what to do when the context actually fills up. Much like a whiteboard, you have to remove something to make space. Context compaction strategies must be implemented in your client. An early strategy was to drop the oldest messages, resulting in your bot forgetting parts of your conversation. Summaries or other compaction methods improve on this, but like any summary, it is lossy compression.
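Here is a minimal compaction sketch, assuming the message-list format from the earlier example: keep the system prompt and drop the oldest turns once a budget is exceeded. A better client would replace the dropped turns with a summary rather than discarding them outright.

```python
def compact(messages: list[dict], max_turns: int = 20) -> list[dict]:
    """Naive sliding-window compaction: keep system messages, keep only
    the most recent turns. Lossy: the bot 'forgets' the start of the chat."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    return system + turns[-max_turns:]
```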
Using RAG (Retrieval-Augmented Generation) is a way of getting documents larger than your context, or at least the relevant parts of the documents, into the context. It will still fill your context, making the quality of the added chunks very important.
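To show where retrieved text ends up, here is a deliberately naive retrieval sketch; the helper names are my own, and the scoring is toy keyword overlap rather than anything a real pipeline would use. Real RAG systems use embeddings and a vector store, but the effect on the context is the same: the retrieved chunks occupy tokens, so their quality matters.

```python
def retrieve(question: str, chunks: list[str], k: int = 3) -> list[str]:
    # Toy scoring: count shared words between the question and each chunk.
    q_words = set(question.lower().split())
    scored = sorted(chunks, key=lambda c: len(q_words & set(c.lower().split())),
                    reverse=True)
    return scored[:k]

def build_rag_messages(question: str, chunks: list[str]) -> list[dict]:
    excerpts = "\n\n".join(retrieve(question, chunks))
    return [
        {"role": "system", "content": "Answer using only the provided excerpts."},
        # The retrieved text goes straight into the context window.
        {"role": "user", "content": f"Excerpts:\n{excerpts}\n\nQuestion: {question}"},
    ]
```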
Ultimately, the size and contents of the context determine a lot of the quality of your conversation. This is not an accident but a feature of how LLMs operate.
- Clients matter: The client (e.g., ChatGPT web, OpenCode CLI, Cursor) manages what is actually sent to the model.
- Resources: Memory requirements for most current architectures increase significantly with context length. A large context window can easily push memory requirements beyond what consumer hardware can handle.
- Commercial control: System prompts are the “hidden hand” that keeps commercial bots within their intended safety and personality bounds.
- Lost in the middle: Older research suggested that LLMs are better at using information at the beginning and end of the context window than information buried in the middle.
The Whiteboard and The Automaton
Think of the context as a whiteboard. It is a temporary space for structured notes. It has a finite size and can fill up. The LLM is an automaton that looks at the whole board every time and then adds a line at the bottom for you to read. The client is the one that holds the pens and eraser.
The same automaton (LLM) can look at a different whiteboard (context) and also add a useful line there. This is what makes the technology scalable. The LLM is trained in the Machine Learning (ML) sense to turn context into output. The context does not change the LLM model in any way. There is no learning by the LLM. All the changeable parts are on the whiteboard.
There are techniques that allow for fine-tuning an LLM, which is like changing its prescription lenses: an adjustment that improves focus and perspective, but does not depend on what happens to be written on the whiteboard at inference time.
New LLM models are released frequently. The context they can handle expands. The way they use the hidden context sections changes and improves. The way they turn context into useful output improves. But what they do not do is internalise context. That remains a strictly external input.
Training involves adjusting the internal parameters (weights and biases) of the model to minimise error. This is a computationally intensive process usually done on massive clusters of GPUs, not on your local machine during a chat.
Fine-tuning is a smaller-scale training process where a pre-trained model is further trained on a specific dataset to specialise its capabilities. Unlike context, this does permanently change the model’s weights.
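As a toy illustration of that difference (a single linear layer standing in for a model, not anything from a real LLM), the PyTorch sketch below shows that inference reads the weights without touching them, while one fine-tuning step permanently changes them.

```python
import torch

model = torch.nn.Linear(4, 2)              # stand-in for a model's parameters
before = model.weight.detach().clone()

# Inference: read-only use of the weights (this is all a chat turn does).
with torch.no_grad():
    _ = model(torch.randn(1, 4))
print(torch.equal(before, model.weight))   # True: nothing changed

# Fine-tuning: one gradient step on example data updates the weights in place.
optimiser = torch.optim.SGD(model.parameters(), lr=0.1)
x, target = torch.randn(8, 4), torch.randn(8, 2)
loss = torch.nn.functional.mse_loss(model(x), target)
loss.backward()
optimiser.step()
print(torch.equal(before, model.weight))   # False: the model itself has changed
```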
Remember me?
If the model is a stateless automaton that never truly “learns” or “internalises” our conversations, then how does it appear to know who we are? How do we adapt this rigid architecture to our deeply personal needs?
Sometimes I want it to learn things and change its behaviour across conversations. Some commercial tools have learned my name and some of my preferences. This is accomplished by letting the client manage context in clever ways. If I define a biographical section that can be added to the context with every request, the LLM will always “know who it is talking to” and generate output accordingly. Even better, if my system prompt has specific instructions about what to do with this information, the output will feel personal and focussed.
[system] The user is described in the *USER* section. Tailor your answers to their experience and preferences.
...
*USER* I am a software developer with 20+ years of experience.

This is something your client does, not the LLM. It lives in a configuration file. You have not changed the LLM; you have changed the context.
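A sketch of how a client might do this, with a profile file whose path and format are purely illustrative: the profile is read from disk and prepended to every request, so the model never stores anything. It is simply shown the same note each time.

```python
from pathlib import Path

# Hypothetical location of a per-user profile; your client will differ.
PROFILE_PATH = Path.home() / ".config" / "my-llm-client" / "profile.md"

def build_messages(history: list[dict]) -> list[dict]:
    profile = PROFILE_PATH.read_text() if PROFILE_PATH.exists() else ""
    system = (
        "The user is described in the *USER* section. "
        "Tailor your answers to their experience and preferences.\n\n"
        f"*USER* {profile}"
    )
    # Re-injected on every turn; the model itself remembers nothing.
    return [{"role": "system", "content": system}] + history
```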
Commercial tools have “memories” that are per-user configuration files about you, your preferences, and behaviours. It is stored on their server and injected by their client. It is worth thinking about the security implications of this arrangement and the nature of what these “memories” may contain. This is why using local-first tools where you own the storage is a good idea.
LLM Learning is Configuration
Managing what gets injected into your conversation context is a very powerful technique. Many of the current Command Line Interface (CLI) tools support some form of agent definition file. These files are essentially system prompts that the client applies in various circumstances. A new agent configuration file creates a new behaviour. Add these agent files to your repo like other configuration files, so they can be tracked and improved over time.
We can add another layer by including instructions for behaviour on various levels: project, personal, or global.
Personalised useful behaviour is codified in these configurations.
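One way a client could combine those levels is sketched below, with illustrative file names rather than any particular tool's convention: layers are simply concatenated, so more specific instructions appear later in the system prompt and refine the broader ones.

```python
from pathlib import Path

# Illustrative layer locations, from broadest to most specific.
LAYERS = [
    Path.home() / ".config" / "agents" / "global.md",    # applies everywhere
    Path.home() / ".config" / "agents" / "personal.md",  # your preferences
    Path.cwd() / "AGENTS.md",                            # per-project rules
]

def layered_system_prompt() -> str:
    # Concatenate whichever layers exist; more specific instructions come last.
    parts = [p.read_text() for p in LAYERS if p.exists()]
    return "\n\n".join(parts)
```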
You can even use an LLM prompt to assess the context and update the configuration files; Openclaw does this as part of its interaction loop.
Making it Useful
One of the key ingredients in making this work is the client. While web-based offerings remain useful, the true power of this method comes from local clients where configuration is a first-class construct. The current crop of CLI clients leans strongly into this way of using configuration.
Here are some of the items I added to my global GEMINI.md file to guide overall behaviour:
## Gemini Instructions
- When a mistake is made, do not apologise. Acknowledge the mistake and focus on providing a solution.
- Use British English spelling and grammar for everything.

For OpenCode, I use project-specific agent files.
Some tips:
- Understand how your tool uses configurations
- Be deliberate about what you put in your context
- Iterate and improve your configurations
Understand the stateless nature of the model. You are not teaching it, but architecting its current context. It is not ‘memory’ that makes it powerful, but the configuration in the present moment.
Writing down clear configuration instructions for an LLM is in itself a useful way to explore your own understanding of a subject.