AI Self-Sufficiency II: Language Models vs Tools

I recently returned to working on my own AI self-sufficiency. In my previous post I wrote about the first steps toward AI self-sufficiency: why it is worth the effort, what you need to buy for your home or office, and where to get open language models. Since that post I have been actively exploring the topic at work: doing software development with the assistance of local language models.

The work has been interesting and thought-provoking, as always. This time, however, I would like to use these efforts as a framing story to introduce a particular AI topic that now cuts across the whole field and is crucial for general AI literacy: the relationship between the language model and tools. From the same starting point it is also natural to explain how a language model and an agent differ from each other.

Using a local language model in Cursor

Cursor is an AI-assisted IDE (integrated development environment), primarily for software development. It is a fairly new product, one of the early wave of AI-assisted development tools and, in the opinion of many, currently the best AI-assisted IDE. In practice, using Cursor boils down to buying a Cursor license (or using the free tier) together with AI credits, a "token quota", from Cursor, or alternatively entering an API key from your chosen model provider into Cursor, and then starting to code with AI assistance.

The second ingredient is a local language model. I wrote about those in more detail in my previous post.

It is possible to use local language models with Cursor. In short, it works like this:

  • you define an alternative OpenAI API URL (in Cursor),
  • you route traffic to that URL to your own language model, and
  • you add your model to Cursor’s model palette.

The language model must be served using the OpenAI API specification. There are open source projects for this purpose, and for my own experiments I chose mlx-openai-server.
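To make the routing concrete, here is a minimal sketch of the kind of request a client such as Cursor sends to an OpenAI-compatible endpoint. The port and model name are assumptions; substitute whatever your own server and model palette use.

```python
import json

# Sketch: the OpenAI-style chat completions request a client sends.
# BASE_URL points at your local server instead of api.openai.com.
BASE_URL = "http://localhost:8000/v1"        # assumed local server address
ENDPOINT = BASE_URL + "/chat/completions"    # OpenAI API chat endpoint

request_body = {
    "model": "qwen3-coder",                  # the name added to Cursor's model palette
    "messages": [
        {"role": "user", "content": "Read the project's license file."}
    ],
    "stream": True,                          # editors usually stream tokens
}

print(ENDPOINT)
print(json.dumps(request_body, indent=2))
```

As long as the server answers this request in the shape the OpenAI API specifies, Cursor neither knows nor cares that the model behind it is local.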

For the model itself, I picked a quantized version of the well-regarded Qwen3-Coder-Next. Pretty quickly, the model was answering in Cursor: "Hi Henri. How can I help you with your mlx-openai-server project?" Without delay I got to work and asked the model to read the project's license file. That is when things stalled: Cursor clearly tried to do something, but nothing happened. Reading files in Cursor requires tools, so I knew I had a tool-usage problem in my setup.

How are language models and tools related?

Let’s briefly review what a language model is. It is a machine learning model that models natural (and often also artificial) language. The model is trained on countless conversations, and its capabilities include, for example, completing sentences and, above all, guessing on the basis of its training what a typical answer to a given question or request might be. This can sound complicated, but the following sentence simplifies things greatly: a language model receives only text as input and produces only text as output. A language model cannot do anything else and does not do anything else.

This seems to contradict our experience. We know that when we chat with ChatGPT or Claude, they can do all kinds of things: they can perform web searches, create files for you to download, tell you the time, and so on.

Each of the above actions requires something more than just reading and producing text.

To clarify what is going on, we need to notice that very few of us talk directly to a bare language model:

  • ChatGPT and Claude are in fact websites that sit between the user and the language model,
  • Cursor is software that primarily forwards user requests and other data to the language model and back to the user.

These software products include a so-called tool prompt that is added to every conversation, before the messages sent by the user. The tool prompt describes which tools the calling platform has available. Because of this, the language model knows that tools can be used and it knows which tools are available.
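In the OpenAI chat completions format, such a tool description looks roughly like this. The Read tool below is my own illustrative sketch, not Cursor's actual definition:

```python
# One entry in the "tools" list sent alongside the conversation.
# The model sees this description and learns what it may call.
tools = [
    {
        "type": "function",
        "function": {
            "name": "Read",
            "description": "Read a file from the user's workspace.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {
                        "type": "string",
                        "description": "Path to the file to read",
                    }
                },
                "required": ["path"],
            },
        },
    }
]
```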

What happens when a user sends a message to the language model? The language model receives information about the tools and an instruction to consider whether any tool should be used, given the content of the user’s message. The language model still cannot do anything but generate text.

After that, the language model does the only thing it knows how to do: it produces text. If it decides that some tool should be used, it says so in that text: it produces a response stating that a particular tool should now be used.

In most open-source models, the tool call instruction generated by the model looks something like this:

<tool_call>
<function=Read>
<parameter=path>
/home/henri/mlx-openai-server/app/middleware/auth.py
</parameter>
</function>
</tool_call>

The instruction says that a tool should now be used, which tool to use, and how to use it: for example, use the web search tool to find cat pictures.

So if you ask ChatGPT to check which country is the happiest in the world, the GPT language model’s response might look something like this:

Okay, I'll check that for you.<tool_call>
<function=WebSearch>
<parameter=query>
happiest country
</parameter>
</function>
</tool_call>

What you see is ChatGPT replying "Okay, I’ll check that for you." Under the hood, the ChatGPT web application detects the tool_call marker, runs a small web search program, and then sends the tool’s output back to the language model, which reads the result and continues the conversation with you: "At the moment, the country rated as the happiest in the world is Finland…"
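That round trip can be sketched in a few lines of Python. The model and the tool here are toy stand-ins (a real platform has many tools, streaming, and error handling), but the loop is the same: read the model's text, detect the tool call marker, run the tool, and feed the result back.

```python
import re

def fake_model(messages):
    # Stand-in for the language model: it only ever produces text.
    return ("Okay, I'll check that for you.<tool_call>\n"
            "<function=WebSearch>\n<parameter=query>\nhappiest country\n"
            "</parameter>\n</function>\n</tool_call>")

def run_tool(name, args):
    # Stand-in for the platform's tool runner.
    if name == "WebSearch":
        return "World Happiness Report: Finland ranks first."
    raise ValueError(f"unknown tool: {name}")

def chat_turn(messages):
    reply = fake_model(messages)
    match = re.search(
        r"<tool_call>\s*<function=(\w+)>\s*<parameter=(\w+)>\s*(.*?)\s*</parameter>",
        reply, re.S)
    if match:
        name, param, value = match.groups()
        result = run_tool(name, {param: value})
        # A real platform would now append the result to the conversation
        # and call the model again; here we just return the tool output.
        return result
    return reply

print(chat_turn([{"role": "user", "content": "Which country is the happiest?"}]))
```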

Language model vs agent

Tools are just as essential for agents.

Agents are, to some degree, autonomous programs: you can send them off to perform a certain task. The quality of an agent’s work varies, but the essential point is that the agent can operate somewhat independently and has some degree of judgment and freedom of choice. It has more of both than a bare language model, which can only produce text.

Typically, an agent’s judgment is mainly determined by the capabilities of the underlying language model, but its freedom of choice comes from the fact that it has multiple options. For example, it can decide to use some tool instead of replying to the user directly. Because of this, having tools available is considered a necessary condition for something to qualify as an agent.
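Under those definitions, a minimal agent loop can be sketched like this. Both the model and the tool are toy stand-ins; the point is only the shape of the loop: call the model, run the tool it asks for, and repeat until it answers directly.

```python
def agent(task, model, tools, max_steps=5):
    # Call the model repeatedly until it answers without requesting a tool.
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = model(messages)              # text in, text out
        if reply["tool"] is None:            # no tool requested: task is done
            return reply["content"]
        name, args = reply["tool"]
        result = tools[name](**args)         # the platform runs the tool
        messages.append({"role": "tool", "content": result})
    return "step budget exhausted"

def toy_model(messages):
    # Toy stand-in: asks for the clock once, then answers.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": ("clock", {}), "content": ""}
    return {"tool": None, "content": f"The time is {messages[-1]['content']}."}

print(agent("What time is it?", toy_model, {"clock": lambda: "12:00"}))
```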


A few closing remarks

In broad strokes, communication between language models and tools works as I described above. In practice, however, the details are often quite intricate. For example, in the OpenAI API, OpenAI does not send tool call markers as raw text to the caller; instead, it parses them and sends the tool calls to the user in structured form. You quickly become familiar with these provider-specific details when working at the interface between language models, agentic platforms, and other software products. In my case, for example, I had to implement the same OpenAI API parsing feature in my model server before tools started working with Cursor and open language models.
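As a rough sketch of that parsing step (my own illustration, not mlx-openai-server's actual code): the raw tag format shown earlier is converted into the structured tool_calls shape of the OpenAI chat completions API.

```python
import json
import re

RAW = """<tool_call>
<function=Read>
<parameter=path>
/home/henri/mlx-openai-server/app/middleware/auth.py
</parameter>
</function>
</tool_call>"""

def parse_tool_call(text):
    # Extract the function name and its parameters from the raw tags.
    m = re.search(r"<function=(\w+)>(.*?)</function>", text, re.S)
    name = m.group(1)
    args = dict(re.findall(r"<parameter=(\w+)>\s*(.*?)\s*</parameter>",
                           m.group(2), re.S))
    # Emit the structured form used by the OpenAI chat completions API,
    # where arguments are a JSON-encoded string.
    return {"type": "function",
            "function": {"name": name, "arguments": json.dumps(args)}}

print(parse_tool_call(RAW))
```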

Working with Cursor and the OpenAI API also yielded practical results. I gave Cursor and my local language model a test assignment: a relatively small and clearly scoped programming task, adding API key support to mlx-openai-server. Fifteen minutes later, with the machine running noticeably hotter, the work was done. The model also wrote tests for the new feature, and the tests showed that the solution behaved as expected.

Next I plan to dive into a few new topics:

  • I will experiment with smaller models—they are faster and cheaper to run,
  • I will try to build some kind of model router, a component that chooses a suitable language model for a given task, and
  • I will experiment with using an open source IDE.
