Programmatic Interaction with a Local LLM in .NET

February 20, 2026

If you’ve worked with the OpenAI API (I wrote about that here), you’ll know it’s a pretty straightforward process: construct a request, send it to a cloud endpoint, and get a response. But what if you want to do the same thing locally, without sending your data to the cloud?

In this post, I’ll walk through how to interact programmatically with a local LLM using .NET - and then build a real application on top of it. No cloud APIs, no API keys, no usage fees - just a model running on your own machine. The code is here.

Why Local?

There are a few reasons you might want to run an LLM locally:

  • Privacy - your data doesn’t leave your machine.
  • Cost - no per-token charges.
  • Offline - works without an internet connection.
  • Experimentation - try different models without spending any money (other than electricity).

Disclaimer

You probably need a fairly hefty machine to run some of these - not a supercomputer, but not a 486 either!

LM Studio

For the local server, I’m using LM Studio. It’s a desktop application that lets you download and run open-source LLMs, and it exposes an OpenAI-compatible API. That last bit is the key - because the API is compatible with OpenAI’s format, the code you write to talk to it looks almost identical to what you’d write for the cloud.

Download LM Studio, pull down a model (I’m using google/gemma-3-4b), and start the local server. You should see it listening on port 1234 (by default).

The Basics - Sending a Prompt

The core interaction is a chat completion request - the same pattern you’d use with OpenAI, but pointed at localhost.

The client that talks to LM Studio is an HttpClient wrapper that posts to the chat/completions endpoint:

public sealed class LmStudioClient : IDisposable
{
    private readonly HttpClient _http;
    private readonly string _model;
    private const string DefaultBaseUrl = "http://localhost:1234/v1/";
    private static readonly JsonSerializerOptions JsonOptions = new() { PropertyNameCaseInsensitive = true };

    public LmStudioClient(string? baseUrl = null, string? model = null, Logger? logger = null)
    {
        var uri = string.IsNullOrEmpty(baseUrl) ? DefaultBaseUrl : (baseUrl.EndsWith('/') ? baseUrl : baseUrl + "/");
        _http = new HttpClient { BaseAddress = new Uri(uri), Timeout = TimeSpan.FromMinutes(5) };
        _model = string.IsNullOrWhiteSpace(model) ? "local-model" : model;
    }

    public async Task<string> ChatAsync(string userMessage, string? systemMessage = null, CancellationToken ct = default)
    {
        var messages = new List<object>();
        if (!string.IsNullOrWhiteSpace(systemMessage))
            messages.Add(new { role = "system", content = systemMessage });
        messages.Add(new { role = "user", content = userMessage });

        var request = new { model = _model, messages, temperature = 0.7, max_tokens = 1024 };

        using var response = await _http.PostAsJsonAsync("chat/completions", request, ct);
        var raw = await response.Content.ReadAsStringAsync(ct);
        response.EnsureSuccessStatusCode();

        var result = JsonSerializer.Deserialize<ChatCompletionResponse>(raw, JsonOptions);
        return result?.Choices?[0].Message?.Content ?? "[No content]";
    }

    public void Dispose() => _http.Dispose();
}

The timeout is set to 5 minutes - local models can be slow, especially on first load. The request is the same format as the OpenAI chat completion API.
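The `ChatCompletionResponse` type that the client deserialises into isn't shown above. A minimal sketch, mapping only the fields the client actually reads (property names follow the OpenAI chat completion response format), might look like this:

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Sample response body, trimmed to the fields we map (for illustration only)
var sample = """{"choices":[{"message":{"role":"assistant","content":"Hello"}}]}""";
var parsed = JsonSerializer.Deserialize<ChatCompletionResponse>(sample);
Console.WriteLine(parsed?.Choices?[0].Message?.Content); // prints "Hello"

// Minimal DTOs for the OpenAI-style chat completion response;
// the real payload has more fields (id, usage, etc.) that we ignore
public sealed class ChatCompletionResponse
{
    [JsonPropertyName("choices")]
    public List<Choice>? Choices { get; set; }
}

public sealed class Choice
{
    [JsonPropertyName("message")]
    public ChatMessage? Message { get; set; }
}

public sealed class ChatMessage
{
    [JsonPropertyName("role")]
    public string? Role { get; set; }

    [JsonPropertyName("content")]
    public string? Content { get; set; }
}
```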

We’re using appsettings.json here so that the prompt is configurable:

{
  "LmStudio": {
    "BaseUrl": "http://localhost:1234/v1/",
    "Model": "google/gemma-3-4b"
  },
  "Prompt": {
    "SystemRoles": [
      { "use": true, "value": "You are a helpful assistant. Keep responses concise." },
      { "use": false, "value": "You are William Shakespeare. Keep responses long and confusing." }
    ],
    "Message": "Which is the greatest band of all time?"
  }
}

Note the SystemRoles array - you can define multiple system roles and toggle which one is active by setting use to true. This is useful during demos or experimentation where you want to quickly switch the model’s behaviour without editing the prompt.
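Reading that section and picking the active role might look something like this - a sketch assuming the Microsoft.Extensions.Configuration.Json and .Binder packages (the actual project's wiring may differ):

```csharp
using System;
using System.Linq;
using Microsoft.Extensions.Configuration;

var config = new ConfigurationBuilder()
    .SetBasePath(AppContext.BaseDirectory)
    .AddJsonFile("appsettings.json")
    .Build();

// The first SystemRoles entry flagged with "use": true wins
var systemRole = config.GetSection("Prompt:SystemRoles")
    .GetChildren()
    .FirstOrDefault(s => s.GetValue<bool>("use"))?["value"];

var message = config["Prompt:Message"];
Console.WriteLine($"{systemRole} / {message}");
```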

Run it with LM Studio active, and you get a response from a model running entirely on your own hardware.

Getting Structured JSON Back

Next, we’re going to look at shaping the response.

If you tell the model to return only valid JSON matching a specific schema, it will (usually) comply. The system role for the structured variant looks like this:

{
  "Prompt": {
    "SystemRole": "Respond with only valid JSON matching the schema the user requested. Do not use markdown code fences or any text outside the JSON object.",
    "Message": "Return information about the greatest band of all time. Return the result in the following JSON structure:\n{\n  \"BandName\": \"\",\n  \"DateFormed\": \"\",\n  \"Members\": []\n}"
  }
}

The prompt itself includes the JSON schema you want back. The model returns something like this (incorrectly - but that’s another story):

{
  "BandName": "The Beatles",
  "DateFormed": "1960",
  "Members": ["John Lennon", "Paul McCartney", "George Harrison", "Ringo Starr"]
}
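Deserialising that output is a single call. Here's a sketch using a hypothetical `BandInfo` class (the type name is mine) whose properties match the schema in the prompt:

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// The model's output, pasted here as a sample string for illustration
var json = """
{
  "BandName": "The Beatles",
  "DateFormed": "1960",
  "Members": ["John Lennon", "Paul McCartney", "George Harrison", "Ringo Starr"]
}
""";

var band = JsonSerializer.Deserialize<BandInfo>(json);
Console.WriteLine($"{band?.BandName} ({band?.DateFormed}): {band?.Members?.Count} members");

// Hypothetical type matching the schema embedded in the prompt;
// property names match the JSON exactly, so no attributes are needed
public sealed class BandInfo
{
    public string? BandName { get; set; }
    public string? DateFormed { get; set; }
    public List<string>? Members { get; set; }
}
```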

So now we can deserialise that response into a C# object and use it in our application. But let’s take this further and build something real with it.

Building an App - The Activity Tracker

This is the part I’m most interested in. The programmatic-llm-interaction-structured-app project takes the structured JSON concept and wraps it in a proper interactive application using Terminal.Gui - a console UI framework for .NET.

The idea is simple: you type a natural-language description of a physical activity (“Went for a 5k ParkRun in 28 minutes, felt great”), the LLM extracts structured data from it, and the results display in a table. It’s a practical example of using a local LLM as a natural language parser.

The System Prompt

The system role defines the exact JSON contract:

{
  "Prompt": {
    "SystemRole": "You are a structured data extraction assistant. The user will describe a physical activity in natural language. Extract the activity details and respond with ONLY valid JSON matching this exact schema:\n{\n  \"activity\": \"<type of activity e.g. running, cycling, swimming>\",\n  \"distanceKm\": <distance in kilometres as a number>,\n  \"durationMinutes\": <duration in minutes as a number>,\n  \"sentiment\": \"<one word describing how it felt e.g. easy, moderate, difficult, great>\"\n}\nDo not include markdown code fences or any text outside the JSON object. If a ParkRun is mentioned, the distance is always 5km."
  }
}

There are a couple of things to point out here. The system role is quite prescriptive - it tells the model exactly what schema to return. It also includes domain knowledge: “If a ParkRun is mentioned, the distance is always 5km.” This is a useful pattern - you can encode business rules directly in the system prompt.

The Model

On the C# side, the response maps to a simple class:

public class ActivityResult
{
    [JsonPropertyName("activity")]
    public string? Activity { get; set; }

    [JsonPropertyName("distanceKm")]
    public double DistanceKm { get; set; }

    [JsonPropertyName("durationMinutes")]
    public int DurationMinutes { get; set; }

    [JsonPropertyName("sentiment")]
    public string? Sentiment { get; set; }
}
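Given a typical response, parsing is again a single call. A self-contained sketch (the sample JSON is illustrative; `ActivityResult` is repeated so the snippet compiles on its own):

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Sample of what the model returns for the ParkRun example
var json = """{"activity":"running","distanceKm":5,"durationMinutes":28,"sentiment":"great"}""";

var result = JsonSerializer.Deserialize<ActivityResult>(json);
Console.WriteLine($"{result?.Activity}: {result?.DistanceKm} km in {result?.DurationMinutes} mins ({result?.Sentiment})");

public class ActivityResult
{
    [JsonPropertyName("activity")]
    public string? Activity { get; set; }

    [JsonPropertyName("distanceKm")]
    public double DistanceKm { get; set; }

    [JsonPropertyName("durationMinutes")]
    public int DurationMinutes { get; set; }

    [JsonPropertyName("sentiment")]
    public string? Sentiment { get; set; }
}
```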

The UI

The Terminal.Gui setup creates a window with a text field, a submit button, a status label, and a data table:

Application.Init();

var win = new Window("Activity Tracker — Structured LLM")
{
    X = 0, Y = 0,
    Width = Dim.Fill(),
    Height = Dim.Fill()
};

var inputField = new TextField("")
{
    X = 1, Y = 2,
    Width = Dim.Fill(12)
};

var submitBtn = new Button("Submit")
{
    X = Pos.Right(inputField) + 1, Y = 2
};

var dt = new DataTable();
dt.Columns.Add("#", typeof(string));
dt.Columns.Add("Activity", typeof(string));
dt.Columns.Add("Distance", typeof(string));
dt.Columns.Add("Duration", typeof(string));
dt.Columns.Add("Sentiment", typeof(string));

var tableView = new TableView(dt)
{
    X = 1, Y = 6,
    Width = Dim.Fill(1),
    Height = Dim.Fill(1),
    FullRowSelect = true
};

If you haven’t used Terminal.Gui before, it gives you proper UI widgets (text fields, buttons, tables, labels) inside a terminal window. It’s well suited for demo or tool applications where you want more structure than Console.ReadLine but don’t want to spin up a full web or desktop UI.
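One piece the snippet above doesn't show is the wiring. Under Terminal.Gui v1's event API, it might look something like this (a fragment, not the project's exact code; `statusLabel` and `Submit` are defined elsewhere):

```csharp
// Hook the button to the submit handler, compose the window, start the loop
submitBtn.Clicked += () => Submit();
win.Add(inputField, submitBtn, statusLabel, tableView);
Application.Top.Add(win);
Application.Run();
Application.Shutdown();
```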

Handling the LLM Response

The submit handler fires off the LLM request on a background thread (so the UI doesn’t freeze) and then reads the result:

void Submit()
{
    var text = inputField.Text?.ToString();
    if (string.IsNullOrWhiteSpace(text)) return;

    statusLabel.Text = "Asking the LLM to parse your activity...";
    inputField.Enabled = false;
    submitBtn.Enabled = false;

    Task.Run(async () =>
    {
        try
        {
            var response = await client.ChatAsync(text, systemRole);

            // Strip markdown code fences in case the model ignores the "no fences" instruction
            var cleaned = response.Trim();
            if (cleaned.StartsWith("```"))
            {
                var firstNewline = cleaned.IndexOf('\n');
                if (firstNewline > 0) cleaned = cleaned[(firstNewline + 1)..];
                if (cleaned.EndsWith("```")) cleaned = cleaned[..^3];
                cleaned = cleaned.Trim();
            }

            var activity = JsonSerializer.Deserialize<ActivityResult>(cleaned, new JsonSerializerOptions
            {
                PropertyNameCaseInsensitive = true
            });

            Application.MainLoop.Invoke(() =>
            {
                if (activity is not null)
                {
                    activityCount++;
                    dt.Rows.Add(
                        activityCount.ToString(),
                        activity.Activity ?? "-",
                        $"{activity.DistanceKm} km",
                        $"{activity.DurationMinutes} mins",
                        activity.Sentiment ?? "-");
                    tableView.Table = dt;
                }

                statusLabel.Text = "Ready";
                inputField.Text = "";
                inputField.Enabled = true;
                submitBtn.Enabled = true;
                inputField.SetFocus();
            });
        }
        catch (Exception ex)
        {
            // Without this, a failed request or malformed JSON would fault the task
            // silently and leave the input permanently disabled
            Application.MainLoop.Invoke(() =>
            {
                statusLabel.Text = $"Error: {ex.Message}";
                inputField.Enabled = true;
                submitBtn.Enabled = true;
                inputField.SetFocus();
            });
        }
    });
}

Caveat

Remember that the results are not deterministic - the model can return different values (or occasionally malformed JSON) for the same input, which makes bugs hard to reproduce.

The Result

You type something like “Did a 10k run in 48 minutes, legs were heavy”, and the table populates with:

# | Activity | Distance | Duration | Sentiment
1 | running  | 10 km    | 48 mins  | difficult

I always thought I had heavy legs - it’s where all my weight comes from!

You can keep adding activities, and the table builds up. The model handles variations in phrasing - “ParkRun” gets correctly interpreted as 5km, “half marathon” as 21.1km, and sentiment extraction picks up on words like “great”, “tough”, “easy” even when they’re buried in a longer sentence.

What Makes This Pattern Useful

The key thing here isn’t the activity tracker itself (which is kind of pointless in a world where Strava exists) - it’s the pattern. You have a local LLM acting as a natural language to structured data pipeline. The same approach works for:

  • Parsing expense reports from free-text descriptions
  • Extracting entities from customer feedback
  • Classifying support tickets
  • Converting natural language queries into API parameters

The system prompt defines the contract, the model does the extraction, and your C# code gets a strongly typed object to work with. Because it’s all running locally, you can process sensitive data without it leaving your network.
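For example, a support-ticket classifier would only need a different system role - everything else stays the same (hypothetical schema):

```json
{
  "Prompt": {
    "SystemRole": "You are a structured data extraction assistant. The user will paste a support ticket. Respond with ONLY valid JSON matching this exact schema:\n{\n  \"category\": \"<billing | technical | account>\",\n  \"urgency\": \"<low | medium | high>\",\n  \"summary\": \"<one sentence summary>\"\n}\nDo not include markdown code fences or any text outside the JSON object."
  }
}
```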

Running It

The project targets .NET 10. There are no heavy dependencies beyond Terminal.Gui and Microsoft.Extensions.Configuration.

Prerequisites:

  1. Install LM Studio and download a model (e.g. google/gemma-3-4b)
  2. Start the LM Studio local server (default: http://localhost:1234)
  3. Install the .NET 10 SDK
  4. Run:
.\utils\run-demo-programmatic-llm-interaction-structured-app.ps1

Beyond the App

The offline-ai project also includes simpler console variants (if you just want the structured JSON without the UI), and MCP servers that take the concept further - using the LLM to classify user intent and route to tools like file creation. The MCP servers can plug directly into Cursor or VS Code. But that’s probably a topic for another post.

References

LM Studio

Terminal.Gui

OpenAI Chat Completions API

Using the Open AI API

offline-ai on GitHub



© Paul Michaels 2026