A simple chatbot using Ollama running on a Flask API
Posted on Sun 21 September 2025 in ai, langchain, python, api
Following on from my last article, where we defined the simplest possible chatbot accessed via the terminal and using LangChain to talk to the model, the natural next step is to build an API around the LangChain code and expose it, so that a user-friendly front end or another integrating system can connect to the LLM. Given we're running in Python, with its great library and ecosystem support, it makes a lot of sense to build the API in a Python framework. Normally I lean slightly more towards Django Rest Framework (DRF), but it's a 'batteries included' type of framework, so let's go with a simple Flask API, which lets us expose the LLM service quickly.
Let's think about the architecture of our API app; something like the layout below is probably a good starting point:
┣ app
┃ ┣ services
┃ ┃ ┣ llm_service.py
┃ ┣ __init__.py
┃ ┣ llmroutes.py
┃ ┣ inMemoryHistory.py
┃ ┗ models.py
┣ config.py
┗ run.py
Let's start with the __init__.py for our app; we'll create it inside the app folder and define the basics. The url_prefix here does a similar job to defining your urls.py in a Django setup: the blueprint stores the mapping for the actual endpoints, and registering it here is the equivalent of the global urls.py that imports each Django app's wiring.
from flask import Flask
from config import Config
from .llmroutes import llm_bp

def create_app():
    app = Flask(__name__)

    # Load settings from config.py
    app.config.from_object(Config)

    # Register blueprints
    app.register_blueprint(llm_bp, url_prefix="/llm")

    return app
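We also need the config.py that create_app loads. Nothing in this article's code actually reads a setting yet, so the below is just a minimal sketch, with a hypothetical OLLAMA_MODEL entry to show where that sort of thing would live:

import os

class Config:
    # Hypothetical settings - pull anything you don't want hard-coded from the environment
    OLLAMA_MODEL = os.environ.get("OLLAMA_MODEL", "gpt-oss:20B")
    DEBUG = os.environ.get("FLASK_DEBUG", "1") == "1"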
We'll call this from the run.py file that sits just outside our app folder:
from app import create_app

app = create_app()

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=8082, debug=True)
Really simple, but a nice little touch here is setting the host to 0.0.0.0: it allows other machines on our local network to connect to the machine that's running the app, which is really handy for testing. The next step is to set up the routes; routes are analogous to controllers in an ASP.NET Core project or an APIView in Django Rest Framework. We declare our blueprint and set it up with the URL routes and whatever verb we want to allow against each function. In this case, given it's an API, we want at least some semblance of a user system in place, so what we'll do is take a user_id string in and keep a separate conversation history for each user. It won't have a full login system, but it's more than good enough for a proof of concept.
from flask import Blueprint, request, jsonify
from app.services.llm_service import LLMService

llm_bp = Blueprint("llm", __name__)
llm_service = LLMService()

@llm_bp.route("/chat", methods=["POST"])
def chat():
    data = request.get_json()
    user_id = data.get("user_id")
    message = data.get("message", "")

    # Do some basic validation on our input - do we have it?
    if not user_id:
        return jsonify({"error": "User ID is required"}), 400
    if not message:
        return jsonify({"error": "Message is required"}), 400

    response = llm_service.chat(user_id, message)
    return jsonify({"response": response, "user_id": user_id})
The next step is to look at the LLMService class you can see I've used above. This is really just a modernisation of the ConversationChain code from the last article into its own wrapper service, using the newer RunnableWithMessageHistory class instead, as ConversationChain has been deprecated. We'll start with our imports and our __init__ method (a similar idea to a constructor, for those of you more familiar with C#).
from langchain_ollama import OllamaLLM
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from app.inMemoryHistory import InMemoryHistory

class LLMService:
    def __init__(self):
        self.llm = OllamaLLM(model="gpt-oss:20B")
        self.prompt = ChatPromptTemplate.from_messages([
            ("system", "You are a helpful assistant, specialising as a programming assistant."),
            MessagesPlaceholder(variable_name="history"),
            ("human", "{input}"),
        ])
        self.chain = self.prompt | self.llm

        # In-memory store mapping each session_id to its chat history
        self.store = {}

        # Wrap the chain so it loads and saves history per session automatically
        self.runnable_with_history = RunnableWithMessageHistory(
            self.chain,
            self.get_by_session_id,
            input_messages_key="input",
            history_messages_key="history",
        )
As you can see above, we've defined our template, pulled in all the imports we need, and defined a chain to pass into the RunnableWithMessageHistory class. There is one step I'll go over after this: we need to define a small in-memory class to hold the history. For now, you can see we're using get_by_session_id to look up a history per session_id, and mapping the user_id coming from the API's route onto that session_id. The actual usage is very similar to the ConversationChain class: call an invoke method on the class and pass back the response.
    def get_by_session_id(self, session_id: str) -> BaseChatMessageHistory:
        # Lazily create a history for any session we haven't seen before
        if session_id not in self.store:
            self.store[session_id] = InMemoryHistory()
        return self.store[session_id]

    def chat(self, user_id: str, message: str) -> str:
        """Process a chat message with per-user memory."""
        response = self.runnable_with_history.invoke(
            {"input": message},
            config={"configurable": {"session_id": user_id}}
        )
        return response
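Because the histories are keyed on session_id, each user_id gets its own independent conversation. As a quick (hypothetical) sanity check from a Python shell in the project root:

from app.services.llm_service import LLMService

svc = LLMService()
svc.chat("alice", "My name is Alice.")
svc.chat("bob", "My name is Bob.")

# Each session keeps its own history, so this should recall "Alice", not "Bob"
print(svc.chat("alice", "What's my name?"))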
Below is the example class we use to store history in memory; for a production-level system, this would be built out properly to store the history in some persistent data storage.
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.messages import BaseMessage
from pydantic import BaseModel, Field

class InMemoryHistory(BaseChatMessageHistory, BaseModel):
    """In memory implementation of chat message history."""

    messages: list[BaseMessage] = Field(default_factory=list)

    def add_messages(self, messages: list[BaseMessage]) -> None:
        """Add a list of messages to the store"""
        self.messages.extend(messages)

    def clear(self) -> None:
        self.messages = []
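To give a flavour of what that persistent version could look like, here's a minimal sketch of a file-backed history that stores each session as a JSON file. The class name and file layout are my own assumptions, and LangChain's langchain_community package also ships ready-made implementations (FileChatMessageHistory, SQLChatMessageHistory, etc.) if you'd rather not roll your own. Swapping it in would just mean returning one of these from get_by_session_id instead of an InMemoryHistory.

import json
from pathlib import Path

from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.messages import BaseMessage, messages_from_dict, messages_to_dict

class FileChatHistory(BaseChatMessageHistory):
    """Hypothetical sketch: chat history persisted to one JSON file per session."""

    def __init__(self, session_id: str, base_dir: str = "chat_histories"):
        # Note: in real code you'd sanitise session_id before using it in a path
        self.path = Path(base_dir) / f"{session_id}.json"
        self.path.parent.mkdir(parents=True, exist_ok=True)

    @property
    def messages(self) -> list[BaseMessage]:
        # Load the full message list for this session from disk
        if not self.path.exists():
            return []
        return messages_from_dict(json.loads(self.path.read_text()))

    def add_messages(self, messages: list[BaseMessage]) -> None:
        # Append the new messages and write the whole list back out
        combined = self.messages + list(messages)
        self.path.write_text(json.dumps(messages_to_dict(combined)))

    def clear(self) -> None:
        self.path.unlink(missing_ok=True)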
At this point, there's nothing left to do except activate your virtual environment, set your terminal's working directory to wherever you've stored run.py, and execute it:
python run.py
That should be your API up and running and ready to act as a chatbot!
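One caveat: Flask's built-in server is for development only, so for anything beyond local testing you'd typically serve the app through a WSGI server instead; for example, with gunicorn installed, something like:

gunicorn -b 0.0.0.0:8082 run:app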
You can call it using curl or Postman like this:
curl -X POST http://localhost:8082/llm/chat \
  -H "Content-Type: application/json" \
  -d '{"user_id": "alasdair", "message": "Hi, c# or python - choose!"}'
The response I received was...

{
  "response": "Sure thing! 🎯 \nWhich one is “best” really depends on **what you want to build** and a few other factors. Here’s a quick rundown so you can decide:\n\n| Factor | C# | Python |\n|--------|----|--------|\n| **Primary domain** | .NET ecosystem (Windows desktop, enterprise apps, Unity 3D, Xamarin/MAUI mobile, Azure services, high-performance back-ends) | Web (Flask, Django, FastAPI), data science/ML (NumPy, pandas, scikit-learn, TensorFlow), scripting, automation, and a huge amount of general-purpose libraries |\n| **Performance** | Generally faster (compiled, JIT, strong typing) | Slower (interpreted), but often fast enough for I/O-bound or prototyping workloads |\n| **Learning curve** | Moderate – you’ll need to understand the CLR, LINQ, async/await, etc. | Low – dynamic typing, readable syntax, huge amount of learning resources |\n| **Tooling** | Excellent IDE (Visual Studio, Rider), great debugging, built-in support for dependency injection, strong type checking | Good tooling (PyCharm, VS Code, Jupyter notebooks), but debugging can feel less powerful for large projects |\n| **Cross-platform** | .NET 6/7+ is truly cross-platform (Windows, macOS, Linux). | Native cross-platform, but binary distribution can be trickier |\n| **Community & Ecosystem** | Large corporate support, extensive library catalog, NuGet, strong game-dev community via Unity | Massive open-source community, Python Package Index (PyPI), huge scientific & ML ecosystem |\n| **Future prospects** | Growing with .NET Core/5+ and Azure cloud services | Dominant in AI/ML, data science, and remains popular for scripting and web apps |\n| **Typical Use Cases** | Windows desktop apps, enterprise web services, game development, mobile (Xamarin/MAUI) | Web APIs, data pipelines, ML/AI, automation, rapid prototyping |\n\n### Quick decision checklist\n\n| Question | Likely C# | Likely Python |\n|----------|-----------|---------------|\n| You’re building a Windows-only desktop app or a game in Unity? | ✅ | ❌ |\n| You’re targeting Azure, need strong typing, or want to use a statically typed language? | ✅ | ❌ |\n| You need to spin up a data-science pipeline or prototype an ML model quickly? | ❌ | ✅ |\n| You’re doing web scraping, automation scripts, or a small web API? | ❌ | ✅ |\n| You prefer a language that compiles to a single binary and offers very fast execution? | ✅ | ❌ |\n| You’re comfortable with dynamic typing and want a gentle learning curve? | ❌ | ✅ |\n\n---\n\n## Bottom line\n\n- **Go with C#** if you’re targeting the .NET ecosystem (Windows, Azure, Unity, MAUI) or need strong typing & performance.\n- **Go with Python** if you’re in data science, want quick prototyping, or want a language with a huge collection of scientific/ML libraries.\n\nIf you can share a bit more about what you’re planning (web app, game, automation, data pipeline, etc.), I can fine-tune the recommendation!",
  "user_id": "alasdair"
}
Pretty thorough answer!
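While we're here, it's worth poking the validation path too; leave out the user_id and the route we wrote earlier should hand back a 400:

curl -X POST http://localhost:8082/llm/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Hi there"}'

which returns {"error": "User ID is required"}.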