A simple terminal chatbot using Ollama

Posted on Sun 21 September 2025 in ai, langchain

I'm a fan of using local resources and being in control of your resources as much as possible, so naturally when looking at AI (and llm's in particular) I've gravitated towards use of things and services like Ollama and ExoLabs, so thought it would be a good idea to write up a simple way to wire up a simple program which use's the gpt-oss:20B model as a simple terminal chatbot. While you can start ollama directly and chat with it, this is the start of setting it up as a service which can be consumed by other applications and we have the ability to give the instance a prompt template to customise how we want our llm service to behave.

We'll start by creating our virtual env, I use virtualenv wrapper on my mac so it's a simple

mkvirtualenv llmexp

Use this to activate your virtualenv

workon llmexp

Then we'll want to use pip to get the two langchain packages we want,

 pip install langchain langchain-ollama

From here, our next step is creating a python file, since it's a simple service, we'll just go with main.py - we can wrap it in a service behind a flask api in a later article. We'll start with adding our imports...

from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain_ollama import OllamaLLM
from langchain.prompts import PromptTemplate

This gives us a memory buffer for our chatbot, a prompt template and the OllamaLLM libaries themselves to hook up.

We'll then get our llm instance, do our template and prompt config (for a more complex example, should stick this in some config files outside the main code.)

llm = OllamaLLM(model="gpt-oss:20B")
template = """You are a helpful assistant, specialising as a programming assistant."""
prompt = PromptTemplate.from_template(template)

We'll then setup our ConversationChain object, which is what we're using really - it's a langchain service designed to make chatbots quite simple to use, it's still fundamentally the same idea as using any llm, a prompt goes to the llm and gives you a response, but it's setup to accept a memory buffer as well.

memory = ConversationBufferMemory()
conversation = ConversationChain(llm=llm, memory=memory)
conversation.prompt = prompt

You can see from the above, that we pass our llm (we're using gpt-oss:20B) and our memory object into the ConversationChain then setting a prompt for it to use.

From this point, we're setup for our simplest possible chatbot with memory in langchain, so lets get a simple loop to keep it running and accept user input to pass to it.

print("Chat starting!")

while True:
    user_input = input("You: ")
    if user_input.lower() == "quit":
        break
    response = conversation.invoke(user_input)
    print("Chatbot:", response["response"])

Using this, we'll have a simple loop on the terminal where it's clear that you're talking to an llm. You can ask it any question you'd like however the bot has a template prompt which will sway it's behaviour towards being a helpful assistant with programming tasks!

LLM output screenshot

For my next article, I'll write up about putting our llm behind a service which can be used by either web front end or integrated and plugged into another service.