Build a Chat App with Ollama, LangChain4J & Spring Boot
Published on: 2025-04-01
In a previous article, we explored how to run LLMs locally. Now, let’s do something fun with them! This time, we’ll build our own chat app using Ollama, LangChain4J, and Spring Boot.
What to expect
Here’s what we’ll be building: a relatively simple Spring Boot application with a user interface connected via WebSockets to a chat service. The chat service will stream the LLM’s response to you – meaning you’ll see the response in chunks, instead of waiting for the entire thing to finish. Plus, we’ll keep some context in memory, so the LLM can ‘remember’ what we’ve already talked about.
Dependencies
Ollama
Download Ollama from https://ollama.com/ and install it. After installation, run the following command to pull the model we'll use:
ollama pull llama3.1:8b
This pulls the 8-billion-parameter version of Llama 3.1. After the download finishes, run the following command and make sure llama3.1:8b is present in the list.
ollama list
Spring Boot
The easiest way to kick things off is using Spring Boot’s Initializr website. You can pick and choose the dependencies you need and download a basic application structure. This project profile should get you up and running pretty quickly. Alternatively, if you just want to grab everything at once, you can find the project on my GitHub: LLM Chat Example.
If you prefer not to use the Initializr website, you'll need these Spring dependencies:
- Spring Web
- Spring WebSocket
LangChain4J
To get everything running, you’ll also need these LangChain4J dependencies. There’s even a Spring Starter dependency for LangChain4J that can automatically create chat models for you, which you could then inject into your classes. But for this tutorial, we’re going to set everything up ourselves, so we won’t need that Spring Starter dependency.
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j</artifactId>
<version>1.0.0-beta2</version>
</dependency>
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-ollama</artifactId>
<version>1.0.0-beta2</version>
</dependency>
The index.html file
This index.html file contains the chat UI and the client-side WebSocket code that communicates with the backend. In the line where you create a new WebSocket, be sure to adjust the endpoint if you decide to change it. Place this file in src/main/resources/static. Once you run your Spring Boot app, it should be the first page you see.
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>LLMChat</title>
<style>
body {
font-family: Arial;
font-size: 14px;
}
label { display: block; }
div#history {
width: 500px;
height: 350px;
overflow-y: auto;
}
textarea#message {
width: 500px;
height: 150px;
}
.message-system {
margin-left: 20px;
background-color: #c0c0c0;
padding: 5px;
border-radius: 5px;
}
</style>
</head>
<body>
<h1>LLMChat</h1>
<main>
<div class="chat">
<div class="message-wrapper">
<div id="history">
</div>
<div>
<label for="message">Message:</label>
<textarea id="message"></textarea>
</div>
<input type="button" id="send" value="Send">
</div>
</div>
</main>
<script>
(function(window, document) {
let socket;
let historyElem;
let messageText;
let sendButton;
let currentSystemMessageElem;
function addUserMessage(message) {
const elem = document.createElement('p')
elem.classList.add('message-user');
elem.innerText = message;
historyElem.appendChild(elem);
return elem;
}
function appendSystemMessage(message) {
if (!currentSystemMessageElem) {
currentSystemMessageElem = document.createElement('p');
currentSystemMessageElem.classList.add('message-system');
historyElem.appendChild(currentSystemMessageElem);
}
currentSystemMessageElem.innerText += message;
}
async function send() {
disableChat(true);
const message = messageText.value;
addUserMessage(message);
messageText.value = '';
socket.send(message);
}
function disableChat(value) {
messageText.disabled = value;
sendButton.disabled = value;
}
function initElems() {
historyElem = document.querySelector('#history');
messageText = document.querySelector('#message');
sendButton = document.querySelector('#send');
sendButton.addEventListener('click', e => send());
}
function initWs() {
// TODO Change the endpoint if necessary.
socket = new WebSocket('ws://localhost:8080/ws/chat');
socket.onmessage = (event) => {
const message = event.data;
if (message === '~done~' || message === '~error~') {
if (message === '~error~') {
appendSystemMessage('Something went wrong, please try again.');
}
currentSystemMessageElem = undefined;
disableChat(false);
return;
}
appendSystemMessage(message);
};
}
function init() {
initElems();
initWs();
}
document.addEventListener('DOMContentLoaded', () => {
init();
});
}(window, document));
</script>
</body>
</html>
Using a bit of JavaScript, we create a WebSocket connection to the backend and render the streamed response into the chat history. The initWs function initializes the WebSocket and subscribes to the onmessage event, which fires whenever data is received through the WebSocket. If the received data isn't ~done~ or ~error~, it's appended to a <p> element so the LLM's answer appears as it streams in. The special text ~done~ signals that the LLM has finished streaming, and ~error~ indicates that something went wrong.
Please note that this is a pretty basic chat UI. It includes some of the core elements of a chat application. Here’s what it should look like:

Handling web sockets
Now that the UI is ready, let’s move on to configuring the WebSocket endpoint. We also need to implement a WebSocket handler, which will be responsible for receiving messages from the client and sending them back.
WsConfig Class
In this class, we configure the endpoint for our WebSocket and specify which WebSocket handler should be used. Here’s what the class looks like:
@Configuration
@EnableWebSocket
public class WsConfig implements WebSocketConfigurer {
private final WsChatHandler wsChatHandler;
public WsConfig(WsChatHandler wsChatHandler) {
this.wsChatHandler = wsChatHandler;
}
@Override
public void registerWebSocketHandlers(WebSocketHandlerRegistry registry) {
registry.addHandler(wsChatHandler, "/ws/chat").setAllowedOrigins("*");
}
}
As you can see, we're attaching the WebSocket handler to the /ws/chat endpoint, which means the absolute URL will be ws://localhost:8080/ws/chat. We're also specifying, with setAllowedOrigins("*"), that the endpoint can be accessed from any origin. Please never do this in production.
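In a real deployment you'd restrict the handshake to the origin that actually serves your UI. Here's a minimal sketch (the domain below is just a placeholder):
@Override
public void registerWebSocketHandlers(WebSocketHandlerRegistry registry) {
    // Only accept the WebSocket handshake from the site hosting the chat UI.
    registry.addHandler(wsChatHandler, "/ws/chat")
            .setAllowedOrigins("https://chat.example.com");
}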
You'll probably get some errors indicating that WsChatHandler doesn't exist. That's what we'll add next.
WsChatHandler class
@Component
public class WsChatHandler extends TextWebSocketHandler {
private static final Logger LOG = LoggerFactory.getLogger(WsChatHandler.class);
private final ChatService chatService;
public WsChatHandler(ChatService chatService) {
this.chatService = chatService;
}
@Override
protected void handleTextMessage(WebSocketSession session, TextMessage message) throws Exception {
chatService.sendMessage(message.getPayload(), new WsClientChatResponse(session));
}
static class WsClientChatResponse implements IClientChatResponse {
private final WebSocketSession session;
public WsClientChatResponse(WebSocketSession session) {
this.session = session;
}
@Override
public void send(String message) {
try {
session.sendMessage(new TextMessage(message));
} catch (IOException e) {
LOG.error("Something went wrong while sending the message.", e);
}
}
}
}
Inside the handleTextMessage method, you'll see that we call the chat service (which we're about to implement) and pass along the message we received from the client. You'll also notice the WsClientChatResponse inner class. This is a simple adapter that helps us avoid tight coupling between our WebSocket classes and the ChatService and its underlying code. If you ever decide to use something other than WebSockets, you can just implement a new adapter, and that's all it takes.
IClientChatResponse interface
This is the adapter. Nothing fancy. See my previous article Adapter Pattern in Java: Quick Example for more information on the adapter pattern.
public interface IClientChatResponse {
void send(String message);
}
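To illustrate how little it takes to swap the transport, here's a hypothetical second adapter (not part of the project) that simply writes the streamed chunks to standard output. The ChatService wouldn't need to change at all:
public class ConsoleChatResponse implements IClientChatResponse {
    @Override
    public void send(String message) {
        // Print each streamed chunk as it arrives, without adding a newline.
        System.out.print(message);
    }
}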
The chat service
In the chat service, we could perform some additional processing before sending the message to the LLM. In our case, we simply pass it to the chat repository and handle any exception the repository might throw. For now we don't do anything meaningful with the exception; we just wrap it and re-throw it.
The chat service.
@Service
public class ChatService {
private final IChatRepo chatRepo;
public ChatService(IChatRepo chatRepo) {
this.chatRepo = chatRepo;
}
public void sendMessage(String message, IClientChatResponse chatResponse) {
try {
chatRepo.chat(message, chatResponse);
} catch (RepoException e) {
throw new ServiceException("An error occurred while processing the message.", e);
}
}
}
The re-thrown exception.
public class ServiceException extends RuntimeException {
public ServiceException(String message, Throwable cause) {
super(message, cause);
}
}
The chat repository
The chat repository eventually sends the message to Ollama and retrieves the response. In our case, we’re going to stream the response, which means we’re sending the data to the client as we receive it from Ollama.
First, let’s create the interface.
public interface IChatRepo {
void chat(String message, IClientChatResponse chatResponse) throws RepoException;
}
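The RepoException referenced here (and caught in the ChatService) isn't shown anywhere in the article. A minimal version, assuming a plain checked exception, could look like this:
public class RepoException extends Exception {
    public RepoException(String message, Throwable cause) {
        super(message, cause);
    }
}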
Then the implementation.
@Repository
public class OllamaStreamingChatRepo implements IChatRepo {
private final StreamingChatLanguageModel model;
private final MessageWindowChatMemory memory;
public OllamaStreamingChatRepo() {
model = OllamaStreamingChatModel.builder()
.baseUrl("http://localhost:11434")
.modelName("llama3.1:8b")
.temperature(0.2)
.logRequests(true)
.logResponses(true)
.build();
memory = MessageWindowChatMemory.withMaxMessages(20);
memory.add(SystemMessage.from(
"""
You are a helpful assistant. You take any questions and answer them to the best of your knowledge. Your answers
should be around 50 words.
"""
));
}
@Override
public void chat(String message, IClientChatResponse chatResponse) throws RepoException {
memory.add(UserMessage.from(message));
model.chat(
memory.messages(),
new ChatResponseHandler(chatResponse, memory)
);
}
static class ChatResponseHandler implements StreamingChatResponseHandler {
private final IClientChatResponse clientChatResponse;
private final ChatMemory memory;
public ChatResponseHandler(IClientChatResponse clientChatResponse, ChatMemory memory) {
this.clientChatResponse = clientChatResponse;
this.memory = memory;
}
@Override
public void onPartialResponse(String s) {
clientChatResponse.send(s);
}
@Override
public void onCompleteResponse(ChatResponse chatResponse) {
memory.add(chatResponse.aiMessage());
this.clientChatResponse.send("~done~");
}
@Override
public void onError(Throwable throwable) {
throwable.printStackTrace();
this.clientChatResponse.send("~error~");
}
}
}
To stream the response, we use LangChain4J's StreamingChatLanguageModel (here the OllamaStreamingChatModel implementation) to communicate with Ollama. We set this up in the constructor, specifying Ollama's address, the model to use, and the model's temperature. The temperature parameter controls how creative the model is: a value of 0.0 keeps the output very deterministic, while values closer to 1.0 make it more imaginative.
To help the LLM 'remember' what we've said (the context), we also need to provide a memory for our chat. We create this memory using MessageWindowChatMemory. It holds up to 20 messages, evicting the oldest ones first (like a First In, First Out queue). The chat memory holds three types of messages: the SystemMessage, UserMessages, and the LLM's responses (AiMessages). The LLM needs all three to have the full context of the conversation.
The SystemMessage essentially tells the LLM how to behave; it's the personality of the chat, if you will. This SystemMessage is never evicted from the chat memory and is the first message we add when setting up the memory in the constructor. In the OllamaStreamingChatRepo.chat method, we create a UserMessage, which is simply what you communicate to the LLM, and add it to the chat memory. Finally, we call the model.chat method, feeding it the messages from the memory. The LLM will start responding almost immediately.
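To see the windowing behaviour in isolation, here's a tiny standalone snippet (not part of the app). With a window of three messages, the oldest user message gets evicted while the system message is retained:
import dev.langchain4j.data.message.SystemMessage;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.memory.chat.MessageWindowChatMemory;

public class MemoryDemo {
    public static void main(String[] args) {
        // Keep at most 3 messages in the window.
        MessageWindowChatMemory memory = MessageWindowChatMemory.withMaxMessages(3);
        memory.add(SystemMessage.from("You are a helpful assistant."));
        memory.add(UserMessage.from("First question"));
        memory.add(UserMessage.from("Second question"));
        memory.add(UserMessage.from("Third question"));
        // The oldest user message has been evicted; the system message survives.
        memory.messages().forEach(m -> System.out.println(m.type()));
        // Should print something like: SYSTEM, USER, USER
    }
}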
To handle the streamed responses and send them back to the client, we need a chat response handler; we implement the StreamingChatResponseHandler interface for this. In the onPartialResponse method, we call the IClientChatResponse adapter (which we created earlier) to forward the chunks we receive from the LLM to the client. In the onCompleteResponse method, we add the complete LLM response (an AiMessage) to the chat memory, expanding the context with the latest answer, and send a message to the client indicating that the LLM is finished. The onError method contains some very basic error handling: we just let the client know something went wrong.
That’s really the core of it. Now, run the Spring Boot application and see if it all works! If not, retrace these steps to figure out what might have gone wrong.
Conclusion and final thoughts
As we’ve seen, building a simple chat using Ollama, LangChain4J, and Spring Boot is relatively straightforward. However, it’s important to remember that this is a basic implementation. To keep things simple, we’re currently storing the chat memory directly in the repository, and it’s only instantiated once when Spring Boot starts. This means the memory is shared among all users. If you’re planning to support multiple users, you’ll probably want to move the memory to a separate object that you can associate with each WebSocket session.
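For example, the repository could keep one memory per WebSocket session, keyed by the session id. This is only a rough sketch of the idea, not the article's code: the sessionId parameter and the memoryFor helper are hypothetical additions, and you'd have to pass session.getId() from the WebSocket handler through the ChatService:
// Sketch, inside OllamaStreamingChatRepo. Additional imports needed:
// java.util.Map and java.util.concurrent.ConcurrentHashMap
private final Map<String, MessageWindowChatMemory> memories = new ConcurrentHashMap<>();

private MessageWindowChatMemory memoryFor(String sessionId) {
    // Create a fresh memory (with its own system message) the first time a session chats.
    return memories.computeIfAbsent(sessionId, id -> {
        MessageWindowChatMemory memory = MessageWindowChatMemory.withMaxMessages(20);
        memory.add(SystemMessage.from("You are a helpful assistant. Keep your answers around 50 words."));
        return memory;
    });
}

public void chat(String sessionId, String message, IClientChatResponse chatResponse) {
    MessageWindowChatMemory memory = memoryFor(sessionId);
    memory.add(UserMessage.from(message));
    model.chat(memory.messages(), new ChatResponseHandler(chatResponse, memory));
}
You'd also want to remove the entry when the session closes (for example in the handler's afterConnectionClosed callback), otherwise the map keeps growing with every new connection.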
The MessageWindowChatMemory is great for prototyping, but it's not ideal for real-world use. It keeps everything in memory, so eventually you'll want to persist that memory somewhere so you can access those chats later; check out the official docs for more info. Also, MessageWindowChatMemory limits the context by the number of messages. If you'd rather limit it by the number of tokens, you'll need TokenWindowChatMemory instead, which requires a tokenizer to count tokens. Tokenizers vary depending on the LLM, so you'll need to find one that works with your model.