Using locally powered LLMs in your favorite JetBrains IDEs
I've recently found a good setup for using locally powered LLMs in the IDEs I use every day. Specifically, I almost exclusively use JetBrains IDEs like Rider, WebStorm, and PyCharm. After building a large chunk of my own LLM integration for these IDEs, I finally landed on the setup I wanted via the open-source plugin CodeGPT.
In this guide, I'll walk you through how to integrate a locally running Large Language Model (LLM), like the fine-tuned Mistral-7B, into your JetBrains IDE setup. This approach gives you a powerful AI assistant right in your development environment.
This software can be very fiddly to set up and is subject to a lot of environment-specific issues. If you are going to use `text-generation-web-ui`, I highly recommend you understand that it is a rapidly changing project and can be quite unstable.
# Setting Up the Text Generation Web UI
The first step is to set up `text-generation-web-ui`, a handy tool for running your LLM. Here’s how:
- Installation: Head over to the text-generation-web-ui GitHub page and follow the instructions to install it. Remember not to use the `--api` option when starting it.
- Running Locally: Docker is the simplest method for Linux users, but there are also shell and batch scripts for local development machines.
- Configuration:
  - The default API port is 5000, as shown in the example `.env` file. Feel free to change this in your `.env` file if necessary. For instance, I use port 5010.
  - Load your desired model, like Mistral-7B-OpenOrca, to avoid common errors such as "No tokenizer is loaded."
- Session Setup: Under `Session`, choose the "openai" option, apply the changes, and restart.
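Before pointing the IDE at it, it's worth confirming the OpenAI-compatible API actually responds. Here's a minimal sketch using Python's `requests`, assuming the port 5010 from my `.env` (adjust the host and port to match yours):

```python
import requests

# Base URL of the text-generation-web-ui OpenAI-compatible API.
# Port 5010 matches my .env example above; the default is 5000.
BASE_URL = "http://localhost:5010/v1"

# List the loaded model(s) to confirm the API is up.
models = requests.get(f"{BASE_URL}/models", timeout=10)
models.raise_for_status()
print(models.json())

# Send a tiny chat completion; the "model" value is a placeholder,
# since the web UI decides which model is actually loaded.
chat = requests.post(
    f"{BASE_URL}/chat/completions",
    json={
        "model": "local-model",
        "messages": [{"role": "user", "content": "Say hello in one short sentence."}],
        "max_tokens": 32,
    },
    timeout=60,
)
chat.raise_for_status()
print(chat.json()["choices"][0]["message"]["content"])
```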
# Troubleshooting
I ran into several issues trying to get `text-generation-web-ui` to work with the OpenAI option selected; below are a few of the issues I hit and the steps I took to resolve them.
- Missing `sentence_transformers`, which causes it to fail to restart with OpenAI support.
  - Update the `requirements.txt` and use `docker compose up -d --build` to rebuild and restart your container.
- Missing `tiktoken`; this can still come up looking like an issue with `sentence_transformers`.
  - Update the `requirements.txt` and use `docker compose up -d --build` to rebuild and restart your container.
- Missing `sse_starlette`.
  - Update the `requirements.txt` and use `docker compose up -d --build` to rebuild and restart your container.
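If it isn't obvious which of these packages your container is missing, a quick diagnostic is to run a small import probe inside it (for diagnosis only; the actual fix should still go through `requirements.txt` and a rebuild, as noted below):

```python
# import_probe.py - run inside the text-generation-web-ui container
# to see which of the troubleshooting packages are actually missing.
for mod in ("sentence_transformers", "tiktoken", "sse_starlette"):
    try:
        __import__(mod)
        print(f"{mod}: OK")
    except ImportError as exc:
        print(f"{mod}: MISSING ({exc})")
```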
Be sure to update the `requirements.txt` rather than using `docker exec -it {containerID} /bin/bash` to install packages with `pip install x`, as I found that just didn't work.
The final additions I made were the list of Python packages below (minus all the specific git installs), added at the top of my `requirements.txt`:
```
accelerate==0.24.*
colorama
datasets
einops
exllamav2==0.0.7; platform_system != "Darwin" and platform_machine != "x86_64"
gradio==3.50.*
markdown
numpy==1.24.*
optimum==1.13.1
pandas
peft==0.5.*
Pillow>=9.5.0
pyyaml
requests
safetensors==0.4.0
scipy
sentencepiece
sentence_transformers
sse_starlette
tiktoken
tensorboard
transformers==4.35.*
tqdm
wandb
```
If you are running a Mac or doing your development on Linux, CodeGPT can manage your Llama-based models itself, and you don't need the above setup.
# Integrating CodeGPT with JetBrains
The CodeGPT plugin enhances your JetBrains IDEs with AI capabilities:
# Installation
Download CodeGPT from the JetBrains Marketplace.
# Configuration
CodeGPT supports OpenAI but allows for local service configuration. This feature is especially useful if you’re running models on a separate machine.
You'll want to override the Base host and provide a dummy API key. Neither the selected model nor the API key is used with `text-generation-webui`, since the web UI controls which model is loaded and by default doesn't require any authentication.
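If you want to check that the endpoint behaves the way CodeGPT expects before blaming the plugin, the same override-the-base-URL-and-use-a-dummy-key pattern can be reproduced with the official `openai` Python client. A sketch, again assuming my port 5010 and a placeholder model name:

```python
from openai import OpenAI

# Point the client at text-generation-web-ui instead of api.openai.com.
# The key is a dummy value because the default setup has no authentication,
# and the model name is ignored because the web UI controls what is loaded.
client = OpenAI(base_url="http://localhost:5010/v1", api_key="not-used")

response = client.chat.completions.create(
    model="local-model",  # placeholder; the loaded model is used regardless
    messages=[{"role": "user", "content": "Write a one-line docstring for a function that reverses a string."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```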
# For Linux and Mac Users
On Linux or Mac, CodeGPT can manage and install Llama-based models via Llama.Cpp, which might eliminate the need for `text-generation-webui`.
This setup is ideal for Mac users with M1, M2, or M3 CPUs and at least 24GB of RAM, since even a quantized 7B model will take up ~4GB of memory.
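CodeGPT downloads and runs these models for you, but if you want to sanity-check that a quantized 7B model runs comfortably on your hardware, here is a rough sketch using the `llama-cpp-python` bindings (a separate package from the llama.cpp build CodeGPT manages, and the model path is just a placeholder):

```python
from llama_cpp import Llama

# Load a quantized GGUF model; a 4-bit 7B model weighs in at roughly 4GB.
# The path is a placeholder - point it at whatever quantized model you downloaded.
llm = Llama(model_path="./models/llama-2-7b.Q4_K_M.gguf", n_ctx=2048)

output = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Suggest a name for a function that parses ISO dates."}],
    max_tokens=48,
)
print(output["choices"][0]["message"]["content"])
```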
# Model Compatibility
Note that Mistral and similar models are not Llama-based and might not work with this setup. However, smaller Llama models are still effective coding assistants.