How to Run LLMs on Your Local Machine: A Hands-On Guide

In the rapidly evolving world of artificial intelligence, Large Language Models (LLMs) have emerged as powerful tools for a wide range of applications, from natural language processing to creative content generation. While these models are often associated with large-scale cloud computing resources, you might be surprised to learn that you can experiment with LLMs right on your local machine.

This hands-on guide will walk you through the process, providing you with the knowledge and tools needed to harness the power of LLMs locally. Whether you’re a seasoned AI practitioner or a curious enthusiast, this guide will empower you to explore, innovate, and transform your projects with the capabilities of LLMs.

1. First, let’s take a look at some advantages and limitations of running LLMs locally:

Advantages of running LLMs locally:

  • Cost Savings: You can significantly reduce costs associated with cloud computing services.
  • Privacy and Security: All data is kept on your own computer, and you have more control over sensitive information. This reduces the risk of data breaches.
  • Offline Access: You do not rely on an internet connection and can work even in offline or low-connectivity situations.

Limitations of running LLMs locally:

  • Hardware Requirements: The size and complexity of the models you can run will be limited by your hardware. More complex LLMs require high-end GPUs and substantial RAM.
  • Maintenance and Updates: Keeping your local environment up-to-date with the latest software, libraries, and model versions requires ongoing maintenance and technical know-how.
  • Scalability: Local setups are generally less scalable than cloud-based solutions. If you need to scale up your experiments or handle large-scale data processing, local resources might quickly become insufficient.

2. Tools needed to run LLMs locally

This spring, I’m attending a part-time course on Generative Artificial Intelligence at Østfold University College. As part of this course, I have been using LM Studio, and I’m excited to share my experiences with you in this guide. LM Studio is a powerful application tailored for developing and experimenting with LLMs right on your local machine. With a built-in chat interface, API integration, and cross-platform support, LM Studio makes it easy to get started. You don’t need an internet connection to run the application, as everything is executed locally, ensuring your data stays private and secure.

LM Studio is free for personal use and can be downloaded at https://lmstudio.ai/ for Windows, Mac and Linux.

3. Running LLMs in LM Studio

After installing LM Studio, the first thing to do is to download the model you want to work with. For this guide I use Phi 3.1 Mini 128k, which should run on most modern laptops. For more complex models, your machine should have a powerful GPU.

[Screenshot: the LM Studio model search]

Once the model has downloaded, start a new chat in LM Studio and load it.

[Screenshot: loading a model in LM Studio]

When loading the model, you can adjust settings such as the context length and GPU Offload.

[Screenshot: model load settings]

You are now ready to interact with the model. Here is an example asking “What is the capital of France?”:

[Screenshot: chatting with the model]

In the chat window, you can also switch to the Assistant role and tune how your questions are answered. For instance, you can make the assistant answer all your questions in rhymes.

[Screenshot: the assistant answering in rhymes]

4. Integrate and Interact with LM Studio via API

In addition to the built-in chat interface of LM Studio, you can also interact through the LM Studio API. This allows you to develop and test your own applications utilizing LLMs without the need to connect to or pay for cloud services.

You can turn on the HTTP server by switching to Developer mode:

[Screenshot: enabling Developer mode]

Supported endpoints are:

  • /v1/models
  • /v1/chat/completions
  • /v1/completions
  • /v1/embeddings
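These endpoints follow the OpenAI API format, so you can call them with any HTTP client. As a minimal sketch (assuming the server is running on LM Studio’s default port, 1234), here is how you could list the available models from Python with the requests library:

# Minimal sketch: list the models available on LM Studio's local server.
# Assumes the server is running on the default port 1234.
import requests

response = requests.get("http://localhost:1234/v1/models")
response.raise_for_status()

# The response uses the OpenAI list format: {"object": "list", "data": [...]}
for model in response.json()["data"]:
    print(model["id"])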

Here is an example of interacting with the API using Postman:

POST http://localhost:1234/v1/chat/completions
Request body:
{
  "model": "phi-3.1-mini-128k-instruct",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of Norway?"}
  ],
  "temperature": 0.7,
  "max_tokens": 50
}
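Here, "temperature" controls how deterministic the reply is (lower values give more predictable output), and "max_tokens" caps the length of the reply.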
API Response:

{
    "id": "chatcmpl-m77bwk6zt4cdpc1nnni3c",
    "object": "chat.completion",
    "created": 1739974759,
    "model": "phi-3.1-mini-128k-instruct",
    "choices": [
        {
            "index": 0,
            "logprobs": null,
            "finish_reason": "length",
            "message": {
                "role": "assistant",
                "content": "The capital of Norway is Oslo. It's situated at the head of the Oslo Fjord and serves as both an economic and governmental center for the country. Historically, it was named Christiania until 187"
            }
        }
    ],
    "usage": {
        "prompt_tokens": 28,
        "completion_tokens": 49,
        "total_tokens": 77
    },
    "system_fingerprint": "phi-3.1-mini-128k-instruct"
}
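Notice that "finish_reason" is "length": the answer is cut off mid-sentence because the reply hit the "max_tokens" limit set in the request. Increase "max_tokens" if you want complete answers.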

[Screenshot: the request in Postman]
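You can send the same request from your own code instead of Postman. Here is a minimal Python sketch using the requests library, with the same model name, port, and request body as in the example above:

# Minimal sketch: send the chat completion request above from Python.
# Assumes LM Studio's server is running on the default port 1234.
import requests

payload = {
    "model": "phi-3.1-mini-128k-instruct",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of Norway?"},
    ],
    "temperature": 0.7,
    "max_tokens": 50,
}

response = requests.post("http://localhost:1234/v1/chat/completions", json=payload)
response.raise_for_status()

# Print just the assistant's reply.
print(response.json()["choices"][0]["message"]["content"])

Changing the system message (for example, to “Answer all questions in rhymes.”) gives you the same role tuning as in section 3, but from code.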

4.1 Run LM Studio without a GUI

It is also possible to run LM Studio without the GUI, for example when you want to run it on a server or as a background service on your computer.

Documentation on how this can be achieved is available in the LM Studio docs: https://lmstudio.ai/docs/api/headless
