Run Llama 2 And Other Open-Source LLMs In Python Locally
Install Llama 2
Install Llama 2 With Official Guide (Not Recommended)
The official guide for installing Llama 2 is complicated. You need to request access on Meta’s website, and then run a shell script to download the model. When you try to run the model, you may get the following error:
UserWarning: Attempted to get default timeout for nccl backend, but NCCL support is not compiled
warnings.warn(
"Attempted to get default timeout for nccl backend, but NCCL support is not compiled")
It seems the solution is to go to Nvidia's page, manually download the NCCL library, and follow the documentation to install it. I was stuck at this point.
Install Llama 2 With Ollama
Ollama is a tool that allows you to run open-source LLMs (large language models) locally. Ollama can save you days of installing and managing LLMs.
Step 1. Download and install Ollama
Step 2. Pull Llama 2
After installing Ollama, you can pull the Llama 2 model with the following command. Remember to adjust the model tag as needed.
ollama pull llama2:13b-chat
Step 3. Run Llama 2
When everything is set up, just run the following command to start the Llama 2 model in the terminal.
ollama run llama2
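Besides the interactive terminal session, `ollama run` starts a local HTTP server on port 11434, and `POST /api/generate` streams the reply as newline-delimited JSON objects, each carrying a `response` fragment. A minimal sketch of how to reassemble that stream; the sample lines below stand in for a live server call, they are not real model output:

```python
import json

def assemble_stream(lines):
    """Join the 'response' fragments from Ollama's newline-delimited
    JSON stream (as served by POST /api/generate) into one string."""
    return ''.join(json.loads(line)['response']
                   for line in lines if line.strip())

# Sample lines shaped like the server's stream, standing in for a live call.
sample = [
    '{"model":"llama2","response":"The sky ","done":false}',
    '{"model":"llama2","response":"is blue.","done":true}',
]
print(assemble_stream(sample))  # The sky is blue.
```

In a real request you would read the HTTP response line by line and feed each line to the same function.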
Run Llama 2 In Python
It is super easy, right? Running Llama 2 in Python is just as simple. The only thing you need to do is install the Ollama Python library with pip.
pip install ollama
Then you can test Llama 2 in Python with this example:
import ollama

response = ollama.chat(model='llama2', messages=[
    {
        'role': 'user',
        'content': 'Why is the sky blue?',
    },
])
print(response['message']['content'])
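Each call to `ollama.chat` is stateless, so for a multi-turn conversation you append every user message and assistant reply to the same `messages` list. A minimal sketch of that bookkeeping; `chat_fn` is injected so the history logic runs without a server here (`fake_chat` is a stand-in stub — pass `ollama.chat` in real use):

```python
def chat_turn(history, user_text, chat_fn):
    """Append the user message, get a reply via chat_fn, record it, return it.

    chat_fn takes model= and messages= keywords like ollama.chat and
    returns a dict shaped {'message': {'role': ..., 'content': ...}}.
    """
    history.append({'role': 'user', 'content': user_text})
    reply = chat_fn(model='llama2', messages=history)['message']
    history.append(reply)
    return reply['content']

# Stub so the sketch runs offline; replace with ollama.chat for real replies.
def fake_chat(model, messages):
    return {'message': {'role': 'assistant',
                        'content': f"echo: {messages[-1]['content']}"}}

history = []
print(chat_turn(history, 'Why is the sky blue?', fake_chat))
print(len(history))  # 2: the user turn plus the assistant reply
```

Passing the full history on every call is what lets the model see earlier turns as context.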