Run Llama 2 And Other Open-Source LLMs In Python Locally
Install Llama 2
Install Llama 2 With Official Guide (Not Recommended)
The official guide for installing Llama 2 is complicated. You need to request access on Meta’s website, and then run a shell script to download the model. When you try to run the model, you may get the following error:
UserWarning: Attempted to get default timeout for nccl backend, but NCCL support is not compiled
warnings.warn(
"Attempted to get default timeout for nccl backend, but NCCL support is not compiled")
It seems the solution is to go to Nvidia's page, manually download the NCCL library, and follow the documentation to install it. I was stuck at this point.
Install Llama 2 With Ollama
Ollama is a tool that allows you to run open-source LLMs (large language models) locally. Ollama can save you days of installing and managing LLMs.
Step 1. Download and install Ollama
Step 2. Pull Llama 2
After installing Ollama, you can pull the Llama 2 model with the following command. Remember to adjust the model tag as needed.
ollama pull llama2:13b-chat
Step 3. Run Llama 2
When everything is set up, just run the following command to start the Llama 2 model in the terminal.
ollama run llama2
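Besides the interactive terminal session, `ollama run` starts a local HTTP server on port 11434, and `POST /api/generate` streams the reply as newline-delimited JSON objects, each carrying a `response` fragment. A minimal sketch of how to reassemble that stream; the sample lines below stand in for a live server call, they are not real model output:

```python
import json

def assemble_stream(lines):
    """Join the 'response' fragments from Ollama's newline-delimited
    JSON stream (as served by POST /api/generate) into one string."""
    return ''.join(json.loads(line)['response']
                   for line in lines if line.strip())

# Sample lines shaped like the server's stream, standing in for a live call.
sample = [
    '{"model":"llama2","response":"The sky ","done":false}',
    '{"model":"llama2","response":"is blue.","done":true}',
]
print(assemble_stream(sample))  # The sky is blue.
```

In a real request you would read the HTTP response line by line and feed each line to the same function.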
Run Llama 2 In Python
It is super easy, right? Running Llama 2 in Python is just as simple. The only thing you need to do is install the Ollama Python library with pip.
pip install ollama
Then you can test Llama 2 in Python with this example:
import ollama

response = ollama.chat(model='llama2', messages=[
    {
        'role': 'user',
        'content': 'Why is the sky blue?',
    },
])
print(response['message']['content'])
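Each call to `ollama.chat` is stateless, so for a multi-turn conversation you append every user message and assistant reply to the same `messages` list. A minimal sketch of that bookkeeping; `chat_fn` is injected so the history logic runs without a server here (`fake_chat` is a stand-in stub — pass `ollama.chat` in real use):

```python
def chat_turn(history, user_text, chat_fn):
    """Append the user message, get a reply via chat_fn, record it, return it.

    chat_fn takes model= and messages= keywords like ollama.chat and
    returns a dict shaped {'message': {'role': ..., 'content': ...}}.
    """
    history.append({'role': 'user', 'content': user_text})
    reply = chat_fn(model='llama2', messages=history)['message']
    history.append(reply)
    return reply['content']

# Stub so the sketch runs offline; replace with ollama.chat for real replies.
def fake_chat(model, messages):
    return {'message': {'role': 'assistant',
                        'content': f"echo: {messages[-1]['content']}"}}

history = []
print(chat_turn(history, 'Why is the sky blue?', fake_chat))
print(len(history))  # 2: the user turn plus the assistant reply
```

Passing the full history on every call is what lets the model see earlier turns as context.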