
Introduction
If you’re looking to install LLaMA 2, the next generation of Meta’s open-source large language model, you’ve come to the right place. LLaMA 2 is making significant strides in Artificial Intelligence (AI), with applications ranging from customer service to content creation.
This model, available for free for research and commercial use, has been trained on 2 trillion tokens and boasts double the context length of its predecessor, LLaMA 1.
Its fine-tuned models have been trained on over 1 million human annotations, making it a powerful tool for various AI applications.
This guide will walk you through the process of installing LLaMA 2 locally, providing a step-by-step approach to help you set up and start using this powerful AI model on your own machine.
If you’re interested in exploring more about AI models, you might find our posts on ChatGPT-4 Unleashed and How to Install SuperAGI useful.
Prerequisites
Before we install LLaMA 2, ensure that you have Conda installed on your system. We will use Conda to create a new environment for the installation, and Text Generation Web UI as the interface for the model. You can follow the instructions in our GPT-Engineer guide to install Conda.
Once Conda is installed, open a Conda terminal and continue following the guide.
If you have trouble copying and pasting the code, enable copy and paste in the terminal’s properties. Once enabled, you can copy and paste the commands from this guide into the terminal with Ctrl + Shift + C/V.

Preparing To Install LLaMA 2
Step 1: Create a New Conda Environment
The first step is to create a new Conda environment. You can do this by running the following command in your terminal:
conda create -n TextGen2 python=3.10.9
This command creates a new Conda environment named TextGen2 with Python version 3.10.9. Once the command executes, you will be prompted to install the listed packages. Press y to proceed.
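To confirm the environment was created, you can list your Conda environments; TextGen2 should appear in the output:
conda env list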
Step 2: Activate the New Environment
After creating the new environment, you need to activate it. This can be done with the following command:
conda activate TextGen2
You will know the environment is activated when you see (TextGen2) in your terminal prompt.

Step 3: Install PyTorch
Next, we need to install PyTorch. This can be done with the following command:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
This command downloads and installs PyTorch along with torchvision and torchaudio, built against CUDA 11.7. The installation may take a few minutes.
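Before moving on, you can verify that PyTorch was installed with CUDA support (this assumes an NVIDIA GPU with a recent driver):
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
If the second value is False, the model will run on the CPU, which is much slower.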
Step 4: Clone the Repository
After installing PyTorch, we must clone the Text Generation Web UI repository. This can be done with the following command:
git clone https://github.com/oobabooga/text-generation-webui.git
This command clones the repository into a new folder named text-generation-webui.
Step 5: Change Directory
Next, change the directory to the newly cloned folder with the following command:
cd text-generation-webui
Step 6: Install Python Modules
Inside the text-generation-webui folder, install all the required Python modules with the following command:
pip install -r requirements.txt
This command installs all the Python modules listed in the requirements.txt file. This process may also take a few minutes.
Step 7: Start the Server
Now that all the necessary modules are installed, you can start the server with the following command:
python server.py
Once the server is running, you will see a local URL in your terminal. In my case, it is http://127.0.0.1:7860.
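By default, the server only listens on your own machine. If you want to reach the UI from another device on your network, the web UI supports a --listen flag (run python server.py --help to see the options your version supports):
python server.py --listen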

Step 8: Access the Web UI
Copy the local URL from your terminal and paste it into your web browser. You should now see the Text Generation Web UI.

Step 9: Download the Model
Next, go to the LLaMA 2 70B chat model on Hugging Face and copy the model ID. Switch back to the Text Generation Web UI, go to the Model tab, and paste the ID into the “Download custom model” field. In my case, that is “TheBloke/Llama-2-70B-chat-GPTQ”.
I use the 70B model, but you can use the 13B or 7B model if your GPU can’t handle it.
Click “Download” to start the download. This process will take a significant amount of time due to the large file size of the model.
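If the built-in downloader stalls, an alternative is to fetch the model with the huggingface_hub library and place it in the web UI’s models folder yourself. A minimal sketch, assuming you run it from the folder that contains text-generation-webui (the underscore in the folder name matches how the web UI stores downloaded models):
from huggingface_hub import snapshot_download

# Download the full repo into the web UI's models directory
# (adjust local_dir to wherever you cloned text-generation-webui).
snapshot_download(
    repo_id="TheBloke/Llama-2-70B-chat-GPTQ",
    local_dir="text-generation-webui/models/TheBloke_Llama-2-70B-chat-GPTQ",
)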

Step 10: Load the Model
Once the model is downloaded, click the blue “Reload” button at the top of the Model tab. Find the downloaded model in the list and click “Load”. Make sure to use the Transformers model loader for this process. This process may also take some time.
Step 11: Configure the Session
After loading the model, switch to the Session tab and select “Chat” from the Mode dropdown menu. Click “Apply and Restart” to apply the changes.
Step 12: Configure Parameters
Switch to the Parameters tab and max out the “New Tokens” field. Set the “Temperature” field to 0. You can adjust these settings later based on your needs.
Step 13: Test the Model
Finally, switch back to the Text Generation tab and test the model by typing something into the input field and clicking “Generate”.
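If you would rather script your tests than use the browser, the web UI at the time of writing could also expose a simple HTTP API when started with the --api flag. The endpoint and payload below reflect that older API and may differ in newer releases:
import requests

# Assumes the server was started with: python server.py --api
# Older releases served a blocking endpoint on port 5000.
response = requests.post(
    "http://127.0.0.1:5000/api/v1/generate",
    json={"prompt": "Hello, LLaMA!", "max_new_tokens": 100},
)
print(response.json()["results"][0]["text"])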
And there you have it! You now have LLaMA 2 running locally.
Conclusion
Congratulations! You have successfully installed LLaMA 2 locally. With this powerful AI model at your disposal, you can now explore various applications, from text generation to AI research. Remember, the power of AI lies not just in its capabilities but also in how we use it.
So, use LLaMA 2 responsibly, explore its capabilities, and let it assist you in your AI journey.
Remember, this guide is a comprehensive walkthrough to help you get started with LLaMA 2. If you encounter any issues during installation, feel free to ask for help by commenting below or reaching out to me on social media.
If you liked this guide, check out our latest guide on Code Llama, a fine-tuned Llama 2 coding model. It is a close competitor to OpenAI’s GPT-4 coding capabilities.
For more insights into AI and related technologies, check out our posts on Tortoise Text-to-Speech and OpenAI ChatGPT Guide.
Really great walkthrough, easy to follow and work from. Just had a quick question: is there an easy way to benchmark tokens per minute for a given model and its parameters?
Thanks for the comment, Aidan. I don’t know of an easy way to do this; there may be a tool for testing it.
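If you want to measure it yourself, one rough approach is to time a generation directly with the transformers library. Below is a minimal sketch; the model ID and prompt are placeholders, and a GPTQ model like this additionally needs auto-gptq installed:
import time
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Llama-2-7B-chat-GPTQ"  # placeholder: use any model you have locally
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Explain attention in one paragraph.", return_tensors="pt").to(model.device)

start = time.time()
output = model.generate(**inputs, max_new_tokens=200)
elapsed = time.time() - start

# Count only the newly generated tokens, then scale to tokens per minute.
new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens} tokens in {elapsed:.1f}s -> {new_tokens / elapsed * 60:.0f} tokens/min")
The web UI itself also prints a tokens-per-second figure in the terminal after each generation, which is a quick way to compare settings.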
I can’t see the Chat option on the Session tab.
Once the model has downloaded, press the refresh button and then the Load button. Loading can take some time; when it finishes, “model loaded” appears in the bottom right. The Chat option should then be available in the Session tab.
I’m installing under Win11. Your install guide is a big help. Steps were OK until I attempted to load the model. The loader is unable to find some path: File “C:\Users\timho\miniconda3\envs\TextGen2\lib\pathlib.py”, line 578, in _parse_args
The error msg is: a = os.fspath(a). Any thoughts?
Hi Tim,
I am glad the guide has been helpful. In step 2, did you activate the environment with “conda activate TextGen2”? If yes, did the terminal prompt change from (base) to (TextGen2)? Each time you close the terminal, you will have to make sure the environment you want is activated.
Hope this helps. If you have more questions, please ask.
Can I train my model with this UI?
Hi Ano, I am not sure exactly what you mean, but you can fine-tune the Llama 2 model.
Hi,
I followed all the steps mentioned above. I observe the following error:
ValueError: Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU.You can deactivate exllama backend by setting disable_exllama=True in the quantization config object
This most likely happens because your GPU does not have enough VRAM for the model you are running. Try running the Llama-2-7b model instead.
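If you want to keep the larger model instead, the error message itself points at the fix: disable the ExLlama backend in the quantization config. A minimal sketch with the transformers GPTQConfig API (the model ID is a placeholder, and in newer transformers versions the flag is use_exllama=False rather than disable_exllama=True):
from transformers import AutoModelForCausalLM, GPTQConfig

model_id = "TheBloke/Llama-2-70B-chat-GPTQ"  # placeholder model ID

# Disable the ExLlama backend so modules offloaded to CPU/disk are allowed.
quantization_config = GPTQConfig(bits=4, disable_exllama=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quantization_config,
)
Note that offloading to CPU or disk will make generation much slower.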