
This guide was originally written for LLaMA 2, but you can follow the same steps to install LLaMA 3; just make sure to download LLaMA 3 instead of LLaMA 2.
If you’re looking to install LLaMA 2, the next generation of Meta’s open-source large language model, you’ve come to the right place. LLaMA 2 is making significant strides in Artificial Intelligence (AI), powering applications from customer service to content creation.
This model, available for free for research and commercial use, has been trained on 2 trillion tokens and boasts double the context length of its predecessor, LLaMA 1.
Its fine-tuned models have been trained on over 1 million human annotations, making it a powerful tool for various AI applications.
This guide will walk you through the process of installing LLaMA 2 locally, providing a step-by-step approach to help you set up and start using this powerful AI model on your own machine.
If you’re interested in exploring more about AI models, you might find our posts on ChatGPT-4 Unleashed and How to Install SuperAGI useful.
Prerequisites
Before we install LLaMA 2, ensure that you have Conda installed on your system. We will be using Conda to create a new environment for the installation, and Text Generation Web UI as the interface for the model. You can follow the Conda installation instructions in our GPT-Engineer guide.
Once Conda is installed, open a Conda terminal and continue following the guide.
If you have trouble copying and pasting the code, you may need to enable copy and paste in the terminal’s properties. Once enabled, you can copy and paste the commands from this guide into the terminal with Ctrl + Shift + C/V.

Preparing To Install LLaMA 2 / LLaMA 3
Step 1: Create a New Conda Environment
The first step is to create a new Conda environment. You can do this by running the following command in your terminal:
conda create -n TextGen2 python=3.10.9
This command creates a new Conda environment named TextGen2 with Python version 3.10.9. When prompted to install the required packages, press y to proceed.
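If you want to confirm the environment was created, you can list your Conda environments; TextGen2 should appear in the output:
conda env list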
Step 2: Activate the New Environment
After creating the new environment, you need to activate it. This can be done with the following command:
conda activate TextGen2
You will know the environment is activated when you see TextGen2 in your terminal prompt.

Step 3: Install PyTorch
Next, we need to install PyTorch. This can be done with the following command:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
This command downloads and installs PyTorch, torchvision, and torchaudio built for CUDA 11.7 from PyTorch’s package index. The installation may take a few minutes.
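Once the installation finishes, a quick sanity check (assuming an NVIDIA GPU and the CUDA 11.7 build above) is to ask PyTorch whether it can see your GPU; it should print True:
python -c "import torch; print(torch.cuda.is_available())"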
Step 4: Clone the Repository
After installing PyTorch, we must clone the Text Generation Web UI repository. This can be done with the following command:
git clone https://github.com/oobabooga/text-generation-webui.git
This command clones the repository into a new folder named text-generation-webui.
Step 5: Change Directory
Next, change the directory to the newly cloned folder with the following command:
cd text-generation-webui
Step 6: Install Python Modules
Inside the text-generation-webui folder, install all the required Python modules with the following command:
pip install -r requirements.txt
This command installs all the Python modules listed in the requirements.txt file. This process may also take a few minutes.
Step 7: Start the Server
Now that all the necessary modules are installed, you can start the server with the following command:
python server.py
Once the server is running, you will see a local URL in your terminal.
In my case, it is http://127.0.0.1:7860
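Depending on your version of the web UI, server.py also accepts optional launch flags; for example, to expose the UI on your local network and enable the API (flag names can change between releases, so check python server.py --help first):
python server.py --listen --api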

Step 8: Access the Web UI
Copy the local URL from your terminal and paste it into your web browser. You should now see the Text Generation Web UI.

Step 9: Download the Model
I use the 70B LLaMA model, which requires a minimum of 32GB of GPU RAM, but you can use the 13B or 7B models if your GPU can’t handle that.
Now, go to the LLaMA 2 70B chat model on Hugging Face and copy the model path. Switch back to the Text Generation Web UI, go to the Model tab, and paste the path into the “Download custom model” field; in my case, that is “TheBloke/Llama-2-70B-chat-GPTQ”.
Alternatively, get the latest LLaMA 3 models from Meta on Hugging Face.
Click “Download” to start the download. This process will take a significant amount of time due to the large file size of the model.
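If the in-browser download fails, the repository also includes a command-line downloader you can run from inside the text-generation-webui folder (script name as of this writing; it takes the same Hugging Face model path):
python download-model.py TheBloke/Llama-2-70B-chat-GPTQ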

Step 10: Load the Model
Once the model is downloaded, click the blue “Reload” button at the top of the Model tab. Find the downloaded model in the list and click “Load”. Make sure to use the Transformers model loader for this process. This process may also take some time.
Step 11: Configure the Session
After loading the model, switch to the Session tab and select “Chat” from the Mode dropdown menu. Click “Apply and Restart” to apply the changes.
Step 12: Configure Parameters
Switch to the Parameters tab and max out the “New Tokens” field so responses aren’t cut short. Set the “Temperature” field to 0, which makes the output as deterministic as possible. You can adjust these settings later based on your needs.
Step 13: Test the Model
Finally, switch back to the Text Generation tab and test the model by typing something into the input field and clicking “Generate”.
And there you have it! If the model responds to your prompt, everything is working.
Conclusion
Congratulations! You have successfully installed LLaMA 2 locally. With this powerful AI model at your disposal, you can now explore various applications, from text generation to AI research. Remember, the power of AI lies not just in its capabilities but also in how we use it.
So, use LLaMA 2 responsibly, explore its capabilities, and let it assist you in your AI journey.
Remember, this guide is a comprehensive walkthrough to help you get started with LLaMA 2. If you encounter any issues during installation, ask for help by commenting below or contacting me on social media.
If you liked this guide, check out our latest guide on Code Llama, a fine-tuned Llama 2 coding model that is a close competitor to OpenAI’s GPT-4 in coding ability.
For more insights into AI and related technologies, check out our posts on Tortoise Text-to-Speech and OpenAI ChatGPT Guide.
Really great walkthrough, easy to follow and work from. Just had a quick question: is there any easy way to benchmark tokens per minute for a given model and its parameters?
Thanks for the comment, Aidan. I don’t know of an easy built-in way to do this, though there may be a tool for it.
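One rough, manual approach (a sketch, not an official tool) is to time generation directly with the transformers library. This assumes you have access to the model weights (Llama 2 repos on Hugging Face are gated) and enough GPU RAM; the model id below is just an example:

import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # example; swap in your model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Explain what a large language model is."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Time a fixed generation and count only the newly generated tokens
start = time.time()
output = model.generate(**inputs, max_new_tokens=200)
elapsed = time.time() - start

new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens} tokens in {elapsed:.1f}s "
      f"({new_tokens / elapsed * 60:.0f} tokens/minute)")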
I can’t see the chat option in the Session tab.
Once the model has downloaded, press the refresh button and then the load button. Loading can take some time; when it finishes, “model loaded” will appear in the bottom right. The chat option should then be available in the Session tab.
I’m installing under Win11. Your install guide is a big help. Steps were OK until I attempted to load the model. The loader is unable to find some path: File “C:\Users\timho\miniconda3\envs\TextGen2\lib\pathlib.py”, line 578, in _parse_args
The error msg is: a = os.fspath(a). Any thoughts?
Hi Tim,
I am glad the guide has been helpful. In step 2, did you activate the environment with “conda activate TextGen2”? If so, did the terminal prompt change from (base) to (TextGen2)? Each time you close the terminal, you will have to make sure the environment you want is activated.
Hope this helps, if you have more questions please ask.
Can I train my model with this UI?
Hi Ano, you can fine-tune the Llama 2 model.
Hi,
I followed all the steps mentioned above. I observe the following error:
ValueError: Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU.You can deactivate exllama backend by setting disable_exllama=True in the quantization config object
This most likely happens because your GPU doesn’t have enough RAM for the model you are running. Try the Llama-2-7b model instead.
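If you do want to keep the larger model, the error message itself points at one workaround: disabling the ExLlama backend in the quantization config. Here is a minimal sketch of doing that when loading a GPTQ model directly with transformers (parameter names follow the error message; newer transformers releases may use use_exllama=False instead):

from transformers import AutoModelForCausalLM, GPTQConfig

# Disable the ExLlama kernels so layers offloaded to CPU/disk are allowed
quant_config = GPTQConfig(bits=4, disable_exllama=True)
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-chat-GPTQ",  # example model path from this guide
    quantization_config=quant_config,
    device_map="auto",
)

Note that offloading to CPU or disk will be much slower than keeping the whole model on the GPU.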
On the “Load Model” step I get an error:
“DLL load failed while importing flash_attn_2_cuda: The specified module could not be found.”
What did I break?
I think it is a known bug. Try creating a new Conda environment; if that fails, try some of the recommended fixes in the comments here: https://github.com/oobabooga/text-generation-webui/issues/4344
I’m having difficulty downloading any of the models. This is the error that appears in the Text Generation Web UI: IndexError: string index out of range
Cannot get the model to download. “requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/api/models/Llama-2-13b-chat-hf/tree/main”
I have already received permission from Meta and Hugging Face, and have tried multiple models and received the same error.
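A note for anyone hitting this: the Llama 2 repos on Hugging Face are gated, so downloads need your access token, and the model path must include the organization prefix (e.g. meta-llama/Llama-2-13b-chat-hf rather than the bare Llama-2-13b-chat-hf shown in the error URL above). One way to authenticate (a sketch, assuming the huggingface_hub CLI is available) is to log in from the same Conda environment before downloading:

pip install huggingface_hub
huggingface-cli login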
Hello, great walkthrough. So far it works, but why can’t I simply select llama-2-7B-chat? I have to choose between the different llama-2-7b-chat.Q2_K.gguf variants (Q3_K_L, Q3_K_M, …).
Hi Lachie! Do you know where Llama 2’s AI code is saved when downloaded? Not the model file, I think, but the actual code of the AI. Say, for example, you wanted to edit the AI’s code to give it the ability to search Google with a custom search engine you set up: what and where is the file that holds the AI’s code we would edit, pretty please? I’m trying to modify my local AI’s code so that when she needs to perform a web search, she calls this Python script with the appropriate search query. Great article, by the way, and thank you! XOXO