How to Install LLaMA 2 Locally on Windows


If you’re looking to install LLaMA 2, the next generation of Meta’s open-source large language model, you’ve come to the right place. LLaMA 2 is making significant strides in Artificial Intelligence (AI), transforming applications from customer service to content creation.

This model, available for free for research and commercial use, has been trained on 2 trillion tokens and boasts double the context length of its predecessor, LLaMA 1.

Its fine-tuned models have been trained on over 1 million human annotations, making it a powerful tool for various AI applications.

This guide will walk you through the process of installing LLaMA 2 locally, providing a step-by-step approach to help you set up and start using this powerful AI model on your own machine.

If you’re interested in exploring more about AI models, you might find our posts on ChatGPT-4 Unleashed and How to Install SuperAGI useful.


Before we install LLaMA 2, ensure that you have Conda installed on your system. We will be using Conda to create a new environment for our installation, and Text Generation Web UI as the interface for this model. You can follow the Conda installation instructions in our GPT-Engineer guide.

Once Conda is installed, open a Conda terminal and continue following the guide.

If you have trouble copying and pasting the code, you may need to enable copy and paste in the terminal’s properties. Once enabled, you can copy and paste the commands from this guide into the terminal with Ctrl + Shift + C/V.


Preparing To Install LLaMA 2

Step 1: Create a New Conda Environment

The first step is to create a new Conda environment. You can do this by running the following command in your terminal:

conda create -n TextGen2 python=3.10.9

This command creates a new Conda environment named TextGen2 with Python version 3.10.9. Once the command executes, you will be prompted to confirm the installation of the required packages. Press y to proceed.

Step 2: Activate the New Environment

After creating the new environment, you need to activate it. This can be done with the following command:

conda activate TextGen2

You will know the environment is activated when you see TextGen2 in your terminal prompt.

Step 3: Install PyTorch

Next, we need to install PyTorch. This can be done with the following command:

pip3 install torch torchvision torchaudio --index-url

This command downloads and installs PyTorch along with torchvision and torchaudio from the wheel index passed to --index-url; use the index URL from pytorch.org that matches your CUDA version. The installation may take a few minutes.
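Before moving on, it can help to confirm that PyTorch installed correctly and can see your GPU. Below is a minimal sanity check to run inside the activated TextGen2 environment; the helper name is my own, not part of any library:

```python
# Sanity check: confirm PyTorch is installed and CUDA is visible.
# Run inside the activated TextGen2 environment.
import importlib.util

def torch_installed():
    # True if the current interpreter can find the torch package
    return importlib.util.find_spec("torch") is not None

if torch_installed():
    import torch
    print("PyTorch version:", torch.__version__)
    print("CUDA available:", torch.cuda.is_available())
else:
    print("torch not found - re-run the pip install command above")
```

If "CUDA available" prints False, the Web UI will fall back to CPU and generation will be very slow, so it is worth fixing your driver or CUDA index URL before continuing.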

Step 4: Clone the Repository

After installing PyTorch, we must clone the Text Generation Web UI repository. This can be done with the following command:

git clone

This command clones the repository into a new folder named text-generation-webui.

Step 5: Change Directory

Next, change the directory to the newly cloned folder with the following command:

cd text-generation-webui

Step 6: Install Python Modules

Inside the text-generation-webui folder, install all the required Python modules with the following command:

pip install -r requirements.txt

This command installs all the Python modules listed in the requirements.txt file. This process may also take a few minutes.
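As a quick sanity check after the install, you can query installed package versions with Python’s standard library. This is a small sketch of my own; which packages you check will depend on what requirements.txt actually lists:

```python
# Check whether a package from requirements.txt actually installed.
from importlib.metadata import version, PackageNotFoundError

def installed_version(pkg):
    # Returns the installed version string, or None if the package is missing
    try:
        return version(pkg)
    except PackageNotFoundError:
        return None

# Example: pip itself normally ships with the Conda environment
print("pip:", installed_version("pip"))
print("missing example:", installed_version("no-such-package-xyz"))
```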

Step 7: Start the Server

Now that all the necessary modules are installed, you can start the server with the following command:

python server.py

Once the server is running, you will see a local URL in your terminal; by default, this is http://127.0.0.1:7860.

Step 8: Access the Web UI

Copy the local URL from your terminal and paste it into your web browser. You should now see the Text Generation Web UI.


Step 9: Download the Model

Next, go to the LLaMA 2 70B chat model page on Hugging Face and copy the model’s repository path. Switch back to the Text Generation Web UI, go to the Model tab, and paste the path into the “Download custom model” field; in my case, that is “TheBloke/Llama-2-70B-chat-GPTQ”.

I use the 70B model, but you can use the 13B or 7B model if your GPU can’t handle it.
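If you are unsure which size fits your GPU, a rough rule of thumb for 4-bit (GPTQ) models is about half a gigabyte of VRAM per billion parameters, plus a couple of gigabytes of overhead for activations and cache. This is only a back-of-the-envelope estimate of my own, not an official figure:

```python
# Rough VRAM estimate for quantized models (rule of thumb, not exact).
def approx_vram_gb(num_params_billion, bits=4, overhead_gb=2.0):
    # weights: params * (bits / 8) bytes, plus a flat overhead allowance
    return num_params_billion * bits / 8 + overhead_gb

for size in (7, 13, 70):
    print(f"{size}B model: ~{approx_vram_gb(size):.1f} GB VRAM")
```

By this estimate, the 7B model fits comfortably on common consumer GPUs, while the 70B model needs a high-end or multi-GPU setup.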

Click “Download” to start the download. This process will take a significant amount of time due to the large file size of the model.


Step 10: Load the Model

Once the model is downloaded, click the blue “Reload” button at the top of the Model tab. Find the downloaded model in the list and click “Load”. Make sure to use the Transformers model loader. Loading may take some time.

Step 11: Configure the Session

After loading the model, switch to the Session tab and select “Chat” from the Mode dropdown menu. Click “Apply and Restart” to apply the changes.

Step 12: Configure Parameters

Switch to the Parameters tab and max out the “New Tokens” field. Set the “Temperature” field to 0, which makes the output deterministic. You can adjust these settings later based on your needs.
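Why temperature 0? Temperature rescales the model’s logits before sampling: lower values sharpen the probability distribution, and as the temperature approaches 0 the model always picks the single most likely token (greedy decoding). A small illustration:

```python
# How temperature reshapes a toy token distribution (illustration only,
# not code from the Web UI).
import math

def softmax_with_temperature(logits, temperature):
    if temperature == 0:
        # Temperature 0 is treated as greedy: all probability on the max logit
        probs = [0.0] * len(logits)
        probs[logits.index(max(logits))] = 1.0
        return probs
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, 1.0))   # softer distribution
print(softmax_with_temperature(logits, 0.5))   # sharper distribution
print(softmax_with_temperature(logits, 0))     # greedy: [1.0, 0.0, 0.0]
```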

Step 13: Test the Model

Finally, switch back to the Text Generation tab and test the model by typing something into the input field and clicking “Generate”.

And there you have it! Enjoy exploring the capabilities of this powerful AI model.


Congratulations! You have successfully installed LLaMA 2 locally. With this powerful AI model at your disposal, you can now explore various applications, from text generation to AI research. Remember, the power of AI lies not just in its capabilities but also in how we use it.

So, use LLaMA 2 responsibly, explore its capabilities, and let it assist you in your AI journey.

Remember, this guide is a comprehensive walkthrough to help you start with LLaMA 2. If you encounter any issues during installation, feel free to ask for help in the comments below or reach out to me on social media.

If you liked this guide, check out our latest guide on Code Llama, a fine-tuned Llama 2 coding model. It is a close competitor to GPT-4’s coding capabilities.

For more insights into AI and related technologies, check out our posts on Tortoise Text-to-Speech and OpenAI ChatGPT Guide.

15 thoughts on “How to Install LLaMA 2 Locally on Windows”


  1. Really great walk through, easy to follow and work from. Just had a quick question, is there any easy way to benchmark tokens per minute for a given model and respective parameters?

    1. Once the model has downloaded, press the refresh button and then press the load button. Once the load button has been clicked it can take some time and it will show model loaded in the bottom right. Now the chat option should be available in the session tab.

  2. I’m installing under Win11. Your install guide is a big help. Steps were OK until I attempted to load the model. The loader is unable to find some path: File “C:\Users\timho\miniconda3\envs\TextGen2\lib\”, line 578, in _parse_args
    Error msg is: a = os.fspath(a). Any thoughts?

    1. Hi Tim,

      I am glad the guide has been helpful. In step 2, did you activate the environment with “conda activate TextGen2”? If yes, did the terminal prompt change from (base) to (TextGen2)? Each time you close the terminal, you will have to make sure the environment you want is activated.

      Hope this helps, if you have more questions please ask.

  3. Hi,
    I followed all the steps mentioned above. I observe the following error:
    ValueError: Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU.You can deactivate exllama backend by setting disable_exllama=True in the quantization config object
