
This guide was originally written for LLaMA 2, but you can follow the same steps to install LLaMA 3; just make sure to download LLaMA 3 instead of LLaMA 2.
If you’re looking to install LLaMA 2, the next generation of Meta’s open-source large language model, you’ve come to the right place. LLaMA 2 is making significant strides in Artificial Intelligence (AI), powering applications from customer service to content creation.
This model, available for free for research and commercial use, has been trained on 2 trillion tokens and boasts double the context length of its predecessor, LLaMA 1.
Its fine-tuned models have been trained on over 1 million human annotations, making it a powerful tool for various AI applications.
This guide will walk you through the process of installing LLaMA 2 locally, providing a step-by-step approach to help you set up and start using this powerful AI model on your own machine.
If you’re interested in exploring more about AI models, you might find our posts on ChatGPT-4 Unleashed and How to Install SuperAGI useful.
Prerequisites
Before we install LLaMA 2, ensure that you have Conda installed on your system. We will be using Conda to create a new environment for the installation, and Text Generation Web UI as the interface for the model. You can follow the Conda installation instructions in our GPT-Engineer guide.
Once Conda is installed, open a Conda terminal and continue following the guide.
If you have trouble copying and pasting the code, you may need to enable copy and paste in the terminal’s properties. Once enabled, you can copy and paste the commands from this guide into the terminal with Ctrl + Shift + C/V.

Preparing To Install LLaMA 2 / LLaMA 3
Step 1: Create a New Conda Environment
The first step is to create a new Conda environment. You can do this by running the following command in your terminal:
conda create -n TextGen2 python=3.10.9
This command creates a new Conda environment named TextGen2 with Python version 3.10.9. When prompted to install the required packages, press y to proceed.
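If you want to confirm the environment was created, you can list your Conda environments; TextGen2 should appear in the output:
conda env list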
Step 2: Activate the New Environment
After creating the new environment, you need to activate it. This can be done with the following command:
conda activate TextGen2
You will know the environment is activated when you see TextGen2 in your terminal prompt.

Step 3: Install PyTorch
Next, we need to install PyTorch. This can be done with the following command:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
This command downloads and installs PyTorch, torchvision, and torchaudio built for CUDA 11.7 from PyTorch’s package index. The installation may take a few minutes.
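Once the installation finishes, a quick sanity check (assuming an NVIDIA GPU and the CUDA 11.7 build above) is to ask PyTorch whether it can see your GPU; it should print True:
python -c "import torch; print(torch.cuda.is_available())"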
Step 4: Clone the Repository
After installing PyTorch, we must clone the Text Generation Web UI repository. This can be done with the following command:
git clone https://github.com/oobabooga/text-generation-webui.git
This command clones the repository into a new folder named text-generation-webui.
Step 5: Change Directory
Next, change the directory to the newly cloned folder with the following command:
cd text-generation-webui
Step 6: Install Python Modules
Inside the text-generation-webui folder, install all the required Python modules with the following command:
pip install -r requirements.txt
This command installs all the Python modules listed in the requirements.txt file. This process may also take a few minutes.
Step 7: Start the Server
Now that all the necessary modules are installed, you can start the server with the following command:
python server.py
Once the server is running, you will see a local URL in your terminal.
In my case, it is http://127.0.0.1:7860
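Depending on your version of the web UI, server.py also accepts optional launch flags; for example, to expose the UI on your local network and enable the API (flag names can change between releases, so check python server.py --help first):
python server.py --listen --api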

Step 8: Access the Web UI
Copy the local URL from your terminal and paste it into your web browser. You should now see the Text Generation Web UI.

Step 9: Download the Model
I use the 70B LLaMA model, which requires a minimum of 32GB of GPU RAM, but you can use the 13B or 7B models if your GPU can’t handle that.
Now, go to the LLaMA 2 70B chat model on Hugging Face and copy the model path. Switch back to the Text Generation Web UI, go to the Model tab, and paste the path into the “Download custom model” field; in my case, that is “TheBloke/Llama-2-70B-chat-GPTQ”.
Alternatively, get the latest LLaMA 3 models from Meta on Hugging Face.
Click “Download” to start the download. This process will take a significant amount of time due to the large file size of the model.
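If the in-browser download fails, the repository also includes a command-line downloader you can run from inside the text-generation-webui folder (script name as of this writing; it takes the same Hugging Face model path):
python download-model.py TheBloke/Llama-2-70B-chat-GPTQ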

Step 10: Load the Model
Once the model is downloaded, click the blue “Reload” button at the top of the Model tab. Find the downloaded model in the list and click “Load”. Make sure to use the Transformers model loader for this process. This process may also take some time.
Step 11: Configure the Session
After loading the model, switch to the Session tab and select “Chat” from the Mode dropdown menu. Click “Apply and Restart” to apply the changes.
Step 12: Configure Parameters
Switch to the Parameters tab and max out the “New Tokens” field so responses aren’t cut short. Set the “Temperature” field to 0, which makes the output as deterministic as possible. You can adjust these settings later based on your needs.
Step 13: Test the Model
Finally, switch back to the Text Generation tab and test the model by typing something into the input field and clicking “Generate”.
And there you have it! If the model responds to your prompt, everything is working.
Conclusion
Congratulations! You have successfully installed LLaMA 2 locally. With this powerful AI model at your disposal, you can now explore various applications, from text generation to AI research. Remember, the power of AI lies not just in its capabilities but also in how we use it.
So, use LLaMA 2 responsibly, explore its capabilities, and let it assist you in your AI journey.
Remember, this guide is a comprehensive walkthrough to help you get started with LLaMA 2. If you encounter any issues during installation, ask for help by commenting below or contacting me on social media.
If you liked this guide, check out our latest guide on Code Llama, a fine-tuned Llama 2 coding model that is a close competitor to OpenAI’s GPT-4 in coding ability.
For more insights into AI and related technologies, check out our posts on Tortoise Text-to-Speech and OpenAI ChatGPT Guide.
Really great walkthrough, easy to follow and work from. Just had a quick question: is there any easy way to benchmark tokens per minute for a given model and its parameters?
Thanks for the comment, Aidan. I don’t know of an easy built-in way to do this, though there may be a tool for it.
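One rough, manual approach (a sketch, not an official tool) is to time generation directly with the transformers library. This assumes you have access to the model weights (Llama 2 repos on Hugging Face are gated) and enough GPU RAM; the model id below is just an example:

import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # example; swap in your model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Explain what a large language model is."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Time a fixed generation and count only the newly generated tokens
start = time.time()
output = model.generate(**inputs, max_new_tokens=200)
elapsed = time.time() - start

new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens} tokens in {elapsed:.1f}s "
      f"({new_tokens / elapsed * 60:.0f} tokens/minute)")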
I can’t see the chat option in the Session tab.
Once the model has downloaded, press the refresh button and then the load button. Loading can take some time; when it finishes, “model loaded” will appear in the bottom right. The chat option should then be available in the Session tab.
I’m installing under Win11. Your install guide is a big help. Steps were OK until I attempted to load the model. The loader is unable to find some path: File “C:\Users\timho\miniconda3\envs\TextGen2\lib\pathlib.py”, line 578, in _parse_args
The error msg is: a = os.fspath(a). Any thoughts?
Hi Tim,
I am glad the guide has been helpful. In step 2, did you activate the environment with “conda activate TextGen2”? If so, did the terminal prompt change from (base) to (TextGen2)? Each time you close the terminal, you will have to make sure the environment you want is activated.
Hope this helps, if you have more questions please ask.
Can I train my model with this UI?
Hi Ano, you can fine-tune the Llama 2 model.
Hi,
I followed all the steps mentioned above. I observe the following error:
ValueError: Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU.You can deactivate exllama backend by setting disable_exllama=True in the quantization config object
This most likely happens because your GPU doesn’t have enough RAM for the model you are running. Try the Llama-2-7b model instead.
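If you do want to keep the larger model, the error message itself points at one workaround: disabling the ExLlama backend in the quantization config. Here is a minimal sketch of doing that when loading a GPTQ model directly with transformers (parameter names follow the error message; newer transformers releases may use use_exllama=False instead):

from transformers import AutoModelForCausalLM, GPTQConfig

# Disable the ExLlama kernels so layers offloaded to CPU/disk are allowed
quant_config = GPTQConfig(bits=4, disable_exllama=True)
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-chat-GPTQ",  # example model path from this guide
    quantization_config=quant_config,
    device_map="auto",
)

Note that offloading to CPU or disk will be much slower than keeping the whole model on the GPU.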
On the “Load Model” step I get an error:
“DLL load failed while importing flash_attn_2_cuda: The specified module could not be found.”
What did I break?
I think it is a known bug. Try creating a new Conda environment; if that fails, try some of the recommended fixes in the comments here: https://github.com/oobabooga/text-generation-webui/issues/4344
I’m having difficulty downloading any of the models. This is the error that appears in the Text Generation Web UI: IndexError: string index out of range
Cannot get the model to download. “requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/api/models/Llama-2-13b-chat-hf/tree/main”
I have already received permission from Meta and Hugging Face, and have tried multiple models and received the same error.
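A note for anyone hitting this: the Llama 2 repos on Hugging Face are gated, so downloads need your access token, and the model path must include the organization prefix (e.g. meta-llama/Llama-2-13b-chat-hf rather than the bare Llama-2-13b-chat-hf shown in the error URL above). One way to authenticate (a sketch, assuming the huggingface_hub CLI is available) is to log in from the same Conda environment before downloading:

pip install huggingface_hub
huggingface-cli login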
Hello, great walkthrough. So far it works, but why can’t I simply select llama-2-7B-chat? I have to choose between the different llama-2-7b-chat.Q2_K.gguf variants (Q3_K_L, Q3_K_M, …).
Hi Lachie! Do you know where Llama 2’s AI code is saved when downloaded? Not the model file, I think, but the actual code of the AI. Say, for example, you wanted to edit the AI’s code to give it the ability to search Google with a custom search engine you set up: what and where is the file that holds the AI’s code we would edit, pretty please? I’m trying to modify my local AI’s code so that when she needs to perform a web search, she calls this Python script with the appropriate search query. Great article, by the way, and thank you! XOXO