How to Run DeepSeek R1 Locally: A Comprehensive Guide
Jan 23, 2025
A comprehensive guide on how to run the DeepSeek R1 model locally using Ollama, Hugging Face Transformers, and vLLM, including hardware considerations and configuration tips.
The DeepSeek R1 model represents a significant advancement in AI, offering impressive reasoning capabilities comparable to models like OpenAI's o1. The open-source nature of DeepSeek R1, along with its efficient performance, makes it an attractive option for developers and researchers. A common question arises: how do you run DeepSeek R1 locally? This article provides a detailed guide, drawing on available resources and best practices.
Understanding DeepSeek R1
DeepSeek R1 is a first-generation reasoning model trained using large-scale reinforcement learning (RL). It builds upon the DeepSeek-V3-Base model and shows strong performance in areas like math, code, and general reasoning. One of its key features is the use of Chain of Thought (CoT) reasoning, enabling it to tackle complex tasks effectively. Notably, DeepSeek R1 is also released as a family of distilled models, smaller Qwen- and Llama-based models fine-tuned on R1-generated reasoning data, which makes it adaptable to a wide range of hardware configurations.
Methods to Run DeepSeek R1 Locally
There are several ways to run DeepSeek R1 locally, depending on the specific model variant and your available hardware:
1. Using Ollama
Ollama is a tool designed to simplify the process of running language models locally. Here's how to use it with DeepSeek R1:
- Install Ollama: Visit the Ollama website to download and install the tool. For Linux users, you can use the following command in your terminal:
curl -fsSL https://ollama.com/install.sh | sh
- Run the Model: Once Ollama is installed, you can run DeepSeek R1 using the following command:
ollama run deepseek-r1
For specific distilled versions, such as the 1.5B parameter model, use:
ollama run deepseek-r1:1.5b
This will download the model (if it is not already present) and start an interactive chat session locally.
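Once the model is running, you can also query it programmatically through Ollama's local REST API, which listens on port 11434 by default. Below is a minimal sketch in Python; the prompt and model tag are illustrative, and the requests package is assumed to be installed:
import requests

# Send a single non-streaming generation request to the local Ollama server
response = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "deepseek-r1:1.5b", "prompt": "Why is the sky blue?", "stream": False},
)
print(response.json()["response"])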
2. Using Hugging Face Transformers
The Hugging Face transformers library provides another way to run DeepSeek R1 locally. This method is more flexible but requires some familiarity with Python and PyTorch.
- Install Dependencies: Start by installing the necessary libraries:
pip install transformers accelerate torch
This command installs transformers (for model loading and inference), accelerate (for optimized performance), and torch (the PyTorch deep learning framework).
- Load the Model and Tokenizer: Load the desired DeepSeek R1 model (the pipeline API downloads the matching tokenizer automatically) and generate a response:
from transformers import pipeline

messages = [
    {"role": "user", "content": "Give me code for the Fibonacci nth series"},
]
pipe = pipeline("text-generation", model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")
print(pipe(messages))
This code snippet loads the deepseek-ai/DeepSeek-R1-Distill-Qwen-7B model and uses it to generate text based on the provided prompt.
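If you want finer control over generation, you can also load the tokenizer and model explicitly instead of using the pipeline. The following is a minimal sketch, assuming a GPU with bfloat16 support and the accelerate package installed; the generation parameters are illustrative:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # assumes bfloat16-capable hardware
    device_map="auto",           # requires the accelerate package
)

# Build the chat prompt and generate a response
messages = [{"role": "user", "content": "Give me code for the Fibonacci nth series"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.6)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))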
3. Using vLLM
For more efficient inference, especially with larger models, consider using vLLM (a fast and easy-to-use library for LLM inference).
- Install vLLM: Install vLLM following the instructions on the vLLM GitHub repository. You might need to merge a specific Pull Request (https://github.com/vllm-project/vllm/pull/4650) into your vLLM codebase.
- Run the Model: Use the following code to run batch inference with vLLM:
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# tensor_parallel_size=8 assumes eight GPUs; adjust to your hardware
max_model_len, tp_size = 8192, 8
model_name = "deepseek-ai/DeepSeek-V2-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = LLM(model=model_name, tensor_parallel_size=tp_size, max_model_len=max_model_len, trust_remote_code=True, enforce_eager=True)
sampling_params = SamplingParams(temperature=0.3, max_tokens=256, stop_token_ids=[tokenizer.eos_token_id])

messages_list = [
    [{"role": "user", "content": "Who are you?"}],
    [{"role": "user", "content": "Translate the following content into Chinese directly: DeepSeek-V2 adopts innovative architectures to guarantee economical training and efficient inference."}],
    [{"role": "user", "content": "Write a piece of quicksort code in C++."}],
]

# Apply the chat template to each conversation, then generate in a single batch
prompt_token_ids = [tokenizer.apply_chat_template(messages, add_generation_prompt=True) for messages in messages_list]
outputs = llm.generate(prompt_token_ids=prompt_token_ids, sampling_params=sampling_params)

generated_text = [output.outputs[0].text for output in outputs]
print(generated_text)
This code snippet initializes vLLM with the specified model and generates responses for the provided prompts. Note that the example above is adapted from the DeepSeek-V2 documentation; to run DeepSeek R1, set model_name to an R1 checkpoint such as deepseek-ai/DeepSeek-R1-Distill-Qwen-7B and adjust tensor_parallel_size to match the number of GPUs you have available.
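vLLM can also expose an OpenAI-compatible HTTP server, started (depending on your vLLM version) with vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-7B or python -m vllm.entrypoints.openai.api_server --model deepseek-ai/DeepSeek-R1-Distill-Qwen-7B. Below is a minimal client sketch using the openai Python package, assuming the server is listening on the default port 8000; the prompt and generation settings are illustrative:
from openai import OpenAI

# Point the OpenAI client at the local vLLM server; the API key is required by the client but ignored by vLLM
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    messages=[{"role": "user", "content": "Write a piece of quicksort code in C++."}],
    temperature=0.6,
    max_tokens=512,
)
print(response.choices[0].message.content)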
Considerations for Running DeepSeek R1 Locally
- Hardware Requirements: Running large language models like DeepSeek R1 locally can be resource-intensive. Ensure your system meets the minimum requirements, including sufficient RAM (ideally 32 GB or more) and a capable GPU if you plan to use GPU acceleration. For the DeepSeek-V2 model in BF16 format, eight 80 GB GPUs are recommended.
- Model Selection: Choose the appropriate DeepSeek R1 variant based on your hardware and performance needs. Distilled models, such as the Qwen-based versions, are smaller and require less computational power, making them suitable for less powerful systems.
- Configuration: When running DeepSeek R1, it is recommended to set the temperature within the range of 0.5-0.7 to prevent endless repetition or incoherent output. Additionally, avoid adding a system prompt; all instructions should be contained within the user prompt. A short sketch of applying these settings follows this list.
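As a concrete illustration of these recommendations using the transformers pipeline from earlier, the sketch below samples with a temperature in the suggested range and folds what would normally be a system instruction directly into the user prompt; the prompt text is illustrative:
from transformers import pipeline

pipe = pipeline("text-generation", model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")

# No {"role": "system", ...} message: all instructions live in the user turn
messages = [
    {"role": "user", "content": "You are a careful math tutor. Solve step by step: what is 17 * 24?"},
]
print(pipe(messages, do_sample=True, temperature=0.6, max_new_tokens=512))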
Conclusion
DeepSeek R1 offers a powerful and accessible AI model for various applications. By following the steps outlined in this guide, you can run DeepSeek R1 locally using tools like Ollama, Hugging Face Transformers, or vLLM. Remember to consider your hardware capabilities and adjust the model and configuration accordingly to achieve optimal performance. With the right setup, you can leverage the impressive reasoning capabilities of DeepSeek R1 on your own computer.