How to Run DeepSeek R1 Locally: A Comprehensive Guide
Jan 23, 2025
A comprehensive guide on how to run the DeepSeek R1 model locally using Ollama, Hugging Face Transformers, and vLLM, including hardware considerations and configuration tips.
The DeepSeek R1 model represents a significant advancement in AI, offering impressive reasoning capabilities comparable to models like OpenAI's o1. The open-source nature of DeepSeek R1, along with its efficient performance, makes it an attractive option for developers and researchers. A common question arises: how do you run DeepSeek R1 locally? This article provides a detailed guide, drawing on available resources and best practices.
Understanding DeepSeek R1
DeepSeek R1 is a first-generation reasoning model trained using large-scale reinforcement learning (RL). It builds upon the DeepSeek-V3-Base model and shows strong performance in areas like math, code, and general reasoning. One of its key features is the use of Chain of Thought (CoT) reasoning, enabling it to tackle complex tasks effectively. Notably, DeepSeek R1 is also released as a family of distilled models, smaller Qwen- and Llama-based models fine-tuned on R1-generated reasoning data, which makes it adaptable to a wide range of hardware configurations.
Methods to Run DeepSeek R1 Locally
There are several ways to run DeepSeek R1 locally, depending on the specific model variant and your available hardware:
1. Using Ollama
Ollama is a tool designed to simplify the process of running language models locally. Here's how to use it with DeepSeek R1:
- Install Ollama: Visit the Ollama website to download and install the tool. For Linux users, you can use the following command in your terminal:
curl -fsSL https://ollama.com/install.sh | sh
- Run the Model: Once Ollama is installed, you can run DeepSeek R1 using the following command:
ollama run deepseek-r1
For specific distilled versions, such as the 1.5B parameter model, use:
ollama run deepseek-r1:1.5b
This will download the model (if it is not already present) and start an interactive chat session locally.
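Once the model is running, you can also query it programmatically through Ollama's local REST API, which listens on port 11434 by default. Below is a minimal sketch in Python; the prompt and model tag are illustrative, and the requests package is assumed to be installed:
import requests

# Send a single non-streaming generation request to the local Ollama server
response = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "deepseek-r1:1.5b", "prompt": "Why is the sky blue?", "stream": False},
)
print(response.json()["response"])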
2. Using Hugging Face Transformers
The Hugging Face transformers library provides another way to run DeepSeek R1 locally. This method is more flexible but requires some familiarity with Python and PyTorch.
- Install Dependencies: Start by installing the necessary libraries:
pip install transformers accelerate torch
This command installs transformers (for model loading and inference), accelerate (for optimized performance), and torch (the PyTorch deep learning framework).
- Load the Model and Tokenizer: Load the desired DeepSeek R1 model (the pipeline API downloads the matching tokenizer automatically) and generate a response:
from transformers import pipeline

messages = [
    {"role": "user", "content": "Give me code for the Fibonacci nth series"},
]
pipe = pipeline("text-generation", model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")
print(pipe(messages))
This code snippet loads the deepseek-ai/DeepSeek-R1-Distill-Qwen-7B model and uses it to generate text based on the provided prompt.
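If you want finer control over generation, you can also load the tokenizer and model explicitly instead of using the pipeline. The following is a minimal sketch, assuming a GPU with bfloat16 support and the accelerate package installed; the generation parameters are illustrative:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # assumes bfloat16-capable hardware
    device_map="auto",           # requires the accelerate package
)

# Build the chat prompt and generate a response
messages = [{"role": "user", "content": "Give me code for the Fibonacci nth series"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.6)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))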
3. Using vLLM
For more efficient inference, especially with larger models, consider using vLLM (a fast and easy-to-use library for LLM inference).
- Install vLLM: Install vLLM following the instructions on the vLLM GitHub repository. You might need to merge a specific Pull Request (https://github.com/vllm-project/vllm/pull/4650) into your vLLM codebase.
- Run the Model: Use the following code to run batch inference with vLLM:
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

# tensor_parallel_size=8 assumes eight GPUs; adjust to your hardware
max_model_len, tp_size = 8192, 8
model_name = "deepseek-ai/DeepSeek-V2-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = LLM(model=model_name, tensor_parallel_size=tp_size, max_model_len=max_model_len, trust_remote_code=True, enforce_eager=True)
sampling_params = SamplingParams(temperature=0.3, max_tokens=256, stop_token_ids=[tokenizer.eos_token_id])

messages_list = [
    [{"role": "user", "content": "Who are you?"}],
    [{"role": "user", "content": "Translate the following content into Chinese directly: DeepSeek-V2 adopts innovative architectures to guarantee economical training and efficient inference."}],
    [{"role": "user", "content": "Write a piece of quicksort code in C++."}],
]

# Apply the chat template to each conversation, then generate in a single batch
prompt_token_ids = [tokenizer.apply_chat_template(messages, add_generation_prompt=True) for messages in messages_list]
outputs = llm.generate(prompt_token_ids=prompt_token_ids, sampling_params=sampling_params)

generated_text = [output.outputs[0].text for output in outputs]
print(generated_text)
This code snippet initializes vLLM with the specified model and generates responses for the provided prompts. Note that the example above is adapted from the DeepSeek-V2 documentation; to run DeepSeek R1, set model_name to an R1 checkpoint such as deepseek-ai/DeepSeek-R1-Distill-Qwen-7B and adjust tensor_parallel_size to match the number of GPUs you have available.
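vLLM can also expose an OpenAI-compatible HTTP server, started (depending on your vLLM version) with vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-7B or python -m vllm.entrypoints.openai.api_server --model deepseek-ai/DeepSeek-R1-Distill-Qwen-7B. Below is a minimal client sketch using the openai Python package, assuming the server is listening on the default port 8000; the prompt and generation settings are illustrative:
from openai import OpenAI

# Point the OpenAI client at the local vLLM server; the API key is required by the client but ignored by vLLM
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
    messages=[{"role": "user", "content": "Write a piece of quicksort code in C++."}],
    temperature=0.6,
    max_tokens=512,
)
print(response.choices[0].message.content)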
Considerations for Running DeepSeek R1 Locally
- Hardware Requirements: Running large language models like DeepSeek R1 locally can be resource-intensive. Ensure your system meets the minimum requirements, including sufficient RAM (ideally 32 GB or more) and a capable GPU if you plan to use GPU acceleration. For the DeepSeek-V2 model in BF16 format, eight 80 GB GPUs are recommended.
- Model Selection: Choose the appropriate DeepSeek R1 variant based on your hardware and performance needs. Distilled models, such as the Qwen-based versions, are smaller and require less computational power, making them suitable for less powerful systems.
- Configuration: When running DeepSeek R1, it is recommended to set the temperature within the range of 0.5-0.7 to prevent endless repetition or incoherent output. Additionally, avoid adding a system prompt; all instructions should be contained within the user prompt. A short sketch of applying these settings follows this list.
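As a concrete illustration of these recommendations using the transformers pipeline from earlier, the sketch below samples with a temperature in the suggested range and folds what would normally be a system instruction directly into the user prompt; the prompt text is illustrative:
from transformers import pipeline

pipe = pipeline("text-generation", model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")

# No {"role": "system", ...} message: all instructions live in the user turn
messages = [
    {"role": "user", "content": "You are a careful math tutor. Solve step by step: what is 17 * 24?"},
]
print(pipe(messages, do_sample=True, temperature=0.6, max_new_tokens=512))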
Conclusion
DeepSeek R1 offers a powerful and accessible AI model for various applications. By following the steps outlined in this guide, you can run DeepSeek R1 locally using tools like Ollama, Hugging Face Transformers, or vLLM. Remember to consider your hardware capabilities and adjust the model and configuration accordingly to achieve optimal performance. With the right setup, you can leverage the impressive reasoning capabilities of DeepSeek R1 on your own computer.