The first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1, bring advanced reasoning capabilities to AI.

DeepSeek-R1-Zero: Reinforcement Learning Without Fine-Tuning

DeepSeek-R1-Zero is trained using large-scale reinforcement learning (RL) without any supervised fine-tuning (SFT) beforehand. This method allows the model to develop strong reasoning skills naturally. However, it encounters some challenges:

  • Endless repetition in responses.
  • Poor readability.
  • Mixing different languages.

DeepSeek-R1: Improved Performance

To address these issues, DeepSeek-R1 was developed. It uses a cold-start data phase before reinforcement learning, which enhances readability and overall performance. DeepSeek-R1 matches OpenAI-o1 in tasks such as math, coding, and reasoning.

A Fascinating Feature: Transparent Thinking

One of the most fascinating aspects of DeepSeek-R1 is its ability to show its thinking process step by step. When responding to a user’s query, the model outlines its reasoning clearly, making it easy to follow how it arrives at conclusions. This transparency is invaluable for understanding complex tasks like math, coding, or multi-step problem-solving.
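Because the model emits its reasoning before the final answer, it is easy to separate the two programmatically. The sketch below assumes the reasoning is wrapped in `<think>...</think>` tags, as DeepSeek-R1's chat template does; adjust the pattern if your serving stack formats the trace differently.

```python
import re

def split_reasoning(text):
    """Split an R1-style response into (reasoning, final_answer).

    Assumes the chain of thought is wrapped in <think>...</think>
    tags; if no tags are found, the whole text is treated as the answer.
    """
    m = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if not m:
        return "", text.strip()
    return m.group(1).strip(), text[m.end():].strip()
```

This lets an application log the full trace for auditing while showing users only the concise final answer.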

Open-Source and Distilled Models

Both DeepSeek-R1-Zero and DeepSeek-R1 are open-source, making them freely available for researchers and developers. Additionally, six smaller distilled models, based on Llama and Qwen, are released. One standout model, DeepSeek-R1-Distill-Qwen-32B, surpasses OpenAI-o1-mini on multiple benchmarks and sets new records for dense models.

Advancing AI Research

These models are designed to push the boundaries of reasoning capabilities and are ready for exploration, adaptation, and improvement.

| Task                    | DeepSeek-R1 | OpenAI-o1-1217 | DeepSeek-R1-32B | OpenAI-o1-mini | DeepSeek-V3 |
|-------------------------|-------------|----------------|-----------------|----------------|-------------|
| AIME 2024 (Pass@1)      | 79.8        | 79.2           | 72.6            | 63.6           | 39.2        |
| CodeForces (Percentile) | 96.3        | 96.6           | 90.6            | 93.4           | 58.7        |
| GPQA Diamond (Pass@1)   | 71.5        | 75.7           | 62.1            | 60.0           | 59.1        |
| MATH-500 (Pass@1)       | 97.3        | 96.4           | 94.3            | 90.0           | 90.2        |
| MMLU (Pass@1)           | 90.8        | 91.8           | 87.4            | 85.2           | 88.5        |
| SWE-bench Verified      | 49.2        | 48.9           | 36.8            | 41.6           | 42.0        |

DeepSeek-R1 GitHub

DeepSeek R1 is a large language model (LLM) focused on reasoning. The key points below highlight what sets it apart, and the model weights are available to download.

| Model            | #Total Params | #Activated Params | Context Length | Download       |
|------------------|---------------|-------------------|----------------|----------------|
| DeepSeek-R1-Zero | 671B          | 37B               | 128K           | 🤗 HuggingFace |
| DeepSeek-R1      | 671B          | 37B               | 128K           | 🤗 HuggingFace |

The DeepSeek-R1-Distill models are fine-tuned from open-source base models using samples generated by DeepSeek-R1. Their configs and tokenizers are slightly modified from the originals, so use the recommended settings when running these models.

| Model                         | Base Model            | Download       |
|-------------------------------|-----------------------|----------------|
| DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-Math-1.5B     | 🤗 HuggingFace |
| DeepSeek-R1-Distill-Qwen-7B   | Qwen2.5-Math-7B       | 🤗 HuggingFace |
| DeepSeek-R1-Distill-Llama-8B  | Llama-3.1-8B          | 🤗 HuggingFace |
| DeepSeek-R1-Distill-Qwen-14B  | Qwen2.5-14B           | 🤗 HuggingFace |
| DeepSeek-R1-Distill-Qwen-32B  | Qwen2.5-32B           | 🤗 HuggingFace |
| DeepSeek-R1-Distill-Llama-70B | Llama-3.3-70B-Instruct| 🤗 HuggingFace |
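As a rough illustration of such settings, the sketch below collects sampling parameters in the range commonly reported for the R1 family (a moderate temperature to avoid endless repetition, nucleus sampling, and no system prompt). The exact values are assumptions for illustration; check the model card of the specific checkpoint you download.

```python
# Illustrative sampling settings for an R1-Distill checkpoint.
# Values are assumptions drawn from commonly reported guidance,
# not an official spec; verify against the model card.
GENERATION_CONFIG = {
    "temperature": 0.6,   # moderate randomness; too low can cause repetition
    "top_p": 0.95,        # nucleus sampling cutoff
    "max_new_tokens": 8192,  # reasoning traces can be long
}

# Reasoning models are often run without a system prompt:
# all instructions go directly into the user message.
USE_SYSTEM_PROMPT = False
```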

1. Pure Reinforcement Learning for Reasoning

Most language models use a supervised fine-tuning step (often called SFT) to improve their answers. DeepSeek R1 takes a different path by relying almost entirely on reinforcement learning. It learns how to solve problems by testing itself and getting feedback on right or wrong answers, rather than just imitating human-written examples.

  • It shows that a model can develop strong reasoning without having to be spoon-fed correct answers.
  • The model’s “chain of thought” (or “thinking”) is more explicit, as it must reason step by step to maximize its reward.
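A minimal sketch of the feedback signal: for each problem, the model samples a group of candidate answers, each gets a reward (e.g. 1.0 if the final answer is correct, 0.0 otherwise), and each candidate's advantage is its reward relative to the group. This is a toy illustration of group-relative advantage estimation, not the full training algorithm.

```python
def group_relative_advantages(rewards):
    """Toy sketch: normalize each candidate's reward against its group.

    Candidates that beat the group average get positive advantage
    (reinforced); below-average candidates get negative advantage.
    """
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    if std == 0:
        return [0.0] * len(rewards)  # all equal: no learning signal
    return [(r - mean) / std for r in rewards]
```

For example, if two of four sampled answers are correct, the correct ones receive advantage +1.0 and the incorrect ones -1.0, so the model is pushed toward the reasoning paths that ended in right answers.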

2. Open-Source and Commercial-Friendly

DeepSeek R1 is released under an MIT License, meaning anyone is free to use, modify, and redistribute it—even for commercial purposes. This stands in contrast to many proprietary models that remain closed off.

  • Developers can integrate DeepSeek R1 into their own products without restrictive terms.
  • The broader community can improve or adapt the model for specialized uses, like research or domain-specific tasks.

3. Multiple Sizes, Including Smaller “Distilled” Models

DeepSeek R1 comes in various sizes. The main versions can be very large and resource-intensive, but there are smaller “distilled” versions. These smaller variants pack much of the reasoning power into fewer parameters, making them more practical to run on personal computers.

  • You do not need expensive hardware to experiment.
  • Users can choose a smaller model if they want faster responses or a larger one if they have powerful GPUs or high RAM.

4. Strong at Math, Coding, and Step-by-Step Tasks

DeepSeek R1 and its distilled variants tend to do well on tasks that require reasoning, such as math questions and code generation. It often explains its logic more thoroughly than typical chatbots, which helps users understand how it arrives at answers.

  • Good for developers seeking a local coding assistant—one that can outline its thought process.
  • Helpful for complex problem-solving where each step matters (e.g., multi-step math problems).

5. Extended “Thinking” Outputs

When asked a question, DeepSeek R1 often writes out a detailed reasoning process before giving a short final answer. This is part of its training: it thinks through the problem in plain text tokens. The advantage is more transparent logic; the downside is that it can be wordy if you only want quick answers.

  • Great for those who value transparency in AI reasoning.
  • May not be ideal if you prefer short, direct replies every time.

6. Hosted and Local Versions

You can use DeepSeek R1 through certain cloud services, or you can run it yourself on your own system. The hosted versions may have filters or restrictions in place (such as refusing to discuss sensitive topics). The local versions, on the other hand, are under your control.

  • Hosted models can be faster or more convenient, but can include unwanted censorship or usage terms.
  • Running locally means you retain privacy, have no usage limits, and can customize behavior.

7. Limitations and Practical Notes

Despite strong benchmarks, any LLM may still produce mistakes or “hallucinations.” DeepSeek R1 can also be quite verbose in some scenarios. It may repeat or re-check its steps as it reasons. These traits are side effects of its focus on problem-solving.

  • Always verify answers if the information is critical.
  • For simpler tasks, you might prefer a model that gives shorter, more direct responses.