A Personal Journal of Learning and Discovery

Tag: rl

1 item with this tag.

  • Jan 30, 2025

    Reinforcement Learning with GRPO: Fine-Tuning a Small Language Model for Chain-of-Thought Math Reasoning. Similar to DeepSeek-R1 training.

    • llm
    • coding
    • training
    • rl
    • deepseek
