A Personal Journal of Learning and Discovery
Tag: training
1 item with this tag.
Jan 30, 2025
Reinforcement Learning with GRPO: Fine-Tuning a Small Language Model for Chain-of-Thought Math Reasoning, Similar to DeepSeek-R1 Training
llm
coding
training
rl
deepseek