Rejection Sampling Note
Rejection sampling is a technique from probability and statistics used to generate samples from a target distribution $p(x)$ when direct sampling is difficult. Instead, you sample from a simpler “proposal” distribution $q(x)$ and decide whether to accept or reject each proposed point based on a criterion involving the ratio $\frac{p(x)}{q(x)}$.
Procedure
- Propose: draw a sample $x$ from a simpler distribution $q(x)$.
- Compute a threshold:
\[T = \frac{p(x)}{M \, q(x)}\]
where $M$ is a constant chosen so that $p(x) \leq M \, q(x)$ for all $x$.
- Accept or reject: draw a random number $u$ uniformly from $[0,1]$. If $u \leq T$, accept $x$; otherwise, reject it.

If accepted, $x$ can be viewed as a valid sample from $p(x)$. If rejected, $x$ is discarded, and a new sample is proposed from $q(x)$.
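The steps above can be sketched in plain Python. As an assumed example (not from the note itself), take the target $p(x) = 6x(1-x)$, a Beta(2, 2) density on $[0,1]$, with a Uniform(0, 1) proposal; since $p$ peaks at $1.5$, the bound $M = 1.5$ works:

```python
import random

def p(x):
    # Target density: Beta(2, 2), i.e. p(x) = 6x(1 - x) on [0, 1]
    return 6 * x * (1 - x)

# Proposal q is Uniform(0, 1), so q(x) = 1 on [0, 1].
M = 1.5  # p(x) <= M * q(x) for all x, since max p(x) = 6 * 0.5 * 0.5 = 1.5

def rejection_sample():
    # Keep proposing from q until a draw is accepted
    while True:
        x = random.random()          # propose x ~ q
        u = random.random()          # u ~ Uniform(0, 1)
        if u <= p(x) / (M * 1.0):    # threshold T = p(x) / (M q(x))
            return x

samples = [rejection_sample() for _ in range(10_000)]
print(sum(samples) / len(samples))   # should be close to the Beta(2, 2) mean, 0.5
```

Accepted draws are distributed according to $p(x)$ even though every proposal came from the uniform $q(x)$.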
Why Use Rejection Sampling?
- It lets you sample from complicated distributions if you can bound them by a simpler distribution.
- It is a straightforward method, though it can become inefficient if many samples are rejected (which happens when $M$ must be large or when $p(x)$ and $q(x)$ are very different).
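The inefficiency has a precise form: the overall acceptance probability is exactly $1/M$, so a large $M$ directly means many wasted proposals. A quick empirical check, reusing the Beta(2, 2) example with $M = 1.5$ (an assumed example):

```python
import random

def p(x):
    # Beta(2, 2) target density, used as an assumed example
    return 6 * x * (1 - x)

M = 1.5  # bound constant for a Uniform(0, 1) proposal

# Count accepted proposals; theory predicts an acceptance rate of 1 / M
trials, accepted = 100_000, 0
for _ in range(trials):
    x = random.random()
    if random.random() <= p(x) / M:
        accepted += 1

print(accepted / trials)  # ≈ 1 / M ≈ 0.667
```

If $q$ matched $p$ poorly and forced, say, $M = 100$, roughly 99% of proposals would be thrown away.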
Improving LLM Training: Balancing Rejection Sampling and Efficiency
We can use rejection sampling in the training of large language models (LLMs) by generating candidate outputs from a proposal mechanism (e.g., a preliminary version of the same model or a secondary model) and rejecting those that fail to meet specified criteria—retaining only those that align with the target distribution or objective function.
In practice, training LLMs involves more nuanced techniques (such as Reinforcement Learning from Human Feedback or iterative refinement), because standard rejection sampling can become inefficient if too many samples end up discarded. If we do employ a secondary LLM to produce multiple candidate answers, we should weigh thorough filtering against computational cost; filtering criteria might include grammar, style, and clear logical flow in the final output.
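A minimal sketch of this candidate-filtering idea, assuming hypothetical stand-ins: `generate_candidate` and `score` are toy stubs standing in for an LLM call and a reward/quality model, neither of which is specified in this note:

```python
import random

def generate_candidate(prompt):
    # Hypothetical stub: a real pipeline would call an LLM here
    return f"answer-{random.randint(0, 99)} to {prompt!r}"

def score(candidate):
    # Hypothetical stub: a reward model would return a quality score here
    return random.random()

def sample_filtered(prompt, threshold=0.8, max_tries=50):
    """Propose candidates and reject any that fall below the quality threshold."""
    for _ in range(max_tries):
        cand = generate_candidate(prompt)
        if score(cand) >= threshold:
            return cand
    return None  # too many rejections: the threshold may be too strict

print(sample_filtered("What is rejection sampling?"))
```

Raising `threshold` tightens the filter but increases the expected number of generations per kept sample, which is exactly the efficiency trade-off noted above.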
Some projects that likely used this approach: https://github.com/NovaSky-AI/SkyThought
Kid’s Corner: Rejection Sampling Explained
Let’s imagine you have a big bowl of mixed candies. Some candies are your favorite, and some you really don’t like. You want to eat only the candies you like, but you can’t see which ones are which—maybe they’re all wrapped in the same color!
- Grab a candy (propose a sample): you reach into the bowl (the “proposal distribution”) and randomly pick one candy. You don’t know if it’s your favorite yet.
- Check if you want to keep it (accept or reject): you unwrap the candy to see which kind it is. If it’s a candy you really like, you keep it. If it’s not one of your favorites, you throw it back or set it aside.
- Repeat: keep doing this, picking a candy and deciding if you like it or not, until you have a nice pile of candies that you enjoy.
In grown-up terms, the “favorite candy” part is like having $p(x)$, the distribution you really want. The “bowl of candies” is $q(x)$, the simpler distribution you’re picking from. Sometimes you end up picking candies you don’t want (rejecting them), but eventually, you collect plenty of the “good” ones. That’s rejection sampling!
Why It Can Be Slow:
If you don’t like most candies in the bowl, you’ll throw a lot of them away, taking more time to get enough favorites. The same idea happens in math: if $p(x)$ is very different from $q(x)$, you end up throwing away many of your samples.