I post-trained a model to reliably roll a die
A model was post-trained to reliably roll a die, with each number coming up roughly 1/6 of the time. This is a toy problem for exploring model behavior and strategies, and a blog post is available on the work.
- Post-trained model reliably rolls a die with each number coming up roughly 1/6 of the time.
- Toy problem for exploring model behavior and strategies.
- Blog post available on the work.
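
One way to check the "roughly 1/6 of the time" claim is a Pearson chi-square test on a batch of sampled rolls. The sketch below is an illustration only, not the method used in the post: `stand_in_model_roll` is a hypothetical placeholder that draws from Python's `random` module and would need to be replaced by real samples from the post-trained model, and the sample size of 6000 is an arbitrary choice.

```python
import random
from collections import Counter


def chi_square_uniform(rolls, sides=6):
    """Pearson chi-square statistic of `rolls` against a uniform die with `sides` faces."""
    counts = Counter(rolls)
    expected = len(rolls) / sides  # expected count per face under uniformity
    return sum(
        (counts.get(face, 0) - expected) ** 2 / expected
        for face in range(1, sides + 1)
    )


def stand_in_model_roll(rng):
    # Hypothetical placeholder: swap in one sampled roll from the model here.
    return rng.randint(1, 6)


# Chi-square critical value for 5 degrees of freedom at the 5% significance level.
CRITICAL_5DF_05 = 11.07

if __name__ == "__main__":
    rng = random.Random(0)
    rolls = [stand_in_model_roll(rng) for _ in range(6000)]
    stat = chi_square_uniform(rolls)
    verdict = "consistent with" if stat < CRITICAL_5DF_05 else "deviates from"
    print(f"chi-square = {stat:.2f}; sample {verdict} a fair die at the 5% level")
```

A statistic below the critical value means the sample gives no evidence against uniformity at that level; it does not prove the die is fair, and it says nothing about whether consecutive rolls are independent.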