ReproRepo: Scaling Reproducibility Audits with GitHub Repository Issues
Researchers introduce ReproRepo, a framework that uses GitHub repository issues to scale reproducibility audits for LLM agents. The approach aims to reduce the manual effort that existing benchmarks require. ReproRepo treats human-raised issues as supervision signals that point to realistic reproduction blockers. The framework can be applied to assess LLM agents' reproducibility capabilities.
- ReproRepo uses GitHub issues for scalable reproducibility evaluation.
- Framework leverages human-raised issues as supervision signals.
- Aims to assess LLM agents' reproducibility capabilities.
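To make the idea of issues as supervision signals concrete, here is a minimal sketch of how issues that look like reproduction problems might be picked out of a repository's issue list. It is not ReproRepo's actual pipeline, which this summary does not describe. The `Issue` record, the `BLOCKER_KEYWORDS` list, and the sample issues are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

# Hypothetical keyword list; an assumption for this sketch, not taken from ReproRepo.
BLOCKER_KEYWORDS = (
    "cannot reproduce",
    "unable to reproduce",
    "missing checkpoint",
    "results differ",
)


@dataclass
class Issue:
    """Hypothetical, simplified stand-in for a GitHub issue."""
    number: int
    title: str
    body: str


def looks_like_reproduction_blocker(issue, keywords=BLOCKER_KEYWORDS):
    """Return True if the issue text contains any of the keywords."""
    text = f"{issue.title}\n{issue.body}".lower()
    return any(keyword in text for keyword in keywords)


def select_blockers(issues):
    """Keep only the issues that match the keyword heuristic."""
    return [issue for issue in issues if looks_like_reproduction_blocker(issue)]


# Made-up sample issues, used only to exercise the heuristic.
SAMPLE_ISSUES = [
    Issue(1, "Cannot reproduce Table 2 results",
          "Accuracy is 5 points lower with the released config."),
    Issue(2, "Typo in README", "Small spelling fix."),
    Issue(3, "Missing checkpoint for the large model",
          "The download link returns 404."),
]

if __name__ == "__main__":
    for issue in select_blockers(SAMPLE_ISSUES):
        print(issue.number, issue.title)
```

A keyword filter is only a stand-in here. A real audit would likely need a more careful way to decide which issues reflect genuine reproduction failures.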