Introducing a new automated tool for role-playing benchmarks.
RPBench-Auto is a newly developed automated evaluation pipeline that aims to set a standard for assessing large language models (LLMs) in role-playing settings. It evaluates models under two distinct frameworks: one focused on characters and one on scenes. A judging system scores performance and provides feedback, and results feed a live leaderboard that reflects the latest runs. The authors acknowledge a degree of bias in the judge and are continuing to develop that component, with the goal of refining RPBench-Auto into a widely adopted standard. For further details or to explore the tool's personalization features, get in touch via api@boson.ai.
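The announcement doesn't publish the pipeline's internals, but automated benchmarks of this kind commonly rely on LLM-as-judge pairwise comparison against a fixed baseline model. The sketch below is a minimal illustration under that assumption; the `RolePlayTask` fields, `JUDGE_TEMPLATE` wording, and the provider-agnostic `LLM` callable are hypothetical names, not Boson AI's actual code. Judging both answer orderings, as done here, is one common way to dampen the position bias that the announcement's note about judge bias alludes to.

```python
# Minimal sketch of LLM-as-judge pairwise evaluation for role-play tasks.
# All names and prompts here are illustrative assumptions.

from dataclasses import dataclass
from typing import Callable, Sequence

LLM = Callable[[str], str]  # prompt -> completion, provider-agnostic

@dataclass
class RolePlayTask:
    character_card: str  # persona description (character-based subset)
    scene: str           # scenario setup (scene-based subset)
    user_turn: str       # user message both models answer in character

JUDGE_TEMPLATE = (
    "You are judging a role-play exchange.\n"
    "Character: {card}\nScene: {scene}\nUser: {user}\n\n"
    "Reply A: {a}\nReply B: {b}\n\n"
    "Which reply stays in character and fits the scene better? Answer A or B."
)

def judge_once(judge: LLM, task: RolePlayTask, a: str, b: str) -> str:
    """Ask the judge to pick between two replies shown as A and B."""
    verdict = judge(JUDGE_TEMPLATE.format(
        card=task.character_card, scene=task.scene,
        user=task.user_turn, a=a, b=b))
    return "A" if verdict.strip().upper().startswith("A") else "B"

def judge_pair(judge: LLM, task: RolePlayTask, cand: str, base: str) -> float:
    """Judge both orderings to dampen position bias; disagreement is a tie."""
    first = judge_once(judge, task, cand, base)   # candidate shown as A
    second = judge_once(judge, task, base, cand)  # candidate shown as B
    cand_wins = (first == "A") + (second == "B")
    return cand_wins / 2  # 1.0 win, 0.5 tie, 0.0 loss

def win_rate(judge: LLM, candidate: LLM, baseline: LLM,
             tasks: Sequence[RolePlayTask]) -> float:
    """Candidate's score vs. the baseline -- the kind of number a leaderboard ranks by."""
    score = 0.0
    for t in tasks:
        prompt = f"{t.character_card}\n{t.scene}\nUser: {t.user_turn}"
        score += judge_pair(judge, t, candidate(prompt), baseline(prompt))
    return score / len(tasks)
```

In practice, each `LLM` callable would wrap a chat-completion API call for the judge, candidate, and baseline models, and the per-task scores would be aggregated across both benchmark subsets before updating the leaderboard.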
Read more: Boson AI