Paper List
Tag: llm_judge
2 items with this tag.
May 01, 2026
CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmarking of Large Language Models in Mental Health Question Answering
mental_health_safety
expert_evaluation
llm_judge
May 01, 2026
EigenBench: A Comparative Behavioral Measure of Value Alignment
value_alignment
eigen_trust
llm_judge