Evaluating LLM Performance on University Exams Given Course Materials

  • Research topic: Large Language Models
  • Type: Master's thesis
  • Supervision:

    Tu Anh Dinh / Lukas Hilgert

  • Student: Philipp Schumacher
  • Additional information:

    Dataset provided in the paper "SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading":

    https://arxiv.org/pdf/2406.10421


    In the paper, several LLMs are evaluated on university exams and their results are compared to students' performance. However, this comparison is biased: students have the advantage of knowing the course materials and having studied for the course, while the LLMs see only the exam questions. This thesis aims to also incorporate the course materials when querying the LLMs, enabling a fairer comparison; one possible approach is sketched below.
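
    One plausible way to give the LLMs the same advantage is retrieval-augmented prompting: select the course-material passages most relevant to each exam question and prepend them to the query. The Python sketch below illustrates this idea under simple assumptions; the chunking scheme, the word-overlap scoring, and all function names are illustrative, not a method prescribed by the paper or the thesis.

    ```python
    # Minimal sketch of retrieval-augmented exam prompting (all names
    # and the naive word-overlap retrieval are illustrative assumptions).

    def chunk_text(text: str, chunk_size: int = 500) -> list[str]:
        """Split course material into fixed-size word chunks."""
        words = text.split()
        return [" ".join(words[i:i + chunk_size])
                for i in range(0, len(words), chunk_size)]

    def top_chunks(question: str, chunks: list[str], k: int = 3) -> list[str]:
        """Rank chunks by word overlap with the exam question, keep top k."""
        q_words = set(question.lower().split())
        ranked = sorted(chunks,
                        key=lambda c: len(q_words & set(c.lower().split())),
                        reverse=True)
        return ranked[:k]

    def build_prompt(question: str, course_material: str) -> str:
        """Prepend the most relevant course-material chunks to the question."""
        context = "\n\n".join(top_chunks(question, chunk_text(course_material)))
        return ("Course material:\n" + context
                + "\n\nExam question:\n" + question
                + "\n\nAnswer using the course material above where relevant.")
    ```

    In practice, embedding-based retrieval or simply placing the full materials into a long-context model would be natural alternatives; comparing such strategies against the paper's closed-book setup is a possible starting point for the thesis.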


    Contact: tu.dinh@kit.edu