Evaluating LLM Performance on University Exams Given Course Materials

  • Subject: Large Language Models
  • Type: Master's thesis (Masterarbeit)
  • Supervisor:

    Tu Anh Dinh

  • Add-on:

    Dataset provided in the paper: SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading

    https://arxiv.org/pdf/2406.10421


    In the paper, several LLMs are evaluated on university exams and their scores are compared to students' performance. However, this comparison is biased: the students have studied the course and know the course materials, while the LLMs are queried with the exam questions alone. This thesis aims to incorporate the course materials when querying the LLMs as well, enabling a fairer comparison; one possible approach is sketched below.
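
    One straightforward way to do this is retrieval-augmented prompting: split the course materials into chunks, retrieve the chunks most relevant to each exam question, and prepend them to the prompt. The following is a minimal sketch of that idea, assuming paragraph-level chunks, TF-IDF retrieval, a fixed number of retrieved chunks, and a placeholder for the actual LLM call; none of these choices are prescribed by the SciEx paper.

    ```python
    # Minimal sketch: retrieval-augmented prompting with course materials.
    # Assumptions (not from the SciEx paper): paragraph-level chunks, TF-IDF
    # retrieval, top-3 chunks, and a placeholder for the actual LLM call.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity


    def build_prompt(question: str, material_chunks: list[str], top_k: int = 3) -> str:
        """Prepend the course-material chunks most similar to the exam question."""
        vectorizer = TfidfVectorizer(stop_words="english")
        chunk_vectors = vectorizer.fit_transform(material_chunks)
        question_vector = vectorizer.transform([question])
        scores = cosine_similarity(question_vector, chunk_vectors)[0]
        top_chunks = [material_chunks[i] for i in scores.argsort()[::-1][:top_k]]

        context = "\n\n".join(top_chunks)
        return (
            "You are answering a university exam. Use the course material below.\n\n"
            f"Course material:\n{context}\n\n"
            f"Exam question:\n{question}\n\nAnswer:"
        )


    if __name__ == "__main__":
        chunks = [
            "Beam search keeps the k best partial hypotheses at each decoding step.",
            "Attention computes a weighted sum of encoder states per target position.",
        ]
        prompt = build_prompt("Explain beam search decoding.", chunks)
        # Send `prompt` to the LLM under evaluation here (API call not shown).
        print(prompt)
    ```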


    Contact: tu.dinh@kit.edu