Evaluating LLM Performance on University Exams Given Course Materials

  • Research topic: Large Language Models
  • Type: Master's thesis
  • Supervision:

    Tu Anh Dinh / Lukas Hilgert

  • Student: Philipp Schumacher
  • Additional information:

    Dataset provided in the paper "SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading":

    https://arxiv.org/pdf/2406.10421


    In the paper, several LLMs are evaluated on university exams and their results are compared to students' performance. However, this comparison is biased: students have the advantage of knowing the course materials and having studied for the course, while the LLMs see only the exam questions. This thesis aims to also incorporate the course materials when querying the LLMs, enabling a fairer comparison; one possible approach is sketched below.
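
    One plausible way to give the LLMs the same advantage is retrieval-augmented prompting: select the course-material passages most relevant to each exam question and prepend them to the query. The Python sketch below illustrates this idea under simple assumptions; the chunking scheme, the word-overlap scoring, and all function names are illustrative, not a method prescribed by the paper or the thesis.

    ```python
    # Minimal sketch of retrieval-augmented exam prompting (all names
    # and the naive word-overlap retrieval are illustrative assumptions).

    def chunk_text(text: str, chunk_size: int = 500) -> list[str]:
        """Split course material into fixed-size word chunks."""
        words = text.split()
        return [" ".join(words[i:i + chunk_size])
                for i in range(0, len(words), chunk_size)]

    def top_chunks(question: str, chunks: list[str], k: int = 3) -> list[str]:
        """Rank chunks by word overlap with the exam question, keep top k."""
        q_words = set(question.lower().split())
        ranked = sorted(chunks,
                        key=lambda c: len(q_words & set(c.lower().split())),
                        reverse=True)
        return ranked[:k]

    def build_prompt(question: str, course_material: str) -> str:
        """Prepend the most relevant course-material chunks to the question."""
        context = "\n\n".join(top_chunks(question, chunk_text(course_material)))
        return ("Course material:\n" + context
                + "\n\nExam question:\n" + question
                + "\n\nAnswer using the course material above where relevant.")
    ```

    In practice, embedding-based retrieval or simply placing the full materials into a long-context model would be natural alternatives; comparing such strategies against the paper's closed-book setup is a possible starting point for the thesis.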


    Contact: tu.dinh@kit.edu