VEHME: A Vision-Language Model For Evaluating Handwritten Mathematics Expressions
Published in EMNLP Main Conference, 2025
The paper introduces VEHME, a vision-language model for automated grading of handwritten math solutions. It uses a two-phase training pipeline (supervised fine-tuning followed by reinforcement learning) and an Expression-Aware Visual Prompting Module to handle complex, unstructured math expressions. Evaluated on the AIHub and FERMAT datasets, VEHME achieves state-of-the-art accuracy among open-source models and approaches the performance of proprietary models, making it a scalable tool for automated math assessment.
Recommended citation: Thu Phuong Nguyen, Duc M. Nguyen, Hyotaek Jeon, Hyunwook Lee, Hyunmin Song, Sungahn Ko, and Taehwan Kim. 2025. VEHME: A Vision-Language Model For Evaluating Handwritten Mathematics Expressions. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 31781–31801, Suzhou, China. Association for Computational Linguistics.
Download Paper