NCES held the NAEP Math Automated Scoring Challenge in spring 2023, inviting anyone eligible for NCES secure data access to use algorithms and automated scoring techniques to predict the scores that human raters assigned to open-ended responses on 10 released NAEP mathematics assessment items. The results of this challenge (as well as those of the prior challenge using reading assessment items) are now available in a brief preprint.
Automated scoring is routinely conducted with reading and writing assessment items, but it has seen limited use for open-ended math items, which combine symbolic and conceptual information. The challenge served as an exploration of the “existing capabilities, accuracy metrics, underlying validity evidence of assigned scores, and efficiencies of using automated scoring for mathematics responses.” Of the more than a dozen teams that participated, two received grand prizes (UMass Amherst, led by Dr. Andrew Lan; and Vanderbilt University, led by Dr. Scott Crossley) and one received a runner-up prize (University of Oregon, led by Dr. Cengiz Zopluoglu). Submissions were judged on the accuracy of teams’ score predictions and on the absence of observed bias.
The high performance of LLM approaches was a key takeaway from the challenge. As in the NAEP Reading Automated Scoring Challenge, the winners of the NAEP Math Automated Scoring Challenge all utilized large language models (LLMs). Math challenge participants used contemporary LLMs, including Math-RoBERTa, Flan-T5, and DeBERTa. Interestingly, the only final submission that did not utilize an LLM was unable to accurately score any items. Unlike the reading challenge, where some items showed significant bias for the English learner (EL) subpopulation, the math challenge results did not exhibit bias. The items included in the math challenge were also found to be consistent in their scoring difficulty across teams. Successful teams incorporated information beyond the response text itself, performing extensive pre-processing and drawing on process data.
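For readers curious what an LLM-based scoring pipeline can look like, the sketch below fine-tunes a DeBERTa classifier to predict human-assigned scores from response text. It is an illustrative example only, not any team's actual submission; the file names, the "response"/"score" columns, and the 0–2 rubric are assumptions made for the example.

```python
# Illustrative sketch only (not any challenge team's actual pipeline):
# fine-tune a DeBERTa classifier to predict human-assigned scores from
# open-ended response text using Hugging Face transformers/datasets.
# The CSV file names, the "response"/"score" columns, and the 0-2 rubric
# are assumptions for this example.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "microsoft/deberta-v3-base"
NUM_SCORE_LEVELS = 3  # e.g., a 0-2 item rubric

# Load human-scored responses split into training and validation sets
data = load_dataset("csv", data_files={"train": "train.csv", "validation": "dev.csv"})

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize(batch):
    # Convert response text into fixed-length token IDs for batching
    return tokenizer(batch["response"], truncation=True, padding="max_length", max_length=256)

data = data.map(tokenize, batched=True)
data = data.rename_column("score", "labels")  # Trainer expects a "labels" column

model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=NUM_SCORE_LEVELS)

training_args = TrainingArguments(
    output_dir="math-scoring-model",
    per_device_train_batch_size=16,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=data["train"],
    eval_dataset=data["validation"],
)
trainer.train()

# The predicted score for a new response is the argmax over rubric labels;
# agreement with human raters is commonly summarized with quadratic weighted kappa.
```

Classification over rubric levels is just one simple framing; ordinal regression or text-to-text scoring (for example, with a model such as Flan-T5) are alternatives, and challenge teams additionally leveraged pre-processing and process data as noted above.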
To read more about the results and analysis, check out the preprint “Results of NAEP Math Item Automated Scoring Data Challenge & Comparison between Reading & Math Challenges.” Also check out the preprint for the reading challenge, “Results of NAEP Reading Item Automated Scoring Data Challenge (Fall 2021).” Learn more about the challenges themselves on the NAEP Math Automated Scoring Challenge page and the NAEP Reading Automated Scoring Challenge page.
Stay tuned to the R&D Hub and subscribe to the R&D program’s mailing list to stay up to date on all the NAEP research, workshops, internships, and other opportunities available to our community.