Published on Friday, September 20, 2024

NAEP Researchers and NAEP Doctoral Internship Alumni at IAEA Annual Conference

The 49th International Association for Educational Assessment (IAEA) Annual Conference will be held in Philadelphia in just a few days, on September 22–25, 2024. This year’s theme explores the growing interest in artificial intelligence by asking, “How Can AI Help Improve Educational Assessments?” In this post, we highlight upcoming IAEA presentations from NAEP researchers and NAEP Doctoral Student Internship Program alumni.

LLM-Assisted Passage Generation for Reading and Science Items

As part of the 4:30–6:30 p.m. paper session on Monday, September 23, Bhashithe Abeysinghe (NAEP researcher and NAEP Doctoral Student Internship Program alumnus), Ruhan Circi (NAEP expert and principal data scientist at the American Institutes for Research [AIR]), and Emmanuel Sikali (acting chief of NCES’s Reporting and Dissemination branch) will co-present their paper, “LLM-Assisted Passage Generation for Reading and Science Items.” Check out the summary below for a preview of the presentation:

The innovative capabilities of large language models (LLMs) have made text generation easy. LLMs can be used for educational assessments, including reading comprehension and science tasks. However, the usability of generated passages depends on the controllable aspects of the generation process.

Our investigation examines LLMs’ ability to control text difficulty, length, and content. While instruction tuning can effectively manage content and length, regulating difficulty is challenging due to vague complexity definitions. We use the Flesch–Kincaid grade level to guide readability, a method widely used in legal and health care documents.
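
For readers unfamiliar with the measure, the standard Flesch–Kincaid grade level formula is 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below is only an illustration of that formula, not the authors' code: the function names are hypothetical, and the syllable counter is a rough vowel-group heuristic, so scores will differ somewhat from those of dedicated readability tools.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): count vowel groups, discount a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Standard formula: 0.39 * words/sentences + 11.8 * syllables/words - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


if __name__ == "__main__":
    simple = "The cat sat on the mat."
    dense = "Photosynthesis converts atmospheric carbon dioxide into carbohydrates."
    print(round(flesch_kincaid_grade(simple), 2))
    print(round(flesch_kincaid_grade(dense), 2))
```

Short sentences built from one-syllable words score lowest, which is why a generated passage can be nudged toward a target grade by adjusting sentence length and word choice.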

In this work, we evaluate GPT-3.5, GPT-4, and GPT-4o by generating passages using a database from NAEP reading and science assessments for grades 4, 8, and 12. These passages help calibrate model difficulty using few-shot learning, with the Flesch–Kincaid grade level as a guide. We also present our ongoing work to build a writing assistant with a multi-agentic LLM system.
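
As a rough picture of what few-shot prompting for passage difficulty could look like, here is a minimal sketch. Everything in it is an assumption made for illustration: the `Example` class, the `build_few_shot_prompt` function, the prompt wording, and the placeholder passages are hypothetical and are not taken from the paper or from NAEP materials.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    """A hypothetical exemplar: a passage labeled with its grade level."""
    grade: int
    passage: str


def build_few_shot_prompt(examples: List[Example], target_grade: int,
                          topic: str, max_words: int) -> str:
    """Assemble a prompt that shows graded exemplars before the request."""
    lines = ["Write a passage at the requested grade level.", ""]
    for ex in examples:
        lines.append(f"Grade {ex.grade} example:")
        lines.append(ex.passage)
        lines.append("")
    lines.append(
        f"Now write a new passage about {topic} for grade {target_grade}, "
        f"in at most {max_words} words."
    )
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder passages only; real exemplars would come from a passage database.
    examples = [
        Example(4, "<placeholder grade 4 passage>"),
        Example(8, "<placeholder grade 8 passage>"),
    ]
    print(build_few_shot_prompt(examples, target_grade=8, topic="volcanoes", max_words=200))
```

The resulting string would be sent to whichever model is under evaluation, and the generated passage could then be scored with a readability measure to check how close it lands to the target grade.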

AI-Powered Innovations in the U.S. National Assessment of Educational Progress

Ruhan Circi will also participate in the 10:30–11:30 a.m. symposium on Wednesday, September 25, “AI-Powered Innovations in the U.S. National Assessment of Educational Progress.” She will discuss algorithmic bias in AI/ML applications within NAEP assessments. Dr. Circi will explain how bias, as both a social and a mathematical concept, can arise from data selection, model choices, and architecture, leading to unfair outcomes across different implementations. Drawing from both technical and nontechnical perspectives, she will emphasize the growing body of work on fairness in AI and the need for transparency and accountability in addressing these issues in the context of NAEP.

Introducing Ask NAEP and Its Evaluation: A Generative AI Chatbot About the NAEP Assessment

AIR data scientist Luke Patterson and NAEP expert and AIR senior researcher Ting Zhang, alongside their team of co-presenters (Magdalen Beiting-Parrish, Paul Bailey, Blue Webb, Bhashithe Abeysinghe, and Emmanuel Sikali), will present their paper, “Introducing Ask NAEP and Its Evaluation: A Generative AI Chatbot That Informs Stakeholders About the NAEP Assessment,” as part of the 12:00–1:30 p.m. session on Tuesday, September 24. See the summary below for a preview:

This work presents a multidimensional evaluation framework to evaluate an information retrieval chatbot, Ask NAEP. Developed by AIR for NCES, the chatbot uses the Retrieval-Augmented Generation (RAG) technique to provide accurate and comprehensive responses to queries about publicly available information from the National Assessment of Educational Progress (NAEP).
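
To give a sense of the general RAG pattern, the toy sketch below retrieves the most relevant documents for a question and places them in the prompt ahead of the question. It is not the Ask NAEP implementation: the word-overlap scoring, the function names, and the prompt wording are all assumptions chosen to keep the example self-contained, and a production system would typically use embedding-based search and an actual LLM call.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query: str, doc: str) -> int:
    """Toy relevance score (an assumption): number of shared tokens."""
    q, d = Counter(tokenize(query)), Counter(tokenize(doc))
    return sum(min(q[t], d[t]) for t in q)


def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Return the k documents with the highest overlap score."""
    return sorted(docs, key=lambda doc: overlap_score(query, doc), reverse=True)[:k]


def build_prompt(query: str, docs: List[str], k: int = 2) -> str:
    """Place the retrieved context before the question (the 'augmented' step)."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, docs, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    docs = [
        "Ask NAEP is a chatbot developed by AIR for NCES.",
        "The Flesch-Kincaid grade level is a readability measure.",
        "The IAEA Annual Conference is held in Philadelphia in 2024.",
    ]
    print(build_prompt("Who developed the Ask NAEP chatbot?", docs, k=1))
```

Grounding the model in retrieved text rather than relying on its memory alone is the design choice that RAG systems use to reduce hallucinated answers.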

This research involved conducting a series of experiments to explore the impact of incorporating a retrieval component into GPT-3.5 and GPT-4o large language models (LLMs). We evaluated the combined retrieval and generative processes using a multidimensional framework, focusing on correctness, completeness, and communication. The findings revealed that GPT-4o consistently outperformed GPT-3.5, with statistically significant improvements across all dimensions. Incorporating retrieval into the pipeline further enhanced performance, with the RAG approach resulting in high-quality responses. Ask NAEP reduced the occurrence of hallucinations, increasing the correctness measure from 85.5% to 92.7% of responses.

The study demonstrates that leveraging LLMs like GPT-4o, along with a robust RAG technique, significantly improves the quality of responses generated by the Ask NAEP chatbot. These enhancements can help users navigate the extensive NAEP documentation more effectively by providing accurate responses to their queries.

You can still register for the 2024 IAEA Annual Conference and find a program with session details here. If you’re interested in joining the ranks of our NAEP Doctoral Student Internship Program alumni, you can learn more about the program, which includes AI as its newest topic area, here. Keep following the NAEP R&D Hub to stay up to date and be the first to apply for the next available cohort!
