NAEP Research and Development (R&D) Virtual Showcase

On December 8, 2020, the NAEP Research and Development Program hosted a virtual showcase where awardees presented the work funded under the Cooperative Agreements Program.

About the Cooperative Agreements

The Cooperative Agreements Program is part of the NAEP Research and Development Program and is funded by the National Center for Education Statistics (NCES). It is one avenue through which NAEP pursues its goal of accurately measuring student progress and advancing the field of large-scale assessment. The goal of the NAEP Cooperative Agreements is to provide opportunities to address known challenges and bring new ideas and innovations to NAEP.

AIR, on behalf of NCES, solicited applications for funding this work, which was conducted collaboratively with NCES NAEP staff and their contractors. Funds under this program support development, research, and analysis to advance NAEP’s test administration, scoring, analysis and reporting, secondary data analysis, and product development.


Each presentation included a brief Q&A with an expert panel. Click below to read more about each presentation, watch the webinar, and download the Q&A.

Presenter(s): Dr. Derek Justice and Dr. Corey Palermo

11:35–11:50 am ET

This study investigates the efficacy of automated scoring of NAEP constructed-response items. Three subject areas (Civics, Reading, and U.S. History) were examined in two grades (4 and 8). A total of 24 items with short textual responses and acceptable traditional human scoring agreement metrics were chosen for evaluation. Samples of 1,500 to 2,000 human-scored responses were obtained and used to train item-specific models with Measurement Incorporated's Project Essay Grade (PEG) automated scoring engine. Roughly 15 percent of each sample was excluded from training and used as a validation sample to evaluate the performance of each item model. Of the 24 items modeled, 10 passed a set of standard acceptance criteria proposed by Williamson et al. (2012). The accepted models were found to provide a cost-effective alternative to human scoring. Planned future work includes building stakeholder confidence in the accepted models, improving the rejected models, and expanding the automated scoring application to other subject areas where PEG is known to be effective, such as writing.

Williamson, D. M., Xi, X., & Breyer, F. J. (2012). A framework for evaluation and use of automated scoring. Educational Measurement: Issues and Practice, 31(1), 2-13.
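
To make the evaluation step concrete, here is a minimal sketch of how a validation sample might be held out and how human-machine agreement might be measured with quadratic weighted kappa (QWK), one of the statistics discussed in the Williamson et al. (2012) framework. The function names, the 0.70 threshold shown in the usage note, and the toy scores are illustrative assumptions; the abstract does not say which criteria or thresholds the study applied, and this is not the PEG engine.

```python
import random
from collections import Counter


def split_holdout(responses, holdout_fraction=0.15, seed=0):
    """Shuffle responses and set aside a validation sample (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = list(responses)
    rng.shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    # Returns (training sample, validation sample).
    return shuffled[n_holdout:], shuffled[:n_holdout]


def quadratic_weighted_kappa(human, machine, min_score, max_score):
    """Quadratic weighted kappa between two equal-length lists of integer scores."""
    if len(human) != len(machine) or not human:
        raise ValueError("score lists must be non-empty and equal in length")
    k = max_score - min_score + 1
    if k < 2:
        raise ValueError("need at least two score categories")
    n = len(human)
    observed = Counter(zip(human, machine))
    human_counts = Counter(human)
    machine_counts = Counter(machine)
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            # Disagreement weight grows with the squared score distance.
            weight = (i - j) ** 2 / (k - 1) ** 2
            weighted_observed += weight * observed[(i, j)] / n
            weighted_expected += weight * human_counts[i] * machine_counts[j] / (n * n)
    if weighted_expected == 0.0:
        # Both raters gave one identical constant score; kappa is undefined.
        raise ValueError("kappa is undefined when all scores are identical")
    return 1.0 - weighted_observed / weighted_expected
```

Under these assumptions, an item model could be flagged for review when `quadratic_weighted_kappa(human_scores, machine_scores, 0, 3)` on the validation sample falls below a chosen threshold such as 0.70; a full evaluation would combine several statistics rather than rely on one.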

Presenter(s): Dr. Igor Griva, IA Griva Consulting

11:55 am–12:10 pm ET

Assessment personnel must often create parallel student test forms. This presentation introduces TestBuilder, a software program that automates parallel form creation. The program has a user interface for entering the test requirements and a back-end optimization engine that chooses optimal items based on their Item Response Theory (IRT) parameters and the test design constraints. TestBuilder outputs the test forms along with their information functions and can place the forms into test booklets.

Presenter(s): Dr. David Kaplan, Professor, University of Wisconsin–Madison

12:15–12:30 pm ET

Of critical importance to education policy is monitoring trends in education outcomes over time. In the United States, NAEP has provided data since 1969, with trend data since 1990 at the national level and since 1992 at the state level. In addition, since 2002, selected urban districts (on a trial basis) have also participated. Thus, NAEP can provide important monitoring and forecasting information regarding population-level academic performance. The purpose of this study is to provide a “proof-of-concept” that state NAEP assessments can be used to (1) specify cross-state growth regressions; and (2) develop Bayesian probabilistic predictive models that can be used to forecast trends across states in important educational outcomes while accounting for uncertainty in every step of the modeling process.
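
To illustrate what a probabilistic forecast that carries uncertainty forward can look like, here is a deliberately small sketch: a Bayesian linear trend for a single score series with known noise and independent normal priors, followed by a posterior predictive forecast. This is a simplification chosen for illustration, not the cross-state growth regression or the predictive models used in the study. The function names are hypothetical, and any numbers passed in would be made-up inputs, not NAEP results.

```python
import math


def posterior_trend(years, scores, noise_sd, prior_sd):
    """Posterior mean and covariance of (intercept, slope) for
    score = intercept + slope * year + noise, with Gaussian noise of known sd
    and independent N(0, prior_sd**2) priors on both coefficients."""
    s2, t2 = noise_sd ** 2, prior_sd ** 2
    n = len(years)
    sx = sum(years)
    sxx = sum(x * x for x in years)
    sy = sum(scores)
    sxy = sum(x * y for x, y in zip(years, scores))
    # Posterior precision matrix: X'X / s2 + I / t2 (2 x 2, inverted by hand).
    p00, p01, p11 = n / s2 + 1.0 / t2, sx / s2, sxx / s2 + 1.0 / t2
    det = p00 * p11 - p01 * p01
    c00, c01, c11 = p11 / det, -p01 / det, p00 / det
    r0, r1 = sy / s2, sxy / s2
    mean = (c00 * r0 + c01 * r1, c01 * r0 + c11 * r1)
    cov = ((c00, c01), (c01, c11))
    return mean, cov


def forecast(year, mean, cov, noise_sd):
    """Posterior predictive mean and standard deviation for a future year.
    The variance adds coefficient uncertainty to the observation noise."""
    predicted = mean[0] + mean[1] * year
    variance = noise_sd ** 2 + cov[0][0] + 2.0 * year * cov[0][1] + year * year * cov[1][1]
    return predicted, math.sqrt(variance)
```

With years coded as offsets from the first assessment (0, 2, 4, ...), `forecast` returns a predictive standard deviation that exceeds `noise_sd` and widens for years further beyond the observed data, which is the sense in which uncertainty in the estimated trend is propagated into the forecast.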

Presenter(s): Dr. Yigal Rosen and Ms. Kristin Stoeffler

12:35–12:50 pm ET

Due to the low-stakes assessment context of NAEP, there are long-standing questions about the level of engagement and effort of the participating students and, consequently, about the validity of the reported results. The ACT research team partnered with NCES and AIR to identify potential concerns about student (dis)engagement and to test potential changes to increase student engagement in the pilot Interactive Computer Tasks (ICT) in science for grades 4 and 8. In this presentation, Dr. Yigal Rosen and Ms. Kristin Stoeffler will share (1) promising methods to detect student (dis)engagement in pilot ICT tasks; and (2) information about the current effort to validate engagement augmentation techniques with students in virtual cognitive interviews.
