On December 8, 2020, the NAEP Research and Development Program hosted a
virtual showcase where awardees presented the work funded under the
Cooperative Agreements Program.
ABOUT THE COOPERATIVE AGREEMENTS
The Cooperative Agreements Program is part of the NAEP Research and Development
Program and is funded by the National Center for Education Statistics (NCES).
This cooperative agreement opportunity is one avenue through which NAEP continues
to pursue its goal of accurately measuring student progress and advancing the
field of large-scale assessment. The goal of the NAEP Cooperative Agreements
is to provide opportunities to address known challenges and bring new ideas
and innovations to NAEP.
AIR, on behalf of NCES, solicited applications to fund this work, which
was conducted collaboratively with NCES NAEP staff and their contractors.
Funds under this program support development, research, and analysis to
advance NAEP's test administration; scoring; analysis and reporting;
secondary data analysis; and product development.
Agenda
Each presentation included a brief Q&A with an expert panel. Click below to
read more about each presentation, watch the webinar, and
download the Q&A.
Automated Scoring of NAEP Constructed Response Items | Measurement Incorporated
Presenter(s): Dr. Derek Justice and Dr. Corey Palermo
11:35 am–11:50 am ET
This study investigates the efficacy of automated scoring of NAEP constructed-response items. Three subject areas were examined (Civics, Reading, and U.S. History) in two grades (4 and 8). A total of 24 items with short textual responses and acceptable traditional human scoring agreement metrics were chosen for evaluation. Sample sizes ranging from 1,500 to 2,000 human-scored responses were obtained and used to train item-specific models with Measurement Incorporated's Project Essay Grade (PEG) automated scoring engine. Roughly 15 percent of each sample was excluded from training and used as a validation sample to evaluate the performance of each item model. Of the 24 items modeled, 10 passed a set of standard acceptance criteria proposed by Williamson et al. (2012). The accepted models were found to provide a cost-effective alternative to human scoring. Planned future work includes building stakeholder confidence in the accepted models, improving the rejected models, and expanding the automated scoring application to other subject areas where PEG is known to be effective, such as writing.
Williamson, D. M., Xi, X., & Breyer, F. J. (2012). A framework for evaluation and use of automated scoring. Educational Measurement: Issues and Practice, 31(1), 2-13.
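For readers unfamiliar with the acceptance criteria referenced above, the sketch below illustrates the kind of validation-sample check such a framework implies: comparing human–engine agreement (quadratic weighted kappa) with human–human agreement and checking the standardized mean difference between score distributions. The thresholds, data, and function names here are illustrative assumptions, not the study's actual code or exact criteria.

```python
# Illustrative check of an automated-scoring model on a held-out validation sample.
# Thresholds are those commonly cited alongside Williamson et al. (2012); the
# study's exact criteria and data are not reproduced here.
import numpy as np
from sklearn.metrics import cohen_kappa_score

def evaluate_item_model(human_1, human_2, engine, qwk_min=0.70,
                        degradation_max=0.10, smd_max=0.15):
    """Compare engine scores with human scores on validation responses."""
    qwk_hh = cohen_kappa_score(human_1, human_2, weights="quadratic")  # human-human agreement
    qwk_he = cohen_kappa_score(human_1, engine, weights="quadratic")   # human-engine agreement
    # Standardized mean difference between engine and human score distributions
    pooled_sd = np.sqrt((np.var(human_1, ddof=1) + np.var(engine, ddof=1)) / 2)
    smd = abs(np.mean(engine) - np.mean(human_1)) / pooled_sd
    passed = (qwk_he >= qwk_min
              and (qwk_hh - qwk_he) <= degradation_max
              and smd <= smd_max)
    return {"qwk_hh": qwk_hh, "qwk_he": qwk_he, "smd": smd, "passed": passed}

# Hypothetical validation sample (~15% of 2,000 responses, scores 0-3)
rng = np.random.default_rng(0)
h1 = rng.integers(0, 4, size=300)
h2 = np.where(rng.random(300) < 0.85, h1, rng.integers(0, 4, size=300))
eng = np.where(rng.random(300) < 0.80, h1, rng.integers(0, 4, size=300))
print(evaluate_item_model(h1, h2, eng))
```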
TestBuilder: A Test Design Assisting Tool | IAGriva Consulting
Presenter(s): Dr. Igor Griva
11:55 am–12:10 pm ET
Assessment personnel often must create parallel student test forms. This presentation introduces TestBuilder, a software program that automates parallel form creation. The program pairs a user interface for entering test requirements with a back-end optimization engine that selects optimal items based on their Item Response Theory (IRT) parameters and test design constraints. TestBuilder outputs the test forms along with their information functions and can place the forms into test booklets.
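As a minimal illustration of the general approach such a tool can take (not TestBuilder's actual engine or interface), the sketch below assembles a fixed-length form by maximizing test information at a target ability level subject to content constraints, using invented 2PL item parameters and the open-source PuLP solver.

```python
# Minimal sketch of IRT-based automated test assembly. Item parameters, form
# length, and content constraints below are invented for illustration only.
import numpy as np
import pulp  # open-source mixed-integer programming modeler (pip install pulp)

rng = np.random.default_rng(1)
n_items, form_length, theta_target = 60, 20, 0.0
a = rng.uniform(0.6, 2.0, n_items)       # 2PL discrimination parameters
b = rng.normal(0.0, 1.0, n_items)        # 2PL difficulty parameters
content = rng.integers(0, 3, n_items)    # three hypothetical content areas

# Fisher information of each 2PL item at the target ability level
p = 1.0 / (1.0 + np.exp(-a * (theta_target - b)))
info = a**2 * p * (1 - p)

prob = pulp.LpProblem("parallel_form", pulp.LpMaximize)
x = [pulp.LpVariable(f"item_{i}", cat="Binary") for i in range(n_items)]
prob += pulp.lpSum(info[i] * x[i] for i in range(n_items))   # maximize information
prob += pulp.lpSum(x) == form_length                         # fixed form length
for area in range(3):                                        # balanced content coverage
    prob += pulp.lpSum(x[i] for i in range(n_items) if content[i] == area) >= 5

prob.solve(pulp.PULP_CBC_CMD(msg=False))
selected = [i for i in range(n_items) if x[i].value() > 0.5]
print("selected items:", selected)
print("test information at theta = 0:", round(float(info[selected].sum()), 2))
```

In practice, a second form can be made "parallel" by adding constraints that force its information function to stay within a tolerance of the first form's.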
Bayesian Probabilistic Forecasting with State NAEP Data | Dr. David Kaplan, University of Wisconsin–Madison
Presenter(s): Dr. David Kaplan, Professor, University of Wisconsin–Madison
12:15 pm–12:30 pm ET
Of critical importance to education policy is monitoring trends in education outcomes over time. In the United States, NAEP has provided data since 1969, with trend data since 1990 at the national level and since 1992 at the state level. In addition, since 2002, selected urban districts (on a trial basis) have also participated. Thus, NAEP can provide important monitoring and forecasting information regarding population-level academic performance. The purpose of this study is to provide a “proof-of-concept” that state NAEP assessments can be used to (1) specify cross-state growth regressions; and (2) develop Bayesian probabilistic predictive models that can be used to forecast trends across states in important educational outcomes while accounting for uncertainty in every step of the modeling process.
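As a minimal illustration of the forecasting idea (not the study's cross-state growth models), the sketch below fits a Bayesian linear trend to synthetic state-level scores under a standard noninformative prior and summarizes the posterior predictive distribution for a future assessment year, so the forecast carries an explicit uncertainty interval.

```python
# Minimal sketch of Bayesian probabilistic forecasting for one state's trend,
# using the standard conjugate results for linear regression with a
# noninformative prior. The scores below are synthetic, not NAEP data.
import numpy as np

rng = np.random.default_rng(2)
years = np.array([1992, 1996, 2000, 2003, 2005, 2007, 2009, 2011, 2013, 2015, 2017, 2019])
scores = 230 + 0.4 * (years - years.mean()) + rng.normal(0, 2, years.size)  # synthetic trend

X = np.column_stack([np.ones_like(years, dtype=float), years - years.mean()])
n, k = X.shape
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ scores
resid = scores - X @ beta_hat
s2 = resid @ resid / (n - k)

# Draw from the joint posterior of (sigma^2, beta), then the posterior predictive
n_draws = 5000
sigma2 = (n - k) * s2 / rng.chisquare(n - k, n_draws)
L_chol = np.linalg.cholesky(XtX_inv)
beta = beta_hat + (rng.standard_normal((n_draws, k)) @ L_chol.T) * np.sqrt(sigma2)[:, None]

x_future = np.array([1.0, 2022 - years.mean()])              # hypothetical forecast year
pred = beta @ x_future + rng.standard_normal(n_draws) * np.sqrt(sigma2)
lo, med, hi = np.percentile(pred, [5, 50, 95])
print(f"2022 forecast: {med:.1f} (90% interval {lo:.1f} to {hi:.1f})")
```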
Enhancing the Validity of NAEP Interactive Computer Tasks through Detection of Student (Dis)engagement and Augmentation | ACT
Due to the low-stakes assessment context of NAEP, there are long-standing questions about the level of engagement and effort of the participating students and, consequently, about the validity of the reported results. The ACT research team partnered with NCES and AIR to identify potential concerns about student (dis)engagement and to test changes intended to increase student engagement in the pilot Interactive Computer Tasks (ICT) in science for grades 4 and 8. In this presentation, Dr. Yigal Rosen and Ms. Kristin Stoeffler will share (1) promising methods to detect student (dis)engagement in pilot ICT tasks; and (2) information about the current effort to validate engagement augmentation techniques with students in virtual cognitive interviews.
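One widely used family of indicators for this kind of work is response-time effort: actions completed faster than a per-task threshold are flagged as likely rapid, disengaged responses. The sketch below illustrates that general idea with synthetic timing data; it is not the detection method developed by the ACT team for the ICT pilot.

```python
# Illustrative response-time-effort indicator for (dis)engagement detection.
# All timing data and the 10% threshold rule are assumptions for demonstration.
import numpy as np

def response_time_effort(times_sec, threshold_frac=0.10):
    """Per-student share of task actions that exceed a rapid-response threshold.

    times_sec: 2-D array, rows = students, columns = timed actions within a task.
    The threshold for each action is threshold_frac of that action's median time.
    """
    times = np.asarray(times_sec, dtype=float)
    thresholds = threshold_frac * np.median(times, axis=0)  # one threshold per action
    effortful = times >= thresholds                          # True = not a rapid response
    return effortful.mean(axis=1)                            # share of effortful actions

rng = np.random.default_rng(3)
engaged = rng.lognormal(mean=3.3, sigma=0.4, size=(50, 8))   # ~27 s typical actions
rushed = rng.lognormal(mean=0.3, sigma=0.4, size=(5, 8))     # ~1-2 s rapid actions
rte = response_time_effort(np.vstack([engaged, rushed]))
print("students flagged as disengaged (RTE < 0.8):", int((rte < 0.8).sum()))
```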