Included are abstracts from the most recently awarded cooperative agreements under the NAEP Research and Development Program.
NAEP Reading and Mathematics Assessments have been scored by human readers since the introduction of such items into the program. Advances in automated scoring have reached the point where scoring responses to content-specific constructed-response items with an automated scoring engine may produce benefits over hand scoring. The proposal is to conduct a study using hand-scored responses to recently administered Reading and Mathematics NAEP items to (1) determine whether an automated scoring engine can accurately predict the scores assigned by human readers, (2) compare the costs and benefits of automated scoring vs. human scoring, and (3) establish a protocol for future automated scoring research.
The investigators will employ the automated scoring engine Project Essay Grade (PEG®) to create and validate scoring models for Reading and Mathematics items. Specifically, the investigators will draw samples of student responses in both content areas from recent administrations of NAEP. Using these samples, models will be constructed to replicate the scores assigned by human readers. The models will then be cross-validated by applying them to independent samples of student responses. The team will compare the scores assigned by PEG with the scores assigned by human readers, reporting results in terms of raw score agreement rates, correlation coefficients, and kappa coefficients. A cost-benefit analysis of automated scoring will be conducted: the team will compare not only the reliability of automated scores with that of human scores but also the cost of deriving them.
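The three agreement statistics named above (raw agreement, correlation, and Cohen's kappa) can be sketched with a few lines of standard-library Python. The score vectors below are hypothetical placeholders, not NAEP data, and the functions are textbook formulas rather than anything specific to PEG:

```python
from collections import Counter

def raw_agreement(a, b):
    # Proportion of responses where the two raters assign the same score.
    return sum(x == y for x, y in zip(a, b)) / len(a)

def pearson_r(a, b):
    # Pearson product-moment correlation between the two score vectors.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

def cohens_kappa(a, b):
    # Agreement corrected for chance: (p_observed - p_expected) / (1 - p_expected).
    n = len(a)
    po = raw_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
    return (po - pe) / (1 - pe)

# Hypothetical 0-3 rubric scores for ten constructed responses.
human  = [0, 1, 2, 2, 1, 0, 3, 2, 1, 2]
engine = [0, 1, 2, 1, 1, 0, 3, 2, 2, 2]

print(round(raw_agreement(human, engine), 2))  # → 0.8
print(round(cohens_kappa(human, engine), 2))   # → 0.71
```

The gap between raw agreement (0.8) and kappa (0.71) illustrates why the study reports both: kappa discounts the agreement two raters would reach by chance alone given their score distributions.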
Having completed these studies, the team will work with NAEP staff and officials to map out a program of research in which to apply automated scoring to remaining content areas.
NAEP is the only comparative assessment of academic competencies regularly administered to nationally representative samples of students enrolled in Grades 4, 8, and 12. However, because NAEP is a low-stakes assessment, there are long-standing questions about the engagement and effort of participating students and, consequently, about the validity of the reported results. For assessment results to be valid, students must be sufficiently engaged with the assessment tasks for their responses to reflect their knowledge and skills accurately. The proposed project investigates the effects of engagement-enhanced features (EEFs) on the performance of 8th graders on Science and Technology and Engineering Literacy (TEL) assessments closely modeled on the most recent NAEP tests, in order to evaluate the promise of EEFs for reducing the likelihood that scores from future NAEP administrations underestimate students’ knowledge and skills. More specifically, the project aims to identify problematic items in NAEP assessments through analysis of process data from a sample of current NAEP assessments and to explore empirically ways to increase student engagement in future NAEP assessments. Four classes of EEFs will be developed for selected NAEP items, each focusing on one of the following engagement facets: relevance, authenticity, cognitive complexity, and self-assessment. The EEF and original NAEP items will be tested and validated with students through cognitive labs using multimodal measures of student engagement.
The project team will cooperate with NCES project staff on all activities, including prioritizing the NAEP domains and item types to be included in the project; identifying data from the most recent Science and TEL assessments; developing and validating the EEFs through cognitive labs with students; conducting the overall analysis and reporting; and collaborating on operational recommendations for further research and development in NAEP.
Accurate assessment of educational progress remains challenging, mainly because of varying levels of student preparedness and the wide variety of individual learning and expression styles. For example, some students score better on a large number of multiple-choice problems but perform poorly on a small number of constructed responses. Other students are just the opposite: they can produce excellent, thorough constructed responses but struggle to switch attention quickly among a large number of simple test problems. Therefore, an exam testing students’ knowledge must contain an adequate number of problems with varied specifications, such as subject areas, difficulty levels, and formats (multiple choice or constructed response). In the absence of a systematic, scientific approach to test building, exams often fail to reconcile these multiple goals and criteria: they may focus excessively on some subject areas while scarcely covering others; alternatively, they may be too focused on a particular format, or be too difficult or too simple.
The project goal is to improve the quality of tests, so that they assess student knowledge accurately, by providing a scientifically sound methodology and software for test development. The proposed methodology is based on optimization algorithms that produce the best possible examination design given specified goals, constraints, and psychometric data. Novel, user-friendly software implementing the methodology will be delivered. The software will place the mathematically challenging modeling and implementation issues in the “backend” and provide a user-friendly interface, or “frontend,” that controls the specifications of the tests to be built.
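The flavor of optimization described here can be sketched as a toy automated test assembly problem. The item pool, constraint, and objective below are all hypothetical stand-ins (a production tool would use a mixed-integer programming solver over a real calibrated pool, not brute-force enumeration), but the structure (maximize a psychometric objective subject to content constraints) matches the approach the abstract describes:

```python
from itertools import combinations

# Hypothetical item pool: (id, subject area, difficulty, discrimination).
pool = [
    ("A1", "algebra",  0.3, 0.9),
    ("A2", "algebra",  0.6, 0.7),
    ("G1", "geometry", 0.4, 0.8),
    ("G2", "geometry", 0.7, 0.6),
    ("N1", "numbers",  0.5, 0.9),
    ("N2", "numbers",  0.8, 0.5),
]

FORM_LENGTH = 3        # number of items on the assembled form
TARGET_DIFFICULTY = 0.5  # desired mean difficulty of the form

def feasible(form):
    # Content constraint: the form must cover at least two subject areas.
    return len({item[1] for item in form}) >= 2

def objective(form):
    # Maximize total discrimination, penalizing deviation of the
    # form's mean difficulty from the target.
    mean_diff = sum(item[2] for item in form) / len(form)
    total_disc = sum(item[3] for item in form)
    return total_disc - abs(mean_diff - TARGET_DIFFICULTY)

# Brute-force search over all feasible forms (fine for a 6-item pool;
# real pools require an integer-programming solver).
best = max(
    (form for form in combinations(pool, FORM_LENGTH) if feasible(form)),
    key=objective,
)
print([item[0] for item in best])  # → ['A1', 'G1', 'N1']
```

Swapping the brute-force `max` for a solver such as an ILP formulation is what lets this scale to realistic pools with hundreds of items and many simultaneous constraints, which is the mathematically challenging part the abstract proposes to hide in the “backend.”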
As a result of the proposed effort, the education community will have an easy-to-use tool, TestDesign, capable of assisting specialists in preparing high-quality tests that advance the goals of the National Assessment of Educational Progress.