This week, we’re able to share a new working paper, Do Survey Questions Have a Mood? Linking Sentiment to Student Test-Taking Behaviors, from NAEP researchers and co-authors Soo Lee, Ruhan Circi, and Adam Hearn. The working paper explores how test-taking behaviors are impacted by the sentiment of assessment questions, as well as how AI-powered sentiment analysis can help to analyze survey item responses such as those collected by NAEP survey questionnaires. The full text of the working paper is included below; subscribe to the NAEP R&D mailing list to stay up to date as this research develops and to access more opportunities to learn about NAEP data analysis.
Do Survey Questions Have a Mood? Linking Sentiment to Student Test-Taking Behaviors
Survey questions are critical for collecting data on student characteristics, perceptions, and opinions in large-scale assessments. They provide both quantitative and qualitative information, helping educators and researchers identify trends, attitudes, and behavioral patterns. A key strength of survey questions is their ability to capture nuanced responses that reveal students' emotional states and cognitive engagement during assessments.
Schulz and Carstens (2020) noted that large-scale assessments, whether paper-based or digital, often rely on written surveys because they are cost-effective and easy to administer. Because these surveys are written, their text is the primary means of conveying to students what type of information is being sought. Question phrasing is typically tailored to the subject being assessed; a mathematics questionnaire, for example, might include the statement “I enjoy thinking about new solutions to problems.” As Hooper (2021) highlights, designing these questionnaires is a complex process that requires balancing multiple objectives and stakeholder interests.
Survey items inherently convey sentiment. Natural Language Processing (NLP) techniques, specifically sentiment analysis, provide a means to extract and quantify the sentiment embedded in these questions. Doing so might help stakeholders better understand how students react to questions that measure concepts and indices in different subject domains. Furthermore, exploring the relationships among sentiment scores, student performance, and test-taking behaviors might provide insights that can improve survey design and enhance the effectiveness of surveys as formative assessment tools.
Combining the sentiment values of survey items with recent methodological advances opens new opportunities. It enables us to examine how the sentiment embedded in domain-specific questions, such as those in mathematics, relates to students’ performance and response processes in assessments like NAEP. This intersection of language models and foundational data from large-scale assessments can offer deeper insights for improving survey item development and for understanding how students engage with these items.
Role of NLP for Survey Item Sentiments
Sentiment analysis involves extracting sentiments from text to understand opinions, views, and impressions (e.g., Ligthart et al., 2021; Mishra & Singh, 2021). It has become one of the most widely applied NLP tools for uncovering human sentiment and is now used across many fields, including education.
Wankhade et al. (2024) provide a comprehensive review of sentiment analysis, categorizing approaches by analysis level, data collection method, feature embedding, methodology, and evaluation approach. Their work covers sentiment analysis at multiple levels, including the document, sentence, phrase, and aspect levels. Methods range from manual approaches built on predefined lexicons to dictionary-based tools such as TextBlob (Loria et al., 2014) and VADER (Hutto & Gilbert, 2014). More recently, transformer-based language models such as BERT (Devlin et al., 2018) and GPT have gained prominence (e.g., Kheiri & Karimi, 2023). For classification tasks that label sentiment as negative, neutral, or positive, evaluation typically relies on tools such as the confusion matrix, which cross-tabulates predicted labels against reference labels.
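To make the dictionary-based approach and confusion-matrix evaluation concrete, here is a minimal, standard-library-only Python sketch. Everything in it is an illustrative assumption: the tiny LEXICON, the one-word negation rule, the 0.5 labeling threshold, the hypothetical helper names (sentence_score, to_label, confusion_matrix), and all example items and reference labels except the mathematics statement quoted earlier. It is far simpler than TextBlob or VADER and is not the method used in the working paper.

```python
from collections import Counter

# Hypothetical toy lexicon (word -> polarity weight), invented for illustration.
# Real tools such as VADER or TextBlob use much larger curated lexicons and more rules.
LEXICON = {
    "enjoy": 2.0, "like": 1.5, "confident": 1.5, "interesting": 1.0,
    "boring": -1.5, "nervous": -2.0, "difficult": -1.0, "worry": -1.5,
}
NEGATORS = {"not", "never", "no"}  # assumed negation words
LABELS = ("negative", "neutral", "positive")


def sentence_score(sentence: str) -> float:
    """Sum lexicon weights; a negator directly before a word flips that word's sign."""
    tokens = [t.strip(".,;:!?\"'").lower() for t in sentence.split()]
    score = 0.0
    for i, tok in enumerate(tokens):
        weight = LEXICON.get(tok, 0.0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            weight = -weight
        score += weight
    return score


def to_label(score: float, threshold: float = 0.5) -> str:
    """Map a continuous score to one of three classes (threshold is an assumption)."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


def confusion_matrix(gold, predicted):
    """Rows = reference labels, columns = predicted labels, both in LABELS order."""
    counts = Counter(zip(gold, predicted))
    return [[counts[(g, p)] for p in LABELS] for g in LABELS]


# Hypothetical items with hand-assigned reference labels (illustration only).
ITEMS = [
    ("I enjoy thinking about new solutions to problems.", "positive"),
    ("I feel nervous when I have to take a math test.", "negative"),
    ("I do not enjoy doing math homework.", "negative"),
    ("I use a calculator in math class.", "neutral"),
    ("I worry less about math than I used to.", "positive"),
]

gold = [ref for _, ref in ITEMS]
predicted = [to_label(sentence_score(text)) for text, _ in ITEMS]
matrix = confusion_matrix(gold, predicted)
accuracy = sum(matrix[i][i] for i in range(len(LABELS))) / len(ITEMS)

for label, row in zip(LABELS, matrix):
    print(f"{label:>8}: {row}")
print(f"accuracy = {accuracy:.2f}")
```

The last item lands off the diagonal: a word-level lexicon sees "worry" but misses that "less" reverses its meaning. Errors of this kind are one motivation for the contextual models such as BERT mentioned above.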
Recent studies indicate that the shift from traditional methods to AI-powered sentiment analysis has improved the precision of labeling student feedback. For instance, Shaik et al. (2023) explored both lexicon-based and corpus-based approaches for the unsupervised sentiment analysis of survey data, demonstrating the potential of these techniques for extracting valuable insights. Wen et al. (2014) also showed the utility of sentiment analysis in Massive Open Online Courses (MOOCs), using it to better understand student emotions expressed in discussion forums and course reviews, with the goal of improving the overall learning experience.
The National Assessment of Educational Progress (NAEP) survey questionnaires have long been used to investigate how student attitudes and behaviors relate to academic achievement. In this context, extracting sentiment at the sentence level and linking it to students' Likert-scale responses can provide valuable insights. Sentiment scores can be generated using dictionary-based approaches or pre-trained models and then related to students’ test-taking behaviors and performance. Research has shown that responses to negatively worded questions often carry negative sentiment (Oh et al., 2012; Olson et al., 2019), highlighting how language can shape student feedback.
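Linking item-level sentiment scores with Likert-scale responses can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the study's analysis: the item identifiers, sentiment scores, and 4-point Likert responses are made-up numbers used only to exercise the code, and the item-level Pearson correlation is a deliberate simplification. A real NAEP analysis would typically work at the student level and account for NAEP's sampling weights and plausible values, which this sketch ignores.

```python
from math import sqrt
from statistics import mean

# Hypothetical, made-up inputs used only to exercise the code:
# one sentiment score per survey item (e.g., from a lexicon or pre-trained model)
# and student responses per item on an assumed 4-point Likert scale
# (1 = strongly disagree ... 4 = strongly agree).
item_sentiment = {"item_A": 0.8, "item_B": -0.6, "item_C": 0.1, "item_D": -0.9}
likert_responses = {
    "item_A": [4, 3, 4, 3],
    "item_B": [2, 1, 2, 2],
    "item_C": [3, 2, 3, 3],
    "item_D": [1, 2, 1, 1],
}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Align items explicitly so each sentiment score is paired with the same item's responses.
items = sorted(item_sentiment)
sentiment = [item_sentiment[item] for item in items]
mean_response = [mean(likert_responses[item]) for item in items]
r = pearson(sentiment, mean_response)

for item, s, m in zip(items, sentiment, mean_response):
    print(f"{item}: sentiment={s:+.2f}, mean Likert={m:.2f}")
print(f"item-level Pearson r = {r:.3f}")
```

The same pairing step would apply to behavioral indicators (for example, hypothetical per-item response times) by replacing the Likert lists with those measures.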
Building on this intersection, our upcoming study analyzes the sentiment scores of math-specific ‘non-cognitive’ items and examines their relationship to student behavior and performance.
Stay tuned to the NAEP R&D Hub to learn more about our findings as we explore the role of sentiment analysis in survey item responses!
References
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. https://arxiv.org/abs/1810.04805
Hooper, M. (2021). Dilemmas in developing context questionnaires for international large-scale assessments. In T. Nilsen, A. Stancel-Piątak, & J.-E. Gustafsson (Eds.), International handbook of comparative large-scale studies in education (Springer International Handbooks of Education). Springer. https://doi.org/10.1007/978-3-030-38298-8_29-1
Hutto, C., & Gilbert, E. (2014). VADER: A parsimonious rule-based model for sentiment analysis of social media text. Proceedings of the International AAAI Conference on Web and Social Media, 8(1), 216–225. https://ojs.aaai.org/index.php/ICWSM/article/view/14550
Kheiri, K., & Karimi, H. (2023). SentimentGPT: Exploiting GPT for advanced sentiment analysis and its departure from current machine learning. arXiv preprint arXiv:2307.10234. https://arxiv.org/abs/2307.10234
Ligthart, A., Catal, C., & Tekinerdogan, B. (2021). Systematic reviews in sentiment analysis: A tertiary study. Artificial Intelligence Review, 54, 4997–5053. https://doi.org/10.1007/s10462-021-09973-3
Loria, S., Keen, P., Honnibal, M., Yankovsky, R., Karesh, D., & Dempsey, E. (2014). TextBlob: Simplified Text Processing. https://textblob.readthedocs.io/en/dev/
Mishra, A., & Singh, V. K. (2021). Applying sentiment analysis in educational contexts: A literature review. Journal of Educational Technology Development and Exchange, 14(1), 1–15.
Oh, J. H., Torisawa, K., Hashimoto, C., Kawada, T., De Saeger, S., & Wang, Y. (2012, July). Why question answering using sentiment analysis and word classes. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (pp. 368–378).
Olson, K., Smyth, J. D., & Ganshert, A. (2019). The effects of respondent and question characteristics on respondent answering behaviors in telephone interviews. Journal of Survey Statistics and Methodology, 7(2), 275–308. https://doi.org/10.1093/jssam/smy006
Schulz, W., & Carstens, R. (2020). Questionnaire development in international large-scale assessment studies. In H. Wagemaker (Ed.), Reliability and validity of international large-scale assessment (IEA Research for Education, Vol. 10). Springer. https://doi.org/10.1007/978-3-030-53081-5_5
Shaik, T., Tao, X., Dann, C., Xie, H., Li, Y., & Galligan, L. (2023). Sentiment analysis and opinion mining on educational data: A survey. Natural Language Processing Journal, 2, 100003.
Wankhade, M., Kulkarni, C., & Rao, A. C. S. (2024). A survey on aspect base sentiment analysis methods and challenges. Applied Soft Computing, 112249.
Wen, M., Yang, D., & Rosé, C. P. (2014). Sentiment analysis in MOOC discussion forums: What does it tell us? In Proceedings of the 7th International Conference on Educational Data Mining (pp. 130–137).