R&D Hub

Published on Friday, October 18, 2024

NAEP Researchers and NAEP Doctoral Internship Program Alumni at the FCSM Research and Policy Conference

The 2024 Federal Committee on Statistical Methodology (FCSM) Research and Policy Conference will be held October 22–24, 2024, in Hyattsville, Maryland. This year’s theme, “The Relevance, Timeliness, and Integrity of Federal Statistics,” reflects the FCSM’s role in providing a forum for statisticians to discuss issues of statistical methodology as they relate to federal agencies and statistical programs. In this post, we provide an overview of upcoming presentations at the FCSM conference from NAEP researchers and NAEP Doctoral Internship Program alumni.

Chatbot Evaluation: Methods and Challenges

As part of Session J-1, “Keeping AI out of Trouble: Guardrails and Applications for Federal Data,” at 1:45 p.m. on Thursday, October 24, Ruhan Circi (NAEP expert and principal data scientist at the American Institutes for Research [AIR]) and Bhashithe Abeysinghe (NAEP researcher at AIR and NAEP Doctoral Student Internship Program alumnus) will co-present “Chatbot Evaluation: Methods and Challenges.” Check out the summary below for a preview of their presentation:

Chatbot development is rapidly evolving due to advances in Large Language Model (LLM) APIs, with federal agencies launching LLM-based applications, as exemplified in Executive Order 13960. Creating reliable LLM-powered applications requires careful consideration of performance and ethical standards, as highlighted by authors like Srivastava et al. (2023). Evaluating these applications is crucial, incorporating both technical assessment and trust-oriented frameworks.

LLMs, fundamental to many applications, pose unique challenges in chatbot development, such as hallucination and tone issues. Given the widespread adoption of LLM-based generative applications like chatbots, robust evaluation systems are essential. Two main evaluation approaches, automated metrics and human evaluation, are commonly used in tandem. Because of time and cost constraints, balancing evaluation procedures without sacrificing accuracy is vital.
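As one concrete illustration of the automated side of that pairing, here is a minimal sketch that scores a chatbot response against a human-written reference using token-overlap F1; the metric choice and example strings are ours, for illustration only, and are not drawn from the presenters’ evaluation pipeline.

```python
# Minimal sketch of one automated chatbot-evaluation metric: token-overlap F1
# between a generated answer and a reference answer (SQuAD-style scoring).
# The example strings below are made up for illustration.
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    answer = "NAEP assesses what U.S. students know and can do in key subjects."
    reference = "NAEP measures what students in the United States know and can do."
    print(f"Token F1: {token_f1(answer, reference):.2f}")
```

An automated score like this is cheap to compute at scale, which is exactly why it is typically paired with slower, costlier human judgments rather than replacing them.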

Human evaluation, a crucial aspect that is often under-documented in research papers, remains integral to chatbot development. Additionally, diverse metrics assess different aspects of chatbot responses, yet the specificity of the testing data is often overlooked. Incorporating various question types informed by cognitive psychology frameworks could enhance systematic evaluation.

In our research, we propose improvements in evaluation techniques, informed by a cognitive psychology framework, to enhance chatbot response assessment. We will present our findings on an experimental chatbot utilizing Retrieval Augmented Generation (RAG) and Vector Databases, which contribute to the ongoing advancements in chatbot development.
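For readers unfamiliar with the RAG pattern mentioned above, the sketch below walks through its core loop in miniature: embed documents, store the vectors, retrieve the passages closest to a question, and hand them to an LLM as context. The bag-of-words “embedding,” the in-memory store, and the sample documents are toy stand-ins of our own; they are not the presenters’ experimental chatbot or vector database.

```python
# Toy, self-contained sketch of Retrieval Augmented Generation (RAG):
# embed -> store -> retrieve -> build a grounded prompt for an LLM.
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a lowercase term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


documents = [
    "NAEP is the National Assessment of Educational Progress.",
    "FCSM hosts a research and policy conference on federal statistics.",
    "Retrieval Augmented Generation grounds LLM answers in retrieved text.",
]
# In a production system this would be a real vector database.
vector_store = [(doc, embed(doc)) for doc in documents]

question = "What does RAG do for LLM answers?"
query_vec = embed(question)

# Retrieve the two passages most similar to the question.
top_passages = sorted(vector_store, key=lambda item: cosine(query_vec, item[1]), reverse=True)[:2]
context = "\n".join(doc for doc, _ in top_passages)

prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)  # In a real chatbot, this prompt would be sent to an LLM API.
```

Grounding the model’s answer in retrieved passages is one common way to reduce the hallucination problem noted earlier, though it does not eliminate it, which is why evaluation remains central.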

Algorithmic Bias: Developing a Multidimensional Framework

In the same 1:45 p.m. session, Ruhan Circi will return, joined by co-presenter Juanita Hicks (NAEP senior researcher at AIR and NAEP Doctoral Student Internship Program alumna), to present “Algorithmic Bias: Developing a Multidimensional Framework.” See the summary below for a preview:

In our rapidly advancing technological era, algorithms are used to extract meaning from data and play an increasingly important role in decision-making processes across federal agencies. The potential of data-driven methods is undeniable in virtually any industry, given the growing number of innovations in methods and technology. Although machine learning (ML) and artificial intelligence (AI) hold promise for improving outputs and reducing inequity (e.g., in health, Feehan et al., 2021), in practice they are also prone to propagating existing gaps or causing unintended harm by creating a feedback loop between data, algorithms, and users (e.g., Hooker, 2021).

Researchers approach algorithmic bias from two perspectives: (a) technical and (b) non-technical. Technical perspectives focus on data and model choices, architecture, and loss functions (e.g., Lalor et al., 2024). Non-technical perspectives cover transparency, interpretability, accountability, and ethical aspects (e.g., Cecere et al., 2024). Algorithmic bias is thus a multidimensional issue that requires a framework adopting an equally diverse, multidimensional perspective.

Our framework synthesizes insights from the growing literature on algorithmic bias by developing an index that incorporates both technical and non-technical approaches. We examine its implications and identify strategies to mitigate associated risks in the context of survey practices. We discuss our framework, which recognizes the complex nature of algorithmic bias and promotes fairness throughout ML/AI systems’ lifecycles. We illustrate the application of our framework using a real federal dataset on decision-making algorithms.
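To make the “technical” side of such a framework concrete, here is a minimal sketch of one familiar group-fairness check, the gap in positive-prediction rates between two groups (a demographic parity gap); the data are invented for illustration and are unrelated to the presenters’ index or the federal dataset they analyze.

```python
# Minimal sketch of one technical bias check: the demographic parity gap,
# i.e., the difference in positive-prediction rates between two groups.
# All numbers below are made up for illustration.
def positive_rate(predictions: list[int]) -> float:
    """Share of cases the model flags as positive."""
    return sum(predictions) / len(predictions)


# Hypothetical binary model outputs for cases from two groups.
group_a_preds = [1, 0, 1, 1, 0, 1]
group_b_preds = [0, 0, 1, 0, 0, 1]

parity_gap = positive_rate(group_a_preds) - positive_rate(group_b_preds)
print(f"Demographic parity gap: {parity_gap:.2f}")  # large gaps flag potential bias
```

A check like this captures only one narrow, technical dimension of bias, which is precisely why a multidimensional index also weighs non-technical dimensions such as transparency, accountability, and ethics.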

You can still register for the 2024 FCSM Research and Policy Conference and find a program with session details here. If you’re interested in joining the ranks of our NAEP Doctoral Internship Program alumni, you can learn more about the program, which now includes AI as its newest topic area, here. Keep following the NAEP R&D Hub and subscribe to our mailing list to stay up to date and be the first to apply for the next available cohort!
