Building Next-Generation STEM Assessments using Machine Learning Methodologies
This project is important because it illustrates a successful approach to building robust next-generation STEM assessments. It responds to a core gap in STEM assessment by expanding the diversity of tools that STEM educators have to measure what students know and whether our instructional methods are impacting learning. Specifically, this project developed a new tool--EvoGrader--which is a free, online, on-demand formative assessment service designed for use in undergraduate biology classrooms. EvoGrader's web portal is powered by Amazon's Elastic Cloud and run with LightSIDE Lab's open-source machine-learning tools. The EvoGrader web portal allows biology instructors to upload a response file (.csv) containing unlimited numbers of evolutionary explanations written in response to 86 different ACORNS (Assessing COntextual Reasoning about Natural Selection) instrument items. The system automatically analyzes the responses and provides detailed information about the scientific and naive concepts contained within each student's response, as well as overall student (and sample) reasoning model types. Graphs and visual models provided by EvoGrader summarize class-level responses; downloadable files of raw scores (in .csv format) are also provided for more detailed analyses.
The overarching goal of this project was to build a new model of biology assessment grounded in cognitive principles that employed machine learning technologies to interpret written responses to assessment items. The activities included: (1) building assessment items grounded in current understandings of cognition; (2) using Rasch methods to build measurement models; (3) developing machine-learning models for the analysis of text; and (4) building an online system to automatically analyze and report assessment results.
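The Rasch methods mentioned in activity (2) build measurement models in which person ability and item difficulty sit on a common logit scale. A minimal sketch of the dichotomous Rasch model is shown below; the function name and the example values are illustrative only, not the project's actual estimation code:

```python
import math

def rasch_p(theta, b):
    """Dichotomous Rasch model: probability that a person with ability
    theta answers an item of difficulty b correctly. Both parameters
    are expressed in logits on the same scale."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When ability equals item difficulty, the probability is exactly 0.5;
# higher ability (or an easier item) pushes the probability toward 1.
print(round(rasch_p(0.7, 0.7), 2))  # → 0.5
```

In practice, Rasch software estimates the theta and b parameters jointly from a full person-by-item score matrix; the function above only shows the response model those estimates feed into.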
Supervised machine learning is the core of EvoGrader. EvoGrader uses machine-learning methods to extract key concept scores, naive idea scores, and holistic reasoning model scores from text responses. An integral part of supervised machine learning in this case is a large corpus of explanations previously scored by domain experts. This corpus (i.e., the training set) helps the software "learn" what to look for in written explanations and lies at the heart of the EvoGrader portal.
To test how well the portal works, a series of experiments was performed. Comparisons were made among: (1) human-scored written explanations, (2) a widely used multiple-choice test, and (3) clinical oral interviews with students. Rasch analyses of scores indicated that computer-scored written explanation measures (1) have the strongest correspondence to oral interview measures; (2) are capable of capturing students' normative scientific and naive ideas as accurately as human-scored explanations; and (3) more validly detect understanding than the multiple-choice assessment. These findings demonstrate the great potential of machine-learning tools for assessing key scientific practices.
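The supervised-learning idea described above can be sketched in miniature: train a classifier on a corpus of expert-scored explanations, then predict whether a new explanation contains a given key concept. The sketch below uses a bag-of-words Naive Bayes classifier and a tiny hypothetical training set labeled for the "heritable variation" concept; EvoGrader itself uses LightSIDE's tools and a far larger expert-scored corpus, so the texts, labels, and function names here are illustrative only:

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train(corpus):
    """corpus: list of (text, label) pairs with labels 0/1.
    Returns per-label word counts and label frequencies."""
    counts = {0: Counter(), 1: Counter()}
    labels = Counter()
    for text, label in corpus:
        labels[label] += 1
        counts[label].update(tokenize(text))
    return counts, labels

def predict(text, counts, labels):
    """Return the label with the higher log posterior (add-one smoothing)."""
    vocab = set(counts[0]) | set(counts[1])
    best, best_lp = None, float("-inf")
    for label in labels:
        lp = math.log(labels[label] / sum(labels.values()))  # log prior
        total = sum(counts[label].values())
        for w in tokenize(text):
            lp += math.log((counts[label][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Hypothetical expert-scored training set
# (1 = explanation mentions heritable variation, 0 = naive reasoning)
corpus = [
    ("individuals in the population vary in beak size", 1),
    ("heritable variation exists among the finches", 1),
    ("the birds needed longer beaks so they grew them", 0),
    ("the species wanted to adapt to the environment", 0),
]
counts, labels = train(corpus)
print(predict("there is variation in beak size among individuals", counts, labels))  # → 1
```

A production system layers much richer feature extraction and model selection on top of this basic train-then-predict loop, but the dependence on a large expert-scored corpus is the same.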
The key outcomes of this project include: (1) a better understanding of the cognitive processes that undergird thinking about a core idea in biology; (2) a new assessment tool (ACORNS) to measure these cognitive processes; (3) a proof of concept of the utility of machine-learning tools for STEM educators through the development of EvoGrader; and (4) a free-to-use portal for biology educators to more effectively measure student reasoning.
In the past year the first article introducing the website was published (Moharreri et al. 2014). This project currently has instructors from eight institutions using the new assessment tool, and many more are uploading data to the website. More than 10,000 student responses have been analyzed in the two years since the website was built. The website provides one of the first free, online tools for assessing text-based scientific explanations in undergraduate students.
Working across disciplinary boundaries--cognitive psychology, psychometrics, computer science, and biology education--is a remarkably difficult task. Differences in disciplinary language, research norms, and project conceptualizations necessitate large amounts of time to build unified understanding among stakeholders. Extensive communication and numerous meetings (many more than anticipated) were essential to creating a shared vision and achieving project outcomes. It can take a year to 'line up the gears' on a large, interdisciplinary project, but once they are lined up remarkable innovation can emerge. Without sufficient time for stakeholders to understand each other, it is unlikely that innovation will emerge.
Moharreri, K., Ha, M., & Nehm, R. H. (2014). EvoGrader: An online formative assessment tool for automatically evaluating written evolutionary explanations. Evolution: Education and Outreach, 7(1), 15. doi:10.1186/s12052-014-0015-2
Beggrow, E. P., Ha, M., Nehm, R. H., Pearl, D., & Boone, W. J. (2014). Assessing scientific practices using machine-learning methods: How closely do they match clinical interview performance? Journal of Science Education and Technology, 23(1), 160-182. doi:10.1007/s10956-013-9461-9 (Science "Editor's Choice"; feature article in New Republic)
Opfer, J., Nehm, R. H., & Ha, M. (2012). Cognitive foundations for science assessment design: Knowing what students know about evolution. Journal of Research in Science Teaching, 49(6), 744-777.