Building a Big Data Analytics Workforce in iSchools

Project No.
PI Name
Jungwoo Ryoo
The Pennsylvania State University

Abstract 1

Presentation Type
Jungwoo Ryoo, Penn State; Soo-yong Byun, Penn State


The significance of this project lies in introducing big data analytics into the education landscape. Demand for skilled personnel in big data industries is increasing, but existing university-level big data curricula focus primarily on students with a strong computational background. This leaves out a large segment of students who might otherwise pursue education and training in this vital area, and who will face big data issues in the workplace regardless.


This project aims to address the national demand for professionals with knowledge of big data and to broaden the pool for a big data analytics workforce. Part of this effort will involve research into whether the newly developed learning modules are more effective at increasing students' big data competencies (e.g., knowledge, skills, and analysis).


This project will develop three innovative learning modules. These modules will be designed to: (i) use both group-based and contextualized learning methods and (ii) be applicable and accessible to students majoring in disciplines outside of, but related to, mainstream computer science (e.g., those offered in iSchools).


The first module will involve digital exercises in which students will be asked to develop their own narratives about the relevance and significance of big data in solving real-life problems, and through which they will be expected to become knowledgeable about, and proficient with, big data concepts and applications. The second module will be more technical in nature and will allow students to discover the efficacy of big data concepts in solving practical problems in information security. Finally, the third module will introduce more advanced topics in big data mining, such as examining large amounts of complex data to unearth important patterns and knowledge, and interpreting the results to arrive at appropriate decisions in a specific context.

Analysis of the research question on learning effectiveness will employ quasi-experimental designs that use pretests and posttests with control groups. Students will choose a course section without knowing which section will include the new learning modules. In analyzing the data, a hierarchical linear modeling approach will be used to evaluate the effectiveness of the intervention.
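
As a rough illustration of the hierarchical linear modeling approach mentioned above, one possible two-level specification treats students (i) as nested within course sections (j). The abstract does not state the model, so everything in the form below is an assumption rather than the project's actual analysis. That includes the pretest score as a covariate, a single section-level treatment indicator, and a random intercept only.

```latex
% Hypothetical two-level model; all symbols are illustrative assumptions.
% Level 1 (student i in section j): posttest regressed on pretest
Y_{ij} = \beta_{0j} + \beta_{1} X_{ij} + r_{ij}, \qquad r_{ij} \sim N(0, \sigma^{2})

% Level 2 (section j): section intercept depends on treatment status
\beta_{0j} = \gamma_{00} + \gamma_{01} T_{j} + u_{0j}, \qquad u_{0j} \sim N(0, \tau_{00})

% Combined model, obtained by substituting Level 2 into Level 1
Y_{ij} = \gamma_{00} + \gamma_{01} T_{j} + \beta_{1} X_{ij} + u_{0j} + r_{ij}
```

In this sketch, Y_{ij} is the posttest score and X_{ij} the pretest score. T_{j} equals 1 for sections that include the new learning modules and 0 for control sections. Under these assumptions, \gamma_{01} would capture the pretest-adjusted difference between intervention and control sections, while u_{0j} absorbs the remaining section-level variation.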

Broader Impacts

The project will have a direct impact on increasing the number and diversity of undergraduate students with computational competencies and big data skills. Initially, our proposal will contribute to training undergraduate students in 4-year iSchools and 2-year community colleges. Eventually, the pedagogical findings and teaching materials will benefit general computing education, with a focus on big data analytics, in broader scientific and engineering communities throughout the nation. The big data analytics materials developed through this proposal will be freely available from our project web site. We will also make a conscious effort to build a community of like-minded big data educators through both online and offline forums. While our proposal initially draws ideas and materials from the NSF-initiated CS 10k project, its success will in return contribute to increasing the effectiveness of computing education in high schools (e.g., by contributing developed materials to the CS 10k community).

Unexpected Challenges

Not applicable


Not applicable

Project Page