Written by Grace Li
The capacity of computational infrastructures has exploded over the past few decades, propelling the volume and nuance of data collection. Data mining, the application of algorithms to extract insights from these large reservoirs of data, has advanced industries like commerce and the applied sciences; however, the applications of data mining in the fields of pedagogy and education are limited. In particular, the applications of Educational Data Mining (EDM) in traditional classroom settings have long been under-researched and overlooked. This literature review systematically evaluates current applications of EDM to traditional education settings. Additionally, it identifies the core data sources in traditional classrooms, evaluates the leading EDM techniques, outlines the realistic applications of these techniques, and uncovers underdeveloped domains to be considered for future areas of research.
KEYWORDS: Data mining, education technology, systematic review
This paper was made possible through the support and guidance of Mr. Christopher Walleck, project lead at Microsoft, and Dr. Casey Roehrig, project lead at HarvardX and Harvard preceptor. Mr. Walleck provided extremely valuable insights into the nuances behind implementing data mining technologies. He graciously guided the contents of the paper to clearly articulate its conclusion to the general public. Dr. Roehrig shared, through an interview, insights into the status quo of technology use in classrooms, particularly for higher-level institutions. Sincerest thanks to both parties, without whom this research would not have been possible.
Data mining techniques have been deployed over the past half-century to revolutionise the worlds of commerce and science. The modern-day advent of big data makes it possible to extract increasingly granular insights from large volumes of data generated from student learning. These data points range from simple summative information to nuanced qualitative data. While ventures like detailed customer profiling and accurate drug discovery have brought about cutting-edge developments in their respective realms, the potential applications of data mining in the field of education have remained largely theoretical.
Educational data mining (EDM), the application of data mining techniques to pedagogy (teaching), is an interdisciplinary approach that weaves together machine learning, statistical analysis, and data mining to evaluate knowledge retention and cognitive psychology. The applications of EDM can be broadly grouped into three realms:
- Traditional Education: physical classroom settings
- e-Learning: virtual lessons, pre-recorded sessions, and/or Massive Open Online Courses (MOOCs)
- Intelligent Tutoring Systems (ITS): AI-powered ‘smart’ tutoring systems that mimic human tutors
While there is no shortage of research evaluating EDM potential for e-Learning environments and ITS, very few researchers focus on the applications of EDM in traditional classroom settings. This little-understood intermediary position, however, may actually be the most important step to bridge the present and the future of education technology. Therefore, this paper aims to analyse this intermediary bridge and summarise the applications of EDM in the average classroom.
The specific applications of EDM are tailored to the end-user who, generally speaking, can belong to one of four distinct groups:
- Students and/or learners
- Teachers and/or instructors
- Course/curriculum developers
- Institutions and/or administrators
This literature review focuses on how EDM can assist course developers and instructors. Through analysis of the most common EDM techniques, filtering methodology, and future research opportunities, this literature review seeks to answer the following research question:
How can EDM help educators and course developers maximise the effectiveness of pedagogy in traditional classrooms?
To answer this research question, this paper explores sources of data, promising data mining algorithms, and promising EDM applications from said data mining techniques.
REVIEW OF LITERATURE
This paper outlines detailed search procedures to ensure the rigour and validity of the literature review presented. Google Scholar was used as the primary data source in order to include both published and pre-printed peer-reviewed studies.
A. Established Search Terminology
The systematic review was conducted to understand the state-of-the-art through the use of key search terms such as Educational Data Mining, traditional education systems, clustering, classification, optimization, regression, association rule mining, and classroom applications to locate relevant papers. A total of 221 papers were identified in this manner.
B. Searching Strategy
To ensure holistic representation, synonyms and plural forms of the key search terms were also considered. Boolean operators such as NOT, AND, and OR, as well as wildcards, were used as tools to specify target search results.
C. Inclusion Criteria
To be included in the literature review, the article, journal, book, or webpage must detail how EDM can facilitate instructors and curriculum planners in traditional classrooms.
D. Exclusion Criteria
Any articles, journals, books, or webpages from the review which analysed how EDM could enhance ITS, e-learning systems, or any other topics outside the scope of the research question were rejected from consideration in this literature review.
The process of identifying primary sources is outlined in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) model, as seen in Fig. 1 below.
Fig. 1. PRISMA Model of Literature Review
The process of understanding how educators can use data mining algorithms to extract hidden information from big data can be broken down by first identifying possible data sources, then analysing EDM techniques, and finally explaining likely applications in traditional classrooms.
A. Identifying Data Sources
- Educational Data Platforms
Web-based platforms such as Google Classroom, Moodle, and Canvas allow teachers to aggregate multimedia course resources into one accessible location. Rather than a platform to teach students content, these sites are designed to help classrooms keep their resources organized. Over the past decade, these technologies have been integrated into mainstream education systems. They have the potential to collect massive volumes of data by analysing students’ clickstreams: identifying the most popular resources, navigation streams, times of login, reading speed, most highlighted text, length of stay, etc. Included under this category of digital platforms are digital textbooks such as Kognity, CourseSmart, and Inkling. These resources can likewise track user clickstream and accumulate statistics around how users interact with the digital media to internalise new material.
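As a minimal sketch of how such clickstream records might be summarised, the Python example below counts views per resource and averages the length of stay. The record layout, resource names, and values are hypothetical and do not reflect the export format of any particular platform.

```python
from collections import Counter, defaultdict

# Hypothetical clickstream records: (student_id, resource, seconds_on_page).
# The layout and values are invented for illustration only.
events = [
    ("s1", "unit1_notes.pdf", 120),
    ("s2", "unit1_notes.pdf", 300),
    ("s1", "quiz1_review", 45),
    ("s3", "unit1_notes.pdf", 60),
    ("s2", "quiz1_review", 90),
    ("s3", "syllabus", 15),
]


def summarise_clickstream(events):
    """Return (view counts per resource, mean seconds spent per resource)."""
    views = Counter()
    total_seconds = defaultdict(int)
    for _student, resource, seconds in events:
        views[resource] += 1
        total_seconds[resource] += seconds
    mean_seconds = {r: total_seconds[r] / views[r] for r in views}
    return views, mean_seconds


views, mean_seconds = summarise_clickstream(events)
most_popular = views.most_common(1)[0][0]  # resource with the most views
```

Summaries of this kind are only the raw material for the techniques discussed later; on their own they describe usage rather than learning.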
- Short Answer Assessments
The second group of data sources for EDM analysis is course assessments. Digital or physical evaluations in the form of quizzes, tests, or exams give valuable insights into the way students answer questions under pressure. As such, they can also be analysed to extract underlying patterns. This is true of digital assessments in particular because they allow for more granular data extraction: users’ eye movement, mouse patterns, and answer switches, for example, can all be evaluated.
- Text or Speech data
In-class long-answer responses in oral or written format can also be used to evaluate learners’ comprehension. This is accomplished through text mining, which is an interdisciplinary field of linguistics, statistics, and data mining.
- Explicit Student Surveys
While student surveys are a widely established medium to collect learner feedback, they are a much more involved approach than the three methods identified above and result in a smaller volume of data collected. In addition to this handicap, surveys also only represent the student’s self-perceived interaction with their learning environment. Rather than reporting students’ unfiltered responses to educational material, survey responses are inevitably censored. Therefore, their potential utility is not discussed in this literature review.
B. Exploring Data Mining Techniques
Data mining is an essential component of the Knowledge Discovery in Databases (KDD) framework. As defined by Fayyad, Piatetsky-Shapiro, and Smyth, while KDD refers to the entire workflow of data preparation, selection, cleansing, and interpretation, data mining refers only to applying algorithms to extract relevant patterns from data. In other words, data mining is simply the process of implementing algorithms to search for insights in data: scavenging for the needle in the haystack.
Fig. 2. Visual description of the Knowledge Discovery in Databases process.
The most popular data mining algorithms include association rule mining, classification, clustering, regression, decision trees, artificial neural networks, and nearest neighbours. Below, the fundamentals of each are succinctly explained to create a complete understanding of the EDM process.
- Association Rule Mining
Association rule mining is a data mining technique that extracts information phrased in the following manner: “If transaction T includes A, then transaction T also likely includes B,” where A and B are sets of items drawn from the same database D. The strength of this implication is called the confidence of the rule, and it is summarised by the following conditional probability: $P(B \subseteq T \mid A \subseteq T)$. This A ⇒ B formulation is one of the most commonly used methods of EDM because it identifies strong associations between items within transactions, subject to a clear minimum confidence threshold [9, 10].
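To make the confidence measure concrete, the following Python sketch computes the support and confidence of a single rule over a handful of transactions. The item names and the transactions are hypothetical; a practical system would also search over candidate rules (for example with the Apriori algorithm) rather than score one rule by hand.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent in T | antecedent in T) from the transactions."""
    a = set(antecedent)
    both = a | set(consequent)
    n_a = sum(1 for t in transactions if a <= set(t))
    if n_a == 0:
        return 0.0  # the rule never fires, so no evidence either way
    n_both = sum(1 for t in transactions if both <= set(t))
    return n_both / n_a


# Hypothetical per-student "transactions" of observed behaviours.
transactions = [
    {"missed_homework", "low_quiz"},
    {"missed_homework", "low_quiz", "late_login"},
    {"missed_homework"},
    {"late_login"},
    {"low_quiz"},
]

rule_confidence = confidence(transactions, {"missed_homework"}, {"low_quiz"})
rule_support = support(transactions, {"missed_homework", "low_quiz"})
```

Here two of the three students who missed homework also scored low on the quiz, so the rule would be kept only if the chosen minimum confidence threshold is at most two thirds.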
- Classification
Classification, another popular and well-established data mining technique, uses labelled data to train machine learning algorithms to accurately sort large volumes of data into groups. This means that data points with clearly pre-defined characteristics are given to a machine to sort and classify, often in the form of a training set; the more quality training data the algorithm is trained on, the more accurately it classifies new data.
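The idea of learning from a labelled training set can be sketched with a nearest-centroid classifier, chosen here only because it is one of the simplest supervised methods. The features (homework average, attendance percentage), the labels, and the data are all hypothetical.

```python
from math import dist


def train_centroids(samples):
    """Average the feature vectors of each label in the labelled training set."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc) for label, acc in sums.items()}


def classify(centroids, features):
    """Assign the label whose centroid is closest to the new data point."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical labelled data: (homework average, attendance %) -> label.
training_set = [
    ((90, 95), "on_track"),
    ((85, 90), "on_track"),
    ((50, 60), "at_risk"),
    ((40, 70), "at_risk"),
]

centroids = train_centroids(training_set)
new_student_label = classify(centroids, (80, 88))
```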
- Clustering
While classification groups data points into predefined classes, clustering algorithms group together data points that are similar. Clustering is an unsupervised learning technique where the dataset is not pre-labelled with clear characteristics; this is in contrast to classification, which exploits labelled data. In this regard, clustering can be less labour-intensive and thus more practical than classification algorithms.
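A minimal sketch of clustering without labels is given below, using k-means on one-dimensional test scores. The scores and the starting centroids are hypothetical, and fixed starting centroids are assumed so that the result is reproducible; practical implementations usually choose them at random.

```python
def k_means_1d(values, initial_centroids, iterations=10):
    """Group numbers around centroids by repeated assign-and-average steps."""
    centroids = list(initial_centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical unlabelled test scores.
scores = [52, 55, 58, 81, 85, 90]
centroids, clusters = k_means_1d(scores, initial_centroids=[50, 90])
```

No label such as "strong" or "weak" is supplied; the two groups emerge from the similarity of the scores alone.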
- Regression
Regression algorithms find relationships between one or more independent variables and a dependent variable. Thus, when given the independent variable(s), a regression model can predict the value of the dependent variable or, for a categorical outcome, the likelihood of its occurrence.
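For a single independent variable, the relationship can be sketched with ordinary least squares. The example below fits a straight line to hypothetical study-hours and test-score data and then predicts a score for an unseen number of hours; the variable names and values are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: weekly study hours and the resulting test score.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 70, 75]

slope, intercept = fit_line(hours, scores)
predicted_score = slope * 6 + intercept  # prediction for 6 study hours
```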
- Decision Trees
Decision trees that handle both categorical (classification) and numerical (regression) targets are known as Classification and Regression Trees (CART). These tree-shaped algorithms can be used to deduce optimal multistage decision-making pathways.
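The core step of growing a classification tree is choosing the split that best separates the classes. The sketch below, under the assumption of a single numeric feature and the Gini impurity criterion commonly used in CART, finds the best threshold for one node; a full tree would repeat this step recursively. The attendance values and outcomes are hypothetical.

```python
def gini(labels):
    """Gini impurity: 0.0 when every label in the group is the same."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best 'value <= threshold' split."""
    best = None
    for threshold in sorted(set(values)):
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        if not left or not right:
            continue  # a split must send data to both branches
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical data: attendance percentage and course outcome.
attendance = [55, 60, 65, 80, 85, 95]
outcome = ["fail", "fail", "fail", "pass", "pass", "pass"]

threshold, impurity = best_split(attendance, outcome)
```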
- Artificial Neural Networks
Neural networks, a subset of machine learning techniques, are modelled loosely after the human brain and have the incredible ability to detect patterns in data that are too nuanced for humans to understand and label. After training on a dataset, neural networks can extract detailed features from the dataset to later sort and classify large volumes of data, which can then be used in future to make accurate predictions based on inputs.
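The training loop behind a neural network can be illustrated with its simplest building block, a single artificial neuron (perceptron). This is a deliberately reduced sketch: real networks stack many such units and use gradient-based training. The two binary features and the target rule are hypothetical.

```python
def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Adjust weights whenever the neuron's output disagrees with the target."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, target in samples:
            activation = sum(w * x for w, x in zip(weights, features)) + bias
            error = target - (1 if activation > 0 else 0)
            weights = [w + learning_rate * error * x for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias


def perceptron_predict(weights, bias, features):
    """Output 1 if the weighted sum of inputs exceeds zero, else 0."""
    return 1 if sum(w * x for w, x in zip(weights, features)) + bias > 0 else 0


# Hypothetical data: (did_homework, attended_review) -> passed the quiz.
# In this toy set a student passes only when both are true.
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(samples)
predictions = [perceptron_predict(weights, bias, f) for f, _ in samples]
```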
- K-Nearest Neighbour
The nearest neighbour algorithm is a data mining technique that classifies new cases based on how similar they are to stored cases. This mode of pattern recognition is a non-parametric approach that implements distance functions to classify and predict new categories.
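A minimal k-nearest-neighbour sketch follows, assuming Euclidean distance and a simple majority vote among the k closest stored cases. The features (weekly study hours, last test score), the labels, and the data are hypothetical, and in practice the features would usually be rescaled so that no single one dominates the distance.

```python
from collections import Counter
from math import dist


def knn_predict(stored_cases, query, k=3):
    """Label a new case by majority vote among its k nearest stored cases."""
    nearest = sorted(stored_cases, key=lambda case: dist(case[0], query))[:k]
    votes = Counter(label for _features, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical stored cases: (weekly study hours, last test score) -> label.
stored_cases = [
    ((2, 50), "needs_support"),
    ((3, 55), "needs_support"),
    ((1, 45), "needs_support"),
    ((8, 85), "independent"),
    ((9, 90), "independent"),
    ((7, 80), "independent"),
]

new_case_label = knn_predict(stored_cases, (8, 82), k=3)
```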
C. Pinpointing Physical Classroom Applications
Data mining applied to Education (EDM) can be found in five main forms:
- Evaluate and improve current teaching methods
EDM techniques have been shown to effectively evaluate instructors’ teaching approaches. Logistic regression, for example, is a classification model that predicts the probability of categorical events based on one or more independent variables. It is used to compare and contrast the effect of teachers’ pedagogical techniques. The learning interventions assessed in this way include scaffolding, hints, and delayed instruction.
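The following sketch shows, under simplifying assumptions, how a logistic regression with one independent variable could relate an intervention to a categorical outcome. It is fitted by plain gradient descent rather than by the procedures used in the cited studies, and the data (number of hints given, whether the concept was later mastered) are hypothetical.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, steps=5000):
    """Fit P(y = 1 | x) = sigmoid(weight * x + bias) by gradient descent."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(weight * x + bias) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias


# Hypothetical data: hints given during practice, and mastery (1) or not (0).
hints_given = [0, 1, 2, 3, 4, 5]
mastered = [0, 0, 1, 0, 1, 1]

weight, bias = fit_logistic(hints_given, mastered)
p_no_hints = sigmoid(bias)
p_five_hints = sigmoid(weight * 5 + bias)
```

A positive fitted weight would suggest that, in this toy data, more hints go together with a higher probability of mastery; it does not by itself establish that the hints caused the improvement.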
In addition to providing feedback on active teaching, EDM can also inform teachers of specific student learning patterns from their homework and test completion data. For example, web services such as the IBM WebSphere Portal allow teachers to record students’ testing and homework performance. In return, the platform can alert teachers to noteworthy performance patterns, such as students performing well on homework but poorly on tests. Such alerts can quantitatively indicate to teachers whether pivots are necessary for their homework, testing, or other pedagogical frameworks.
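A pattern alert of this kind can be sketched as a simple rule over recorded averages. The example below is not the interface of any real platform: the record layout, the student names, and the 20-point gap threshold are all assumptions chosen for illustration.

```python
def flag_homework_test_gap(records, gap=20):
    """Return students whose homework average exceeds their test average by `gap` or more."""
    flagged = []
    for name, homework_avg, test_avg in records:
        if homework_avg - test_avg >= gap:
            flagged.append(name)
    return flagged


# Hypothetical records: (student, homework average, test average).
records = [
    ("Avery", 92, 65),
    ("Blake", 78, 74),
    ("Casey", 88, 60),
]

alerts = flag_homework_test_gap(records)
```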
Likewise, a decision tool powered by clustering and association rule mining aids teachers in improving their teaching techniques based on inputs of historical student success and failure rates. These input data vary in granularity, ranging from quizzes to unit assessments to final exams, to provide a multidimensional analysis of teachers’ instructional impacts [24, 25].
- Evaluate and improve current assessment methods
Going further than interpreting the meaning of test results in the context of student performance, EDM also allows teachers to assess the effectiveness of the testing models themselves. Hierarchical clustering algorithms have shown great potential in assessing the relationships between tested concepts based on multiple-choice answers, as well as grouping the test questions accordingly. These algorithms can create concept models that illustrate the relationships between tested concepts, such as objectively measuring the similarity between concepts.
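As an illustrative sketch, questions can be grouped by how similarly students answered them. The example below assumes single-linkage agglomerative clustering with Hamming distance over per-student correctness vectors, which is one simple variant of hierarchical clustering and not necessarily the one used in the cited studies. The question names and responses are hypothetical.

```python
def hamming(a, b):
    """Number of students whose correctness differs between two questions."""
    return sum(x != y for x, y in zip(a, b))


def agglomerate(items, n_clusters):
    """Merge the two closest clusters (single linkage) until n_clusters remain."""
    clusters = [[name] for name in items]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(
                    hamming(items[a], items[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical data: 1 = correct, 0 = incorrect, one entry per student.
responses = {
    "q1_fractions": [1, 1, 0, 0, 1, 0],
    "q2_fractions": [1, 1, 0, 0, 1, 1],
    "q3_geometry": [0, 0, 1, 1, 0, 1],
    "q4_geometry": [0, 1, 1, 1, 0, 1],
}

question_groups = agglomerate(responses, n_clusters=2)
```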
More than interpreting test questions, EDM can also save instructors administrative work by generating entire unit assessments automatically. Clustering algorithms can analyse the underlying relationships between questions and their respective topics to create scoring matrices. These can then create unit assessments that are representative of semester content.
Written and digital assessments aside, data mining techniques like text mining and content analysis can examine spoken responses to identify target responses. This can aid teachers in objectively grading the content of oral assignments, where assessments of the students’ message can often be unconsciously influenced by separate superficial qualities such as speaking style, appearance, or surrounding distractions.
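A very reduced sketch of identifying target content in a response is shown below. It assumes the spoken answer has already been transcribed to text and simply checks which target terms appear; real text mining would also handle synonyms, word forms, and context. The response and the list of target terms are hypothetical.

```python
import re


def concept_coverage(response, target_terms):
    """Return the target terms found in the response and the fraction covered."""
    tokens = set(re.findall(r"[a-z']+", response.lower()))
    found = sorted(term for term in target_terms if term in tokens)
    return found, len(found) / len(target_terms)


# Hypothetical transcribed answer and the concepts a teacher is listening for.
transcript = "Photosynthesis uses sunlight and water to make glucose."
targets = ["sunlight", "water", "chlorophyll", "glucose"]

found_terms, coverage = concept_coverage(transcript, targets)
```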
- Organising teaching materials (e-learning platforms, classwork)
In tandem with improving instruction and evaluation methods, EDM has also shown potential in helping teachers organise their content modules for optimal learning effectiveness. Whether the content sits on resource portals such as Canvas, Moodle, and Google Classroom or concerns the order of homework and assessments, data mining techniques such as clustering and association rule mining can indicate to teachers the most effective organisation technique, as judged by students’ test performances.
- Organising new students
Optimisation algorithms like genetic clustering algorithms and association rule mining can aid in the process of grouping students into classes. This time-saving technique can help teachers and administrators with course scheduling and project allocation by using students’ learning levels as inputs to generate the optimal course schedule to suit students’ needs.
- Managing extra-help students
Finally, classification algorithms, neural networks, decision trees, and logistic regression have all been shown to be extremely effective in predicting student dropout [34, 35, 36, 37]. This gives teachers buffer time to allocate more time and attention towards students who need it, as well as to select students for remedial classes. K-means clustering algorithms also allow teachers to compare and contrast the work habits of stronger versus weaker students, which allows them to offer particularly insightful learning tips to those who are struggling in the class.
A. Barriers to Application
Despite these promising potential applications of educational data mining to the future of classrooms, traditional learning environments have largely moved at a snail’s pace in terms of implementation. Due to a historical combination of poor management, psychological aversion to change, and lack of institutional funding, the integration of technology in classrooms is often seen by teachers as a daunting process that is simply not worth the trouble. This apprehension is extremely significant because teachers’ perceptions of education technology drive their implementation of integrated learning systems.
According to the literature, teachers’ concerns range from a personal concern of being ‘inadequate to meet the demands of new technology’ to worry for student consequences and the potential negative effects ‘too much technology’ can have on students [39]. Studies have also shown that it is difficult to change educators’ perspectives on technology: teachers must spend a minimum of three years immersed in a technology-rich teaching sphere in order to change their preconceived notions. In essence, technological integration in classrooms means different things to different teachers; therefore, the process of its introduction must also be personalised.
B. Future Research
At present, there exists a wealth of research surrounding the theoretical applications and potential success of educational data mining in classrooms. The problem lies in the great shortage of research concerning how we can help introduce these technologies into classrooms so that both students and educators do not feel threatened by or apprehensive about the new technology. Further research concerning the psychology of change in classrooms, educator workshops, and introductory educational data mining training for teachers and students alike deserves a great deal more attention in order to make technology-integrated systems a reality.
In this study, the measurement of student success was assigned based on assessment grades and retention of content. Though this is a fairly objective and quantitative metric, it raises a larger question: is this a good indication of student success? Teachers may choose to implement education technology in classrooms in different ways depending on what they think the answer to this question is. In other words, the applications of education technology in this literature review were biased towards measuring success in terms of course marks, and consequently, helping students to improve these.
As for the scope of this research paper, a systematic review methodology was used to maximise thoroughness and evaluation of all available studies. The number of research articles evaluated, however, is still limited because there are research studies that are not available through the Google Scholar search engine. Despite this barrier, this research paper believes that the rigorous PRISMA model used made the quality, volume, and variety of papers an adequately robust representation of the status quo of educational data mining in classrooms.
This paper conducted a detailed literature review of the applications of Educational Data Mining (EDM) to the future of traditional classroom learning. Through detailed search methodology and thorough rejection/acceptance criteria, the literature review is robustly representative of the major developments in the field over the last few decades. In summary, the literature review has identified that the principal modes of data collection in traditional classrooms include mainstream educational data platforms, short answer assessments, and text or speech responses. The leading EDM techniques include association rule mining, classification, clustering, regression, decision trees, artificial neural networks, and nearest neighbours. Application of these algorithms can grant educators insights into the effectiveness of employed pedagogical techniques, and help organise teaching materials, categorise students, allocate resources to extra-help students, and evaluate and improve assessment metrics.
 West DM. Big data for education: Data mining, data analytics, and web dashboards. Governance studies at Brookings 2012;4(1):1-10. https://doi.org/10.1016/b978-0-12-417319-4.00005-3
 Romero C, Ventura S. Educational data mining: a review of the state of the art. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 2010;40(6):601-618. https://doi.org/10.1109/tsmcc.2010.2053532
 PRISMA. (n.d.). Retrieved August 30, 2020, from http://www.prisma-statement.org https://doi.org/10.1107/s0108768104025947/bm5015sup2.hkl
Mohamad SK, Tasir Z. Educational data mining: A review. Procedia-Social and Behavioral Sciences 2013;97(2013):320-324. https://doi.org/10.1016/j.sbspro.2013.10.240
 Meyer D, Hornik K, Feinerer I. Text mining infrastructure in R. Journal of statistical software 2008;25(5):1-54. https://doi.org/10.18637/jss.v025.i05
 Baker R. Data mining for education. International encyclopedia of education 2010;7(3):112-118. https://doi.org/10.1016/b978-0-08-044894-7.01318-x
 Baradwaj BK, Pal S. Mining educational data to analyze students’ performance. arXiv preprint arXiv:1201.3417 2012. https://doi.org/10.2172/1044932
 Hipp J, Güntzer U, Nakhaeizadeh G. Algorithms for association rule mining—a general survey and comparison. ACM sigkdd explorations newsletter 2000;2(1):58-64. https://doi.org/10.1145/360402.360421
 Borkar S, Rajeswari K. Predicting students academic performance using education data mining. International Journal of Computer Science and Mobile Computing 2013;2(7):273-279. https://doi.org/10.5120/15022-3310
Anwar M, Ahmed N. Knowledge mining in supervised and unsupervised assessment data of students’ performance. 2011. https://doi.org/10.1007/s10618-011-0234-x
 Agarwal S, Pandey G, Tiwari M. Data mining in education: data classification and decision tree approach. International Journal of e-Education, e-Business, e-Management and e-Learning 2012;2(2):140. https://doi.org/10.7763/ijeeee.2012.v2.97
Abbas OA. Comparisons Between Data Clustering Algorithms. International Arab Journal of Information Technology (IAJIT) 2008;5(3). https://doi.org/10.34028/iajit/17/1/14
 Arabie P, De Soete G. Clustering and classification. World Scientific; 1996. https://doi.org/10.1142/1930
Lewis RJ. An introduction to classification and regression tree (CART) analysis. 2000. https://doi.org/10.7717/peerj.5365/table-2
 Safavian SR, Landgrebe D. A survey of decision tree classifier methodology. IEEE transactions on systems, man, and cybernetics 1991;21(3):660-674. https://doi.org/10.1109/21.97458
Hassoun MH. Fundamentals of artificial neural networks. MIT press; 1995. https://doi.org/10.1145/272874.1067696
Chua LO, Yang L. Cellular neural networks: Theory. IEEE Transactions on circuits and systems 1988;35(10):1257-1272. https://doi.org/10.1109/31.7600
Peterson LE. K-nearest neighbor. Scholarpedia 2009;4(2):1883. https://doi.org/10.4249/scholarpedia.1883
Dudani SA. The distance-weighted k-nearest-neighbor rule. IEEE Transactions on Systems, Man, and Cybernetics 1976(4):325-327. https://doi.org/10.1109/tsmc.1976.5408784
Kleinbaum DG, Dietz K, Gail M, Klein M, Klein M. Logistic regression. Springer; 2002. https://doi.org/10.1002/0471203076.emm0743
 Feng M, Beck JE, Heffernan NT. Using Learning Decomposition and Bootstrapping with Randomization to Compare the Impact of Different Educational Interventions on Learning. International Working Group on Educational Data Mining 2009. https://doi.org/10.1002/9781118998205.ch6
 Singley MK, Lam RB. The classroom sentinel: supporting data-driven decision-making in the classroom. 2005. p 315-321. https://doi.org/10.1145/1060745.1060793
Vranic M, Pintar D, Skocir Z. The use of data mining in education environment. 2007. IEEE. p 243-250. https://doi.org/10.1109/contel.2007.381878
Selmoune N, Alimazighi Z. A decisional tool for quality improvement in higher education. 2008. IEEE. p 1-6. https://doi.org/10.1109/ictta.2008.4530368
Chen C-M, Chen M-C. Mobile formative assessment tool based on data mining techniques for supporting web-based learning. Computers & Education 2009;52(1):256-273. https://doi.org/10.1016/j.compedu.2008.08.005
Barnes T. The Q-matrix method: Mining student response data for knowledge. 2005. Pittsburgh, PA: AAAI Press. p 1-8. https://doi.org/10.7554/elife.38992.009
 Madhyastha T, Hunt E. Mining Diagnostic Assessment Data for Concept Similarity. Journal of Educational Data Mining 2009;1(1):72-90.
 Spacco J, Winters T, Payne T. Inferring use cases from unit testing. 2006. p 1-7.
 Zhang K, Cui L, Wang H, Sui Q. An improvement of matrix-based clustering method for grouping learners in e-learning. 2007. IEEE. p 1010-1015. https://doi.org/10.1109/cscwd.2007.4281577
 Bin Ramli AA. Web usage mining using apriori algorithm: UUM learning care portal case. 2001. p 1-19. https://doi.org/10.5120/5584-7820
 Shen R, Han P, Yang F, Yang Q, Huang JZ. Data mining and case-based reasoning for distance learning. International Journal of Distance Education Technologies (IJDET) 2003;1(3):46-58. https://doi.org/10.4018/jdet.2003070104
 Zukhri Z, Omar K. Solving new student allocation problem with genetic algorithm: a hard problem for partition based approach. Int. J. Soft Comput. Appl 2008;2:6-15. https://doi.org/10.1109/scored.2007.4451368
 Wang Y-T, Cheng Y-H, Chang T-C, Jen S. On the application of data mining technique and genetic algorithm to an automatic course scheduling system. 2008. IEEE. p 400-405. https://doi.org/10.1109/iccis.2008.4670852
 Kotsiantis SB, Pierrakeas C, Pintelas PE. Preventing student dropout in distance learning using machine learning techniques. 2003. Springer. p 267-274. https://doi.org/10.1007/978-3-540-45226-3_37
 Bresfelean VP, Bresfelean M, Ghisoiu N, Comes C-A. Determining students’ academic failure profile founded on data mining methods. 2008. IEEE. p 317-322. https://doi.org/10.1109/iti.2008.4588429
 Superby J-F, Vandamme J, Meskens N. Determination of factors influencing the achievement of first-year university students using data mining methods. 2006. Citeseer. p 234. https://doi.org/10.1080/09645290701409939
 Dekker GW, Pechenizkiy M, Vleeshouwers JM. Predicting Students Drop Out: A Case Study. International Working Group on Educational Data Mining 2009. https://doi.org/10.7554/elife.38992.009
 Ma Y, Liu B, Wong CK, Yu PS, Lee SM. Targeting the right students using data mining. 2000. p 457-464. https://doi.org/10.7717/peerj.1247/supp-7
 Perera D, Kay J, Koprinska I, Yacef K, Zaïane OR. Clustering and sequential pattern mining of online collaborative learning data. IEEE Transactions on Knowledge and Data Engineering 2008;21(6):759-772. https://doi.org/10.1109/tkde.2008.138
 Mills SC. Integrating computer technology in classrooms: Teacher concerns when implementing an integrated learning system. 1999. Association for the Advancement of Computing in Education (AACE). p 1429-1434. https://doi.org/10.18122/td/1525/boisestate
 Levin T, Wadmany R. Teachers’ beliefs and practices in technology-based classrooms: A developmental view. Journal of research on technology in education 2006;39(2):157-181. https://doi.org/10.1080/15391523.2006.10782478
About the author
Grace is a 16-year-old high school student who is fascinated by the world of Education Technology. She is constantly looking for new ways to learn and explore the world around her through the use of cutting edge novel technologies. In her free time, she loves watching Ted Talks, listening to audiobooks, and French baking.