J. Today’s Ideas - Tomorrow’s Technol.

A Review on Prediction of Academic Performance of Students at-Risk Using Data Mining Techniques

Preet Kamal, Sachin Ahuja


Data Mining (DM), Educational data mining (EDM), Education System

PUBLISHER The Author(s) 2017. This article is published with open access at www.chitkara.edu.in/publications

Educational data mining is the process of converting raw data collected from educational databases into useful information. It can help in framing and answering research questions such as predicting students’ academic performance and identifying the factors that affect it. It can also help teachers understand the difficulties students face with the course content and the complexity of the subject, so that they can take timely action to control the dropout rate. This also includes improving the teaching-learning process so that interventions can be made at the right time to improve student performance. This paper reviews the research done in the field of educational data mining on the prediction of students’ performance. It considers the factors that influence student performance, i.e. the type of classroom attended (traditional or online), socio-economic background, the educational background of the family, attitude toward studies, and the challenges faced during the course. These factors lead to the categorization of students into three groups: “Low-Risk”, who have a high probability of succeeding; “Medium-Risk”, who may succeed in their examination; and “High-Risk”, who have a high probability of failing or dropping out. The paper elaborates on different ways to improve the teaching-learning process by providing students with personal assistance, notes, class assignments and special class tests. The most efficient techniques used in educational data mining, such as classification, regression, clustering and prediction, are also reviewed.
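The three-group categorization can be illustrated with a minimal Python sketch. It assumes that some predictive model has already produced a probability of passing for each student. The cut-off values, the function name `risk_group` and the sample students are hypothetical and are not taken from the reviewed studies, which do not state numeric thresholds here.

```python
# Hypothetical cut-offs (assumptions for illustration only).
LOW_RISK_MIN = 0.7   # at or above this probability: likely to succeed
HIGH_RISK_MAX = 0.4  # at or below this probability: likely to fail or drop out


def risk_group(pass_probability: float) -> str:
    """Map a predicted probability of passing to one of the three risk groups."""
    if not 0.0 <= pass_probability <= 1.0:
        raise ValueError("pass_probability must be between 0 and 1")
    if pass_probability >= LOW_RISK_MIN:
        return "Low-Risk"
    if pass_probability > HIGH_RISK_MAX:
        return "Medium-Risk"
    return "High-Risk"


# Made-up predictions for three hypothetical students.
predictions = {"student_a": 0.92, "student_b": 0.55, "student_c": 0.18}
groups = {name: risk_group(p) for name, p in predictions.items()}
```

In practice the thresholds would need to be tuned to the institution and the model in use; the sketch only shows how a predicted probability could be turned into the labels used for targeted interventions.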


Educational Data Mining (EDM) is the application of Data Mining (DM) to educational data, and its objective is to analyze different types of data in order to resolve educational research issues [16]. Data Mining is the process of extracting useful and important information from data sets. Organizations, scientists and governments have used it for many years on data such as airline passenger records and census records [2]. The volume of educational data has increased with the advancement of technology, and it can be handled using Data Mining techniques. Educational institutes are also becoming automated with the help of advanced technologies.

Educational research in Data Mining also contributes substantially to predictive technologies. Data Mining rests on the premise that historical data holds hidden, previously unknown information, and uncovering it is the challenging part of data prediction. Data analysis is one way of forecasting growth or decline in academic performance. The use of the internet and e-learning in education has benefited students. Offline education, on the other hand, is the medium for exchanging knowledge and developing skills through face-to-face interaction, in which the tutor can easily observe a student’s attitude towards his or her studies. Data mining techniques can be applied to data such as students’ behavior towards their studies, their academic performance, their family background and the data collected from students in classroom interactions. Such data help to create student models. E-learning and Learning Management Systems (LMS) combine online instruction and communication with administration and reporting tools. Intelligent Tutoring Systems (ITS) and Adaptive Educational Hypermedia Systems (AEHS), which acquire background knowledge about teaching strategy and student behavior, are a few examples of student models [17].

Page(s) 30–39
URL http://dspace.chitkara.edu.in/jspui/bitstream/123456789/5/1/jotitt.2017.51002.pdf
ISSN Print : 2321-3906, Online : 2321-7146
DOI 10.15415/jotitt.2017.51002

The experiments have been carried out in open universities in countries other than India. The research in this field relates mainly to subjects such as psychology, mathematics, history and home science; very few studies have focused on technical courses such as computers. The majority of the studies predict student performance from demographic, academic and social factors that are alien to the Indian environment. Since Indian culture and lifestyle are different, a separate study is needed to relate such findings accurately to the Indian educational system. There is a need to explore the Indian education system further so that the factors affecting students’ performance can be studied in the Indian context.

  • Razzaq, Leena, and Neil T. Heffernan. “Scaffolding vs. hints in the Assistment System” International Conference on Intelligent Tutoring Systems. Springer Berlin Heidelberg, 2006.
  • Han, J., Kamber, M. (2006). Data Mining: Concepts and Techniques, Morgan Kaufmann Publisher.
  • Feng, Mingyu, and Neil T. Heffernan. “Informing teachers live about student learning: Reporting in the assistment system” Technology Instruction Cognition and Learning 3.1/2 pp. 63 (2006).
  • Tahir, Syed, and S. R. Naqvi. “FACTORS AFFECTING STUDENTS’ PERFORMANCE.” Bangladesh e-journal of sociology 3, no. 1 pp. 2 (2006).
  • Mierswa, Ingo, Michael Wurst, Ralf Klinkenberg, Martin Scholz, and Timm Euler. “Yale: Rapid prototyping for complex data mining tasks.” In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 935-940. ACM, 2006.
  • Nghe, Nguyen Thai, Paul Janecek, and Peter Haddawy. “A comparative analysis of techniques for predicting academic performance.” In 2007 37th Annual Frontiers In Education Conference-Global Engineering: Knowledge Without Borders, Opportunities Without Passports, pp. T2G-7. IEEE, 2007. http://dx.doi.org/10.1109/FIE.2007.4417993.
  • Superby, Juan-Francisco, J. P. Vandamme, and N. Meskens. “Determination of factors influencing the achievement of the first-year university students using data mining methods.” In Workshop on Educational Data Mining, vol. 32, p. 234. 2006.
  • Romero, Cristóbal, Sebastián Ventura, Pedro G. Espejo, and César Hervás. “Data mining algorithms to classify students.” In Educational Data Mining 2008.
  • Feng, Mingyu, Joseph Beck, Neil Heffernan, and Kenneth Koedinger. “Can an Intelligent Tutoring System Predict Math Proficiency as Well as a Standardized Test?.” In Educational Data Mining 2008.
  • Antunes, Cláudia. “Acquiring background knowledge for intelligent tutoring systems.” In Educational Data Mining 2008.
  • Amershi, Saleema, and Cristina Conati. “Combining Unsupervised and Supervised Classification to Build User Models for Exploratory.” JEDM-Journal of Educational Data Mining 1, no. 1 pp.18-71 (2009).
  • Baker, Ryan SJD, and Kalina Yacef. “The state of educational data mining in 2009: A review and future visions.” JEDM-Journal of Educational Data Mining 1, no. 1 pp. 3-17 (2009).
  • Ayers, Elizabeth, Rebecca Nugent, and Nema Dean. “A Comparison of Student Skill Knowledge Estimates.” International Working Group on Educational Data Mining (2009).
  • Hall, Mark, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten. “The WEKA data mining software: an update.” ACM SIGKDD explorations newsletter 11, no. 1 pp. 10-18 (2009).
  • Barnes, T., M. Desmarais, C. Romero, and S. Ventura. “Educational Data Mining 2009: 2nd International Conference on Educational Data Mining.” Proceedings Cordoba, Spain (2009).
  • Romero, Cristóbal, and Sebastián Ventura. “Educational data mining: a review of the state of the art.” IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 40, no. 6 pp. 601-618 (2010).
  • Razzaq, Leena, Jozsef Patvarczki, Shane F. Almeida, Manasi Vartak, Mingyu Feng, Neil T. Heffernan, and Kenneth R. Koedinger. “The Assistment Builder: Supporting the life cycle of tutoring system content creation.” IEEE Transactions on Learning Technologies 2, no. 2 pp. 157-166. (2009). http://dx.doi.org/10.1109/TLT.2009.23.
  • Ramaswami, M., and R. Bhaskaran. “A CHAID based performance prediction model in educational data mining.” arXiv preprint arXiv:1002.1144 (2010).
  • Kumar, S. Anupama, and M. N. Vijayalakshmi. “Efficiency of decision trees in predicting students’ academic performance.” In First International Conference, on Computer Science, Engineering and Applications, CS and IT, vol. 2, pp. 335-343. (2011).
  • Kumar, Varun, and Anupama Chadha. “An empirical study of the applications of data mining techniques in higher education.” International Journal of Advanced Computer Science and Applications 2, no. 3 (2011). http://dx.doi.org/10.14569/IJACSA.2011.020314.
  • Shih, Benjamin, Kenneth R. Koedinger, and Richard Scheines. “A response time model for bottom-out hints as worked examples.” Handbook of educational data mining pp. 201-212 (2011).
  • Yadav, Surjeet Kumar, Brijesh Bharadwaj, and Saurabh Pal. “Data mining applications: A comparative study for predicting students’ performance.” arXiv preprint arXiv:1202.4815 (2012).
  • Yadav, Surjeet Kumar, Brijesh Bharadwaj, and Saurabh Pal. “Data mining applications: A comparative study for predicting students’ performance.” arXiv preprint arXiv:1202.4815 (2012).
  • Kabakchieva, Dorina. “Student performance prediction by using data mining classification algorithms.” International Journal of Computer Science and Management Research 1, no. 4 pp. 686-690 (2012).
  • Tair, Mohammad M. Abu, and Alaa M. El-Halees. “Mining educational data to improve students’ performance: a case study.” International Journal of Informational 2, no.2 (2012).
  • Asiri, Mahdi, and Behrouz Minaei. “Predicting GPA and academic dismissal in LMS using educational data mining: A case mining.” In 6th National and 3rd International conference of e-Learning and e-Teaching, pp. 53-58. IEEE, 2012.
  • Wolff, Annika, Zdenek Zdrahal, Andriy Nikolov, and Michal Pantucek. “Improving retention: predicting at-risk students by analysing clicking behaviour in a virtual learning environment.” In Proceedings of the third international conference on learning analytics and knowledge, pp. 145-149. ACM, 2013.
  • Demšar, Janez, and Blaž Zupan. “Orange: Data Mining Fruitful and Fun-A Historical Perspective.” Informatica 37, no. 1 (2013).
  • Wolff, Annika, Zdenek Zdrahal, Drahomira Herrmannova, and Petr Knoth. “Predicting student performance from combined data sources.” In Educational Data Mining, pp. 175-202. Springer International Publishing, 2014.
  • Pandey, Mrinal, and Vivek Kumar Sharma. “A decision tree algorithm pertaining to the student performance analysis and prediction.” International Journal of Computer Applications 61, no. 13 (2013).
  • Huang, Shaobo, and Ning Fang. “Predicting student academic performance in an engineering dynamics course: A comparison of four types of predictive mathematical models.” Computers & Education 61 pp. 133-145. (2013). https://doi.org/10.1016/j.compedu.2012.08.015.
  • Romero, Cristóbal, Manuel-Ignacio López, Jose-María Luna, and Sebastián Ventura. “Predicting students’ final performance from participation in on-line discussion forums.” Computers & Education 68 pp. 458-472. (2013). https://doi.org/10.1016/j.compedu.2013.06.009.
  • Marquez-Vera, Carlos, Cristóbal Romero Morales, and Sebastián Ventura Soto. “Predicting school failure and dropout by using data mining techniques.” IEEE Revista Iberoamericana de Tecnologias del Aprendizaje 8, no. 1 pp. 7-14. (2013). https://doi.org/10.1109/RITA.2013.2244695.
  • Hlosta, Martin, Drahomira Herrmannova, Lucie Vachova, Jakub Kuzilek, Zdenek Zdrahal, and Annika Wolff. “Modelling student online behaviour in a virtual learning environment.” (2014).
  • Wolff, Annika, Zdenek Zdrahal, Drahomira Herrmannova, Jakub Kuzilek, and Martin Hlosta. “Developing predictive models for early detection of at-risk students on distance learning modules.” (2014).
  • Patil, Priyanka Anandrao, and R. V. Mane. “Prediction of Students Performance Using Frequent Pattern Tree.” In Computational Intelligence and Communication Networks (CICN), 2014 International Conference on, pp. 1078-1082. IEEE, 2014.
  • Ahmed, Abeer Badr El Din, and Ibrahim Sayed Elaraby. “Data Mining: A prediction for Students’ Performance Using Classification Method.” World Journal of Computer Application and Technology 2, no. 2 pp. 43-47 (2014).
  • Kuzilek, Jakub, Martin Hlosta, Drahomira Herrmannova, Zdenek Zdrahal, and Annika Wolff. “OU Analyse: analysing at-risk students at The Open University.” Learning Analytics Review pp. 1-16 (2015).
  • Ahmad, Fadhilah, Nur Hafieza Ismail, and Azwa Abdul Aziz. “The Prediction of Students’ Academic Performance Using Classification Data Mining Techniques.” Applied Mathematical Sciences 9, no.129 pp. 6415-6426. (2015).
  • Saxena, Ritika. “Educational data Mining: Performance Evaluation of Decision Tree and Clustering Techniques using WEKA Platform.” International Journal of Computer Science and Business Informatics (2015).
  • Deshpande, Akshay, Prashant Pimpare, Shashank Bhujbal, Abhishek Kommwar, and Jagruti Wagh. “Student Performance Analysis, Visualization and Prediction Using Data Mining Techniques.” Imperial Journal of Interdisciplinary Research 2, no. 5 (2016).
  • Puyalnithi, Thendral, V. Madhu Viswanatham, and Ashmeet Singh. “Comparison of Performance of Various Data Classification Algorithms with Ensemble Methods Using RAPIDA MINER.” International Journal 6, no. 5 (2016).
  • Rana, Shiwani, and Roopali Garg. “Evaluation of Students’ Performance of an Institute Using Clustering Algorithms.” International Journal of Applied Engineering Research 11, no. 5 pp. 3605-3609. (2016).
  • Ostrow, Korinn S., and Neil T. Heffernan. “Studying Learning at Scale with the ASSISTments TestBed.” In Proceedings of the Third (2016) ACM Conference on Learning@ Scale, pp. 333-334. ACM, 2016.
  • Asif, R., Haider, N. G., & Ali, S. A. (2016). Prediction of Undergraduate Students’ Performance using Data Mining Methods. International Journal of Computer Science and Information Security, 14(5), 374.
  • Kumar, M., Shambhu, S., & Aggarwal, P. (2016). Recognition of Slow Learners Using Classification Data Mining Techniques. Imperial Journal of Interdisciplinary Research, 2(12).
  • Ferrara, S., & Way, D. (2016). 2 Design and Development of End-of-Course Tests for Student Assessment and Teacher Evaluation. In Meeting the Challenges to Measurement in an Era of Accountability (p. 11). Routledge.
  • Alcala-Fdez, J., Garcia, S., Fernandez, A., Luengo, J., Gonzalez, S., Saez, J. A., ... & Herrera, F. (2016). Comparison of KEEL versus open source Data Mining tools: Knime and Weka software.
  • Read, J., Reutemann, P., Pfahringer, B., & Holmes, G. (2016). Meka: a multi-label/multi-target extension to weka. Journal of Machine Learning Research, 17(21), 1-5.
  • Ostrow, K. S., & Heffernan, N. T. (2016, April). Studying Learning at Scale with the ASSISTments TestBed. In Proceedings of the Third (2016) ACM Conference on Learning@ Scale (pp. 333-334). ACM.
  • Atinaf, W., & Petros, P. (2016). Socio Economic Factors Affecting Female Students Academic Performance at Higher Education. Health Care: Current Reviews, 4(163), 2.
  • Feng, M., & Roschelle, J. (2016, April). Predicting Students’ Standardized Test Scores Using Online Homework. In Proceedings of the Third (2016) ACM Conference on Learning@ Scale (pp. 213-216). ACM.