EDUCATIONAL DATA MINING
Identifying hidden student engagement patterns via clustering and dimensionality reduction.
This project analyzes the xAPI-Edu-Data dataset, which tracks student interactions with a learning management system (LMS). The goal is to identify hidden behavioral patterns that correlate with academic success or failure and to turn them into actionable insights for educational intervention.
Clusterability Check: A t-SNE visualization suggests that the dataset has clear, globular cluster structure, which supports the use of density-based and medoid-based algorithms.
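The clusterability check can be approximated along the lines below. This is a minimal sketch that assumes scikit-learn's `TSNE` and `StandardScaler`. It uses synthetic data in place of the xAPI-Edu-Data features, and the feature layout, group centres, and parameter values are illustrative assumptions, not the project's settings.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Hypothetical stand-in for four numeric engagement features
# (e.g. raised hands, visited resources, announcements viewed, discussion posts):
# three synthetic groups of 50 "students" each. Not the real dataset.
centers = np.array([[15, 20, 10, 20], [45, 60, 35, 40], [80, 85, 60, 70]], dtype=float)
X = np.vstack([rng.normal(c, 8.0, size=(50, 4)) for c in centers])

# Standardize so no single feature dominates the distances t-SNE works from.
X_std = StandardScaler().fit_transform(X)

# perplexity must be smaller than the number of samples; 30 is an assumed value.
tsne = TSNE(n_components=2, perplexity=30.0, init="pca", random_state=0)
embedding = tsne.fit_transform(X_std)  # one 2-D point per student
```

The resulting `embedding` can be drawn with, for example, `seaborn.scatterplot(x=embedding[:, 0], y=embedding[:, 1])`. Because t-SNE preserves local neighbourhoods rather than global distances, and can show apparent groups even in weakly structured data, the plot is best read as a visual hint rather than a formal test of cluster tendency.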
Noise & Outlier Diagnostics: (Left) k-Distance graph used to tune DBSCAN's eps parameter; (Right) Single Linkage Dendrogram showing that the apparent “outliers” are real students with unusual habits rather than system noise.
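Both diagnostics can be sketched as follows, assuming scikit-learn for the neighbour search and DBSCAN and SciPy for the single-linkage hierarchy. The data is synthetic. `k = 5` and the 95th-percentile rule for `eps` are assumptions that stand in for reading the elbow off the plotted k-distance curve.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
# Hypothetical stand-in for four numeric engagement features (not the real data).
centers = np.array([[15, 20, 10, 20], [45, 60, 35, 40], [80, 85, 60, 70]], dtype=float)
X = np.vstack([rng.normal(c, 8.0, size=(50, 4)) for c in centers])
X_std = StandardScaler().fit_transform(X)
n = len(X_std)

# k-distance graph: each point's distance to its k-th nearest neighbour, sorted.
k = 5  # assumed value, reused as DBSCAN's min_samples below
nn = NearestNeighbors(n_neighbors=k).fit(X_std)
dists, _ = nn.kneighbors(X_std)  # querying the fitted data: column 0 is the point itself
k_dist = np.sort(dists[:, -1])   # counts the point itself, as min_samples does; plot and find the elbow

# Scripted stand-in for the visual elbow: an assumed 95th-percentile cut-off.
eps = float(np.percentile(k_dist, 95))
labels = DBSCAN(eps=eps, min_samples=k).fit_predict(X_std)
n_noise = int((labels == -1).sum())  # DBSCAN labels noise points -1

# Single-linkage hierarchy on the same data; pass Z to
# scipy.cluster.hierarchy.dendrogram to draw the tree.
Z = linkage(pdist(X_std), method="single")
# Original observations (indices < n) that join the tree only in the last
# five merges are the isolated students worth inspecting individually.
late_singletons = [int(i) for row in Z[-5:] for i in row[:2] if i < n]
```

Reading both views together is what separates "far from everyone" from "noise": a student who merges late in the single-linkage tree but sits on a stable k-distance curve is an unusual but legitimate profile.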
Research Motivation
By clustering students based on engagement metrics (e.g., hand-raising, resource visits) and family support levels, we help educators design targeted intervention strategies for “at-risk” learners who might otherwise be overlooked.
Technical Methodology
- Outlier Analysis: Combined KNN Distance Plots and Single Linkage Dendrograms to verify that outlying points were students with distinct behavioral signatures, so they could be preserved for intervention analysis.
- Noise Identification: Used DBSCAN as a diagnostic tool to confirm the absence of system-level noise.
- Data Scaling: Applied Z-score Standardization to the numeric features only, keeping categorical features intact for the mixed-data algorithms.
- Clustering Optimization: Implemented Gower Distance for mixed-type data, followed by K-Medoids (PAM) and K-Prototypes.
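A minimal sketch of the Gower + K-Medoids step, written in plain NumPy so that both the distance and the PAM BUILD/SWAP phases are visible. The helper names `gower_matrix` and `pam` and the toy data are hypothetical. Features are equally weighted, and K-Prototypes is not shown. Because Gower divides each numeric difference by that feature's range, z-scoring the numeric columns first leaves the Gower distances unchanged. Standardization mainly affects the Euclidean-based steps, such as DBSCAN, t-SNE, and the numeric part of K-Prototypes.

```python
import numpy as np


def gower_matrix(num, cat):
    """Hypothetical helper: Gower distance with equal feature weights.
    num: (n, p) float array; cat: (n, q) array of category labels."""
    ranges = num.max(axis=0) - num.min(axis=0)
    ranges[ranges == 0] = 1.0  # constant column: its differences are all 0 anyway
    num_d = np.abs(num[:, None, :] - num[None, :, :]) / ranges  # each term in [0, 1]
    cat_d = (cat[:, None, :] != cat[None, :, :]).astype(float)  # 0 if same category, else 1
    return (num_d.sum(axis=2) + cat_d.sum(axis=2)) / (num.shape[1] + cat.shape[1])


def total_cost(D, medoids):
    # Sum of each point's distance to its nearest medoid.
    return D[:, medoids].min(axis=1).sum()


def pam(D, k, max_iter=100):
    """Hypothetical helper: Partitioning Around Medoids on a precomputed matrix D."""
    n = D.shape[0]
    # BUILD: start at the most central point, then greedily add the point
    # whose inclusion most reduces the total cost.
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:
        nearest = D[:, medoids].min(axis=1)
        gains = np.maximum(nearest[:, None] - D, 0.0).sum(axis=0)
        gains[medoids] = -np.inf
        medoids.append(int(np.argmax(gains)))
    # SWAP: apply the best medoid/non-medoid exchange until none lowers the cost.
    cost = total_cost(D, medoids)
    for _ in range(max_iter):
        best_cost, best_swap = cost, None
        for i in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids.copy()
                trial[i] = h
                c = total_cost(D, trial)
                if c < best_cost - 1e-12:
                    best_cost, best_swap = c, (i, h)
        if best_swap is None:
            break
        medoids[best_swap[0]] = best_swap[1]
        cost = best_cost
    labels = D[:, medoids].argmin(axis=1)
    return np.array(medoids), labels, cost


# Hypothetical toy data: 3 groups of 20 students with two numeric engagement
# counts (0-100 scale assumed) and one yes/no parent-survey flag.
rng = np.random.default_rng(0)
num = np.vstack([rng.normal(m, 3.0, size=(20, 2)) for m in ([10, 15], [50, 55], [90, 85])])
cat = np.array([["No"]] * 20 + [["Yes"]] * 40)

D = gower_matrix(num, cat)
medoids, labels, cost = pam(D, k=3)
```

The exhaustive SWAP search tries every medoid/non-medoid pair on each pass. That is fine for small datasets but scales poorly, so larger data usually calls for a faster PAM variant.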
Technical Stack
Algorithms: K-Medoids, K-Prototypes, DBSCAN, t-SNE
Distance & Scaling: Gower Distance, Z-score Standardization
Visualization: Seaborn, SciPy, Matplotlib