EDUCATIONAL DATA MINING

Identifying hidden student engagement patterns via clustering and dimensionality reduction.

This project analyzes the xAPI-Edu-Data dataset, which tracks student interactions with a learning management system (LMS). The goal is to uncover hidden behavioral patterns that correlate with academic success or failure and to turn them into actionable insights for educational intervention.

Clusterability Check: the t-SNE visualization indicates that the dataset has clear, globular cluster structure, supporting the use of density-based or medoid-based algorithms.
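A clusterability check of this kind can be sketched with scikit-learn's `TSNE`. This is a minimal sketch, not the project's code: `make_blobs` synthetic data stands in for the scaled numeric engagement features, and the perplexity and random seed are assumed values.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in (assumption) for numeric engagement features such as
# hand-raising counts and resource visits.
X, _ = make_blobs(n_samples=300, n_features=4, centers=3, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

# Embed into 2-D for visual inspection; perplexity must be below n_samples.
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
embedding = tsne.fit_transform(X_scaled)

# A scatter plot of embedding[:, 0] vs embedding[:, 1] (e.g. with Seaborn)
# gives the kind of view described in the caption.
print(embedding.shape)
```

t-SNE preserves local neighborhoods but distorts global distances and apparent cluster sizes. Its plot is therefore best treated as supporting evidence for cluster structure rather than a formal test.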
Noise & Outlier Diagnostics: Left: k-distance graph used to tune DBSCAN's eps parameter. Right: single-linkage dendrogram showing that the apparent “outliers” are real students with unusual habits rather than system noise.
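Both diagnostics in this figure can be reproduced roughly with scikit-learn's `NearestNeighbors` and `DBSCAN` and SciPy's `linkage`/`dendrogram`. This is a minimal sketch under stated assumptions: synthetic `make_blobs` data replaces the real features, `min_samples=5` is an assumed value, and the 95th-percentile cut is a hypothetical stand-in for reading the elbow off the k-distance plot by eye.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in (assumption) for the scaled numeric features.
X, _ = make_blobs(n_samples=300, n_features=4, centers=3, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

# --- k-distance graph for choosing DBSCAN's eps ---
min_samples = 5  # assumed value
# Querying the training points returns each point as its own first neighbor,
# so the last column is the distance to the (min_samples - 1)-th other point,
# matching DBSCAN's convention that min_samples counts the point itself.
nn = NearestNeighbors(n_neighbors=min_samples).fit(X_scaled)
distances, _ = nn.kneighbors(X_scaled)
k_distances = np.sort(distances[:, -1])  # plot this curve; look for the elbow

# Hypothetical stand-in for reading the elbow off the plot.
eps = float(np.percentile(k_distances, 95))

# --- DBSCAN as a noise diagnostic: label -1 marks noise points ---
labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X_scaled)
noise_count = int(np.sum(labels == -1))
print(f"eps={eps:.3f}, noise points={noise_count}")

# --- Single-linkage hierarchy for inspecting isolated points ---
Z = linkage(X_scaled, method="single")
# Points that join the tree only at large merge heights are outlier candidates.
# no_plot=True computes the dendrogram layout without requiring Matplotlib.
tree = dendrogram(Z, no_plot=True, truncate_mode="lastp", p=12)
print(Z[-5:, 2])  # heights of the last five merges
```

With Matplotlib available, plotting `k_distances` and calling `dendrogram(Z)` without `no_plot` produce the two panels described above.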

Research Motivation

By clustering students based on engagement metrics (e.g., hand-raising, resource visits) and family support levels, we help educators design targeted intervention strategies for “at-risk” learners who might otherwise be overlooked.

Technical Methodology

  • Outlier Analysis: Combined k-distance (KNN distance) plots and single-linkage dendrograms to verify that students with distinct behavioral signatures were retained for intervention analysis rather than discarded as outliers.
  • Noise Identification: Used DBSCAN as a diagnostic tool to confirm the absence of system-level noise.
  • Data Scaling: Applied z-score standardization to the numeric features while keeping the categorical features intact for the mixed-data algorithms.
  • Clustering Optimization: Implemented Gower Distance for mixed-type data, followed by K-Medoids (PAM) and K-Prototypes.
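The scaling, Gower distance, and K-Medoids steps can be sketched in plain NumPy and pandas. This is a minimal sketch under stated assumptions. The DataFrame is synthetic, and its column names are illustrative picks modeled on xAPI-Edu-Data fields (verify them against the actual CSV). Features are equally weighted in the Gower average. `gower_matrix` and `k_medoids_pam` are hypothetical helpers written for this example, a greedy BUILD step followed by best-improvement SWAP passes, not a library API. K-Prototypes is not shown.

```python
import numpy as np
import pandas as pd


def gower_matrix(df: pd.DataFrame, cat_cols: list[str]) -> np.ndarray:
    """Hypothetical helper: pairwise Gower distance, equal feature weights."""
    n = len(df)
    total = np.zeros((n, n))
    for col in df.columns:
        values = df[col].to_numpy()
        if col in cat_cols:
            # Categorical feature: 0 if the categories match, 1 otherwise.
            total += (values[:, None] != values[None, :]).astype(float)
        else:
            # Numeric feature: absolute difference divided by the column range.
            values = values.astype(float)
            col_range = values.max() - values.min()
            if col_range > 0:  # a constant column contributes 0
                total += np.abs(values[:, None] - values[None, :]) / col_range
    return total / df.shape[1]


def k_medoids_pam(D: np.ndarray, k: int, max_iter: int = 100):
    """Hypothetical helper: PAM (greedy BUILD + best-improvement SWAP) on D."""
    n = D.shape[0]
    # BUILD: start from the most central point, then greedily add the point
    # that most reduces the total distance to the nearest medoid.
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:
        nearest = D[:, medoids].min(axis=1)
        gains = np.maximum(nearest[:, None] - D, 0).sum(axis=0)
        gains[medoids] = -1.0  # never re-pick an existing medoid
        medoids.append(int(np.argmax(gains)))

    def cost(meds):
        return D[:, meds].min(axis=1).sum()

    best_cost = cost(medoids)
    for _ in range(max_iter):
        best_swap = None
        # SWAP: try replacing each medoid with each non-medoid; keep the best.
        for i in range(k):
            for candidate in range(n):
                if candidate in medoids:
                    continue
                trial = medoids.copy()
                trial[i] = candidate
                trial_cost = cost(trial)
                if trial_cost < best_cost - 1e-12:
                    best_cost, best_swap = trial_cost, trial
        if best_swap is None:
            break  # no improving swap: local optimum reached
        medoids = best_swap
    labels = np.argmin(D[:, medoids], axis=1)
    return np.array(medoids), labels


# Synthetic mixed-type data; column names are illustrative assumptions.
rng = np.random.default_rng(0)
n_per = 20
df = pd.DataFrame({
    "raisedhands": np.concatenate([rng.normal(m, 4, n_per) for m in (10, 50, 85)]),
    "VisITedResources": np.concatenate([rng.normal(m, 4, n_per) for m in (15, 55, 90)]),
    "ParentAnsweringSurvey": ["No"] * n_per + ["Yes"] * (2 * n_per),
    "StudentAbsenceDays": ["Above-7"] * n_per + ["Under-7"] * (2 * n_per),
})
num_cols = ["raisedhands", "VisITedResources"]
cat_cols = ["ParentAnsweringSurvey", "StudentAbsenceDays"]

# Z-score the numeric columns only; categorical columns stay as labels.
df[num_cols] = (df[num_cols] - df[num_cols].mean()) / df[num_cols].std(ddof=0)

D = gower_matrix(df, cat_cols)
medoids, labels = k_medoids_pam(D, k=3)
print(medoids, np.bincount(labels))
```

Gower distance divides each numeric difference by that column's range, so z-scoring the numeric columns first leaves this Gower matrix unchanged. Standardization still matters for the Euclidean-based steps, such as t-SNE, DBSCAN, and the numeric part of K-Prototypes.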

Technical Stack

Algorithms: K-Medoids, K-Prototypes, DBSCAN, t-SNE
Distance & Scaling: Gower Distance, Z-score Standardization
Visualization: Seaborn, SciPy, Matplotlib