An Explainable AI Framework for Voice Command Classification
DOI: https://doi.org/10.71366/ijwos03022659307

Keywords: Explainable AI, Voice Command Classification, SHAP, LIME, Grad-CAM, MFCC, Attention Mechanism, CNN, Phoneme Alignment, Interpretability, Deep Learning
Abstract
Voice-command classifiers are rarely scrutinized: because they work adequately most of the time, few users probe how they reach their decisions. For high-stakes applications, however, passive acceptance carries too much risk. Our goal was not to show that neural networks can recognize spoken commands, which is well established, but to extract from a trained model reasoning clear enough for clinicians, engineers, or regulators to act on. We trained a convolutional neural network augmented with an attention mechanism on MFCC features from Google Speech Commands v2, a corpus of more than one hundred thousand utterances across thirty-five labels, reaching 97.3% precision. On top of this model we applied three interpretability techniques: SHAP, LIME, and Grad-CAM. A separate alignment stage mapped their temporal relevance scores onto individual phonemes. The resulting explanations remained faithful to model behavior while staying plausible in phonetic terms. Unexpectedly, the three techniques diverged sharply on utterances near category boundaries.
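To make the pipeline described above concrete, the sketch below wires MFCC extraction into a small attention-augmented CNN and computes a Grad-CAM relevance map over the MFCC time-frequency plane. It is a minimal sketch, not the authors' published implementation: the layer sizes, n_mfcc=40, and the names mfcc_features, AttentiveCNN, grad_cam, and right.wav are illustrative assumptions, and the SHAP, LIME, and phoneme-alignment stages are omitted.

# Minimal sketch of the pipeline in the abstract: MFCC extraction, a CNN
# with soft attention over time frames, and a Grad-CAM relevance map.
# All architecture details and helper names here are illustrative
# assumptions, not the authors' implementation.
import librosa
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 35  # Google Speech Commands v2 label count

def mfcc_features(path, sr=16000, n_mfcc=40):
    """Load a one-second utterance and return a (1, n_mfcc, frames) tensor."""
    y, _ = librosa.load(path, sr=sr)
    y = librosa.util.fix_length(y, size=sr)              # pad/trim to 1 s
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return torch.from_numpy(m).float().unsqueeze(0)      # add channel dim

class AttentiveCNN(nn.Module):
    """Two conv blocks over the MFCC 'image', then attention over time."""
    def __init__(self, num_classes=NUM_CLASSES):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.attn = nn.Linear(64, 1)          # scores each time frame
        self.head = nn.Linear(64, num_classes)

    def forward(self, x):                     # x: (B, 1, n_mfcc, T)
        h = self.conv(x)                      # (B, 64, F', T')
        h = h.mean(dim=2).transpose(1, 2)     # pool freq axis -> (B, T', 64)
        w = F.softmax(self.attn(h), dim=1)    # attention weights over time
        ctx = (w * h).sum(dim=1)              # weighted temporal context
        return self.head(ctx), w.squeeze(-1)

def grad_cam(model, x, target_class):
    """Gradient-weighted activation map over the last conv layer."""
    acts, grads = {}, {}
    layer = model.conv[3]                     # second Conv2d in the stack
    fh = layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    bh = layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
    logits, _ = model(x)
    logits[0, target_class].backward()
    fh.remove(); bh.remove()
    weights = grads['g'].mean(dim=(2, 3), keepdim=True)  # pooled gradients
    cam = F.relu((weights * acts['a']).sum(dim=1))       # (B, F', T')
    return cam / (cam.max() + 1e-8)

# Usage on a hypothetical recording of the command "right":
x = mfcc_features("right.wav").unsqueeze(0)   # (1, 1, 40, ~32)
model = AttentiveCNN()
logits, attn = model(x)
cam = grad_cam(model, x, logits.argmax().item())

The time axis of such a map, projected back through the pooling layers to frame timestamps and intersected with phoneme boundaries from a forced aligner, would correspond to the separate phoneme-alignment stage mentioned in the abstract.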
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.


