The Power of Small Data

Advancing Voice-Based Machine Learning for Laryngeal Disease Diagnosis

Authors

  • Mohammadjavad Sayadi, Department of Computer Engineering, Faculty of Electrical and Computer Engineering, Technical and Vocational University (TVU), Tehran, Iran
  • Seyed Ali Fatemi Aghda (Corresponding Author), Student Research and Technology Committee, School of Health Management and Information Sciences, Iran University of Medical Sciences, Tehran, Iran
    https://orcid.org/0000-0002-4099-1879
    afatamy@yahoo.com
  • Malihe Sadeghi, College of Health Solutions, Arizona State University, Phoenix, USA
  • Behnoosh Valipour, MSc Student in Medical Informatics, Department of Health Information Management, School of Health Management and Information Sciences, Iran University of Medical Sciences, Tehran, Iran
  • Vijayakumar Varadarajan, Visiting Professor, School of Engineering, University of Diponegoro, Indonesia

Keywords:

Voice Disorders, Laryngeal Diseases, Speech Acoustics, Machine Learning, Transfer Machine Learning

Abstract

Introduction: Voice-based diagnosis of laryngeal diseases has emerged as a promising non-invasive approach in medical technology. However, clinical practice often suffers from limited datasets, making it difficult to train robust machine learning models. This study investigates the role of small data in enabling accurate and efficient detection of laryngeal disorders through voice analysis.

Materials and Methods: A machine learning framework was developed that extracts acoustic features including mel-frequency cepstral coefficients, jitter, shimmer, harmonics-to-noise ratio, and spectral descriptors. To overcome small-data limitations, data augmentation strategies, transfer learning from pre-trained speech models, and robust cross-validation were applied. The system was trained and evaluated on a limited set of voice samples collected from patients with diverse laryngeal conditions and from healthy controls.
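
As an illustration of two of the voice quality features named above, the following minimal sketch computes local jitter and local shimmer using the common definition: the mean absolute difference between consecutive cycles, divided by the mean value. It is not the authors' implementation. The function name local_perturbation and the sample cycle measurements are hypothetical, and the sketch assumes that glottal cycle periods and peak amplitudes have already been extracted from the recording.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between consecutive cycle values,
    divided by the mean value (the usual 'local' jitter/shimmer form)."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


# Hypothetical per-cycle measurements (assumed, not taken from the study).
periods_ms = [5.0, 5.1, 4.9, 5.0, 5.2]            # glottal cycle durations
peak_amplitudes = [0.80, 0.78, 0.82, 0.79, 0.81]  # per-cycle peak amplitude

jitter = local_perturbation(periods_ms)        # cycle-to-cycle period variation
shimmer = local_perturbation(peak_amplitudes)  # cycle-to-cycle amplitude variation

print(f"local jitter:  {jitter:.4f}")
print(f"local shimmer: {shimmer:.4f}")
```

Both values are dimensionless ratios and are often reported as percentages; a perfectly periodic signal gives zero for both.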

Results: Despite the restricted dataset size, the proposed models achieved competitive performance. The CNN with transfer learning reached an average accuracy of 86%, F1-score of 83%, and AUC of 0.90, outperforming classical approaches such as SVM and Random Forest. Augmentation improved generalization and minority class detection, while feature engineering highlighted the discriminative power of voice quality parameters. Error analysis revealed challenges in detecting mild disorders and borderline cases, but overall results confirmed the feasibility of small-data approaches.
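
For readers less familiar with the reported metrics, the sketch below shows how accuracy, F1-score, and AUC can be computed for a binary classifier. It is an illustration under stated assumptions, not the study's evaluation code: the abstract does not say whether F1 was binary or class-averaged, so the binary form is assumed here, and the labels, scores, and the 0.5 decision threshold are hypothetical.

```python
def accuracy_f1(y_true, y_pred):
    """Accuracy and binary F1, treating label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = disordered voice) and classifier scores.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

acc, f1 = accuracy_f1(y_true, y_pred)
area = auc(y_true, scores)
print(f"accuracy={acc:.3f}  F1={f1:.3f}  AUC={area:.3f}")
```

Accuracy and F1 depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason the three are usually reported together on small, imbalanced datasets.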

Conclusion: This research underscores the transformative role of small data in advancing voice-based machine learning for laryngeal disease diagnosis. By demonstrating that effective diagnostic systems can be built with limited samples, the study opens new pathways for clinical applications where large datasets are impractical. The approach contributes to democratizing AI-driven healthcare solutions, making them more accessible, scalable, and clinically relevant in real-world medical contexts.

Published

2025-12-30

Section

Original Research Articles

How to Cite

Sayadi M, Fatemi Aghda SA, Sadeghi M, Valipour B, Varadarajan V. The Power of Small Data: Advancing Voice-Based Machine Learning for Laryngeal Disease Diagnosis. Adv Med Inform [Internet]. 2025 Dec. 30 [cited 2026 Feb. 12];1:8. Available from: https://aimi.quantechquest.com/index.php/AIMI/article/view/16