<?xml version="1.0"?>
<!DOCTYPE ArticleSet PUBLIC "-//NLM//DTD PubMed 2.0//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/static/PubMed.dtd">
<ArticleSet>
  <Article>
    <Journal>
      <PublisherName>Quan Tech Quest Ltd.</PublisherName>
      <JournalTitle>Advances in Medical Informatics</JournalTitle>
      <Issn>2819-8298</Issn>
      <Volume>1</Volume>
      <PubDate PubStatus="epublish">
        <Year>2025</Year>
        <Month>12</Month>
        <Day>30</Day>
      </PubDate>
    </Journal>
    <ArticleTitle>The Power of Small Data</ArticleTitle>
    <FirstPage>8</FirstPage>
    <LastPage>8</LastPage>
    <Language>eng</Language>
    <AuthorList>
      <Author>
        <FirstName>Mohammadjavad</FirstName>
        <LastName>Sayadi</LastName>
        <Affiliation>Department of Computer Engineering, Faculty of Electrical and Computer Engineering, Technical and Vocational University (TVU), Tehran, Iran</Affiliation>
      </Author>
      <Author>
        <FirstName>Seyed Ali</FirstName>
        <LastName>Fatemi Aghda</LastName>
        <Affiliation>Student Research and Technology Committee, School of Health Management and Information Sciences, Iran University of Medical Sciences, Tehran, Iran</Affiliation>
        <Identifier Source="ORCID">0000-0002-4099-1879</Identifier>
      </Author>
      <Author>
        <FirstName>Malihe</FirstName>
        <LastName>Sadeghi</LastName>
        <Affiliation>College of Health Solutions, Arizona State University, Phoenix, USA</Affiliation>
      </Author>
      <Author>
        <FirstName>Behnoosh</FirstName>
        <LastName>Valipour</LastName>
        <Affiliation>MSc Student in Medical Informatics, Department of Health Information Management, School of Health Management and Information Sciences, Iran University of Medical Sciences, Tehran, Iran</Affiliation>
      </Author>
      <Author>
        <FirstName>Vijayakumar</FirstName>
        <LastName>Varadarajan</LastName>
        <Affiliation>Visiting Professor, School of Engineering, University of Diponegoro, Indonesia</Affiliation>
      </Author>
    </AuthorList>
    <History>
      <PubDate PubStatus="received">
        <Year>2025</Year>
        <Month>10</Month>
        <Day>10</Day>
      </PubDate>
      <PubDate PubStatus="accepted">
        <Year>2025</Year>
        <Month>12</Month>
        <Day>05</Day>
      </PubDate>
    </History>
    <Abstract>
Introduction: Voice-based diagnosis of laryngeal diseases has emerged as a promising non-invasive approach in medical technology. However, the datasets available in clinical practice are often small, making it difficult to train robust machine learning models. This study investigates whether small datasets can support accurate and efficient detection of laryngeal disorders through voice analysis.

Materials and Methods: A machine learning framework was developed, incorporating feature extraction techniques such as mel-frequency cepstral coefficients (MFCCs), jitter, shimmer, harmonics-to-noise ratio, and spectral analysis. To overcome small-data limitations, data augmentation strategies, transfer learning from pre-trained speech models, and robust cross-validation were applied. The system was trained and evaluated on a limited set of voice samples collected from patients with diverse laryngeal conditions and from healthy controls.

Results: Despite the restricted dataset size, the proposed models achieved competitive performance. The convolutional neural network (CNN) with transfer learning reached an average accuracy of 86%, an F1-score of 83%, and an area under the curve (AUC) of 0.90, outperforming classical approaches such as support vector machine (SVM) and Random Forest. Augmentation improved generalization and minority-class detection, while feature engineering highlighted the discriminative power of voice quality parameters. Error analysis revealed difficulty in detecting mild disorders and borderline cases, but the overall results confirmed the feasibility of small-data approaches.

Conclusion: This research underscores the value of small data in advancing voice-based machine learning for laryngeal disease diagnosis. By demonstrating that effective diagnostic systems can be built with limited samples, the study opens new pathways for clinical applications where large datasets are impractical. The approach contributes to democratizing AI-driven healthcare solutions, making them more accessible, scalable, and clinically relevant in real-world medical contexts.
</Abstract>
  </Article>
</ArticleSet>
