Application of KNN algorithm for predicting celiac disease using clinical and serological variables
Abstract
Celiac disease is an autoimmune condition with a global prevalence close to 1%. It is often underdiagnosed because of low clinical suspicion, which increases both morbidity and mortality. In this context, the K-Nearest Neighbors (KNN) algorithm was applied as a predictive model to support detection of the disease from clinical and serological variables. A supervised model was developed with clinical and serological data extracted from an academic dataset containing 2,206 records. To address class imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) was applied. The data were split into training and validation sets, and the classification parameter was optimized through cross-validation. In addition, a web platform was developed to support data upload, processing, and the generation of medical reports, with role-based access and diagnostic probability estimation. The model achieved 94% accuracy, 97% precision, and 91% sensitivity. The algorithm proved effective for predicting celiac disease from clinical and serological data, and its web-based implementation enables practical integration into clinical environments.
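The workflow summarized in the abstract (SMOTE oversampling, a train/validation split, cross-validated choice of the KNN parameter, and the three reported metrics) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation. The data are synthetic stand-ins with two hypothetical features, the tuned parameter is assumed to be the number of neighbors k, the candidate values and the 80/20 split are arbitrary, and applying SMOTE only to the training portion is a design choice of this sketch rather than something the abstract states.

```python
import math
import random

def knn_probability(train_X, train_y, x, k):
    """Fraction of the k nearest training points (Euclidean) labelled 1."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return sum(train_y[i] for i in order[:k]) / k

def smote(minority, n_new, k, rng):
    """SMOTE-style synthesis: interpolate from a minority point towards
    one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        p = rng.choice(minority)
        others = sorted((q for q in minority if q is not p),
                        key=lambda q: math.dist(p, q))
        q = rng.choice(others[:k])
        t = rng.random()
        synthetic.append([a + t * (b - a) for a, b in zip(p, q)])
    return synthetic

def balance(X, y, rng, k=5):
    """Oversample class 1 (assumed to be the minority) until classes match."""
    minority = [x for x, label in zip(X, y) if label == 1]
    n_new = max(len(y) - 2 * len(minority), 0)
    extra = smote(minority, n_new, k, rng)
    return X + extra, y + [1] * len(extra)

def metrics(y_true, y_pred):
    """Return (accuracy, precision, sensitivity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return (tp + tn) / len(pairs), precision, sensitivity

def cv_accuracy(X, y, k, folds, rng):
    """Mean validation accuracy of k-NN over interleaved folds.
    SMOTE is applied to the training part of each fold only."""
    scores = []
    for f in range(folds):
        tr = [i for i in range(len(X)) if i % folds != f]
        va = [i for i in range(len(X)) if i % folds == f]
        bX, by = balance([X[i] for i in tr], [y[i] for i in tr], rng)
        pred = [int(knn_probability(bX, by, X[i], k) >= 0.5) for i in va]
        scores.append(metrics([y[i] for i in va], pred)[0])
    return sum(scores) / folds

rng = random.Random(0)
# Synthetic stand-in data (NOT the study's dataset): 120 negatives, 30 positives.
data = [([rng.gauss(0, 1), rng.gauss(0, 1)], 0) for _ in range(120)]
data += [([rng.gauss(3, 1), rng.gauss(3, 1)], 1) for _ in range(30)]
rng.shuffle(data)
X = [row for row, _ in data]
y = [label for _, label in data]
split = int(0.8 * len(X))  # hypothetical 80/20 split
X_train, y_train, X_test, y_test = X[:split], y[:split], X[split:], y[split:]

candidates = (1, 3, 5, 7)  # hypothetical search grid for k
best_k = max(candidates, key=lambda k: cv_accuracy(X_train, y_train, k, 5, rng))
bal_X, bal_y = balance(X_train, y_train, rng)
probabilities = [knn_probability(bal_X, bal_y, x, best_k) for x in X_test]
predictions = [int(p >= 0.5) for p in probabilities]
accuracy, precision, sensitivity = metrics(y_test, predictions)
print(best_k, accuracy, precision, sensitivity)
```

The neighbor fraction returned by `knn_probability` is one simple way to obtain a diagnostic probability estimate of the kind the platform reports. Sensitivity here is TP / (TP + FN) and precision is TP / (TP + FP).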
Copyright (c) 2026 Innovation and Software

This work is licensed under a Creative Commons Attribution 4.0 International License.
The authors exclusively grant the right to publish their article to the Innovation and Software Journal, which may formally edit or modify the approved text to comply with its own editorial standards and with universal grammatical standards prior to publication. Likewise, the journal may translate the approved manuscripts into as many languages as it deems necessary and disseminate them in several countries, always giving public recognition to the author or authors of the research.











