
Publications

The following publications have been made available as preprints on medRxiv and bioRxiv.


Integrating Infection Burden and Multimodal Biomarkers for Early Detection of Alzheimer's Disease: A Sheaf-ML Framework

Abstract

Alzheimer's disease (AD) remains a major global health challenge, with growing evidence linking chronic infections, immune aging, and neurodegeneration. Grounded in the Antimicrobial Protection Hypothesis, this study introduces Sheaf-ML, a sheaf-theoretic machine learning framework for integrating multimodal health data and assessing infection-related cognitive risk. Sheaf-ML constructs a unified patient-level representation that coherently combines diverse data streams (serological infection markers, cognitive assessments, cardiovascular and metabolic measures, and nutritional and behavioral evaluations) while preserving the intrinsic structure and relationships of each modality. Applying this framework to the Harmonized LASI-DAD dataset (N = 6,168), we modeled six clinically motivated domains (Infection, Cognition, Mental Health, Cardiovascular, Nutrition, and Demographics) and integrated them into a topologically consistent representation using learnable cross-domain mappings and consistency constraints. The sheaf-integrated embeddings revealed clinically meaningful interactions: infection burden was associated with cardiovascular, nutritional, and cognitive outcomes, indicating coordination across physiological systems. Using these embeddings, Sheaf-ML produced interpretable patient-level predictions and identified the most influential features both globally and for individual patients. We further derived an Infection Burden Index (IBI) that quantifies patient-level infection-related risk. Patients above the 80th percentile of the IBI, approximately 20% of the cohort, were flagged as early-warning cases, demonstrating actionable stratification for clinical monitoring. This study provides the first empirical evidence that sheaf-based architectures can integrate multimodal health data in a clinically interpretable manner, uncover biologically meaningful interactions, and support patient-specific risk prediction.
By linking population-level patterns with individualized insights, Sheaf-ML establishes a foundation for scalable, interpretable, and equitable precision models of infection-related cognitive decline in Alzheimer's disease.
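The abstract does not spell out how the "learnable cross-domain mappings and consistency constraints" are defined. One standard way to express such a constraint in sheaf theory is the coboundary (sheaf Laplacian) energy: each domain is a vertex with its own embedding, each pair of domains is an edge with a shared space, and restriction maps project both endpoints into that space, where disagreement is penalized. The sketch below illustrates only that textbook construction, under loudly stated assumptions: the three domains used, their embedding sizes, the edge dimension, and the random restriction maps are all hypothetical, and this is not presented as the Sheaf-ML implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: three of the six domains, with made-up embedding sizes.
dims = {"Infection": 3, "Cognition": 4, "Cardiovascular": 2}
edge_dim = 2  # assumed dimension of the shared space on every edge
edges = [("Infection", "Cognition"),
         ("Infection", "Cardiovascular"),
         ("Cognition", "Cardiovascular")]

# Restriction maps F[(v, e)] send domain v's embedding into edge e's shared space.
# They are random here; in a learned model they would be trainable parameters.
F = {(v, e): rng.normal(size=(edge_dim, dims[v])) for e in edges for v in e}


def consistency_energy(x):
    """Sum over edges of the squared disagreement between both endpoints' projections."""
    total = 0.0
    for e in edges:
        u, v = e
        diff = F[(u, e)] @ x[u] - F[(v, e)] @ x[v]
        total += float(diff @ diff)
    return total


# Column offset of each domain inside one concatenated patient vector.
order = list(dims)
offsets = dict(zip(order, np.cumsum([0] + [dims[v] for v in order[:-1]])))
n_total = sum(dims.values())


def coboundary():
    """Stack the per-edge differences F_u x_u - F_v x_v into one matrix delta."""
    delta = np.zeros((len(edges) * edge_dim, n_total))
    for k, e in enumerate(edges):
        u, v = e
        rows = slice(k * edge_dim, (k + 1) * edge_dim)
        delta[rows, offsets[u]:offsets[u] + dims[u]] = F[(u, e)]
        delta[rows, offsets[v]:offsets[v] + dims[v]] = -F[(v, e)]
    return delta


delta = coboundary()
sheaf_laplacian = delta.T @ delta  # L_F = delta^T delta, positive semidefinite

# A random (hypothetical) patient embedding, one block per domain.
x = {v: rng.normal(size=d) for v, d in dims.items()}
x_flat = np.concatenate([x[v] for v in order])
print(consistency_energy(x), x_flat @ sheaf_laplacian @ x_flat)  # agree up to rounding
```

The two printed numbers coincide because the energy is exactly the quadratic form of the sheaf Laplacian. Embeddings with zero energy lie in the kernel of `delta` (global sections, i.e. mutually consistent domain representations), which is why adding this energy to a training loss pushes domain embeddings toward agreement.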

MAP-PRS: Multi-Ancestry Portfolio-Based Polygenic Risk Scores

Abstract

Polygenic Risk Scores (PRS) are emerging tools for predicting an individual's genetic risk for complex diseases. However, their clinical usefulness remains limited because most existing models are built from data on people of European ancestry, which reduces their accuracy and stability in other populations. This imbalance restricts the equitable use of PRS in precision medicine. To overcome these limitations, we introduce Multi-Ancestry Portfolio-Based Polygenic Risk Scores (MAP-PRS), a framework that combines mathematical modeling and data science principles to improve both fairness and reliability in genetic risk prediction across populations. MAP-PRS treats each ancestry-specific PRS as an asset in a portfolio, much as investments are managed in finance, balancing two key quantities: predictive return (how well the score predicts disease) and risk complexity (how uncertain or ancestry-specific the prediction is). By jointly optimizing these factors, MAP-PRS identifies the combination of ancestry-informed PRS models that maximizes predictive accuracy while minimizing bias and instability. The approach also uses computational tools such as Bayesian modeling, machine learning, and generative neural networks to refine risk estimates, incorporate environmental and lifestyle factors, and increase the representation of under-studied populations. In doing so, MAP-PRS supports more inclusive, equitable, and interpretable precision medicine. As an initial demonstration, MAP-PRS has been applied to predict Type 2 Diabetes (T2D) risk in populations of European ancestry, establishing a foundation for broader multi-ancestry implementation. Future extensions will cover additional diseases, such as cervical cancer and HPV susceptibility, endometrioid ovarian cancer, and Alzheimer's disease, bringing clinically actionable and globally equitable genetic risk prediction closer.
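The portfolio analogy can be made concrete with classical mean-variance (Markowitz) optimization: choose weights over the ancestry-specific PRS models that maximize expected predictive return minus a penalty on its variance, subject to the weights summing to one. The sketch below is only that textbook formulation, not the MAP-PRS objective itself; the three models, the return vector `mu`, the covariance `sigma`, and `risk_aversion` are hypothetical toy values, and no long-only or bias penalty is included.

```python
import numpy as np


def portfolio_weights(mu, sigma, risk_aversion):
    """Maximize w @ mu - (risk_aversion / 2) * w @ sigma @ w subject to sum(w) == 1.

    Closed-form solution from the Lagrangian: w = sigma^{-1} (mu - gamma * 1) / risk_aversion,
    with gamma chosen so the weights sum to one. Weights may be negative because
    no long-only constraint is imposed.
    """
    ones = np.ones_like(mu)
    sigma_inv_mu = np.linalg.solve(sigma, mu)
    sigma_inv_1 = np.linalg.solve(sigma, ones)
    gamma = (ones @ sigma_inv_mu - risk_aversion) / (ones @ sigma_inv_1)
    return (sigma_inv_mu - gamma * sigma_inv_1) / risk_aversion


def objective(w, mu, sigma, risk_aversion):
    """Mean-variance utility: expected return minus a penalty on variance."""
    return float(w @ mu - 0.5 * risk_aversion * w @ sigma @ w)


# Hypothetical toy inputs for three ancestry-specific PRS models:
# mu    - mean predictive gain of each model (e.g. validation R^2), made-up numbers
# sigma - covariance of that gain across bootstrap resamples, made-up numbers
mu = np.array([0.10, 0.06, 0.05])
sigma = np.array([[0.0004, 0.0001, 0.0001],
                  [0.0001, 0.0009, 0.0002],
                  [0.0001, 0.0002, 0.0016]])
risk_aversion = 200.0

w = portfolio_weights(mu, sigma, risk_aversion)

# Combine standardized per-model scores for five simulated individuals into one score.
prs = np.random.default_rng(1).normal(size=(5, 3))
combined = prs @ w
```

A larger `risk_aversion` shifts weight toward models whose performance is more stable across resamples, while a smaller one favors the model with the highest mean return; this is the return-versus-risk trade-off the abstract describes.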


Interpretable Machine Learning and Comparative Genomics Reveal Microbial Plastic-Degrading (Microbeyt) Potential

Abstract

Plastic pollution poses a critical environmental threat, and microbial enzymes offer a sustainable strategy for polymer degradation. We present a computational pipeline that integrates orthogroup-based comparative genomics with machine learning and interpretable feature importance to identify microbial strains with high plastic-degrading potential. Using orthogroup presence/absence matrices, with SHAP-derived feature contributions shown in the MTP visualization, the workflow highlights the conserved gene modules that drive predictive classification. Applied to a single genus, the pipeline revealed strains harboring versatile enzymatic repertoires capable of targeting diverse polymers, including polyethylene, polyethylene terephthalate, polyurethane, and polyhydroxyalkanoates. These findings provide a rational framework for prioritizing candidate strains for experimental validation and bioremediation. Overall, this study demonstrates how integrating comparative genomics with interpretable machine learning can guide the systematic discovery of microbial solutions to plastic pollution.
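To illustrate the presence/absence and SHAP steps, the sketch below builds a binary strain-by-orthogroup matrix and attributes a classifier's output to individual orthogroups. The abstract does not name its classifier, so a plain L2-regularized logistic regression is swapped in here; for a linear model with independent features, the SHAP value of feature j on the log-odds scale is exactly w_j (x_j - mean(x_j)), so no external SHAP library is needed. The strain names, orthogroup IDs, and labels are hypothetical placeholders, not data from the study.

```python
import numpy as np

# Hypothetical toy input: orthogroups detected in each strain (IDs are placeholders).
strain_orthogroups = {
    "strain_A": {"OG1", "OG2", "OG3"},
    "strain_B": {"OG1", "OG3"},
    "strain_C": {"OG2", "OG4"},
    "strain_D": {"OG4"},
    "strain_E": {"OG1", "OG2", "OG3", "OG4"},
    "strain_F": {"OG2"},
}
# Hypothetical labels: 1 = annotated plastic degrader, 0 = not.
labels = {"strain_A": 1, "strain_B": 1, "strain_C": 0,
          "strain_D": 0, "strain_E": 1, "strain_F": 0}

# Binary presence/absence matrix: rows are strains, columns are orthogroups.
strains = sorted(strain_orthogroups)
orthogroups = sorted(set().union(*strain_orthogroups.values()))
X = np.array([[1.0 if og in strain_orthogroups[s] else 0.0 for og in orthogroups]
              for s in strains])
y = np.array([labels[s] for s in strains], dtype=float)


def fit_logistic(X, y, lr=0.5, steps=2000, l2=0.1):
    """L2-regularized logistic regression fitted by plain gradient descent."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(steps):
        prob = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (prob - y) / n + l2 * w)
        b -= lr * float(np.mean(prob - y))
    return w, b


w, b = fit_logistic(X, y)

# Exact linear SHAP on the log-odds scale: phi_ij = w_j * (x_ij - mean_j).
baseline = X.mean(axis=0)
shap_values = (X - baseline) * w
base_value = float(baseline @ w + b)  # model output at the feature means

# Global importance: mean absolute SHAP value per orthogroup, highest first.
global_importance = np.abs(shap_values).mean(axis=0)
ranking = [orthogroups[j] for j in np.argsort(-global_importance)]
```

Each row of `shap_values` plus `base_value` sums to that strain's log-odds, which gives the per-strain (local) explanation, while `ranking` gives the global view. A nonlinear model such as a tree ensemble would need a dedicated SHAP estimator instead of this closed form.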

ci-fGBD: Cluster-Integrated Fast Generalized Bruhat Decomposition for Multimodal Data Clustering in Alzheimer's Disease

Abstract

Multimodal biomedical datasets, such as those from neurodegenerative disease cohorts, make it difficult to stratify heterogeneous patient populations because of missing values, high dimensionality, and modality-specific biases. Traditional clustering methods often require extensive preprocessing and fail to integrate heterogeneous data types effectively. We introduce ci-fGBD (Cluster-Integrated Fast Generalized Bruhat Decomposition), a novel matrix factorization and clustering framework that operates natively on block-structured, multimodal datasets. ci-fGBD extends the classical Bruhat decomposition by jointly learning latent representations and patient clusters while automatically harmonizing contributions across diverse modalities, including neuroimaging, cognitive assessments, genomics, wearable sensors, and environmental exposures. Benchmarking against standard methods on real datasets shows that ci-fGBD consistently identifies clinically meaningful subgroups, capturing subtle biological, cognitive, and demographic heterogeneity in Alzheimer's disease cohorts with superior interpretability and robustness.
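For readers unfamiliar with the classical decomposition that ci-fGBD extends: every invertible square matrix A over a field can be written as A = U1 W U2, with U1 and U2 upper triangular and W a permutation matrix. The sketch below computes it with exact rational arithmetic by first finding an LPU factorization (A = L P U via row and column elimination) of the row-reversed matrix w0 A and then conjugating by the reversal permutation w0. It shows only the classical Bruhat decomposition, not the fast, generalized, or cluster-integrated variants; the helper names (`lpu`, `bruhat`, `matmul`) are illustrative, and the input is assumed square and invertible.

```python
from fractions import Fraction


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def matmul(A, B):
    return [[sum((A[i][k] * B[k][j] for k in range(len(B))), Fraction(0))
             for j in range(len(B[0]))] for i in range(len(A))]


def lpu(A):
    """Return (L, P, U) with A == L P U, L lower triangular, P a permutation,
    U upper unitriangular. Assumes A is square and invertible."""
    n = len(A)
    M = [[Fraction(x) for x in row] for row in A]
    L, U = identity(n), identity(n)  # invariant: A == L M U
    perm = [0] * n                   # perm[i] = pivot column of row i
    for i in range(n):
        j = next((c for c in range(n) if M[i][c] != 0), None)  # leftmost nonzero
        if j is None:
            raise ValueError("matrix is singular")
        perm[i] = j
        for k in range(i + 1, n):    # row ops: clear column j below the pivot
            c = M[k][j] / M[i][j]
            if c:
                for col in range(n):
                    M[k][col] -= c * M[i][col]
                for r in range(n):   # L <- L (I + c e_k e_i^T)
                    L[r][i] += c * L[r][k]
        for l in range(j + 1, n):    # column ops: clear row i right of the pivot
            c = M[i][l] / M[i][j]
            if c:
                for r in range(n):
                    M[r][l] -= c * M[r][j]
                for col in range(n):  # U <- (I + c e_j e_l^T) U
                    U[j][col] += c * U[l][col]
    # M is now monomial: M = D P with D = diag(M[i][perm[i]]); absorb D into L.
    P = [[Fraction(int(perm[i] == c)) for c in range(n)] for i in range(n)]
    for r in range(n):
        for i in range(n):
            L[r][i] *= M[i][perm[i]]
    return L, P, U


def bruhat(A):
    """Return (U1, W, U2) with A == U1 W U2, U1 and U2 upper triangular, W a permutation."""
    n = len(A)
    w0 = [[Fraction(int(i + j == n - 1)) for j in range(n)] for i in range(n)]
    L, P, U = lpu(matmul(w0, A))     # w0 A = L P U
    U1 = matmul(matmul(w0, L), w0)   # w0 L w0 is upper triangular
    return U1, matmul(w0, P), U      # A = (w0 L w0)(w0 P) U
```

For example, `bruhat([[2, 1, 0], [4, 3, 1], [0, 5, 6]])` returns three matrices whose product reproduces the input exactly. The permutation W identifies the Bruhat cell containing A; it is this cell structure that a generalized, block-structured variant can build on.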


Mathematical Diseases-theraReCliNostic-Research Institute

foundation

© MDtRI 2025.

MDtRI, registered under Sections 8, 12A, and 80G, invites you to join us in addressing healthcare challenges. Your tax-deductible support under Section 80G fuels our mission for a healthier world. Partner with us to drive impactful change.

Banjara Kiran Block-C, Door No. 8-2-703/BK-C1, Banjara Hills Road No. 12, Hyderabad, Telangana, INDIA.
