Identification of early genetic markers in the peripheral blood of newborns with bronchopulmonary dysplasia using bioinformatics analysis and machine learning algorithms

Arailym Abilbayeva 1*, Anel Tarabayeva 1, Iskander Isgandarov 1, Dana Yerbolat 1, Nuray Shaktay 1, Dinara Yelyubayeva 2
1 Asfendiyarov Kazakh National Medical University
2 Center for Perinatology and Pediatric Cardiac Surgery
* Corresponding Author
J CLIN MED KAZ, In press.
OPEN ACCESS

ABSTRACT

Objective. To identify and characterize a unique set of biomarker genes in peripheral blood on day 5 of life in preterm infants who subsequently develop bronchopulmonary dysplasia (BPD).
Materials and Methods. The open dataset GSE32472 was used to analyze gene expression profiles. The sample included preterm infants with a gestational age of less than 32 weeks and a body weight of ≤1500 g. Peripheral blood samples collected within a 5-day window were analyzed to identify early transcriptional changes. Differential gene expression analysis was performed using the limma package with thresholds of |log2FC| > 0.5 and adjusted p < 0.05. Coexpression network analysis and functional enrichment analysis were used to narrow the pool of potential target genes. At the final stage, three independent machine learning algorithms (SVM-RFE, LASSO, and Random Forest) were used to select the most prognostically significant genes. The diagnostic model based on the selected genes was visualized as a nomogram, and its predictive ability was assessed using ROC analysis and decision curve analysis.
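The filtering and selection steps described above can be illustrated with a minimal sketch. It is not the authors' pipeline (which uses the limma package in R); the gene names, statistics, and per-algorithm selections below are hypothetical toy values, and taking the intersection of the three algorithms' selections is an assumption about how the results were combined.

```python
from typing import Iterable, NamedTuple, Set


class GeneStat(NamedTuple):
    """Per-gene summary from a differential expression analysis (toy structure)."""
    gene: str
    log2fc: float
    adj_p: float


def filter_degs(stats: Iterable[GeneStat],
                lfc_cutoff: float = 0.5,
                p_cutoff: float = 0.05) -> Set[str]:
    """Keep genes with |log2FC| > lfc_cutoff and adjusted p < p_cutoff."""
    return {s.gene for s in stats
            if abs(s.log2fc) > lfc_cutoff and s.adj_p < p_cutoff}


def consensus_genes(*selections: Iterable[str]) -> Set[str]:
    """Return the genes chosen by every feature-selection method."""
    sets = [set(s) for s in selections]
    return set.intersection(*sets) if sets else set()


# Hypothetical toy input; these are not values from GSE32472.
toy_stats = [
    GeneStat("GENE_A", 0.82, 0.010),
    GeneStat("GENE_B", -0.61, 0.030),
    GeneStat("GENE_C", 0.30, 0.001),   # fails the fold-change threshold
    GeneStat("GENE_D", 1.10, 0.200),   # fails the adjusted p threshold
]

degs = filter_degs(toy_stats)

# Narrow the DEGs to those that also belong to the trait-associated module.
module_genes = {"GENE_A", "GENE_B", "GENE_E"}
candidates = degs & module_genes

# Hypothetical selections from the three algorithms (assumed to be intersected).
svm_rfe_selected = {"GENE_A", "GENE_B"}
lasso_selected = {"GENE_A"}
random_forest_selected = {"GENE_A", "GENE_B"}

hub_genes = consensus_genes(svm_rfe_selected, lasso_selected,
                            random_forest_selected) & candidates
print(sorted(hub_genes))
```

Requiring agreement across all three selectors is a conservative design choice: a gene survives only if it is picked independently by each method, which trades sensitivity for robustness.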
Results. Comparative analysis identified 451 differentially expressed genes (DEGs) in the BPD group on day 5 of life. Weighted gene coexpression network analysis (WGCNA) identified the MEpink module as the one most significantly correlated with the BPD phenotype. The intersection of DEGs and genes within the MEpink module yielded 106 common genes; functional analysis of these genes revealed significant enrichment for processes associated with T-cell activation and T-cell receptor signaling pathways. A combined machine learning approach reliably identified four key hub genes with high prognostic significance: BTN3A3, AK5, NOG, and GCSAM. A diagnostic model based on these four genes demonstrated high predictive ability (AUC = 0.888) and clinical utility. Regulatory analysis revealed that the AK5, BTN3A3, and GCSAM genes are central nodes in the miRNA–mRNA regulatory network.
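For readers less familiar with the AUC metric reported above, the following sketch shows one standard way to compute it: as the fraction of case/control pairs in which the case receives the higher model score, with ties counted as one half. The function name, labels, and scores are hypothetical toy values, not data or code from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of cases and controls.

    labels: 1 for a case (e.g. an infant who developed BPD), 0 for a control.
    scores: model output, where higher means higher predicted risk.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # case ranked above control
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: two cases and two controls.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.2]
print(roc_auc(toy_labels, toy_scores))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from controls.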
Conclusion. The gene set identified on day 5 of life reflects key pathophysiological processes: developmental impairment, cellular stress, and inflammatory immune dysregulation. These results shift the focus from BPD diagnosis to early risk prediction. Day 5 molecular signatures possess high prognostic value and can be used to develop targeted therapeutic interventions.

CITATION

Abilbayeva A, Tarabayeva A, Isgandarov I, Yerbolat D, Shaktay N, Yelyubayeva D. Identification of early genetic markers in the peripheral blood of newborns with bronchopulmonary dysplasia using bioinformatics analysis and machine learning algorithms. J Clin Med Kaz. 2026.