Cross-Modal Self-Attention Fusion for Breast Cancer Subtype Classification Using Multi-Omics Data
Kurmash Zhumagozhayev 1,
Tomiris Zhaksylyk 1 * ,
Beibit Abdikenov 1 * ,
Temirlan Karibekov 2,
Liliya Skvortsova 3,
Adil Faizullin 1
1 Science and Innovation Center "Artificial Intelligence", Astana IT University, Astana, Kazakhstan
2 Science and Innovation Center "MedTech", Astana IT University, Astana, Kazakhstan
3 Laboratory of Molecular Genetics, Institute of Genetics and Physiology, Committee of Science of the Ministry of Science and Higher Education, Almaty, Kazakhstan
* Corresponding Author
J CLIN MED KAZ, Volume 23, Issue 3, pp. 40-51.
https://doi.org/10.23950/jcmk/18568
OPEN ACCESS
Author Contributions: Conceptualization, K. Zh., T. Zh.; methodology / planning and organization, K. Zh.; validation, K. Zh., T. Zh., A. F.; formal analysis, T. Zh., A. F.; investigation, K. Zh., T. Zh.; clinical interpretation of the results, medical relevance of the study design, and validation of the findings from a translational oncology perspective, T. K.; guidance on biological interpretation of omics features and validation of the relevance of selected molecular markers to breast cancer subtypes, L. S.; resources, B. A.; software, K. Zh.; data curation, K. Zh.; writing – original draft preparation, K. Zh.; writing – review and editing, K. Zh., T. Zh.; visualization, K. Zh.; supervision, B. A.; project administration, A. F., B. A.; funding acquisition, A. F., B. A. All authors have read and agreed to the published version of the manuscript.
Data availability statement: The corresponding author can provide the data supporting the study's conclusions upon request. Due to ethical and privacy constraints, the data are not publicly accessible.
Artificial Intelligence (AI) Disclosure Statement: The authors declare that no AI tools were used in the preparation of this work.
ABSTRACT
Background: Accurate classification of breast cancer subtypes is essential for personalized therapy and prognosis. Traditional subtype classification relies mainly on gene expression profiling and usually overlooks other genomic signals such as copy-number alterations (CNA) and mutations. Moreover, most multi-omics models rely on early or late fusion strategies, which do not capture complex inter-modality interactions.
Methods: This study proposes a cross-modal transformer-based approach that integrates gene expression, copy-number alteration (CNA), and mutation data for robust breast cancer subtype classification. Each omics modality is encoded as a separate sequence and projected into a shared embedding space. Gene expression is treated as the primary modality and enriched through cross-modal self-attention mechanisms with CNA and mutation features. The final enriched embeddings are flattened and passed through a residual-connected MLP classifier. We evaluate performance on the METABRIC dataset using the top-K features selected by ElasticNet (K = 300, 500, 1000, 1500) and, because of class imbalance, report primarily macro F1-score, weighted F1-score, and ROC AUC.
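The enrichment step described in the Methods can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the single attention head, the random weights, the token counts, the embedding size `d`, and the choice to apply CNA and then mutation attention sequentially are all assumptions made for illustration, and the names `cross_attention`, `exp_tokens`, `cna_tokens`, and `mut_tokens` are hypothetical. Expression tokens act as queries, the other modality supplies keys and values, and the attended output is added back to the expression tokens as a residual.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(primary, context, w_q, w_k, w_v):
    """Single-head cross-attention: primary tokens query the context tokens.

    Returns the residual-enriched primary tokens and the attention weights.
    """
    q = matmul(primary, w_q)
    k = matmul(context, w_k)
    v = matmul(context, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    weights = [softmax([s / scale for s in row]) for row in matmul(q, k_t)]
    attended = matmul(weights, v)
    # Residual connection: keep the original token and add the attended signal.
    enriched = [[p + a for p, a in zip(p_row, a_row)]
                for p_row, a_row in zip(primary, attended)]
    return enriched, weights


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned embeddings or projection weights."""
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d = 4                                   # assumed shared embedding size
exp_tokens = random_matrix(6, d, rng)   # toy expression sequence
cna_tokens = random_matrix(5, d, rng)   # toy CNA sequence
mut_tokens = random_matrix(3, d, rng)   # toy mutation sequence

# Enrich expression with CNA, then with mutations (ordering is an assumption).
enriched, cna_weights = cross_attention(
    exp_tokens, cna_tokens,
    random_matrix(d, d, rng), random_matrix(d, d, rng), random_matrix(d, d, rng))
enriched, mut_weights = cross_attention(
    enriched, mut_tokens,
    random_matrix(d, d, rng), random_matrix(d, d, rng), random_matrix(d, d, rng))

# Flatten the enriched tokens into one vector for a downstream classifier.
flat = [value for row in enriched for value in row]
print(len(flat))
```

Each row of `cna_weights` and `mut_weights` is a probability distribution over the tokens of the other modality, which is what allows the expression representation to draw on inter-modality interactions that simple concatenation (early fusion) or prediction averaging (late fusion) would not model.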
Results: Integrating copy-number and mutation data with expression features improved subtype classification across most feature set sizes. The tri-omic model (EXP+CNA+MUT) achieved the best performance for smaller feature sets (K = 300–500), whereas for larger feature sets (K = 1000) the highest scores were obtained by the bi-omic model (EXP+CNA) with macro-F1 = 0.859, weighted F1 = 0.868, accuracy = 0.866 and ROC AUC = 0.969. Paired statistical tests across five folds showed that differences between modality configurations did not reach significance at any K (all p > 0.09), whereas feature-set size did.
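Macro and weighted F1 differ only in how the per-class F1 values are averaged, which matters under class imbalance. The sketch below uses hypothetical toy labels (not METABRIC data) and hypothetical helper names to show the two averages: macro F1 weights every class equally, while weighted F1 weights each class by its number of true samples.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} computed one-vs-rest for every label in y_true."""
    scores = {}
    for label in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: rare classes count as much as common ones."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by the number of true samples per class."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[c] * support[c] for c in scores) / len(y_true)


# Hypothetical toy labels: a common class "A" and a rare class "B".
toy_true = ["A", "A", "A", "A", "B", "B"]
toy_pred = ["A", "A", "A", "B", "B", "A"]

print(per_class_f1(toy_true, toy_pred))
print(macro_f1(toy_true, toy_pred), weighted_f1(toy_true, toy_pred))
```

In this toy case the rare class is predicted less well, so the macro average falls below the weighted average; a gap of that kind is why both are reported alongside accuracy.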
Within the EXP+CNA configuration alone, macro-F1 increased significantly from K = 300 to K = 500 (paired t-test, p = 0.012) and from K = 300 to K = 1000 (p = 0.036). In the higher-powered pooled analysis across all three modality configurations (n = 15 paired folds), K = 1000 also outperformed K = 300 (p = 0.030).
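A paired t-test of the kind reported above compares fold-wise scores from two settings evaluated on the same cross-validation folds. The following is a minimal standard-library sketch, not the authors' analysis code: the fold scores are made-up illustrative numbers, the function names are hypothetical, and the two-sided p-value is obtained by numerically integrating the Student t density with Simpson's rule rather than by calling a statistics package.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat, df, steps=20000):
    """Two-sided p-value via Simpson integration of the density on [0, |t|]."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = total * h / 3
    return max(0.0, 1.0 - 2.0 * central_mass)


def paired_t_test(scores_a, scores_b):
    """Paired t-test on per-fold differences (scores_b minus scores_a).

    Assumes the differences are not all identical (non-zero standard deviation).
    """
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    t_stat = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t_stat, two_sided_p(t_stat, n - 1)


# Hypothetical macro-F1 values for five folds at two feature-set sizes.
folds_small_k = [0.80, 0.82, 0.81, 0.83, 0.79]
folds_large_k = [0.84, 0.85, 0.86, 0.85, 0.83]

t_stat, p_value = paired_t_test(folds_small_k, folds_large_k)
print(round(t_stat, 3), round(p_value, 4))
```

Because the test works on within-fold differences, fold-to-fold variation shared by both settings cancels out. With only five folds there are four degrees of freedom, so moderate differences between modality configurations can easily fail to reach significance.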
Conclusion: This pipeline demonstrates the application of cross-modal attention to omics integration for the subtype classification task, offering a scalable and biologically grounded alternative to traditional fusion approaches.
CITATION
Zhumagozhayev K, Zhaksylyk T, Abdikenov B, Karibekov T, Skvortsova L, Faizullin A. Cross-Modal Self-Attention Fusion for Breast Cancer Subtype Classification Using Multi-Omics Data. J CLIN MED KAZ. 2026;23(3):40-51.
https://doi.org/10.23950/jcmk/18568