Cross-Modal Self-Attention Fusion for Breast Cancer Subtype Classification Using Multi-Omics Data

Kurmash Zhumagozhayev 1, Tomiris Zhaksylyk 1*, Beibit Abdikenov 1*, Temirlan Karibekov 1, Liliya Skvortsova 2, Adil Faizullin 1
1 Science and Innovation Center "Artificial Intelligence", Astana IT University, Astana 010000, Kazakhstan.
2 Institute of Genetics and Physiology, Committee of Science of the Ministry of Science and Higher Education, Almaty 050060, Kazakhstan.
* Corresponding Author
J CLIN MED KAZ, In press. https://doi.org/10.23950/jcmk/18568
OPEN ACCESS

ABSTRACT

Background: Accurate classification of breast cancer subtypes is essential for personalized therapy and prognosis. Traditional subtype classification relies mainly on gene expression profiling and usually overlooks other genomic signals such as copy-number alterations (CNA) and mutations. At the same time, most multi-omics models rely on early or late fusion strategies, which do not capture complex inter-modality interactions.
Methods: This study proposes a cross-modal transformer-based approach that integrates gene expression, CNA, and mutation data for robust breast cancer subtype classification. Each omics modality is encoded as a separate sequence and projected into a shared embedding space. Gene expression is treated as the primary modality and is enriched with CNA and mutation features through cross-modal self-attention mechanisms. The final enriched embeddings are flattened and passed through a residual-connected MLP classifier. We evaluate performance on the METABRIC dataset using ElasticNet-selected top-K features (K = 300, 500, 1000, 1500) and, because of class imbalance, focus primarily on macro F1-score, weighted F1-score, and ROC AUC.
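As a rough illustration of the fusion step described in Methods, the sketch below enriches expression embeddings with CNA and mutation embeddings through single-head cross-attention and then flattens the result. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the sequence lengths, the embedding size, the random projection matrices, the use of expression as the query with the other modality as key and value, and the sequential CNA-then-mutation order are all hypothetical choices, and the MLP classifier is omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_seq, context_seq, w_q, w_k, w_v):
    """Single-head cross-attention with a residual connection.

    query_seq:   (n_q, d) embeddings of the primary modality
    context_seq: (n_c, d) embeddings of the auxiliary modality
    Returns the enriched query sequence and the attention weights.
    """
    q = query_seq @ w_q
    k = context_seq @ w_k
    v = context_seq @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])   # (n_q, n_c)
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return query_seq + weights @ v, weights

rng = np.random.default_rng(0)
d = 8                           # shared embedding size (assumed)
n_exp, n_cna, n_mut = 6, 5, 4   # toy sequence lengths (assumed)

# Stand-ins for modality sequences already projected into the shared space.
exp_emb = rng.normal(size=(n_exp, d))
cna_emb = rng.normal(size=(n_cna, d))
mut_emb = rng.normal(size=(n_mut, d))

def make_proj():
    # Random query/key/value projections; a trained model would learn these.
    return tuple(rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))

# Expression is the query; CNA and mutation provide keys and values in turn.
enriched, w_cna = cross_attention(exp_emb, cna_emb, *make_proj())
enriched, w_mut = cross_attention(enriched, mut_emb, *make_proj())

# Flatten the enriched expression sequence for a downstream classifier.
flat = enriched.reshape(-1)
print(flat.shape, w_cna.shape, w_mut.shape)
```

Because expression supplies the queries, the enriched sequence keeps the length of the expression sequence regardless of how many CNA or mutation tokens are attended to, which is why the flattened vector has a fixed size.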
Results: Integrating copy-number and mutation data with expression features improved subtype classification across most feature set sizes. The tri-omic model (EXP+CNA+MUT) achieved the best performance for smaller feature sets (K = 300–500), whereas for larger feature sets (K = 1000) the highest scores were obtained by the bi-omic model (EXP+CNA) with macro-F1 = 0.859, weighted F1 = 0.868, accuracy = 0.866 and ROC AUC = 0.969.
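The abstract reports both macro and weighted F1 because the classes are imbalanced. The sketch below shows how the two averages are computed and why they can differ. It uses a hypothetical two-class toy example with made-up labels, not METABRIC data or the study's predictions.

```python
from collections import Counter

def per_class_f1(y_true, y_pred):
    # F1 for each class, computed as 2*TP / (2*TP + FP + FN).
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores

def macro_f1(y_true, y_pred):
    # Unweighted mean: every class counts equally, however rare it is.
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)

def weighted_f1(y_true, y_pred):
    # Mean weighted by class support (number of true samples per class).
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[c] * support[c] for c in scores) / len(y_true)

# Hypothetical imbalanced toy labels: 8 samples of class "A", 2 of class "B".
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 8 + ["A", "B"]   # one minority sample is misclassified

macro = macro_f1(y_true, y_pred)
weighted = weighted_f1(y_true, y_pred)
print(f"macro F1 = {macro:.3f}, weighted F1 = {weighted:.3f}")
```

In this toy case a single error on the minority class lowers macro F1 more than weighted F1, which is the reason macro F1 is the stricter summary when some subtypes are rare.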
Conclusion: This pipeline demonstrates an application of cross-modal attention to omics integration in the subtype classification task, offering a scalable and biologically grounded alternative to traditional fusion approaches.
 

CITATION

Zhumagozhayev K, Zhaksylyk T, Abdikenov B, Karibekov T, Skvortsova L, Faizullin A. Cross-Modal Self-Attention Fusion for Breast Cancer Subtype Classification Using Multi-Omics Data. J Clin Med Kaz. 2026. https://doi.org/10.23950/jcmk/18568