Feature Selection and Biomarker Identification in Ovarian Cancer Clinical Datasets Using Supervised Algorithms

Authors

R. Rehman and P. Konwar

DOI:

https://doi.org/10.26713/cma.v16i4.3185

Keywords:

Ovarian Cancer, Machine Learning, Biomarkers

Abstract

Ovarian cancer is one of the most common diseases affecting women. It arises from the abnormal growth of cancer cells in the ovaries. Despite advances in medical research and treatment, it remains a major cause of cancer-related deaths, and early prediction and detection may save many lives. In this study, three datasets were analysed to identify features that may be useful for predicting the disease. An ensemble of Random Forest Classifier, XGBoost, and Mutual Information Gain was applied, and a voting technique was then used to select the most important features. The important features from each dataset were compared with one another to obtain the resulting features: MAF, PAX8, SERINC1, SFN, SPON1, CREBL2, ST13, and INTS5. Four machine learning techniques, namely Random Forest, eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting, and an artificial neural network (ANN), were trained and evaluated using Accuracy, Precision, Recall, and F1 Score. Cross-validation across independent datasets enhances the generalizability of the identified biomarkers.
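
The voting step described in the abstract can be illustrated with a small sketch. The abstract does not give the voting rule or the way the per-dataset results were compared, so the following are assumptions made only for illustration: each selector (Random Forest, XGBoost, Mutual Information) is taken to have already produced one importance score per feature, a feature is kept when at least `min_votes` selectors rank it in their top `k`, and the final list is the intersection across datasets. The feature names (`f1` to `f4`), the scores, and the helper functions are hypothetical and do not come from the study.

```python
from collections import Counter


def top_k(scores, k):
    """Return the k feature names with the highest score (ties broken by name)."""
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return set(ranked[:k])


def vote(score_tables, k, min_votes=2):
    """Keep features that reach the top k of at least `min_votes` selectors.

    `score_tables` holds one {feature: importance} dict per selector.
    The majority threshold is an assumption, not the paper's stated rule.
    """
    votes = Counter()
    for scores in score_tables:
        votes.update(top_k(scores, k))
    return {name for name, count in votes.items() if count >= min_votes}


def common_features(selected_per_dataset):
    """Assumed comparison step: features selected in every dataset."""
    return set.intersection(*selected_per_dataset)


# Made-up importance scores standing in for the three selectors' outputs.
dataset_a = [
    {"f1": 0.40, "f2": 0.30, "f3": 0.20, "f4": 0.10},  # Random Forest
    {"f1": 0.35, "f2": 0.10, "f3": 0.30, "f4": 0.25},  # XGBoost
    {"f1": 0.05, "f2": 0.50, "f3": 0.30, "f4": 0.15},  # Mutual Information
]
dataset_b = [
    {"f1": 0.50, "f2": 0.05, "f3": 0.30, "f4": 0.15},  # Random Forest
    {"f1": 0.45, "f2": 0.05, "f3": 0.10, "f4": 0.40},  # XGBoost
    {"f1": 0.10, "f2": 0.20, "f3": 0.40, "f4": 0.30},  # Mutual Information
]

selected_a = vote(dataset_a, k=2)
selected_b = vote(dataset_b, k=2)
shared = common_features([selected_a, selected_b])
print(sorted(shared))
```

In practice the three score tables would come from fitted models rather than hand-written dictionaries, and both `k` and `min_votes` would need tuning for each dataset.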

Published

30-12-2025

How to Cite

Rehman, R., & Konwar, P. (2025). Feature Selection and Biomarker Identification in Ovarian Cancer Clinical Datasets Using Supervised Algorithms. Communications in Mathematics and Applications, 16(4), 1093–1106. https://doi.org/10.26713/cma.v16i4.3185

Section

Research Article