Cancer Research and Treatment > Accepted Articles
doi: https://doi.org/10.4143/crt.2021.206    [Accepted]
Machine Learning Model for Predicting Postoperative Survival of Patients with Colorectal Cancer
Mohamed Hosny Osman1, Reham Hosny Mohamed1, Hossam Mohamed Sarhan2, Eun Jung Park3, Seung Hyuk Baik3, Kang Young Lee4, Jeonghyun Kang3
1Faculty of Medicine, Zagazig University, Zagazig, Egypt
2Faculty of Pharmacy, British University in Egypt (BUE), Egypt
3Department of Surgery, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul, Korea
4Department of Surgery, Severance Hospital, Yonsei University College of Medicine, Seoul, Korea
Correspondence: Jeonghyun Kang, Tel: 82-2-2019-3372, Fax: 82-2-3462-5994, Email: ravic@naver.com
Received: February 10, 2021;  Accepted: June 13, 2021.  Published online: June 15, 2021.
ABSTRACT
Purpose
Machine learning (ML) is a strong candidate for making accurate predictions because it can apply powerful computational algorithms to large amounts of data. We developed an ML-based model to predict the survival of patients with colorectal cancer (CRC) using data from two independent datasets.
Materials and Methods
A total of 364,316 and 1,572 CRC patients were included from the Surveillance, Epidemiology, and End Results (SEER) database and a Korean dataset, respectively. Because SEER combines data from 18 cancer registries, internal validation was done using 18-fold cross-validation, and external validation was then performed by testing the trained model on the Korean dataset. Performance was evaluated using the area under the receiver operating characteristic curve (AUROC), sensitivity, and positive predictive value.
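The validation scheme and metrics described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' pipeline: it assumes that "18-fold" means one fold per SEER registry (the abstract does not state how folds were formed), and the function names (`leave_one_group_out`, `auroc`, `sensitivity_and_ppv`) and the toy data are hypothetical.

```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out_group, train_idx, test_idx), one fold per distinct group.

    Assumption: each group stands for one registry, so 18 registries
    would give 18 folds.
    """
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    for g in sorted(by_group):
        test_idx = by_group[g]
        train_idx = [i for i in range(len(groups)) if groups[i] != g]
        yield g, train_idx, test_idx


def auroc(y_true, scores):
    """AUROC as the probability that a random positive outranks a random
    negative, counting ties as one half (O(n^2), fine for small samples)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_and_ppv(y_true, y_pred, positive=1):
    """Sensitivity (recall) and positive predictive value (precision)
    for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == positive and p == positive)
    fn = sum(1 for y, p in pairs if y == positive and p != positive)
    fp = sum(1 for y, p in pairs if y != positive and p == positive)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sens, ppv


# Toy data: three hypothetical registries, label 1 = dead within 5 years.
groups = ["A", "A", "B", "B", "C", "C"]
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.8, 0.35, 0.2, 0.6]          # made-up model outputs
y_pred = [1 if s >= 0.5 else 0 for s in scores]   # 0.5 cut-off is an assumption

folds = list(leave_one_group_out(groups))
toy_auroc = auroc(y_true, scores)
sens_dead, ppv_dead = sensitivity_and_ppv(y_true, y_pred, positive=1)
sens_alive, ppv_alive = sensitivity_and_ppv(y_true, y_pred, positive=0)
print(len(folds), round(toy_auroc, 3), sens_dead, ppv_dead, sens_alive, ppv_alive)
```

In a real analysis the model would be refit on `train_idx` and scored on `test_idx` inside each fold; here the scores are fixed only to keep the metric calculations easy to follow. Reporting sensitivity and positive predictive value per class, as the abstract does, amounts to calling `sensitivity_and_ppv` once with each label as `positive`.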
Results
Clinicopathological characteristics differed significantly between the two datasets, and the SEER dataset showed a significantly lower 5-year survival rate than the Korean dataset (60.1% vs. 75.3%, p < 0.001). The ML-based model using the light gradient boosting algorithm (LightGBM) predicted 5-year survival better than AJCC stage (AUROC, 0.804 vs. 0.736, p < 0.001). The features that most influenced model performance were age, number of examined lymph nodes, and tumor size. In the validation set, sensitivity for predicting 5-year survival was 68.14% for the dead class and 77.51% for the alive class, and the positive predictive value was 49.88% and 88.1%, respectively. Survival probability can be checked using the web-based survival predictor (http://colorectalcancer.pythonanywhere.com).
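The reported sensitivities and positive predictive values are tied together by class prevalence, and a short calculation makes that link explicit. The sketch below rests on two assumptions that the abstract does not spell out: that the four percentages are ordered as sensitivity (dead, alive) followed by positive predictive value (dead, alive), and that the "validation set" is the Korean dataset with its 75.3% 5-year survival rate. The helper `ppv_from_rates` is a hypothetical name for a direct application of Bayes' rule.

```python
def ppv_from_rates(sensitivity, specificity, prevalence):
    """Positive predictive value from sensitivity, specificity and the
    prevalence of the positive class (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Figures taken from the abstract; their pairing is an assumption.
sens_dead = 0.6814      # sensitivity for the "dead" class
sens_alive = 0.7751     # sensitivity for the "alive" class
survival_rate = 0.753   # 5-year survival in the Korean dataset

# In a two-class problem, the sensitivity of one class is the
# specificity of the other.
ppv_dead = ppv_from_rates(sens_dead, sens_alive, 1.0 - survival_rate)
ppv_alive = ppv_from_rates(sens_alive, sens_dead, survival_rate)

print(f"PPV (dead):  {ppv_dead:.1%}")
print(f"PPV (alive): {ppv_alive:.1%}")
```

Under these assumptions the computed values land within a fraction of a percentage point of the reported 49.88% and 88.1%. It also shows why the positive predictive value for the dead class is much lower than for the alive class: deaths are the minority outcome in that dataset, so false alarms from the larger surviving group dilute the true positives.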
Conclusion
The ML-based model performed considerably better than staging in the individualized estimation of survival for patients with CRC.
Key words: Machine learning, LightGBM, Colorectal neoplasms, Area under the Curve, Mortality, SEER