Purpose
Acute myeloid leukemia (AML) shows significant heterogeneity in therapeutic responses. We aimed to develop a gene signature for the stratification of high-risk pediatric AML using publicly available AML datasets, with a focus on literature-based prognostic gene sets.
Materials and Methods
We identified 300 genes from 12 well-validated studies on AML-related gene signatures. Clinical and gene expression data were obtained from three datasets: TCGA-LAML, TARGET-AML, and BeatAML. Least absolute shrinkage and selection operator (LASSO)-Cox regression analysis was used to perform the initial gene selection and to construct a prognostic model in the TCGA-LAML cohort (n=132). The final gene signature was validated in two independent cohorts: BeatAML (n=411) and TARGET-AML (n=187).
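To make the selection step concrete, the following is a minimal sketch of L1-penalized (LASSO) Cox regression fitted by proximal gradient descent. It is not the study's pipeline: the function names, the toy data, the step size, and the penalty value `lam` are all assumptions for illustration, and the partial likelihood assumes no tied event times.

```python
import math

def linear_predictor(beta, X):
    """Return the linear predictor x_i . beta for every patient."""
    return [sum(b * x for b, x in zip(beta, row)) for row in X]

def neg_log_partial_likelihood(beta, X, time, event):
    """Cox negative log partial likelihood, averaged over patients (no ties assumed)."""
    eta = linear_predictor(beta, X)
    total = 0.0
    for i in range(len(X)):
        if event[i]:  # censored patients contribute only through risk sets
            denom = sum(math.exp(eta[j]) for j in range(len(X)) if time[j] >= time[i])
            total -= eta[i] - math.log(denom)
    return total / len(X)

def gradient(beta, X, time, event):
    """Gradient of the averaged negative log partial likelihood."""
    eta = linear_predictor(beta, X)
    grad = [0.0] * len(beta)
    for i in range(len(X)):
        if not event[i]:
            continue
        at_risk = [j for j in range(len(X)) if time[j] >= time[i]]
        weights = [math.exp(eta[j]) for j in at_risk]
        denom = sum(weights)
        for k in range(len(beta)):
            weighted_mean = sum(w * X[j][k] for w, j in zip(weights, at_risk)) / denom
            grad[k] -= X[i][k] - weighted_mean
    return [g / len(X) for g in grad]

def soft_threshold(z, t):
    """Proximal operator of the L1 penalty: shrink z toward zero by t."""
    return math.copysign(max(abs(z) - t, 0.0), z)

def penalized_objective(beta, X, time, event, lam):
    """Negative log partial likelihood plus the L1 penalty."""
    return neg_log_partial_likelihood(beta, X, time, event) + lam * sum(abs(b) for b in beta)

def lasso_cox(X, time, event, lam, step=0.1, n_iter=500):
    """Fit coefficients by proximal gradient descent (ISTA), starting from zero."""
    beta = [0.0] * len(X[0])
    for _ in range(n_iter):
        g = gradient(beta, X, time, event)
        beta = [soft_threshold(b - step * gk, step * lam) for b, gk in zip(beta, g)]
    return beta

# Toy data (hypothetical): two standardized "genes", six patients.
X = [[1.0, 0.2], [0.8, -0.5], [0.6, 0.4], [-0.5, -0.3], [-0.7, 0.5], [-1.0, -0.1]]
time = [2, 3, 5, 8, 11, 14]   # follow-up time
event = [1, 1, 1, 0, 1, 0]    # 1 = death observed, 0 = censored

beta = lasso_cox(X, time, event, lam=0.05)
```

Genes whose coefficients are shrunk exactly to zero drop out of the model, which is how the penalty performs variable selection. In practice the penalty strength would typically be chosen by cross-validation, using an established implementation rather than a hand-written solver.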
Results
We identified a six-gene signature (ETFB, ARL6IP5, PTP4A3, CSK, HS3ST3B1, PLA2G4A), referred to as the literature-based signature 6 (LBS6), whose higher scores were significantly associated with poorer overall survival in the TCGA (HR=4.2, 95% CI: 2.59–6.81, p<0.0001), BeatAML (HR=1.52, 95% CI: 1.17–1.96, p=0.0013), and TARGET (HR=2.05, 95% CI: 1.36–3.08, p<0.001) datasets. The high-LBS6 score group had significantly poorer five-year event-free survival than the low-LBS6 score group (HR=2.09, 95% CI: 1.38–3.15, p<0.001). After adjustment for key risk factors, including gene mutations (WT1, FLT3, and NPM1), protocol-based risk group, white blood cell (WBC) count, and age, the LBS6 score remained independently associated with worse survival in the validation cohorts.
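As an illustration of how a signature score of this kind is commonly computed and dichotomized, the sketch below forms a weighted sum of the six genes' expression values and splits patients at the cohort median. The coefficients are hypothetical placeholders (the fitted values are not reported here), and the median cutoff and the helper names `lbs6_score` and `stratify` are likewise assumptions rather than the study's stated method.

```python
from statistics import median

# Hypothetical coefficients for illustration only; not the fitted LBS6 weights.
LBS6_COEFS = {
    "ETFB": 0.30,
    "ARL6IP5": 0.25,
    "PTP4A3": 0.20,
    "CSK": 0.15,
    "HS3ST3B1": 0.10,
    "PLA2G4A": 0.05,
}

def lbs6_score(expression):
    """Weighted sum of the six genes' (normalized) expression values."""
    return sum(coef * expression[gene] for gene, coef in LBS6_COEFS.items())

def stratify(patients):
    """Label each patient 'high' or 'low' relative to the cohort median score."""
    scores = {pid: lbs6_score(expr) for pid, expr in patients.items()}
    cutoff = median(scores.values())  # assumed cutoff; other thresholds are possible
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}

# Toy cohort (hypothetical): every gene takes the same value within a patient.
patients = {
    f"P{v}": {gene: float(v) for gene in LBS6_COEFS}
    for v in (1, 2, 3, 4)
}
groups = stratify(patients)
```

The resulting high and low groups would then be compared with a Cox model or log-rank test to obtain hazard ratios like those reported above.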
Conclusion
Our literature-driven approach identified a robust gene signature that stratifies AML patients into distinct risk groups. The LBS6 score shows promise in redefining initial risk stratification and identifying high-risk AML patients.