Integrated Transcriptomic Landscape and Deep Learning Based Survival Prediction in Uterine Sarcomas
Abstract
Purpose
The genomic characteristics of uterine sarcomas have not been fully elucidated. This study aimed to explore the genomic landscape of uterine sarcomas (USs).
Materials and Methods
Comprehensive genomic analysis through RNA-sequencing was conducted. Gene fusion, differentially expressed genes (DEGs), signaling pathway enrichment, immune cell infiltration, and prognosis were analyzed. A deep learning model was constructed to predict the survival of US patients.
Results
A total of 71 US samples were examined, including 47 endometrial stromal sarcomas (ESS), 18 uterine leiomyosarcomas (uLMS), three adenosarcomas, two carcinosarcomas, and one uterine tumor resembling an ovarian sex-cord tumor. ESS (including high-grade ESS [HGESS] and low-grade ESS [LGESS]) and uLMS showed distinct gene fusion signatures; a novel gene fusion, MRPS18A–PDC-AS1, could be a potential marker for the pathological differential diagnosis of uLMS and ESS; 797 and 477 uterine sarcoma DEGs (uDEGs) were identified in the ESS vs. uLMS and HGESS vs. LGESS comparisons, respectively. The uDEGs were enriched in multiple pathways. Fifteen genes, including LAMB4, were confirmed to have prognostic value in USs; immune infiltration analysis revealed the prognostic value of myeloid dendritic cells, plasmacytoid dendritic cells, natural killer cells, M1 macrophages, monocytes, and hematopoietic stem cells in USs; the deep learning model, named Max-Mean Non-Local multi-instance learning (MMN-MIL), showed satisfactory performance in predicting the survival of US patients, with an area under the receiver operating characteristic curve of 0.909 and an accuracy of 0.804.
Conclusion
HGESS, LGESS, and uLMS harbored distinct gene fusion characteristics and gene expression features. The MMN-MIL model could effectively predict the survival of US patients.
Introduction
Uterine sarcomas (USs) are rare primary malignant tumors originating from the mesenchymal tissue of the uterus [1]. USs are mainly classified into uterine leiomyosarcoma (uLMS), endometrial stromal sarcoma (ESS), rare undifferentiated uterine sarcoma and uterine tumors resembling ovarian sex-cord tumors (UTROSCT), and ESS is further divided into low-grade ESS (LGESS) and high-grade ESS (HGESS) based on its pathological morphological characteristics [2].
USs are usually high-grade tumors with a high local recurrence rate and metastatic risk, and their overall prognosis is poor [3]. Although studies of immune checkpoint inhibitors (ICIs) in USs are few and the reported therapeutic effects are limited, HGESS has been reported to be an excellent candidate for ICIs [4,5], offering a promising therapeutic prospect for USs. Since USs show complex histological morphology and each subtype has distinct molecular characteristics, prognosis, and treatment, accurate differential diagnosis of the histological subtype of USs is of great significance for personalized treatment of patients.
Recently, many studies have reported unique genomic changes and clinicopathological characteristics of each subtype [2,6]. For instance, over 50% of LGESS harbored JAZF1-SUZ12 gene fusion, whereas in HGESS, YWHAE-NUTM2 gene fusion, ZC3H7B-BCOR gene fusion, and BCOR internal tandem duplication were commonly found; a small number of epithelioid uLMS harbored PGR gene fusion, and approximately 25% of myxoid uLMS exhibited PLAG1 gene fusion [7,8]. Nevertheless, the genomic characteristics of each US subtype have not been fully elucidated, and little is known about the genetic landscape of each subtype.
In this study, we used RNA transcriptome sequencing to conduct a comprehensive genomic analysis of 71 surgically resected US samples. We aimed to identify new gene fusion sites; to clarify the signaling pathway enrichment, immune infiltration, and clinical indicators related to the prognosis of ESS and uLMS; and to explore the possibility of molecular classification of USs based on genomic characteristics and its value in guiding pathological diagnosis.
Materials and Methods
1. Patients
We retrospectively collected data from 64 patients diagnosed with USs who underwent surgical resection at the Affiliated Hospital of Qingdao University between March 2020 and March 2023. Open data for 21 cases of ESS were obtained from the published literature [9]. Patients with substandard RNA quality, or for whom only distant metastatic tissue was tested, were excluded, and 71 patients with USs were finally enrolled in this study. All pathological slides were cross-checked by two experienced pathologists. If they disagreed on the pathological diagnosis, a third senior expert gave a final judgment so that the diagnosis was decided by majority vote.
2. RNA-sequencing and gene fusion detection
All patients provided formalin-fixed paraffin-embedded (FFPE) samples for RNA extraction. Total RNA was extracted using the RNAprep Pure FFPE Kit (Tiangen Biotech, Beijing, China) following the manufacturer’s instructions. The cDNA library was constructed using the Hieff next-generation sequencing Ultima Dual-mode RNA Library Prep Kit (Yeasen, Shanghai, China) and paired-end sequenced with 100 bp read lengths on the DNBseq-T7 platform (MGI, Shenzhen, China). Raw data in FASTQ format were filtered to remove low-quality reads and adaptor sequences. Clean reads were aligned to the reference human genome (hg19) using the STAR software (https://code.google.com/archive/p/rnastar/). We used whole transcriptome sequencing to detect gene fusions, which can theoretically detect all gene fusions. Gene fusions were detected using Arriba software (v1.1.0) (https://github.com/suhrig/arriba) based on the STAR alignment results.
3. Differentially expressed gene analysis and pathway enrichment
Differentially expressed genes (DEGs) in each histological subtype were identified using the DESeq2 package (v1.8.3) (http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html) and the limma package (v3.52.1) (https://bioconductor.org/packages/release/bioc/html/limma.html). First, genes with expression values of 0 in more than 50% of the samples were filtered out. Genes with an absolute log2 fold change greater than 1 and an adjusted p-value less than 0.05 were then considered significant DEGs in each package. Only genes that were significant in both the DESeq2 and limma analyses were considered uterine sarcoma DEGs (uDEGs). The clusterProfiler package (v3.14.0) (https://bioconductor.org/packages/release/bioc/html/clusterProfiler.html) was used for pathway enrichment analysis of uDEGs, based on the Kyoto Encyclopedia of Genes and Genomes (KEGG; https://www.genome.jp/kegg/) and Gene Ontology (GO; https://geneontology.org/) databases.
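The selection rule described above can be illustrated with a minimal Python sketch. It is not the authors' R pipeline: the function names and the toy inputs are hypothetical, and the per-method fold changes and adjusted p-values are assumed to have been computed beforehand.

```python
def filter_low_expression(counts, max_zero_fraction=0.5):
    """Drop genes whose expression is 0 in more than max_zero_fraction of samples."""
    kept = {}
    for gene, values in counts.items():
        zero_fraction = sum(1 for v in values if v == 0) / len(values)
        if zero_fraction <= max_zero_fraction:
            kept[gene] = values
    return kept


def significant_genes(results, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Genes with |log2 fold change| > lfc_cutoff and adjusted p < padj_cutoff."""
    return {
        gene
        for gene, (log2fc, padj) in results.items()
        if abs(log2fc) > lfc_cutoff and padj < padj_cutoff
    }


def consensus_degs(results_a, results_b):
    """Keep only genes called significant by both methods (the uDEG rule)."""
    return significant_genes(results_a) & significant_genes(results_b)


# Made-up illustration data: gene -> (log2 fold change, adjusted p-value)
method_a = {"GENE_A": (2.1, 0.001), "GENE_B": (-1.4, 0.02), "GENE_C": (0.6, 0.001)}
method_b = {"GENE_A": (1.8, 0.004), "GENE_B": (-0.9, 0.03), "GENE_C": (0.7, 0.002)}
udegs = consensus_degs(method_a, method_b)
```

In this toy example only GENE_A passes both thresholds in both result sets, which is why taking the intersection yields a smaller and more conservative gene list than either method alone.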
4. Immune infiltration analysis
Cellular immune scores were calculated using the immunedeconv (v2.0.0) (https://omnideconv.org/immunedeconv/) and GSVA (v1.45.5) (https://bioconductor.org/packages/release/bioc/html/GSVA.html) packages. For the immunedeconv package, xCell (http://xCell.ucsf.edu), EPIC (http://epic.gfellerlab.org), and quanTIseq (http://icbi.at/quantiseq) were used to calculate the immune scores. For the GSVA package, ssGSEA (single-sample gene set enrichment analysis) was performed using gene sets derived from a pan-cancer immunological study [10]. The parameter settings in this study were as follows: method=“ssgsea”, kcdf=“Gaussian”, abs.ranking=TRUE. Unsupervised hierarchical clustering was applied to the immune infiltration scores. Differences in the immune scores of different cell types were assessed using the rank-sum test, and the p-values were adjusted using the Benjamini-Hochberg method.
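For readers unfamiliar with the Benjamini-Hochberg adjustment mentioned above, the following Python sketch shows the standard step-up procedure. The function name is ours, and the study itself performed this step in R.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum ensures that a smaller raw p-value never receives a larger adjusted value than a bigger one.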
5. Least absolute shrinkage and selection operator regression model construction
Least absolute shrinkage and selection operator (LASSO) regression was used to identify the characteristic genes for each histological subtype. The glmnet package v4.1-7 (https://glmnet.stanford.edu/) was used for LASSO regression analysis with the following parameters: alpha=1, family=“binomial”, standardize=FALSE, nfolds=3. The pROC package v1.18.0 (https://xrobin.github.io/pROC/) was used to plot the receiver operating characteristic (ROC) curves to evaluate the classification results. The survival package v3.3-1 (https://github.com/therneau/survival) was used for Kaplan-Meier survival analysis and to fit Cox proportional hazards regression models with its coxph function. The survminer package v0.4.9 (https://rpkgs.datanovia.com/survminer/index.html) was used to plot the survival curves.
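As an illustration of how the area under the ROC curve summarizes a classifier such as the LASSO models above, the sketch below computes the AUC as the fraction of positive-negative pairs that the scores rank correctly, with ties counted as one half. This is a generic pure-Python formulation with a hypothetical function name, not the pROC implementation.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive sample outscores a negative one."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.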
6. Prognostic model
In this study, we propose a progression-free survival (PFS) classification method based on self-supervised learning and multi-instance learning (MIL), named MMN-MIL (Max-Mean Non-Local MIL), which uses the whole slide image (WSI) of a tumor to predict the PFS status (0 for patients without progression; 1 for patients with recurrence, metastasis, or death). The MIL method treats each input as a bag containing a group of instances (i.e., image patches) [11]. A bag was assigned a positive label if it contained at least one positive instance; otherwise, it was assigned a negative label. The pathological hematoxylin and eosin (H&E)–stained slides were digitized as WSIs using a scanner (CellsVision, CellPlatform). The tumor area was delineated and segmented by two experienced pathologists (Yanjiao Hu and Fengyun Hao); in case of disagreement, a senior pathologist (Jigang Wang) gave the final opinion. The tumor area was processed at 4× magnification and then divided into patches of 224×224 pixels. We used the images of 54 patients for model development, including 43 patients in the training cohort and 11 patients in the validation cohort. It should be noted that the PFS labels were imbalanced: because of the short follow-up period, most patients were labeled 0, which could bias model training.
For model development, the first step was a self-supervised feature extraction network based on SimCLR (with ResNet-18 as the backbone) [12]. The input of this network is a single patch image, and the output is a 512-dimensional patch embedding. The second step was the MIL aggregator, whose input is the bag of all patch embeddings in the WSI and whose output is the predicted PFS label. The MIL aggregator largely adopts the network structure of DSMIL [13], except that we replaced its max pooling with a combination of max pooling and mean pooling, so that both the maximum and the average values in the patch bag are considered. Together, these two steps constitute the MMN-MIL model.
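The patch extraction and bag-labelling rules described above can be sketched as follows. This is a simplified illustration only: the function names are hypothetical, and the use of non-overlapping tiles with incomplete edge tiles discarded is an assumption, since the tiling stride and edge handling are not stated in the text.

```python
def tile_origins(width, height, patch_size=224):
    """Top-left corners of non-overlapping patches lying fully inside a region.

    Assumption: tiles do not overlap and incomplete edge tiles are dropped.
    """
    return [
        (x, y)
        for y in range(0, height - patch_size + 1, patch_size)
        for x in range(0, width - patch_size + 1, patch_size)
    ]


def bag_label(instance_labels):
    """Standard MIL rule: a bag is positive if any instance is positive."""
    return 1 if any(label == 1 for label in instance_labels) else 0


# A hypothetical 500 x 450 pixel tumor region yields a 2 x 2 grid of patches.
origins = tile_origins(500, 450)
```

In practice the instance labels are unknown and only the slide-level label is observed, which is why an aggregator is trained to map the whole bag to a single prediction.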
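To make the pooling change concrete, the sketch below pools a bag of patch embeddings with both an element-wise maximum and an element-wise mean. How the two pooled vectors are combined inside MMN-MIL is not specified in the text, so the concatenation used here is an assumption, and the non-local attention part of the aggregator is omitted. The function name is hypothetical.

```python
def max_mean_pool(bag):
    """Concatenate the element-wise max and mean over a bag of embeddings.

    `bag` is a list of equal-length embedding vectors (lists of floats).
    Concatenation is an illustrative assumption, not the published design.
    """
    if not bag:
        raise ValueError("bag must contain at least one patch embedding")
    dim = len(bag[0])
    if any(len(embedding) != dim for embedding in bag):
        raise ValueError("all embeddings must have the same length")
    columns = list(zip(*bag))  # one tuple per embedding dimension
    max_part = [max(column) for column in columns]
    mean_part = [sum(column) / len(column) for column in columns]
    return max_part + mean_part
```

Max pooling alone lets a single extreme patch dominate the bag representation, whereas adding the mean retains information about the typical patch; with 512-dimensional embeddings this sketch would return a 1,024-dimensional bag vector.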
7. Statistical analysis
In our study, PFS was defined as the interval between the date of diagnosis and the date of recurrence or metastasis, or the last follow-up. PFS was estimated using the Kaplan-Meier method, and groups were compared using log-rank tests. All statistical analyses were performed using R software ver. 3.6.1 (R Foundation for Statistical Computing, Vienna, Austria). Two-sided p-values < 0.05 were considered statistically significant.
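The Kaplan-Meier estimate used for PFS can be written in a few lines. The following pure-Python sketch uses a hypothetical function name and made-up follow-up times, and is meant only to show the product-limit calculation, not to replace the R survival package used in the study.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival at each distinct event time.

    times[i]  : follow-up time of patient i
    events[i] : 1 if progression was observed at times[i], 0 if censored
    Returns a list of (time, estimated survival probability) pairs.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    survival = 1.0
    curve = []
    for current in event_times:
        at_risk = sum(1 for t in times if t >= current)
        progressed = sum(
            1 for t, e in zip(times, events) if t == current and e == 1
        )
        survival *= 1 - progressed / at_risk
        curve.append((current, survival))
    return curve


# Made-up example: five patients, two of them censored.
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 0, 1, 1, 0])
```

Censored patients contribute to the number at risk until their last follow-up but never trigger a drop in the curve, which is how the method accommodates incomplete follow-up.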
8. Data availability statement
The datasets generated and analysed during the current study are available in the China National GenBank DataBase (CNGBdb) repository (https://db.cngb.org/; project code: CNP0005099).
Results
1. Clinicopathologic characteristics
Of the 85 patients with USs, 71 were enrolled in this cohort (S1 Fig.). Clinicopathological characteristics are summarized in S2 Table. Several histological subtypes of USs were included in this study: 47 cases of ESS, 18 cases of uLMS, three cases of adenosarcoma, two cases of carcinosarcoma, and one case of UTROSCT. The average age in this study was 51 years (range, 27 to 78 years). The International Federation of Gynecology and Obstetrics (FIGO) stage at first diagnosis was available for 54 patients and was distributed as follows (percentages are relative to all 71 patients): stage I (n=31, 43.66%), stage II (n=4, 5.63%), stage III (n=5, 7.04%), and stage IV (n=14, 19.72%). Representative H&E-stained images of each histological subtype are shown in S3 Fig.
2. Gene fusion landscape in USs
All gene fusions detected in the 71 US patients are summarized in Fig. 1A. For LGESS, the top four fusion pairs were JAZF1-SUZ12 (50%, 12/24), RNU6-33P-SNORA8 (45.8%, 11/24), RNU6-33P-TAF1D (45.8%, 11/24), and C10ORF68-CCDC7 (37.5%, 9/24). In HGESS, the most frequent fusion pairs were ZC3H7B-BCOR (34.8%, 8/23), C10ORF68-CCDC7 (30.4%, 7/23), and MRPS18A-ZAN (17.4%, 4/23). In uLMS, the most commonly detected fusion pairs were C10ORF68-CCDC7 (50%, 9/18), MRPS18A-PDC-AS1 (27.8%, 5/18), RNU4ATAC-RNU6ATAC (22.2%, 4/18), RNU6-4P–TTI2 (22.2%, 4/18), and RNU6-9–SNORA20 (22.2%, 4/18). As shown in Fig. 1B, compared to ESS, gene fusions of MRPS18A–PDC-AS1, RNU6-4P–TTI2, AC018717.1, AC018717.2 (6988)–AC019330.1, and AKAP2-PALM2 (also known as PALM2-AKAP2) were more commonly detected in uLMS (p < 0.05). MRPS18A–PDC-AS1 gene fusion was specifically observed in uLMS with a frequency of 27.8%. The Integrative Genomics Viewer (IGV) showed the fusion point in detail (Fig. 1D). Considering the absence of MRPS18A–PDC-AS1 gene fusion in ESS, it might be a potential diagnostic marker for distinguishing uLMS from ESS. Within ESS, we found that JAZF1-SUZ12 fusion was detected only in LGESS (p < 0.001), ZC3H7B-BCOR was notably more frequent in HGESS (p < 0.01), and RNU6-33P–SNORA8 and RNU6-33P–TAF1D were found in both subtypes but were more prevalent in LGESS (p < 0.01) (Fig. 1C). Representative fusion details of JAZF1-SUZ12 and ZC3H7B-BCOR are delineated in S4 Fig.
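The statistical test used to compare fusion frequencies between groups is not named in the text. As an illustration only, the sketch below applies a two-sided Fisher exact test, a common choice for small 2×2 tables and an assumption on our part, to the MRPS18A–PDC-AS1 counts reported above (5 of 18 uLMS vs. 0 of 47 ESS). The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x "positives" falling in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = prob(a)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= observed * (1 + 1e-9)
    )


# Rows: uLMS, ESS; columns: fusion detected, fusion not detected.
p_value = fisher_exact_two_sided(5, 13, 0, 47)
```

Under this assumed test the difference remains significant at the 0.05 level, in line with the significance reported for Fig. 1B.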

Fig. 1. Gene fusion landscape of 71 uterine sarcomas (USs). (A) Frequently occurring gene fusions in USs displayed as an oncoplot. Clinicopathological characteristics are displayed in the top panel. (B, C) Comparison of gene fusion frequencies between endometrial stromal sarcoma and uterine leiomyosarcoma (uLMS) (B) and between high-grade endometrial stromal sarcoma (HGESS) and low-grade endometrial stromal sarcoma (LGESS) (C), with statistical significance. *p < 0.05, ***p < 0.001. (D) The Integrative Genomics Viewer (IGV) reads of the MRPS18A–PDC-AS1 gene fusion. AS1, antisense RNA 1; MRPS18A, mitochondrial ribosomal protein S18A; PDC, phosducin; UTROSCT, uterine tumor resembling ovarian sex-cord tumor.
3. DEGs in USs
We analyzed the DEGs in ESS and uLMS using DESeq2 and limma. In the DESeq2 analysis, the comparison between ESS and uLMS showed that 603 genes were upregulated in ESS, while 1,250 genes were upregulated in uLMS (Fig. 2A); 1,041 genes were upregulated in HGESS and 573 genes were downregulated in LGESS (Fig. 2B). Limma analysis showed similar results: 1,078 genes were upregulated in ESS, 215 genes were upregulated in uLMS (Fig. 2C), 193 genes were upregulated in HGESS, and 573 genes were downregulated in LGESS (Fig. 2D). We defined uDEGs as genes that were significantly differentially expressed in both the DESeq2 and limma analyses; a total of 797 uDEGs were ultimately identified for ESS vs. uLMS. In addition, 477 uDEGs were identified when comparing HGESS and LGESS.

Fig. 2. Differentially expressed genes and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of the selected genes. (A-D) Volcano plots of differentially expressed genes analyzed using DESeq2 (A, B) and limma (C, D) between the two groups of endometrial stromal sarcoma (ESS) vs. uterine leiomyosarcoma (uLMS) and high-grade endometrial stromal sarcoma (HGESS) vs. low-grade endometrial stromal sarcoma (LGESS), respectively. Up- and down-regulated genes are represented by red and blue symbols. The top five genes with significant expression changes are labeled. (E, F) KEGG analysis of differentially expressed genes identified using both DESeq2 and limma between the two groups of ESS vs. uLMS (E) and HGESS vs. LGESS (F), respectively. The enrichment pathways of up- and down-regulated genes are represented with red and blue symbols; depth of the color indicates the number of genes enriched in the pathway. (G) Differential gene expression between ESS and uLMS in the Hedgehog and Wnt signaling pathways. Up-regulation in ESS and uLMS is represented by red and blue symbols, respectively. (H) Differential gene expression between HGESS and LGESS in the p53, cell cycle, and homologous recombination (HR) signaling pathways. Up-regulation in HGESS is represented with a red symbol. ECM, extracellular matrix; FC, fold change; MAPK, mitogen-activated protein kinase; TPM, transcripts per million.
4. Enrichment analysis of DEGs
Among the 797 uDEGs in the USs, 602 genes were upregulated in ESS, and KEGG pathway analysis showed that these uDEGs were significantly enriched in the Wnt, mitogen-activated protein kinase, and Hedgehog signaling pathways (Fig. 2E). The remaining 195 genes were downregulated in ESS, and KEGG enrichment analysis suggested enrichment of the p53 signaling pathway, cell cycle, focal adhesion, oocyte meiosis, and regulation of the actin cytoskeleton (Fig. 2E). All uDEGs and their interactions with these pathways are shown in the network plot (S5A and S5B Fig.). We explored the expression of uDEGs identified in the Hedgehog and Wnt signaling pathways in ESS vs. uLMS. In the Hedgehog signaling pathway, PTCH1 and SMO were significantly upregulated in ESS, whereas GAS1 was clearly increased in uLMS (Fig. 2G). The upstream genes of GAS1 and PTCH1 (IHH, DHH, and SHH) showed no statistically significant difference between ESS and uLMS. In the Wnt signaling pathway, WNT6, SFRP1, FZD8, LEF1, WNT5B, WNT11, NKD1, and BAMBI were markedly upregulated in ESS (Fig. 2G). GO analysis showed that upregulated uDEGs were enriched in synapses, cation channels, and transmembrane transport functions, whereas downregulated uDEGs were enriched in mitosis-related pathways (S6A and S6B Fig.).
Among the 477 uDEGs of LGESS and HGESS, 187 were upregulated in HGESS. KEGG analysis indicated enrichment of the p53, cell cycle, oocyte meiosis, and homologous recombination (HR) pathways. Among the 290 genes downregulated in HGESS, the neuroactive ligand-receptor interaction pathway was the only enriched pathway (Fig. 2F). The uDEGs in the p53 signaling pathway, cell cycle, and HR and their interactions with these pathways are represented in Fig. 2H, S5C and S5D Fig. GO enrichment analysis demonstrated that the upregulated uDEGs were also enriched in pathways related to mitosis. Downregulated uDEGs were broadly enriched in channel activity, transmembrane transport, and neurotransmitter transport (S6C and S6D Fig.).
5. Gene signatures distinguishing histological subtypes of USs
To determine the unique gene expression characteristics of the different subtypes of USs, we developed LASSO regression models to identify the genes that distinguish LGESS, HGESS, and uLMS. As shown in Fig. 3A, the area under the ROC curve (AUC) of the ESS vs. uLMS model was 0.917, indicating high classification performance. Thirteen genes were selected as distinguishing ESS from uLMS. Among these genes, ESS exhibited higher expression of PAGE4, SOX10, TERT, MUC6, KCNQ2, C16orf89, SLPI, RP11-386G21.2, VCX3B, and SLC22A11, whereas uLMS exhibited higher expression of GLYATL2, ASB5, and CCDC185 (Fig. 3C). The LASSO model for HGESS vs. LGESS showed a lower AUC (0.775) than the ESS vs. uLMS model (Fig. 3B), with HSD17B2 as the only upregulated gene and CD164L2, PAGE4, CYP3A7, SLC6A2, PABPC1L2A, and B3GALT5-AS1 as downregulated genes in HGESS (Fig. 3C).

Fig. 3. Characteristic genes of the different histological subtypes of uterine sarcomas identified by least absolute shrinkage and selection operator (LASSO) regression models. (A, B) Receiver operating characteristic (ROC) curves of the LASSO regression models in the validation set for classifying endometrial stromal sarcoma (ESS) vs. uterine leiomyosarcoma (uLMS) (A, area under the ROC curve [AUC]=0.917) and high-grade endometrial stromal sarcoma (HGESS) vs. low-grade endometrial stromal sarcoma (LGESS) (B, AUC=0.775). (C) Heatmap of characteristic genes identified using the LASSO regression models.
6. Gene expression signature and prognosis
In the present study, 48 patients had valid survival data. Kaplan-Meier analysis showed no significant difference in PFS between HGESS, LGESS, and uLMS (Fig. 4A). We therefore used the Cox proportional-hazards model to evaluate the influence of 1,176 differentially expressed genes on clinical outcomes, and univariate Cox analysis suggested 256 genes as potential prognostic predictors (p < 0.05). Among the 256 candidates, we selected 20 genes whose p-values were less than 0.005 and whose absolute regression coefficients were over 0.2. Multivariate Cox analysis was performed to evaluate the association of the 20 genes with survival, and 15 genes were found to be independent prognostic factors (p < 0.05) (Fig. 4B). Among the 15 genes, the upregulation of LAMB4, ATP7B, EML5, SARDH, PLAG1, NCCRP1, and CACNA1B was associated with poor prognosis, with hazard ratios higher than 1. Conversely, increased expression of PLEKHH1, GPLD1, TRNP1, ZNF423, POTEG, ACSBG1, SLC18A2, and ARL9 was associated with a favorable prognosis, with hazard ratios lower than 1.
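The screening step between the univariate and multivariate analyses, and the link between a Cox regression coefficient and its hazard ratio, can be illustrated with the short Python sketch below. The gene names and values are made up and the function names are hypothetical; only the two cutoffs (p < 0.005 and absolute coefficient > 0.2) come from the text.

```python
from math import exp


def screen_candidates(cox_results, p_cutoff=0.005, coef_cutoff=0.2):
    """Genes with univariate Cox p < p_cutoff and |coefficient| > coef_cutoff."""
    return sorted(
        gene
        for gene, (coef, p_value) in cox_results.items()
        if p_value < p_cutoff and abs(coef) > coef_cutoff
    )


def hazard_ratio(coef):
    """A Cox coefficient is a log hazard ratio, so HR = exp(coefficient)."""
    return exp(coef)


# Made-up illustration data: gene -> (Cox coefficient, p-value)
toy_results = {
    "GENE_A": (0.45, 0.001),
    "GENE_B": (-0.30, 0.004),
    "GENE_C": (0.10, 0.001),
    "GENE_D": (0.50, 0.020),
}
selected = screen_candidates(toy_results)
```

A positive coefficient gives a hazard ratio above 1 (higher expression, higher risk of progression), and a negative coefficient gives a hazard ratio below 1, which is the distinction drawn between the two gene groups above.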

Fig. 4. Prognostic analysis and cluster analysis of patients with uterine sarcomas (USs). (A) Kaplan-Meier survival analysis of progression-free survival of high-grade endometrial stromal sarcoma (HGESS), low-grade endometrial stromal sarcoma (LGESS), and uterine leiomyosarcoma (uLMS). (B) Cox multivariate analysis showed that 15 of the 20 differentially expressed genes identified in the endometrial stromal sarcoma vs. uLMS and HGESS vs. LGESS comparisons were significantly associated with the prognosis of USs. AIC, Akaike information criterion. *p < 0.05, ***p < 0.001. (C) Unsupervised hierarchical clustering analysis of immune scores calculated using xCell clustered the samples into cluster 1, cluster 2, and cluster 3. (D) Kaplan-Meier survival analysis of progression-free survival of cluster 1, cluster 2, and cluster 3 showed no statistical significance (p=0.075). (E) Kaplan-Meier survival analysis of progression-free survival of cluster 1 and clusters 2-3 showed a statistically significant association with prognosis (p=0.025). NK, natural killer.
7. Immune infiltration and prognosis
We used xCell to investigate the immune infiltration of the USs. As shown in Fig. 4C, USs could be divided into three distinct clusters based on immune infiltration analysis. Kaplan-Meier analysis showed that progression-free survival did not differ significantly among cluster 1, cluster 2, and cluster 3 (p=0.075) (Fig. 4D). We then merged clusters 2 and 3 into a single group (clusters 2-3) and found that cluster 1 showed a significantly better prognosis than clusters 2-3 (p=0.025) (Fig. 4E).
In addition, we evaluated the prognostic value of immune scores in USs using the EPIC, quanTIseq, and ssGSEA methods. Unsupervised hierarchical clustering analysis showed that the US samples could be divided into several clusters based on immune infiltration results (Fig. 5A-C). In the EPIC analysis, we noticed that cancer-associated fibroblasts were clearly abundant in several samples but were not associated with the histological subtype or tumor stage (Fig. 5D and E). Survival analysis showed no statistically significant differences between these clusters (Fig. 5F-H). A high abundance of plasmacytoid dendritic cells (pDCs) was related to a favorable prognosis, whereas increased amounts of natural killer (NK) cells, M1 macrophages, and monocytes indicated poor prognosis (Fig. 5I and J).
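The two-group comparison of cluster 1 with the merged clusters 2-3 relies on the log-rank test. A compact pure-Python version of the standard two-group test is sketched below for illustration; the function name is hypothetical, the study itself used R, and no patient-level data are reproduced here.

```python
from math import erfc, sqrt


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value), 1 df."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_a = expected_a = variance = 0.0
    for current in sorted({t for t, e, _ in data if e == 1}):
        n = sum(1 for t, _, _ in data if t >= current)                 # at risk
        n_a = sum(1 for t, _, g in data if t >= current and g == 0)    # at risk, group A
        d = sum(1 for t, e, _ in data if t == current and e == 1)      # events
        d_a = sum(1 for t, e, g in data if t == current and e == 1 and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("variance is zero; the groups cannot be compared")
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi_square / 2))
    return chi_square, p_value
```

The test compares the observed number of progression events in one group with the number expected if both groups shared the same hazard, accumulated over all event times.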

Fig. 5. Cluster analysis of patients with uterine sarcomas (USs) using EPIC (estimating the proportion of immune and cancer cells), quanTIseq, and ssGSEA (single-sample gene set enrichment analysis). (A-C) Unsupervised hierarchical clustering analysis of immune scores calculated using EPIC (A), quanTIseq (B), and ssGSEA (C) grouped the samples into two clusters by EPIC and three clusters by quanTIseq and ssGSEA. (D) Cancer-associated fibroblasts showed no statistically significant difference among high-grade endometrial stromal sarcoma (HGESS), low-grade endometrial stromal sarcoma (LGESS), and uterine leiomyosarcoma (uLMS) (Wilcoxon test). (E) Cancer-associated fibroblasts showed no association with the clinical stage of USs (Wilcoxon test). MDSC, myeloid-derived suppressor cell. (F-H) Kaplan-Meier survival analysis of progression-free survival of the clusters identified using EPIC (F), quanTIseq (G), and ssGSEA (H) showed no statistically significant association with prognosis. (I) Cox multivariate analysis indicated that natural killer (NK) cells and M1 macrophages calculated using quanTIseq were related to poor prognosis. (J) Cox multivariate analysis showed that increased immune scores of monocytes and plasmacytoid dendritic cells calculated using the ssGSEA method were related to negative and positive prognosis, respectively. AIC, Akaike information criterion. *p < 0.05, **p < 0.01.
Next, we investigated immune cell infiltration in each cluster. Hematopoietic stem cells showed significantly increased infiltration in cluster 1 (adjusted p < 0.001). CD4+ Th1 T cells (adjusted p < 0.001), activated myeloid dendritic cells (mDCs), and macrophages (adjusted p < 0.05) were increased in clusters 2-3 (S7A Fig.). Hazard ratio analysis suggested that the activation of mDCs was an independent negative prognostic factor (S7B Fig.).
8. Molecular mechanisms of distinct prognosis of different sub-populations in USs
To uncover the molecular mechanisms that lead to distinct prognoses in different subpopulations, we performed GSEA. Pathway enrichment results showed that cluster 1 exhibited marked activation of the basal cell carcinoma and Hedgehog signaling pathways. In contrast, clusters 2-3 showed considerable enrichment of the graft-versus-host disease, allograft rejection, and autoimmune thyroid disease pathways (S7C and S7D Fig.).
9. Performance of the prognostic models
In this study, we developed a deep learning model (MMN-MIL) to predict clinical outcomes in patients with US. For this model, our dataset consisted of 54 US patients, and the WSI preprocessing is shown in Fig. 6A. First, the pathological H&E sections were converted to WSI and then uniformly processed at 4× magnification. The tumor area was extracted, cut into rectangles, and divided into patches with a resolution of 224×224 pixels. Each patch was assigned the same label as the WSI. In total, we obtained 55,341 valid patches, 90% of which were used as the training set, and 10% as the test set. We used prediction accuracy and AUC as evaluation indicators.
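The patch labelling and the 90%/10% split described above can be sketched as follows. The function names are hypothetical, and the seeded random split at the patch level is an assumption: the text describes the split in terms of patches but does not state how patches were assigned to each set.

```python
import random


def label_patches(slides):
    """Give every patch the label of the slide (WSI) it was cut from.

    `slides` is a list of (slide_label, list_of_patch_ids) pairs.
    """
    return [
        (patch_id, slide_label)
        for slide_label, patch_ids in slides
        for patch_id in patch_ids
    ]


def split_train_test(items, test_fraction=0.1, seed=0):
    """Shuffle with a fixed seed, then hold out test_fraction of the items."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Made-up example: two slides with five patches each.
toy_slides = [(0, ["a1", "a2", "a3", "a4", "a5"]), (1, ["b1", "b2", "b3", "b4", "b5"])]
patches = label_patches(toy_slides)
train_set, test_set = split_train_test(patches)
```

One design consideration with a patch-level split is that patches from the same slide can end up in both sets; splitting by patient instead keeps the test patients entirely unseen during training.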

Fig. 6. Flowchart of the deep learning study and performance of the MMN-MIL (Max-Mean Non-Local multiple instance learning) model. (A) Whole slide image (WSI) preprocessing. The tumor region in the WSI was identified, cut into rectangles, and divided into patches with a resolution of 224×224 to obtain the patch bag of the WSI. (B) The framework of MMN-MIL. Each patch was converted into a 512-dimensional patch embedding by the self-supervised feature extraction network. The bag embedding of the patch bag contains all the patch embeddings. The MMN-MIL aggregator took the bag embedding as input and output the progression-free survival (PFS) classification probability. (C) Receiver operating characteristic (ROC) curve of the MMN-MIL model showed satisfactory prediction of the prognosis of uterine sarcomas (area under the ROC curve [AUC] reached 0.909). (D) ROC curves of different deep learning models. DSMIL, dual-stream multiple instance learning.
As shown in Fig. 6B, the proposed MMN-MIL method is a two-step algorithm. On the test set, our approach achieved an accuracy of 0.804 and an AUC of 0.909 (Fig. 6C). We compared our method with three other competitive methods (DSMIL, mean pooling, and non-local only) and found that our approach achieved at least a 2% improvement in accuracy. MMN-MIL successfully extracted the features required for PFS classification from the pathological images and achieved high accuracy. As shown in Fig. 6D, the MMN-MIL method achieved the best performance, whereas the model using non-local only performed relatively poorly. Our model reached an AUC of 0.909, which was 1.3%, 3.1%, and 2.6% higher than those of the comparison methods, respectively (Table 1), supporting the robustness of our method.
Discussion
USs are rare malignant mesenchymal tumors. The diverse nature of USs entails significant variations in their biological behavior and prognosis, imparting crucial diagnostic significance to their classification in clinical settings, which is a vital foundation for crafting personalized treatment strategies, especially in the case of malignant USs, where early diagnosing and precise subtyping are imperative for enhancing patient survival benefits [1]. USs are typically identified through histological examination following a hysterectomy or myomectomy. Nevertheless, the diagnostic accuracy of USs can be diverse due to a variety of factors, including the expertise of the pathologists involved, the quality of staining techniques, and others. Recently, the development of molecular diagnostic technologies has enabled the use of specific molecular biomarkers in the clinical diagnosis of USs. In HGESS, YWHAE-NUTM2 and ZC3H7B-BCOR fusions are frequently identified, whereas JAZF1-SUZ12 gene fusion is predominantly found in LGESS, highlighting the molecular heterogeneity of these tumors [7,8]. Standard treatment for USs are as follows: for resectable USs, consider re-exploration/reresection or completion salpingo-oophorectomy if applicable, for unresectable USs, consider systemic therapy and/or palliative external beam radiation therapy with or without brachytherapy [6].
In the present study, we enrolled 71 patients with USs, and found approximately 50% of LGESS cases were detected with JAZF1-SUZ12 gene fusion, ZC3H7B-BCOR gene fusion was found in 34.8% of patients with HGESS, consistent with previous research [14,15]. We report a new MPRS18A–PDC-AS1 gene fusion site that was detected only in uLMS and adenocarcinoma. The MRPS18A–PDC-AS1 fusion consists of a translocation fusion between exon 1 of MRPS18A and exon 1 of PDC-AS1 located on chromosomes 6 and 1, respectively. Because the fusion fragment found in this study was small and the depth was low, we confirmed through IGV verification that the fusion site was the true fusion. MRPS18A encodes a mitochondrial ribosomal protein, which has been reported to be upregulated in hepatocellular carcinoma and breast cancer, but has not been reported in USs [16,17]. The MRPS18A–PDC-AS1 gene fusion found in our study, which has not been reported so far, might be involved in mitochondrial energy metabolism and could be used across the full course of uLMS management. Firstly, this gene fusion could be detected in the early phase of oncogenesis and served as a sentinel biomarker, facilitating the timely detection and diagnosis of the disease. Moreover, tracking the expression of this gene alteration may be helpful for the early identification of tumor recurrence or metastasis. Second, MRPS18A–PDC-AS1 gene fusion could be used as a distinctive biomarker to help the diagnosis of uLMS from a spectrum of other uterine malignancies, thereby bolstering the diagnostic precision. In addition, identifying this novel gene fusion helps to investigate the unique biological signature and intrinsic pathogenesis of tumors, as well as identify possible therapeutic targets. However, further study is needed to validate its clinical effectiveness and practicality.
For US special DEGs, we identified 797 uDEGs in the ESS vs. uLMS group, and these genes were enriched mainly in the Hedgehog and Wnt signaling pathways, which are reported to be closely associated with tumorigenesis, proliferation, and metastasis of sarcomas [18,19]. Compared to LGESS, HSD17B2 was upregulated in HGESS, whereas CD164L2, PAGE4, CYP3A7, SLC6A2, PABPC1L2A, and B3GALT5-AS1 were downregulated. All identified genes were first reported in uterine sarcomas.
For prognostic gene markers, we identified 15 uDEGs: LAMB4, ATP7B, EML5, SARDH, PLAG1, NCCRP1, and CACNA1B were poor prognostic indicators, whereas PLEKHH1, GPLD1, TRNP1, ZNF423, POTEG, ACSBG1, SLC18A2, and ARL9 were favorable prognostic markers. All of the prognostic genes identified in this study are reported here for the first time as prognostic factors for USs.
The tumor microenvironment is composed of cellular components (including tumor cells, immune cells, and stromal cells) and non-cellular components; it plays a key role in tumor development and affects the clinical outcome of patients [20]. In this study, we analyzed the tumor immune infiltration of each US subtype and found that mDC infiltration was an independent negative prognostic factor for USs. As a subpopulation of antigen-presenting cells, mDCs can induce the generation of immunosuppressive Th2 cells rather than tumor-protective Th1 cells [21]. An autologous vaccine containing primary mDCs resulted in improved PFS in patients with melanoma [22]. Immune score analysis showed that an increased number of pDCs was associated with a favorable prognosis, whereas NK cells, M1 macrophages, and monocytes were significantly associated with a poor prognosis, providing a valuable reference for the prognostic evaluation of USs.
Finally, we developed a deep learning model, MMN-MIL, to help predict the outcomes of patients with US from a small sample of 54 patients. The model showed satisfactory prediction accuracy (0.804), with an AUC of 0.909. The MMN-MIL model can effectively predict the clinical course of patients with sarcoma and provide a reference that helps clinicians treat patients as early as possible. However, because our training and validation sets were small, broad application of MMN-MIL still requires validation in larger samples and prospective cohorts.
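For readers less familiar with the two metrics quoted for MMN-MIL, the following minimal Python sketch shows how an AUC and an accuracy can be computed from binary outcome labels and predicted scores. It is an illustration only, not the evaluation code used in this study: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions introduced here.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of cases classified correctly at an assumed decision threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical toy data (1 = event, 0 = no event); not the study's patients.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

toy_auc = roc_auc(toy_labels, toy_scores)   # 8 of 9 positive/negative pairs ranked correctly
toy_acc = accuracy(toy_labels, toy_scores)  # 4 of 6 cases correct at threshold 0.5
print(f"AUC = {toy_auc:.3f}, accuracy = {toy_acc:.3f}")
```

The AUC depends only on how cases are ranked, whereas accuracy depends on the chosen threshold, which is one reason the two values can differ as they do for MMN-MIL (0.909 vs. 0.804).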
Our study had several limitations: (1) it was a single-center study; (2) the sample size of USs was relatively small; and (3) the follow-up period of the US patients was short. In future work, we plan to continue following up the survival of the US patients, validate the diagnostic value of the MRPS18A–PDC-AS1 gene fusion in US, confirm the prognostic value of the genes and immune cells we reported, and apply the MMN-MIL model to a larger US cohort to achieve better performance.
In conclusion, this study comprehensively analyzed the gene fusions and transcriptomes of 71 US patients. We reported a novel MRPS18A–PDC-AS1 gene fusion site in US for the first time, which could be used as a potential marker for the differential diagnosis of uLMS and ESS. We presented the differentially expressed genes in US and identified five genes with prognostic value in US, including LAMB4, OR4NA, POTEC, DNAH2, and POTEM, which are reported here for the first time as prognostic factors for US. We also characterized the immune infiltration of uLMS and ESS and reported, for the first time, the prognostic role of mDCs, pDCs, NK cells, M1 macrophages, monocytes, and hematopoietic stem cells in US. Moreover, we developed a deep learning model to effectively predict the clinical outcome of patients with US. Our study provides a valuable reference for understanding and investigating this rare malignant tumor.
Electronic Supplementary Material
Supplementary materials are available at the Cancer Research and Treatment website (https://www.e-crt.org).
Notes
Ethical Statement
This study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Ethics Committee of the Affiliated Hospital of Qingdao University (Project identification code: QYFY KYLL 917211920, Date: 2022-04-06). The requirement for informed consent was waived because of the observational design of the study.
Author Contributions
Conceived and designed the analysis: Song Y, Ma D, Tai Z, Xing X.
Collected the data: Li G, Zhang Z, Jia H, Tai Z, Xing X.
Contributed data or analysis tools: Liu Y, Wang J, Hu Y, Hao F, Liu X, Xing X.
Performed the analysis: Song Y, Liu Y, Jia H, Zhang C, Xie Y, Tai Z, Xing X.
Wrote the paper: Song Y, Xing X.
Conflicts of Interest
Acknowledgements
We would like to thank all the patients and their families. We would like to thank the Wuhan Kingwise Biotechnology Company for technical support.
This work was supported by the National Natural Science Foundation of China (grant numbers 81972329, 82003374), the Natural Science Foundation of Shandong Province (grant number ZR2020QH139), the China Postdoctoral Science Foundation funded project (grant number 2023M741860), Qingdao Municipal Science and Technology Bureau Project (grant number 23-2-8-smjk-19-nsh), South District of Qingdao Municipal Science and Technology Bureau Project (grant number 2022-2-003-YY), and the Postdoctoral Innovation Project of Qingdao (grant number QDBSH20220201055).