1 Department of Medical Sciences, Graduate School of The Catholic University of Korea, Seoul, Korea
2 Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul, Korea
3 Department of Physiology, Ajou University School of Medicine, Suwon, Korea
4 Center for Precision Medicine, Seoul National University Hospital, Seoul, Korea
5 Department of Genomic Medicine, Seoul National University Hospital, Seoul, Korea
6 Department of Life Science, Dongguk University, Seoul, Korea
7 Department of Precision Medicine and Big Data, The Catholic University of Korea, Seoul, Korea
8 Precision Medicine Research Center, College of Medicine, The Catholic University of Korea, Seoul, Korea
9 Cancer Evolution Research Center, College of Medicine, The Catholic University of Korea, Seoul, Korea
10 CMC Institute for Basic Medical Science, The Catholic Medical Center of The Catholic University of Korea, Seoul, Korea
Copyright © 2024 by the Korean Cancer Association
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Author Contributions
Conceived and designed the analysis: Park H, Park J, Woo HG, Yoon H, Lee M, Hong D.
Collected the data: Park H and Hong D.
Contributed data or analysis tools: Park H, Park J, Hong D.
Performed the analysis: Park H, Park J, Woo HG, Yoon H, Lee M, Hong D.
Wrote the paper: Park H, Park J, Hong D.
Conflicts of Interest
Conflict of interest relevant to this article was not reported.
Original text | ||
---|---|---|
⑥ (Genomic information) Except for the few cases outlined below, the determination of whether pseudonymization is possible is deferred (outside those exceptions, the information may be used only with individual consent). | ||
※ Genomic information may contain information about third parties such as parents, ancestors, siblings, offspring, relatives, etc., and until appropriate pseudonymization methods are developed, deferring the determination of pseudonymization feasibility is appropriate. | ||
1) Presence or absence of genetic mutations related to widely known diseases: | ||
- The risk of individual re-identification is significantly reduced by providing information at the level of genes rather than specific mutation details (e.g., loci). | ||
* (Example) Study on the treatment response of patients with B gene mutations when using anticancer drugs. | ||
2) Newly acquired mutation information of neoplasms with the removal of germline mutation information: | ||
- The newly generated mutation information, with the removal of germline mutations (normal tissue mutations), contains only mutation information that causes cancer, ensuring no risk of individual identification. | ||
· Neoplasm: Abnormal cell proliferation known as a tumor. | ||
⑦ (Omics* information excluding genomics) No separate measures required. | ||
* (Example) Metabolomics, proteomics, etc. | ||
- Unlike genomic information, metabolomics, proteomics, etc., do not allow the recovery of genomic information, making separate measures unnecessary. However, transcriptomics is subject to deferral regarding pseudonymization feasibility since genomic information may be recoverable. | ||
Revised version | ||
Genomic Data: | ||
※ The methods outlined in this guideline do not apply to human-derived materials collected and processed with consent for research or donation purposes. | ||
- For human-derived materials collected by medical institutions and subjected to NGS-based genetic tests, generating SAM/BAM/VCF files and test records, the following appropriate methods should be employed: | ||
1. Nucleic acid sequence information: Rare (germline) variant information and short tandem repeat (STR) information that pose a risk of personal identification should, if unrelated to the processing purpose, be deleted or otherwise appropriately processed, for example by partial deletion or substitution. | ||
2. Information other than the above nucleic acid sequences: Metadata or unstructured strings (or codes) in the target files and records that contain information posing a risk of personal identification, or other specific information, should be deleted or replaced, in part or in full, or otherwise appropriately processed. | ||
- When considering the use of raw data, such as FASTQ files generated through NGS-based genetic tests on human-derived materials collected for medical purposes, it is recommended that consent from the data subject be obtained. | ||
- A FASTQ file enables any data processor to generate files such as SAM/BAM/VCF, which record chromosome numbers, positions, and variant information, by mapping the nucleotide sequence of each sequencing read to the standard reference genome. | ||
- Because genomic data contain nucleotide sequences among other information, and the information they carry cannot yet be fully interpreted, there are limits to how far the identification risk of the data itself can be reduced. Since the data may also include information about third parties such as parents, siblings, and relatives, it is crucial, more so than for other types of information, to restrict the environment in which the data are used, based on a risk assessment of the processing conditions (such as access control management and the establishment of closed environments). | ||
Omics Data | ||
- Metabolomics and proteomics data, which cannot be used to reconstruct genomic information, do not require separate measures. Likewise, no separate measures are needed when utilizing expression matrix values of transcriptomes generated through NGS-based genetic tests on human-derived materials collected by medical institutions for diagnostic purposes. | ||
- For data other than expression matrix values that are generated through NGS-based genetic testing of human-derived samples collected by medical institutions and that contain information posing personal identification risks, appropriate measures should be taken: personal identification information, personally identifiable information, and specific information should be deleted or replaced, in part or in full, using appropriate methods. |
NGS, next-generation sequencing.
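
As an illustration only, the two processing steps that the revised guideline describes for VCF files (removing rare germline variants, and deleting or substituting identifying metadata) might be sketched as follows. This is not a method prescribed by the guideline. The allele-frequency cutoff `RARE_AF_THRESHOLD`, the use of the `AF` key in the INFO column, the header allowlist `KEPT_HEADER_PREFIXES`, and the helper names are all assumptions made for this example. STR handling and SAM/BAM files are not covered.

```python
import hashlib
import hmac

# Assumed cutoff: the guideline does not define what counts as "rare".
RARE_AF_THRESHOLD = 0.01
# Assumed allowlist: only structural header lines are kept; free-text headers
# (e.g. ##source, command lines, file paths) are dropped.
KEPT_HEADER_PREFIXES = ("##fileformat", "##INFO", "##FORMAT", "##FILTER", "##contig")


def pseudonym(sample_id: str, secret: bytes) -> str:
    """Substitute a sample ID with a keyed hash (hypothetical helper)."""
    digest = hmac.new(secret, sample_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return "S_" + digest[:12]


def min_allele_frequency(info: str):
    """Return the smallest AF value in an INFO string, or None if absent or unparsable."""
    for entry in info.split(";"):
        key, _, value = entry.partition("=")
        if key == "AF" and value:
            try:
                return min(float(v) for v in value.split(","))
            except ValueError:
                return None
    return None


def pseudonymize_vcf(lines, secret: bytes):
    """Filter VCF text lines: scrub headers, replace sample IDs, drop rare variants."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            # Metadata: keep only allowlisted structural header lines.
            if line.startswith(KEPT_HEADER_PREFIXES):
                out.append(line)
        elif line.startswith("#CHROM"):
            # Column header: sample names start at the tenth column.
            cols = line.split("\t")
            cols[9:] = [pseudonym(s, secret) for s in cols[9:]]
            out.append("\t".join(cols))
        elif line:
            # Variant record: INFO is the eighth column. Records with an
            # unknown AF are dropped as a conservative assumption.
            cols = line.split("\t")
            af = min_allele_frequency(cols[7]) if len(cols) > 7 else None
            if af is not None and af >= RARE_AF_THRESHOLD:
                out.append(line)
    return out
```

A keyed hash is used here rather than a plain hash so that sample IDs cannot be recovered by hashing a list of candidate IDs without the secret. In line with the revised guideline, such file-level processing would still need to be combined with restrictions on the environment in which the data are used.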