
About

Introduction

The Korean Cancer Genomics (KCG) Portal serves as a comprehensive resource for advancing cancer research through high-quality, large-scale omics data from Korean populations. Built upon the Korea BioData Station (K-BDS) infrastructure, the KCG Portal systematically extracts, curates, and reanalyzes raw genomic and transcriptomic datasets to deliver harmonized, analysis-ready resources for the research community.

Through rigorous quality control (QC) processes and standardized analytical pipelines, KCG establishes a gold-standard reference for Korean cancer omics data. This unified approach enables reproducible and comparative studies, supporting both domestic research initiatives and international collaborative efforts in precision oncology and population-specific cancer genomics.

KCG Overview

The KCG Portal provides access to a systematically curated and quality-assured collection of cancer-related multi-omics datasets. The database is structured into three complementary components—Projects, Repositories, and Guides—each designed to support distinct aspects of data exploration, access, and research application.

Projects: This section organizes cancer omics data sourced from the Korea Sequence Read Archive (KRA) and Korea Expression Archive (KEA) repositories into thematic, project-based collections. Each project aggregates datasets by tumor type, cohort, or research focus, presenting summarized analytical results, comprehensive metadata, and key biological insights to facilitate targeted research inquiries.

Repositories: The repository page offers a comprehensive catalog of all quality-controlled and preprocessed datasets generated through standardized analytical pipelines. Users can explore and download genomic, transcriptomic, and epigenomic results (including variant calls, expression matrices, and methylation profiles), accompanied by detailed metadata to support secondary analysis and integrative studies.

Guides: The guide section provides complete documentation of the analytical workflows and pipelines applied to process the data. This includes tool specifications, parameter settings, and methodological details, ensuring full transparency and enabling users to reproduce or adapt the analyses performed within KCG.

Additionally, the KCG Dashboard on the home page delivers an at-a-glance overview of the entire dataset collection. Through interactive charts and summary tables—displaying statistics on sample counts by cancer type, sequencing platforms, data modalities, and quality metrics—users can rapidly assess and navigate the overall composition and scope of the KCG Portal.

KCG Resources

1. Raw Data Types
The KCG Portal integrates multiple types of raw cancer omics data extracted from the Korea BioData Station (K-BDS), specifically the KRA and KEA repositories. These datasets encompass a wide range of high-throughput sequencing and array platforms, including:

Whole Genome Sequencing (WGS)
Whole Exome Sequencing (WES)
RNA Sequencing (RNA-seq)
Methylation Sequencing (Methyl-seq)
Methylation Array (Methyl-array)

All raw data underwent stringent quality control to ensure only high-quality samples were retained for downstream analyses.

2. Analysis Pipelines
KCG employs standardized, internationally recognized analytical workflows tailored to each data type. These pipelines ensure consistency, reproducibility, and comparability across studies.

2.1 DNA Sequencing (WGS / WES)

Map / Align: BWA-MEM
Sort BAM: GATK–Picard
Mark Duplicates: GATK
Base Quality Score Recalibration (BQSR): GATK
Somatic Mutation Calling: Mutect2 (GATK), MuSE, VarScan2, Pindel
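The steps above can be sketched as a sequence of command lines. The following is a minimal illustration, not the KCG configuration: the file naming scheme, the read group values, and the helper names `dna_preprocessing_commands` and `mutect2_command` are hypothetical, and the parameters KCG actually uses are the ones documented in the Guides section. The sketch only builds argument lists and does not run any tool.

```python
from pathlib import Path


def dna_preprocessing_commands(sample, fastq1, fastq2, reference, known_sites, outdir):
    """Build align -> sort -> mark duplicates -> BQSR commands for one sample."""
    out = Path(outdir)
    # Hypothetical intermediate file names, chosen only for this sketch.
    sam = str(out / f"{sample}.sam")
    sorted_bam = str(out / f"{sample}.sorted.bam")
    dedup_bam = str(out / f"{sample}.dedup.bam")
    metrics = str(out / f"{sample}.dup_metrics.txt")
    recal_table = str(out / f"{sample}.recal.table")
    recal_bam = str(out / f"{sample}.recal.bam")
    # Placeholder read group (assumed values); bwa expands the literal "\t".
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    return [
        # Map / align with BWA-MEM, writing SAM to a file.
        ["bwa", "mem", "-R", read_group, "-o", sam, reference, fastq1, fastq2],
        # Coordinate-sort with the Picard tool bundled in GATK.
        ["gatk", "SortSam", "-I", sam, "-O", sorted_bam, "--SORT_ORDER", "coordinate"],
        # Mark duplicate reads and write a metrics file.
        ["gatk", "MarkDuplicates", "-I", sorted_bam, "-O", dedup_bam, "-M", metrics],
        # BQSR, part 1: model systematic base quality errors.
        ["gatk", "BaseRecalibrator", "-R", reference, "-I", dedup_bam,
         "--known-sites", known_sites, "-O", recal_table],
        # BQSR, part 2: apply the recalibration table.
        ["gatk", "ApplyBQSR", "-R", reference, "-I", dedup_bam,
         "--bqsr-recal-file", recal_table, "-O", recal_bam],
    ]


def mutect2_command(reference, tumor_bam, normal_bam, normal_sample, out_vcf):
    """Build a tumor/normal Mutect2 call; the other callers are omitted here."""
    return ["gatk", "Mutect2", "-R", reference, "-I", tumor_bam, "-I", normal_bam,
            "-normal", normal_sample, "-O", out_vcf]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` in order, since every step reads the file written by the previous one.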

2.2 RNA Sequencing (RNA-seq)

Alignment and Splice Junction Detection: STAR
Gene Expression Quantification: RSEM
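As an illustration of how per-sample RSEM output could be combined into a gene-by-sample table, the sketch below reads the `TPM` column of RSEM `.genes.results` files and merges the samples. It rests on assumptions: the portal does not state here which expression unit its matrices use, the function names are hypothetical, and filling a gene that is missing from a sample with 0.0 is a simplification made for this example only.

```python
import csv
import io


def read_rsem_tpm(handle):
    """Map gene_id -> TPM from one RSEM `.genes.results` table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["gene_id"]: float(row["TPM"]) for row in reader}


def build_matrix(per_sample):
    """Combine {sample: {gene_id: TPM}} into a header and sorted gene rows."""
    samples = sorted(per_sample)
    genes = sorted(set().union(*(per_sample[s].keys() for s in samples)))
    header = ["gene_id"] + samples
    # Assumption for this sketch: a gene absent from a sample is written as 0.0.
    rows = [[g] + [per_sample[s].get(g, 0.0) for s in samples] for g in genes]
    return header, rows


# Usage with an in-memory table that follows the RSEM column layout.
example = (
    "gene_id\ttranscript_id(s)\tlength\teffective_length\texpected_count\tTPM\tFPKM\n"
    "G1\tT1\t1000\t800\t10.00\t5.50\t3.20\n"
)
header, rows = build_matrix({"sample_A": read_rsem_tpm(io.StringIO(example))})
```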

2.3 Methylation Sequencing (Methyl-seq)

Alignment: Bismark
Deduplication and Methylation Calling: Bismark
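The Bismark steps can be sketched in the same way. This is a hedged example for paired-end reads: the helper name `bismark_commands` is hypothetical, the BAM paths are supplied by the caller rather than derived from Bismark's own output naming, and the options KCG actually applies are documented in the Guides section.

```python
def bismark_commands(genome_dir, fastq1, fastq2, outdir, aligned_bam, dedup_bam):
    """Build alignment, deduplication, and methylation calling commands."""
    return [
        # Align bisulfite-converted reads against a prepared genome folder.
        ["bismark", "--genome", genome_dir, "-1", fastq1, "-2", fastq2, "-o", outdir],
        # Remove duplicate alignments from the paired-end BAM.
        ["deduplicate_bismark", "--paired", "--bam", aligned_bam],
        # Call methylation and also write a bedGraph summary.
        ["bismark_methylation_extractor", "--paired-end", "--bedGraph", "--gzip",
         "-o", outdir, dedup_bam],
    ]
```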

2.4 Methylation Array (Methyl-array)

Data Import and Conversion: Convert raw data into RGChannelSet using minfi
Extraction of Detection p-values
Normalization: Two-way normalization using minfi

Each pipeline’s parameters and configuration details are fully documented in the Guides section, allowing users to review, replicate, and extend KCG analyses with transparency and confidence.

3. Analysis Output Formats

Data type | File formats | Description
Somatic Variants | .vcf, .vcf.tbi, .maf | Variant call files and annotated mutation data
Gene Expression | .tsv | Normalized expression matrices across samples
Methylation Peaks | .bedgraph | Methylation signal intensity and peak coordinates

All datasets are accompanied by comprehensive metadata, including project information, sample identifiers, sequencing platform, and quality metrics.
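For secondary analysis, downloaded files in these formats can be read with a few lines of standard-library code. The sketch below assumes the standard VCF column order and the four-column bedGraph layout (chromosome, start, end, value); it has not been checked against actual KCG files, and the function names are hypothetical.

```python
def parse_vcf_line(line):
    """Split one VCF data line into its fixed leading fields."""
    chrom, pos, vid, ref, alt, qual, flt, info = line.rstrip("\n").split("\t")[:8]
    return {
        "chrom": chrom,
        "pos": int(pos),        # 1-based position
        "id": vid,
        "ref": ref,
        "alt": alt.split(","),  # multiple ALT alleles are comma-separated
        "qual": qual,
        "filter": flt,
        "info": info,
    }


def iter_vcf_records(lines):
    """Yield parsed records, skipping '#' header lines and blank lines."""
    for line in lines:
        if line.strip() and not line.startswith("#"):
            yield parse_vcf_line(line)


def parse_bedgraph_line(line):
    """Return (chrom, start, end, value) from one bedGraph data line."""
    chrom, start, end, value = line.split()[:4]
    return chrom, int(start), int(end), float(value)
```

Compressed files can be opened with `gzip.open(path, "rt")` and passed to `iter_vcf_records` line by line.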

Data Access and Usage Policy

The KCG Portal operates under an independent data access and usage policy designed to promote ethical use, transparency, and equitable access to Korean cancer genomics data. This policy ensures that all data resources—whether open or controlled-access—are used responsibly and exclusively for legitimate scientific and educational purposes.

Data Analysis and Participant Privacy

No Re-identification: Under no circumstances may users attempt to re-identify any individual, participant, or sample donor represented in KCG data. This prohibition applies to both open and controlled-access datasets. KCG enforces strict compliance with participant privacy protection and data anonymization standards.

Controlled-Access Data: Certain datasets may contain sensitive or potentially identifiable information. Access to such data requires prior authorization through a Data Use Agreement (DUA). The DUA specifies permissible analyses, publication terms, and data sharing restrictions. Users are obligated to adhere fully to these terms.

Open-Access Data: Open datasets are freely available for research and publication, provided that users maintain the confidentiality of participant identities and comply with all ethical use guidelines established by KCG.

Data Access and Fair Usage

Fair Use Policy: KCG promotes equitable access to computational and data resources among all users. To prevent system overload or abuse, KCG may implement limits on download rates, concurrent connections, or total data volume per user.

Responsible Usage: Users are expected to access and analyze data solely for scientific, educational, or non-commercial research purposes consistent with the mission of KCG. Redistribution or modification of KCG data for unauthorized or commercial purposes is strictly prohibited.

Enforcement and Compliance: Violations of this policy—including data misuse, breach of confidentiality, or attempts at re-identification—may result in suspension of access privileges, termination of user accounts, and potential reporting to institutional or legal authorities.

Transparency and Support

User Support: Researchers are encouraged to contact the KCG support team with inquiries regarding data access procedures, publication guidelines, or compliance with ethical standards. KCG strives to maintain open communication channels to ensure users can responsibly and effectively utilize its data resources.

Data Use Agreements (DUA): For controlled-access data, investigators must review and fully understand their DUA before accessing or downloading any files. It is the user’s responsibility to ensure compliance with institutional review board (IRB) or ethics committee requirements when applicable.