Proteomics Data
- Accession: KPX10000264
- Submission date: 2026-05-27
Project Detail

| Field | Value |
|---|---|
| BioProject | KAP242372 |
| Project Title | AI Model Training for Fucosylation Classification |
| Description | Protein glycosylation is known to be involved in biological processes such as cell recognition, growth, differentiation, and apoptosis. Fucosylation of glycoproteins plays an important role in the structural stability and function of N-linked glycoproteins. Although many biological and clinical studies of protein fucosylation by fucosyltransferases have been reported, structural classification of fucosylated N-glycoproteins into core or outer isoforms remains a challenge. Here, we report for the first time the classification of N-glycopeptides as core- and outer-fucosylated types using tandem mass spectrometry (MS/MS) and machine learning algorithms such as the deep neural network (DNN) and support vector machine (SVM). Training and test sets of more than 800 MS/MS spectra of N-glycopeptides from the immunoglobulin gamma and alpha-1-acid glycoprotein standards were selected for classification of the fucosylation types using supervised learning models. The best-performing model had an accuracy of more than 99% against manual characterization and area under the curve values greater than 0.99, which were calculated from probability scores of target and decoy datasets. Finally, this model was applied to classify fucosylated N-glycoproteins from human plasma. A total of 82 N-glycopeptides, with 54 core-, 24 outer-, and 4 dual-fucosylation types derived from 54 glycoproteins, were classified as the same type by both the DNN and the SVM. Specifically, outer fucosylation was dominant in tri- and tetra-antennary N-glycopeptides, while core fucosylation was dominant in the mono-antennary, bi-antennary, and hybrid types of N-glycoproteins in human plasma. Thus, machine learning methods can be combined with MS/MS to distinguish between different isoforms of fucosylated N-glycopeptides. |
| Keywords | |
| Submitter | Hwang Heeyoun, Korea Basic Science Institute |
| Publication | |

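
The description states that area under the curve (AUC) values were calculated from probability scores of target and decoy datasets. The following is a minimal sketch of one common way to compute such an AUC, the rank-based (Mann-Whitney) formulation. It is not the submitters' code; the function name `auc_from_scores` and all score values are hypothetical and invented for illustration.

```python
def auc_from_scores(target_scores, decoy_scores):
    """Probability that a random target outscores a random decoy (ties count 0.5)."""
    if not target_scores or not decoy_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for t in target_scores:
        for d in decoy_scores:
            if t > d:
                wins += 1.0
            elif t == d:
                wins += 0.5
    return wins / (len(target_scores) * len(decoy_scores))


# Hypothetical classifier probabilities, invented for illustration only.
targets = [0.98, 0.95, 0.91, 0.60]
decoys = [0.70, 0.20, 0.10, 0.05]
print(auc_from_scores(targets, decoys))  # 15 of 16 target/decoy pairs are ordered correctly
```

The pairwise loop is quadratic in the number of spectra, which is acceptable for a set of roughly 800 spectra; a sort-based ranking would scale better for larger sets.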
Dataset Detail

| Field | Value |
|---|---|
| Dataset Title | Sample 1 |
| Submission Type | |
| Species | Others - IgG standard, Others - AGP standard, Homo sapiens (Human) |
| Sample type | |
| Disease | |
| Fractionation | |
| Digestion | Trypsin |
| Quantification | |
| Modifications | Carbamidomethyl (C), Oxidation (M) |
| Modifications | Bottom-up proteomics |
| MS instrument | Thermo Scientific LTQ Orbitrap Elite |
| Sample processing protocol | |
| Data analysis protocol | |
| Supplementary information | |
| Announce Date | 2026-05-26 |

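
The dataset record lists trypsin digestion with Carbamidomethyl (C) and Oxidation (M) modifications. As a rough illustration of what those settings mean for a database search, the sketch below performs an in-silico tryptic digest and sums the modification mass shifts for a peptide. Several things here are assumptions not stated in the record: the cleavage rule (after K or R, not before P), treating carbamidomethylation as applied to every cysteine, and treating oxidation as optional. The protein sequence and function names are hypothetical; the mass shifts are the standard monoisotopic values.

```python
import re

# Monoisotopic mass shifts in Da (standard values for these modifications).
CARBAMIDOMETHYL_C = 57.021464
OXIDATION_M = 15.994915


def tryptic_peptides(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    # Zero-width split points; requires Python 3.7 or later.
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def modification_shift(peptide, oxidized_met=0):
    """Total shift: every C carbamidomethylated, `oxidized_met` M residues oxidized."""
    if oxidized_met > peptide.count("M"):
        raise ValueError("more oxidations than methionines")
    return peptide.count("C") * CARBAMIDOMETHYL_C + oxidized_met * OXIDATION_M


protein = "ACDKRPMCKGG"  # hypothetical sequence, for illustration only
for peptide in tryptic_peptides(protein):
    print(peptide, round(modification_shift(peptide), 6))
```

Passing `oxidized_met=1` for a methionine-containing peptide gives the shift for its singly oxidized form, which is how a variable modification expands the set of candidate masses.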
Files Summary

Total: 37 files, 3,473,989,394 bytes

| File type | # Files | Total Size (bytes) |
|---|---|---|
| raw | 12 | 2,067,660,992 |
| peakList | 24 | 1,321,022,551 |
| searchResultFile | 1 | 85,305,851 |
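
After downloading, the per-type counts and sizes in the summary can be checked against the reported totals. The short sketch below does this arithmetic with the figures copied from the summary table; it assumes the sizes are byte counts, which the record does not state explicitly, and the variable names are illustrative.

```python
# File-type summary as listed in this record: type -> (file count, total size).
summary = {
    "raw": (12, 2_067_660_992),
    "peakList": (24, 1_321_022_551),
    "searchResultFile": (1, 85_305_851),
}
reported_files = 37
reported_size = 3_473_989_394

total_files = sum(count for count, _ in summary.values())
total_size = sum(size for _, size in summary.values())

# Both checks pass for the figures in this record.
assert total_files == reported_files
assert total_size == reported_size
print(f"{total_files} files, {total_size / 1e9:.2f} GB (if sizes are bytes)")
```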

| File Name | Size (bytes) | Type | Published |
|---|---|---|---|
| AGP.zip | 85,305,851 | zip | 2026-05-26 |
| KBSI_AGP_D1_H1_MS1.ms1 | 48,407,171 | ms1 | 2026-05-26 |
| KBSI_AGP_D1_H1_MS1.ms2 | 46,780,580 | ms2 | 2026-05-26 |
| KBSI_AGP_D1_H1_MS1.raw | 151,821,804 | raw | 2026-05-26 |
| KBSI_AGP_D1_H2_MS2.ms1 | 57,242,508 | ms1 | 2026-05-26 |
| KBSI_AGP_D1_H2_MS2.ms2 | 52,282,025 | ms2 | 2026-05-26 |
| KBSI_AGP_D1_H2_MS2.raw | 174,605,585 | raw | 2026-05-26 |
| KBSI_AGP_D1_H3_MS3.ms1 | 53,832,647 | ms1 | 2026-05-26 |
| KBSI_AGP_D1_H3_MS3.ms2 | 49,926,866 | ms2 | 2026-05-26 |
| KBSI_AGP_D1_H3_MS3.raw | 166,073,073 | raw | 2026-05-26 |
| KBSI_AGP_D2_H1_MS1.ms1 | 56,057,637 | ms1 | 2026-05-26 |
| KBSI_AGP_D2_H1_MS1.ms2 | 55,576,047 | ms2 | 2026-05-26 |
| KBSI_AGP_D2_H1_MS1.raw | 173,127,993 | raw | 2026-05-26 |
| KBSI_AGP_D2_H2_MS2.ms1 | 54,320,380 | ms1 | 2026-05-26 |
| KBSI_AGP_D2_H2_MS2.ms2 | 51,391,002 | ms2 | 2026-05-26 |
| KBSI_AGP_D2_H2_MS2.raw | 167,659,847 | raw | 2026-05-26 |
| KBSI_AGP_D2_H3_MS3.ms1 | 53,827,924 | ms1 | 2026-05-26 |
| KBSI_AGP_D2_H3_MS3.ms2 | 54,449,208 | ms2 | 2026-05-26 |
| KBSI_AGP_D2_H3_MS3.raw | 168,374,317 | raw | 2026-05-26 |
| KBSI_AGP_D3_H1_MS1.ms1 | 52,992,701 | ms1 | 2026-05-26 |
| KBSI_AGP_D3_H1_MS1.ms2 | 54,119,066 | ms2 | 2026-05-26 |
| KBSI_AGP_D3_H1_MS1.raw | 166,649,062 | raw | 2026-05-26 |
| KBSI_AGP_D3_H2_MS2.ms1 | 52,607,516 | ms1 | 2026-05-26 |
| KBSI_AGP_D3_H2_MS2.ms2 | 53,819,200 | ms2 | 2026-05-26 |
| KBSI_AGP_D3_H2_MS2.raw | 164,816,663 | raw | 2026-05-26 |
| KBSI_AGP_D3_H3_MS3.ms1 | 55,538,480 | ms1 | 2026-05-26 |
| KBSI_AGP_D3_H3_MS3.ms2 | 57,392,107 | ms2 | 2026-05-26 |
| KBSI_AGP_D3_H3_MS3.raw | 173,556,491 | raw | 2026-05-26 |
| KBSI_AGP_H1_MS1.ms1 | 58,072,641 | ms1 | 2026-05-26 |
| KBSI_AGP_H1_MS1.ms2 | 58,539,332 | ms2 | 2026-05-26 |
| KBSI_AGP_H1_MS1.raw | 179,720,620 | raw | 2026-05-26 |
| KBSI_AGP_H2_MS2.ms1 | 61,906,218 | ms1 | 2026-05-26 |
| KBSI_AGP_H2_MS2.ms2 | 58,888,015 | ms2 | 2026-05-26 |
| KBSI_AGP_H2_MS2.raw | 188,455,673 | raw | 2026-05-26 |
| KBSI_AGP_H3_MS3.ms1 | 63,742,836 | ms1 | 2026-05-26 |
| KBSI_AGP_H3_MS3.ms2 | 59,310,444 | ms2 | 2026-05-26 |
| KBSI_AGP_H3_MS3.raw | 192,799,864 | raw | 2026-05-26 |
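
The peak-list files in this dataset use the `.ms1` and `.ms2` extensions. Assuming they follow the plain-text MS1/MS2 layout commonly used for such files (`H` header lines, `S` scan lines, `Z` charge lines, `I`/`D` annotation lines, then `m/z intensity` peak lines), a minimal reader could look like the sketch below. The function name `read_ms2` and the demo lines are hypothetical, and the actual files in this record have not been inspected, so treat this as a starting point rather than a verified parser.

```python
def read_ms2(lines):
    """Yield one dict per spectrum from MS2-format text lines."""
    spectrum = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("H"):
            continue  # skip blank lines and file-level header lines
        fields = line.split()
        if fields[0] == "S":
            # A new scan line closes the previous spectrum.
            if spectrum is not None:
                yield spectrum
            spectrum = {
                "scan": int(fields[1]),
                # MS1 scan lines may omit the precursor m/z column.
                "precursor_mz": float(fields[3]) if len(fields) > 3 else None,
                "charges": [],
                "peaks": [],
            }
        elif spectrum is None:
            continue  # ignore anything before the first scan line
        elif fields[0] == "Z":
            spectrum["charges"].append(int(fields[1]))
        elif fields[0] in ("I", "D"):
            continue  # per-scan annotations are not needed here
        else:
            spectrum["peaks"].append((float(fields[0]), float(fields[1])))
    if spectrum is not None:
        yield spectrum


# Invented demo content; real files would be passed as an open file handle.
demo = [
    "H\tCreationDate\t(invented)",
    "S\t1\t1\t1000.5",
    "Z\t2\t2000.0",
    "100.0 10.0",
    "200.0 20.0",
    "S\t2\t2\t850.25",
    "Z\t3\t2548.75",
    "300.0 30.0",
]
spectra = list(read_ms2(demo))
print(len(spectra), spectra[0]["precursor_mz"], spectra[0]["peaks"])
```

Because the function is a generator over lines, a file such as `KBSI_AGP_H1_MS1.ms2` can be streamed with `open(...)` without loading all of its spectra into memory at once.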