

Proteome data

  • Accession
    KPX10000264
  • Submission date
    2026-05-27

Project Detail
BioProject
KAP242372
ProjectTitle
AI Model Training for Fucosylation Classification
Description
Protein glycosylation is known to be involved in biological processes such as cell recognition, growth, differentiation, and apoptosis. Fucosylation of glycoproteins plays an important role in the structural stability and function of N-linked glycoproteins. Although many biological and clinical studies of protein fucosylation by fucosyltransferases have been reported, structural classification of fucosylated N-glycoproteins, such as core or outer isoforms, remains a challenge. Here, we report for the first time the classification of N-glycopeptides as core- and outer-fucosylated types using tandem mass spectrometry (MS/MS) and machine learning algorithms such as the deep neural network (DNN) and support vector machine (SVM). Training and test sets of more than 800 MS/MS spectra of N-glycopeptides from the immunoglobulin gamma and alpha-1-acid glycoprotein standards were selected for classification of the fucosylation types using supervised learning models. The best-performing model had an accuracy of more than 99% against manual characterization and area under the curve values greater than 0.99, which were calculated from probability scores of the target and decoy datasets. Finally, this model was applied to classify fucosylated N-glycoproteins from human plasma. A total of 82 N-glycopeptides, with 54 core-, 24 outer-, and 4 dual-fucosylation types derived from 54 glycoproteins, were classified as the same type by both the DNN and the SVM. Specifically, outer fucosylation was dominant in tri- and tetra-antennary N-glycopeptides, while core fucosylation was dominant in the mono-antennary, bi-antennary, and hybrid types of N-glycoproteins in human plasma. Thus, machine learning methods can be combined with MS/MS to distinguish between different isoforms of fucosylated N-glycopeptides.
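The description mentions area under the curve (AUC) values calculated from probability scores of target and decoy datasets. As a rough illustration of what such a number means, the sketch below computes AUC as the fraction of target/decoy pairs in which the target receives the higher score, with ties counted as half. The function name `auc_from_scores` and the score values are hypothetical and made up for this example. This is one common formulation and not necessarily the procedure used for this dataset.

```python
def auc_from_scores(target_scores, decoy_scores):
    """Return the AUC as the probability that a randomly chosen target
    score exceeds a randomly chosen decoy score (ties count as 0.5)."""
    if not target_scores or not decoy_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for t in target_scores:
        for d in decoy_scores:
            if t > d:
                wins += 1.0
            elif t == d:
                wins += 0.5
    return wins / (len(target_scores) * len(decoy_scores))


# Made-up probability scores, for illustration only (not from the dataset).
targets = [0.98, 0.91, 0.87, 0.40]
decoys = [0.05, 0.12, 0.45, 0.02]
auc = auc_from_scores(targets, decoys)
print(f"AUC = {auc}")
```

An AUC of 1.0 would mean every target spectrum scores above every decoy, and 0.5 would mean the scores do not separate the two sets at all.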
Keywords
 
Submitter
Hwang Heeyoun, Korea Basic Science Institute
Publication
PubMed ID
31941975
32754952

Dataset Detail
Dataset Title
Sample 1
Submission Type
Species
Others - IgG standard, Others - AGP standard, Homo sapiens (Human)
Sample type
Body fluid Tissue Cell Others
Plasma standard proteins
Disease
Fractionation
Method Separation mode Number of fractions
Not applicable Others
Digestion
Trypsin
Quantification
Labeling Labeling Child Plex
No
Modifications
Carbamidomethyl (C), Oxidation (M)
Experiment type
Bottom-up proteomics
MS instrument
Thermo Scientific LTQ Orbitrap Elite
Sample processing protocol
Data analysis protocol
Supplementary information
 
Announce Date
2026-05-26

Files Summary

Total 37 files 3,473,989,394

File type # Files Total Size
raw 12 2,067,660,992
peakList 24 1,321,022,551
searchResultFile 1 85,305,851
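As a quick consistency check on the summary above, the following sketch adds up the per-type file counts and sizes and compares them with the reported totals. The numbers are copied from the summary. The variable names are illustrative only, and the size unit is not stated on the page.

```python
# Per-type (file count, total size) pairs copied from the Files Summary.
summary = {
    "raw": (12, 2_067_660_992),
    "peakList": (24, 1_321_022_551),
    "searchResultFile": (1, 85_305_851),
}

# Totals as reported on the "Total" line of the summary.
reported_total_files = 37
reported_total_size = 3_473_989_394

total_files = sum(count for count, _ in summary.values())
total_size = sum(size for _, size in summary.values())

print("file counts match:", total_files == reported_total_files)
print("sizes match:", total_size == reported_total_size)
```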

File
File list
File Name Size Type Published
AGP.zip 85,305,851 zip 2026-05-26
KBSI_AGP_D1_H1_MS1.ms1 48,407,171 ms1 2026-05-26
KBSI_AGP_D1_H1_MS1.ms2 46,780,580 ms2 2026-05-26
KBSI_AGP_D1_H1_MS1.raw 151,821,804 raw 2026-05-26
KBSI_AGP_D1_H2_MS2.ms1 57,242,508 ms1 2026-05-26
KBSI_AGP_D1_H2_MS2.ms2 52,282,025 ms2 2026-05-26
KBSI_AGP_D1_H2_MS2.raw 174,605,585 raw 2026-05-26
KBSI_AGP_D1_H3_MS3.ms1 53,832,647 ms1 2026-05-26
KBSI_AGP_D1_H3_MS3.ms2 49,926,866 ms2 2026-05-26
KBSI_AGP_D1_H3_MS3.raw 166,073,073 raw 2026-05-26
KBSI_AGP_D2_H1_MS1.ms1 56,057,637 ms1 2026-05-26
KBSI_AGP_D2_H1_MS1.ms2 55,576,047 ms2 2026-05-26
KBSI_AGP_D2_H1_MS1.raw 173,127,993 raw 2026-05-26
KBSI_AGP_D2_H2_MS2.ms1 54,320,380 ms1 2026-05-26
KBSI_AGP_D2_H2_MS2.ms2 51,391,002 ms2 2026-05-26
KBSI_AGP_D2_H2_MS2.raw 167,659,847 raw 2026-05-26
KBSI_AGP_D2_H3_MS3.ms1 53,827,924 ms1 2026-05-26
KBSI_AGP_D2_H3_MS3.ms2 54,449,208 ms2 2026-05-26
KBSI_AGP_D2_H3_MS3.raw 168,374,317 raw 2026-05-26
KBSI_AGP_D3_H1_MS1.ms1 52,992,701 ms1 2026-05-26
KBSI_AGP_D3_H1_MS1.ms2 54,119,066 ms2 2026-05-26
KBSI_AGP_D3_H1_MS1.raw 166,649,062 raw 2026-05-26
KBSI_AGP_D3_H2_MS2.ms1 52,607,516 ms1 2026-05-26
KBSI_AGP_D3_H2_MS2.ms2 53,819,200 ms2 2026-05-26
KBSI_AGP_D3_H2_MS2.raw 164,816,663 raw 2026-05-26
KBSI_AGP_D3_H3_MS3.ms1 55,538,480 ms1 2026-05-26
KBSI_AGP_D3_H3_MS3.ms2 57,392,107 ms2 2026-05-26
KBSI_AGP_D3_H3_MS3.raw 173,556,491 raw 2026-05-26
KBSI_AGP_H1_MS1.ms1 58,072,641 ms1 2026-05-26
KBSI_AGP_H1_MS1.ms2 58,539,332 ms2 2026-05-26
KBSI_AGP_H1_MS1.raw 179,720,620 raw 2026-05-26
KBSI_AGP_H2_MS2.ms1 61,906,218 ms1 2026-05-26
KBSI_AGP_H2_MS2.ms2 58,888,015 ms2 2026-05-26
KBSI_AGP_H2_MS2.raw 188,455,673 raw 2026-05-26
KBSI_AGP_H3_MS3.ms1 63,742,836 ms1 2026-05-26
KBSI_AGP_H3_MS3.ms2 59,310,444 ms2 2026-05-26
KBSI_AGP_H3_MS3.raw 192,799,864 raw 2026-05-26
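In the listing above, each acquisition appears to come as three files that share a base name: one `.raw` file plus `.ms1` and `.ms2` peak lists. The sketch below groups file names by base name and reports any run missing one of the three. The helper names `group_by_run` and `incomplete_runs` are hypothetical, the sample covers only a few of the listed names, and the pairing rule is an assumption read off the names, not a documented convention of the repository.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Extensions assumed to make up one complete run (an assumption from the listing).
RUN_EXTENSIONS = {"raw", "ms1", "ms2"}


def group_by_run(file_names):
    """Map each base name to the set of run extensions found for it.
    Files with other extensions (e.g. the zip archive) are ignored."""
    runs = defaultdict(set)
    for name in file_names:
        path = PurePosixPath(name)
        ext = path.suffix.lstrip(".")
        if ext in RUN_EXTENSIONS:
            runs[path.stem].add(ext)
    return dict(runs)


def incomplete_runs(runs):
    """Return {base name: sorted missing extensions} for runs lacking a file."""
    return {
        stem: sorted(RUN_EXTENSIONS - exts)
        for stem, exts in runs.items()
        if RUN_EXTENSIONS - exts
    }


# A few names copied from the file list; the full list works the same way.
sample = [
    "AGP.zip",
    "KBSI_AGP_D1_H1_MS1.ms1",
    "KBSI_AGP_D1_H1_MS1.ms2",
    "KBSI_AGP_D1_H1_MS1.raw",
    "KBSI_AGP_H3_MS3.ms1",
    "KBSI_AGP_H3_MS3.ms2",
    "KBSI_AGP_H3_MS3.raw",
]
runs = group_by_run(sample)
print(sorted(runs))
print(incomplete_runs(runs))
```

Running the same check over all listed names before downloading is a cheap way to confirm that every raw file has its matching peak lists.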