Other Data
Allergenicity Prediction

  • Accession
    KGD10764699
  • Submission date
    2026-06-17

Project Detail
Accession
KAP242381
Project title (English)
Plant-Derived Allergenic Protein Sequences
Project title (Korean)
-
Project description (English)
A comprehensive plant protein dataset was constructed for allergenicity prediction by integrating plant-derived allergen sequences and non-allergen protein sequences from publicly available allergen databases and UniProt. The allergen dataset includes proteins from plant-related sources such as pollen, seeds, nuts, fruits, vegetables, cereals, legumes, and other plant taxa. Non-allergen proteins were collected from UniProt and assigned to the plant category based on organism-level taxonomy information. Where possible, non-allergen proteins were selected from the same or related plant-source organisms as those represented in the allergen dataset to reduce organism-level bias. This design helps the model focus on allergen-associated sequence features rather than simply distinguishing proteins by biological source. In total, 13,186 plant-derived protein sequences were compiled, including 1,927 allergen sequences and 11,259 non-allergen sequences, providing a curated resource for protein language model-based allergenicity prediction and explainable deep learning analysis.
Project description (Korean)
-
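
The class counts quoted in the project description imply a noticeable imbalance between allergen and non-allergen sequences. The sketch below only restates that arithmetic and derives inverse-frequency class weights. The weighting scheme and the names `balanced_class_weights`, `allergen_fraction`, and `imbalance_ratio` are illustrative assumptions and are not described by the dataset authors.

```python
# Counts taken from the project description.
N_ALLERGEN = 1_927
N_NON_ALLERGEN = 11_259
N_TOTAL = N_ALLERGEN + N_NON_ALLERGEN  # 13,186 sequences in total


def balanced_class_weights(counts):
    """Inverse-frequency weights: total / (n_classes * count) per class.

    Assumption: this is one common heuristic for imbalanced data, not a
    method stated in the dataset record.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


counts = {"allergen": N_ALLERGEN, "non_allergen": N_NON_ALLERGEN}
weights = balanced_class_weights(counts)

allergen_fraction = N_ALLERGEN / N_TOTAL      # roughly 0.146
imbalance_ratio = N_NON_ALLERGEN / N_ALLERGEN  # roughly 5.8 non-allergens per allergen

print(f"allergen fraction: {allergen_fraction:.3f}")
print(f"imbalance ratio:   {imbalance_ratio:.2f}")
print(f"class weights:     {weights}")
```

Roughly one sequence in seven is an allergen, so a classifier trained on this data would likely need class weighting, resampling, or an imbalance-aware metric to avoid favoring the majority class.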

BioSample
  • Accession
    KAS24201784
  • Organism name
    Viridiplantae
  • Sample type
    Plant or fungi

Metadata
Category
Other (user-specified) - Protein Sequences
Human-derived data
NO
Title
Allergenicity Prediction
Keywords
Allergenicity prediction, Protein language models, Explainable deep learning
File description
Header: ID, Label, Domain, Database Source; Seq: Protein sequence
Release date
2026-06-09
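
The file description above lists four header fields (ID, Label, Domain, Database Source) followed by a protein sequence. The record does not state the file format or the field delimiter, so the following is only a minimal sketch. It assumes a FASTA-like layout with `>` header lines and `|`-separated fields. The `Record` class, the `parse_records` function, and the sample entries are hypothetical placeholders, not actual dataset content.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Record:
    id: str
    label: str
    domain: str
    database_source: str
    sequence: str


def _build(header: str, chunks: List[str], sep: str) -> Record:
    # Split the header into the four fields named in the file description.
    fields = [f.strip() for f in header.split(sep)]
    if len(fields) != 4:
        raise ValueError(f"expected 4 header fields, got {len(fields)}: {header!r}")
    return Record(*fields, sequence="".join(chunks))


def parse_records(lines: Iterable[str], sep: str = "|") -> Iterator[Record]:
    """Parse FASTA-like text (assumed layout; the delimiter is a guess)."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield _build(header, chunks, sep)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequences may wrap across lines
    if header is not None:
        yield _build(header, chunks, sep)


# Made-up placeholder entries, for illustration only.
sample = """\
>EXAMPLE_0001|1|Plant|ExampleDB
MKTAYIAKQR
QISFVKSHFS
>EXAMPLE_0002|0|Plant|UniProt
MSDNGLLEEK
"""

records = list(parse_records(sample.splitlines()))
for rec in records:
    print(rec.id, rec.label, rec.database_source, len(rec.sequence))
```

If the real file uses a different delimiter, pass it through the `sep` parameter. If the file is tabular (for example CSV) rather than FASTA-like, a different reader would be needed.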

File