- Five-fold cross-validation test dataset: The learning dataset contains
900 transporter proteins, classified into seven substrate-specific classes
(amino acid transporter, anion transporter, cation transporter, electron transporter, protein/mRNA transporter, sugar transporter,
and other transporter), and 660 non-transporters, based on the annotation information available in Swiss-Prot. Both the Swiss-Prot IDs
and the sequences are provided in FASTA format. No protein shares more than 70% sequence identity with any other protein, whether in the same substrate class or in a different one.
The training/testing dataset for each substrate-specific class can be downloaded separately: Amino acid transporter (70 sequences), Anion transporter (60 sequences),
Cation transporter (260 sequences), Electron transporter (60 sequences),
Protein/mRNA transporter (70 sequences), Sugar transporter (60 sequences),
Other transporter (200 sequences), and Non-transporters (600 sequences).
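As a rough illustration of how the per-class FASTA files could be used for five-fold cross-validation, the sketch below parses FASTA text and assigns protein IDs to five folds class by class (a stratified split), so that each fold keeps roughly the same class proportions. It is a minimal sketch under stated assumptions, not the procedure used to build or evaluate this dataset: the function names `parse_fasta` and `five_fold_splits`, the use of the first header token as the protein ID, and the random-but-seeded fold assignment are all hypothetical choices made for illustration.

```python
import random
from typing import Dict, Iterator, List, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (id, sequence) pairs; the ID is assumed to be the first header token."""
    records: List[Tuple[str, str]] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Close the previous record before starting a new one.
            if header is not None:
                records.append((header, "".join(chunks)))
            parts = line[1:].split()
            header, chunks = (parts[0] if parts else ""), []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def five_fold_splits(
    labelled: Dict[str, List[str]], k: int = 5, seed: int = 0
) -> Iterator[Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]]:
    """Hypothetical stratified split: deal each class's IDs round-robin into k folds,
    then yield (train, test) pairs with each fold used once as the test set."""
    rng = random.Random(seed)
    folds: List[List[Tuple[str, str]]] = [[] for _ in range(k)]
    for label, ids in labelled.items():
        shuffled = sorted(ids)
        rng.shuffle(shuffled)
        for i, pid in enumerate(shuffled):
            folds[i % k].append((pid, label))
    for i in range(k):
        test = folds[i]
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        yield train, test
```

With 60 sugar-transporter sequences, for example, this round-robin assignment places 12 of them in each test fold. Note that a random split like this does not by itself enforce the 70% sequence-identity limit; that constraint applies to the dataset as provided.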
- Proteome annotation: We used our best model to predict transporters at the proteome level for Human, Drosophila, Yeast, E. coli, and Arabidopsis. The protein sequences were collected from Swiss-Prot and are full-length and experimentally annotated. The numbers of protein sequences are 19254, 3198, 21142, 7794, and 12197 for Human, Drosophila, E. coli, Yeast, and Arabidopsis, respectively.
The prediction results for Human, Drosophila, Yeast, E. coli, and Arabidopsis are available in table form. "Yes" means that the protein belongs to that class, while "No" means that it does not.
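Once a result table has been downloaded, the Yes/No entries can be tallied per class. The sketch below assumes, purely for illustration, that the table is tab-separated, that its first column holds the protein ID under the header `ID`, and that every other column is a class with "Yes"/"No" values; the actual file layout and column names may differ, and `count_predictions` is a hypothetical helper.

```python
import csv
import io
from collections import Counter


def count_predictions(table_text: str, id_column: str = "ID") -> Counter:
    """Count, for each class column, how many proteins are marked "Yes".

    Assumes a tab-separated table with a header row; the ID column name is hypothetical.
    """
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    counts: Counter = Counter()
    for row in reader:
        for column, value in row.items():
            # Skip the ID column and any overflow fields (DictReader keys them as None).
            if column is None or column == id_column:
                continue
            if isinstance(value, str) and value.strip().lower() == "yes":
                counts[column] += 1
    return counts
```

For instance, a table with rows `P1 Yes No`, `P2 Yes Yes`, and `P3 No No` under the hypothetical headers `ID`, `Transporter`, and `Sugar transporter` would give a count of 2 for `Transporter` and 1 for `Sugar transporter`.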