AtSubP: the Arabidopsis Subcellular Localization Prediction Server   

APPENDIX - I
 
Table 1: Localization examples that AtSubP predicts correctly, whereas TargetP, the widely used prediction system (e.g. by TAIR, MIPS, and PLASdb for Arabidopsis annotation), gives a wrong prediction or none. Please note: this list is not exhaustive; it includes only a few examples from the experimentally verified sequences available in the Arabidopsis Subcellular Database (SUBA). Moreover, to increase confidence, the list is restricted to sequences NOT used in training the AtSubP system: the GFP data for TAIR IDs numbered 1 - 17 are taken from SUBA (and are also present in our independent test set-II). Similarly, we take some examples (TAIR IDs numbered 18 - 23) from Dr. Niko Geldner's wave list (Geldner et al., 2009).
Furthermore, some researchers at our Foundation have used AtSubP in their own projects and performed green fluorescent protein (GFP) fusions on previously 'unknown' proteins (TAIR IDs numbered 24 - 26). Interestingly, TargetP also predicted these proteins wrongly or not at all, so we have included them in the following table.

No. | TAIR ID | AtSubP prediction | TargetP prediction | Experimental annotation / GFP data

1 | AT2G28900.1 | Chloroplast | -- | Chloroplast (SUBA)
2 | AT3G26570.1 | Chloroplast | -- | Chloroplast (SUBA)
3 | AT3G47070.1 | Chloroplast | -- | Chloroplast (SUBA)
4 | AT4G02510.1 | Chloroplast | -- | Chloroplast (SUBA)
5 | AT4G15530.5 | Chloroplast | -- | Chloroplast (SUBA)
6 | AT4G38240.1 | Golgi body | Secretory pathway* | Golgi body (SUBA)
7 | AT1G79230.1 | Mitochondrion | Chloroplast | Mitochondrion (SUBA)
8 | AT4G05020.1 | Mitochondrion | -- | Mitochondrion (SUBA)
9 | AT1G20960.1 | Nucleus | Mitochondrion | Nucleus (SUBA)
10 | AT1G54060.1 | Nucleus | Chloroplast | Nucleus (SUBA)
11 | AT1G54850.1 | Nucleus | Chloroplast | Nucleus (SUBA)
12 | AT1G55310.1 | Nucleus | Chloroplast | Nucleus (SUBA)
13 | AT2G45640.1 | Nucleus | Mitochondrion | Nucleus (SUBA)
14 | AT3G58660.1 | Nucleus | Chloroplast | Nucleus (SUBA)
15 | AT5G44500.1 | Nucleus | Mitochondrion | Nucleus (SUBA)
16 | AT1G71830.1 | Cell membrane | Secretory pathway* | Cell membrane (SUBA)
17 | AT3G54140.1 | Cell membrane | -- | Cell membrane (SUBA)

18 | AT1G43890.1 | Golgi body | Secretory pathway* | Post-Golgi (Geldner's wave list)
19 | AT3G24350.1 | Golgi body | -- | Golgi (Geldner's wave list)
20 | AT3G11730.1 | Golgi body | Secretory pathway* | Post-Golgi (Geldner's wave list)
21 | AT1G02130.1 | Golgi body | Secretory pathway* | Golgi (Geldner's wave list)
22 | AT5G50440.1 | Golgi body | -- | Golgi (Geldner's wave list)
23 | AT4G00430.1 | Cell membrane | -- | Cell membrane (Geldner's wave list)

24 | AT1G12600.1 | Cell membrane | -- | Cell membrane (Noble Foundation)
25 | AT5G42290.1 | Nucleus | -- | Nucleus (Noble Foundation)
26 | AT1G36925.1 | Nucleus | Chloroplast | Nucleus (Noble Foundation)

* For these sequences, TargetP provides partially correct information (e.g. Secretory pathway); however, AtSubP predicts the exact localization (e.g. Golgi body, Cell membrane), matching the GFP data.
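
As a reading aid, the TargetP column of Table 1 can be tallied into three outcome groups: no prediction ("--"), partially correct (marked "*"), and wrong. The Python sketch below is illustrative only and is not part of AtSubP or TargetP; the function name and the outcome labels are assumptions introduced here, and the list is simply the TargetP column transcribed in row order.

```python
from collections import Counter

# TargetP column of Table 1, transcribed in row order (entries 1 - 26).
# "--" means no prediction; a trailing "*" marks the partially correct
# "Secretory pathway" calls explained in the footnote.
TARGETP_COLUMN = [
    "--", "--", "--", "--", "--",                                   # 1 - 5
    "Secretory pathway*",                                           # 6
    "Chloroplast",                                                  # 7
    "--",                                                           # 8
    "Mitochondrion", "Chloroplast", "Chloroplast", "Chloroplast",   # 9 - 12
    "Mitochondrion", "Chloroplast", "Mitochondrion",                # 13 - 15
    "Secretory pathway*", "--",                                     # 16 - 17
    "Secretory pathway*", "--", "Secretory pathway*",               # 18 - 20
    "Secretory pathway*", "--", "--",                               # 21 - 23
    "--", "--", "Chloroplast",                                      # 24 - 26
]


def targetp_outcome(prediction: str) -> str:
    """Classify one TargetP entry of Table 1 (labels are hypothetical)."""
    if prediction == "--":
        return "no prediction"
    if prediction.endswith("*"):
        return "partially correct"
    # Every remaining entry in Table 1 disagrees with the GFP data.
    return "wrong"


counts = Counter(targetp_outcome(p) for p in TARGETP_COLUMN)
for outcome, n in sorted(counts.items()):
    print(f"{outcome}: {n}")
```

For the 26 entries listed, this gives 12 with no prediction, 9 wrong, and 5 partially correct.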


APPENDIX - II
 
Table 2: Localization examples that AtSubP predicts correctly even when the query sequence has a wrong PSI-BLAST hit or none. Although our best classifier results from an integrated approach combining machine learning techniques with similarity-based information, it is NOT based purely on sequence similarity. For example, the purely similarity-based PSI-BLAST approach achieved only 78% accuracy, whereas the integrated approach achieved more than 90% accuracy once the machine learning technique was added (see Figure 1 in the manuscript and the Supplementary material for details).
Secondly, AtSubP also predicts correctly for sequences where the similarity-based information was incorrect or unavailable. To illustrate this, we provide examples from independent test sets I (from Swiss-Prot) and II (experimentally verified GFP data from SUBA) where AtSubP predicts correctly even though PSI-BLAST gives an incorrect hit or none (see the table below). Please note: to increase confidence, we have included only sequences that were NOT used in the original training/testing of the AtSubP system.
Similarly, we take some examples (TAIR IDs numbered 27 - 29) from Dr. Niko Geldner's wave list (Geldner et al., 2009). The 3 previously 'unknown' proteins validated by our internal wet labs at the Noble Foundation, mentioned above for Table 1, also gave wrong or no PSI-BLAST hits and are therefore included in the list below (TAIR IDs numbered 30 - 32).

No. | TAIR ID | AtSubP prediction | PSI-BLAST hit | Hit ID | Experimental annotation / GFP data

1 | AT5G17990.1 | Chloroplast | No hit | -- | Chloroplast (Swiss-Prot)
2 | AT1G03600.1 | Chloroplast | No hit | -- | Chloroplast (Swiss-Prot)
3 | AT1G28150.1 | Chloroplast | No hit | -- | Chloroplast (Swiss-Prot)
4 | AT4G37920.1 | Chloroplast | No hit | -- | Chloroplast (Swiss-Prot)
5 | AT1G79440.1 | Mitochondrion | Cytoplasm | AL1A1_SHEEP | Mitochondrion (Swiss-Prot)
6 | AT1G49410.1 | Mitochondrion | No hit | -- | Mitochondrion (Swiss-Prot)
7 | AT5G48720.2 | Nucleus | No hit | -- | Nucleus (Swiss-Prot)
8 | AT4G18375.2 | Nucleus | Cytoplasm / Nucleus* | IF2B1_CHICK | Nucleus (Swiss-Prot)
9 | AT4G28100.1 | Cell membrane | No hit | -- | Cell membrane (Swiss-Prot)
10 | AT5G67130.1 | Cell membrane | No hit | -- | Cell membrane (Swiss-Prot)

11 | AT2G21280.1 | Chloroplast | No hit | -- | Chloroplast (SUBA)
12 | AT2G28900.1 | Chloroplast | No hit | -- | Chloroplast (SUBA)
13 | AT3G26570.1 | Chloroplast | Cell membrane | Y1401_PYRAB | Chloroplast (SUBA)
14 | AT3G47070.1 | Chloroplast | No hit | -- | Chloroplast (SUBA)
15 | AT4G02510.1 | Chloroplast | Nucleus | MDN1_YEAST | Chloroplast (SUBA)
16 | AT4G09020.1 | Chloroplast | Cytoplasm (Potential) | GLGX_ERWCH | Chloroplast (SUBA)
17 | AT4G27700.1 | Chloroplast | No hit | -- | Chloroplast (SUBA)
18 | AT5G42480.1 | Chloroplast | No hit | -- | Chloroplast (SUBA)
19 | AT3G13790.1 | Extracellular | Cell wall / Secreted* | INV1_DAUCA | Extracellular (SUBA)
20 | AT1G45000.1 | Nucleus | Cytoplasm / Nucleus* | PRS10_SPETR | Nucleus (SUBA)
21 | AT1G54060.1 | Nucleus | No hit | -- | Nucleus (SUBA)
22 | AT1G54850.1 | Nucleus | No hit | -- | Nucleus (SUBA)
23 | AT1G55310.1 | Nucleus | Cytoplasm / Nucleus* | PABP_DEBHA | Nucleus (SUBA)
24 | AT3G04880.1 | Nucleus | No hit | -- | Nucleus (SUBA)
25 | AT1G71830.1 | Cell membrane | Membrane; Single-pass type I* | EPHB5_CHICK | Cell membrane (SUBA)
26 | AT3G54140.1 | Cell membrane | Membrane; Single-pass type I* | S15A4_RAT | Cell membrane (SUBA)

27 | AT1G43890.1 | Golgi body | Cell membrane / Cytoplasm | YPTC1_CHLRE | Golgi body (Geldner's wave list)
28 | AT3G11730.1 | Golgi body | Cell membrane / Cytoplasm | RIC1_ORYSJ | Golgi body (Geldner's wave list)
29 | AT1G02130.1 | Golgi body | Cell membrane / Cytoplasm | RIC1_ORYSJ | Golgi body (Geldner's wave list)

30 | AT1G12600.1 | Cell membrane | Golgi body | S35B3_ANOGA | Cell membrane (Noble Foundation)
31 | AT5G42290.1 | Nucleus | No hit | -- | Nucleus (Noble Foundation)
32 | AT1G36925.1 | Nucleus | No hit | -- | Nucleus (Noble Foundation)

# TAIR IDs numbered 1 - 10 are from independent test set-I; 11 - 26 are from independent test set-II; 27 - 29 are from Geldner's wave list; and 30 - 32 are from our internal wet labs, as mentioned above.
* Dual-location hits are counted as wrong hits (as in our original binary coding of the PSI-BLAST results). IDs numbered 25 & 26 are also counted as wrong hits because Swiss-Prot assigns 'Membrane' and 'Cell membrane' to two different classes.
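
The scoring rule described in the footnotes above can be expressed as a small function. This is a minimal Python sketch, not the code used to build AtSubP; the function name, the outcome labels, and the string handling (treating "/" as the marker of a dual location, and dropping qualifiers after ";" or "(") are assumptions made for illustration.

```python
def psiblast_outcome(hit_location: str, true_location: str) -> str:
    """Score a PSI-BLAST hit annotation against the experimental location.

    Hypothetical helper following the footnote rule: a hit counts as
    correct only if it names a single location that exactly matches the
    experimental class.
    """
    hit = hit_location.rstrip("*").strip()
    if hit.lower() == "no hit":
        return "no hit"
    if "/" in hit:
        # Dual-location annotations are counted as wrong hits.
        return "wrong"
    # Drop qualifiers such as "; Single-pass type I" or "(Potential)".
    primary = hit.split(";")[0].split("(")[0].strip()
    # Exact class match: 'Membrane' is not the same class as 'Cell membrane'.
    return "correct" if primary.lower() == true_location.lower() else "wrong"


# A few rows from Table 2: (PSI-BLAST hit, experimental location).
examples = [
    ("No hit", "Chloroplast"),
    ("Cytoplasm / Nucleus*", "Nucleus"),
    ("Membrane; Single-pass type I*", "Cell membrane"),
    ("Cytoplasm (Potential)", "Chloroplast"),
]
for hit, truth in examples:
    print(f"{hit!r} vs {truth!r}: {psiblast_outcome(hit, truth)}")
```

Under this rule every row of Table 2 scores as either "no hit" or "wrong", which is why those sequences appear in the table.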


 
© 2012 by The Samuel Roberts Noble Foundation, Inc.