AtSubP Analysis

AtSubP: the Arabidopsis Subcellular Localization Prediction Server


APPENDIX - I


	Table 1: Localization examples that AtSubP predicts correctly but for which TargetP, the widely used prediction system (e.g. by TAIR, MIPS, PLASdb etc. for Arabidopsis annotation), gives a wrong prediction or none. Please note: the list is not exhaustive; we include only a few examples from the experimentally verified sequences available at the Arabidopsis Subcellular Database (SUBA). To increase confidence, we restricted the list to sequences NOT used in training the AtSubP system: the GFP data for the TAIR IDs numbered 1 - 17 are taken from SUBA (these sequences are also present in our independent test set-II). Similarly, the TAIR IDs numbered 18 - 23 are taken from Dr. Niko Geldner's published wave list (Geldner et al., 2009). Furthermore, some researchers at our Foundation have used AtSubP in their own projects and performed green fluorescent protein (GFP) fusions on previously 'unknown' proteins (TAIR IDs numbered 24 - 26). TargetP also predicted these proteins wrongly or not at all, so we have included them in the table.

No.	TAIR ID	AtSubP prediction	TargetP prediction	Experimental annotation / GFP data
1	AT2G28900.1	Chloroplast	--	Chloroplast (SUBA)
2	AT3G26570.1	Chloroplast	--	Chloroplast (SUBA)
3	AT3G47070.1	Chloroplast	--	Chloroplast (SUBA)
4	AT4G02510.1	Chloroplast	--	Chloroplast (SUBA)
5	AT4G15530.5	Chloroplast	--	Chloroplast (SUBA)
6	AT4G38240.1	Golgi body	Secretory pathway^*	Golgi body (SUBA)
7	AT1G79230.1	Mitochondrion	Chloroplast	Mitochondrion (SUBA)
8	AT4G05020.1	Mitochondrion	--	Mitochondrion (SUBA)
9	AT1G20960.1	Nucleus	Mitochondrion	Nucleus (SUBA)
10	AT1G54060.1	Nucleus	Chloroplast	Nucleus (SUBA)
11	AT1G54850.1	Nucleus	Chloroplast	Nucleus (SUBA)
12	AT1G55310.1	Nucleus	Chloroplast	Nucleus (SUBA)
13	AT2G45640.1	Nucleus	Mitochondrion	Nucleus (SUBA)
14	AT3G58660.1	Nucleus	Chloroplast	Nucleus (SUBA)
15	AT5G44500.1	Nucleus	Mitochondrion	Nucleus (SUBA)
16	AT1G71830.1	Cell membrane	Secretory pathway^*	Cell membrane (SUBA)
17	AT3G54140.1	Cell membrane	--	Cell membrane (SUBA)

18	AT1G43890.1	Golgi body	Secretory pathway^*	Post-Golgi (Geldner's wave list)
19	AT3G24350.1	Golgi body	--	Golgi (Geldner's wave list)
20	AT3G11730.1	Golgi body	Secretory pathway^*	Post-Golgi (Geldner's wave list)
21	AT1G02130.1	Golgi body	Secretory pathway^*	Golgi (Geldner's wave list)
22	AT5G50440.1	Golgi body	--	Golgi (Geldner's wave list)
23	AT4G00430.1	Cell membrane	--	Cell membrane (Geldner's wave list)

24	AT1G12600.1	Cell membrane	--	Cell membrane (Noble Foundation)
25	AT5G42290.1	Nucleus	--	Nucleus (Noble Foundation)
26	AT1G36925.1	Nucleus	Chloroplast	Nucleus (Noble Foundation)

^* For these sequences, TargetP provides partially correct information (e.g. Secretory pathway); AtSubP, however, predicts the exact localization (e.g. Golgi body, Cell membrane), which matches the GFP data.
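
The three kinds of TargetP outcome described for Table 1 (no prediction, wrong prediction, partially correct prediction) can be tallied with a short script. The following is only a minimal sketch, not part of the AtSubP server: the function names `targetp_outcome` and `summarize` are hypothetical, only five rows of the table are copied in, and the rule that "Secretory pathway" counts as partially correct for Golgi body and Cell membrane is an assumption drawn from the footnote above.

```python
# A few rows copied from Table 1:
# (TAIR ID, AtSubP prediction, TargetP prediction, experimental location).
ROWS = [
    ("AT2G28900.1", "Chloroplast", "--", "Chloroplast"),
    ("AT4G38240.1", "Golgi body", "Secretory pathway", "Golgi body"),
    ("AT1G79230.1", "Mitochondrion", "Chloroplast", "Mitochondrion"),
    ("AT1G71830.1", "Cell membrane", "Secretory pathway", "Cell membrane"),
    ("AT1G36925.1", "Nucleus", "Chloroplast", "Nucleus"),
]

# Assumption (from the table footnote): a "Secretory pathway" call is only
# partially correct when the experimental location is one of these.
SECRETORY_LOCATIONS = {"Golgi body", "Cell membrane"}


def targetp_outcome(targetp, experimental):
    """Label one TargetP call relative to the experimental annotation."""
    if targetp == "--":
        return "no prediction"
    if targetp == experimental:
        return "correct"
    if targetp == "Secretory pathway" and experimental in SECRETORY_LOCATIONS:
        return "partially correct"
    return "wrong"


def summarize(rows):
    """Count how many rows fall into each TargetP outcome."""
    counts = {}
    for _tair_id, _atsubp, targetp, experimental in rows:
        outcome = targetp_outcome(targetp, experimental)
        counts[outcome] = counts.get(outcome, 0) + 1
    return counts


if __name__ == "__main__":
    for outcome, n in sorted(summarize(ROWS).items()):
        print(f"{outcome}: {n}")
```

Extending `ROWS` with the remaining table entries would give the full breakdown; the sample above is deliberately incomplete.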

APPENDIX - II


	Table 2: Localization examples that AtSubP predicts correctly even when the query sequence has a wrong PSI-BLAST hit or no hit at all. Although our best classifier results from an integrated approach that combines machine learning with similarity-based information, it is NOT based purely on sequence similarity. First, the purely similarity-based PSI-BLAST approach achieved only 78% accuracy, whereas the integrated approach, which adds machine learning, achieved more than 90% accuracy (see Figure 1 in the manuscript and the Supplementary material for details). Second, AtSubP also predicts correctly for sequences for which the similarity-based information was incorrect or unavailable. To illustrate this, we provide examples from independent test sets I (from Swiss-Prot) and II (experimentally verified GFP data from SUBA) for which AtSubP predicts correctly even though PSI-BLAST gives an incorrect hit or no hit (see the table below). Please note: to increase confidence, we have included only sequences that were NOT used in the original training/testing of the AtSubP system. Similarly, the TAIR IDs numbered 27 - 29 are taken from Dr. Niko Geldner's published wave list (Geldner et al., 2009). The 3 previously 'unknown' proteins validated by our internal wet labs at the Noble Foundation, mentioned above for Table 1, also gave wrong or no PSI-BLAST hits and are therefore included in the list below.

No.	TAIR ID	AtSubP prediction	PSI-BLAST hit	Hit ID	Experimental annotation / GFP data
1	AT5G17990.1	Chloroplast	No hit	--	Chloroplast (Swiss-Prot)
2	AT1G03600.1	Chloroplast	No hit	--	Chloroplast (Swiss-Prot)
3	AT1G28150.1	Chloroplast	No hit	--	Chloroplast (Swiss-Prot)
4	AT4G37920.1	Chloroplast	No hit	--	Chloroplast (Swiss-Prot)
5	AT1G79440.1	Mitochondrion	Cytoplasm	AL1A1_SHEEP	Mitochondrion (Swiss-Prot)
6	AT1G49410.1	Mitochondrion	No hit	--	Mitochondrion (Swiss-Prot)
7	AT5G48720.2	Nucleus	No hit	--	Nucleus (Swiss-Prot)
8	AT4G18375.2	Nucleus	Cytoplasm / Nucleus^*	IF2B1_CHICK	Nucleus (Swiss-Prot)
9	AT4G28100.1	Cell membrane	No hit	--	Cell membrane (Swiss-Prot)
10	AT5G67130.1	Cell membrane	No hit	--	Cell membrane (Swiss-Prot)

11	AT2G21280.1	Chloroplast	No hit	--	Chloroplast (SUBA)
12	AT2G28900.1	Chloroplast	No hit	--	Chloroplast (SUBA)
13	AT3G26570.1	Chloroplast	Cell membrane	Y1401_PYRAB	Chloroplast (SUBA)
14	AT3G47070.1	Chloroplast	No hit	--	Chloroplast (SUBA)
15	AT4G02510.1	Chloroplast	Nucleus	MDN1_YEAST	Chloroplast (SUBA)
16	AT4G09020.1	Chloroplast	Cytoplasm (Potential)	GLGX_ERWCH	Chloroplast (SUBA)
17	AT4G27700.1	Chloroplast	No hit	--	Chloroplast (SUBA)
18	AT5G42480.1	Chloroplast	No hit	--	Chloroplast (SUBA)
19	AT3G13790.1	Extracellular	Cell wall / Secreted^*	INV1_DAUCA	Extracellular (SUBA)
20	AT1G45000.1	Nucleus	Cytoplasm / Nucleus^*	PRS10_SPETR	Nucleus (SUBA)
21	AT1G54060.1	Nucleus	No hit	--	Nucleus (SUBA)
22	AT1G54850.1	Nucleus	No hit	--	Nucleus (SUBA)
23	AT1G55310.1	Nucleus	Cytoplasm / Nucleus^*	PABP_DEBHA	Nucleus (SUBA)
24	AT3G04880.1	Nucleus	No hit	--	Nucleus (SUBA)
25	AT1G71830.1	Cell membrane	Membrane; Single-pass type I^*	EPHB5_CHICK	Cell membrane (SUBA)
26	AT3G54140.1	Cell membrane	Membrane; Single-pass type I^*	S15A4_RAT	Cell membrane (SUBA)

27	AT1G43890.1	Golgi body	Cell membrane / Cytoplasm	YPTC1_CHLRE	Golgi body (Geldner's wave list)
28	AT3G11730.1	Golgi body	Cell membrane / Cytoplasm	RIC1_ORYSJ	Golgi body (Geldner's wave list)
29	AT1G02130.1	Golgi body	Cell membrane / Cytoplasm	RIC1_ORYSJ	Golgi body (Geldner's wave list)

30	AT1G12600.1	Cell membrane	Golgi body	S35B3_ANOGA	Cell membrane (Noble Foundation)
31	AT5G42290.1	Nucleus	No hit	--	Nucleus (Noble Foundation)
32	AT1G36925.1	Nucleus	No hit	--	Nucleus (Noble Foundation)

^# The TAIR IDs numbered 1 - 10 are from independent test set-I; 11 - 26 are from independent test set-II; 27 - 29 are from Geldner's wave list; and 30 - 32 are from our internal wet labs, as mentioned above.

^* We treat these dual-location hits as wrong hits (as we also did in our original binary coding of the PSI-BLAST results). The hits for the IDs numbered 25 & 26 are also considered wrong because Swiss-Prot assigns 'Membrane' and 'Cell membrane' to two different classes.
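
The rule in the footnote above, that a PSI-BLAST hit counts as correct only when its annotation names exactly the experimental class, can be illustrated with a small sketch. This is not the AtSubP implementation: the class list is simply collected from the location names used in the tables on this page, the function names `judge_hit` and `encode_hit` are hypothetical, and the one-hot layout (all zeros for no hit or for any annotation that is not a single class) is an assumption about how a binary coding of the hits might look.

```python
# Location classes as they appear in the tables on this page (assumed set).
CLASSES = [
    "Chloroplast", "Cytoplasm", "Golgi body", "Mitochondrion",
    "Extracellular", "Nucleus", "Cell membrane",
]


def judge_hit(hit_annotation, experimental):
    """Label a PSI-BLAST hit as 'no hit', 'correct' or 'wrong'.

    Only an exact, single-class match is 'correct'. Dual-location
    annotations such as 'Cytoplasm / Nucleus' and near-misses such as
    'Membrane; Single-pass type I' never equal a class name, so they
    are labelled 'wrong'.
    """
    if hit_annotation == "No hit":
        return "no hit"
    return "correct" if hit_annotation == experimental else "wrong"


def encode_hit(hit_annotation):
    """Hypothetical binary coding: one-hot over CLASSES, else all zeros."""
    return [1 if hit_annotation == name else 0 for name in CLASSES]


if __name__ == "__main__":
    # Examples taken from Table 2 (rows 8, 25 and 1).
    print(judge_hit("Cytoplasm / Nucleus", "Nucleus"))
    print(judge_hit("Membrane; Single-pass type I", "Cell membrane"))
    print(judge_hit("No hit", "Chloroplast"))
    print(encode_hit("Nucleus"))
```

Under this coding, a dual-location hit contributes no similarity signal at all, which is one simple way to make it count as a wrong hit.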
