psRNATarget: A Plant Small RNA Target Analysis Server (2017 Update)

Data formats allowed by psRNATarget:

multi-FASTA format (for small RNAs and target transcripts):

>AT1G27360.1
AAGGTATCTATTTGCCTAGCCAGAGTTATATATAGGATTGATTGTCTAGTCTTTTCTTAT
ATGATTTTTGTTCTCATTTACTAATCAAAGTTCTGCAAACTTGTAGTTGTTGTAGGATTT
GTTGCTCTGGCTCTGGTGGTAGGTCTATGAAATCAACCCATATCGTGAATGGACTGCAAC
ATGGTATCTTCGTCCCAGTGGGATTGGGAGCATTTGATCATGTCCAATCCGTCAAGGACT
GAAGATGACAGCAAACAG
>AT1G27360.4 | Symbols:  | squamosa promoter-binding protein
CTGGGTGAAACATAGAAAAGTTTCTCTTGCTCAAGTTAATGATAAAAGGGTGAGAGCAAT
AAACGCTGATAAGCCTTGTCTGGTCCTTGGAATTTTGAATTTTCTTTTTCTATCTTACTT
ATAGTATTGGTAGTTGAGGGTGTCGTCGATAAGTTGTTGTAGGATTTGTTGCTCTGGCTC
TGGTGGTAGGTCTATGAAATCAACCCATATCGTGAATGGACTGCAACATGGTATCTTCGT
CCCAGTGGGATTGGGAGCATTTGATCATGTCCAATCCGTCAAGGACTGAAGATGACAGCA
AACAGCTACCTACTGAGTGGGAAATTGAAAAAGGTGAAGGAATTGAATCTATAGTTCCAC
ATTTCTCAGGCCTTGAGAGAGTCAGTAGTGGCTCTGCCACCAGCTTCTGGCACACTGCTG
TATCGAAAAGCTCACAGTCGACCTCTATCAACTCATCATCTCCCGAAGCCAAACGATGCA
AGCTTGCATCAGA

Short Tags: for small RNA sequences, one sequence per line

UGACAGAAGAGAGUGAGCAC
UUGACAGAAGAUAGAGAGCAC
UCCCAAAUGUAGACAAAGCA
UGUGUUCUCAGGUCACCCCUU
UGUGUUCUCAGGUCACCCCUG
UGUGUUCUCAGGUCACCCCUG
UGGUAGCAGUAGCGGUGGUAA
AAGCUCAGGAGGGAUAGCGCC
AAGCUCAGGAGGGAUAGCGCC

Pure Sequence: a single target transcript sequence without a FASTA header (may span multiple lines)

CTGGGTGAAACATAGAAAAGTTTCTCTTGCTCAAGTTAATGATAAAAGGGTGAGAGCAAT
AAACGCTGATAAGCCTTGTCTGGTCCTTGGAATTTTGAATTTTCTTTTTCTATCTTACTT
ATAGTATTGGTAGTTGAGGGTGTCGTCGATAAGTTGTTGTAGGATTTGTTGCTCTGGCTC
TGGTGGTAGGTCTATGAAATCAACCCATATCGTGAATGGACTGCAACATGGTATCTTCGT
CCCAGTGGGATTGGGAGCATTTGATCATGTCCAATCCGTCAAGGACTGAAGATGACAGCA
AACAGCTACCTACTGAGTGGGAAATTGAAAAAGGTGAAGGAATTGAATCTATAGTTCCAC
ATTTCTCAGGCCTTGAGAGAGTCAGTAGTGGCTCTGCCACCAGCTTCTGGCACACTGCTG
TATCGAAAAGCTCACAGTCGACCTCTATCAACTCATCATCTCCCGAAGCCAAACGATGCA
AGCTTGCATCAGA
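
The three input formats above can be read with a few lines of code. The following is a minimal sketch, not the server's actual parser. The function names parse_fasta, parse_short_tags and parse_pure_sequence are hypothetical, and taking the sequence ID as the first whitespace-delimited token of the FASTA header is an assumption.

```python
def parse_fasta(text):
    """Parse multi-FASTA text into a list of (sequence_id, sequence) tuples."""
    records = []
    seq_id, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                records.append((seq_id, "".join(chunks)))
            # Assumption: the ID is the first whitespace-delimited header token.
            header = line[1:].split()
            seq_id = header[0] if header else ""
            chunks = []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        records.append((seq_id, "".join(chunks)))
    return records


def parse_short_tags(text):
    """Short tags: one small RNA sequence per line; blank lines are skipped."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def parse_pure_sequence(text):
    """Pure sequence: a single sequence without a header; lines are joined."""
    return "".join(line.strip() for line in text.splitlines())
```

Short tags and a pure sequence cannot be told apart from the text alone, so the caller chooses the parser according to whether the input holds small RNAs or a target transcript.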

User-submitted small RNA sequence(s):

Before analysis, the back-end pipeline will check submitted small RNAs, mainly including miRNA and phasiRNA (sRNA) sequences, by the following standards:

A valid sequence must be in either FASTA or short-tag format (see the examples above).
At most 50M sRNA sequences can be analyzed at once by the pipeline, and the maximum submission size is 25 MiB.
The minimum sequence length equals the HSP value, which is 19 for scoring schema V2, 20 for scoring schema V1, or a value of your choice in a customized schema.
The maximum sequence length is 25 or HSP+5, whichever is greater; it is therefore 25 for both scoring schemas V1 and V2.
Small RNAs that fail these checks are ignored by the pipeline.
Only A, T, C, G, U and N are valid sequence letters; a sequence containing other letters will be ignored.
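
The letter and length checks listed above can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline's own code: the names length_bounds and is_valid_srna are hypothetical, and upper-casing the input before checking is an assumption.

```python
import re

# Letters listed as valid in the text above.
VALID_LETTERS = re.compile(r"[ATCGUN]+")


def length_bounds(hsp):
    """Return (min_len, max_len) for small RNAs given the HSP value."""
    return hsp, max(25, hsp + 5)


def is_valid_srna(seq, hsp=19):
    """Check one small RNA against the letter and length rules.

    hsp=19 corresponds to scoring schema V2; pass hsp=20 for V1.
    Assumption: lowercase input is accepted and upper-cased first.
    """
    seq = seq.strip().upper()
    min_len, max_len = length_bounds(hsp)
    return bool(VALID_LETTERS.fullmatch(seq)) and min_len <= len(seq) <= max_len
```

For example, length_bounds(19) and length_bounds(20) both give an upper limit of 25, while a customized HSP of 22 would raise the limit to 27.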

For the FASTA format, we suggest using only letters, numbers, and minus/underscore characters in the sequence ID. In addition, please avoid long sequence IDs (for example, IDs longer than 50 characters), because they may disrupt the web display.

User-submitted target candidate sequence(s):

Users can submit target candidate sequences of interest in this section. A typical target transcript sequence can be a cDNA, EST, unigene, mRNA, genomic segment, etc. The server will search these submitted target candidates for possible target sites of (submitted or preloaded) small RNA sequences (mainly including miRNA and ta-siRNA, sic passim). Before analysis, the back-end pipeline will check these submitted sequences against the following standards:

A valid sequence must be in FASTA format or be a single sequence without a FASTA header (pure sequence; see the examples above).
At most 5M target candidate sequences can be analyzed at once by the pipeline, and the maximum submission size is 100 MiB.
A target candidate sequence should be between 50 and 5M nucleotides in length; the pipeline ignores sequences outside this range.
Only A, T, C, G, U and N are valid sequence letters; other characters will be deleted or changed to N.
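
A minimal sketch of the target-candidate checks above is given below. The function name clean_target is hypothetical. The text does not say when an invalid character is deleted and when it is changed to N, so this sketch assumes that whitespace is deleted and every other invalid character becomes N.

```python
import re

MIN_LEN = 50
MAX_LEN = 5_000_000  # "5M" nucleotides


def clean_target(seq):
    """Return the cleaned sequence, or None if its length is out of range.

    Assumption: whitespace is removed and any other character outside
    A, T, C, G, U, N is replaced by N.
    """
    seq = re.sub(r"\s+", "", seq).upper()
    seq = re.sub(r"[^ATCGUN]", "N", seq)
    if MIN_LEN <= len(seq) <= MAX_LEN:
        return seq
    return None
```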

For the FASTA format, we suggest using only letters, numbers, and minus/underscore characters in the sequence ID. In addition, please avoid long sequence IDs (for example, IDs longer than 50 characters), because they may disrupt the web display.

Preprocessing of Next-Generation Sequencing (NGS) Data:

Raw NGS data need to be preprocessed before submission. For miRNAs sequenced by NGS, users should first convert the reads into either FASTA format or short tags (see the examples above). To reduce data size, users should filter sequences by length, keeping only those of 19 to 25 nt. Redundant sequences can be removed to further reduce data size. For mRNA transcripts (target candidates) sequenced by NGS, we recommend de novo transcriptome assembly, which generates longer contigs and improves prediction quality. It also reduces the workload of the analysis server.
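
The small RNA part of this preprocessing (length filter, removal of redundant sequences, conversion to FASTA) can be sketched as below. The function names and the generated ID prefix are hypothetical, and the sketch assumes that adapters have already been trimmed from the reads.

```python
def preprocess_reads(reads, min_len=19, max_len=25):
    """Keep reads of min_len..max_len nt and drop duplicates, preserving order."""
    seen = set()
    kept = []
    for read in reads:
        seq = read.strip().upper()
        if not (min_len <= len(seq) <= max_len):
            continue  # outside the 19-25 nt window
        if seq in seen:
            continue  # redundant sequence
        seen.add(seq)
        kept.append(seq)
    return kept


def to_fasta(seqs, prefix="sRNA"):
    """Format sequences as FASTA text with generated IDs (prefix is hypothetical)."""
    return "".join(f">{prefix}_{i}\n{seq}\n" for i, seq in enumerate(seqs, start=1))
```

Calling parse-free helpers like these on the short-tag example above would collapse the repeated tags into single entries before upload.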

Scoring schema V1 (2011):

The V1 scoring schema [PMID:21622958] was developed at an early stage, based on an animal model described in a series of research papers. One of its major features is that the seed region covers nucleotide positions 2-8 only, and there is no limit on the number of mismatches in the seed region. In our early study, the V1 schema identified all of the validated miRNA-target pairs (usually validated by 5'-RACE) in our curated dataset when the maximum expectation was set to 5.0. In psRNATarget, we set the default value of maximum expectation to 3.0 for compatibility reasons.

Scoring schema V2 (2017):

After the V1 schema was published, we improved the default scoring schema based on the curated dataset, which includes the validated miRNA-target pairs. The improved schema (V2, 2017 release) can find more curated miRNA-target pairs from the updated dataset without a significant increase in total output. In the V2 schema, the seed region has been extended to nucleotide positions 2-13, and the maximum number of mismatches (excluding G-U) allowed in the seed region has been restricted to two. In addition, the analysis of target accessibility has been disabled since its value did not change the final output. The default maximum expectation is set to 5.0, which recalls 93% of validated miRNA-target pairs, compared with the 86% recall rate reached by the V1 schema with the same cutoff.
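
The V2 seed-region rule can be illustrated with a small check. This is a simplified sketch, not the server's scoring code: it handles only ungapped duplexes of equal length, it does not compute the expectation value, and the function names are hypothetical. Both sequences are written 5'->3', so the target site is reversed before pairing.

```python
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}  # G-U pairs are not counted as mismatches


def seed_mismatches(srna, target_site, seed=(2, 13)):
    """Count non-G-U mismatches in the seed region of an ungapped duplex.

    Position i of the sRNA (1-based, from its 5' end) pairs with
    position i of the reversed target site.
    """
    s = srna.upper().replace("T", "U")
    t = target_site.upper().replace("T", "U")[::-1]
    if len(s) != len(t):
        raise ValueError("this sketch handles ungapped, equal-length duplexes only")
    start, end = seed
    count = 0
    for pos in range(start, min(end, len(s)) + 1):
        pair = (s[pos - 1], t[pos - 1])
        if pair not in CANONICAL and pair not in WOBBLE:
            count += 1
    return count


def passes_v2_seed_rule(srna, target_site, max_mismatches=2):
    """True if the seed region holds at most two non-G-U mismatches."""
    return seed_mismatches(srna, target_site) <= max_mismatches
```

Passing seed=(2, 8) gives the V1 seed window instead; V1 places no limit on the count, so only seed_mismatches is meaningful there.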

User-customized schema:

Users may change the settings to handle special cases of target recognition. For example, some miRNA-target interactions may accommodate long indels, so the gap-opening penalty can be reduced to display more such interactions. The extra weight in the seed region can be increased to give more weight to seed-region recognition. The "Calculate target accessibility" option can be enabled to take into account the effect of mRNA secondary structure on target recognition. Refer to the information below to adjust the schema.

Maximum expectation:

The expectation value is the penalty for mismatches between the mature small RNA and the target sequence. A higher value indicates less similarity (and a lower likelihood of interaction) between the small RNA and the target candidate. The default penalty rule is set up by the scoring schema. Maximum expectation is the cutoff; any small RNA-target pair with an expectation greater than the cutoff will be discarded from the final result. The recommended values are 3.0-5.0, depending on the scoring schema.
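
Because a higher expectation means a poorer match, applying the cutoff amounts to keeping the pairs whose expectation does not exceed it. A minimal sketch follows; the function name and the tuple layout are hypothetical, and keeping a pair whose expectation exactly equals the cutoff is an assumption.

```python
def filter_by_expectation(pairs, max_expectation=5.0):
    """Keep (srna_id, target_id, expectation) tuples at or below the cutoff.

    Assumption: a pair whose expectation equals the cutoff is kept.
    The result is sorted so that the best (lowest) expectation comes first.
    """
    kept = [p for p in pairs if p[2] <= max_expectation]
    return sorted(kept, key=lambda p: p[2])
```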

Length for complementarity scoring (hspsize):

The length of the region in which the server scores complementarity between the small RNA and the target transcript. The recommended range for hspsize is 19-20. Be aware that the scoring algorithm penalizes mismatches only in this region (from nt No. 1 to nt No. hspsize); mismatches beyond it are ignored. In addition, submitted small RNAs are removed if they are shorter than the HSP value.

Number of top target genes for each small RNA:

The number of top (the best) target gene candidates that will be listed for each submitted small RNA.

Target accessibility-maximum energy to unpair the target site (UPE):

The accessibility of the mRNA target site to small RNA has been identified as one of the important factors in target recognition, because secondary structure (stems, etc.) around the target site prevents the small RNA (including miRNA and ta-siRNA, sic passim) and the mRNA target from making contact. The psRNATarget server employs RNAup to calculate target accessibility, which is represented by the energy required to open (unpair) the secondary structure around the target site (usually the region complementary to the small RNA plus its upstream and downstream flanks) on the target mRNA (see figure below). The less energy required, the more likely it is that the small RNA can contact (and cleave) the target mRNA.

The figure illustrates the energy that is required to open the secondary structure around the target site. We use the RNAup software described by Muckstein et al. (2005, PMID:16446276) to calculate this value, denoted as UPE.

Flanking length around target site for target accessibility analysis:

Besides the target site (the region complementary to the small RNA) itself, its two flanks on the mRNA must also be opened in the secondary structure to allow small RNA (including miRNA and ta-siRNA, sic passim) binding and cleavage (see the two red up-arrows in the following figure). The reason is that the small RNA binds to the target mRNA in the groove of the RISC complex, which needs extra space on both sides of the target site. Kertesz et al. (2007, PMID:17893677) suggested that 17 nucleotides upstream and 13 nucleotides downstream of the target site should be considered in target accessibility analysis.
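
The region considered for accessibility can be derived from the target-site coordinates and the two flank lengths. The sketch below only computes coordinates and does not call RNAup. The function name is hypothetical, and two assumptions are made: "upstream" means the 5' side of the site on the mRNA, and the window is clipped at the transcript ends.

```python
def accessibility_window(site_start, site_end, target_length,
                         upstream=17, downstream=13):
    """Return the 1-based inclusive (start, end) of the region to unpair.

    site_start and site_end are 1-based inclusive coordinates of the target
    site on the mRNA (5'->3'). Assumption: the window is clipped to the
    transcript when a flank would run past either end.
    """
    if not (1 <= site_start <= site_end <= target_length):
        raise ValueError("target site must lie within the transcript")
    start = max(1, site_start - upstream)
    end = min(target_length, site_end + downstream)
    return start, end
```

For a site at positions 100-120 of a 1000-nt transcript, the default flanks give the window 83-133.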

Translation inhibition range:

In addition to cleaving mRNA, plant miRNA reportedly inhibits the translation of target genes. This often happens if a mismatch occurs in or around the center of the complementary region, because the central region is essential for cleavage (Brodersen et al., 2008, PMID:18483398). This mechanism is different from translational inhibition by animal miRNA, although the latter also inhibits gene expression at the translational level.

Users can set the coordinates of the central region; any mismatch within this region will be reported as a trigger of translational inhibition.
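
The rule described here reduces to a range test on mismatch positions. In the sketch below, the function name and the two labels are illustrative, and the central range is passed in explicitly because it is a user setting.

```python
def inhibition_type(mismatch_positions, central_start, central_end):
    """Label a pair by whether any mismatch lies in the central region.

    Positions are 1-based; central_start..central_end is the user-set range.
    The labels "Translation" and "Cleavage" are illustrative.
    """
    for pos in mismatch_positions:
        if central_start <= pos <= central_end:
            return "Translation"
    return "Cleavage"
```

With a hypothetical central range of 10-11, mismatches at positions 3 and 10 would be labeled "Translation", while mismatches at positions 3 and 15 would be labeled "Cleavage".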

Multiplicity of target site:

The two-hits model (Axtell et al., 2005; PMID:17081978) suggests that miRNA or ta-siRNA may have multiple target sites (i.e., complementary regions) on a specific target transcript, which will increase recognition activity of the miRNA/ta-siRNA to the mRNA target. The server will report the number of target sites for each small RNA/target pair. Users are advised to select an sRNA/target pair with more target sites.
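
Counting target sites per small RNA/target pair is a simple grouping step. The sketch below assumes that each predicted site is available as a (srna_id, target_id, site_start) tuple; this layout and the function name are hypothetical and do not reflect the server's output format.

```python
from collections import Counter


def count_target_sites(hits):
    """Count distinct predicted sites per (srna_id, target_id) pair.

    hits is an iterable of (srna_id, target_id, site_start) tuples;
    repeated tuples are counted once.
    """
    unique_sites = set(hits)
    return Counter((srna_id, target_id) for srna_id, target_id, _ in unique_sites)
```

Pairs with a count of two or more are the ones that the two-hits model would favor.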