Multi-FASTA format (for small RNAs and target transcripts):

>AT1G27360.1
AAGGTATCTATTTGCCTAGCCAGAGTTATATATAGGATTGATTGTCTAGTCTTTTCTTAT
ATGATTTTTGTTCTCATTTACTAATCAAAGTTCTGCAAACTTGTAGTTGTTGTAGGATTT
GTTGCTCTGGCTCTGGTGGTAGGTCTATGAAATCAACCCATATCGTGAATGGACTGCAAC
ATGGTATCTTCGTCCCAGTGGGATTGGGAGCATTTGATCATGTCCAATCCGTCAAGGACT
GAAGATGACAGCAAACAG
>AT1G27360.4 | Symbols: | squamosa promoter-binding protein
CTGGGTGAAACATAGAAAAGTTTCTCTTGCTCAAGTTAATGATAAAAGGGTGAGAGCAAT
AAACGCTGATAAGCCTTGTCTGGTCCTTGGAATTTTGAATTTTCTTTTTCTATCTTACTT
ATAGTATTGGTAGTTGAGGGTGTCGTCGATAAGTTGTTGTAGGATTTGTTGCTCTGGCTC
TGGTGGTAGGTCTATGAAATCAACCCATATCGTGAATGGACTGCAACATGGTATCTTCGT
CCCAGTGGGATTGGGAGCATTTGATCATGTCCAATCCGTCAAGGACTGAAGATGACAGCA
AACAGCTACCTACTGAGTGGGAAATTGAAAAAGGTGAAGGAATTGAATCTATAGTTCCAC
ATTTCTCAGGCCTTGAGAGAGTCAGTAGTGGCTCTGCCACCAGCTTCTGGCACACTGCTG
TATCGAAAAGCTCACAGTCGACCTCTATCAACTCATCATCTCCCGAAGCCAAACGATGCA
AGCTTGCATCAGA

Short Tags: for small RNA sequences, one sequence per line

UGACAGAAGAGAGUGAGCAC
UUGACAGAAGAUAGAGAGCAC
UCCCAAAUGUAGACAAAGCA
UGUGUUCUCAGGUCACCCCUU
UGUGUUCUCAGGUCACCCCUG
UGUGUUCUCAGGUCACCCCUG
UGGUAGCAGUAGCGGUGGUAA
AAGCUCAGGAGGGAUAGCGCC
AAGCUCAGGAGGGAUAGCGCC

Pure Sequence: a single target transcript sequence without a FASTA header (may span multiple lines)

CTGGGTGAAACATAGAAAAGTTTCTCTTGCTCAAGTTAATGATAAAAGGGTGAGAGCAAT
AAACGCTGATAAGCCTTGTCTGGTCCTTGGAATTTTGAATTTTCTTTTTCTATCTTACTT
ATAGTATTGGTAGTTGAGGGTGTCGTCGATAAGTTGTTGTAGGATTTGTTGCTCTGGCTC
TGGTGGTAGGTCTATGAAATCAACCCATATCGTGAATGGACTGCAACATGGTATCTTCGT
CCCAGTGGGATTGGGAGCATTTGATCATGTCCAATCCGTCAAGGACTGAAGATGACAGCA
AACAGCTACCTACTGAGTGGGAAATTGAAAAAGGTGAAGGAATTGAATCTATAGTTCCAC
ATTTCTCAGGCCTTGAGAGAGTCAGTAGTGGCTCTGCCACCAGCTTCTGGCACACTGCTG
TATCGAAAAGCTCACAGTCGACCTCTATCAACTCATCATCTCCCGAAGCCAAACGATGCA
AGCTTGCATCAGA
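The three input formats above can be read with a few lines of code. The following is a minimal sketch for preparing or checking a submission locally, not the server's own parser; the function name `parse_submission`, the `fmt` labels, and the placeholder record names (`tag1`, `sequence`, `unnamed`) are assumptions made for illustration.

```python
def parse_submission(text, fmt):
    """Return a list of (name, sequence) tuples.

    fmt is a hypothetical label: "fasta", "tags", or "pure".
    """
    # Drop blank lines and surrounding whitespace.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]

    if fmt == "fasta":
        records = []
        for ln in lines:
            if ln.startswith(">"):
                # Use the first word of the header as the record name.
                header = ln[1:].split()
                records.append([header[0] if header else "unnamed", ""])
            elif records:
                # Sequence lines are appended to the most recent record.
                records[-1][1] += ln
        return [(name, seq) for name, seq in records]

    if fmt == "tags":
        # One small RNA per line; names are generated placeholders.
        return [(f"tag{i}", seq) for i, seq in enumerate(lines, start=1)]

    if fmt == "pure":
        # A single transcript that may be wrapped over several lines.
        return [("sequence", "".join(lines))]

    raise ValueError(f"unknown format: {fmt}")
```

For example, `parse_submission(">AT1G27360.1\nAAGG\nTATC", "fasta")` yields one record named `AT1G27360.1` with the two sequence lines joined.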
Before analysis, the back-end pipeline checks the submitted small RNA (sRNA) sequences, mainly miRNAs and phasiRNAs, against the following standards:
Users may submit target candidate sequences of interest in this section. A typical target transcript sequence can be a cDNA, EST, unigene, mRNA, genomic segment, etc. The server searches these submitted target candidates for possible target sites of the (submitted or preloaded) small RNA sequences (mainly miRNA and ta-siRNA, here and throughout). Before analysis, the back-end pipeline checks the submitted sequences against the following standards:
Raw NGS data need to be preprocessed before submission. For miRNAs sequenced by NGS, users should first convert the reads into either FASTA format or short tags (see the examples above). To reduce data size, users need to filter the sequences by length and keep only those of 19 to 25 nt. Redundant sequences can be removed to reduce data size further. For mRNA transcripts (target candidates) sequenced by NGS, we recommend de novo transcriptome assembly, which generates longer contigs and improves prediction quality. It also reduces the workload on the analysis server.
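The length filter and redundancy removal described above can be sketched as follows. This is an illustrative helper, not part of the server; the name `preprocess_reads` and the choice to upper-case reads before comparing them are assumptions.

```python
def preprocess_reads(reads, min_len=19, max_len=25):
    """Keep reads of min_len..max_len nt and drop exact duplicates.

    The first occurrence of each sequence is kept, in input order.
    """
    seen = set()
    kept = []
    for read in reads:
        seq = read.strip().upper()
        if min_len <= len(seq) <= max_len and seq not in seen:
            seen.add(seq)
            kept.append(seq)
    return kept
```

Applied to the short-tag example above, the two duplicated tags would each be kept once.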
The V1 scoring schema [PMID:21622958] was developed at an early stage, based on an animal model drawn from a series of research papers. One of its major features is that the seed region covers positions 2-8 only, and there is no limit on the number of mismatches in the seed region. In our early study, the V1 schema identified all of the validated miRNA-target pairs (usually validated by 5'-RACE) in our curated dataset when the maximum expectation was set to 5.0. In psRNATarget, we set the default value of maximum expectation to 3.0 for compatibility reasons.
After the V1 schema was published, we improved the default scoring schema based on the curated dataset, which includes the validated miRNA-target pairs. The improved schema (V2, 2017 release) finds more curated miRNA-target pairs in the updated dataset without a significant increase in total output. In the V2 schema, the seed region has been extended to positions 2-13, and the maximum number of mismatches (excluding G-U pairs) allowed in the seed region has been restricted to two. In addition, the analysis of target accessibility has been disabled since its value did not change the final output. The default maximum expectation is set to 5.0, which recalls 93% of validated miRNA-target pairs, compared with the 86% recall rate reached by the V1 schema with the same cutoff.
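The V2 seed rule can be illustrated with a small check. This is a sketch under simplifying assumptions, not the server's implementation: the alignment is taken to be ungapped, and `paired_target` is a hypothetical input listing the target bases in the order in which they pair with small RNA positions 1, 2, 3, and so on.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def _rna(seq):
    # Normalise to upper-case RNA letters so DNA input (T) also works.
    return seq.upper().replace("T", "U")


def seed_mismatches(srna, paired_target, seed_start=2, seed_end=13):
    """Count seed-region mismatches, not counting G-U wobble pairs."""
    srna, paired_target = _rna(srna), _rna(paired_target)
    last = min(seed_end, len(srna), len(paired_target))
    count = 0
    for pos in range(seed_start, last + 1):  # 1-based small RNA positions
        pair = (srna[pos - 1], paired_target[pos - 1])
        if pair not in WATSON_CRICK and pair not in WOBBLE:
            count += 1
    return count


def passes_v2_seed_rule(srna, paired_target, max_seed_mismatches=2):
    """True if the pair has at most two seed mismatches (positions 2-13)."""
    return seed_mismatches(srna, paired_target) <= max_seed_mismatches
```

Passing `seed_end=8` gives the V1 seed window; V1 places no limit on seed mismatches, so only the count is meaningful in that case.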
Users may change the settings to handle special cases of target recognition. For example, some miRNA-target interactions may accommodate long indels, so the "penalty for opening gap" can be reduced to display more of such interactions. "Extra weight in seed region" can be increased to give more weight to seed region recognition. "Calculate target accessibility" can be enabled to consider the effect of mRNA secondary structure on target recognition. Refer to the information below to adjust the schema.
The expectation value is the penalty for mismatches between the mature small RNA and the target sequence. A higher value indicates less similarity (and a lower likelihood of interaction) between the small RNA and the target candidate. The default penalty rule is set by the scoring schema. Maximum expectation is the cutoff; any small RNA-target pair with an expectation greater than the cutoff is discarded from the final result. The recommended values are 3.0-5.0, depending on the scoring schema.
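The direction of the cutoff is easy to get wrong, so here is a minimal sketch of the filter. The dictionary keys and the function name are hypothetical, and keeping pairs whose expectation equals the cutoff is an assumption; the text only states that pairs beyond the cutoff are dropped.

```python
def filter_by_expectation(pairs, max_expectation=5.0):
    """Keep small RNA-target pairs whose expectation does not exceed the cutoff.

    Lower expectation means better complementarity, so high values are dropped.
    """
    return [p for p in pairs if p["expectation"] <= max_expectation]
```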
The length of the region in which the server will score complementarity between small RNA and the target transcript. The recommended range for hspsize is 19-20. Be aware that the scoring algorithm only penalizes mismatches in this region (from nt No. 1 to nt No. hspsize) and that mismatches beyond it are ignored. In addition, submitted small RNAs are removed if they are shorter than the hspsize value.
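The effect of hspsize on scoring can be sketched as below. This is a simplified illustration, not the server's scoring code: the alignment is assumed to be ungapped, `paired_target` is a hypothetical input giving the target base opposite each small RNA position, and the penalty values (1.0 per mismatch, 0.5 per G-U pair) are assumptions chosen for the example. Seed weighting and gap penalties are left out.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def expectation_in_hsp(srna, paired_target, hspsize=19,
                       mismatch_penalty=1.0, gu_penalty=0.5):
    """Sum penalties over small RNA positions 1..hspsize only.

    Returns None for small RNAs shorter than hspsize, mirroring their removal.
    """
    if len(srna) < hspsize:
        return None
    score = 0.0
    # Positions beyond hspsize are never inspected, so they cannot add penalty.
    for pair in zip(srna[:hspsize], paired_target[:hspsize]):
        if pair in WATSON_CRICK:
            continue
        score += gu_penalty if pair in WOBBLE else mismatch_penalty
    return score
```

With `hspsize=19`, a mismatch at position 20 of a 20-nt small RNA leaves the score unchanged, whereas the same mismatch at position 1 adds a full penalty.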
The number of top (the best) target gene candidates that will be listed for each submitted small RNA.
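Selecting the top candidates per small RNA amounts to grouping by small RNA and sorting by expectation. The sketch below assumes hypothetical record keys (`srna`, `target`, `expectation`) and that "best" means lowest expectation; how the server breaks ties is not stated, so ties here simply keep input order.

```python
from collections import defaultdict


def top_targets(pairs, top_n):
    """Return {small RNA: its top_n hits with the lowest expectation}."""
    by_srna = defaultdict(list)
    for pair in pairs:
        by_srna[pair["srna"]].append(pair)
    # sorted() is stable, so equal expectations keep their input order.
    return {
        srna: sorted(hits, key=lambda h: h["expectation"])[:top_n]
        for srna, hits in by_srna.items()
    }
```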
The accessibility of the mRNA target site to small RNA has been identified as one of the important factors in target recognition, because secondary structure (stems, etc.) around the target site prevents the small RNA (including miRNA and ta-siRNA, here and throughout) from contacting the mRNA target. The psRNATarget server employs RNAup to calculate target accessibility, which is represented by the energy required to open (unpair) the secondary structure around the target site (usually the region complementary to the small RNA plus its upstream and downstream flanks) on the target mRNA (see figure below). Lower energy means a higher likelihood that the small RNA can contact (and cleave) the target mRNA.
Besides the target site (the region complementary to the small RNA) itself, its two flanks on the mRNA must also be opened in the secondary structure for the small RNA (including miRNA and ta-siRNA, here and throughout) to bind and cleave (see the two red up-arrows in the following figure). The reason is that the small RNA binds to the target mRNA in the groove of the RISC complex, which needs extra space on both sides of the target site. Kertesz et al. (2007, PMID:17893677) suggested that 17 upstream and 13 downstream nucleotides of the target site should be considered in target accessibility analysis.
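The region to unpair can be computed from the target-site coordinates and the two flank lengths. The helper below is an illustrative sketch; it assumes 1-based inclusive coordinates on the mRNA in 5'-to-3' orientation, and it clips the window at the transcript ends, which is an assumption about edge handling rather than documented server behaviour.

```python
def accessibility_window(site_start, site_end, transcript_len,
                         upstream=17, downstream=13):
    """Return (start, end) of the target site plus its flanks.

    Coordinates are 1-based and inclusive; the window is clipped to the
    transcript so that it never runs past either end.
    """
    start = max(1, site_start - upstream)
    end = min(transcript_len, site_end + downstream)
    return start, end
```

For a target site at positions 100-120 of a 1000-nt transcript, this gives the window 83-133.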
In addition to cleaving mRNA, plant miRNA reportedly inhibits the translation of target genes. This often happens when a mismatch occurs around the center of the complementary region, because the central region is essential for cleavage (Brodersen et al., 2008, PMID: 18483398). This mechanism differs from the translational inhibition by animal miRNA, although the latter also inhibits gene expression at the translational level.
Users may set the coordinates of the central region; any mismatch within it is reported as a trigger of translational inhibition.
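The central-region rule can be sketched as a simple classification. Several points here are assumptions rather than documented behaviour: the example range of positions 10-11, the ungapped alignment, the hypothetical `paired_target` input (the target base opposite each small RNA position), the choice not to count G-U pairs as mismatches, and the output labels.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def predicted_inhibition(srna, paired_target, central_start=10, central_end=11):
    """Label a pair "Translation" if the central region has a mismatch."""
    last = min(central_end, len(srna), len(paired_target))
    for pos in range(central_start, last + 1):  # 1-based small RNA positions
        pair = (srna[pos - 1], paired_target[pos - 1])
        if pair not in WATSON_CRICK and pair not in WOBBLE:
            return "Translation"
    return "Cleavage"
```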
The two-hit model (Axtell et al., 2005; PMID:17081978) suggests that a miRNA or ta-siRNA may have multiple target sites (i.e., complementary regions) on a specific target transcript, which increases the recognition activity of the miRNA/ta-siRNA toward the mRNA target. The server reports the number of target sites for each small RNA/target pair. Users are advised to favor sRNA/target pairs with more target sites.
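Counting target sites per pair is a straightforward tally over the reported hits. The sketch below assumes each hit is a record with hypothetical `srna` and `target` keys and that every record corresponds to one distinct target site.

```python
from collections import Counter


def count_target_sites(hits):
    """Return a Counter mapping (small RNA, target) to its number of sites."""
    return Counter((hit["srna"], hit["target"]) for hit in hits)
```

Calling `count_target_sites(hits).most_common()` lists the pairs with the most target sites first.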