Data formats allowed by PlantTFcat:
Pure Sequence: a single sequence without a FASTA header (it may span multiple lines)
Users may submit protein or nucleic acid sequences for analysis in FASTA format or as a single pure sequence. Each nucleic acid sequence is translated into 6 peptide sequences by open reading frame (ORF), and only translated peptide sequences with at least 35 contiguous amino acid residues are considered valid for further analysis. The valid length is 50-10000 letters for each sequence, and PlantTFcat currently accepts at most 500000 sequences (or 200M in size) per upload.
Please note: don't mix peptide sequences and nucleic acid sequences in one submission.
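The length limit above can be checked locally before uploading. The following is a minimal sketch and is not part of PlantTFcat: the function name check_fasta_lengths and the label "pure_sequence" are illustrative assumptions. It counts the records in a file and lists every sequence whose length falls outside 50-10000 letters. A file without a FASTA header is treated as one pure sequence.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with PlantTFcat): list sequences whose
# length is outside the 50-10000 letter range and print a summary line.
check_fasta_lengths() {
    awk -v min=50 -v max=10000 '
        # Close the current record: count it and print it if out of range.
        function report() {
            if (have) {
                n++
                if (len < min || len > max) { bad++; printf "%s\t%d\n", id, len }
            }
        }
        # A header line starts a new record; the ID is the first word after ">".
        /^>/ { report(); have = 1; id = substr($1, 2); len = 0; next }
        {
            gsub(/[ \t\r]/, "")          # ignore spaces, tabs and CR characters
            if ($0 == "") next           # skip blank lines
            if (!have) { have = 1; id = "pure_sequence"; len = 0 }
            len += length($0)            # a sequence may span several lines
        }
        END {
            report()
            printf "sequences=%d out_of_range=%d\n", n, bad
        }
    ' "$1"
}
```

For example, "check_fasta_lengths my_sequences.fa" prints one tab-delimited line (ID and length) per out-of-range sequence, followed by a summary line. The sequence count in the summary can be compared against the 500000-sequence upload limit.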
Installing the backend pipeline on a local computer:
- Minimum hardware: 64-bit 4-core processor, 4 GB memory and 100 GB HDD.
- OS: Redhat/CentOS 6.x 64-bit version.
- Java: OpenJDK or Oracle/Sun JDK, release 1.6 or later. The default CentOS installation includes OpenJDK; running "java -version" will report "OpenJDK Runtime Environment......". If the command fails with "command not found", run "yum -y install java*" to install OpenJDK from the remote software repository.
- In a Linux terminal, type and run "mkdir -p /opt" (the -p option avoids an error if /opt already exists)
- Download the pipeline (with source code) and save it under /opt; the filename is PlantTFcat_pipeline.tar.gz
- Type and run "cd /opt"
- Type and run "tar -xzvf PlantTFcat_pipeline.tar.gz"
- To try the sample data, type and run "/opt/PlantTFcat/PlantTFcat.sh -q /opt/PlantTFcat/sample/test_data_pep.fa"; the last line of the pipeline output is the location of the tab-delimited prediction result file.
- The pipeline supports multi-threaded computing, so multi-core processors will significantly speed up analysis.
- Always use the full pathname to call the pipeline (e.g. /opt/PlantTFcat/PlantTFcat.sh) and to specify the sequence file (e.g. /opt/PlantTFcat/sample/test_data_pep.fa).
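The last two points (full pathnames, and the result location on the last output line) can be wrapped in a small shell function. This is only a sketch under stated assumptions: the names PLANTTFCAT_SH and run_planttfcat are illustrative and do not come from the pipeline. The only pipeline behaviour it relies on is the "-q" option and the last-line convention described above.

```shell
#!/bin/sh
# Sketch only: PLANTTFCAT_SH and run_planttfcat are illustrative names,
# not part of the PlantTFcat distribution.
PLANTTFCAT_SH="${PLANTTFCAT_SH:-/opt/PlantTFcat/PlantTFcat.sh}"

run_planttfcat() {
    # Turn the input into a full pathname, as the instructions require.
    input_dir=$(cd "$(dirname "$1")" && pwd) || return 1
    input="$input_dir/$(basename "$1")"
    [ -f "$input" ] || { echo "no such file: $input" >&2; return 1; }

    # Run the pipeline by its full pathname and keep only the last line of
    # its output, which is the location of the prediction result file.
    "$PLANTTFCAT_SH" -q "$input" | tail -n 1
}
```

After the function is loaded into the shell, "run_planttfcat sample/test_data_pep.fa" run from /opt/PlantTFcat should print the location of the result file, even though the sequence file was given as a relative path.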