Prost! - A script to quantify and annotate miRNA expression
Prost! (PROcessing of Small Transcripts) is a Python script that runs Blast+ to quantify and annotate miRNA expression in chordates and vertebrates with assembled genomes. Prost! works by counting short transcripts within a user-specifiable length range. These counted transcripts are aligned to a user-specifiable genome, allowing for untemplated additions and edits, and then "binned" together based on genomic location. Each bin is then annotated with miRBase mature sequences and hairpins, as well as other types of RNA obtained from Ensembl's Biomart.
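The counting step described above can be pictured with a minimal sketch. This is not the code in prost.py; the function names and the 17-25 nt default range are assumptions chosen for illustration only.

```python
def read_fasta_sequences(lines):
    """Yield one sequence string per FASTA record from an iterable of lines."""
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if it had any sequence.
            if chunks:
                yield "".join(chunks)
            chunks = []
        elif line:
            chunks.append(line)
    if chunks:
        yield "".join(chunks)


def count_short_sequences(lines, min_len=17, max_len=25):
    """Count identical sequences whose length is within [min_len, max_len].

    The default bounds are hypothetical, not Prost!'s actual settings.
    """
    counts = {}
    for seq in read_fasta_sequences(lines):
        if min_len <= len(seq) <= max_len:
            counts[seq] = counts.get(seq, 0) + 1
    return counts


example = [
    ">read1", "TGAGGTAGTAGGTTGTATAGTT",
    ">read2", "TGAGGTAGTAGGTTGTATAGTT",
    ">read3", "ACGT",  # too short, so it is skipped
]
counts = count_short_sequences(example)
```

Here the two identical 22 nt reads collapse into a single entry with a count of 2, and the 4 nt read falls outside the length range. Only the unique sequences would then need to be aligned.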
Prost! is in the alpha testing stage and has yet to be made user-friendly: many options are currently hard-coded to run on our system. It does, however, produce meaningful output. Output from Prost! using mouse "dicer-seq" datasets, as well as the raw sequences that were processed, can be found on our FaceBase project page.
What you'll need to get Prost! running on your system:
- A Linux environment with Python 2.x.x installed (tested with Python 2.6.6)
- Blast+ installed and binaries in your PATH (tested with Blast version 2.2.27+)
- Blast+ database generated from a genome fasta file (not provided due to size constraints)
- Edit the "-db" line of the blastn_param file so that it specifies the genome database you created above.
- Blast+ database from miRBase's hairpin.fa and mature.fa (provided for mouse)
- Biomart generated fasta file containing RNAs that you'd like to annotate against formatted like this: >geneName|biotype|geneID (provided for mouse)
- Preprocessed fasta files, quality filtered and with barcodes or adapters removed, so that the only sequences remaining in the files are full, high-quality, short sequences.
- Edit the file named "filelist" to specify where your input fasta files of short sequences are located, one per line, in the format "fileName descriptiveName" (small sample fasta files provided)
- General knowledge of Python in order to edit parameters in prost.py (these will be user-specifiable in a .config file in the future)
- Run Prost! from your Linux command line, specifying a temporary output fasta file name and a temporary blast output file name, e.g. python prost_v.19.py tempOutput.fa tempBlastOutput.txt
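Two of the input formats listed above are simple enough to check before a run. The sketch below shows one way to do that. It assumes that whitespace separates fileName from descriptiveName and that file names contain no spaces. The helper names and the example values are hypothetical and do not come from Prost! itself.

```python
def parse_filelist(lines):
    """Parse "fileName descriptiveName" lines into (file, name) tuples.

    Assumes the two fields are separated by whitespace; blank lines are skipped.
    """
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        file_name, descriptive_name = line.split(None, 1)
        samples.append((file_name, descriptive_name))
    return samples


def parse_biomart_header(header):
    """Split a ">geneName|biotype|geneID" fasta header into its three fields."""
    gene_name, biotype, gene_id = header.lstrip(">").strip().split("|")
    return {"geneName": gene_name, "biotype": biotype, "geneID": gene_id}


# Made-up example values, for illustration only.
samples = parse_filelist(["samples/a.fa control_1", "", "samples/b.fa knockout_1\n"])
record = parse_biomart_header(">ExampleGene|snoRNA|EXAMPLE0001")
```

Both helpers raise ValueError on a malformed line, which surfaces formatting mistakes before the much slower alignment step starts.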
Output files of interest:
sample_output.txt - The output of Prost! before the binning step has occurred, along with counts, annotation, and the bin into which each sequence will eventually be placed.
sample_compressed_output.txt - The output of Prost! after the binning step has been completed.
WARNING: This software is still in testing and there are bound to be bugs. One known bug is that the normalized counts are not calculated correctly, so we suggest using the raw counts and normalizing them with a tool external to Prost!. Another is that sequences with 3' untemplated additions are not handled correctly in this release and will often be misannotated as having too many locations (TML).
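As one possible way to normalize the raw counts outside Prost!, the sketch below scales them to reads per million (RPM). RPM is a common convention, not necessarily the normalization Prost! is meant to compute. The function name and the bin names are hypothetical, and the raw counts are assumed to have already been read from the output file into a dictionary.

```python
def reads_per_million(raw_counts):
    """Scale raw counts so that they sum to one million across the sample."""
    total = sum(raw_counts.values())
    if total == 0:
        # Avoid dividing by zero for an empty sample.
        return dict((name, 0.0) for name in raw_counts)
    return dict((name, count * 1e6 / total) for name, count in raw_counts.items())


# Made-up raw counts for two bins in a single sample.
raw = {"bin_a": 750, "bin_b": 250}
rpm = reads_per_million(raw)
```

Each sample should be normalized separately, using only its own total, so that samples sequenced to different depths become comparable.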