smallRNA-seq¶
Overview¶
bcbio supports a configurable best-practices pipeline for smallRNA-seq covering quality control, adapter trimming, miRNA/isomiR quantification, and detection of other small RNAs.
upload:
  dir: ../final
details:
  - analysis: smallRNA-seq
    algorithm:
      aligner: star # any other aligner is supported.
      # change the adapter according to your project
      adapters: ["TGGAATTCTCGGGTGC"]
      expression_caller: [trna, seqcluster, mirdeep2]
      # expression_caller: [trna, seqcluster, mirdeep2, mirge] Read the docs to learn how to use
      # the miRge tools: https://bcbio-nextgen.readthedocs.io/en/latest/contents/pipelines.html#smallrna-seq
      species: hsa
    genome_build: hg19
#resources:
#  atropos:
#    options: ["-u 4", "-u -4"]
#  mirge:
#    options: ["-lib $PATH_TO_LIBS_FOLDER"]
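The commented-out atropos options `-u 4` and `-u -4` in the configuration above cut a fixed number of bases from the ends of each read. The sketch below illustrates that behaviour in plain Python, assuming atropos follows the cutadapt convention for `-u` (positive values cut from the 5' end, negative values from the 3' end). The function name `cut_ends` and the example read are hypothetical and are not part of bcbio or atropos.

```python
from typing import Iterable


def cut_ends(read: str, cuts: Iterable[int]) -> str:
    """Apply cutadapt-style -u values to a read.

    A positive n removes n bases from the 5' end; a negative n removes
    abs(n) bases from the 3' end. Zero leaves the read unchanged.
    """
    for n in cuts:
        if n > 0:
            read = read[n:]
        elif n < 0:
            read = read[:n]
    return read


# Hypothetical read: 4 random bases, the small RNA insert, 4 random bases
# (the 3' adapter is assumed to have been removed already).
insert = "TGAGGTAGTAGGTTGTATAGTT"
read = "ACGT" + insert + "TTGA"
print(cut_ends(read, [4, -4]))
```

Only the insert remains after both cuts, which is the situation the NextFlex-style options are meant to handle.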
Adapter trimming: atropos
Sequence alignment: STAR
Specific small RNAs quantification (miRNA/tRNAs…):
seqbuster for miRNA annotation
MINTmap for tRNA fragments annotation
miRge2 for alternative small RNA quantification. To set up this tool, you need to install miRge2.0 manually and download the library data for your species. See the miRge2.0 documentation for how to install it and download the data. If you have a human folder at /mnt/data/human, the path to pass to resources is /mnt/data. Then set up resources:
resources:
  mirge:
    options: ["-lib $PATH_TO_PARENT_SPECIES_LIB"]
Quality control: FastQC
Other small RNAs quantification:
mirDeep2 for miRNA prediction
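For the miRge2 step above, the value passed to `-lib` is the parent of the species folder, not the species folder itself. A minimal sketch of that rule, using only the Python standard library; the helper name `mirge_lib_option` is hypothetical and not part of bcbio or miRge2:

```python
from pathlib import PurePosixPath


def mirge_lib_option(species_dir: str) -> str:
    """Build the -lib option from the path of a species library folder.

    The option points at the parent directory, e.g. /mnt/data for a
    species folder at /mnt/data/human.
    """
    parent = PurePosixPath(species_dir).parent
    return f"-lib {parent}"


print(mirge_lib_option("/mnt/data/human"))  # -lib /mnt/data
```

The resulting string is what would go into the `options` list under `resources: mirge:`.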
The pipeline generates an Rmd template file inside the report folder that can be rendered with knitr. An example of the report is here. The count table (counts_mirna.tsv) for miRBase miRNAs will be inside the mirbase or final project folder. Input files for the isomiRs package, used for isomiR analysis, will be inside each sample's mirbase folder. If mirdeep2 can run, the count table (counts_mirna_novel.tsv) for novel miRNAs will be inside the mirdeep2 or final project folder. tdrmapper results will be inside each sample, in the tdrmapper or final project folder.
Parameters¶
adapters
The 3' end adapter that needs to be removed. For the NextFlex protocol you can add adapters: ["4N", "$3PRIME_ADAPTER"]. For any other options you can use resources:atropos:options:["-u 4", "-u -4"].
species
Three-letter code that identifies the species in the miRBase classification (e.g. hsa for human).
aligner
Currently STAR is the only one tested, although bowtie can be used as well.
expression_caller
A list of expression callers to turn on: trna, seqcluster, mirdeep2, mirge.
transcriptome_gtf
An optional GTF file of the transcriptome to use for seqcluster.
spikein_fasta
A FASTA file of spike-in sequences to quantitate.
umi_type: 'qiagen_smallRNA_umi'
Support for the Qiagen UMI small RNA-seq protocol.
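The parameter descriptions above imply a few simple checks that can be run on an `algorithm` section before launching a job. The following is a rough sketch of such a check, not bcbio's own validation: the function `check_algorithm` is hypothetical, the configuration is represented as a plain Python dict rather than parsed from YAML, and the rules are taken only from the descriptions in this section.

```python
import re

# Caller names as listed in the expression_caller description.
KNOWN_CALLERS = {"trna", "seqcluster", "mirdeep2", "mirge"}


def check_algorithm(algorithm: dict) -> list:
    """Return human-readable notes about an algorithm section (empty if none)."""
    notes = []

    # species: described as a three-letter miRBase code such as "hsa".
    species = str(algorithm.get("species", ""))
    if not re.fullmatch(r"[a-z]{3}", species):
        notes.append(f"species should be a 3-letter miRBase code, got {species!r}")

    # expression_caller: only the documented caller names are accepted here.
    callers = algorithm.get("expression_caller", [])
    unknown = [c for c in callers if c not in KNOWN_CALLERS]
    if unknown:
        notes.append(f"unknown expression_caller values: {unknown}")

    # aligner: STAR is described as the only tested aligner.
    aligner = algorithm.get("aligner")
    if aligner is not None and aligner != "star":
        notes.append(f"aligner {aligner!r} is untested for smallRNA-seq")

    return notes


example = {
    "aligner": "star",
    "adapters": ["TGGAATTCTCGGGTGC"],
    "expression_caller": ["trna", "seqcluster", "mirdeep2"],
    "species": "hsa",
}
print(check_algorithm(example))  # []
```

The aligner note is a warning rather than an error, since the text says other aligners such as bowtie can be used.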
Output¶
Project directory:
counts_mirna.tsv – miRBase miRNA count matrix.
counts.tsv – miRBase isomiRs count matrix. The ID is made of 5 tags: miRNA name, SNPs, additions, trimming at 5' and trimming at 3'. A detailed explanation of the naming is given here.
counts_mirna_novel.tsv – miRDeep2 miRNA count matrix.
counts_novel.tsv – miRDeep2 isomiRs count matrix. See the counts.tsv explanation for more detail.
seqcluster – output of the seqcluster tool. Inside this folder, counts.tsv has the count matrix for all clusters found over the genome.
seqclusterViz – input file for the interactive browser at https://github.com/lpantano/seqclusterViz
report – Rmd template to help with downstream analysis like QC metrics, differential expression, and clustering.
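As a quick sanity check on a count matrix such as counts_mirna.tsv, per-sample totals can be computed with the standard library alone. This sketch assumes a tab-separated file whose first column holds the miRNA name and whose remaining columns are samples; that layout, the column names, and the example values are assumptions for illustration and should be checked against the actual file.

```python
import csv
import io


def library_sizes(handle) -> dict:
    """Sum the counts in each sample column of a tab-separated count matrix.

    `handle` is an open text file or any iterable of lines. The first
    column is assumed to hold the feature name and is skipped.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    totals = {sample: 0.0 for sample in samples}
    for row in reader:
        for sample, value in zip(samples, row[1:]):
            totals[sample] += float(value)
    return totals


# Hypothetical two-sample matrix standing in for counts_mirna.tsv.
example = (
    "mirna\tsample1\tsample2\n"
    "hsa-let-7a-5p\t10\t5\n"
    "hsa-miR-21-5p\t3\t7\n"
)
print(library_sizes(io.StringIO(example)))  # {'sample1': 13.0, 'sample2': 12.0}
```

With a real run, the same function could be called on an open file handle for counts_mirna.tsv in the project directory.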