2.3 Quantification
Rsubread provides a read summarization function featureCounts
, which takes two inputs:
- the aligned reads (BAM or SAM) and assigns them to
- genomic features (GTF annotation file)
This gives the number of reads mapped per feature, which can then be normalised and tested for differential expression.
Rsubread comes with in-built annotations for \(mm9\), \(mm10\) and \(hg19\) for users’ convenience, but you can also supply your own annotation file (GTF), see the tip below.
For experiments with lots of samples or on big genomes, this step can also take a while. We will only be performing the feature counts on the subset chr21
BAM file we created previously.
mini.counts <- featureCounts(output.bamFile,
annot.inbuilt="hg19",
isGTFAnnotationFile=FALSE,
isPairedEnd=TRUE)
## NCBI RefSeq annotation for hg19 (build 37.2) is used.
##
## ========== _____ _ _ ____ _____ ______ _____
## ===== / ____| | | | _ \| __ \| ____| /\ | __ \
## ===== | (___ | | | | |_) | |__) | |__ / \ | | | |
## ==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
## ==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
## ========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
## Rsubread 1.24.2
##
## //========================== featureCounts setting ===========================\\
## || ||
## || Input files : 1 BAM file ||
## || P results/RNAseq/mapping/ERR420386.bam ||
## || ||
## || Dir for temp files : . ||
## || Threads : 1 ||
## || Level : meta-feature level ||
## || Paired-end : yes ||
## || Strand specific : no ||
## || Multimapping reads : not counted ||
## || Multi-overlapping reads : not counted ||
## || Min overlapping bases : 1 ||
## || ||
## || Chimeric reads : counted ||
## || Both ends mapped : not required ||
## || ||
## \\===================== http://subread.sourceforge.net/ ======================//
##
## //================================= Running ==================================\\
## || ||
## || Load annotation file /usr/lib64/R/library/Rsubread/annot/hg19_RefSeq_e ... ||
## || Features : 225074 ||
## || Meta-features : 25702 ||
## || Chromosomes/contigs : 52 ||
## || ||
## || Process BAM file results/RNAseq/mapping/ERR420386.bam... ||
## || Paired-end reads are included. ||
## || Assign fragments (read pairs) to features... ||
## || Total fragments : 161297 ||
## || Successfully assigned fragments : 72440 (44.9%) ||
## || Running time : 0.01 minutes ||
## || ||
## || Read assignment finished. ||
## || ||
## \\===================== http://subread.sourceforge.net/ ======================//
Examine the attributes in the returned mini.counts
object:
summary(mini.counts)
## Length Class Mode
## counts 25702 -none- numeric
## annotation 6 data.frame list
## targets 1 -none- character
## stat 2 data.frame list
Find the dimension of the counts
table:
dim(mini.counts$counts)
## [1] 25702 1
mini.counts$counts[1:6,]
## 653635 100422834 645520 79501 729737 100507658
## 0 0 0 0 0 0
Look at the annotations, which corresponds to the rows of the counts
table:
head(mini.counts$annotation)
## GeneID Chr
## 1 653635 chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1
## 2 100422834 chr1
## 3 645520 chr1;chr1;chr1
## 4 79501 chr1
## 5 729737 chr1;chr1;chr1;chr1
## 6 100507658 chr1;chr1;chr1
## Start
## 1 14362;14970;15796;16607;16858;17233;17606;17915;18268;24738;29321
## 2 30366
## 3 34611;35277;35721
## 4 69091
## 5 136698;136953;139790;140075
## 6 319944;320881;321032
## End
## 1 14829;15038;15947;16765;17055;17368;17742;18061;18366;24891;29370
## 2 30503
## 3 35174;35481;36081
## 4 70008
## 5 136805;139696;139847;140566
## 6 320653;320938;321056
## Strand Length
## 1 -;-;-;-;-;-;-;-;-;-;- 1769
## 2 + 138
## 3 -;-;- 1130
## 4 + 918
## 5 -;-;-;- 3402
## 6 +;+;+ 793
featureCounts
also returns a very hand summary of the number of reads that were assigned or unassiged:
mini.counts$stat
## Status results.RNAseq.mapping.ERR420386.bam
## 1 Assigned 72440
## 2 Unassigned_Ambiguity 227
## 3 Unassigned_MultiMapping 0
## 4 Unassigned_NoFeatures 87851
## 5 Unassigned_Unmapped 779
## 6 Unassigned_MappingQuality 0
## 7 Unassigned_FragmentLength 0
## 8 Unassigned_Chimera 0
## 9 Unassigned_Secondary 0
## 10 Unassigned_Nonjunction 0
## 11 Unassigned_Duplicate 0
-
Run the above command to perform counting reads for the bam file and then take a look at the summary output.
-
Lookup the user guide for Rsubread and find the defintions for the status in
mini.counts$stat
table, specificallyUnassigned_Ambiguity
,Unassigned_NoFeatures
andUnassigned_Unmapped
.
If you want to get read counts using another annotation.GTF file, use the annot.ext
parameter. For example: counts <- featureCounts(output.bamFile, annot.ext=“annotation.GTF”, isGTFAnnotationFile=TRUE, isPairedEnd=TRUE)
For the purpose of this workshop the read summarisation step has already been performed for all libraries. You will need to load the corresponding Rdata
file to get these read counts. You can then print out these counts in a text file for future use.
To load the Rdata object file:
load("../data/RNAseq/quantification/rawCounts.Rdata")
summary(counts)
## Length Class Mode
## counts 205616 -none- numeric
## annotation 6 data.frame list
## targets 8 -none- character
## stat 9 data.frame list
Since we will be using it later for DE analysis, let’s create a raw.counts
object to hold only the count data:
raw.counts <- counts$counts
dim(raw.counts)
## [1] 25702 8
raw.counts[1:6,1:4]
## ERR420386 ERR420387 ERR420388 ERR420389
## 653635 47 151 93 112
## 100422834 1 0 0 0
## 645520 1 2 0 0
## 79501 0 0 0 0
## 729737 15 35 47 57
## 100507658 10 5 15 8
Challenge
-
What are the count statistics for the “Brain and Liver” study?
-
create a copy of the stats object and change the counts into proportion per sample.
-
The
stat
information fromfeatureCounts()
will be much easier to digest if we can plot the proportion (or number) of reads assigned and unassigned due to the different status type. Can you create the following plot?
Solution
All lines starting with ##
are comments.
Export out the counts table for every sample into a tab-separated file:
write.table(counts$counts, file=file.path(RESULTS_DIR,"raw_read_counts.txt"),
sep="\t", quote=F, append=F)