2.6 Prefiltering
Before proceeding with differential expression analysis, it is useful to filter out genes with very low expression. This helps increase the statistical power of the analysis while retaining the genes of interest. A common approach is to keep only genes with more than 1 count per million reads (cpm) in at least half of the samples, and to discard the rest.
The edgeR library provides the cpm function, which can be used here.
library(edgeR)
# keep genes with cpm > 1 in at least 4 of the 8 samples
isexpr <- rowSums(cpm(raw.counts) > 1) >= 4
table(isexpr)
## isexpr
## FALSE TRUE
## 10652 15050
filtered.raw.counts <- raw.counts[isexpr,]
dim(filtered.raw.counts)
## [1] 15050 8
This means that 10652 of the 25702 genes are removed (nrow(raw.counts) - nrow(filtered.raw.counts)).
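To make the filtering rule more concrete, here is a minimal sketch that applies the same rule in base R to a small made-up count matrix. The matrix toy.counts, its gene and sample names, and the helper variable min.samples are illustrative assumptions and not part of the original data. The CPM is computed by hand as counts divided by the library size (column total) times one million, which should correspond to what cpm returns for a plain count matrix with default arguments.

```r
# Hypothetical toy data: 5 genes (rows) x 4 samples (columns).
# The last gene carries most of the reads so that every library
# size is exactly 2 million reads.
toy.counts <- matrix(
  c(      0,       1,       0,       2,
        500,     450,     620,     580,
          3,       0,       4,       0,
          1,       1,       1,       1,
    1999496, 1999548, 1999375, 1999417),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("gene", LETTERS[1:5]), paste0("S", 1:4))
)

# Library size = total number of reads per sample
lib.sizes <- colSums(toy.counts)

# Counts per million: divide each column by its library size, scale by 1e6
toy.cpm <- t(t(toy.counts) / lib.sizes) * 1e6

# Derive the sample threshold from the data instead of hard-coding it
min.samples <- ceiling(ncol(toy.counts) / 2)

# Keep genes with cpm > 1 in at least half of the samples
toy.isexpr <- rowSums(toy.cpm > 1) >= min.samples
toy.filtered <- toy.counts[toy.isexpr, , drop = FALSE]

# Number of genes removed by the filter
n.removed <- nrow(toy.counts) - nrow(toy.filtered)
```

In this toy example geneB, geneC and geneE are kept, while geneA and geneD are removed. Deriving min.samples from ncol avoids having to update a hard-coded value such as 4 when the number of samples changes.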