2.6 Prefiltering

Before proceeding with differential expression analysis, it is useful to filter out genes with very low expression. Such genes carry little information about differential expression, so removing them reduces the number of tests performed and increases the statistical power of the analysis while keeping the genes of interest. A common approach is to keep only genes with more than 1 count per million reads (cpm) in at least half of the samples, and to filter out the rest.
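As a reminder of what the threshold means, cpm can be written as below. This is a sketch that assumes the library size is simply the column total of the raw count matrix, with no normalization factors applied; the symbols y and N are introduced here for illustration only.

$$
\mathrm{cpm}_{gj} = 10^{6} \times \frac{y_{gj}}{N_j}, \qquad N_j = \sum_{g} y_{gj}
$$

Here y_{gj} is the raw count of gene g in sample j and N_j is the total number of counts in sample j. For example, in a library of 20 million reads, a gene needs more than 20 reads to exceed 1 cpm.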

The edgeR package provides the cpm function, which computes these values directly from the raw count matrix.

library(edgeR)
# Keep genes with more than 1 cpm in at least 4 of the 8 samples
isexpr <- rowSums(cpm(raw.counts) > 1) >= 4
table(isexpr)
## isexpr
## FALSE  TRUE 
## 10652 15050
filtered.raw.counts <- raw.counts[isexpr,]
dim(filtered.raw.counts)
## [1] 15050     8

This means that 10652 genes (nrow(raw.counts) - nrow(filtered.raw.counts)) are removed, leaving 15050 genes for the downstream analysis.
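
To make the filter less opaque, the same logic can be sketched in base R on a small toy matrix. This is only an illustration under stated assumptions: the toy.counts values are made up, the cpm values are computed by hand from the column totals rather than with edgeR, and the minimum number of samples is derived from the number of columns instead of being hard-coded.

```r
# Toy count matrix: 4 genes x 4 samples (hypothetical values, for illustration only)
toy.counts <- matrix(
  c(      0,       1,       0,       1,
    1000000,  900000, 1200000, 1100000,
          5,       0,       0,       0,
     999995, 1099999,  800000,  899999),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("sample", 1:4))
)

# Library size of each sample = total counts in its column
lib.sizes <- colSums(toy.counts)

# Counts per million: divide each column by its library size, then scale by 1e6
toy.cpm <- sweep(toy.counts, 2, lib.sizes, "/") * 1e6

# Require more than 1 cpm in at least half of the samples
min.samples <- ceiling(ncol(toy.counts) / 2)
toy.isexpr <- rowSums(toy.cpm > 1) >= min.samples

# drop = FALSE keeps the result a matrix even if a single gene survives
toy.filtered <- toy.counts[toy.isexpr, , drop = FALSE]
nrow(toy.filtered)
## [1] 2
```

In this toy example gene1 never exceeds 1 cpm and gene3 does so in only one sample, so both are removed, while gene2 and gene4 are kept. Deriving min.samples from ncol() avoids having to update the threshold by hand when the number of samples changes.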