Differentially expressed gene analysis of RNA-seq data using R

2.7 Normalisation

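Because we want to compare expression between samples, we need to normalise the dataset. Otherwise, differences in sequencing depth (library size) and RNA composition between samples would show up as apparent differences in expression.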

2.7.1 Defining the model matrix

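limma needs a design matrix for the DE analysis. We build it with the model.matrix() function and R's formula notation, as for any linear model in R. The formula ~0 + tissue omits the intercept, so the matrix has one indicator column per tissue (brain and liver). Each coefficient of the fitted model is then the mean expression of one group.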

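# ~0 drops the intercept: one indicator column per tissue level
design <- model.matrix(~0 + expr.design$tissue, data=expr.design)
colnames(design) <- levels(expr.design$tissue)   # rename columns to "brain", "liver"
design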

##           brain liver
## ERR420386     1     0
## ERR420387     1     0
## ERR420388     0     1
## ERR420389     0     1
## ERR420390     0     1
## ERR420391     1     0
## ERR420392     1     0
## ERR420393     0     1
## attr(,"assign")
## [1] 1 1
## attr(,"contrasts")
## attr(,"contrasts")$`expr.design$tissue`
## [1] "contr.treatment"
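The following toy example contrasts the two parameterisations on a hypothetical four-sample `tissue` factor. The factor and the object names `design.means` and `design.diff` exist only for this illustration. With `~0 + tissue`, each column marks one group. With `~tissue`, the first column is an intercept (the baseline level, brain) and the second column encodes the liver-versus-brain difference.

```r
# Hypothetical four-sample factor, for illustration only
tissue <- factor(c("brain", "brain", "liver", "liver"))

# Group-means parameterisation: no intercept, one indicator column per tissue
design.means <- model.matrix(~0 + tissue)
colnames(design.means) <- levels(tissue)

# Treatment parameterisation: intercept (brain) plus a liver-vs-brain column
design.diff <- model.matrix(~tissue)

print(design.means)
print(design.diff)
```

With the group-means design, you would typically express comparisons afterwards as contrasts, for example with limma's `makeContrasts(liver - brain, levels = design)`.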

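Now we can normalise the dataset with the commands below. DGEList() and calcNormFactors() come from the edgeR package. calcNormFactors() computes one normalisation factor per sample, using the TMM (trimmed mean of M-values) method by default. Each factor rescales that sample's raw library size to an effective library size (lib.size × norm.factors).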
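The sketch below illustrates the TMM idea. It is a simplified, unweighted version written for illustration, not edgeR's implementation: edgeR additionally weights the genes and rescales the factors across samples. The function name `tmm_factor_sketch` and the toy counts are hypothetical. For each gene, it computes the log-ratio M between a sample and a reference sample, and the average log-expression A. It then trims the extreme M and A values and returns 2 to the power of the mean of the remaining M values.

```r
# Simplified, unweighted TMM factor for sample `obs` against sample `ref`.
# Hypothetical helper for illustration; edgeR also weights genes and rescales factors.
tmm_factor_sketch <- function(obs, ref, logratio_trim = 0.3, sum_trim = 0.05) {
  keep  <- obs > 0 & ref > 0            # zero counts would give infinite log-ratios
  p_obs <- obs[keep] / sum(obs)         # expression proportions in each sample
  p_ref <- ref[keep] / sum(ref)
  M <- log2(p_obs / p_ref)              # gene-wise log-ratios
  A <- 0.5 * log2(p_obs * p_ref)        # gene-wise average log-proportions
  n <- length(M)
  # Keep genes whose M and A ranks fall inside the untrimmed middle ranges
  lo_M <- floor(n * logratio_trim) + 1; hi_M <- n + 1 - lo_M
  lo_A <- floor(n * sum_trim) + 1;      hi_A <- n + 1 - lo_A
  use <- rank(M) >= lo_M & rank(M) <= hi_M & rank(A) >= lo_A & rank(A) <= hi_A
  2^mean(M[use])                        # scaling factor for `obs`
}

# Toy data: identical samples, except that one gene dominates `obs`
ref <- rep(100, 20)
obs <- ref
obs[1] <- 10000
f <- tmm_factor_sketch(obs, ref)
f * sum(obs)   # effective library size of `obs`, close to 2000 like `ref`
```

In this toy example, the raw library size of `obs` (11900) is inflated by a single gene. After trimming, the factor scales it back to the size of `ref`. This composition effect is the one that scaling normalisation is meant to remove.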

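The limma package (since version 3.16.0) provides the voom() function. voom() converts the counts to log2 counts per million (log-CPM) using the effective library sizes. It then estimates the mean-variance relationship to assign a precision weight to every observation. These weights are used later in the DE analysis, when lmFit() fits the linear model and eBayes() computes the moderated t-statistics of differential expression.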

voom() returns an EList object, and names(y) lists its components. These include the normalised expression values on the log2 scale (y$E) and the precision weights (y$weights).
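Here is a minimal sketch of the log-CPM transformation that produces y$E, run on a hypothetical toy count matrix. It follows the offsets voom uses (0.5 added to each count, 1 added to each library size) and covers only the transformation. It does not estimate the precision weights. The helper name `log_cpm_sketch` is hypothetical.

```r
# Log-CPM as computed before voom's weight estimation (hypothetical helper).
# norm.factors scale the raw library sizes to effective library sizes.
log_cpm_sketch <- function(counts, norm.factors = rep(1, ncol(counts))) {
  lib.size <- colSums(counts) * norm.factors
  # t() puts samples in rows so each row is divided by its own library size
  t(log2(t(counts + 0.5) / (lib.size + 1) * 1e6))
}

# Toy counts: 3 genes x 2 samples (library sizes 100 and 200)
counts <- matrix(c(10, 0, 90, 20, 5, 175), nrow = 3,
                 dimnames = list(paste0("gene", 1:3), c("s1", "s2")))
E  <- log_cpm_sketch(counts)
E2 <- log_cpm_sketch(counts, norm.factors = c(1, 0.5))  # s2 effective size 100
```

The 0.5 offset keeps zero counts finite on the log2 scale. A smaller normalisation factor increases a sample's log-CPM values, because its effective library size shrinks.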

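library(edgeR)   # DGEList(), calcNormFactors()
library(limma)   # voom()

dge <- DGEList(filtered.raw.counts)   # container for the filtered raw counts
dge <- calcNormFactors(dge)           # TMM normalisation factors
y <- voom(dge, design)                # log-CPM values plus precision weights
norm.expr <- y$E                      # normalised log2 expression matrix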

write.table(norm.expr, file=file.path(RESULTS_DIR, "normalised_counts.txt"),
            row.names=TRUE, quote=FALSE, sep="\t")
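With row.names=TRUE, write.table writes the header without a field for the row names, so the header line has one fewer field than the data lines. read.table treats that case as a signal to use the first column as row names, so the file can be read back directly. The sketch below demonstrates this with a hypothetical toy matrix in a temporary file rather than the real results directory.

```r
# Hypothetical toy matrix standing in for norm.expr
norm.expr.demo <- matrix(c(1.5, 2.25, 3, 4.75), nrow = 2,
                         dimnames = list(c("geneA", "geneB"), c("s1", "s2")))
out.file <- tempfile(fileext = ".txt")
write.table(norm.expr.demo, file = out.file,
            row.names = TRUE, quote = FALSE, sep = "\t")

# Header has one fewer field, so the first column becomes the row names
reloaded <- as.matrix(read.table(out.file, header = TRUE, sep = "\t"))
```

Other tools may misalign the header of such a file. Writing with `col.names = NA` adds an empty first header field that keeps the columns aligned.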

boxplot(norm.expr, 
        col=group.colours,
        main="Distribution of normalised counts",
        xlab="",
        ylab="log2 normalised expression",
        las=2,cex.axis=0.8)

Challenge 1. Add a legend to the plot above (hint: see the code for the previous boxplot).

Can you place the boxplots side by side to compare the data before and after normalisation? (hint: par(mfrow=c(X,X)))


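2.7.2 MA-plots

plotMD() draws a mean-difference (MD) plot, also known as an MA plot. For the chosen sample, it plots each gene's log-ratio between that sample and the average of the other samples (y-axis) against the gene's average log-expression (x-axis). The code below draws these plots for the first two samples, using both the DGEList object (dge) and the voom output (y). For well-normalised data, most genes should be centred on the horizontal line at zero.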

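# 2 x 2 grid: one row per sample, DGEList (dge) on the left, voom output (y) on the right
par(mfrow=c(2,2))
for (ix in 1:2) {
  plotMD(dge, ix)             # sample ix vs the average of the other samples
  abline(h=0, col='grey')     # reference line at zero log-ratio
  plotMD(y, ix)
  abline(h=0, col='grey')
}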
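To show what each point in these plots represents, here is a minimal sketch of the underlying M and A values for one column of a log-expression matrix. It compares the column with the row-wise average of the remaining columns. The helper name `md_values_sketch` and the toy matrix are hypothetical, and the sketch is not limma's plotting code.

```r
# M and A values for one sample vs the average of the others (hypothetical helper)
md_values_sketch <- function(logexpr, column = 1) {
  this   <- logexpr[, column]
  others <- rowMeans(logexpr[, -column, drop = FALSE])
  data.frame(A = (this + others) / 2,   # average log-expression (x-axis)
             M = this - others)         # log-ratio vs the other samples (y-axis)
}

# Toy log-expression matrix: 3 genes x 3 samples
logexpr <- matrix(c(1, 2, 3,
                    3, 2, 1,
                    5, 2, 3), nrow = 3)
md <- md_values_sketch(logexpr, column = 1)
```

Plotting `md$M` against `md$A` with `plot()` and adding `abline(h = 0)` gives a bare-bones version of the panels above.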