2.9 Differential expression analysis
2.9.1 Fitting the model
To fit the model, use the lmFit()
function, which takes in the normalised data object and the design matrix:
fit <- lmFit(y,design)
2.9.2 Comparisons
To have a look at the pairwise comparison(s) of interest, we may need to create a constrast matrix. This is usually always the case when there are multiple factors being compared. To do this, we use the makeConstrasts()
function:
cont.matrix <- makeContrasts(liver-brain, levels=design)
cont.matrix
## Contrasts
## Levels liver - brain
## brain -1
## liver 1
Refit the model using the comparisons defined:
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2)
options(digits = 3)
dim(fit2)
## [1] 15050 1
2.9.3 Extract top DE features
The topTable
function summarises the output from limma in a table format. Significant DE genes for a particular comparison can be identified by selecting genes with a p-value smaller than a chosen cut-off value and/or a fold change greater than a chosen value in this table. By default the table will be sorted by increasing adjusted p-value, showing the most significant DE genes at the top.
Set the threshold values:
THRES_PVAL <- 0.01
THRES_FC <- 3
colnames(fit$coefficients)
## [1] "brain" "liver"
Get the output table for the 10 most significant DE genes for this comparison: \(liver vs brain\).
comparison='liver - brain'
topTable(fit2, coef=comparison)
## logFC AveExpr t P.Value adj.P.Val B
## 3240 10.818058 7.853145 100.71027 5.603924e-14 7.444180e-10 19.18750
## 5265 10.152815 7.487355 93.96364 9.892598e-14 7.444180e-10 19.01571
## 213 11.221081 10.966981 71.22374 9.577714e-13 2.059208e-09 18.47777
## 2335 6.797037 8.578822 67.73067 1.445862e-12 2.439635e-09 18.46460
## 338 11.093595 8.521278 71.76248 9.004441e-13 2.059208e-09 18.29472
## 2243 11.101339 7.402822 79.82534 3.763856e-13 1.461517e-09 18.09575
## 2244 11.177598 7.179453 75.38332 6.016490e-13 1.810963e-09 17.71017
## 229 10.583832 6.610364 79.51871 3.884431e-13 1.461517e-09 17.65905
## 125 10.915521 7.395653 67.65636 1.458918e-12 2.439635e-09 17.57556
## 716 7.466281 6.769933 59.39239 4.238953e-12 4.253083e-09 17.54615
Get the full table (n
is the number of genes in the fit):
limma.result <- topTable(fit2, coef=comparison, n=nrow(fit2))
## Get significant DEGs only (adjusted p-value < THRES_PVAL)
limma.sigP.DEG <- topTable(fit2, coef=comparison, n=nrow(fit2), p.val=THRES_PVAL)
dim(limma.sigP.DEG)
## [1] 8190 6
head(limma.sigP.DEG,10)
## logFC AveExpr t P.Value adj.P.Val B
## 3240 10.818058 7.853145 100.71027 5.603924e-14 7.444180e-10 19.18750
## 5265 10.152815 7.487355 93.96364 9.892598e-14 7.444180e-10 19.01571
## 213 11.221081 10.966981 71.22374 9.577714e-13 2.059208e-09 18.47777
## 2335 6.797037 8.578822 67.73067 1.445862e-12 2.439635e-09 18.46460
## 338 11.093595 8.521278 71.76248 9.004441e-13 2.059208e-09 18.29472
## 2243 11.101339 7.402822 79.82534 3.763856e-13 1.461517e-09 18.09575
## 2244 11.177598 7.179453 75.38332 6.016490e-13 1.810963e-09 17.71017
## 229 10.583832 6.610364 79.51871 3.884431e-13 1.461517e-09 17.65905
## 125 10.915521 7.395653 67.65636 1.458918e-12 2.439635e-09 17.57556
## 716 7.466281 6.769933 59.39239 4.238953e-12 4.253083e-09 17.54615
Get significant DEG with low adjusted p-values and high fold change
limma.sigFC.DEG <- subset(limma.sigP.DEG, logFC > THRES_FC)
dim(limma.sigFC.DEG)
## [1] 1291 6
head(limma.sigFC.DEG,10)
## logFC AveExpr t P.Value adj.P.Val B
## 3240 10.818058 7.853145 100.71027 5.603924e-14 7.444180e-10 19.18750
## 5265 10.152815 7.487355 93.96364 9.892598e-14 7.444180e-10 19.01571
## 213 11.221081 10.966981 71.22374 9.577714e-13 2.059208e-09 18.47777
## 2335 6.797037 8.578822 67.73067 1.445862e-12 2.439635e-09 18.46460
## 338 11.093595 8.521278 71.76248 9.004441e-13 2.059208e-09 18.29472
## 2243 11.101339 7.402822 79.82534 3.763856e-13 1.461517e-09 18.09575
## 2244 11.177598 7.179453 75.38332 6.016490e-13 1.810963e-09 17.71017
## 229 10.583832 6.610364 79.51871 3.884431e-13 1.461517e-09 17.65905
## 125 10.915521 7.395653 67.65636 1.458918e-12 2.439635e-09 17.57556
## 716 7.466281 6.769933 59.39239 4.238953e-12 4.253083e-09 17.54615
Write the limma output table for significant genes to a tab-delimited file:
filename <- paste0("DEG_",comparison,
"_pval_",THRES_PVAL,
"_logFC_", THRES_FC, ".txt");
write.table(limma.sigFC.DEG, file=file.path(RESULTS_DIR,filename),
row.names=T, quote=F, sep="\t")
filename
## [1] "DEG_liver - brain_pval_0.01_logFC_3.txt"
Get the number of DE genes between technical group 1 and technical group 2 (all Brain samples) with adj pvalue<0.01
- Create a new design matrix for limma with the technical replicate groups
- Re-normalise the read counts with ‘voom’ function with new design matrix
- Fit a linear model on these normalised data
- Make the contrast matrix corresponding to the new set of parameters
- Fit the contrast matrix to the linear model
- Compute moderated t-statistics of differential expression
- Get the output table for the 10 most significant DE genes for this comparison
- Get the number of genes significantly DE (adjusted p-value < 0.01)