2.4 Exploratory analysis

Rsubread provides the number of reads mapped to each gene, which can then be used to plot quality control (QC) figures and to carry out differential expression analysis.

QC figures of the mapped read counts can be used to look for potential outlier libraries and to confirm that the samples group as expected.

Before plotting QC figures, it is useful to load the experiment design. This allows the data to be labeled with the sample groups they belong to, or with any other parameter of interest.

The experiment design file for this study was downloaded from the ArrayExpress webpage and formatted as a tab-separated file for the purposes of this analysis. You can find it in the shared directory ../data/Data_Analysis_with_R/RNAseq/raw_data.

EXPMT_DESIGN_FILE <- file.path(RNASeq_DATA_DIR, 'experiment_design.txt')

expr.design <- read.table(EXPMT_DESIGN_FILE, header=TRUE, sep='\t')
rownames(expr.design) <- expr.design$SampleID

# reorder the design rows to match the column order of the counts object
expr.design <- expr.design[colnames(counts$counts),]

expr.design
##            SampleID    Source.Name     organism  sex age tissue
## ERR420386 ERR420386 brain_sample_1 Homo sapiens male  26  brain
## ERR420387 ERR420387 brain_sample_1 Homo sapiens male  26  brain
## ERR420388 ERR420388 liver_sample_1 Homo sapiens male  30  liver
## ERR420389 ERR420389 liver_sample_1 Homo sapiens male  30  liver
## ERR420390 ERR420390 liver_sample_1 Homo sapiens male  30  liver
## ERR420391 ERR420391 brain_sample_1 Homo sapiens male  26  brain
## ERR420392 ERR420392 brain_sample_1 Homo sapiens male  26  brain
## ERR420393 ERR420393 liver_sample_1 Homo sapiens male  30  liver
##           Extract.Name Material.Type Assay.Name technical.replicate.group
## ERR420386       GCCAAT           RNA     Assay4                   group_2
## ERR420387       ACAGTG           RNA     Assay2                   group_1
## ERR420388       GTGAAA           RNA     Assay7                   group_4
## ERR420389       GTGAAA           RNA     Assay8                   group_4
## ERR420390       CTTGTA           RNA     Assay6                   group_3
## ERR420391       ACAGTG           RNA     Assay1                   group_1
## ERR420392       GCCAAT           RNA     Assay3                   group_2
## ERR420393       CTTGTA           RNA     Assay5                   group_3
samples <- as.character(expr.design$SampleID)
group <- factor(expr.design$tissue)
group
## [1] brain brain liver liver liver brain brain liver
## Levels: brain liver
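Indexing a data frame by row names returns rows of NA for any name that is not found, so it can be worth checking that the design and the counts really line up after this step. The following is a minimal sketch of such a check, not part of the original analysis; `toy.design` and `toy.counts` are hypothetical stand-ins for the design table and the count matrix.

```r
# Toy stand-ins (hypothetical) for the design table and the count matrix
toy.design <- data.frame(
  SampleID = c("S3", "S1", "S2"),
  tissue   = c("liver", "brain", "brain"),
  stringsAsFactors = FALSE
)
rownames(toy.design) <- toy.design$SampleID

toy.counts <- matrix(1:6, nrow = 2,
                     dimnames = list(c("geneA", "geneB"), c("S1", "S2", "S3")))

# Every count column must have a matching row in the design
stopifnot(all(colnames(toy.counts) %in% rownames(toy.design)))

# Reorder the design rows to follow the count columns
toy.design <- toy.design[colnames(toy.counts), ]

# After reordering, the two objects should line up one-to-one
stopifnot(identical(rownames(toy.design), colnames(toy.counts)))
toy.design$tissue
```

If either `stopifnot()` call fails, a sample ID is missing or misspelled in one of the two objects and should be fixed before any plotting.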

The samples are not sorted by tissue type, which will make visualisation trickier in downstream analysis. We will therefore reorder the samples by tissue type.

sample.order <- order(expr.design$tissue)
sample.order
## [1] 1 2 6 7 3 4 5 8
expr.design <- expr.design[sample.order,]
raw.counts <- raw.counts[,sample.order]

expr.design
##            SampleID    Source.Name     organism  sex age tissue
## ERR420386 ERR420386 brain_sample_1 Homo sapiens male  26  brain
## ERR420387 ERR420387 brain_sample_1 Homo sapiens male  26  brain
## ERR420391 ERR420391 brain_sample_1 Homo sapiens male  26  brain
## ERR420392 ERR420392 brain_sample_1 Homo sapiens male  26  brain
## ERR420388 ERR420388 liver_sample_1 Homo sapiens male  30  liver
## ERR420389 ERR420389 liver_sample_1 Homo sapiens male  30  liver
## ERR420390 ERR420390 liver_sample_1 Homo sapiens male  30  liver
## ERR420393 ERR420393 liver_sample_1 Homo sapiens male  30  liver
##           Extract.Name Material.Type Assay.Name technical.replicate.group
## ERR420386       GCCAAT           RNA     Assay4                   group_2
## ERR420387       ACAGTG           RNA     Assay2                   group_1
## ERR420391       ACAGTG           RNA     Assay1                   group_1
## ERR420392       GCCAAT           RNA     Assay3                   group_2
## ERR420388       GTGAAA           RNA     Assay7                   group_4
## ERR420389       GTGAAA           RNA     Assay8                   group_4
## ERR420390       CTTGTA           RNA     Assay6                   group_3
## ERR420393       CTTGTA           RNA     Assay5                   group_3

This will make things much easier when we come to visualise our data later. Remember to reassign the groups:

group <- factor(expr.design$tissue)
group
## [1] brain brain brain brain liver liver liver liver
## Levels: brain liver
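The reordering logic can be sketched on toy data to show why the same permutation has to be applied to both the design rows and the count columns. This is an illustrative sketch only; `toy.design`, `toy.counts`, `toy.order` and `toy.group` are hypothetical names, and the tissue labels simply mimic the mixed order seen above.

```r
# Toy tissue labels in a mixed order (hypothetical objects)
toy.tissue <- c("brain", "brain", "liver", "liver", "liver", "brain", "brain", "liver")
toy.ids    <- paste0("S", 1:8)

toy.design <- data.frame(SampleID = toy.ids, tissue = toy.tissue,
                         row.names = toy.ids, stringsAsFactors = FALSE)
toy.counts <- matrix(seq_len(16), nrow = 2,
                     dimnames = list(c("geneA", "geneB"), toy.ids))

# order() returns the permutation that sorts the labels;
# tied labels keep their original relative order
toy.order <- order(toy.design$tissue)

# Apply the same permutation to the design rows and the count columns
toy.design <- toy.design[toy.order, ]
toy.counts <- toy.counts[, toy.order]

# Rebuild the grouping factor from the reordered design
toy.group <- factor(toy.design$tissue)

# The two objects must still describe the samples in the same order
stopifnot(identical(rownames(toy.design), colnames(toy.counts)))
toy.order
toy.group
```

Wrapping the tissue column in `factor()` keeps the grouping variable a factor regardless of whether `read.table()` imported the column as character or as factor, which depends on the R version and its `stringsAsFactors` default.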