2.4 Exploratory analysis
Rsubread provides the number of reads mapped to each gene, which can then be used both for plotting quality control (QC) figures and for differential expression analysis.
QC figures of the mapped read counts help to identify potential outlier libraries and to confirm that the samples group as expected.
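One simple QC check is to compare library sizes, meaning the total number of assigned reads per sample. A library with far fewer reads than the others is a candidate outlier. The sketch below assumes a genes × samples count matrix shaped like `counts$counts`, which is used below. The toy values and the half-of-median cut-off are hypothetical and chosen only for illustration.

```r
# Hypothetical toy count matrix: 4 genes (rows) x 4 samples (columns),
# filled column by column
toy.counts <- matrix(c(100, 200, 50, 10,
                       120, 180, 60, 12,
                        10,  20,  5,  1,
                       110, 210, 55,  9),
                     nrow = 4,
                     dimnames = list(paste0("gene", 1:4),
                                     paste0("S", 1:4)))

# Library size = total number of assigned reads per sample
lib.sizes <- colSums(toy.counts)
lib.sizes

# Flag libraries smaller than half the median library size
# (an arbitrary, illustrative cut-off, not a standard threshold)
low.libs <- names(lib.sizes)[lib.sizes < 0.5 * median(lib.sizes)]
low.libs

# Bar plot of library sizes, one bar per sample
barplot(lib.sizes, las = 2, ylab = "Assigned reads")
```

In a real analysis, the same `colSums()` call would be applied to the full count matrix. Where to draw the outlier line is a judgement call.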
Before plotting QC figures, it is useful to load the experimental design. This lets us label the data with the sample group each library belongs to, or with any other parameter of interest.
The experimental design file for this study was downloaded from the ArrayExpress website and formatted as a tab-separated file for the purposes of this analysis. You can find it in the shared directory ../data/Data_Analysis_with_R/RNAseq/raw_data.
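EXPMT_DESIGN_FILE <- file.path(RNASeq_DATA_DIR, 'experiment_design.txt')
# read the tab-separated design table; the first row holds the column names
expr.design <- read.table(EXPMT_DESIGN_FILE, header = TRUE, sep = '\t')
# use the sample IDs as row names so that rows can be looked up by sample
rownames(expr.design) <- expr.design$SampleID
# order the design in the same ordering as the counts object
expr.design <- expr.design[colnames(counts$counts),]
expr.design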
## SampleID Source.Name organism sex age tissue
## ERR420386 ERR420386 brain_sample_1 Homo sapiens male 26 brain
## ERR420387 ERR420387 brain_sample_1 Homo sapiens male 26 brain
## ERR420388 ERR420388 liver_sample_1 Homo sapiens male 30 liver
## ERR420389 ERR420389 liver_sample_1 Homo sapiens male 30 liver
## ERR420390 ERR420390 liver_sample_1 Homo sapiens male 30 liver
## ERR420391 ERR420391 brain_sample_1 Homo sapiens male 26 brain
## ERR420392 ERR420392 brain_sample_1 Homo sapiens male 26 brain
## ERR420393 ERR420393 liver_sample_1 Homo sapiens male 30 liver
## Extract.Name Material.Type Assay.Name technical.replicate.group
## ERR420386 GCCAAT RNA Assay4 group_2
## ERR420387 ACAGTG RNA Assay2 group_1
## ERR420388 GTGAAA RNA Assay7 group_4
## ERR420389 GTGAAA RNA Assay8 group_4
## ERR420390 CTTGTA RNA Assay6 group_3
## ERR420391 ACAGTG RNA Assay1 group_1
## ERR420392 GCCAAT RNA Assay3 group_2
## ERR420393 CTTGTA RNA Assay5 group_3
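The alignment step above works by indexing the rows of the design with the column names of the count matrix. The sketch below reproduces that pattern on a small, self-contained example:

* It writes a hypothetical tab-separated design file to a temporary location.
* It reads the file back the same way.
* It aligns the design to a hypothetical count matrix whose columns are in a different order.

All `toy.*` and `design.file` names are made up for illustration.

```r
# Write a hypothetical tab-separated design file to a temporary location
design.file <- tempfile(fileext = ".txt")
writeLines(c("SampleID\ttissue",
             "S2\tliver",
             "S1\tbrain",
             "S3\tbrain"), design.file)

# Read it back as the tutorial does: header row, tab separator
toy.design <- read.table(design.file, header = TRUE, sep = "\t",
                         stringsAsFactors = FALSE)
rownames(toy.design) <- toy.design$SampleID

# Hypothetical count matrix whose columns are ordered S1, S2, S3
toy.counts <- matrix(1:6, nrow = 2,
                     dimnames = list(c("geneA", "geneB"),
                                     c("S1", "S2", "S3")))

# Index the design rows by the count column names to match their order
toy.design <- toy.design[colnames(toy.counts), ]

# Sanity check: design rows and count columns now line up one-to-one
stopifnot(identical(rownames(toy.design), colnames(toy.counts)))
toy.design$tissue
```

Suppose a sample in the count matrix is missing from the design file. The row lookup then returns an all-`NA` row instead of raising an error, which is why the `stopifnot()` check is worth keeping.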
samples <- as.character(expr.design$SampleID)
group <- factor(expr.design$tissue)
group
## [1] brain brain liver liver liver brain brain liver
## Levels: brain liver
The samples are not sorted by tissue type, which will make visualisation trickier in the downstream analysis. We will therefore reorder the samples by tissue type.
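# permutation that sorts the samples by tissue (brain first, then liver)
sample.order <- order(expr.design$tissue)
sample.order
## [1] 1 2 6 7 3 4 5 8
# apply the same permutation to the design rows and the count columns
expr.design <- expr.design[sample.order,]
raw.counts <- raw.counts[,sample.order]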
expr.design
## SampleID Source.Name organism sex age tissue
## ERR420386 ERR420386 brain_sample_1 Homo sapiens male 26 brain
## ERR420387 ERR420387 brain_sample_1 Homo sapiens male 26 brain
## ERR420391 ERR420391 brain_sample_1 Homo sapiens male 26 brain
## ERR420392 ERR420392 brain_sample_1 Homo sapiens male 26 brain
## ERR420388 ERR420388 liver_sample_1 Homo sapiens male 30 liver
## ERR420389 ERR420389 liver_sample_1 Homo sapiens male 30 liver
## ERR420390 ERR420390 liver_sample_1 Homo sapiens male 30 liver
## ERR420393 ERR420393 liver_sample_1 Homo sapiens male 30 liver
## Extract.Name Material.Type Assay.Name technical.replicate.group
## ERR420386 GCCAAT RNA Assay4 group_2
## ERR420387 ACAGTG RNA Assay2 group_1
## ERR420391 ACAGTG RNA Assay1 group_1
## ERR420392 GCCAAT RNA Assay3 group_2
## ERR420388 GTGAAA RNA Assay7 group_4
## ERR420389 GTGAAA RNA Assay8 group_4
## ERR420390 CTTGTA RNA Assay6 group_3
## ERR420393 CTTGTA RNA Assay5 group_3
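In the reordering code, a single permutation returned by `order()` is applied to both the design rows and the count columns, so the two objects stay aligned. The sketch below checks this on hypothetical toy data. The names `toy.design`, `toy.counts` and `toy.order` are illustrative and chosen so they do not overwrite the tutorial's own objects. The sketch also shows that `order()` keeps tied samples in their original relative order. This is why the brain samples above stay in the sequence 1, 2, 6, 7.

```r
# Hypothetical toy data: design rows and count columns already aligned
toy.design <- data.frame(SampleID = c("S1", "S2", "S3", "S4"),
                         tissue   = c("brain", "liver", "brain", "liver"),
                         stringsAsFactors = FALSE)
rownames(toy.design) <- toy.design$SampleID
toy.counts <- matrix(1:8, nrow = 2,
                     dimnames = list(c("geneA", "geneB"), toy.design$SampleID))

# order() returns the permutation that sorts by tissue;
# tied samples (same tissue) keep their original relative order
toy.order <- order(toy.design$tissue)
toy.order   # 1 3 2 4

# Apply the SAME permutation to the design rows and the count columns
toy.design <- toy.design[toy.order, ]
toy.counts <- toy.counts[, toy.order]

# Both objects are still aligned after reordering
stopifnot(identical(rownames(toy.design), colnames(toy.counts)))
colnames(toy.counts)
```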
This ordering will make the data much easier to visualise later. Remember to rebuild the group factor so that it follows the new sample order:
group <- factor(expr.design$tissue)
group
## [1] brain brain brain brain liver liver liver liver
## Levels: brain liver
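Reassigning the groups matters because a grouping vector built before reordering keeps the old sample order. The hypothetical sketch below makes the mismatch visible and then rebuilds the factor from the reordered design. Wrapping the column in `factor()` guarantees a factor with levels. This matters because `read.table()` returns character columns by default since R 4.0.0.

```r
# Hypothetical toy design, tissue stored as character
toy.design <- data.frame(SampleID = c("S1", "S2", "S3", "S4"),
                         tissue   = c("brain", "liver", "brain", "liver"),
                         stringsAsFactors = FALSE)

# Group factor built BEFORE reordering
toy.group.old <- factor(toy.design$tissue)

# Reorder the design by tissue
toy.design <- toy.design[order(toy.design$tissue), ]

# The old factor no longer matches the new row order...
any(as.character(toy.group.old) != toy.design$tissue)   # TRUE

# ...so rebuild it from the reordered design
toy.group <- factor(toy.design$tissue)
toy.group
table(toy.group)
```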