6.1 Combining two files with identical row identifiers

The first situation arises when a measurement platform produces one output file per sample, and these files then need to be merged so that the results for all samples sit in a single matrix or dataframe for analysis. In the simplest case, the files have the same length and the same row order, so they can simply be joined as columns of one dataset. You can do this using the cbind() function:

# Create several vectors of equal length. Each represents the data from one sample
vec.one <- 1:10
vec.two <- 2:11
vec.three <- c(rep(1, 3), rep(4, 3), rep(8, 4))

# Then join them together to make a matrix with one column per sample vector
joined.data <- cbind(vec.one, vec.two, vec.three)
joined.data
##       vec.one vec.two vec.three
##  [1,]       1       2         1
##  [2,]       2       3         1
##  [3,]       3       4         1
##  [4,]       4       5         4
##  [5,]       5       6         4
##  [6,]       6       7         4
##  [7,]       7       8         8
##  [8,]       8       9         8
##  [9,]       9      10         8
## [10,]      10      11         8

Depending on the needs of your analysis, you can alternatively use the rbind() function to join them as rows.
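As a minimal sketch of the row-wise alternative, the same three vectors can be passed to rbind(). The object name joined.rows is introduced here purely for illustration.

```r
# Recreate the three sample vectors so this example runs on its own
vec.one <- 1:10
vec.two <- 2:11
vec.three <- c(rep(1, 3), rep(4, 3), rep(8, 4))

# Join them as rows: one row per sample vector, one column per position
joined.rows <- rbind(vec.one, vec.two, vec.three)

# Inspect the shape and the row labels of the result
dim(joined.rows)
rownames(joined.rows)
```

The result has 3 rows and 10 columns, and each row is labelled with the name of the vector it came from, mirroring the column labels that cbind() produced above.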

The cbind() function also joins matrices and dataframes, not just vectors, as long as they all have the same number of rows. Try this using some pre-prepared subsets of the Golub data.

  1. Load the three datasets stored in the file golub_cbind.RData using the load() function.

    • This file contains three objects:
      • golub.names - a vector of the gene names
      • golub.all - a dataframe of the ALL sample data, and
      • golub.aml - a dataframe of the AML sample data
  2. Use cbind() to create a duplicate of the Golub dataframe from these three objects.
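
The contents of golub_cbind.RData are not reproduced here, so the following is only a sketch of the same pattern on small made-up stand-ins. The objects toy.names, toy.all, toy.aml and toy.joined, and all values in them, are hypothetical and are not the Golub data. For the real exercise, load("golub_cbind.RData") would create the three golub objects listed above, which could then be joined in the same way.

```r
# Hypothetical stand-ins for golub.names, golub.all and golub.aml
toy.names <- c("geneA", "geneB", "geneC")
toy.all <- data.frame(ALL.1 = c(0.5, 1.2, -0.3),
                      ALL.2 = c(0.7, 1.0, -0.1))
toy.aml <- data.frame(AML.1 = c(-0.4, 0.9, 0.2))

# Check that every piece has the same number of rows before joining
stopifnot(length(toy.names) == nrow(toy.all),
          nrow(toy.all) == nrow(toy.aml))

# Join the name vector and both sample blocks column-wise;
# the name vector is given an explicit column name, gene
toy.joined <- cbind(gene = toy.names, toy.all, toy.aml)

dim(toy.joined)
names(toy.joined)
```

Because two of the inputs are dataframes, the result is a dataframe too, here with 3 rows and 4 columns. Note that cbind() matches rows purely by position, so this only gives a meaningful result when the row order is identical across all inputs, as assumed throughout this section.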