4.2 Reformat the file

We now have a data frame containing the following:

| Column number | Contents |
|---|---|
| Row names | Microarray probe ID |
| 1 | Gene names/descriptions |
| 2, 4, 6… (up to 54) | Intensity calls for 27 ALL patients |
| 56, 58, 60… (up to 76) | Intensity calls for 11 AML patients |
| 3, 5, 7… (up to 77) | Status calls for the intensity values |

To make the data easier to work with, we want to transform this data frame into a matrix containing just the intensity calls. We will also need to create an object to store the gene names associated with the probe identifiers; this could be a vector of gene names with probe IDs as item names, a two-column matrix (probe ID, gene name), or even a list, whichever you prefer.
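
As a rough illustration of the three storage options just mentioned, here is a minimal sketch on a made-up data frame. The object names (`toy`, `names.vec`, `names.mat`, `names.list`), the probe IDs and the gene descriptions are all hypothetical and are not part of the golub data.

```r
# Toy stand-in: probe IDs as row names, gene descriptions in column 1.
# All IDs and descriptions below are invented for illustration.
toy <- data.frame(
  description = c("gene A", "gene B", "gene C"),
  row.names = c("probe1_at", "probe2_at", "probe3_at"),
  stringsAsFactors = FALSE
)

# Option 1: character vector of gene names, with probe IDs as item names
names.vec <- setNames(toy[, 1], rownames(toy))

# Option 2: two-column character matrix (probe ID, gene name)
names.mat <- cbind(probe = rownames(toy), gene = toy[, 1])

# Option 3: list keyed by probe ID
names.list <- as.list(names.vec)

# Each structure supports lookup by probe ID
names.vec["probe2_at"]
names.mat[names.mat[, "probe"] == "probe2_at", "gene"]
names.list[["probe2_at"]]
```

The named vector is usually the most compact choice, since a lookup is a single indexing operation, but any of the three will work for this exercise.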

  1. Use matrix indexing (`[row, col]`) to create a matrix of just the intensity values from the `golub` data frame

Hint:

* use all the even-numbered columns
* as an example of a slice, `data[, c(2,4,6)]` will output all rows for columns 2, 4 and 6 of the matrix `data`
* find a function that will create a sequence of even numbers, rather than typing them manually
  2. Create a vector, matrix or other object to link gene names with probe identifiers
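
The two steps can be sketched on a small made-up data frame with the same layout (description in column 1, then alternating intensity and status-call columns). This is one possible approach under those assumptions, not the only solution; `toy`, `even.cols`, `toy.matrix` and `toy.names` are hypothetical names, and the values are invented.

```r
# Toy data frame mimicking the layout: column 1 = description,
# then intensity / status-call pairs for 3 hypothetical samples.
toy <- data.frame(
  description = c("gene A", "gene B"),
  s1 = c(-214, 88),  s1.call = c("A", "P"),
  s2 = c(-139, 283), s2.call = c("A", "P"),
  s3 = c(-76, 309),  s3.call = c("A", "P"),
  row.names = c("probe1_at", "probe2_at"),
  stringsAsFactors = FALSE
)

# seq() generates the even column indices (here 2, 4, 6)
even.cols <- seq(2, ncol(toy), by = 2)

# Step 1: keep all rows of the even-numbered columns, then convert to a matrix
toy.matrix <- as.matrix(toy[, even.cols])

# Step 2: named vector linking probe IDs to gene descriptions
toy.names <- setNames(toy$description, rownames(toy))

toy.matrix
toy.names
```

Because only numeric columns are selected, `as.matrix()` returns a numeric matrix and carries the probe IDs over as row names. On the real data frame the same pattern should apply, with the upper limit of `seq()` set to match the last intensity column.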

We should now have two objects: (i) the numerical data in one matrix and (ii) the gene name information in whatever format we chose to use. The next stage is to transform and normalise the data.

Below are the first 5 rows and 5 columns of `golub.matrix`:

##                 ALL  ALL  ALL  ALL  ALL
## AFFX-BioB-5_at -214 -139  -76 -135 -106
## AFFX-BioB-M_at -153  -73  -49 -114 -125
## AFFX-BioB-3_at  -58   -1 -307  265  -76
## AFFX-BioC-5_at   88  283  309   12  168
## AFFX-BioC-3_at -295 -264 -376 -419 -230

Below are the first 6 items of the gene information (`golub.names`):

##                         AFFX-BioB-5_at 
##  "AFFX-BioB-5_at (endogenous control)" 
##                         AFFX-BioB-M_at 
##  "AFFX-BioB-M_at (endogenous control)" 
##                         AFFX-BioB-3_at 
##  "AFFX-BioB-3_at (endogenous control)" 
##                         AFFX-BioC-5_at 
##  "AFFX-BioC-5_at (endogenous control)" 
##                         AFFX-BioC-3_at 
##  "AFFX-BioC-3_at (endogenous control)" 
##                        AFFX-BioDn-5_at 
## "AFFX-BioDn-5_at (endogenous control)"