4.3 Transform data
Our data is now in a structured format suitable for further filtering and processing. The first step is to apply a floor and a ceiling to the values in the dataset. Microarrays, like many other measurement platforms in biology, may give inaccurate readings at the high and low ends of their operating range. In this case, any values below 100 or above 16,000 are likely to be unreliable.
Also, because expression values are typically spread over an extreme range (in this case, around 2.5 orders of magnitude), it is conventional to log transform them, which converts the data into a more linear form that is simpler to analyse.
Often in transformation steps, it is good practice to create a new data object to store the transformed values. Name the new object so that it is clear the data has been transformed, e.g. data.norm to indicate that the data has been normalised.
If the data being analysed requires a lot of memory, remove the raw data after it has been transformed to free some space.
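As a minimal sketch of this naming and clean-up habit, the snippet below uses a small made-up matrix. The object names expr.raw and expr.log are hypothetical and chosen only for illustration; they are not the objects used in this practical.

```r
# Made-up values standing in for a raw expression matrix (illustration only).
expr.raw <- matrix(c(50, 200, 1500, 20000), nrow = 2)

# Store the transformed values in a new, clearly named object.
expr.log <- log10(expr.raw)

# Remove the raw object once it is no longer needed.
# rm() deletes the name; R's garbage collector reclaims the memory afterwards.
rm(expr.raw)

exists("expr.raw")  # checks whether the raw object is still defined
```

Only remove the raw data if you are sure you will not need it again, since reloading a large file can be slow.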
- Replace all values <100 with the value 100. Hint: the test data < 100 marks the positions of all the matrix elements with a value of less than 100, so a command of the format data[data < 100] selects exactly those elements.
- Replace all values >16,000 with the value 16,000.
- Convert all values to their log10 equivalent.
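The three steps above can be sketched on a toy matrix. This is an illustration under stated assumptions rather than the worked answer for the real dataset: the object names toy and toy.norm, the probe and sample labels, and all the values are made up.

```r
# Toy matrix standing in for the real dataset (all values are made up).
toy <- matrix(c(20, 150, 3000, 25000, 100, 16000),
              nrow = 2,
              dimnames = list(c("probe1", "probe2"), c("s1", "s2", "s3")))

toy.norm <- toy                       # work on a copy so the original is kept
toy.norm[toy.norm < 100] <- 100       # floor: raise values below 100 to 100
toy.norm[toy.norm > 16000] <- 16000   # ceiling: cap values above 16,000
toy.norm <- log10(toy.norm)           # log10 transform

range(toy.norm)  # smallest and largest transformed values
```

After thresholding, every transformed value lies between log10(100) = 2 and log10(16000), which is roughly 4.2. Floored elements therefore appear as 2 in the transformed matrix.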
The first 5 rows and 5 columns of the matrix after transformation are shown below:
## ALL ALL ALL ALL ALL
## AFFX-BioB-5_at 2 2.000000 2.000000 2.000000 2.000000
## AFFX-BioB-M_at 2 2.000000 2.000000 2.000000 2.000000
## AFFX-BioB-3_at 2 2.000000 2.000000 2.423246 2.000000
## AFFX-BioC-5_at 2 2.451786 2.489958 2.000000 2.225309
## AFFX-BioC-3_at 2 2.000000 2.000000 2.000000 2.000000