Introduction to R

3.6 Other IO operations

3.6.1 Interactive input

scan() is a low-level function that can import data from a variety of sources. In most instances it is preferable to read data from a file using read.table() or readLines(). scan() is, however, well suited to reading data from standard input or from other software streams. As a quick demonstration we can read in numeric values as you type them; when no file is given, input ends at a blank line:

scan(what=numeric())
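
The interactive call above cannot be run unattended, so the following is a minimal sketch of the same idea using the text argument of scan(), which parses a character string as if it had been typed. The variable names and sample values here are illustrative assumptions, not part of the original example.

```r
# Parse whitespace-separated numbers from a string instead of the keyboard.
# quiet = TRUE suppresses the "Read N items" message.
typed <- "3.5 7 11.25"
values <- scan(text = typed, what = numeric(), quiet = TRUE)

# The same function can split delimited character fields via 'sep'.
fields <- scan(text = "PR,BQ,CW", what = character(), sep = ",", quiet = TRUE)

print(values)
print(fields)
```

Changing what controls the type each item is converted to, which is why scan() stops with an error if a non-numeric token is entered while what=numeric() is in effect.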

3.6.2 Reading from a web connection

Most of the time we will be reading information into R that is stored on our local filesystem, but R can also import data directly from the web. The read.table() and readLines() functions accept a URL in place of a file name. This is convenient, but it requires that the data be available as a file at a known URL.

For example, the following reads data on searches for “Father's Day” from the Google Trends GitHub account. Based on the URL we can expect a comma-separated file because the filename ends in “.csv”; however, we do not know whether the file contains comments, whether the first line holds the column headings, and so on. So we will peek into the file first before reading it as a data object.

data.URL <- "https://raw.githubusercontent.com/GoogleTrends/data/gh-pages/20150624_FathersDay.csv"
fathers.txt <- readLines(data.URL)
head(fathers.txt)

## Warning in readLines(data.URL): incomplete final line found on
## 'https://raw.githubusercontent.com/GoogleTrends/data/gh-pages/
## 20150624_FathersDay.csv'

## [1] "Search interest in dads in each country's respective father's day in 2014,,,,,"
## [2] "country,ISO Code,Date,Holiday,Indexed,Rank"                                    
## [3] "Puerto Rico,PR,06/15/2014,Father's Day,100.00,1"                               
## [4] "Caribbean Netherlands,BQ,06/01/2014,Global Day of Parents,79.46,2"             
## [5] "Curaçao,CW,06/15/2014,Father's Day,75.63,3"                                    
## [6] "Aruba,AW,06/15/2014,Father's Day,74.62,4"
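
When only a peek is needed, readLines() can be limited to the first few lines with n, and warn = FALSE silences the incomplete-final-line warning. The sketch below uses a textConnection() over three lines copied from the output above as a stand-in for the URL, so that it runs offline; the names sample.lines, header.line and n.skip are hypothetical.

```r
# Stand-in for the remote file: three lines copied from the output above.
sample.lines <- c(
  "Search interest in dads in each country's respective father's day in 2014,,,,,",
  "country,ISO Code,Date,Holiday,Indexed,Rank",
  "Puerto Rico,PR,06/15/2014,Father's Day,100.00,1"
)

# Read only the first two lines from the connection, then close it.
con <- textConnection(sample.lines)
first.lines <- readLines(con, n = 2, warn = FALSE)
close(con)

# Locate the heading row and work out how many lines precede it.
header.line <- grep("^country,", first.lines)
n.skip <- header.line - 1

print(n.skip)
```

The same n and warn arguments should apply when the first argument is a URL such as data.URL, and n.skip is the value that would be passed as skip when reading the table.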

Now that we can see that the first line is a description and that the column headings are in line 2, we will read the data file into a data object, skipping the first line:

fathers.day <- read.csv(data.URL, header=TRUE, skip=1)
fathers.day[1:5,1:6]

##                 country ISO.Code       Date               Holiday Indexed
## 1           Puerto Rico       PR 06/15/2014          Father's Day  100.00
## 2 Caribbean Netherlands       BQ 06/01/2014 Global Day of Parents   79.46
## 3               Curaçao       CW 06/15/2014          Father's Day   75.63
## 4                 Aruba       AW 06/15/2014          Father's Day   74.62
## 5                Guyana       GY 06/15/2014          Father's Day   67.42
##   Rank
## 1    1
## 2    2
## 3    3
## 4    4
## 5    5
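
Two details in the result are easy to miss: the heading “ISO Code” has become ISO.Code because read.csv() converts column names into syntactically valid R names, and the Date column is read as text rather than as dates. The following sketch reproduces both points offline using the text argument of read.csv() on lines copied from the file preview; the object names csv.text and sample.day are assumptions made for illustration.

```r
# Lines copied from the file preview, used in place of the URL.
csv.text <- c(
  "Search interest in dads in each country's respective father's day in 2014,,,,,",
  "country,ISO Code,Date,Holiday,Indexed,Rank",
  "Puerto Rico,PR,06/15/2014,Father's Day,100.00,1",
  "Caribbean Netherlands,BQ,06/01/2014,Global Day of Parents,79.46,2"
)

# skip = 1 drops the description line so that line 2 supplies the headings.
sample.day <- read.csv(text = csv.text, header = TRUE, skip = 1,
                       stringsAsFactors = FALSE)

# Convert the month/day/year strings into Date objects.
sample.day$Date <- as.Date(sample.day$Date, format = "%m/%d/%Y")

str(sample.day)
```

Passing check.names = FALSE to read.csv() would keep the original heading with its space, at the cost of needing backticks or quotes to refer to that column.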

The RCurl package provides a much more flexible approach to accessing data that is on the web and is worth reviewing if you wish to scrape a web-accessible database in a more automated fashion.

Bioinformatics data

Of greatest interest, however, is the ability to download pre-structured biological data from the web. This can be managed using packages such as biomaRt and GEOquery.