3.6 Other IO operations
3.6.1 Interactive input
scan() is a low-level function that can read data from a variety of sources. In most instances it is preferable to read data from a file using read.table() or readLines(). scan() is, however, well suited to reading data from the standard input or from the output of other programs. As a quick demonstration, we can read in numbers as you type them (enter a blank line to finish):
scan(what=numeric())
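Because scan() reads from any connection, the same call also works outside an interactive session. The sketch below uses only base R. It passes scan() a text connection in place of the keyboard, and it uses the `text =` argument with a list-valued `what` to read simple records. The helper name `read_numbers` and the sample values are hypothetical and only illustrate the calls. In a script run with Rscript, passing `file("stdin")` instead would read data piped in from another program.

```r
# Hypothetical helper: read whitespace-separated numbers from any connection.
# quiet = TRUE suppresses the "Read N items" message that scan() prints.
read_numbers <- function(con) {
  scan(con, what = numeric(), quiet = TRUE)
}

# A text connection stands in for the keyboard or another program's output;
# in an Rscript, file("stdin") would read data piped into the script instead.
tc <- textConnection(c("3.5 4", "10"))
x <- read_numbers(tc)
close(tc)
print(x)

# what = list(...) reads alternating fields into one vector per field.
recs <- scan(text = "alice 3\nbob 5",
             what = list(name = "", count = 0),
             quiet = TRUE)
str(recs)
```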
3.6.2 Reading from a web connection
Most of the time we will be reading data into R from our local filesystem, but R can also import data directly from the web. The read.table() and readLines() functions accept a URL in place of a file name and download the file for us. This does, however, require the data to be available as a file at a known URL.
For example, the following reads data from the Google Trends GitHub account on search interest in “Father's Day”. Because the filename ends in “.csv”, we can expect a comma-separated file. However, we do not know whether the file contains comments, whether the first line holds the column headings, and so on. We will therefore peek into the file before reading it into a data object.
data.URL <- paste0("https://raw.githubusercontent.com/GoogleTrends/data/gh-pages/",
                   "20150624_FathersDay.csv")
fathers.txt <- readLines(data.URL)
head(fathers.txt)
## Warning in readLines(data.URL): incomplete final line found on
## 'https://raw.githubusercontent.com/GoogleTrends/data/gh-pages/
## 20150624_FathersDay.csv'
## [1] "Search interest in dads in each country's respective father's day in 2014,,,,,"
## [2] "country,ISO Code,Date,Holiday,Indexed,Rank"
## [3] "Puerto Rico,PR,06/15/2014,Father's Day,100.00,1"
## [4] "Caribbean Netherlands,BQ,06/01/2014,Global Day of Parents,79.46,2"
## [5] "Curaçao,CW,06/15/2014,Father's Day,75.63,3"
## [6] "Aruba,AW,06/15/2014,Father's Day,74.62,4"
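You do not need to read the whole file just to inspect its first few lines. readLines() takes an `n` argument that limits how many lines are read. To keep the sketch runnable offline, it writes the first three lines shown above to a temporary file. On the live source, `readLines(data.URL, n = 2)` should behave the same way, though that is an assumption about the server rather than something tested here.

```r
# Offline stand-in for the download: the first three lines shown above,
# written to a temporary file (the real file has many more rows).
sample_lines <- c(
  "Search interest in dads in each country's respective father's day in 2014,,,,,",
  "country,ISO Code,Date,Holiday,Indexed,Rank",
  "Puerto Rico,PR,06/15/2014,Father's Day,100.00,1"
)
path <- tempfile(fileext = ".csv")
writeLines(sample_lines, path)

# Read only the first two lines; with the live source this would be
# readLines(data.URL, n = 2).
peek <- readLines(path, n = 2)
print(peek)
```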
Now that we can see that the first line is a description and the column headings are on line 2, we can read the file into a data frame, skipping the first line:
fathers.day <- read.csv(data.URL, header=TRUE, skip=1)
fathers.day[1:5,1:6]
##                 country ISO.Code       Date               Holiday Indexed
## 1           Puerto Rico       PR 06/15/2014          Father's Day  100.00
## 2 Caribbean Netherlands       BQ 06/01/2014 Global Day of Parents   79.46
## 3               Curaçao       CW 06/15/2014          Father's Day   75.63
## 4                 Aruba       AW 06/15/2014          Father's Day   74.62
## 5                Guyana       GY 06/15/2014          Father's Day   67.42
##   Rank
## 1    1
## 2    2
## 3    3
## 4    4
## 5    5
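The code above contacts the server twice, once for readLines() and again for read.csv(). Since fathers.txt already holds the file's lines, read.csv() can parse them directly through its `text =` argument. To keep the sketch self-contained, a few lines copied from the output above stand in for fathers.txt. With the real download, `read.csv(text = fathers.txt, skip = 1)` would be the equivalent call.

```r
# Stand-in for fathers.txt: a few of the lines printed above.
fathers.txt <- c(
  "Search interest in dads in each country's respective father's day in 2014,,,,,",
  "country,ISO Code,Date,Holiday,Indexed,Rank",
  "Puerto Rico,PR,06/15/2014,Father's Day,100.00,1",
  "Caribbean Netherlands,BQ,06/01/2014,Global Day of Parents,79.46,2",
  "Aruba,AW,06/15/2014,Father's Day,74.62,4"
)

# Parse the lines already in memory instead of downloading the file again;
# skip = 1 drops the description line so line 2 becomes the header.
fathers.day <- read.csv(text = fathers.txt, skip = 1)
str(fathers.day)
```

As in the download above, read.csv() turns the heading “ISO Code” into the syntactic name ISO.Code.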
The RCurl package provides a much more flexible approach to accessing data on the web and is worth reviewing if you wish to scrape a web-accessible database in a more automated fashion.
Bioinformatics data
Of greatest interest, however, is the ability to download pre-structured biological data from the web. This can be managed using Bioconductor packages such as biomaRt and GEOquery.