Going further with R
This section is a quick look at other features of R that will help you in your future data analysis. We will cover:
- Documentation with R
- Vignettes that come with R packages including citing packages
- SessionInfo information to reproduce your data analysis, and
- Other ways to run R
Vignettes
Just now, we explored some of the help functions built into R. As well as this built-in help, most R packages also come with their own, often extensive, documentation. This is normally in the form of a vignette, a document that provides a task-oriented description of package functionality. Vignettes contain executable examples and are intended to be used interactively. You can also download the vignette for a package that isn't installed on your system by visiting the CRAN or Bioconductor project webpages, or via an internet search.
Using vignettes (optional)
- The vignette() command provides access to vignettes for installed packages.
- Run vignette() at the console to list all the vignettes available.
  - Notice that a package can, but does not always, have a vignette with the package name - for example the annotate package has an annotate vignette, but BiocParallel only has the IntroductionToBiocParallel vignette.
- Type vignette("biomaRt") to open the vignette for the biomaRt package, which links to the Biomart gene annotation database.
- Some vignettes have non-unique names - for example, several packages have an intro vignette.
  - Try vignette("intro")
  - To get the limma package vignette, use the command vignette("intro", package = "limma")
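The vignette list can also be narrowed to a single package. The lines below are a minimal sketch of that idea, assuming the grid package (which ships with R) is used as the example; the object name pkg_vignettes is only illustrative, and the list may be empty on a stripped-down R installation.

```r
# Ask for the vignettes of one package only; "grid" ships with R,
# so no extra installation is assumed.
pkg_vignettes <- vignette(package = "grid")

# The result stores a character matrix; show each vignette's name and title.
pkg_vignettes$results[, c("Item", "Title"), drop = FALSE]

# To browse the same list as a web page, uncomment the next line.
# browseVignettes("grid")
```

The Item column holds the name you would pass to vignette() to open that document.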
Citations
The R core development team and the very active community of package authors have invested a lot of time and effort in creating R as it is today. Please give credit where credit is due and cite R and R packages when you use them for data analysis.
citation("VennDiagram")
citation("limma")
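R itself can be cited in the same way. The following is a small sketch, assuming only base R; the object name r_citation is illustrative, and "stats" is used as the package example because it is present on every installation.

```r
# Calling citation() without a package name returns the citation for R itself.
r_citation <- citation()
print(r_citation)

# The same citation as a BibTeX entry, ready for a reference manager.
toBibtex(r_citation)

# Citation for a package that ships with R.
citation("stats")
```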
SessionInfo
Congratulations - you have now conquered the basics of R and you are ready to start breaking things! One of the most useful aspects of R is the rich ecosystem of supplementary packages that can aid, facilitate and empower your research. The Bioconductor framework contains hundreds of packages of methods relevant to biologists working with biological data (population genetics, SNPs, NGS reads, microarrays, mass spectrometry etc). As we discovered in the Vignettes section, there are well-documented tutorials and guides that will walk us through the application of fabulous methods and workflows. Along the way, you will come across some unexpected "error" messages.
R packages are written by people like the QFAB bioinformaticians. Well intentioned, but busy. Code is crafted lovingly and tested in the scenarios that are implicit in our daily development. You are likely to be running R on a different computer, with combinations of installed packages that are subtly different from those that we wrote the software with. If you identify a "bug", in that the package gives a wrong result, fails with a hairy error message or otherwise misbehaves, it is worthwhile to let the developer know that you are having a problem. Reproducibility of the error is critical - we would like to know which version of R is being used and all of the packages that are loaded in memory, so that the developer can confirm that yes, there is a problem, and can help to resolve it.
It is trivial to report the details of the R environment that you are working in - the sessionInfo() function reports the R version, the platform, and the packages that are attached or loaded in the current session, together with their versions.
require(limma)
## Loading required package: limma
sessionInfo()
## R version 3.4.1 (2017-06-30)
## Platform: x86_64-redhat-linux-gnu (64-bit)
## Running under: CentOS Linux 7 (Core)
##
## Matrix products: default
## BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so
##
## locale:
## [1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
## [5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
## [7] LC_PAPER=en_AU.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] limma_3.30.13 kableExtra_0.4.0 knitr_1.17
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.12 bookdown_0.5 codetools_0.2-15 digest_0.6.12
## [5] rprojroot_1.2 R6_2.2.2 backports_1.1.0 magrittr_1.5
## [9] evaluate_0.10.1 highr_0.6 httr_1.3.1 rlang_0.1.2
## [13] stringi_1.1.5 rstudioapi_0.6 xml2_1.1.1 rmarkdown_1.6
## [17] tools_3.4.1 stringr_1.2.0 readr_1.1.1 hms_0.3
## [21] yaml_2.1.14 compiler_3.4.1 rvest_0.3.2 htmltools_0.3.6
## [25] tibble_1.3.4
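When reporting a problem, it can help to save this output to a text file that is attached to the bug report. The lines below are one possible sketch, assuming base R only; the file name session-info.txt and the use of a temporary folder are illustrative choices, not part of the course material.

```r
# Capture the printed session information as a character vector, one line each.
session_lines <- capture.output(sessionInfo())

# Write it to a text file; here a temporary folder is used as a stand-in
# for your own project directory.
session_file <- file.path(tempdir(), "session-info.txt")
writeLines(session_lines, session_file)

# Individual versions can also be queried directly.
R.version.string
packageVersion("stats")
```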
Other ways of running R
Today we are using RStudio, but R can be run in a number of different ways; for example, you could embed some R functionality in a computer program written in Python or Java. As discussed in sections 1.2 and 1.3, both R and RStudio provide an interactive environment where you are prompted to enter a single command at a time. In this training course we are providing you with snippets of information that could be used within a data analysis workflow. These are intended to be run interactively.
In a bioinformatics laboratory the scientists who use R craft a mixture of packages and workflow scripts into analytical pipelines. R may even be called by other software environments. Bioinformaticians will typically use the R console for refining, tuning and modifying existing scripts that they have written. The workflow itself is run through master scripts. In this section we will look at a simple R script that will create a table of information and write it out to file. We will be having a more complete look at the process of writing data to file in a later section (Section ??).
The advantage of running R scripts as a batch analysis is that larger and more complex analyses (such as the mapping of short DNA sequence reads) can be run overnight and each of the commands will run successively as prior commands complete.
Preparing an R script
An R script is a container for multiple R commands. It is intended to be run largely hands-off.
The RStudio software provides us with a very convenient way to create an R script - in the File menu there is an option to create a new file, and R Script is one of the primary file types. R scripts typically have the extension .R. The file is a plain text file, and it should state the packages that need to be loaded, the objects that need to be set and the working directory where the commands should run.
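A script laid out along those lines might look like the sketch below. This is a hypothetical skeleton rather than the course's own script: the object names, the rectangle values and the file name rectangles.csv are all made up for illustration, and a temporary folder stands in for a real project directory.

```r
# example-script.R : hypothetical skeleton of a batch script

# 1. Packages the script depends on (stats ships with R; listed as a placeholder).
library(stats)

# 2. Objects and settings used further down.
output_dir  <- tempdir()   # replace with your own project folder
output_file <- file.path(output_dir, "rectangles.csv")

# 3. The analysis itself: build a small table of rectangle areas.
rectangles <- data.frame(
  length_cm = c(10.25, 3.00, 7.50),
  width_cm  = c(4.35, 2.00, 1.20)
)
rectangles$area_sq_cm <- rectangles$length_cm * rectangles$width_cm

# 4. Write the table out to file.
write.csv(rectangles, output_file, row.names = FALSE)
```

Keeping the packages and settings at the top means that anyone running the script hands-off can see its requirements before any work starts.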
Running an R script
The following command runs an R script from a terminal window, outside of RStudio:
R CMD BATCH [options] my_script.R [outfile]
On Windows, assuming R.exe can be found on your PATH, the call looks like:
R.exe CMD BATCH --vanilla --slave "c:\my projects\my_script.R"
We will demonstrate the next exercise to show you how to run a script file using RStudio. You can repeat this exercise in your own time.
Running a script in RStudio
- Create a new script file by going to File > New File > R Script.
- Enter some commands in the editor window, for example the code chunk below.
- Save the script with a file name, e.g. "area-rectangle.R".
- In RStudio, there are 3 ways you can run this script from top to bottom:
  1. In the console window, type in the command source("area-rectangle.R"). What is the output?
  2. In the editor window, click on the arrow next to the Source button in the top right corner and select Source. What is the output?
  3. Repeat step 2 but this time select Source with Echo. What is the difference between step 2 and 3?
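# Report when the script was run
print(date())
print("This is an R script!")
# Dimensions of the rectangle, in cm
length <- 10.25
width <- 4.35
area.of.rectangle <- length * width
# paste() joins the pieces, separated by single spaces
print(paste("The area of a rectangle with length=",
            length,
            "cm by width=",
            width,
            "cm is",
            area.of.rectangle, "square cm."))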
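The difference between Source and Source with Echo can also be reproduced from the console with the echo argument of source(). The following is a minimal sketch, assuming a throwaway three-line script written to a temporary file; the names script_path, rect_length and rect_width are illustrative only.

```r
# Write a tiny script to a temporary file.
script_path <- tempfile(fileext = ".R")
writeLines(c("rect_length <- 10.25",
             "rect_width <- 4.35",
             "rect_length * rect_width"),
           script_path)

# Plain source(): the commands run silently, and the bare expression on
# the last line is not printed.
source(script_path)

# source() with echo = TRUE: each command is shown as it runs,
# followed by its result.
source(script_path, echo = TRUE)
```

Output from explicit print() calls, as used in the exercise script, appears in both cases; only the echoed commands and auto-printed values differ.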
Use [Tab] to autocomplete
You can use the tab key to autocomplete a variable name, a function name or a filename (if the file is in the working directory). This will save you a lot of time and prevent typing errors when you are analysing your data.