Summary and Setup
An introduction to R for non-programmers using gapminder data
The goal of this lesson is to teach novice programmers to write modular code and to follow best practices when using R for data analysis. R is commonly used in many scientific disciplines, both for statistical analysis and for its array of third-party packages.
These materials aim to give attendees a strong foundation in the fundamentals of R, and to teach best practices for scientific computing.
Note that this workshop will focus on teaching the fundamentals of the programming language R, and will not teach statistical analysis.
A variety of third-party packages are used throughout this workshop. These are not necessarily the best, nor are they comprehensive, but they are packages we find useful, and they have been chosen primarily for their usability.
Prerequisites
Understand that computers store data and instructions (programs, scripts etc.) in files. Files are organised in directories (folders). Know how to access files not in the working directory by specifying the path.
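As a quick illustration of the last point, here is a minimal sketch of how a path to a file outside the working directory can be built in R. The directory name "data" and the file name "example.csv" are hypothetical and are used only for illustration; they are not part of the lesson data.

```r
# Show the current working directory (where R looks for files by default).
wd <- getwd()
print(wd)

# Build a relative path to a file in a sibling directory.
# "data" and "example.csv" are hypothetical names, for illustration only.
relative_path <- file.path("..", "data", "example.csv")
print(relative_path)

# Expand the relative path to an absolute one where possible.
# mustWork = FALSE avoids an error if the file does not exist.
absolute_path <- normalizePath(relative_path, mustWork = FALSE)
print(absolute_path)

# Check whether a file actually exists at that path.
print(file.exists(relative_path))
```

If you can predict what each of these lines prints on your own computer, you meet this prerequisite.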
Data Setup
We will be using some example data in the lesson. Download the file using the link below and extract it to your computer:
- zip file of raw data
Software Setup
This lesson assumes you have R and RStudio installed on your computer. Please follow the setup instructions below.
To install R and RStudio on a personal laptop
Packages
The course teaches the tidyverse, which is a collection of R packages designed to make many common data analysis tasks easier. Please install it before the course. You can do this by starting RStudio and typing the following at the > prompt in the left-hand window:
R
install.packages("tidyverse")
You may be prompted to select a mirror to use; either select one in the UK, or the “cloud” option at the start of the list.
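If you would rather not pick a mirror interactively, the mirror can be set before installing. The sketch below assumes you want the “cloud” mirror and that its address is https://cloud.r-project.org; substitute a different mirror address if you prefer one closer to you.

```r
# Choose the "cloud" CRAN mirror in advance so that install.packages()
# does not need to ask which mirror to use in this session.
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Confirm which mirror is now configured for this session.
print(getOption("repos")[["CRAN"]])
```

A later call to install.packages("tidyverse") in the same R session should then use this mirror.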
R will download the packages that constitute the tidyverse, and then install them. This can take some time. You may see the message “There are binary versions available but the source versions are later”, followed by a question asking whether you want to install from source the packages which require compilation. You should answer no to this.
If you are using a Mac you may be asked whether you wish to install binary or source versions of the packages; you should select binary.
On Linux, several of the packages will be compiled from source. This can take several minutes. You may find that you need to install additional development libraries to allow this to happen.
There will be a number of messages displayed during installation. After the installation has completed you should see a message containing:
OUTPUT
** testing if installed package can be loaded
* DONE (tidyverse)
Or:
OUTPUT
package ‘tidyverse’ successfully unpacked and MD5 sums checked
Check Your Installation
Type the following commands at the > prompt:
R
library(tidyverse)
ggplot(cars, aes(x=speed, y=dist)) + geom_point()
Any message about conflicts can be safely ignored. These commands should produce a plot in the lower right-hand window of RStudio.
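If no plot appears, it can help to check whether R can find the tidyverse at all. The following is a minimal sketch of such a check, not part of the official setup instructions; it reports the installed version if the package is found.

```r
# TRUE if the tidyverse package can be loaded from your library
# (this loads its namespace but does not attach it like library() does).
tidyverse_installed <- requireNamespace("tidyverse", quietly = TRUE)

if (tidyverse_installed) {
  # Report which version was found.
  message("tidyverse version: ", as.character(packageVersion("tidyverse")))
} else {
  message("tidyverse was not found; try install.packages(\"tidyverse\") again.")
}
```

If the second message appears, repeat the installation step above and look for error messages in its output.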