- Introduction to R and RStudio
- Project Management With RStudio
- Seeking Help
- Data Structures
- Exploring Data Frames
- Subsetting Data
- Control Flow
- Creating Publication-Quality Graphics with ggplot2
- Vectorization
- Functions Explained
- Writing Data
- Splitting and Combining Data Frames with plyr
- Dataframe Manipulation with dplyr
- Dataframe Manipulation with tidyr
- Producing Reports With knitr
- Writing Good Software

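Introduction to R and RStudio
- Arithmetic follows the standard order of precedence: brackets `(`, `)`; then exponents `^` or `**`; then divide `/` and multiply `*`; then add `+` and subtract `-`.
- Scientific notation is available, e.g. `2e-3`.
- Anything to the right of a `#` is a comment; R will ignore it.
- Functions are called as `function_name()`. Expressions inside the brackets are evaluated before being passed to the function, and functions can be nested.
- Mathematical functions include `exp`, `sin`, `log`, `log10`, `log2`, etc.
- Comparison operators: `<`, `<=`, `>`, `>=`, `==`, `!=`.
- Use `all.equal` rather than `==` to compare floating-point numbers.
- `<-` is the assignment operator: the expression on the right is evaluated, then stored in a variable named on the left.
- `ls` lists all the variables and functions you've created.
- `rm` removes them.
- Use `=` when assigning values to function arguments.

Project Management With RStudio
- Use the `packrat` package to create self-contained projects.
- `install.packages` installs packages from CRAN.
- `library` loads a package into R.
- `packrat::status` checks whether all packages referenced in your scripts have been installed.

Seeking Help
- `?function_name` or `help(function_name)` opens the help page for a function.
- Put special operators in quotes, e.g. `?"+"`.
- `dput` (see `?dput`) dumps the data you are working with so others can load it easily.
- `sessionInfo()` gives details of your setup that others may need for debugging.

Data Structures
Individual values in R have one of five main data types; multiple values can be grouped into data structures.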
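
A quick way to see the five main data types is to ask `typeof()` about one literal of each. This is a minimal base-R sketch; the names `values` and `types` are only illustrative. Note that `typeof()` reports decimal numbers as `"double"`, and that the `L` suffix is what makes a literal an integer.

```r
# One literal of each of the five main data types;
# the L suffix makes 2L an integer rather than a double
values <- list(3.14, 2L, "text", 1+2i, TRUE)

# typeof() returns the storage type of each value as a string
types <- vapply(values, typeof, character(1))
types   # "double" "integer" "character" "complex" "logical"
```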

Data types
- `typeof(object)` reports an item's data type.
- `?numeric`: real (decimal) numbers; `typeof()` reports these as `"double"`
- `?integer`: whole numbers only
- `?character`: text
- `?complex`: complex numbers
- `?logical`: `TRUE` or `FALSE` values

Special types:
- `?NA`: missing values
- `?NaN`: "not a number", for undefined values (e.g. `0/0`)
- `?Inf`, `-Inf`: infinity
- `?NULL`: a data structure that doesn't exist

`NA` can occur in any atomic vector. `NaN` and `Inf` can only occur in numeric (double) or complex vectors; integer vectors cannot represent them. Atomic vectors are the building blocks for all other data structures. A `NULL` value stands in for an entire data structure rather than a single value, although it can also occur as a list element.
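
The rules about special values can be checked directly in base R. This sketch (variable names are illustrative) shows that `0/0` is `NaN`, that `Inf` and `NaN` are doubles, that converting `Inf` to integer gives `NA`, that `NA` fits in a character vector, and that `NULL` has length zero yet can still be stored in a list.

```r
# 0/0 is undefined, so R returns NaN
nan_value <- 0/0
is.nan(nan_value)                    # TRUE

# Inf and NaN are double values; mixing them into an
# integer vector promotes the whole vector to double
typeof(Inf)                          # "double"
promoted <- c(1L, NaN)
typeof(promoted)                     # "double"

# Integers cannot represent Inf: conversion gives NA (with a warning)
int_inf <- suppressWarnings(as.integer(Inf))
is.na(int_inf)                       # TRUE

# NA can appear in any atomic vector, e.g. a character vector
chr <- c("a", NA)
typeof(chr)                          # "character"

# NULL has length 0 but can still be stored as a list element
length(NULL)                         # 0
lst <- list(NULL)
length(lst)                          # 1
is.null(lst[[1]])                    # TRUE
```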

Basic data structures in R:
- `?vector`: atomic vectors, which can only contain one type
- `?list`: containers for other objects
- `?data.frame`: two-dimensional objects whose columns can contain different types of data
- `?matrix`: two-dimensional objects that can contain only one type of data
- `?factor`: vectors that contain predefined categorical data
- `?array`: multi-dimensional objects that can only contain one type of data

Remember that matrices are really atomic vectors under the hood, and that data frames are really lists under the hood; this explains some of R's weirder behaviour.
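
The "under the hood" remark can be verified with a few base-R calls. In this minimal sketch (the names `m`, `mixed` and `df` are illustrative), a matrix reports the type of its underlying vector plus a `dim` attribute, and a data frame reports type `"list"` with one element per column.

```r
# A matrix is an atomic vector with a "dim" attribute
m <- matrix(1:6, nrow = 2)
typeof(m)              # "integer": the type of the underlying vector
dim(m)                 # 2 3
as.vector(m)           # 1 2 3 4 5 6, stored column by column

# Matrices hold one type, so mixing a number and text coerces to character
mixed <- cbind(1, "a")
typeof(mixed)          # "character"

# A data frame is a list of equal-length column vectors
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
typeof(df)             # "list"
class(df)              # "data.frame"
length(df)             # 2: one element per column
```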

- `?vector()`: all items in a vector must be the same type.
- `seq(from=0, to=1, by=1)` creates a sequence of numbers.
- Items in a vector can be named with the `names()` function.
- `?factor()`: factors are a data structure designed to store categorical data.
- `levels()` shows the valid values that can be stored in a factor.
- `?list()`: lists are a data structure designed to store data of different types.
- `?matrix()`: matrices are a data structure designed to store two-dimensional data.
- `?data.frame` is a key data structure: a list of equal-length vectors.
- `cbind()` adds a column (vector) to a data frame.
- `rbind()` adds a row (list) to a data frame.

Useful functions for querying data structures:
- `?str`: structure; prints a summary of the whole data structure
- `?typeof`: the type inside an atomic vector
- `?class`: what kind of data structure an object is
- `?head`: prints the first `n` elements (rows for two-dimensional objects)
- `?tail`: prints the last `n` elements (rows for two-dimensional objects)
- `?rownames`, `?colnames`, `?dimnames`: retrieve or modify the row and column names of an object
- `?names`: retrieve or modify the names of an atomic vector or list (or the columns of a data frame)
- `?length`: the number of elements in a vector or list
- `?nrow`, `?ncol`, `?dim`: the dimensions of an n-dimensional object (they return `NULL` for plain atomic vectors and lists)

Exploring Data Frames
- `read.csv` reads in data with a regular structure:
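  - the `sep` argument specifies the separator
  - `header=TRUE` if there is a header row (the default for `read.csv`)

Subsetting Data
- `[` single square brackets:
  - `x[1]` extracts the first item from vector `x`.
  - applied to a list, `[` returns another `list()`.
- `[` with two arguments: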
  - `x[1,2]` extracts the value in row 1, column 2 of a matrix or data frame.
  - `x[,2]` extracts the entire second column.
- `[[` double square brackets extract items from lists.
- `$` accesses columns or list elements by name.

Control Flow
- Use `if` to start a conditional statement, `else if` to provide additional tests, and `else` to provide a default.
- Use `==` to test for equality.
- `X && Y` is only true if both X and Y are `TRUE`.
- `X || Y` is true if either X or Y, or both, are `TRUE`.
- Zero is considered `FALSE`; all other numbers are considered `TRUE`.

Creating Publication-Quality Graphics with ggplot2
- `library(ggplot2)` loads the package.
- `ggplot` creates the base figure.
- Aesthetics (`aes`) specify the data axes, shape, color, and size.
- Geometry functions (`geom_*`) specify the type of plot, e.g. `geom_point`, `geom_line`, `geom_density`, `geom_boxplot`.
- Geometry functions also add statistical transforms, e.g. `geom_smooth`.
- `scale` functions change the mapping from data to aesthetics.
- `facet` functions stratify the figure into panels.
- Aesthetics apply to individual layers, or can be set for the whole plot inside `ggplot`.
- `theme` functions change the overall look of the plot.
- `ggsave` saves a figure.

Vectorization
- `*` applies element-wise to matrices.
- `%*%` performs true matrix multiplication.
- `any()` returns `TRUE` if any element of a vector is `TRUE`.
- `all()` returns `TRUE` if all elements of a vector are `TRUE`.

Functions Explained
- `?"function"` opens the help page on defining functions.
- A function returns the value of its last evaluated expression, or you can use `return` explicitly.

Writing Data
- `write.table` writes out objects in a regular format.
- Set `quote=FALSE` so that text isn't wrapped in `"` marks.

Splitting and Combining Data Frames with plyr
- Use the `xxply` family of functions to apply functions to groups within some data.
- The first letter, `a` (array), `d` (data.frame) or `l` (list), corresponds to the input data; the second letter denotes the output data structure.
- Functions, often anonymous ones, are passed to the `plyr` family of functions to operate on groups within data.

Dataframe Manipulation with dplyr
- `library(dplyr)` loads the package.
- `?select`: extract variables by name.
- `?filter`: return rows with matching conditions.
- `?group_by`: group data by one or more variables.
- `?summarize`: summarize multiple values to a single value.
- `?mutate`: add new variables to a data frame.
- `?"%>%"`: combine operations with the pipe operator.

Dataframe Manipulation with tidyr
- `library(tidyr)` loads the package.

Comments in R start with a `#` character and run to the end of the line; comments in SQL start with `--`, and other languages have other conventions.