Reference
Last updated on 2024-12-10
Introduction to R and RStudio
- Use the escape key to cancel incomplete commands or running code (Ctrl+C if you’re using R from the shell).
- Basic arithmetic operations follow standard order of precedence:
  - Brackets: ( and )
  - Exponents: ^ or **
  - Divide: /
  - Multiply: *
  - Add: +
  - Subtract: -
- Scientific notation is available, e.g. 2e-3
- Anything to the right of a # is a comment; R will ignore it!
- Functions are denoted by function_name(). Expressions inside the brackets are evaluated before being passed to the function, and functions can be nested.
- Mathematical functions: exp, sin, log, log10, log2 etc.
- Comparison operators: <, <=, >, >=, ==, !=
- Use all.equal to compare numbers!
- <- is the assignment operator. Anything to the right is evaluated, then stored in a variable named to the left.
- ls lists all variables and functions you’ve created
- rm can be used to remove them
- When assigning values to function arguments, you must use =.
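The points above can be tried directly at the console. The following is a minimal sketch; the variable names (without_brackets, with_brackets, x, y, total) are illustrative and not part of the lesson.

```r
# Precedence: exponents bind tighter than multiplication, which binds tighter than addition
without_brackets <- 1 + 2 * 3^2    # evaluated as 1 + (2 * 9)
with_brackets <- (1 + 2) * 3^2     # brackets are evaluated first

# Nested function calls: the innermost expression is evaluated first
log10(exp(log(100)))

# Comparing decimal numbers with == can be surprising
x <- 0.1 + 0.2
y <- 0.3
x == y                      # FALSE because of floating-point rounding
isTRUE(all.equal(x, y))     # TRUE, all.equal compares within a tolerance

# Assignment uses <-, while function arguments are set with =
total <- sum(1, 2, NA, na.rm = TRUE)

ls()        # lists the objects created so far
rm(x, y)    # removes x and y from the workspace
```

Wrapping all.equal in isTRUE is a common pattern because all.equal returns a description of the difference, rather than FALSE, when the values differ.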
Data structures
Individual values in R must be one of 5 data types; multiple values can be grouped in data structures.
Data types
- typeof(object) gives information about an item’s data type.
- There are 5 main data types:
  - ?numeric: real (decimal) numbers
  - ?integer: whole numbers only
  - ?character: text
  - ?complex: complex numbers
  - ?logical: TRUE or FALSE values
Special types:
- ?NA: missing values
- ?NaN: “not a number” for undefined values (e.g. 0/0).
- ?Inf, -Inf: infinity.
- ?NULL: a data structure that doesn’t exist

NA can occur in any atomic vector. NaN and Inf can only occur in numeric (double) or complex vectors. Atomic vectors are the building blocks for all other data structures. A NULL value will occur in place of an entire data structure (but can occur as list elements).
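A short sketch of the data types and special values described above. The variable names (missing_val, with_null) are made up for illustration.

```r
# The five main data types, as reported by typeof()
typeof(3.14)      # "double" (R's storage name for numeric)
typeof(2L)        # "integer" (the L suffix marks a whole number)
typeof("text")    # "character"
typeof(1 + 4i)    # "complex"
typeof(TRUE)      # "logical"

# Special values
missing_val <- c(1, NA, 3)
is.na(missing_val)    # FALSE TRUE FALSE
0 / 0                 # NaN
1 / 0                 # Inf
-1 / 0                # -Inf

# NULL stands for "no data structure at all", but can sit inside a list
is.null(NULL)         # TRUE
length(NULL)          # 0
with_null <- list(a = 1, b = NULL)
length(with_null)     # 2
```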
Basic data structures in R:
- atomic ?vector (can only contain one type)
- ?list (containers for other objects)
- ?data.frame: two dimensional objects whose columns can contain different types of data
- ?matrix: two dimensional objects that can contain only one type of data.
- ?factor: vectors that contain predefined categorical data.
- ?array: multi-dimensional objects that can only contain one type of data

Remember that matrices are really atomic vectors under the hood, and that data.frames are really lists under the hood (this explains some of the weirder behaviour of R).
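The structures listed above, and the "under the hood" remark, can be checked with a few lines. This is an illustrative sketch; all object names (v, l, m, a, f, df) are hypothetical.

```r
v  <- c(1, 2, 3)                          # atomic vector: one type only
l  <- list(1, "a", TRUE)                  # list: a container for other objects
m  <- matrix(1:6, nrow = 2)               # matrix: two dimensions, one type
a  <- array(1:24, dim = c(2, 3, 4))       # array: any number of dimensions, one type
f  <- factor(c("low", "high", "low"))     # factor: predefined categories
df <- data.frame(id = 1:3, name = c("a", "b", "c"))  # columns may differ in type

# A matrix is an atomic vector with a dim attribute
typeof(m)     # "integer"
length(m)     # 6

# A data.frame is a list of equal-length columns
typeof(df)    # "list"
is.list(df)   # TRUE
length(df)    # 2, the number of columns
```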
Useful functions for querying data structures:
- ?str: structure, prints out a summary of the whole data structure
- ?typeof: tells you the type inside an atomic vector
- ?class: what is the data structure?
- ?head: print the first n elements (rows for two-dimensional objects)
- ?tail: print the last n elements (rows for two-dimensional objects)
- ?rownames, ?colnames, ?dimnames: retrieve or modify the row names and column names of an object.
- ?names: retrieve or modify the names of an atomic vector or list (or columns of a data.frame).
- ?length: get the number of elements in an atomic vector
- ?nrow, ?ncol, ?dim: get the dimensions of an n-dimensional object (won’t work on atomic vectors or lists).
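The query functions above can be tried on a small example. The data frame df and its columns are invented for illustration.

```r
df <- data.frame(id = 1:3, score = c(90.5, 82, 77.25))

str(df)             # compact summary of the whole structure
class(df)           # "data.frame"
typeof(df$score)    # "double"
head(df, n = 2)     # first two rows
tail(df, n = 1)     # last row
names(df)           # "id" "score"
colnames(df)        # same as names() for a data.frame
rownames(df)        # "1" "2" "3"
dim(df)             # 3 2
nrow(df)            # 3
ncol(df)            # 2
length(df$id)       # 3

# nrow() is not defined for a plain atomic vector
nrow(1:3)           # NULL
```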
Exploring Data Frames
- read.csv to read in data in a regular structure
  - sep argument to specify the separator
    - “,” for comma separated
    - “\t” for tab separated
  - Other arguments:
    - header=TRUE if there is a header row
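A minimal sketch of read.csv with the sep and header arguments. The file contents and the names csv_path, tsv_path, dat and dat_tab are hypothetical; temporary files are used so that the example is self-contained.

```r
# Write a tiny comma-separated file with a header row
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,count", "a,1", "b,2"), csv_path)

# Read it back: sep = "," is the default for read.csv
dat <- read.csv(csv_path, header = TRUE, sep = ",")

# The same data written as tab-separated text
tsv_path <- tempfile(fileext = ".tsv")
write.table(dat, tsv_path, sep = "\t", row.names = FALSE, quote = FALSE)

# Read the tab-separated file by changing sep
dat_tab <- read.csv(tsv_path, header = TRUE, sep = "\t")

str(dat)
all.equal(dat, dat_tab)
```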
Seeking help
- To access help for a function type ?function_name or help(function_name)
- Use quotes for special operators e.g. ?"+"
- Use fuzzy search if you can’t remember a name: ??search_term
- CRAN task views are a good starting point.
- Stack Overflow is a good place to get help with your code.
  - ?dput will dump data you are working from so others can load it easily.
  - sessionInfo() will give details of your setup that others may need for debugging.
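As an illustration of the commands above, the sketch below looks up help and prepares a small reproducible example. The data frame small and the file path dump_path are made up for the example.

```r
# Help for a function, two equivalent ways
?mean
help("mean")

# Special operators need quotes
?"+"

# Fuzzy search across installed documentation
??regression

# dput() prints R code that recreates an object, handy when asking for help
small <- data.frame(x = 1:3, y = c("a", "b", "c"))
dput(small)

# The dumped text can be read back with dget()
dump_path <- tempfile(fileext = ".R")
dput(small, file = dump_path)
restored <- dget(dump_path)
all.equal(small, restored)

# Details of the R version, platform and loaded packages
sessionInfo()
```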
Dataframe manipulation with dplyr
- library(dplyr)
- ?select to extract variables by name.
- ?filter return rows with matching conditions.
- ?group_by group data by one or more variables.
- ?summarize summarize multiple values to a single value.
- ?mutate add new variables to a data.frame.
- Combine operations using the ?"|>" pipe operator.
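The verbs above can be chained with the pipe. This sketch assumes dplyr is installed and that R is version 4.1 or later (needed for the native |> pipe); the data frame scores and its columns are invented for illustration.

```r
library(dplyr)  # assumes the dplyr package is installed

scores <- data.frame(
  group = c("a", "a", "b", "b"),
  value = c(1, 3, 10, 20),
  note  = c("w", "x", "y", "z")
)

result <- scores |>
  select(group, value) |>                    # keep only two columns
  filter(value > 1) |>                       # drop the row where value is 1
  mutate(doubled = value * 2) |>             # add a new column
  group_by(group) |>                         # one group per distinct value of group
  summarize(mean_doubled = mean(doubled))    # one row per group

result
# group "a" has mean_doubled 6, group "b" has mean_doubled 30
```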
Creating publication quality graphics
- Figures can be created with the grammar of graphics:
  - library(ggplot2)
  - ggplot to create the base figure
  - aesthetics specify the data axes, shape, color, and data size
  - geometry functions specify the type of plot, e.g. point, line, density, box
  - geometry functions also add statistical transforms, e.g. geom_smooth
  - scale functions change the mapping from data to aesthetics
  - facet functions stratify the figure into panels
  - aesthetics apply to individual layers, or can be set for the whole plot inside ggplot.
  - theme functions change the overall look of the plot
  - order of layers matters!
  - ggsave to save a figure.
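The pieces listed above fit together roughly as follows. This is a sketch that assumes ggplot2 is installed; the data frame toy, the object p and the output path out_file are hypothetical.

```r
library(ggplot2)  # assumes the ggplot2 package is installed

toy <- data.frame(
  x      = rep(1:10, times = 2),
  y      = c((1:10)^2, 2 * (1:10)),
  series = rep(c("squared", "doubled"), each = 10)
)

p <- ggplot(toy, aes(x = x, y = y, color = series)) +  # base figure, plot-wide aesthetics
  geom_point() +            # first layer: points
  geom_line() +             # second layer: lines, drawn on top of the points
  scale_y_log10() +         # scale: change how y values map to positions
  facet_wrap(~ series) +    # facet: one panel per series
  theme_bw()                # theme: overall look of the plot

# Save the figure; width and height are in inches by default
out_file <- tempfile(fileext = ".png")
ggsave(out_file, plot = p, width = 6, height = 4)
```

Swapping the order of geom_point() and geom_line() changes which layer is drawn on top, which is what "order of layers matters" refers to.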
Glossary
- argument
- A value given to a function or program when it runs. The term is often used interchangeably (and inconsistently) with parameter.
- assign
- To give a value a name by associating a variable with it.
- body
- (of a function): the statements that are executed when a function runs.
- comment
- A remark in a program that is intended to help human readers understand what is going on, but is ignored by the computer. Comments in Python, R, and the Unix shell start with a # character and run to the end of the line; comments in SQL start with --, and other languages have other conventions.
- comma-separated values
- (CSV) A common textual representation for tables in which the values in each row are separated by commas.
- delimiter
- A character or characters used to separate individual values, such as the commas between columns in a CSV file.
- documentation
- Human-language text written to explain what software does, how it works, or how to use it.
- floating-point number
- A number containing a fractional part and an exponent. See also: integer.
- for loop
- A loop that is executed once for each value in some kind of set, list, or range. See also: while loop.
- index
- A subscript that specifies the location of a single value in a collection, such as a single pixel in an image.
- integer
- A whole number, such as -12343. See also: floating-point number.
- library
- In R, the directory(ies) where packages are stored.
- package
- A collection of R functions, data and compiled code in a well-defined format. Packages are stored in a library and loaded using the library() function.
- parameter
- A variable named in the function’s declaration that is used to hold a value passed into the call. The term is often used interchangeably (and inconsistently) with argument.
- return statement
- A statement that causes a function to stop executing and return a value to its caller immediately.
- sequence
- A collection of information that is presented in a specific order.
- shape
- An array’s dimensions, represented as a vector. For example, a 5×3 array’s shape is (5,3).
- string
- Short for “character string”, a sequence of zero or more characters.
- syntax error
- A programming error that occurs when statements are in an order or contain characters not expected by the programming language.
- type
- The classification of something in a program (for example, the contents of a variable) as a kind of number (e.g. floating-point number, integer), string, or something else. In R the command typeof() is used to query a variable’s type.
- while loop
- A loop that keeps executing as long as some condition is true. See also: for loop.
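Several glossary terms (parameter, argument, body, return statement, for loop, while loop, shape) can be seen together in a short sketch. The function sum_with_loop and all variable names are made up for illustration.

```r
# 'values' is a parameter; the statements between the braces are the body
sum_with_loop <- function(values) {
  total <- 0
  for (v in values) {      # for loop: runs once for each element
    total <- total + v
  }
  return(total)            # return statement: hands the value back to the caller
}

# c(1, 2, 3) is the argument passed into the call
result <- sum_with_loop(c(1, 2, 3))

# while loop: keeps running as long as the condition is true
count <- 0
while (count < 3) {
  count <- count + 1
}

# The shape of a 5x3 matrix, as reported by dim()
m <- matrix(0, nrow = 5, ncol = 3)
dim(m)
```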