Instructor Notes

Dataset


The data used for this lesson are a slightly cleaned-up version of the SAFI Survey Results available on GitHub. The original data are on figshare.

This lesson uses SAFI_clean.csv. The direct download link for the data file is: https://raw.githubusercontent.com/datacarpentry/r-socialsci/main/episodes/data/SAFI_clean.csv.

Lesson Plans


The lesson contains much more material than can be taught in a day. Instructors will need to pick an appropriate subset of episodes to use in a standard one-day course.

Suggested path for half-day course:

  • Before we Start
  • Introduction to R
  • Starting with Data

Suggested path for full-day course:

  • Before we Start
  • Introduction to R
  • Starting with Data
  • Data Wrangling with dplyr
  • (OPTIONAL) Data Wrangling with tidyr
  • Data Visualisation with ggplot2

For a two-day workshop, it may be possible to cover all of the episodes. Feedback from the community on successful lesson plans is always appreciated!

Technical Tips and Tricks


Show how to use the ‘zoom’ button to blow up graphs without constantly resizing windows.

Sometimes a package will not install. You can try a different CRAN mirror:

  • Tools > Global Options > Packages > CRAN Mirror

Alternatively, you can download the package from CRAN and install it from the ZIP file:

  • Tools > Install Packages > set to ‘from Zip/TAR’

It’s often easier to make sure learners have all the needed packages installed at one time, rather than deal with these issues over and over. See the “Setup instructions” section on the homepage of the course website for package installation instructions.
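A quick way to check for missing packages in one pass is sketched below. The package names in `needed` are placeholders assumed for illustration, not the lesson's official list; substitute the packages named in the setup instructions.

```r
# Hypothetical package list: replace with the packages from the setup instructions.
needed <- c("tidyverse", "here")

# Names of all packages installed in the current library paths.
installed <- rownames(installed.packages())

# Packages that still need installing (character(0) if nothing is missing).
missing_pkgs <- setdiff(needed, installed)
print(missing_pkgs)

# Uncomment to install only what is missing:
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
```

Learners can run this before the workshop and report anything that `missing_pkgs` still lists.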

| character on Spanish keyboards: The Spanish Mac keyboard does not have a | key. This character can be created using:

`alt` + `1`

Other Resources


If you encounter a problem during a workshop, feel free to contact the maintainers by email or open an issue.

For more in-depth coverage of the workshop topics, you may want to read “R for Data Science” by Hadley Wickham and Garrett Grolemund.

Before we Start


Slides notes

  • It’s important to let people know they should have the orientation document open at this point, as well as the etherpad.

  • The etherpad will be used rather than asking people questions. It is a collaborative document that is semi-public, so please don’t share confidential information. If you have it open, please add an answer to the question in day 1.

  • Mention two sources for the course material, with optional lessons to be found on the official carpentry repository.

  • Mention the use of sticky notes as well.



Instructor Note

  • The main goal here is to help the learners be comfortable with the RStudio interface.
  • Go very slowly in the “Getting set up” section. Make sure that learners are in the correct working directory, and that they create a data (all lowercase) subfolder.


Instructor Note

  • At this point you may want to show in the file explorer where the project directory is and where the script.R file is. You can also show how to open the project again by double-clicking on the .Rproj file. Make it extra clear that R interacts with locations on your computer, the same ones shown in the file explorer.
  • Highlight the importance of saving your script and project often, and that you should always save your script before closing RStudio. If you don’t, you will lose all the work you have done since the last time you saved.


Instructor Note

  • Show the file pane and the console pane and how they interact with the working directory. You can also show how to check the working directory with getwd() and how to change it with setwd(). Emphasize that learners should avoid using setwd() in their scripts and instead use RStudio projects to manage the working directory. You can also show how to set the working directory in RStudio by going to Session -> Set Working Directory -> To Source File Location. This sets the working directory to the location of the script file, which is useful if you have a consistent folder structure across your projects.
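For a live demonstration of getwd() and setwd(), something like the following sketch may be useful. The use of tempdir() as the destination is an arbitrary choice for illustration; any existing folder would do.

```r
# Remember the current working directory so it can be restored afterwards.
original_dir <- getwd()
print(original_dir)

# Change to a temporary directory (an arbitrary destination for this demo).
setwd(tempdir())
changed_dir <- getwd()
print(changed_dir)

# Restore the original working directory.
setwd(original_dir)
```

Restoring the original directory at the end keeps the demonstration from affecting the relative paths used later in the episode.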


Introduction to R


Instructor Note

  • The main goal is to introduce users to the various objects in R, from atomic types to creating your own objects.
  • While this episode is foundational, be careful not to get caught in the weeds, as the variety of types and operations can be overwhelming for new users, especially before they understand how this fits into their own “workflow.”


Note on

Learners sometimes type x<-5 intending a logical test “is x less than -5?”.
In R, x<-5 is parsed as an assignment because <- is a single token.

To resolve this, encourage learners to put spaces around operators or to use parentheses to avoid ambiguity:

  • Logical test: x < -5 (note the space between “<” and the “-” negative)

  • Assignment: x <- 5

  • Alternative for clarity: x < (-5) (explicit negative value)
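The difference can be shown at the console with a short sketch like this one. The starting value of 10 and the name `is_less` are arbitrary choices made for the demonstration.

```r
x <- 10

# No spaces: parsed as the assignment x <- 5, so x is silently overwritten.
x<-5
x
#> [1] 5

# With a space between "<" and "-", this is a comparison and x is unchanged.
is_less <- x < -5
is_less
#> [1] FALSE

# Parentheses make the negative value explicit.
x < (-5)
#> [1] FALSE
```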



Choose how to teach this section

The section on generative AI is intended to be concise but Instructors may choose to devote more time to the topic in a workshop. Depending on your own level of experience and comfort with talking about and using these tools, you could choose to do any of the following:

  • Explain how large language models work and are trained, and/or the difference between generative AI, other forms of AI that currently exist, and the limits of what LLMs can do (e.g., they can’t “reason”).
  • Demonstrate how you recommend that learners use generative AI.
  • Discuss the ethical concerns listed below, as well as others that you are aware of, to help learners make an informed choice about whether or not to use generative AI tools.

This is a fast-moving technology. If you are preparing to teach this section and you feel it has become outdated, please open an issue on the lesson repository to let the Maintainers know and/or a pull request to suggest updates and improvements.



Intro to Quarto (Optional)


Instructor Note

At this point you may want to explain the different chunk options used in the above code chunk, and what they do. You can also explain that there are many more options available, and that you can find them in the Quarto documentation. Since we have not yet covered how to import data, it would be good to move on to the next episode before explaining the code in the chunk, and then come back to it to explain the chunk options.



Instructor Note

From now on you can advise learners to use a Quarto document for the subsequent episodes. You can briefly discuss the possibilities of Quarto (see next sections), but not in detail.



Starting with Data


Instructor Note

The main goals for this lesson are:

  • To make sure that learners are comfortable with working with data frames, and can use the bracket notation to select slices/columns.
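A few bracket-notation patterns worth checking with learners are sketched below. The `households` data frame is made up for illustration and is not read from SAFI_clean.csv.

```r
# Made-up data for illustration only; not rows from SAFI_clean.csv.
households <- data.frame(
  village = c("A", "B", "A", "C"),
  no_membrs = c(3, 7, 10, 4),
  years_liv = c(4, 9, 15, 6)
)

households[1, 2]          # single value: row 1, column 2
households[1:2, ]         # first two rows, all columns
households[, "village"]   # one column, returned as a vector
households["village"]     # one column, returned as a data frame

# Rows that meet a condition, keeping only two columns.
large_households <- households[households$no_membrs > 5, c("village", "no_membrs")]
large_households
```

The contrast between `households[, "village"]` and `households["village"]` is a common source of confusion, so it is worth showing both side by side.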


Instructor Note

Demonstrate how to import data using the point-and-click interface and show the code that is generated in the console.



Data Wrangling with dplyr


Instructor Note

  • This lesson works better if you have graphics demonstrating dplyr commands. You can modify this Google Slides deck and use it for your workshop.
  • For this lesson make sure that learners are comfortable using pipes.
  • There is also sometimes some confusion about what the arguments of group_by() should be, and about when to use filter() and when to use select().
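If that confusion comes up, a small side-by-side comparison can help. The sketch below assumes the dplyr package is installed and uses a made-up `households` data frame rather than the lesson data.

```r
library(dplyr)

# Made-up data for illustration only; not rows from SAFI_clean.csv.
households <- data.frame(
  village = c("A", "B", "A", "C", "B"),
  no_membrs = c(3, 7, 10, 4, 6),
  years_liv = c(4, 9, 15, 6, 20)
)

# select() chooses columns by name; the rows are unchanged.
households %>% select(village, no_membrs)

# filter() chooses rows that satisfy a condition; the columns are unchanged.
households %>% filter(no_membrs > 5)

# group_by() takes the column(s) that define the groups;
# summarise() then returns one row per group.
village_summary <- households %>%
  group_by(village) %>%
  summarise(mean_membrs = mean(no_membrs), n_households = n())

village_summary
```

A useful way to phrase it for learners: select() works on columns, filter() works on rows, and group_by() names the column whose distinct values define the groups.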


Data Visualisation with ggplot2


Instructor Note

  • This episode is a broad overview of ggplot2 and focuses on (1) getting familiar with the layering system of ggplot2, (2) using the argument group in the aes() function, (3) basic customization of the plots.
  • The episode depends on data created in the Data Wrangling with tidyr episode. If you did not get to or through all of the tidyr episode, you can have the learners access the data by either downloading it or quickly creating it using the tidyr code below. You will probably want to copy the code into the Etherpad.
  • If you did skip the tidyr episode, you might want to go over the exporting data section in that episode.


Data Wrangling with tidyr