Exploring Data Frames

Last updated on 2024-12-10 | Edit this page

Overview

Questions

  • How can I manipulate a data frame?

Objectives

  • Add and remove rows or columns.
  • Append two data frames.
  • Display basic properties of data frames including size and class of the columns, names, and first few rows.

At this point, you’ve seen it all: in the last lesson, we toured all the basic data types and data structures in R. Everything you do will be a manipulation of those tools. But most of the time, the star of the show is the data frame—the table that we created by loading information from a csv file. In this lesson, we’ll learn a few more things about working with data frames.

Understanding our Data frame


Let’s ask some questions about this data frame to understand more about its structure:

R

str(weather)

OUTPUT

'data.frame':	3 obs. of  3 variables:
 $ island     : chr  "torgersen" "biscoe" "dream"
 $ temperature: num  1.6 1.5 -2.6
 $ snowfall   : int  0 0 1

R

summary(weather)

OUTPUT

    island           temperature         snowfall
 Length:3           Min.   :-2.6000   Min.   :0.0000
 Class :character   1st Qu.:-0.5500   1st Qu.:0.0000
 Mode  :character   Median : 1.5000   Median :0.0000
                    Mean   : 0.1667   Mean   :0.3333
                    3rd Qu.: 1.5500   3rd Qu.:0.5000
                    Max.   : 1.6000   Max.   :1.0000  

R

ncol(weather)

OUTPUT

[1] 3

R

nrow(weather)

OUTPUT

[1] 3

R

dim(weather)

OUTPUT

[1] 3 3

R

colnames(weather)

OUTPUT

[1] "island"      "temperature" "snowfall"   

R

head(weather)

OUTPUT

     island temperature snowfall
1 torgersen         1.6        0
2    biscoe         1.5        0
3     dream        -2.6        1

R

typeof(weather)

OUTPUT

[1] "list"

Adding columns and rows in data frames


We already learned that the columns of a data frame are vectors, so that our data are consistent in type throughout the columns. As such, if we want to add a new column, we can start by making a new vector:

R

# sunshine hours
sun <- c(10, 11, 12)
weather

OUTPUT

     island temperature snowfall
1 torgersen         1.6        0
2    biscoe         1.5        0
3     dream        -2.6        1

We can then add this as a column via:

R

cbind(weather, sun)

OUTPUT

     island temperature snowfall sun
1 torgersen         1.6        0  10
2    biscoe         1.5        0  11
3     dream        -2.6        1  12

Note that if we tried to add a vector of sunshine hours with a different number of entries than the number of rows in the data frame, it would fail:

R

sun <- c(10, 11, 12, 13)
cbind(weather, sun)

ERROR

Error in data.frame(..., check.names = FALSE): arguments imply differing number of rows: 3, 4

R

sun <- c(10, 11)
cbind(weather, sun)

ERROR

Error in data.frame(..., check.names = FALSE): arguments imply differing number of rows: 3, 2

Why didn’t this work? Of course, R wants to see one element in our new column for every row in the table:

R

nrow(weather)

OUTPUT

[1] 3

R

length(sun)

OUTPUT

[1] 2

So for it to work we need to have nrow(weather) = length(sun). Let’s overwrite the content of weather with our new data frame.

R

sun <- c(10, 11, 12)
weather <- cbind(weather, sun)

Now how about adding rows? We already know that the rows of a data frame are lists:

R

new_row <- list("deception", -3.45, TRUE, 13)
weather <- rbind(weather, new_row)

Let’s confirm that our new row was added correctly.

R

weather

OUTPUT

     island temperature snowfall sun
1 torgersen        1.60        0  10
2    biscoe        1.50        0  11
3     dream       -2.60        1  12
4 deception       -3.45        1  13

Removing rows


We now know how to add rows and columns to our data frame in R. Now let’s learn to remove rows.

R

weather

OUTPUT

     island temperature snowfall sun
1 torgersen        1.60        0  10
2    biscoe        1.50        0  11
3     dream       -2.60        1  12
4 deception       -3.45        1  13

We can ask for a data frame minus the last row:

R

weather[-4, ]

OUTPUT

     island temperature snowfall sun
1 torgersen         1.6        0  10
2    biscoe         1.5        0  11
3     dream        -2.6        1  12

Notice the comma with nothing after it to indicate that we want to drop the entire fourth row.

Note: we could also remove several rows at once by putting the row numbers inside of a vector, for example: weather[c(-3,-4), ]

Removing columns


We can also remove columns in our data frame. What if we want to remove the column “sun”. We can remove it in two ways, by variable number or by index.

R

weather[,-4]

OUTPUT

     island temperature snowfall
1 torgersen        1.60        0
2    biscoe        1.50        0
3     dream       -2.60        1
4 deception       -3.45        1

Notice the comma with nothing before it, indicating we want to keep all of the rows.

Alternatively, we can drop the column by using the index name and the %in% operator. The %in% operator goes through each element of its left argument, in this case the names of weather, and asks, “Does this element occur in the second argument?”

R

drop <- names(weather) %in% c("sun")
weather[,!drop]

OUTPUT

     island temperature snowfall
1 torgersen        1.60        0
2    biscoe        1.50        0
3     dream       -2.60        1
4 deception       -3.45        1

Appending to a data frame


The key to remember when adding data to a data frame is that columns are vectors and rows are lists. We can also glue two data frames together with rbind:

R

weather <- rbind(weather, weather)
weather

OUTPUT

     island temperature snowfall sun
1 torgersen        1.60        0  10
2    biscoe        1.50        0  11
3     dream       -2.60        1  12
4 deception       -3.45        1  13
5 torgersen        1.60        0  10
6    biscoe        1.50        0  11
7     dream       -2.60        1  12
8 deception       -3.45        1  13

Saving Our Data Frame


We can use the write.table function to save our new data frame:

R

write.table(
  weather,
  file="results/prepared_weather.csv",
  sep=",", 
  quote=FALSE, 
  row.names=FALSE
)

Challenge 1

You can create a new data frame right from within R with the following syntax:

R

df <- data.frame(id = c("a", "b", "c"),
                 x = 1:3,
                 y = c(TRUE, TRUE, FALSE))

Make a data frame that holds the following information for yourself:

  • first name
  • last name
  • lucky number

Then use rbind to add an entry for the people sitting beside you. Finally, use cbind to add a column with each person’s answer to the question, “Is it time for coffee break?”

R

df <- data.frame(first = c("Grace"),
                 last = c("Hopper"),
                 lucky_number = c(0))
df <- rbind(df, list("Marie", "Curie", 238) )
df <- cbind(df, coffeetime = c(TRUE,TRUE))

Key Points

  • Use cbind() to add a new column to a data frame.
  • Use rbind() to add a new row to a data frame.
  • Use str(), summary(), nrow(),ncol(),dim(),colnames(),head(), andtypeof()` to understand the structure of a data frame.
  • Read in a csv file using read.csv().
  • Understand what length() of a data frame represents.