Tips
Here’s a few general tips. In addition, we strongly recommend using popular cheatsheets, which give a quick and easy reference for common packages and functions, and from Data to Viz, which guides you through choosing a visualisation.
Hotkeys
Code | Hotkey | Description |
---|---|---|
Ctrl+Enter | Run current line (when in Script) | |
<- |
Alt+Enter | Assignment |
%>% |
Ctrl+Shift+M | Pipe |
Esc | Cancel current operation (when in Console) | |
F1 | Help documentation for selected function |
Data manipulation
Importing and exporting data
In case you’ve forgotten, use the read.csv()
function to import data:
<- read.csv("data/dataset.csv") dataset
If you’d like to export any files from R to “.csv”, use write.csv()
write.csv(dataset, "data/output_name.csv")
Initial exploration
You’ll want to explore the data to start with - below are a few functions to get started.
Function | Example | Description |
---|---|---|
names() |
names(dataset) |
Returns the variable names |
str() |
str(dataset) |
Returns the structure of the dataset (variable names, types and first entries) |
$ |
dataset$variable |
Returns a specific variable |
unique() |
unique(dataset$variable) |
Returns the unique values of a variable |
summary() |
summary(dataset$variable) |
Returns a statistical summary of a variable |
Removing NA
s
We can use the dplyr
package to remove rows which have NA:
library(dplyr)
<- dataset %>%
dataset filter(!is.na(variable_to_check_for_NAs))
We use the exclamation mark
!
to negate the result, becauseis.na
returns all the rows that areNA
.
Time series data
If you’ve picked a dataset with time-series data (e.g. a “date” variable), you should transform that variable so that it visualises better:
$variable <- as.Date(dataset$variable) dataset
Categorical and ordered data
If you’re dealing with categorical data, it can be helpful to tell R that it has levels:
$variable <- factor(dataset$variable) dataset
To manually specify the order to R, send in an ordered list of the levels joined with c()
:
$variable <- factor(dataset$variable, levels = c("first_val", "second_val", ... )) dataset
This is particularly useful for the Coffee survey dataset.
Alternatively, if you only need to specify the first (reference) level, use
$variable <- factor(dataset$variable)
dataset$variable <- relevel(dataset$variable, ref = "reference_level") dataset
Renaming variables
Some datasets have cumbersome names for their variables. We can change variable names with
<- df %>%
df rename(new_name = old_name)
This is particularly useful for the World population dataset.
Visualisation
We use the ggplot()
function with geometries to create visualisations
library(ggplot2)
ggplot(data = dataset,
mapping = aes(x = ..., y = ..., colour = ..., ...)) +
geom_first_layer() +
geom_second_layer() +
...
Take a look at the ggplot2 documentation for more information.
Plotly workaround
If you’re having issues using ggplotly
(it’s producing a blank plot), you can use this workaround to view it in your browser.
<- ggplotly(saved_ggplot_image)
plot ::saveWidget(as_widget(plot), "plots/name_of_plot.html") htmlwidgets
Opening that file will show you the image.