Difference between revisions of "Ggplot2"

From 太極
Line 477:
== Coefficients, intervals, errorbars ==
* [https://stackoverflow.com/a/42560960 Plotting two models with regression coefficients] with [https://ggplot2.tidyverse.org/reference/geom_linerange.html geom_pointrange()] - Vertical intervals: lines, crossbars & errorbars.
* [https://stackoverflow.com/q/49483128 Grouping and staggering estimates with geom_point]
= Special plots =

Revision as of 14:24, 9 June 2020




The Grammar of Graphics

  • Data: Raw data that we'd like to visualize
  • Geometries: shapes that we use to visualize data
  • Aesthetics: Properties of geometries (size, color, etc)
  • Scales: Mapping from data values to aesthetics

Scatterplot aesthetics

geom_point(). The available aesthetics depend on the geom.

  • x, y
  • shape
  • color
  • size. It is not always necessary to put 'size' inside aes(). See an example at Legend layout.
  • alpha



> library(ggplot2)
Need help? Try Stackoverflow: https://stackoverflow.com/tags/ggplot2


Some examples

Examples from 'R for Data Science' book - Aesthetic mappings

ggplot(data = mpg) + 
  geom_point(mapping = aes(x = displ, y = hwy))
  # the 'mapping' is the 1st argument for all geom_* functions, so we can safely skip it.
# template
ggplot(data = <DATA>) + 
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))

# add another variable through color, size, alpha or shape
ggplot(data = mpg) + 
  geom_point(aes(x = displ, y = hwy, color = class))

ggplot(data = mpg) + 
  geom_point(aes(x = displ, y = hwy, size = class))

ggplot(data = mpg) + 
  geom_point(aes(x = displ, y = hwy, alpha = class))

ggplot(data = mpg) + 
  geom_point(aes(x = displ, y = hwy, shape = class))

ggplot(data = mpg) + 
  geom_point(aes(x = displ, y = hwy), color = "blue")

# add another variable through facets
ggplot(data = mpg) + 
  geom_point(aes(x = displ, y = hwy)) + 
  facet_wrap(~ class, nrow = 2)

# add another 2 variables through facets
ggplot(data = mpg) + 
  geom_point(aes(x = displ, y = hwy)) + 
  facet_grid(drv ~ cyl)

Examples from 'R for Data Science' book - Geometric objects, lines and smoothers

# Points
ggplot(data = mpg) + 
  geom_point(aes(x = displ, y = hwy)) # we can add color to aes()

# Line plot
ggplot() +
  geom_line(aes(x, y))  # we can add color to aes()

# Smoothed
ggplot(data = mpg) + 
  geom_smooth(aes(x = displ, y = hwy))

# Points + smoother, add transparency to points, remove se
# We add transparency when we want the smoothed line to stand out
#                    and the points to be less prominent
# We move aes to the 'mapping' argument of ggplot()
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) + 
  geom_point(alpha=1/10) +
  geom_smooth(se = FALSE)  # se = FALSE removes the confidence band

# Colored points + smoother
ggplot(data = mpg, aes(x = displ, y = hwy)) + 
  geom_point(aes(color = class)) + 
  geom_smooth()

Examples from 'R for Data Science' book - Transformation, bar plot

# y axis = counts
# bar plot
ggplot(data = diamonds) + 
  geom_bar(aes(x = cut))
# Or
ggplot(data = diamonds) + 
  stat_count(aes(x = cut))

# y axis = proportion
ggplot(data = diamonds) + 
  geom_bar(aes(x = cut, y = ..prop.., group = 1))

# bar plot with 2 variables
ggplot(data = diamonds) + 
  geom_bar(aes(x = cut, fill = clarity))

facet_wrap and facet_grid to create a panel of plots

Color palette

Color picker


Colour related aesthetics: colour, fill and alpha


Combine colors and shapes in legend

  • https://ggplot2-book.org/scales.html#scale-details In order for legends to be merged, they must have the same name.
    df <- data.frame(x = 1:3, y = 1:3, z = c("a", "b", "c"))
    ggplot(df, aes(x, y)) + geom_point(aes(shape = z, colour = z), size=4)
  • How to Work with Scales in a ggplot2 in R. This solution is better since it allows us to change the legend title. Just make sure the titles passed to both scale_* functions are the same.
    ggplot(mtcars, aes(x=hp, y=mpg)) +
       geom_point(aes(shape=factor(cyl), colour=factor(cyl))) +
       scale_shape_discrete("Cylinders") +
       scale_colour_discrete("Cylinders")  # same title, so the two legends merge

ggplot2::scale functions and scales packages

  • Scales control the mapping from data to aesthetics. They take your data and turn it into something that you can see, like size, colour, position or shape.
  • Scales also provide the tools that let you read the plot: the axes and legends.

ggplot2::scale - axes/axis, legend


Naming convention: scale_AestheticName_NameDataType where

  • AestheticName can be x, y, color, fill, size, shape, ...
  • NameDataType can be continuous, discrete, manual or gradient.
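As a minimal sketch of this naming convention, the following combines three scale functions on the built-in mtcars data. The axis titles and colour values are illustrative choices, not taken from the original page.

```r
library(ggplot2)

# scale_<AestheticName>_<NameDataType>: one scale function per mapped aesthetic
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  scale_x_continuous("Weight (1000 lbs)") +   # x aesthetic, continuous data
  scale_y_continuous("Miles per gallon") +    # y aesthetic, continuous data
  scale_colour_manual("Cylinders",            # colour aesthetic, manual values
                      values = c("4" = "black", "6" = "red", "8" = "blue"))
p
```

The first argument of each scale function is its name, which becomes the axis or legend title.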


  • See Figure 12.1: Axis and legend components in the book ggplot2: Elegant Graphics for Data Analysis
    # Set x-axis label
    scale_x_discrete("Car type")   # or a shortcut xlab() or labs()
    # Set legend title
    scale_colour_discrete("Drive\ntrain")    # or a shortcut labs()
    # Change the default color
    # Change the axis scale
    # Change breaks and their labels
    scale_x_continuous(breaks = c(2000, 4000), labels = c("2k", "4k"))
    # Relabel the breaks in a categorical scale
    scale_y_discrete(labels = c(a = "apple", b = "banana", c = "carrot"))
  • How to change the color in geom_point or lines in ggplot
    ggplot() + 
      geom_point(data = data, aes(x = time, y = y, color = sample),size=4) +
      scale_color_manual(values = c("A" = "black", "B" = "red"))
    ggplot(data = data, aes(x = time, y = y, color = sample)) + 
      geom_point(size=4) + 
      geom_line(aes(group = sample)) + 
      scale_color_manual(values = c("A" = "black", "B" = "red"))

ylim and xlim in ggplot2 in axes

https://stackoverflow.com/questions/3606697/how-to-set-limits-for-axes-in-ggplot2-r-plots or the Zooming part of the cheatsheet

Use one of the following

  • + scale_x_continuous(limits = c(-5000, 5000))
  • + coord_cartesian(xlim = c(-5000, 5000))
  • + xlim(-5000, 5000)
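The three options are not interchangeable: xlim() is a shortcut for setting the limits of the x scale, and scale limits discard out-of-range data before any statistics are computed, whereas coord_cartesian() only zooms the visible window. The sketch below illustrates this on the built-in mtcars data; the limits c(100, 200) are an arbitrary choice for the example.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = hp, y = mpg)) + geom_point()

# Scale limits: values outside the range are replaced by NA before plotting
p_scale <- base + scale_x_continuous(limits = c(100, 200))

# Coordinate limits: all data are kept, only the visible window changes
p_coord <- base + coord_cartesian(xlim = c(100, 200))

# Count the x values that were censored in each built plot
n_na_scale <- sum(is.na(layer_data(p_scale)$x))
n_na_coord <- sum(is.na(layer_data(p_coord)$x))
c(scale = n_na_scale, coord = n_na_coord)
```

This matters most for layers that summarise the data (smoothers, boxplots, histograms), where dropping points changes the computed result rather than just the view.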

Emulate ggplot2 default color palette

It is just equally spaced hues around the color wheel. Emulate ggplot2 default color palette

Answer 1

gg_color_hue <- function(n) {
  hues = seq(15, 375, length = n + 1)
  hcl(h = hues, l = 65, c = 100)[1:n]
}

n = 4
cols = gg_color_hue(n)

dev.new(width = 4, height = 4)
plot(1:n, pch = 16, cex = 2, col = cols)

Answer 2 (better, since it shows the color values in HEX). The grid should be read from left to right and then top to bottom.

scales package

library(scales)
show_col(hue_pal()(2)) # (salmon, iris blue) 
           # see https://www.htmlcsscolor.com/ for color names

transform scales

How to make that crazy Fox News y axis chart with ggplot2 and scales

Class variables

"Set1" is a good choice. See RColorBrewer::display.brewer.all()

Heatmap for single channel


# White <----> Blue
RColorBrewer::display.brewer.pal(n = 8, name = "Blues")

Heatmap for dual channels


library(RColorBrewer)  # display.brewer.pal(), brewer.pal()
library(scales)        # brewer_pal()

# Red <----> Blue
display.brewer.pal(n = 8, name = 'RdBu')
# Hexadecimal color specification 
brewer.pal(n = 8, name = "RdBu")

plot(1:8, col=brewer_pal(palette = "RdBu")(8), pch=20, cex=4)

# Blue <----> Red
plot(1:8, col=rev(brewer_pal(palette = "RdBu")(8)), pch=20, cex=4)


Themes and background for ggplot2

ggplot() + geom_bar(aes(x=, fill=y)) +
           theme(panel.background=element_rect(fill='purple')) + 

ggplot() + geom_bar(aes(x=, fill=y)) + 
           theme(panel.background=element_blank()) + 
           theme(plot.background=element_blank()) # minimal background like base R
           # the grid lines are not gone; they are white so it is the same as the background

ggplot() + geom_bar(aes(x=, fill=y)) + 
           theme(panel.background=element_blank()) + 
           theme(plot.background=element_blank()) +
           theme(panel.grid.major.y = element_line(color="grey"))
           # draw grid line on y-axis only

ggplot() + geom_bar() +

ggplot() + geom_bar() +

ggplot() + geom_bar() +

ggplot() + geom_bar() +


ggthemr package



Font size

Change Font Size of ggplot2 Plot in R (5 Examples) | Axis Text, Main Title & Legend

Rotate x-axis labels

theme(axis.text.x = element_text(angle = 90))

ggthemes package


ggplot() + geom_bar() +
           theme_solarized()   # sun color in the background


Common plots


Line plots


A histogram is a special case of a bar plot. Instead of drawing each unique value as a bar, a histogram groups nearby data points into bins.

ggplot(data = txhousing, aes(x = median)) +
  geom_histogram()  # add 'boundary = 0' if we don't expect negative values
                    #   (this replaces the deprecated 'origin' argument)
                    # add 'bins = 10' to adjust the number of bins
                    # add 'binwidth = 10' to adjust the bin width

Histogram vs barplot from deeply trivial.

Boxplot with jittering

What is a boxplot

# Only 1 variable
ggplot(data.frame(Wi), aes(y = Wi)) + 

# Two variable, one of them is a factor
ggplot() + geom_jitter(mapping = aes(x, y))

# Box plot
ggplot() + geom_boxplot(mapping = aes(x, y))
# df2 is n x 2 
ggplot(df2, aes(x=nboot, y=boot)) +
  geom_boxplot(outlier.shape=NA) + #avoid plotting outliers twice
  geom_jitter(aes(color=nboot), position=position_jitter(width=.2, height=0, seed=1)) +
  labs(title="", y = "", x = "nboot")

If we omit the outlier.shape=NA option in geom_boxplot(), each outlier is drawn twice: once by geom_boxplot() and once by geom_jitter().


Violin plot

ggplot(midwest, aes(state, area)) + geom_violin() + ggforce::geom_sina()


Kernel density plot

Bivariate analysis with ggpair

Correlation in R: Pearson & Spearman with Matrix Example


How to basic: bar plots

Ordered barplot and facet

ggplot(df, aes(x=reorder(x, -y), y=y)) + geom_bar(stat = 'identity')

ggplot(df, aes(x=reorder(x, desc(y)), y=y)) + geom_col()  # desc() comes from dplyr
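A small runnable sketch of the ordered barplot follows. The data frame df is hypothetical; it only serves to show how reorder() changes the factor levels that ggplot2 uses for the x axis.

```r
library(ggplot2)

# Hypothetical data: three categories with unsorted values
df <- data.frame(x = c("a", "b", "c"), y = c(2, 5, 3))

# reorder() turns x into a factor whose levels are sorted by -y,
# so the category with the largest y comes first
ordered_x <- reorder(df$x, -df$y)
levels(ordered_x)

p <- ggplot(df, aes(x = reorder(x, -y), y = y)) + geom_col()
p
```

Using reorder(x, y) without the minus sign gives bars in increasing order instead.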

Back to back barplot

Flip x and y axes


Polygon and map plot


Step function

Connect observations: geom_path(), geom_step()

Example: KM curves (without legend)

library(survival)
sf <- survfit(Surv(time, status) ~ x, data = aml)
str(sf) # the first 10 time points form one stratum and the remaining 10 form the other
ggplot() + 
  geom_step(aes(x=c(0, sf$time[1:10]), y=c(1, sf$surv[1:10])), 
            col='red') + 
  scale_x_continuous('Time', limits = c(0, 161)) + 
  scale_y_continuous('Survival probability', limits = c(0, 1)) +
  geom_step(aes(x=c(0, sf$time[11:20]), y=c(1, sf$surv[11:20])), 
            col='black')
# cf:  plot(sf, col = c('red', 'black'), mark.time=FALSE)

Same example but with legend (see Construct a manual legend for a complicated plot)

cols <- c("NEW"="#f04546","STD"="#3591d1")
ggplot() + 
  geom_step(aes(x=c(0, sf$time[1:10]), y=c(1, sf$surv[1:10]), col='NEW')) +
  scale_x_continuous('Time', limits = c(0, 161)) + 
  scale_y_continuous('Survival probability', limits = c(0, 1)) +
  geom_step(aes(x=c(0, sf$time[11:20]), y=c(1, sf$surv[11:20]), col='STD')) + 
  scale_colour_manual(name="Treatment", values = cols)
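An alternative sketch, not from the original page, avoids hard-coding the indices 1:10 and 11:20 by expanding the strata component of the survfit object into a grouping column. The data frame name km and its column names are assumptions made for this example.

```r
library(survival)
library(ggplot2)

sf <- survfit(Surv(time, status) ~ x, data = aml)

# sf$strata holds the number of time points per group;
# repeat each group name that many times to label every row
km <- data.frame(
  time   = sf$time,
  surv   = sf$surv,
  strata = rep(names(sf$strata), times = sf$strata)
)

# Add the starting point (time 0, survival 1) for every group
start <- data.frame(time = 0, surv = 1, strata = names(sf$strata))
km <- rbind(start, km)

p <- ggplot(km, aes(x = time, y = surv, colour = strata)) +
  geom_step() +
  scale_x_continuous("Time") +
  scale_y_continuous("Survival probability", limits = c(0, 1)) +
  labs(colour = "Treatment")
p
```

Because the group is mapped to colour inside aes(), ggplot2 builds the legend automatically and the code keeps working if the number of time points per stratum changes.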

Coefficients, intervals, errorbars

Special plots

Bump plot: plot ranking over time


Gauge plots

Generating gauge plots in ggplot2




GUI/Helper packages

ggedit & ggplotgui – interactive ggplot aesthetic and theme editor

esquisse (French, means 'sketch'): creating ggplot2 interactively


A 'shiny' gadget to create 'ggplot2' charts interactively with drag-and-drop to map your variables. You can quickly visualize your data according to their type, export to 'PNG' or 'PowerPoint', and retrieve the code to reproduce the chart.

The interface introduces basic terms used in ggplot2:

  • x, y,
  • fill (useful for geom_bar, geom_rect, geom_boxplot, & geom_raster, not useful for scatterplot),
  • color (edges for geom_bar, geom_line, geom_point),
  • size,
  • facet, split up your data by one or more variables and plot the subsets of data together.

It does not include all features in ggplot2. At the bottom of the interface,

  • Labels & title & caption.
  • Plot options. Palette, theme, legend position.
  • Data. Remove subset of data.
  • Export & code. Copy/save the R code. Export file as PNG or PowerPoint.





R → plotly

ggconf: Simpler Appearance Modification of 'ggplot2'


Plotting individual observations and group means



Easy way to mix multiple graphs on the same page


Force a regular plot object into a Grob for use in grid.arrange

gridGraphics package

make one panel blank/create a placeholder


labs for x and y axes

x and y labels

https://stackoverflow.com/questions/10438752/adding-x-and-y-axis-labels-in-ggplot2 or the Labels part of the cheatsheet

You can set the labels with xlab() and ylab(), or make them part of the scale_*_*() call.

labs(x = "sample size", y = "ngenes (glmnet)")

scale_x_discrete(name="sample size")
scale_y_continuous(name="ngenes (glmnet)", limits=c(100, 500))

name-value pairs

See several examples (color, fill, size, ...) from opioid prescribing habits in texas.

Prevent sorting of x labels

See Change the order of a discrete x scale.

The idea is to set the levels of x variable.

junk   # n x 2 table
colnames(junk) <- c("gset", "boot")
junk$gset <- factor(junk$gset, levels = as.character(junk$gset))
ggplot(data = junk, aes(x = gset, y = boot, group = 1)) + 
  geom_line() + 
  theme(axis.text.x=element_text(color = "black", angle=30, vjust=.8, hjust=0.8))
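The same idea as a self-contained sketch with made-up data. The labels and values below are hypothetical; unique() is used here as a precaution because factor() raises an error when the levels contain duplicates.

```r
library(ggplot2)

# Hypothetical data whose labels are not in alphabetical order
junk <- data.frame(gset = c("small", "medium", "large"), boot = c(3, 7, 5))

# Without this step the x axis would be sorted as large, medium, small
junk$gset <- factor(junk$gset, levels = unique(junk$gset))

p <- ggplot(junk, aes(x = gset, y = boot, group = 1)) + geom_line()
p
```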


Legend title

  • labs() function
    p <- ggplot(df, aes(x, y)) + geom_point(aes(colour = z))
    p + labs(x = "X axis", y = "Y axis", colour = "Colour\nlegend")
  • scale_colour_manual()
    scale_colour_manual("Treatment", values = c("black", "red"))
  • scale_color_discrete() and scale_shape_discrete(). See Combine colors and shapes in legend.
    df <- data.frame(x = 1:3, y = 1:3, z = c("a", "b", "c"))
    ggplot(df, aes(x, y)) + geom_point(aes(shape = z, colour = z), size=5) + 
      scale_color_discrete('new title') + scale_shape_discrete('new title')

Layout: move the legend from right to top/bottom of the plot or hide it

gg + theme(legend.position = "top")

gg + theme(legend.position="none")

Guide functions for finer control

https://ggplot2-book.org/scales.html#guide-functions The guide functions, guide_colourbar() and guide_legend(), offer additional control over the fine details of the legend.

guide_legend() allows the modification of legends for scales, including fill, color, and shape.

This function can be used in scale_fill_manual(), scale_fill_continuous(), ... functions.

scale_fill_manual(values=c("orange", "blue"), 
                  guide=guide_legend(title = "My Legend Title",
                                     nrow=1,  # multiple items in one row
                                     label.position = "top", # move the texts on top of the color key
                                     keywidth=2.5)) # increase the color key width

The problem with the default setting is it leaves a lot of white space above and below the legend. To change the position of the entire legend to the bottom of the plot, we use theme().

theme(legend.position = 'bottom')

Legend symbol background

ggplot() + geom_point(aes(x, y, color, size)) +
           theme(legend.key = element_blank())
           # remove the symbol background in legend

Construct a manual legend for a complicated plot



Centered title

See the Legends part of the cheatsheet.

ggtitle("MY TITLE") +
  theme(plot.title = element_text(hjust = 0.5))


ggtitle("My title",
        subtitle = "My subtitle")



Time series plot

Multiple lines plot https://stackoverflow.com/questions/14860078/plot-multiple-lines-data-series-each-with-unique-color-in-r

nc <- 9
df <- data.frame(x=rep(1:5, nc), val=sample(1:100, 5*nc), 
                   variable=rep(paste0("category", 1:nc), each=5))
# plot
# http://colorbrewer2.org/#type=qualitative&scheme=Paired&n=9
ggplot(data = df, aes(x=x, y=val)) + 
    geom_line(aes(colour=variable)) + 
    scale_colour_manual(values=c("#a6cee3", "#1f78b4", "#b2df8a", "#33a02c", "#fb9a99", "#e31a1c", "#fdbf6f", "#ff7f00", "#cab2d6"))

Versus the old-fashioned base R approach

dat <- matrix(runif(40,1,20),ncol=4) # make data
matplot(dat, type = c("b"),pch=1,col = 1:4) #plot
legend("topleft", legend = 1:4, col=1:4, pch=1) # optional legend

Github style calendar plot

geom_bar(), geom_col(), stat_count()


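By default geom_bar() counts the rows in each category (stat = "count"), so it does not take a y aesthetic. To supply the bar heights yourself, use geom_col(), which is equivalent to geom_bar(stat = "identity").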

ggplot() + geom_col(mapping = aes(x, y))
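To make the relationship between the two geoms concrete, here is a short sketch using the mpg data that ships with ggplot2. The helper data frame counts is an assumption introduced for this example.

```r
library(ggplot2)

# geom_bar() counts the rows in each category itself
p_bar <- ggplot(mpg, aes(x = class)) + geom_bar()

# geom_col() expects the bar heights to be supplied already
counts <- as.data.frame(table(class = mpg$class))
p_col <- ggplot(counts, aes(x = class, y = Freq)) + geom_col()

# geom_bar(stat = "identity") behaves like geom_col()
p_identity <- ggplot(counts, aes(x = class, y = Freq)) +
  geom_bar(stat = "identity")
```

All three plots show the same bars; the difference is only whether ggplot2 or the user does the counting.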

Add numbers to the plot

An example


Line segments, arrows and curves

Use geom_line() to create a square bracket to annotate the plot

Barchart with Significance Tests

geom_errorbar(): error bars

x <- rnorm(10)
SE <- rnorm(10)
y <- 1:10

xlim <- c(-4, 4)
plot(x[1:5], 1:5, xlim=xlim, ylim=c(0+.1,6-.1), yaxs="i", xaxt = "n", ylab = "", pch = 16, las=1)
mtext("group 1", 4, las = 1, adj = 0, line = 1) # las=text rotation, adj=alignment, line=spacing
plot(x[6:10], 6:10, xlim=xlim, ylim=c(5+.1,11-.1), yaxs="i", ylab ="", pch = 16, las=1, xlab="")
arrows(x[6:10]-SE[6:10], 6:10, x[6:10]+SE[6:10], 6:10, code=3, angle=90, length=0)
mtext("group 2", 4, las = 1, adj = 0, line = 1)
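The code above uses base R graphics. A rough ggplot2 counterpart with geom_errorbar() might look like the sketch below. The data frame est and its columns are hypothetical, and the standard errors are forced to be non-negative, which the simulated SE above does not guarantee.

```r
library(ggplot2)

set.seed(1)
# Hypothetical estimates with standard errors for two groups
est <- data.frame(
  id    = factor(1:10),
  x     = rnorm(10),
  se    = abs(rnorm(10)),   # standard errors must be non-negative
  group = rep(c("group 1", "group 2"), each = 5)
)

p <- ggplot(est, aes(x = id, y = x, colour = group)) +
  geom_point() +
  # ymin/ymax define the ends of each interval; width sets the cap size
  geom_errorbar(aes(ymin = x - se, ymax = x + se), width = 0.2)
p
```

geom_pointrange() and geom_linerange() accept the same ymin and ymax aesthetics if caps are not wanted.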


geom_rect(), geom_bar()

Note that we can use scale_fill_manual() to change the 'fill' colors (scheme/palette). The 'fill' parameter in geom_rect() is only used to define the discrete variable.

ggplot(data=) +
  geom_bar(aes(x=, fill=)) +
  scale_fill_manual(values = c("orange", "blue"))


geom_hline(), geom_vline()


text annotations: ggrepel package

  • https://ggplot2-book.org/annotations.html
    annotate("text", label="Toyota", x=3, y=100)
    geom_text(aes(x, y, label), data, size, vjust, hjust, nudge_x)
  • Use the nudge_y parameter to avoid the overlap of the point and the text such as
    ggplot() + geom_point() +
               geom_text(aes(x, y, label), color='red', data, nudge_y=1)
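A runnable version of the nudge_y idea, using the first five rows of mtcars. The column name model and the nudge of 0.3 are choices made for this sketch.

```r
library(ggplot2)

# Label a few cars from the built-in mtcars data
cars <- head(mtcars, 5)
cars$model <- rownames(cars)

p <- ggplot(cars, aes(x = wt, y = mpg)) +
  geom_point() +
  # nudge_y shifts each label upwards by 0.3 units on the y axis
  geom_text(aes(label = model), colour = "red", nudge_y = 0.3)
p
```

When many labels still collide, geom_text_repel() from the ggrepel package is an alternative that places the labels automatically.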


ggtext: Improved text rendering support for ggplot2


Adding Custom Fonts to ggplot in R

Save the plots

ggsave(): we can specify dpi to increase the resolution. For example,

g1 <- ggplot(data = mydf) 
ggsave("myfile.png", g1, height = 7, width = 8, units = "in", dpi = 500)
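A self-contained sketch of the same call, assuming a PNG graphics device is available on the machine. It writes to a temporary file and uses mtcars in place of the unspecified mydf.

```r
library(ggplot2)

g1 <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# Write to a temporary file so the example leaves nothing behind
out <- tempfile(fileext = ".png")

# 8 x 7 inches at 300 dpi gives an image of 2400 x 2100 pixels
ggsave(out, g1, width = 8, height = 7, units = "in", dpi = 300)
file.exists(out)
```

For raster formats the pixel dimensions are the size in inches multiplied by dpi, so raising dpi increases the resolution without changing the layout.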

ggsave() relies on the svglite package for SVG output. I got an error - Error in loadNamespace(name) : there is no package called ‘svglite’. After I installed the package, everything worked fine.


smoothScatter with ggplot2


Add your brand to ggplot graph

You Need to Start Branding Your Graphs. Here's How, with ggplot!

Animation and gganimate

Write your own ggplot2 function: rlang

How to make your own ggplot2 functions


plotnine: A Grammar of Graphics for Python.

plotnine is an implementation of a grammar of graphics in Python, based on ggplot2. The grammar allows users to compose plots by explicitly mapping data to the visual objects that make up the plot.

The Hitchhiker’s Guide to Plotnine