How To Make Scatterplot with Marginal Histograms in R?

In this article, we will discuss how to make a Scatterplot with Marginal Histograms in the R Language.

Marginal Histograms in R:

A scatter plot with marginal histograms is a visualization that combines a scatter plot with a separate histogram for each variable, drawn in the margins along the corresponding axes. It lets us examine the univariate distribution of each variable alongside the relationship between the two variables.

The scatter plot, which shows the relationship between the two variables of interest, occupies the center of the figure. A histogram of the first variable is drawn along the x-axis (by default in the top margin), and a histogram of the second variable is drawn along the y-axis (in the right margin). This shows both how each variable is distributed and how the two are related.

To do so, we will use the ggExtra package, a collection of functions and layers that enhance ggplot2. Its ggMarginal() function adds marginal histograms, boxplots, or density plots to ggplot2 scatterplots.
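As a rough sketch of the other marginal types mentioned above, the same scatter plot can be passed to ggMarginal() with a different type argument. The names demo_data and base_plot are made up for this illustration and are not part of the article's examples.

```r
library(ggplot2)
library(ggExtra)

# small simulated data set (illustrative only)
set.seed(42)
demo_data <- data.frame(x = rnorm(200), y = rnorm(200))

# base scatter plot that the marginal plots are attached to
base_plot <- ggplot(demo_data, aes(x = x, y = y)) +
  geom_point()

# same scatter plot, different marginal plot types
density_version <- ggMarginal(base_plot, type = "density")
boxplot_version <- ggMarginal(base_plot, type = "boxplot")

# print() draws the combined figure
print(density_version)
print(boxplot_version)
```

Only the type argument changes between the two calls; the scatter plot itself is untouched.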

Installation:

To install the ggExtra package we use:

install.packages("ggExtra")
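If the script may run more than once, one possible pattern is to install the package only when it is missing. This is a minimal sketch; the repos value is an assumption (the CRAN cloud mirror) and can be replaced with any mirror you prefer.

```r
# install ggExtra only if it is not already available
if (!requireNamespace("ggExtra", quietly = TRUE)) {
  # repos is an assumed mirror; non-interactive sessions need one set explicitly
  install.packages("ggExtra", repos = "https://cloud.r-project.org")
}

# attach the package for the current session
library(ggExtra)
```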

After installation, we can load the package and use the following function to make a marginal histogram with a scatter plot.

Syntax: ggMarginal(plot, type = "histogram")
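Beyond type, ggMarginal() accepts a few further arguments that are useful with histograms. The sketch below shows margins, size, and the per-axis xparams and yparams lists, with styling such as bins and fill passed straight through to the histogram layer. The data and the names demo_data, base_plot, x_only, and per_axis are invented for illustration.

```r
library(ggplot2)
library(ggExtra)

# small simulated data set (illustrative only)
set.seed(42)
demo_data <- data.frame(x = rnorm(300), y = rnorm(300))

base_plot <- ggplot(demo_data, aes(x = x, y = y)) +
  geom_point()

# histogram on the x margin only; the main plot is 4x the size of the margin,
# and bins/fill/colour are forwarded to the histogram layer
x_only <- ggMarginal(base_plot, type = "histogram", margins = "x",
                     size = 4, bins = 30,
                     fill = "steelblue", colour = "white")

# different bin counts for each margin via xparams and yparams
per_axis <- ggMarginal(base_plot, type = "histogram",
                       xparams = list(bins = 40),
                       yparams = list(bins = 15))

print(x_only)
print(per_axis)
```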

Creating a basic scatter plot with marginal histogram:

Here is a basic scatter plot with marginal histograms, created with the ggMarginal() function of the ggExtra package.

R




# load tidyverse (which attaches ggplot2) and ggExtra
library(tidyverse)
library(ggExtra)

# set theme
theme_set(theme_bw(12))

# fix the random seed so the simulated data is reproducible
set.seed(123)

# create x and y vectors; y is x plus noise, shifted up by 10
xAxis <- rnorm(1000)
yAxis <- rnorm(1000) + xAxis + 10

# create sample data frame
sample_data <- data.frame(xAxis, yAxis)

# create scatter plot using ggplot() function
plot <- ggplot(sample_data, aes(x = xAxis, y = yAxis)) +
  geom_point() +
  theme(legend.position = "none")

# use ggMarginal function to create marginal histograms
ggMarginal(plot, type = "histogram")


Output:

Scatterplot with Marginal Histograms in R

  • The code first loads the tidyverse package, which includes ggplot2, and the ggExtra package. These packages provide tools for manipulating and visualizing data.

  • The theme_set() function makes theme_bw, with a base font size of 12, the default theme. This sets the general look of all plots created afterwards.

  • The rnorm() function generates two vectors, xAxis and yAxis, which hold the x and y coordinates of the scatter plot. Because yAxis is built from xAxis plus random noise, the two variables are positively correlated.

  • The data.frame() function combines the xAxis and yAxis vectors into a single data frame named sample_data.

  • The ggplot() function creates the scatter plot, with sample_data supplied as the data source. The aes() function maps the xAxis variable to the x-axis and the yAxis variable to the y-axis, and geom_point() draws one point per observation.

  • The theme() function adjusts the plot's appearance. Here, legend.position = "none" hides the legend. This plot has no legend to begin with, so the setting only matters once an aesthetic such as color is mapped.

  • The ggMarginal() function of the ggExtra package produces the marginal histograms. It takes the plot object built in the previous step as input and adds a histogram to each margin of the scatter plot.

The final layout features a scatter plot in the center with a marginal histogram for each variable.
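To connect the picture to the data, a quick numerical check can confirm what the histograms and the scatter plot should show. This is only a sketch under the same simulation setup as the example; the seed value is arbitrary, and the exact numbers will vary with it.

```r
# regenerate data the same way as in the example (seed chosen arbitrarily)
set.seed(123)
xAxis <- rnorm(1000)
yAxis <- rnorm(1000) + xAxis + 10

# centers of the two marginal distributions:
# xAxis should be centered near 0, yAxis near 10
x_center <- mean(xAxis)
y_center <- mean(yAxis)

# yAxis adds independent noise to xAxis, so it is more spread out
x_spread <- sd(xAxis)
y_spread <- sd(yAxis)

# strength of the linear relationship seen in the scatter plot;
# in theory this is 1 / sqrt(2), roughly 0.71, for this setup
xy_cor <- cor(xAxis, yAxis)

print(c(x_center = x_center, y_center = y_center,
        x_spread = x_spread, y_spread = y_spread, xy_cor = xy_cor))
```

The wider histogram on the y margin and the upward-sloping point cloud correspond to the larger spread of yAxis and the positive correlation.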
 

Color scatter plot with marginal histogram by group:

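To color the scatter plot by group, we map the grouping variable to the col (colour) aesthetic inside aes() in the ggplot() call. To color the marginal histograms by the same groups, we set the groupColour and groupFill arguments of ggMarginal() to TRUE.

Syntax: ggMarginal(plot, type = "histogram", groupColour = TRUE, groupFill = TRUE)

Example: Here, we have a scatter plot with marginal histograms, both colored by group. groupColour and groupFill take boolean values: groupColour = TRUE colors the histogram outlines by group, and groupFill = TRUE colors their fill by group, so either can be switched off according to formatting preference.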

R




# load tidyverse (which attaches ggplot2) and ggExtra
library(tidyverse)
library(ggExtra)

# set theme
theme_set(theme_bw(12))

# fix the random seed so the simulated data is reproducible
set.seed(123)

# create x and y vectors; y is x plus noise, shifted up by 10
xAxis <- rnorm(1000)
yAxis <- rnorm(1000) + xAxis + 10

# assign each point to one of five groups based on its x value;
# later assignments overwrite earlier ones, so the highest
# threshold a point exceeds decides its group
group <- rep(1, 1000)
group[xAxis > -1.5] <- 2
group[xAxis > -0.5] <- 3
group[xAxis > 0.5] <- 4
group[xAxis > 1.5] <- 5

# create sample data frame
sample_data <- data.frame(xAxis, yAxis, group)

# create scatter plot using ggplot()
# function colored by group
plot <- ggplot(sample_data, aes(x = xAxis, y = yAxis,
                                col = as.factor(group))) +
  geom_point() +
  theme(legend.position = "none")

# use ggMarginal function to create marginal histograms
ggMarginal(plot, type = "histogram",
           groupColour = TRUE, groupFill = TRUE)


Output:

Scatterplot with Marginal Histograms in R

  • As in the first example, the code loads the tidyverse and ggExtra packages and uses theme_set() to make theme_bw, with a base font size of 12, the default theme.

  • The rnorm() function generates the xAxis and yAxis vectors, which hold the x and y coordinates of the scatter plot. The next step assigns each point to a group based on its xAxis value.

  • The group vector is created with every element initialized to 1. Conditional assignments then overwrite this value according to xAxis. For example, values greater than -1.5 are assigned to group 2, and those that are also greater than -0.5 are then reassigned to group 3, and so on up to group 5 for values above 1.5.

  • The data.frame() function combines the xAxis, yAxis, and group vectors into a single data frame named sample_data.

  • The ggplot() function creates the scatter plot, with sample_data supplied as the data source. The aes() function maps the xAxis variable to the x-axis, the yAxis variable to the y-axis, and the group variable, converted to a factor, to the color aesthetic (col). geom_point() draws one point per observation.

  • The theme() function adjusts the plot's appearance. Here, legend.position = "none" hides the color legend that would otherwise be drawn.

  • The ggMarginal() function of the ggExtra package produces the marginal histograms. It takes the plot object built in the previous step as input and adds a histogram to each margin of the scatter plot. The groupColour = TRUE and groupFill = TRUE options color the histograms according to the variable mapped to color in the scatter plot.

The resulting plot shows the scatter plot with points colored by group, together with marginal histograms colored by the same groups.
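The chain of conditional assignments used to build the group vector can also be written with base R's cut() function. The following sketch, which is an alternative and not the approach used in the example, checks that both approaches give the same groups. The names group_manual and group_cut are chosen only for this illustration.

```r
# same x values as in the example (seed chosen arbitrarily)
set.seed(123)
xAxis <- rnorm(1000)

# grouping by successive conditional assignment, as in the example
group_manual <- rep(1, 1000)
group_manual[xAxis > -1.5] <- 2
group_manual[xAxis > -0.5] <- 3
group_manual[xAxis > 0.5] <- 4
group_manual[xAxis > 1.5] <- 5

# equivalent grouping with cut(): intervals are right-closed by default,
# e.g. (-1.5, -0.5] becomes group 2; labels = FALSE returns integer codes
group_cut <- cut(xAxis,
                 breaks = c(-Inf, -1.5, -0.5, 0.5, 1.5, Inf),
                 labels = FALSE)

# both approaches should agree for every point
same_groups <- all(group_cut == group_manual)
print(same_groups)

# number of points per group
print(table(group_cut))
```

Using cut() keeps all thresholds in one place, which can make it easier to change the number or position of the groups.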



Last Updated : 13 Jun, 2023