Comprehensive Guide to Scatter Plot using ggplot2 in R

  • Last Updated : 23 Feb, 2022

In this article, we will see how to create scatter plots using ggplot2 in the R programming language.

ggplot2 is a free, open-source, and easy-to-use visualization package that is widely used in R. It was written by Hadley Wickham and is one of the most popular plotting packages for R. The package can be installed using the R function install.packages().

install.packages("ggplot2")

A scatter plot uses dots to represent values for two different numeric variables and is used to observe relationships between those variables. To draw a scatter plot we will use the geom_point() function. Following is brief information about the ggplot2 function geom_point().

Syntax : geom_point(size, color, fill, shape, stroke)

Parameter :

  • size : Size of the points
  • color : Color of the points (the border color for shapes 21 to 25)
  • fill : Fill color of the points (only used by shapes 21 to 25)
  • shape : Shape of the points, an integer in the range 0 to 25
  • stroke : Thickness of the point border

Return : It adds a layer of points (a scatter plot) to the plot.
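As a rough illustration of the parameters listed above, the sketch below sets all five at once on the built-in iris data. The particular values (shape 21, the color names, the sizes) are arbitrary choices for demonstration and do not come from this article. Shape 21 is chosen because fill only affects shapes 21 to 25.

```r
library(ggplot2)

# Fixed (non-mapped) values go outside aes().
# Shape 21 is a circle with a separate border (color) and interior (fill).
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
    geom_point(size = 3, color = "black", fill = "skyblue",
               shape = 21, stroke = 1)
p
```

Swapping shape = 21 for a value between 0 and 20 would make the fill setting have no visible effect.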

Example: Simple scatterplot

R




library(ggplot2)
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
    geom_point()


 
 

Output:

 

Scatter plot with groups

 

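Here we distinguish the values by group (i.e. by the levels of a factor). Mapping color inside aes() assigns one color per group, so the mapped variable should be a factor. Note that the example below groups by the numeric Sepal.Width column, so each distinct width gets its own color.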

 

 Syntax: 

 

aes(color = factor(variable))

 

Example: Scatterplot with groups

 

R




# Scatter plot with groups
 
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
    geom_point(aes(color = factor(Sepal.Width)))


 
 

Output:

 

Changing color

 

Here we use the color aesthetic inside aes() to color the data points by a specific variable.

 

Example: Changing color

 

R




# Changing color
 
ggplot(iris) +
    geom_point(aes(x = Sepal.Length,
                   y = Sepal.Width,
                   color = Species))


 
 

Output:

 

Changing Shape

 

To change the shape of the data points, we map the shape aesthetic inside aes().

 

Example: Changing shape

 

R




# Changing point shapes and colors in a ggplot scatter plot
 
ggplot(iris) +
    geom_point(aes(x = Sepal.Length, y = Sepal.Width,
                   shape = Species , color = Species))


 
 

Output:

 

Changing the size aesthetic

 

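To change the size of the data points, we use the size argument. Inside aes(), size is mapped to a variable (a constant placed there only adds a meaningless legend); outside aes(), it sets one fixed size for all points.

 

Example: Changing size

 

R




# Setting a fixed point size in a ggplot scatter plot.
# size is placed outside aes() so that it is treated as a
# constant rather than mapped to a legend.
 
ggplot(iris) +
    geom_point(aes(x = Sepal.Length,
                   y = Sepal.Width),
               size = .5)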


 
 

Output:
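For comparison with a single fixed size, size can also be mapped to a numeric column so that each point is scaled by its value. The sketch below is one possible variant: the choice of Petal.Width as the mapped column and alpha = 0.6 are assumptions made for illustration, not part of the original example.

```r
library(ggplot2)

# size inside aes() is mapped to a variable, so ggplot2 adds a size legend.
# alpha = 0.6 (an arbitrary value) makes overlapping points easier to see.
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
    geom_point(aes(size = Petal.Width), alpha = 0.6)
p
```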

 

Label points in the scatter plot

 

To place labels on the data points, we pass label to the geom_text() function.

 

Example: Label points in the scatter plot

 

R




# Label points in the scatter plot
 
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
    geom_point() +
    geom_text(label=rownames(iris))


 
 

Output:
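With 150 rows, labels drawn directly on the points tend to overlap. One way to reduce the clutter, sketched below, is to nudge the labels upward and let geom_text() skip labels that would overlap earlier ones. The values chosen for vjust and size are arbitrary assumptions.

```r
library(ggplot2)

# vjust = -0.8 moves each label slightly above its point;
# check_overlap = TRUE drops labels that would overlap a previously drawn label.
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
    geom_point() +
    geom_text(aes(label = rownames(iris)),
              vjust = -0.8, size = 3, check_overlap = TRUE)
p
```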

 

Regression lines in ggplot2

 

Regression models a target value from one or more independent variables and is mostly used for finding relationships between variables and for forecasting. In R, we can use the stat_smooth() function to add a fitted line or smooth curve to the plot.

 

Syntax: stat_smooth(method = "method_name", formula = formula_to_be_used, geom = "geom_name")

Parameters:  

  • method: the smoothing method (function) to use, e.g. lm or "loess"
  • formula: the formula to use in the smoothing function, e.g. y ~ x
  • geom: the geometric object used to display the data

 

Example: Regression line

 

R




# Add regression lines with stat_smooth
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
    geom_point() +
    stat_smooth(method=lm)


 
 

Output:
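The formula parameter is not limited to a straight line. As a minimal sketch, assuming a quadratic trend is of interest (the degree 2 is an arbitrary choice for illustration), a polynomial fit can be requested like this:

```r
library(ggplot2)

# method = "lm" with a polynomial formula fits a quadratic curve;
# se = TRUE (the default) keeps the confidence band around it.
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
    geom_point() +
    stat_smooth(method = "lm", formula = y ~ poly(x, 2), se = TRUE)
p
```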

 

 

Example: Using stat_smooth() with the default loess method

 

R




# Add a loess smoothing curve with stat_smooth
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
    geom_point() +
    stat_smooth()


 
 

Output:

 

 

The geom_smooth() function can also be used to draw a regression line and smooth the visualization.

 

Syntax: geom_smooth(method = "method_name", formula = formula_to_be_used)

Parameters:

  • method: It is the smoothing method (function) to use for smoothing the line
  • formula: It is the formula to use in the smoothing function

 

Example: Using geom_smooth()

 

R




# Add regression lines with geom_smooth
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
    geom_point() +
    geom_smooth()


 
 

Output:

 

 

To draw a smoothed curve with the geom_smooth() function, we pass the method as "loess" and the formula as y ~ x.

 

Example: geom_smooth with loess method

 

R




# Add a loess curve with geom_smooth;
# se = FALSE hides the confidence band
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
    geom_point() +
    geom_smooth(method = "loess", formula = y ~ x, se = FALSE)


 
 

Output:

 

 

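The intercept and slope can be calculated with the lm() function, which fits a linear regression, followed by coefficients(). geom_smooth() does not accept intercept or slope arguments, so a line with a given intercept and slope is drawn with geom_abline().

 

Example: Intercept and slope

 

R




# Fit the regression, then draw its line with geom_abline.
# linewidth needs ggplot2 3.4.0 or later; use size on older versions.
fit <- coefficients(lm(Sepal.Width ~ Sepal.Length, data = iris))
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
    geom_point() +
    geom_abline(intercept = fit[[1]], slope = fit[[2]], color = "red",
                linetype = "dashed", linewidth = 1.5)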


 
 

Output:
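To see the actual numbers behind the line, the coefficients can be printed before plotting. This is only a small sketch; the variable names fit, coefs, intercept, and slope are hypothetical and chosen for readability.

```r
# Fit Sepal.Width as a linear function of Sepal.Length
fit <- lm(Sepal.Width ~ Sepal.Length, data = iris)

# coefficients() returns a named numeric vector
coefs <- coefficients(fit)
print(coefs)

# Extract the two values by name rather than by position
intercept <- coefs[["(Intercept)"]]
slope <- coefs[["Sepal.Length"]]
```

Extracting by name is a little more robust than indexing by position if the model formula is later extended with more terms.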

 

Change the point color/shape/size manually

 

scale_color_manual(), scale_fill_manual(), scale_size_manual(), scale_shape_manual(), and scale_linetype_manual() are built-in scale functions that assign user-chosen values to the levels of a categorical variable. Here we mainly use scale_color_manual() to map our own colors to the data.

 

Syntax : 

  • scale_shape_manual(values) for point shapes
  • scale_color_manual(values) for point colors
  • scale_size_manual(values) for point sizes

Parameter :

  • values : A set of aesthetic values to map the data to. Here we take the desired set of colors.

Return : A scale that maps the data to the manually chosen values

 

Example: Changing aesthetics

 

R




# Change the point color/shape/size manually
library(ggplot2)
 
# Change point shapes and colors manually;
# shape must be mapped for scale_shape_manual to apply
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
                 color = Species, shape = Species)) +
    geom_point() +
    geom_smooth(method=lm, se=FALSE, fullrange=TRUE)+
    scale_shape_manual(values=c(3, 16, 17))+
    scale_color_manual(values=c('#999999','#E69F00', '#56B4E9'))+
    theme(legend.position="top")


 
 

Output:

 

Marginal rugs to a scatter plot

 

To add marginal rugs to the scatter plot, we use the geom_rug() function.

 

Example: Marginal rugs

 

R




# Add marginal rugs to a scatter plot.
# The aesthetics are set in ggplot() so that
# geom_rug() inherits x, y, and color.
 
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
                 shape = Species, color = Species)) +
    geom_point() +
    geom_rug()


 
 

Output:

 

 

Here we add marginal rugs to a plain scatter plot, without grouping by species.

 

Example: Marginal rugs

 

R




# Add marginal rugs to a scatter plot
 
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
    geom_point()+
    geom_rug()


 
 

Output:

 

Scatter plots with the 2-D density estimation

 

To add a 2-D density estimate to a scatter plot, we use geom_density_2d() for contour lines and geom_density_2d_filled() for filled contour bands, both from ggplot2.

 

Syntax: ggplot(data, aes(x, y)) + geom_density_2d(color, alpha)

Parameters:

  • fill: the fill color of the contour bands (used by geom_density_2d_filled())
  • color: the color of the contour lines
  • alpha: the transparency of the contours

 

Example: Scatterplots with 2-D density estimation

 

R




# Scatter plots with the 2d density estimation
 
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
    geom_point()+
    geom_density_2d()


 
 

Output:
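The color and alpha parameters can also be combined with a grouping variable. The following sketch, which assumes that separate contours per species are wanted (a choice not made in the original example), maps color to Species so that each group gets its own density estimate:

```r
library(ggplot2)

# Mapping color to Species splits the data into three groups,
# and a separate set of contour lines is estimated for each one.
# alpha = 0.6 is an arbitrary transparency value.
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
    geom_point(aes(color = Species)) +
    geom_density_2d(aes(color = Species), alpha = 0.6)
p
```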

 

 

geom_density_2d_filled() fills the regions between the contour lines with color, so that denser areas of the data stand out.

 

Example: Adding aesthetics

 

R




# Draw the filled bands first so that the contour
# lines and points stay visible on top of them
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
    geom_density_2d_filled()+
    geom_density_2d(alpha = 0.5)+
    geom_point()


 
 

Output:

 

 

stat_density_2d() can also be used to draw the 2-D density estimate.

 

Example: Deploy density estimation

 

R




# Scatter plots with the 2d density estimation
 
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
    geom_point()+
    stat_density_2d()


 
 

Output:

 

Scatter plots with ellipses

 

To draw an ellipse around a cluster of data points, we use the stat_ellipse() function. It computes a data ellipse from the points (at the 95% level by default) and, when a categorical variable is mapped to a grouping aesthetic such as color, draws one ellipse per group.

 

Example: Scatterplot with ellipses

 

R




# Scatter plots with ellipses, one per species
 
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
    geom_point()+
    stat_ellipse()


 
 

Output:
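stat_ellipse() also accepts a type and a level argument. As a hedged sketch, the variant below assumes a multivariate normal ellipse at the 90% level; both values are arbitrary choices for illustration, and the default is type = "t" with level = 0.95.

```r
library(ggplot2)

# color = Species creates three groups, so three ellipses are drawn.
# type = "norm" assumes a multivariate normal distribution;
# level = 0.90 draws the 90% ellipse instead of the default 95%.
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
    geom_point() +
    stat_ellipse(type = "norm", level = 0.90)
p
```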

 

 

