Comprehensive Guide to Scatter Plot using ggplot2 in R
This article shows how to create and customize scatter plots with ggplot2 in the R programming language.
ggplot2 is a free, open-source visualization package that is widely used in R. It was created by Hadley Wickham and builds plots in layers, following the grammar of graphics. The package can be installed with the R function install.packages().
install.packages("ggplot2")
A scatter plot uses dots to represent values for two numeric variables and is used to observe relationships between those variables. To draw a scatter plot we will use the geom_point() function. The following is a brief summary of geom_point().
Syntax : geom_point(size, color, fill, shape, stroke)
Parameter :
- size : Size of points
- color : Color of points (the border color for shapes 21 to 25)
- fill : Fill color of points (applies only to shapes 21 to 25, which have a separate border and interior)
- shape : Shape of points, an integer in the range 0 to 25
- stroke : Thickness of the point border
- Return : A layer of points that is added to the plot.
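As a minimal sketch of how these parameters combine, the block below sets all five to constants. The specific values (size 3, "steelblue", shape 21) are arbitrary choices for illustration and are not taken from the examples in this article.

```r
library(ggplot2)

# Shape 21 is a filled circle: 'color' draws the border, 'fill' colors the
# interior, and 'stroke' sets the border thickness.
p_fixed <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(size = 3, color = "black", fill = "steelblue",
             shape = 21, stroke = 1)

p_fixed
```

Because the values are passed outside aes(), they apply to every point and no legend is drawn.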
Example: Simple scatterplot
R
library(ggplot2)

ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()
Output:
Scatter plot with groups
Here we distinguish the values by group (i.e. by the levels of a factor). The color aesthetic inside aes() controls the color of each group, and the variable mapped to it should be a factor.
Syntax:
aes(color = factor(variable))
Example: Scatterplot with groups
R
# Scatter plot with groups
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(color = factor(Sepal.Width)))
Output:
Changing color
Here we map the color aesthetic inside aes() to a variable, so that the color of each data point depends on that variable's value.
Example: Changing color
R
# Changing color
ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Sepal.Width, color = Species))
Output:
Changing Shape
To change the shape of the data points, we map the shape aesthetic inside aes().
Example: Changing shape
R
# Changing point shapes (and colors) in a ggplot scatter plot
ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Sepal.Width,
                 shape = Species, color = Species))
Output:
Changing the size aesthetic
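To change the size of the data points we use the size attribute. A constant such as 0.5 is passed to geom_point() outside aes(); placing it inside aes() would treat it as a data value and add a meaningless legend. Inside aes(), size should be mapped to a variable.
Example: Changing size
R
# Setting a constant point size in a ggplot scatter plot
ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Sepal.Width), size = 0.5)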
Output:
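For contrast, size can also be mapped to a data column so that it varies from point to point. The sketch below uses Petal.Length and a size range of 1 to 4; both are assumptions chosen for illustration.

```r
library(ggplot2)

# Point size now encodes Petal.Length; scale_size() limits the drawn sizes
# to the range 1 to 4 and a size legend is added automatically.
p_mapped <- ggplot(iris) +
  geom_point(aes(x = Sepal.Length, y = Sepal.Width, size = Petal.Length)) +
  scale_size(range = c(1, 4))

p_mapped
```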
Label points in the scatter plot
To add labels to the data points, we pass label to the geom_text() function.
Example: Label points in the scatter plot
R
# Label points in the scatter plot
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  geom_text(label = rownames(iris))
Output:
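With 150 points, the labels tend to overlap each other and the points. One possible way to reduce the clutter, sketched below, is to use the check_overlap argument of geom_text() and to nudge the labels upward with vjust. The values of vjust and size are arbitrary choices for illustration.

```r
library(ggplot2)

# check_overlap = TRUE skips labels that would overlap an earlier label;
# vjust moves each label slightly above its point.
p_lab <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
                          label = rownames(iris))) +
  geom_point() +
  geom_text(check_overlap = TRUE, vjust = -0.6, size = 3)

p_lab
```

Overlapping labels are dropped only at drawing time, so some points remain unlabeled in the result.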
Regression lines in ggplot2
Regression models a target value based on independent variables and is mostly used for finding relationships between variables and for forecasting. In R we can use the stat_smooth() function to add a smoothed trend line (a fitted regression or smoothing curve) to the plot.
Syntax: stat_smooth(method = "method_name", formula = formula_to_be_used, geom = "geom_name")
Parameters:
- method: It is the smoothing method (function) to use for smoothing the line
- formula: It is the formula to use in the smoothing function
- geom: It is the geometric object used to display the data
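To illustrate how the three parameters fit together, here is a small sketch that fits a quadratic curve. The choice of a second-degree polynomial is an assumption made only for this illustration.

```r
library(ggplot2)

# method picks the fitting function, formula describes the model in terms
# of the x and y aesthetics, and geom selects how the fit is drawn.
p_quad <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ poly(x, 2), geom = "smooth")

p_quad
```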
Example: Regression line
R
# Add a linear regression line with stat_smooth
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  stat_smooth(method = lm)
Output:
Example: Using stat_smooth with the loess method
R
# Add a smoothed line with stat_smooth
# (the default method is loess for fewer than 1,000 observations)
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  stat_smooth()
Output:
The geom_smooth() function can also be used to draw a regression line or a smoothed trend.
Syntax: geom_smooth(method = "method_name", formula = formula_to_be_used)
Parameters:
- method: It is the smoothing method (function) to use for smoothing the line
- formula: It is the formula to use in the smoothing function
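When a grouping aesthetic such as color is mapped, geom_smooth() fits one line per group. A minimal sketch, assuming we group by Species and use a straight-line fit:

```r
library(ggplot2)

# One linear fit per species; se = FALSE hides the confidence bands.
p_group <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
                            color = Species)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

p_group
```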
Example: Using geom_smooth()
R
# Add a smoothed line with geom_smooth
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  geom_smooth()
Output:
To show a straight regression line with geom_smooth(), we pass the method as lm. Setting se = FALSE hides the confidence band around the line.
Example: geom_smooth with a linear model
R
# Add a linear regression line with geom_smooth
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  geom_smooth(method = lm, se = FALSE)
Output:
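The intercept and slope can be calculated with the lm() function, which fits a linear regression, followed by coefficients(). geom_smooth() ignores intercept and slope arguments, so a line with a given intercept and slope is drawn with geom_abline().
Example: Intercept and slope
R
# Fit the linear model and extract its intercept and slope
fit <- lm(Sepal.Width ~ Sepal.Length, data = iris)
coefs <- coefficients(fit)

# Draw the fitted line from its intercept and slope
# (linewidth requires ggplot2 >= 3.4.0; use size in older versions)
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  geom_abline(intercept = coefs[[1]], slope = coefs[[2]],
              color = "red", linetype = "dashed", linewidth = 1.5)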
Output:
Change the point color/shape/size manually
scale_fill_manual(), scale_size_manual(), scale_shape_manual(), scale_linetype_manual() and scale_color_manual() are built-in functions that assign manually chosen aesthetic values to the levels of categorical data. Here we mainly use scale_color_manual(), which maps each category to a desired color.
Syntax :
- scale_shape_manual(values) for point shapes
- scale_color_manual(values) for point colors
- scale_size_manual(values) for point sizes
Parameter :
- values : A set of aesthetic values to map the data to. Here we take the desired set of colors.
Return : A scale that maps the data to the manually specified values.
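The values argument can also be a named vector, which ties each value to a factor level explicitly instead of relying on level order. A sketch, where the vector names species_cols and species_shapes are made up for this illustration:

```r
library(ggplot2)

# Named vectors: names must match the levels of the mapped factor.
species_cols   <- c(setosa = "#999999", versicolor = "#E69F00",
                    virginica = "#56B4E9")
species_shapes <- c(setosa = 3, versicolor = 16, virginica = 17)

p_manual <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
                             color = Species, shape = Species)) +
  geom_point() +
  scale_color_manual(values = species_cols) +
  scale_shape_manual(values = species_shapes)

p_manual
```

A manual scale only has an effect when the corresponding aesthetic (here color and shape) is mapped inside aes().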
Example: Changing aesthetics
R
# Change the point color/shape/size manually
library(ggplot2)

# Change point shapes and colors manually
# (shape must be mapped to Species for scale_shape_manual() to take effect)
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
                 color = Species, shape = Species)) +
  geom_point() +
  geom_smooth(method = lm, se = FALSE, fullrange = TRUE) +
  scale_shape_manual(values = c(3, 16, 17)) +
  scale_color_manual(values = c("#999999", "#E69F00", "#56B4E9")) +
  theme(legend.position = "top")
Output:
Marginal rugs to a scatter plot
To add marginal rugs to the scatter plot we will use the geom_rug() function, which draws a short tick along the axes for every observation.
Example: Marginal rugs
R
# Add marginal rugs to a grouped scatter plot
# (aesthetics are set in ggplot() so that geom_rug() inherits x, y and color)
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
                 shape = Species, color = Species)) +
  geom_point() +
  geom_rug()
Output:
Here we add marginal rugs to a scatter plot without grouping by species.
Example: Marginal rugs
R
# Add marginal rugs to a scatter plot
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  geom_rug()
Output:
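By default the rugs are drawn on the bottom and left sides. The sides argument of geom_rug() selects other edges; the sketch below moves them to the top and right, with an alpha value chosen arbitrarily for illustration.

```r
library(ggplot2)

# sides takes any combination of "t", "r", "b" and "l";
# alpha makes densely packed ticks easier to read.
p_rug <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  geom_rug(sides = "tr", alpha = 0.5)

p_rug
```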
Scatter plots with the 2-D density estimation
To add a 2-D density estimate to a scatter plot we will use the geom_density_2d() and geom_density_2d_filled() functions from ggplot2.
Syntax: ggplot(data, aes(x, y)) + geom_density_2d(fill, color, alpha)
Parameters:
- fill: fill color of the contour bands (relevant for geom_density_2d_filled())
- color: the color of the contour lines
- alpha: transparency of the layer
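As a short sketch of these parameters in use, the block below draws semi-transparent dark blue contour lines over the points. The color and alpha values are assumptions chosen for illustration.

```r
library(ggplot2)

# color and alpha are constants here, so they apply to every contour line.
p_dens <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  geom_density_2d(color = "darkblue", alpha = 0.6)

p_dens
```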
Example: Scatterplots with 2-D density estimation
R
# Scatter plots with the 2d density estimation
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  geom_density_2d()
Output:
geom_density_2d_filled() fills the bands between the contour lines with color according to the density level.
Example: Adding aesthetics
R
# Filled density bands are drawn first so that they do not hide the points
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_density_2d_filled(alpha = 0.5) +
  geom_density_2d() +
  geom_point()
Output:
stat_density_2d() can also be used to add the 2-D density estimate.
Example: Deploy density estimation
R
# Scatter plots with the 2d density estimation
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  stat_density_2d()
Output:
Scatter plots with ellipses
To draw an ellipse around a cluster of data points, we use the stat_ellipse() function. It computes a data ellipse from the points (by default a 95% confidence ellipse assuming a multivariate t-distribution) and, when a categorical variable is mapped to an aesthetic such as color, draws one ellipse per group.
Example: Scatterplot with ellipses
R
# Scatter plots with ellipses
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  stat_ellipse()
Output:
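The example above draws a single ellipse around all points. To get one ellipse per category, a categorical variable can be mapped to color, as in this sketch. Using type = "norm" (a multivariate normal ellipse) is an assumption made for illustration; the default type is "t".

```r
library(ggplot2)

# Mapping color to Species splits the data into three groups,
# so stat_ellipse() computes and draws one ellipse per species.
p_ell <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
                          color = Species)) +
  geom_point() +
  stat_ellipse(type = "norm", level = 0.95)

p_ell
```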