
# Scatter Plot in R Using ggplot2 (With Example)

Updated in October 2023 on Phuhoabeautyspa.com.

Graphs are the third part of the data analysis process. The first part is data extraction, and the second is cleaning and manipulating the data. Finally, the data scientist may need to communicate the results graphically.

The job of the data scientist can be summarized in the following steps:

The first task of a data scientist is to define a research question. This research question depends on the objectives and goals of the project.

After that, one of the most prominent tasks is feature engineering. The data scientist needs to collect, manipulate, and clean the data.

When this step is completed, the data scientist can start to explore the dataset. Sometimes, it is necessary to refine and change the original hypothesis due to a new discovery.

When the exploratory analysis is complete, the data scientist has to consider the capacity of the reader to understand the underlying concepts and models.

The results should be presented in a format that all stakeholders can understand. One of the best methods to communicate the results is through a graph.

Graphs are an incredible tool to simplify complex analysis.


ggplot2 package

This part of the tutorial focuses on how to make graphs/charts with R.

In this tutorial, you are going to use the ggplot2 package. This package is built on the consistent framework described in the book The Grammar of Graphics by Wilkinson (2005). ggplot2 is very flexible and incorporates many themes and plot specifications at a high level of abstraction. On its own, ggplot2 cannot plot 3-dimensional graphics or create interactive graphics.

In ggplot2, a graph is composed of the following arguments:

data

aesthetic mapping

geometric object

statistical transformations

scales

coordinate system

faceting

You will learn how to control those arguments in the tutorial.
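As a rough illustration of how these components fit together, the sketch below touches each one in a single plot. The specific break values, axis limits, and the choice to facet by gear are assumptions made for the example, not settings used elsewhere in this tutorial.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = mpg, y = drat)) +          # data + aesthetic mapping
  geom_point() +                                       # geometric object
  stat_smooth(method = "lm", se = FALSE) +             # statistical transformation
  scale_x_continuous(breaks = seq(10, 35, by = 5)) +   # scale
  coord_cartesian(ylim = c(2.5, 5)) +                  # coordinate system
  facet_wrap(~ gear)                                   # faceting: one panel per gear value
p
```

Each line contributes one component, so removing a line removes that component from the plot without affecting the others.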

The basic syntax of ggplot2 is:

    ggplot(data, mapping = aes()) + geometric object

arguments:
- data: Dataset used to plot the graph
- mapping: Control the x and y-axis
- geometric object: The type of plot you want to show. The most common objects are:
  - Point: `geom_point()`
  - Bar: `geom_bar()`
  - Line: `geom_line()`
  - Histogram: `geom_histogram()`

Scatterplot

Let’s see how ggplot works with the mtcars dataset. You start by plotting a scatterplot of the mpg variable and drat variable.

Basic scatter plot

    library(ggplot2)
    ggplot(mtcars, aes(x = drat, y = mpg)) +
      geom_point()

Code Explanation

You first pass the dataset mtcars to ggplot.

Inside the aes() argument, you add the x-axis and y-axis.

The + sign adds a new layer to the plot. Placing it at the end of a line tells R that the expression continues on the next line, which lets you break the code up and keep it readable.

Use geom_point() for the geometric object.

Output:
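One way to see what + does is to count the layers of the plot object before and after adding geom_point(). The sketch below assumes the plot object exposes its layers as a list named layers, which is how current ggplot2 versions behave; the names base and scatter are made up for the example.

```r
library(ggplot2)

# Data and mapping only: no geometric object has been added yet
base <- ggplot(mtcars, aes(x = drat, y = mpg))
length(base$layers)      # number of layers before adding a geom

# `+` returns a new plot object with one more layer
scatter <- base + geom_point()
length(scatter$layers)   # number of layers after adding geom_point()
```

Printing base alone would draw empty axes, because a plot needs at least one layer to display the data.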

Scatter plot with groups

Sometimes, it can be interesting to distinguish the values by a group of data (i.e. factor level data).

ggplot(mtcars, aes(x = mpg, y = drat)) + geom_point(aes(color = factor(gear)))

Code Explanation

The aes() inside geom_point() controls the color of the group. The group should be a factor variable, so you convert the variable gear into a factor.

Altogether, you have the code aes(color = factor(gear)), which changes the color of the dots.

Output:
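To see what factor(gear) produces, you can inspect it outside of the plot. This is a small sketch; gear_factor is a name chosen for the example.

```r
# Convert the numeric gear column into a factor
gear_factor <- factor(mtcars$gear)

levels(gear_factor)   # the distinct groups that each get their own color
table(gear_factor)    # number of cars in each group
```

With a numeric variable instead of a factor, ggplot2 would use a continuous color gradient rather than one distinct color per group.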

Change axis

Rescaling the data is a big part of the data scientist's job. Data rarely comes in a nice bell shape. One way to make your data less sensitive to outliers is to rescale it.

ggplot(mtcars, aes(x = log(mpg), y = log(drat))) + geom_point(aes(color = factor(gear)))

Code Explanation

You transform the x and y variables with log() directly inside the aes() mapping.

Note that any other transformation can be applied such as standardization or normalization.

Output:
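As an illustration of another transformation, the sketch below standardizes both variables (subtracting the mean and dividing by the standard deviation) before plotting. The helper standardize() is hypothetical and written for this example only.

```r
library(ggplot2)

# Hypothetical helper: z-score of a numeric vector
standardize <- function(v) (v - mean(v)) / sd(v)

# The transformation is applied inside aes(), just like log()
p_std <- ggplot(mtcars, aes(x = standardize(mpg), y = standardize(drat))) +
  geom_point(aes(color = factor(gear)))
p_std
```

After standardization, both axes are expressed in standard deviations from the mean, so the two variables share a comparable scale.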

Scatter plot with fitted values

You can add another level of information to the graph. You can plot the fitted value of a linear regression.

    my_graph <- ggplot(mtcars, aes(x = log(mpg), y = log(drat))) +
      geom_point(aes(color = factor(gear))) +
      stat_smooth(method = "lm", col = "#C42126", se = FALSE, size = 1)
    my_graph

Code Explanation

my_graph: You store your graph in the variable my_graph. This is helpful for further use and avoids overly complex lines of code

The function stat_smooth() adds a smoothed line; its arguments control the smoothing method

method = "lm": Linear regression

col = "#C42126": Code for the red color of the line

se = FALSE: Don't display the standard error

size = 1: the size of the line is 1 (since ggplot2 3.4.0, the linewidth argument is preferred for lines)

Output:

Note that other smoothing methods are available:

glm

gam

loess: default value for small datasets (fewer than 1,000 observations)

rlm
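The red line drawn with method = "lm" should correspond to an ordinary least-squares fit of log(drat) on log(mpg). As a sketch, you could fit that model yourself with lm() to read off the intercept and slope; fit is a name chosen for the example.

```r
# Fit the linear model that the smoothing line represents
fit <- lm(log(drat) ~ log(mpg), data = mtcars)

coef(fit)   # intercept and slope of the fitted line
```

A positive slope means that cars with a higher mpg tend to have a higher drat.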

So far, we haven't added any information to the graphs. Graphs need to be informative. The reader should see the story behind the data analysis just by looking at the graph, without referring to additional documentation. Hence, graphs need good labels. You can add labels with the labs() function.

The basic syntax for labs() is:

    labs(title = "Hello Guru99")

argument:
- title: Control the title. It is possible to change or add other labels with:
- subtitle: Add subtitle below title
- caption: Add caption below the graph
- x: rename x-axis
- y: rename y-axis

Example: labs(title = "Hello Guru99", subtitle = "My first plot")

Add a title

One essential piece of information to add is a title.

    my_graph +
      labs(
        title = "Plot Miles per gallon and drat, in log"
      )

Code Explanation

my_graph: You use the graph you stored. It avoids rewriting all the code each time you add new information to the graph.

You wrap the title inside labs().


Output:

Add a title with a dynamic name

A dynamic title is helpful to add more precise information in the title.

You can use the paste() function to print static text and dynamic text. The basic syntax of paste() is:

    paste("This is a text", A)

arguments:
- " ": Text inside the quotation marks is the static text
- A: Display the variable stored in A
- Note that you can add as much static text and as many variables as you want. You need to separate them with a comma

Example:

    A <- 2010
    paste("The first year is", A)

Output:

## [1] "The first year is 2010"

    B <- 2023
    paste("The first year is", A, "and the last year is", B)

Output:

## [1] "The first year is 2010 and the last year is 2023"
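By default, paste() puts a single space between its arguments. The sketch below shows two variations that may be handy for titles and file names: the sep argument and paste0(), which joins without any separator. The variable names are chosen for the example.

```r
first_year <- 2010
last_year <- 2023

# Custom separator between the pieces
range_label <- paste(first_year, last_year, sep = "-")
range_label

# paste0() joins the pieces with no separator at all
file_label <- paste0("plot_", first_year)
file_label
```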

You can add a dynamic name to our graph, namely the average of mpg.

    mean_mpg <- mean(mtcars$mpg)
    my_graph +
      labs(
        title = paste("Plot Miles per gallon and drat, in log. Average mpg is", mean_mpg)
      )

Code Explanation

You compute the average of mpg with mean(mtcars$mpg) and store it in the variable mean_mpg

You use paste() with mean_mpg to create a dynamic title that includes the mean value of mpg

Output:
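The mean is pasted with all of its decimals, which can make the title long. A possible refinement, sketched below, is to round the value first; rounding to one decimal and the shortened title text are assumptions made for the example.

```r
library(ggplot2)

# Round the mean so the title stays short
mean_mpg <- round(mean(mtcars$mpg), 1)
dynamic_title <- paste("Average mpg is", mean_mpg)
dynamic_title

# Use the rounded value in the plot title
ggplot(mtcars, aes(x = log(mpg), y = log(drat))) +
  geom_point() +
  labs(title = dynamic_title)
```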

Two additional details can make your graph more explicit: the subtitle and the caption. The subtitle goes right below the title. The caption can state who did the computation and the source of the data.

    my_graph +
      labs(
        title = "Relation between Miles per gallon and drat",
        subtitle = "Relationship broken down by gear class",
        caption = "Author's own computation"
      )

Code Explanation

title = "Relation between Miles per gallon and drat": Add title

subtitle = "Relationship broken down by gear class": Add subtitle

caption = "Author's own computation": Add caption

You separate each new piece of information with a comma

Note that you break the lines of code. It is not compulsory, and it only helps to read the code more easily

Output:

Rename x-axis and y-axis

Variable names in a dataset are not always explicit, and by convention they often use _ to join multiple words (e.g. GDP_CAP). You don't want such names to appear in your graph. It is important to change the name or add more details, like the units.

    my_graph +
      labs(
        x = "Miles per gallon",
        y = "Drat definition",
        color = "Gear",
        title = "Relation between Miles per gallon and drat",
        subtitle = "Relationship broken down by gear class",
        caption = "Author's own computation"
      )

Code Explanation

x = "Miles per gallon": Change the name of the x-axis, which shows log(mpg)

y = "Drat definition": Change the name of the y-axis, which shows log(drat)

color = "Gear": Change the title of the color legend

Output:

Control the scales

You can control the scale of the axis.

The function seq() is convenient when you need to create a sequence of numbers. The basic syntax is:

    seq(begin, last, by = x)

arguments:
- begin: First number of the sequence
- last: Last number of the sequence
- by = x: The step. For instance, if x is 2, the code starts at `begin` and keeps adding 2 for as long as the result does not exceed `last`

For instance, if you want to create a range from 0 to 12 with a step of 4, you will have four numbers, 0 4 8 12

    seq(0, 12, 4)

Output:

## [1] 0 4 8 12
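seq() can also be told how many values to return instead of the step size. The sketch below compares the two forms; length.out is a standard argument of seq(), and the variable names are chosen for the example.

```r
by_step  <- seq(0, 12, by = 4)           # fixed step of 4
by_count <- seq(0, 12, length.out = 4)   # four evenly spaced values

by_step
by_count

# Both forms describe the same sequence here
all.equal(by_step, by_count)
```

The by form is convenient when the spacing of axis breaks matters, while length.out is convenient when the number of breaks matters.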

You can control the scale of the x-axis and y-axis as below

    my_graph +
      scale_x_continuous(breaks = seq(1, 3.6, by = 0.2)) +
      scale_y_continuous(breaks = seq(1, 1.6, by = 0.1)) +
      labs(
        x = "Miles per gallon",
        y = "Drat definition",
        color = "Gear",
        title = "Relation between Miles per gallon and drat",
        subtitle = "Relationship broken down by gear class",
        caption = "Author's own computation"
      )

Code Explanation

The function scale_y_continuous() controls the y-axis

The function scale_x_continuous() controls the x-axis.

The parameter breaks controls the split of the axis. You can manually add the sequence of numbers or use the seq() function:

seq(1, 3.6, by = 0.2): Create 14 numbers from 1 to 3.6 with a step of 0.2. Only the breaks that fall inside the plotted range are drawn, here the six values from 2.4 to 3.4

seq(1, 1.6, by = 0.1): Create seven numbers from 1 to 1.6 with a step of 0.1

Output:

Theme

Finally, R allows us to customize our plot with different themes. The ggplot2 library includes eight themes:

theme_bw()

theme_light()

theme_classic()

theme_linedraw()

theme_dark()

theme_minimal()

theme_gray()

theme_void()

    my_graph +
      theme_dark() +
      labs(
        x = "Miles per gallon, in log",
        y = "Drat definition, in log",
        color = "Gear",
        title = "Relation between Miles per gallon and drat",
        subtitle = "Relationship broken down by gear class",
        caption = "Author's own computation"
      )

Output:
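Beyond the built-in themes, individual elements can be adjusted with theme(). The sketch below is one possible combination; the base font size and the legend position are assumptions chosen for the example.

```r
library(ggplot2)

p_theme <- ggplot(mtcars, aes(x = log(mpg), y = log(drat))) +
  geom_point(aes(color = factor(gear))) +
  theme_minimal(base_size = 14) +        # built-in theme with a larger base font
  theme(legend.position = "bottom") +    # move the legend below the plot
  labs(color = "Gear")
p_theme
```

Note that theme() should come after the built-in theme: a call such as theme_minimal() replaces any theme settings made before it.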

Save Plots

After all these steps, it is time to save and share your graph. You add ggsave("NAME OF THE FILE") right after you plot the graph, and it will be stored on the hard drive.

The graph is saved in the working directory. To check the working directory, you can run this code:

    directory <- getwd()
    directory

Let's plot your fantastic graph, save it, and check the location

    my_graph +
      theme_dark() +
      labs(
        x = "Miles per gallon, in log",
        y = "Drat definition, in log",
        color = "Gear",
        title = "Relation between Miles per gallon and drat",
        subtitle = "Relationship broken down by gear class",
        caption = "Author's own computation"
      )

Output:

ggsave("my_fantastic_plot.png")

Output:

## Saving 5 x 4 in image
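ggsave() also accepts an explicit plot object and output size, so the result does not depend on which plot was drawn last or on the size of the graphics window. The sketch below writes to a temporary directory and uses PDF output; the file name, dimensions, and format are assumptions chosen for the example.

```r
library(ggplot2)

p_save <- ggplot(mtcars, aes(x = log(mpg), y = log(drat))) +
  geom_point(aes(color = factor(gear)))

# Build a path inside the session's temporary directory
out_file <- file.path(tempdir(), "my_fantastic_plot.pdf")

# Save a specific plot with an explicit size in inches
ggsave(out_file, plot = p_save, width = 6, height = 4)

file.exists(out_file)
```

ggsave() picks the output format from the file extension, so changing .pdf to .png in the file name would produce a PNG instead.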

Note: For pedagogical purposes only, we created a function called open_folder() to open the directory folder for you. You just need to run the code below and see where the picture is stored. You should see a file named my_fantastic_plot.png.

    # Run this code to create the function
    open_folder <- function(dir) {
      if (.Platform$OS.type == "windows") {
        # On Windows, open the folder in the file explorer
        shell.exec(dir)
      } else {
        # Elsewhere, pass the folder to the browser set in R_BROWSER
        system(paste(Sys.getenv("R_BROWSER"), dir))
      }
    }
    # Call the function to open the folder
    open_folder(directory)

Summary

You can summarize the arguments to create a scatter plot in the table below:

| Objective | Code |
|---|---|
| Basic scatter plot | `ggplot(df, aes(x = x1, y = y)) + geom_point()` |
| Scatter plot with color group | `ggplot(df, aes(x = x1, y = y)) + geom_point(aes(color = factor(x1)))` |
| Add fitted values | `ggplot(df, aes(x = x1, y = y)) + geom_point(aes(color = factor(x1))) + stat_smooth(method = "lm")` |
| Add title | `ggplot(df, aes(x = x1, y = y)) + geom_point() + labs(title = paste("Hello Guru99"))` |
| Add subtitle | `ggplot(df, aes(x = x1, y = y)) + geom_point() + labs(subtitle = paste("Hello Guru99"))` |
| Rename x | `ggplot(df, aes(x = x1, y = y)) + geom_point() + labs(x = "X1")` |
| Rename y | `ggplot(df, aes(x = x1, y = y)) + geom_point() + labs(y = "y1")` |
| Control the scale | `ggplot(df, aes(x = x1, y = y)) + geom_point() + scale_y_continuous(breaks = seq(10, 35, by = 10)) + scale_x_continuous(breaks = seq(2, 5, by = 1))` |
| Create logs | `ggplot(df, aes(x = log(x1), y = log(y))) + geom_point()` |
| Theme | `ggplot(df, aes(x = x1, y = y)) + geom_point() + theme_classic()` |
| Save | `ggsave("my_fantastic_plot.png")` |
