by Scott McCoy

15 chapters, 576 pages, 235 illustrations

Published December 2022

ISBN 978-1-943873-03-6

List price: $59.50

Teaching professional data analysis skills has never been easier! *Murach’s R for Data Analysis* covers everything your students need to hit the ground running with R and RStudio, even if they’ve never programmed before. Then, it presents a thorough course in data analysis. And it includes three real-world case studies that tie all the coursework together.

The Canvas course file contains all the objectives, quizzes, assignments, and slides that you need to run an effective course. It only takes a few clicks to import it into the Canvas LMS. Then, you can customize it for your course.

“I really appreciated the case studies. They were a big help for my students as they illustrated all phases of data analysis and visualization.”

As we see it, this is the best primary text for any course in which the focus is on using R for data analysis. But it is also the ideal supplementary text for a general course on data analysis because it shows how to use R to apply the concepts and statistical methods to real-world data sets.

Like all our books, this book is designed to make it as easy as possible for your students to learn new skills faster and retain them better. Here are a few of the features that make this possible:

- All of the information is presented in paired pages, with the essential syntax, guidelines, and examples on the right page and clear explanations on the left page. This helps your students learn faster by reading less.
- The paired-pages format is ideal for reference when your students need to refresh their memories about how to do something.
- The three analyses presented in section 3 use real-world data sets.
- The hundreds of short examples present usable code for tasks that your students are likely to need for their own analyses.
- The exercises at the end of each chapter provide a way for your students to gain valuable hands-on experience without any extra busywork.

To present the essential R and data analysis skills in a manageable progression and at the right pace, this book is divided into 5 sections.

Section 1 gets your students off to a fast start. First, they’ll learn how to use RStudio, a popular program for coding in R. Then, they’ll learn the parts of the R language that they’ll need to analyze data. Next, they’ll learn how to use R with the tidyverse package to create their first analysis.

Most analysis is descriptive analysis, in which you analyze data to better understand it. That’s why section 2 of this book presents the critical descriptive analysis skills that your students need. That includes how to:

- Create powerful visualizations that can guide an analysis
- Get data from CSV files, Excel files, JSON files, and databases
- Clean data by dropping unneeded rows and columns, by using the correct data types, and by finding and fixing missing values and outliers
- Prepare data by adding columns, modifying the data in columns, applying functions and lambda expressions, grouping and aggregating data, and more
- Enhance data visualizations to make them ready for professional presentation

Section 3 presents three complete analyses that show your students how the skills presented in the first two sections can be applied to real-world data sets:

- Polling data for the 2016 US presidential election
- Wildfire data from the US Forest Service
- Basketball shot data from the NBA (National Basketball Association)

These in-depth analyses make sure that your students master the professional skills they’re going to need.

Predictive analysis uses statistical models to predict unknown or future values. Although predictive analysis is a large topic that could be an entire course of its own, section 4 presents the concepts your students need to get started with it. More specifically, it shows your students how to use linear regression models to predict continuous numeric values and how to use classification models to predict categorical values.

Section 5 shows how to present an analysis. To do that, your students can use R Markdown to convert an analysis into an HTML document, PDF file, or PowerPoint slideshow. This is an important skill because the value of an analysis comes from being able to present the insights gained from it to a target audience.

The only prerequisite for this book is basic computer literacy. That’s because chapters 1 and 2 present the parts of the R language that your students need to start using R for data analysis. However, it’s helpful for your students to have some background in statistics.

To analyze data with R as shown in this book, your students just need to download and install the RStudio program and the R language. Both are available for free. Then, they can install some R packages for data analysis that are also freely available. For information about how to do this, they can consult appendix A for Windows or appendix B for macOS.

“I really appreciated the four case studies. They were a big help for my students as they illustrated all phases of data analysis and visualization.”

— J. Jasperson – Texas A&M University

“In his first at-bat, Scott McCoy smashes this one out of the park! This book is not just informative, it is exciting.”

— Scott Spurlock, Software Engineer, Georgia

“Unlike some other books on data analysis with Python, the explanations of how to perform data analysis are thorough rather than terse or with no explanations.”

— Posted at an online bookseller

“I really like the paired-pages format of detailed information on the left and quick notes on the right. This helps me to quickly find the information I’m looking for.”

— Roxanne T., Student, Washington

“Another awesome book from Murach. Their format makes learning new material easier, and their code examples WORK.”

— Posted at an online bookseller

“I can’t praise this book highly enough. The clarity used in picking what to include, when to introduce it, and how to do so is remarkable.”

— Charles Ferguson, Software Developer, Australia

“This book is very well-organized and easy to follow. It covers the perfect amount of description, and it does not make you bored by providing unnecessary details.”

— Posted at an online bookseller

“You folks make the hard stuff seem easy.”

— Thomas Finn, Sr. Software Developer, Illinois

“This is my first exposure to Murach’s books, and I love them. I like the organization of the content, the consistent approach in each book, and the accuracy of the material.”

— Bob L., Michigan

“Another thing I like is the exercises at the end of each chapter. They’re a great way to reinforce the main points of each chapter and force you to get your hands dirty.”

— Hien Luu, SD Forum/Java SIG

“Your book was indispensable to me. The answers were right there at every turn. All the examples made sense, and they all worked!”

— Alan Vogt, ETL Consultant, Massachusetts

“Your books shine out from the rest—the quality of writing and presentation of information is topnotch, and the consistency of quality across books is impressive.”

— Nolan Tamashiro, Developer

View the table of contents for this book in a PDF: Table of Contents (PDF)

What data analysis is

The five phases of data analysis

Introduction to RStudio

How to run code in the Console pane

How to run code in the Source pane

How to view variables in the Environment pane

How to create variables

How to work with variables

How to code arithmetic expressions

How to use arithmetic expressions in statements

How to interpret error messages

How to call functions

How to use functions to work with strings

How to use functions to work with numbers

How to work with vectors

How to work with data frames

How to work with lists

How to add values to data structures

How to use the relational operators

How to use the logical operators

How to code if statements

How to code nested if statements

How to code for loops

How to define functions

The child mortality data

How to set the working directory

How to work with packages

How to read the data into a tibble

How to select the top and bottom rows

How to view summary statistics

How to melt the data

How to add, modify, and rename columns

How to save a tibble as an RDS file

How to calculate summary columns

How to create a line plot

How to use the datasets package

How to get the irises data

How to get the chicks data

How to select rows based on a condition

How to create a base plot

Functions for common plot types

How to create a line plot

How to create a scatter plot

How to create a bar plot

How to create a box plot

How to create a histogram

How to create a KDE plot

How to create an ECDF plot

How to create a 2D KDE plot

How to combine plots

How to create a grid of plots

How to view documentation

How to save a plot

How to find the data you want

How to read data from CSV and Excel files

How to download data

How to work with a zip file

How to connect to a database

How to list the tables in a database

How to list the columns of a table

How to code a query

How to use a query to read data

How to read a JSON file into a list

How to get the index for a list

How to get data from a list

How to build a tibble from the data in a list

A general plan for cleaning data

How to display column names and data types

How to examine the unique values for a single column

How to display the unique values for all columns

How to count the unique values for all columns

How to display the value counts

How to sort the data

How to filter and drop rows

How to drop columns

How to rename columns

How to find missing values

How to fix missing values

How to select columns by data type

How to convert strings to numbers

How to convert strings to dates and times

How to work with the factor type

How to assess outliers

How to calculate quartiles and quantiles

How to calculate the fences for the box plot

How to fix the outliers

How to work with date columns

How to use stringr to work with strings

How to work with string and numeric columns

How to use statistical functions

How to summarize data

How to group and summarize data

Another way to group and summarize data

How to rank rows

How to add a cumulative sum

How to bin data

How to define functions that operate on rows

How to define functions that operate on columns

How to use lambda expressions instead of functions

How to add columns by joining tibbles

How to add rows

Get the data

More skills for working with scatter plots

More skills for working with bar plots

How to add an error bar to a bar plot

More skills for working with line plots

How to create a smooth line plot

How to add labels to plots

How to plot shapes

How to plot a baseball field

How to return plot components from a function

How to plot hits on a baseball field

How to plot maps

How to add data to a map

How to zoom in on part of a plot

How to adjust the limits of a plot

How to work with the plot title and axes labels

How to change the position of the legend

How to edit the legend

How to hide the text and ticks for each axis

How to set the colors for the plot

How to change the theme of the plot

How to create a pairwise grid of scatter plots

How to use other plot types in the grid

Load the packages

Get the data

Examine the data

Select and rename the columns

Sort the rows

Select the rows

Improve some columns

Add columns

Pivot the data

Plot the national polls

Plot the polls for swing states

Analyze the polls by voter type

Plot the gap for the last week of the election

Plot the weekly gap over time

Load the packages for this analysis

Unzip the database file

Read the data from the database

Improve column names and data types

Drop duplicate rows

Select rows for large fires

Examine NA values

Add, modify, and select columns

Sort the rows

Plot the largest fire per year in California

Plot the mean and median acres burned in California

Plot the fires per month in California

Plot the total acres burned for the top 10 states

Plot the acres burned per year for the top 4 states

Plot the 20 largest fires in California

Plot all fires in California larger than 500 acres

Plot all fires in the U.S. larger than 100,000 acres

Load the packages

Read the data

Build the tibble

Examine the unique values

Select and rename the columns

Improve the data types for two columns

Add a Season column

Add a Points column

Add some summary columns

Plot shots made per game by season

Plot shots attempted vs. made per game

Plot shots made per game for all seasons

Plot shot statistics by season

Plot shooting percentages per season

Plot shot locations for two games

Define a function for drawing the court

Plot shot locations for two games on a court

Plot shots by zone for one season

Plot shot count by zone

Plot shooting percentage by zone

Plot shot density

Compare shot locations and density for two seasons

Types of predictive models

Introduction to regression analysis

How to get the data

How to examine and clean the data

How to interpret correlation coefficients

How to identify correlations with r-values

How to identify correlations visually

A procedure for working with a regression model

How to split the data

How to drop outliers from the training data set

How to create a model

How to use a model to make predictions

How to plot an equation

How to plot an equation on a scatter plot

How to code formulas

How to plot a formula on a scatter plot

How to create a model for a curved line

How to create and fit the model

How to judge the model by its R² value

How to judge the model by its residuals

More formula operators

How to create and fit the model

How to view the model’s terms

How to remove insignificant terms

How to plot regression coefficients

Five common nonlinear patterns

How to transform variables

How to create, fit, and judge the model

How to examine ordinal variables

How to create, fit, and judge the model

Introduction to classification analysis

How to get the data for this chapter

How to visually investigate the data

How to create a decision tree

How to plot a decision tree

How to judge a model with a confusion matrix

How to use variable importance to select variables

How to adjust the hyperparameters

How to compare decision trees

How to cross validate a model

How to tune hyperparameters with a grid search

How to create an R Markdown file

How to render an R Markdown file

How to code the YAML header

How to add headings and paragraphs

How to add chunks of code

How to run chunks of code

How to format text

How to create dynamic documents

How to specify multiple output formats

The HTML document displayed in a browser

The PDF and Word documents for the same markdown

The R Markdown

How to start a presentation

The first two slides of a presentation

How to install R

How to install RStudio

How to install the files for this book

How to install the packages for this book

How to install R

How to install RStudio

How to install the files for this book

How to install the packages for this book

To learn about the supporting courseware that we provide for our books, please visit About our Courseware.

- Summary bullets
- Step-by-step exercises

This download includes files for:

- The R scripts for all examples and analyses presented in this book
- The starting points for the exercises presented at the end of each chapter
- The solutions to those exercises
- The data for all of the examples, analyses, and exercises

Appendixes A and B show how to install these files on Windows and macOS.

- Instructional objectives by chapter
- Test banks in multiple formats
- Student projects that can be assigned at the end of certain chapters
- Case studies that can be assigned at the end of each section or as a final assignment
- Accessible PowerPoint slides

For a more detailed description of the courseware for this book, please read the Instructor’s Summary.

On this page, we’ll be posting answers to the questions that come up most often about this book. So if you have any questions that you haven’t found answered here at our site, please email us. Thanks!

To view the corrections for this book in a PDF, just click on this link: View the corrections

Then, if you find any other errors, please email us so we can correct them in the next printing of the book. Thank you!
