edX - Data Science: Wrangling

Platform: edX

Provider: Harvard University

Effort: 2-4 hours a week

Length: 4 weeks

Language: English

Credentials: Paid Certificate Available

Part of: Professional Certificate Program in Data Science

Course Link: https://www.edx.org/course/data-science-wrangling-harvardx-ph125-6x

Overview
In this course, part of our Professional Certificate Program in Data Science, we cover several standard steps of the data wrangling process, such as importing data into R, tidying data, string processing, HTML parsing, working with dates and times, and text mining. Rarely are all these wrangling steps necessary in a single analysis, but a data scientist will likely face them all at some point.

Data in a data science project is rarely easily accessible. It is more likely to be stored in a file or a database, or to need extracting from documents such as web pages, tweets, or PDFs. In these cases, the first step is to import the data into R and tidy it using the tidyverse packages. The steps that convert data from its raw form to the tidy form are called data wrangling.

What You Will Learn

Importing data into R from different file formats
Web scraping
How to tidy data using the tidyverse to better facilitate analysis
String processing with regular expressions (regex)
Wrangling data using dplyr
How to work with dates and times
Text mining

Taught by
Rafael Irizarry
