- University of Amsterdam
- 1-3 hours a week
- 7 weeks
- Paid Certificate Available
This course covers commonly used statistical inference methods for numerical and categorical data. You will learn how to set up and perform hypothesis tests, interpret p-values, and report the results of your analysis in a way that clients or the public can interpret. Using numerous data examples, you will learn to report estimates in a way that expresses the uncertainty of the quantity of interest. You will be guided through installing and using R and RStudio (free statistical software), and will use this software for lab exercises and a final project. The course introduces practical tools for performing data analysis and explores the fundamental concepts necessary to interpret and report results for both categorical and numerical data.
Before we get started...
Comparing two groups
In this second module of week 1 we dive right in with a quick refresher on statistical hypothesis testing. Since we're assuming you just completed the course Basic Statistics, our treatment is a little more abstract and we go really fast! We provide the relevant Basic Statistics videos in case you need a gentler introduction. After the refresher we discuss methods to compare two groups on a categorical or quantitative dependent variable. We use different tests for independent and dependent groups.
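As a rough illustration of why independent and dependent groups call for different tests, the sketch below computes a t statistic both ways on the same numbers. It uses plain Python purely for illustration (the course labs themselves use R); the data and the function names `welch_t` and `paired_t` are made up for this example.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """t statistic for two independent groups (unequal variances allowed)."""
    se = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    return (statistics.mean(group_b) - statistics.mean(group_a)) / se


def paired_t(before, after):
    """t statistic for dependent groups, computed on the per-case differences."""
    diffs = [a - b for b, a in zip(before, after)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se


# Hypothetical scores: the same four people measured twice.
before = [10, 12, 14, 16]
after = [11, 14, 15, 18]

t_independent = welch_t(before, after)  # ignores the pairing
t_paired = paired_t(before, after)      # uses the pairing

print(f"independent-groups t = {t_independent:.3f}")
print(f"paired t             = {t_paired:.3f}")
```

On these numbers the paired statistic is much larger than the independent-groups statistic, because working with per-person differences removes the variation between people.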
In this module we tackle categorical association. We'll mainly discuss the Chi-squared test that allows us to decide whether two categorical variables are related in the population. If two categorical variables are unrelated you would expect that categories of these variables don't 'go together'. You would expect the number of cases in each category of one variable to be proportionally similar at each level of the other variable. The Chi-squared test helps us to compare the actual number of cases for each combination of categories (the joint frequencies) to the expected number of cases if the variables are unrelated.
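The comparison of observed and expected joint frequencies can be sketched in a few lines. This is a minimal illustration in plain Python (the course labs use R); the 2x2 table and the function name `chi_squared_statistic` are invented for the example, and the sketch stops at the test statistic rather than computing a p-value.

```python
def chi_squared_statistic(observed):
    """Return (statistic, degrees of freedom, expected counts) for a
    table of joint frequencies given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # Expected count if the variables are unrelated:
    # (row total * column total) / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Sum of (observed - expected)^2 / expected over all cells.
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# Hypothetical joint frequencies for two binary categorical variables.
observed = [
    [30, 20],
    [20, 30],
]

statistic, df, expected = chi_squared_statistic(observed)
print("expected counts:", expected)
print("chi-squared:", statistic, "df:", df)
```

Here every expected count is 25, so each cell contributes (5^2)/25 = 1 and the statistic is 4.0 with 1 degree of freedom. That is slightly above the 5% critical value of about 3.84 for 1 degree of freedom.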
In this module we’ll see how to describe the association between two quantitative variables using simple (linear) regression analysis. Regression analysis allows us to model the relation between two quantitative variables and, based on our sample, decide whether a 'real' relation exists in the population. Regression analysis is more useful than just calculating a correlation coefficient, since it allows us to assess how well our regression line fits the data, to identify outliers, and to predict scores on the dependent variable for new cases.
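To make the ideas of fitting a line, assessing its fit, and predicting new cases concrete, here is a small least-squares sketch. It is written in plain Python for illustration only (the course labs use R); the data points and the helper names `fit_line` and `r_squared` are assumptions of this example, not course material.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(x, y, intercept, slope):
    """Proportion of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - ss_resid / ss_total


# Hypothetical sample of five cases.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

intercept, slope = fit_line(x, y)
fit = r_squared(x, y, intercept, slope)
prediction = intercept + slope * 6  # predicted score for a new case with x = 6

print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
print(f"R squared = {fit:.3f}")
print(f"prediction for x = 6: {prediction:.3f}")
```

The residuals (observed minus predicted scores) computed inside `r_squared` are also what one would inspect to spot outliers.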
In this module we’ll see how we can use more than one predictor to describe or predict a quantitative outcome variable. In the social sciences relations between psychological and social variables are generally not very strong, since outcomes are generally influenced by complex processes involving many variables. So it really helps to be able to describe an outcome variable with several predictors, not just to increase the fit of the model, but also to assess the individual contribution of each predictor, while controlling for the others.
Analysis of variance
In this module we'll discuss analysis of variance, a very popular technique that allows us to compare more than two groups on a quantitative dependent variable. We call it analysis of variance because we compare two estimates of the variance in the population: one based on the variation between the group means and one based on the variation within the groups. If the group means differ in the population, these two estimates differ. Just like multiple regression, factorial analysis of variance allows us to investigate the influence of several independent variables.
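The comparison of two variance estimates can be sketched as a one-way F statistic: the between-groups mean square divided by the within-groups mean square. The code below is a plain Python illustration (the course labs use R); the three groups and the function name `one_way_f` are hypothetical.

```python
def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    all_scores = [score for group in groups for score in group]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(group) / len(group) for group in groups]

    # Variance estimate based on how far group means lie from the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Variance estimate based on the spread of scores inside each group.
    ss_within = sum(
        (score - mean) ** 2
        for group, mean in zip(groups, group_means)
        for score in group
    )

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return f_statistic, df_between, df_within


# Hypothetical scores for three independent groups.
groups = [
    [1, 2, 3],
    [2, 3, 4],
    [6, 7, 8],
]

f_statistic, df_between, df_within = one_way_f(groups)
print(f"F({df_between}, {df_within}) = {f_statistic:.3f}")
```

With these made-up scores the between-groups estimate is 21 and the within-groups estimate is 1, so F = 21. When the group means are equal in the population, the two estimates should be similar and F should be close to 1.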
In this module we'll discuss the last topic of this course: non-parametric tests. Until now we've mostly considered tests that require assumptions about the shape of the distribution (z-tests, t-tests and F-tests). Sometimes those assumptions don't hold. Non-parametric tests require fewer of those assumptions. There are several non-parametric tests that correspond to the parametric z-, t- and F-tests. These tests also come in handy when the response variable is an ordered categorical variable as opposed to a quantitative variable. There are also non-parametric equivalents to the correlation coefficient and some tests that have no parametric counterparts.
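One example of a non-parametric equivalent to the correlation coefficient is a rank correlation (Spearman's), which is simply the ordinary correlation computed on ranks instead of raw scores. The sketch below is a plain Python illustration (the course labs use R); the data and the helper names `average_ranks`, `pearson` and `spearman` are made up for this example.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Ordinary (product-moment) correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Rank correlation: the ordinary correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: y always increases with x, but not linearly,
# and the last y value is extreme.
x = [10, 20, 30, 40, 50]
y = [1, 4, 9, 16, 100]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(f"Pearson r  = {r_pearson:.3f}")
print(f"Spearman r = {r_spearman:.3f}")
```

Because only the ordering of the scores matters, the rank correlation is 1 for this perfectly increasing relation, while the ordinary correlation is pulled below 1 by the extreme value.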
In this final module there's no new material to study. We advise you to take some extra time to review the material from the previous modules and to practice for the final exam. We've provided a practice exam that you can take as many times as you like. The final exam is structured exactly like the practice exam, so you know what to expect. Please note that you can only take the final exam twice every seven days, so make sure you are fully prepared. Please follow the honor code and do not communicate or confer with others while taking this exam or after. In the open questions of the exam (i.e. those that are not multiple choice) you should report your answers to 3 decimal places, and use 5 decimal places in your calculations. Good luck!
Emiel van Loon and Annemarie Zand Scholten