Welcome to Quantitative Methods.
This course gives you an introduction to common concepts and tools for data analysis in economics and business. The emphasis is on practical data skills and on the ability to formulate a question appropriately, provide data analysis relevant to that question, and interpret the results correctly. As part of the course you will be asked to produce a small piece of data analysis.
To get your participation mark, please take part in the Datathon Forum by asking questions, giving answers, or sharing relevant material (e.g. an interesting dataset or article). You can find the forum here. You should already be enrolled; if you are not, let me know. Please participate only with your official Imperial email address.
We take a first look at the R software package and programming language, which we will be using throughout this module. For those who have not programmed before, this will be a steep learning curve. However, it should be worth it.
You will not be able to analyse data effectively without being familiar with basic programming concepts. What’s more, an understanding of programming is key to grasping many aspects of today’s world. Even if you don’t envisage a career as a master coder or data analyst, trying these things for yourself will be useful, as you will very likely have to manage coders and analysts or rely on their work at some point in your career.
Some of the best ways to understand data are graphs and visualisations. R has powerful tools for that purpose.
We will deepen our understanding of the R programming language by looking at some of its graphics commands. We will also discuss common pitfalls in data visualisation, as well as the deliberate manipulations that people sometimes engage in when producing graphs.
We have discussed how to estimate the parameters of a simple regression model. In this topic we will discuss how to work out the reliability of such estimates. We will examine what determines the distribution of the estimates and how we can use hypothesis testing to explore the data and estimates.
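As a rough illustration of what this looks like in R, the sketch below fits a simple regression on simulated data and reads off the standard error, t-statistic, p-value and confidence interval for the slope. The data and the variable names (`x`, `y`, `fit`) are made up for this example; they are not taken from the course material.

```r
# Simulated data: the true slope is 0.5 (an assumption of this example)
set.seed(4)
n <- 100
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)

fit <- lm(y ~ x)

# Coefficient table: estimate, standard error, t value and p-value
results <- summary(fit)$coefficients
print(results)

# The t-statistic for H0: slope = 0 is the estimate divided by its standard error
t_stat <- results["x", "Estimate"] / results["x", "Std. Error"]
p_value <- results["x", "Pr(>|t|)"]

# 95% confidence interval for the slope
ci <- confint(fit, "x", level = 0.95)
print(ci)
```

A small p-value suggests that a slope of zero is hard to reconcile with the data, while the confidence interval gives a range of slope values that are compatible with it.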
We look at multivariate regressions, i.e. regressions where the dependent variable depends on several explanatory variables rather than just one.
This might be a course about quantitative analysis, but that doesn’t mean we can’t handle qualitative issues as well. To capture qualitative aspects (e.g. whether an individual in our dataset has a job, the gender of a person, or the location of a company's headquarters) we can use so-called dummy or binary variables; i.e. we simply set a variable equal to 1 if the qualitative aspect is true and to 0 if not. We will discuss what this means for regression analysis. We will also look at some nonlinear regression models.
Suppose you want to know the causal effect of a variable X on a variable Y, but there are concerns that X might be endogenous, for instance because of omitted variable bias. Instrumental variables are variables that you are not necessarily interested in for their own sake, but that can help you identify the causal effect you are interested in despite the endogeneity.
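To make the idea of a dummy variable concrete, here is a minimal sketch on simulated data. The variables (`wage`, `experience`, `female`) and all the numbers are invented for illustration only.

```r
# Simulated data; names and parameter values are assumptions of this example
set.seed(1)
n <- 200
female <- rbinom(n, size = 1, prob = 0.5)   # dummy: 1 if female, 0 otherwise
experience <- runif(n, min = 0, max = 20)   # years of experience
wage <- 10 + 0.5 * experience - 2 * female + rnorm(n)

# Regression with one continuous and one dummy explanatory variable
fit <- lm(wage ~ experience + female)
print(coef(fit))

# The coefficient on the dummy shifts the intercept for the group coded 1
female_gap <- coef(fit)[["female"]]
```

Here the coefficient on `female` estimates the difference in average wage between the two groups, holding experience constant. Because the example has two explanatory variables, it is also a small multivariate regression.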
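The logic can be sketched with simulated data in which we know the true effect. Below, `u` plays the role of an omitted variable and `z` the role of an instrument. The two-step procedure shown (regress X on the instrument, then regress Y on the fitted values) is one common way of computing an instrumental variable estimate; all names and numbers are assumptions made for this example.

```r
set.seed(2)
n <- 5000
u <- rnorm(n)                        # unobserved confounder (omitted variable)
z <- rnorm(n)                        # instrument: moves x, unrelated to u
x <- 0.8 * z + u + rnorm(n)          # x is endogenous because it depends on u
y <- 1 + 2 * x + 3 * u + rnorm(n)    # true causal effect of x on y is 2

# Ordinary regression: biased, because u is left out and correlated with x
ols_slope <- coef(lm(y ~ x))[["x"]]

# Two-step instrumental variable estimate
x_hat <- fitted(lm(x ~ z))           # step 1: the part of x explained by z
iv_slope <- coef(lm(y ~ x_hat))[["x_hat"]]  # step 2: regress y on that part

print(c(ols = ols_slope, iv = iv_slope))
```

Note that the standard errors reported by the second-step `lm` are not valid for inference. In practice a dedicated routine, such as `ivreg` from the AER package, would normally be used instead of doing the two steps by hand.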
You might think studying for this course is so hard that it could only be done by a machine. After this topic, you'll hopefully think again.
You will get a brief introduction to Machine Learning. Compared to the rest of the course, Machine Learning is essentially econometrics where we are more concerned with prediction than with the identification of causal relationships.
To do any data analysis we need different datapoints. Most of the data we have looked at so far were cross-sectional, i.e. datapoints derived from several data units (e.g. individuals, countries, cities, firms). Alternatively, we might have data for just one data unit but over many different periods of time (e.g. years, quarters, days or sometimes even seconds). This creates some specific issues, which we will discuss. Because time itself causes a form of confounding, and correlation from one datapoint to the next, we might get some very biased estimates if we are not careful.
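A small simulation can illustrate how time acts as a confounder. The two series below are generated independently of each other and share nothing except an upward trend, yet a naive regression of one on the other suggests a strong relationship. The series names and parameter values are invented for this sketch.

```r
set.seed(3)
period <- 1:100
a <- 0.5 * period + rnorm(100)   # trending series
b <- 0.3 * period + rnorm(100)   # second trending series, generated independently of a

# Naive regression: picks up the common trend as if a affected b
naive_slope <- coef(lm(b ~ a))[["a"]]

# Controlling for the time trend removes the spurious relationship
controlled_slope <- coef(lm(b ~ a + period))[["a"]]

print(c(naive = naive_slope, controlled = controlled_slope))
```

Including the time trend as a control is only one of several possible remedies, and it works here because the trend is linear by construction.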
Alternative download location
As part of this course you are asked to hand in a piece of group coursework. For this you should provide a short, simple report on anything you like, as long as it involves a discussion of a dataset and some of the methods we discussed in this class. That is, think of a good question, some data, and a strategy for using the data to say something towards an answer to the question.
You can find a template with further instructions here (RMarkdown version).
To prepare, also look at Exercises 10.
- [ back2country_set.dta ]( https://mondpanther.github.io/datastorieshub/data/back2country_set.dta )
- [ driving.csv ]( https://mondpanther.github.io/datastorieshub/data/driving.csv )
- [ ets_thres_final.csv ]( https://mondpanther.github.io/datastorieshub/data/ets_thres_final.csv )
- [ foreigners.csv ]( https://mondpanther.github.io/datastorieshub/data/foreigners.csv )
- [ guns.csv ]( https://mondpanther.github.io/datastorieshub/data/guns.csv )
- [ hals1prep.csv ]( https://mondpanther.github.io/datastorieshub/data/hals1prep.csv )
- [ house.csv ]( https://mondpanther.github.io/datastorieshub/data/house.csv )
- [ lfsclean.dta ]( https://mondpanther.github.io/datastorieshub/data/lfsclean.dta )
- [ maketable1.csv ]( https://mondpanther.github.io/datastorieshub/data/maketable1.csv )
- [ maketable2.csv ]( https://mondpanther.github.io/datastorieshub/data/maketable2.csv )
- [ maketable4.csv ]( https://mondpanther.github.io/datastorieshub/data/maketable4.csv )
- [ migrants.csv ]( https://mondpanther.github.io/datastorieshub/data/migrants.csv )
- [ NH.Ts+dSST.csv ]( https://mondpanther.github.io/datastorieshub/data/NH.Ts+dSST.csv )
- [ oj.csv ]( https://mondpanther.github.io/datastorieshub/data/oj.csv )
- [ populationclean.csv ]( https://mondpanther.github.io/datastorieshub/data/populationclean.csv )
- [ populationdata.csv ]( https://mondpanther.github.io/datastorieshub/data/populationdata.csv )
- [ prod.csv ]( https://mondpanther.github.io/datastorieshub/data/prod.csv )
- [ prod_balanced.csv ]( https://mondpanther.github.io/datastorieshub/data/prod_balanced.csv )
- [ proddata_clean.csv ]( https://mondpanther.github.io/datastorieshub/data/proddata_clean.csv )
- [ production2.csv ]( https://mondpanther.github.io/datastorieshub/data/production2.csv )
- [ statistic_id183497_population-in-the-states-of-the-us-2019.csv ]( https://mondpanther.github.io/datastorieshub/data/statistic_id183497_population-in-the-states-of-the-us-2019.csv )
- [ statistic_id183497_population-in-the-states-of-the-us-2019.xlsx ]( https://mondpanther.github.io/datastorieshub/data/statistic_id183497_population-in-the-states-of-the-us-2019.xlsx )
- [ TableWages.csv ]( https://mondpanther.github.io/datastorieshub/data/TableWages.csv )
- [ TeachingRatings.csv ]( https://mondpanther.github.io/datastorieshub/data/TeachingRatings.csv )
- [ UK Gender Pay Gap Data - 2017 to 2018.csv ]( <https://mondpanther.github.io/datastorieshub/data/UK Gender Pay Gap Data - 2017 to 2018.csv> )
- [ unempprep.csv ]( https://mondpanther.github.io/datastorieshub/data/unempprep.csv )
- [ unempprep.dta ]( https://mondpanther.github.io/datastorieshub/data/unempprep.dta )
- [ us-states.csv ]( https://mondpanther.github.io/datastorieshub/data/us-states.csv )