Datastories Hub: Quick Guide - Glossary of R commands

Ralf Martin

`abline()`

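Adds a straight line to an existing base R plot such as a scatterplot, for example a horizontal reference line or a fitted regression line. With ggplot() you have an endless range of possibilities for add-ons; the equivalent add-on there is geom_abline().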
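A minimal sketch of typical use with base R graphics. The data are simulated purely for illustration, and the names x, y and fit are made up for this example.

```r
# Simulated data: y depends linearly on x plus noise (illustrative only)
set.seed(1)
x <- runif(50)
y <- 1 + 2 * x + rnorm(50, sd = 0.3)

plot(x, y)                    # draw the scatterplot first
fit <- lm(y ~ x)              # least squares fit
abline(fit, col = "red")      # add the fitted regression line
abline(h = mean(y), lty = 2)  # add a dashed horizontal line at the mean of y
```

abline() only draws on top of a plot that already exists, so it has to come after plot().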

`as.Date()`

Sometimes when you first load data into R, a date variable is read in as plain text, so R will not understand that it is a date. Before you can plot it or do calculations with it, you need to convert it to a date variable.

E.g.

mydates <- as.Date(c("2007-06-22", "2004-02-13"))
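If the dates are not written in year-month-day order, as.Date() needs to be told the layout through its format argument. A small sketch with made-up date strings (the names raw and mydates2 are only for illustration):

```r
# Dates stored as day/month/year text (made-up values)
raw <- c("22/06/2007", "13/02/2004")
mydates2 <- as.Date(raw, format = "%d/%m/%Y")

class(mydates2)              # "Date"
mydates2[1] - mydates2[2]    # dates can now be subtracted (difference in days)
```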

`c()`

This is a function to tie together (or ‘concatenate’) objects, e.g. x = c(3,5,8,9) or y = c("Jack","Queen","King").

`cor()`

Produces a correlation matrix.
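As an illustration, here is a sketch using mtcars, a small dataset that ships with R. The choice of columns is arbitrary.

```r
# Pick three numeric columns from the built-in mtcars data
vars <- mtcars[, c("mpg", "wt", "hp")]
cmat <- cor(vars)            # 3 x 3 matrix of pairwise correlations
round(cmat, 2)

cor(mtcars$mpg, mtcars$wt)   # correlation between just two variables
```

The diagonal is always 1 because every variable is perfectly correlated with itself.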

`boxplot()`

Produces boxplots with whiskers - which are sometimes neat for showing the structure of a variable. The syntax is: boxplot(x, main="title")

`factor()`

Converts a numeric or string variable into a categorical variable. This is useful - for instance - to create sets of dummy variables based on a categorical variable.
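A short sketch with a made-up string variable. Using model.matrix() to show the resulting dummies is just one way to look at them; regression functions such as lm() create the dummies automatically when given a factor.

```r
region <- c("north", "south", "west", "south", "north")  # made-up data
region_f <- factor(region)

levels(region_f)   # the distinct categories
table(region_f)    # how often each one occurs

# One dummy column per category except the first, which acts as the baseline
dummies <- model.matrix(~ region_f)
dummies
```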

`for()`

Repeats a series of commands several times.

Example


  for(ii in 1:10){
     cat("Round")
     cat(ii)
     cat("\n")
  }

`getwd()`

Prints current working directory.

`ggplot()`

General purpose plotting command.

`group_by()`

Defines groupings within a dataframe based on one or several categorical variables. Useful to compute statistics at the level of these groups.
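A sketch of the usual pattern, combining group_by() with summarise() from the dplyr package. It uses the built-in mtcars data; the output column names mean_mpg and n_cars are made up for this example.

```r
library(dplyr)

# Average fuel economy and number of cars for each cylinder count
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n_cars = n())

by_cyl   # one row per group
```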

`head()`

Shows the first few rows of a dataframe (six by default).

rdf=data.frame(v1=runif(100),v2=runif(100))
head(rdf)

          v1         v2
1 0.02305153 0.12122816
2 0.40087975 0.24056851
3 0.27557525 0.52671220
4 0.27287104 0.98594268
5 0.53335167 0.38574363
6 0.27718645 0.01763647

`help()`

This command will fetch a help sheet on the function specified in the brackets, which includes information about usage and syntax.

`hist()`

Produces a histogram - which can be useful when you’re learning more about your variables and want to assess, for instance, if you should log them.
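To illustrate the point about logging, here is a sketch with a simulated right-skewed variable. The variable name income and the distribution parameters are assumptions made for the example, not real data.

```r
set.seed(42)
income <- rlnorm(500, meanlog = 10, sdlog = 1)  # simulated, right-skewed

h_raw <- hist(income, main = "Raw values")              # long right tail
h_log <- hist(log(income), main = "After taking logs")  # much more symmetric
```

If the raw histogram is bunched up on the left with a long tail and the logged one looks more symmetric, that is a hint that working with the log may be sensible.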

`inner_join()`

Combines two dataframes on the basis of a common key variable. An inner join only keeps observations that have information in both dataframes. See also left_join(), full_join(), etc.
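A small sketch with two made-up dataframes (people and scores are hypothetical names) that share the key variable id. The joins come from the dplyr package.

```r
library(dplyr)

people <- data.frame(id = c(1, 2, 3), name = c("Ann", "Bob", "Cy"))
scores <- data.frame(id = c(2, 3, 4), score = c(80, 95, 70))

both <- inner_join(people, scores, by = "id")  # only ids found in both: 2 and 3
left <- left_join(people, scores, by = "id")   # all ids from people; score is NA for id 1

both
left
```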

`install.packages()`

Will install a programming package. The name of the package in the brackets must be in quotation marks.

`ivreg()`

Instrumental variable two-stage least squares regression. Available, for instance, through the AER package.
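A sketch on simulated data, assuming the ivreg() version from the AER package. All variable names and the data-generating process are made up for illustration; the true effect of x on y is set to 2 so the two estimates can be compared against it.

```r
library(AER)

set.seed(123)
n <- 1000
z <- rnorm(n)                        # instrument: shifts x, unrelated to u
u <- rnorm(n)                        # unobserved confounder
x <- 0.8 * z + u + rnorm(n)          # endogenous regressor
y <- 1 + 2 * x + u + rnorm(n)        # true effect of x on y is 2
sim <- data.frame(y, x, z)

ols <- lm(y ~ x, data = sim)         # biased, because x is correlated with u
iv  <- ivreg(y ~ x | z, data = sim)  # regressors before the bar, instruments after

coef(ols)["x"]
coef(iv)["x"]
```

In a setup like this the IV estimate should land closer to 2 than the OLS estimate, which picks up the confounder.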

`library()`

Loads an extension package into memory.

Example

library(car)

Loads the car library that allows you to run the linearHypothesis() command (and more).

`linearHypothesis()`

Generic function for testing a linear hypothesis. The car package needs to be installed and loaded for it to work.
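A sketch of two common uses, based on a regression with the built-in mtcars data. The model itself is arbitrary and only serves to have some coefficients to test.

```r
library(car)

fit <- lm(mpg ~ wt + hp, data = mtcars)

# H0: the coefficients on wt and hp are both zero (a joint F-test)
linearHypothesis(fit, c("wt = 0", "hp = 0"))

# H0: the two coefficients are equal
test_eq <- linearHypothesis(fit, "wt = hp")
test_eq
```

The hypotheses are written as text, using the coefficient names exactly as they appear in the regression output.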

`lm()`

Fits a linear regression model using the least squares algorithm.

Example


    library(AER)
    library(dplyr)
    data("Affairs")  # Loads `Affairs` dataframe into memory (part of AER library)
    
    reg=lm(affairs~age+gender,data=Affairs)
    
    reg %>% summary()

`log()`

You will sometimes need to transform your variables into natural logs (as explained in the lectures). The log() function transforms a variable into a natural logarithm (i.e. base e).
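A few illustrative calls. The gdp vector is made up for the example.

```r
log(100)             # natural log of 100
log(exp(1))          # 1, since log() undoes exp()
log10(100)           # 2, base-10 log
log(8, base = 2)     # 3, any other base via the base argument

gdp <- c(500, 5000, 50000)   # made-up values
log_gdp <- log(gdp)          # works element by element on a vector
log_gdp
```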

`prop.table()`

Will generate a table of proportions. prop.table(table(data$var1, data$var2)) divides each cell by the total of all cells, while the command prop.table(table(data$var1, data$var2), 1) divides each cell by the total of its row and prop.table(table(data$var1, data$var2), 2) by the total of its column.
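The three variants side by side, sketched with the built-in mtcars data (number of cylinders against transmission type, an arbitrary choice):

```r
tab <- table(mtcars$cyl, mtcars$am)  # cylinders (rows) by transmission (columns)

prop.table(tab)      # shares of the grand total: all cells sum to 1
prop.table(tab, 1)   # row shares: each row sums to 1
prop.table(tab, 2)   # column shares: each column sums to 1
```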

`read.csv()`

Reads csv files (comma separated values; i.e. a basic table format) from your hard drive or the web into an R dataframe.

Example

df=read.csv("https://www.dropbox.com/s/a2opu10e2hz0dps/brexit.csv?dl=1")

Loads the brexit.csv dataset

`read_excel()`

This command will load an Excel spreadsheet. Needs to be preceded by library(readxl). The data file name inside the brackets must be in quotation marks.

`seq()`

Create a sequence of numbers; e.g.

seq(0,20,2)

 [1]  0  2  4  6  8 10 12 14 16 18 20

`setwd()`

Will set the working directory. E.g. setwd("c:/folder/folder2")

`stargazer()`

Possibly the most amazing function in R. Will let you produce very nice looking descriptive statistics tables and regression tables.

`str()`

Will display the structure of an R object.

`subset()`

The subset( ) function is the easiest way to select variables and observations.

E.g. in the following example, we select all rows that have a value of age greater than or equal to 20 or less than 10. We keep the ID and Weight columns.

newdata <- subset(mydata, age >= 20 | age < 10, select=c(ID, Weight))

`summarise()`

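Computes summary statistics such as means or counts for a dataframe. Typically used after group_by() to get one row of statistics per group.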

`summary()`

General command to provide summary information about an object.

`table()`

This command will generate a contingency table. The levels of the first categorical variable are shown as rows and the levels of the second categorical variable as columns.

E.g. table(data$call, data$black) will return a 2x2 table where callback frequency is given by the rows (1=callback) and race frequency is given by the columns (1=black).
