Quick Guide - Glossary of R commands

Quickguides
Ralf Martin
09-28-2021

abline()

With a function like plot() you have an endless range of possibilities for add-ons. The abline() command adds a straight line to an existing base-graphics plot, such as a scatterplot. (With ggplot() the equivalent add-on is geom_abline().)
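A minimal sketch with simulated data (the names x, y and fit are made up for illustration): draw a scatterplot with base plot(), then overlay the fitted regression line and a horizontal reference line.

```r
# Simulated data; x, y and fit are illustrative names, not from the guide
set.seed(1)
x <- runif(50)
y <- 1 + 2 * x + rnorm(50, sd = 0.3)

plot(x, y)                    # base-graphics scatterplot
fit <- lm(y ~ x)
abline(fit, col = "red")      # line drawn from the fitted intercept and slope
abline(h = mean(y), lty = 2)  # dashed horizontal line at the mean of y
```

abline() also accepts an intercept and slope directly (arguments a and b) or a vertical position (argument v).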

as.Date()

Sometimes when you load data into R for the first time and it contains a date variable, R will not recognise it as a date (it is typically read in as text). To plot it or do date arithmetic, you will need to convert it to a date variable.

E.g.

mydates <- as.Date(c("2007-06-22", "2004-02-13"))
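When the dates are not stored as year-month-day, as.Date() needs to be told the layout through its format argument. A small sketch, assuming day/month/year text (the values in raw are illustrative):

```r
# Dates stored as text in day/month/year order need an explicit format
raw <- c("22/06/2007", "13/02/2004")   # illustrative values
mydates <- as.Date(raw, format = "%d/%m/%Y")

class(mydates)             # "Date"
mydates[1] - mydates[2]    # difference between the two dates in days
format(mydates, "%Y")      # extract the year as text
```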

c()

This is a function to tie together (or ‘concatenate’) objects, e.g. x = c(3,5,8,9) or y = c("Jack","Queen","King").

cor()

Produces a correlation matrix.
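As a rough illustration, cor() can be given either two vectors or a whole dataframe of numeric columns. The dataframe and column names below are made up:

```r
# Illustrative dataframe; the column names are made up
df <- data.frame(a = 1:10, b = 2 * (1:10) + 1, c = (10:1)^2)

cor(df$a, df$b)   # 1, because b is an exact increasing linear function of a
cor(df)           # full 3x3 correlation matrix for all columns
```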

boxplot()

Produces boxplots with whiskers - which are sometimes neat for showing the structure of a variable. The syntax is: boxplot(x,main="title")

factor()

Converts a numeric or string variable into a categorical variable. This is useful - for instance - to create sets of dummy variables based on a categorical variable.
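A short sketch of how this might look in practice, using a made-up string variable region. model.matrix() is used here only to make the dummy columns visible; lm() creates them automatically when a factor appears in the formula.

```r
# Illustrative string variable
region <- c("north", "south", "east", "north", "east")
region_f <- factor(region)

levels(region_f)   # the categories, sorted alphabetically by default
table(region_f)    # number of observations per category

# model.matrix() expands the factor into dummy columns;
# the first level ("east") is absorbed by the intercept
dummies <- model.matrix(~ region_f)
colnames(dummies)
```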

for()

Repeats a series of commands several times.

Example

for(ii in 1:10){
  cat("Round")
  cat(ii)
  cat("\n")
}

getwd()

Prints current working directory.

ggplot()

General purpose plotting command.

group_by()

Defines groupings within a dataframe based on one or several categorical variables. Useful to compute statistics at the level of these groups.

head()

Shows the first couple of lines in a dataframe.

rdf=data.frame(v1=runif(100),v2=runif(100))
head(rdf)
          v1         v2
1 0.02305153 0.12122816
2 0.40087975 0.24056851
3 0.27557525 0.52671220
4 0.27287104 0.98594268
5 0.53335167 0.38574363
6 0.27718645 0.01763647

help()

This command will fetch a help sheet on the function specified in the brackets, which includes information about usage and syntax.

hist()

Produces a histogram - which can be useful when you’re learning more about your variables and want to assess, for instance, if you should log them.
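To illustrate the point about logging, here is a sketch with a simulated right-skewed variable (the name income and the distribution are assumptions for the example only):

```r
# Simulated right-skewed variable (illustrative only)
set.seed(42)
income <- rlnorm(1000, meanlog = 10, sdlog = 1)

hist(income)                               # long right tail
hist(log(income), main = "Log of income")  # much more symmetric after logging
```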

inner_join()

Combines two dataframes on the basis of a common key variable. An inner join only keeps observations with information in both dataframes. See also left_join(), full_join(), etc.
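A minimal sketch of the difference between an inner and a left join, assuming the dplyr package is installed. The dataframes people and scores and their columns are made up for illustration:

```r
library(dplyr)

# Two small illustrative dataframes sharing the key variable `id`
people <- data.frame(id = c(1, 2, 3), name = c("Ann", "Bob", "Cat"))
scores <- data.frame(id = c(2, 3, 4), score = c(80, 95, 70))

inner_join(people, scores, by = "id")  # keeps ids 2 and 3 only
left_join(people, scores, by = "id")   # keeps all ids in `people`; score is NA for id 1
```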

install.packages()

Will install an extension package. The name of the package in the brackets must be in quotation marks.

ivreg()

Instrumental variable two-stage least squares regression (available, for instance, through the AER package).

library()

Loads an extension package into memory.

Example

library(car)

Loads the car library that allows you to run the linearHypothesis() command (and more).

linearHypothesis()

Generic function for testing a linear hypothesis. The car package needs to be installed and loaded for it to work.
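A hedged sketch of a typical use, assuming the car package is installed. The data are simulated and the names d, y, x1 and x2 are illustrative; the hypothesis tested is that the two slope coefficients are equal.

```r
library(car)

# Simulated data; y, x1 and x2 are illustrative names
set.seed(123)
d <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
d$y <- 1 + 0.5 * d$x1 + 0.5 * d$x2 + rnorm(200)

reg <- lm(y ~ x1 + x2, data = d)

# Test the linear restriction that both slope coefficients are equal
test <- linearHypothesis(reg, "x1 = x2")
test
```

The printed table compares the restricted and the unrestricted model and reports an F statistic with its p-value.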

lm()

Fits a linear regression model using the least squares algorithm.

Example

library(AER)
library(dplyr)
data("Affairs") # Loads `Affairs` dataframe into memory (part of AER library)
reg=lm(affairs~age+gender,data=Affairs)
reg %>% summary()

log()

You will sometimes need to transform your variables into natural logs (as explained in the lectures). The log() function transforms a variable into a natural logarithm (i.e. base e).
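A quick illustration on a made-up vector x, including how to ask for a different base and how exp() reverses the transformation:

```r
x <- c(1, 10, 100)

log(x)             # natural log (base e)
log(x, base = 10)  # base-10 log: 0 1 2
log10(x)           # shortcut for the same thing
exp(log(x))        # exp() undoes log(), recovering 1 10 100
```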

prop.table()

Will generate a table of proportions. prop.table(table(data$var1,data$var2)) divides each cell by the total of all cells, while the command prop.table(table(data$var1,data$var2),1) divides each cell by the total of its row and prop.table(table(data$var1,data$var2),2) by the total of its column.
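The three variants can be compared side by side on a small made-up dataframe (the values of var1 and var2 are illustrative):

```r
# Illustrative data with two categorical variables
data <- data.frame(var1 = c("a", "a", "b", "b", "b"),
                   var2 = c("x", "y", "x", "x", "y"))

tab <- table(data$var1, data$var2)

prop.table(tab)      # shares of the grand total; all cells sum to 1
prop.table(tab, 1)   # row shares; each row sums to 1
prop.table(tab, 2)   # column shares; each column sums to 1
```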

read.csv()

Reads csv files (comma separated values; i.e. a basic table format) from your hard drive or the web into an R dataframe.

Example

df=read.csv("https://www.dropbox.com/s/a2opu10e2hz0dps/brexit.csv?dl=1")

Loads the brexit.csv dataset
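The same command works for local files. The sketch below writes a tiny made-up csv to a temporary file and reads it back, so it runs without an internet connection; the file contents and column names are purely illustrative.

```r
# Write a tiny illustrative csv to a temporary file, then read it back
path <- tempfile(fileext = ".csv")
writeLines(c("name,votes", "A,10", "B,25"), path)

df <- read.csv(path)
df
str(df)   # shows how each column was read in (text vs. numbers)
```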

read_excel()

This command will load an Excel spreadsheet into a dataframe. Needs to be preceded by library(readxl). The data file name inside the brackets must be in quotation marks.

seq()

Create a sequence of numbers; e.g.

seq(0,20,2)
 [1]  0  2  4  6  8 10 12 14 16 18 20

setwd()

Will set the working directory. E.g. setwd("c:/folder/folder2")

stargazer()

Possibly the most amazing function in R. Will let you produce very nice looking descriptive statistics tables and regression tables. Needs to be preceded by library(stargazer).

str()

Will display the structure of an R object.
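For example, applied to a dataframe str() lists every column with its type and first few values; applied to a fitted model it shows the components stored inside. The objects rdf and reg below are illustrative:

```r
rdf <- data.frame(v1 = runif(100), v2 = runif(100))
str(rdf)    # one line per column: name, type, first few values

reg <- lm(v1 ~ v2, data = rdf)
str(reg, max.level = 1)   # top-level components of the fitted model object
```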

subset()

The subset() function is the easiest way to select variables and observations.

E.g. In the following example, we select all rows that have a value of age greater than or equal to 20 or age less than 10. We keep the ID and Weight columns.

newdata <- subset(mydata, age >= 20 | age < 10, select=c(ID, Weight))

summarise()

Computes summary statistics, typically for the groups defined with group_by().
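A small sketch of summarise() combined with group_by(), assuming the dplyr package is installed. The dataframe and its columns group and value are made up for illustration:

```r
library(dplyr)

# Illustrative data; `group` and `value` are made-up column names
df <- data.frame(group = c("a", "a", "b", "b", "b"),
                 value = c(1, 3, 2, 4, 6))

# One row per group, with the group mean and the number of observations
df %>%
  group_by(group) %>%
  summarise(mean_value = mean(value), n_obs = n())
```

Without the group_by() step, summarise() would return a single row computed over the whole dataframe.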

summary()

General command to provide summary information about an object.

table()

This command will generate a contingency table. The levels of the first variable are shown as rows and the levels of the second variable as columns.

E.g. table(data$call,data$black) will return a 2x2 table where callback frequency is given by the rows (1=callback) and race frequency is given by the columns (1=black).
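To make the layout concrete, here is a sketch with a handful of made-up observations (these are not the real callback data, only values chosen for illustration):

```r
# Made-up observations: call = 1 if called back, black = 1 if black
data <- data.frame(call  = c(1, 0, 0, 1, 0, 0),
                   black = c(0, 0, 1, 1, 1, 0))

tab <- table(data$call, data$black)
tab               # rows: levels of call; columns: levels of black
addmargins(tab)   # same table with row and column totals added
```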

Citation

For attribution, please cite this work as

Martin (2021, Sept. 28). Datastories Hub: Quick Guide - Glossary of R commands. Retrieved from https://mondpanther.github.io/datastorieshub/posts/quickguides/quickguide_Rcommands/

BibTeX citation

@misc{martin2021quick,
  author = {Martin, Ralf},
  title = {Datastories Hub: Quick Guide - Glossary of R commands},
  url = {https://mondpanther.github.io/datastorieshub/posts/quickguides/quickguide_Rcommands/},
  year = {2021}
}