abline()
With plotting functions such as plot() or ggplot() you have an endless range of possibilities for add-ons. abline() adds a straight line to an existing base R plot, for instance a regression line on a scatterplot drawn with plot().
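A minimal base R sketch that draws a scatterplot, adds the least-squares line with abline(), and adds a dashed horizontal line with the h argument. It uses the built-in cars dataset, which is an assumption for illustration and not data from the course.

```r
# Scatterplot of stopping distance against speed (built-in cars data)
plot(dist ~ speed, data = cars)

# Fit a least-squares line; abline() takes the intercept and slope from the lm object
fit <- lm(dist ~ speed, data = cars)
abline(fit, col = "red")

# Horizontal reference line at the mean stopping distance (h = y-value)
abline(h = mean(cars$dist), lty = 2)
```

Similarly, abline(v = ...) draws vertical lines and abline(a = ..., b = ...) draws a line with intercept a and slope b.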
as.Date()
Sometimes when you load data into R for the first time and it contains a date variable, R will not recognise it as a date. To plot it or calculate with it, you need to convert it to a date variable.
E.g.
mydates <- as.Date(c("2007-06-22", "2004-02-13"))
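When the dates in your file are not in the year-month-day form that as.Date() expects by default, you can describe their layout with the format argument. A short sketch with two made-up date strings in day/month/year order:

```r
# Dates stored as text in day/month/year order
raw <- c("22/06/2007", "13/02/2004")

# %d = day, %m = month, %Y = four-digit year
mydates <- as.Date(raw, format = "%d/%m/%Y")
print(mydates)           # "2007-06-22" "2004-02-13"

# Converted dates can be sorted, compared and subtracted
sort(mydates)
gap <- as.numeric(mydates[1] - mydates[2])   # difference in days
gap
```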
c()
This is a function to tie together (or 'concatenate') objects, e.g. x = c(3,5,8,9) or y = c("Jack","Queen","King").
cor()
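Computes correlations: a single correlation coefficient for two vectors, or a correlation matrix for a numeric dataframe or matrix.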
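A short sketch using the built-in mtcars data. Two vectors give a single coefficient, while a set of numeric columns gives the full matrix.

```r
# Correlation between two variables: a single number (about -0.87)
r <- cor(mtcars$mpg, mtcars$wt)
r

# Correlation matrix for several numeric columns
m <- cor(mtcars[, c("mpg", "wt", "hp")])
round(m, 2)
```

By default cor() computes Pearson correlations; method = "spearman" gives rank correlations instead.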
boxplot()
Produces boxplots with whiskers, which are sometimes a neat way to show the distribution of a variable. The syntax is: boxplot(x, main="title")
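A sketch with the built-in mtcars data, showing a single boxplot and the formula interface that draws one box per group:

```r
# One boxplot for a single variable
boxplot(mtcars$mpg, main = "Miles per gallon")

# Formula interface: one box per number of cylinders
# boxplot() also returns (invisibly) the statistics it drew
b <- boxplot(mpg ~ cyl, data = mtcars, main = "Miles per gallon by cylinders")
b$n   # number of cars in each box
```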
factor()
Converts a numeric or string variable into a categorical variable (a factor). This is useful, for instance, to create sets of dummy variables based on a categorical variable.
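A sketch with the built-in mtcars data, where the number of cylinders (cyl) is stored as a number. After converting it with factor(), model.matrix() shows the dummy variables R creates from it: one 0/1 column per level, except the first (reference) level.

```r
# cyl is numeric (4, 6 or 8); treat it as a categorical variable
cyl_f <- factor(mtcars$cyl)
levels(cyl_f)

# Dummy variables: an intercept plus one 0/1 column per non-reference level
dummies <- model.matrix(~ cyl_f)
head(dummies)

# lm() creates the same dummies automatically when a variable is wrapped in factor()
fit <- lm(mpg ~ factor(cyl), data = mtcars)
coef(fit)
```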
for()
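Repeats a series of commands several times. E.g. the following loop runs once for each value of ii from 1 to 10 and prints Round1, Round2, ..., Round10, each on its own line:
for(ii in 1:10){
  cat("Round")  # print the label
  cat(ii)       # print the current value of ii
  cat("\n")     # start a new line
}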
getwd()
Prints current working directory.
ggplot()
General-purpose plotting command from the ggplot2 package (load it first with library(ggplot2)).
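A minimal sketch, assuming the ggplot2 package is installed. It builds a scatterplot of the built-in mtcars data layer by layer with +, including a fitted least-squares line (geom_smooth() with method = "lm"):

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +                              # one point per car
  geom_smooth(method = "lm", se = FALSE) +    # least-squares line, no confidence band
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

print(p)   # draw the plot
```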
group_by()
Defines groupings within a dataframe based on one or several categorical variables (dplyr package). Useful for computing statistics at the level of these groups, e.g. together with summarise().
head()
Shows the first few rows (six by default) of a dataframe. E.g. for a dataframe of 100 uniform random numbers:
rdf=data.frame(v1=runif(100),v2=runif(100))
head(rdf)
v1 v2
1 0.02305153 0.12122816
2 0.40087975 0.24056851
3 0.27557525 0.52671220
4 0.27287104 0.98594268
5 0.53335167 0.38574363
6 0.27718645 0.01763647
help()
This command will fetch a help sheet on the function specified in the brackets, which includes information about usage and syntax.
hist()
Produces a histogram, which can be useful when you are learning more about your variables and want to assess, for instance, whether you should log them.
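A sketch with simulated data. The variable income is an assumption for illustration, drawn from a log-normal distribution. Its raw histogram has a long right tail, while the histogram of its log looks roughly symmetric. That kind of pattern suggests taking logs.

```r
set.seed(1)                                      # make the simulation reproducible
income <- rlnorm(1000, meanlog = 10, sdlog = 1)  # simulated, right-skewed

par(mfrow = c(1, 2))                             # two plots side by side
hist(income, main = "income")                    # long right tail
hist(log(income), main = "log(income)")          # roughly bell-shaped
par(mfrow = c(1, 1))                             # reset the layout
```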
inner_join()
Combines two dataframes on the basis of a common key variable. An inner join only keeps observations that have information in both dataframes. See also left_join(), full_join(), etc.
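A sketch, assuming dplyr is installed, with two small made-up dataframes that share the key variable id. inner_join() keeps only the ids found in both. left_join() keeps every row of the first dataframe and fills missing information with NA.

```r
library(dplyr)

# Made-up example data: ids 2 and 3 appear in both dataframes
people <- data.frame(id = c(1, 2, 3), name = c("Ann", "Bob", "Cy"))
wages  <- data.frame(id = c(2, 3, 4), wage = c(10, 12, 9))

both <- inner_join(people, wages, by = "id")        # ids 2 and 3 only
all_people <- left_join(people, wages, by = "id")   # ids 1 to 3; id 1 gets wage NA
both
all_people
```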
install.packages()
Installs a package. The name of the package in the brackets must be in quotation marks, e.g. install.packages("car").
ivreg()
Instrumental variable regression estimated by two-stage least squares (2SLS), available for instance in the AER package.
library()
Loads an extension package into memory. E.g. library(car) loads the car package, which allows you to run the linearHypothesis() command (and more).
linearHypothesis()
Generic function for testing a linear hypothesis. The car package needs to be installed and loaded for it to work.
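A sketch, assuming the car package is installed. It tests restrictions on a regression of mpg on wt and hp in the built-in mtcars data, with the hypotheses written as character strings that use the coefficient names:

```r
library(car)

fit <- lm(mpg ~ wt + hp, data = mtcars)

# H0: the coefficient on wt is zero
linearHypothesis(fit, "wt = 0")

# H0: both slope coefficients are zero (joint F-test)
linearHypothesis(fit, c("wt = 0", "hp = 0"))

# H0: the coefficients on wt and hp are equal
lh <- linearHypothesis(fit, "wt = hp")
lh
```

Each call compares the restricted model with the unrestricted one and reports an F statistic with its p-value.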
lm()
Estimates a linear regression model by ordinary least squares. E.g.:
library(AER)
library(dplyr)
data("Affairs") # Loads `Affairs` dataframe into memory (part of AER library)
reg=lm(affairs~age+gender,data=Affairs)
reg %>% summary()
log()
You will sometimes need to transform your variables into natural logs (as explained in the lectures). The log() function returns the natural logarithm (i.e. base e) of a variable.
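A few quick checks of how log() behaves in base R, followed by adding a logged variable to a dataframe (the built-in cars data):

```r
log(exp(1))            # 1: log() is the natural log (base e)
log(100, base = 10)    # 2: other bases via the base argument
log10(100)             # 2: shortcut for base 10
log(c(1, 10, 100))     # works element by element on a vector
log(0)                 # -Inf: zeros need care before taking logs

# Add a log-transformed variable to a dataframe
cars$log_dist <- log(cars$dist)
head(cars)
```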
prop.table()
Generates a table of proportions. prop.table(table(data$var1,data$var2)) divides each cell by the total of all cells, while prop.table(table(data$var1,data$var2),1) divides each cell by the total of its row and prop.table(table(data$var1,data$var2),2) by the total of its column.
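A sketch that uses the built-in mtcars data in place of data$var1 and data$var2, tabulating number of cylinders by transmission type. It shows the three versions:

```r
tab <- table(mtcars$cyl, mtcars$am)

prop.table(tab)       # each cell divided by the grand total (32 cars)
prop.table(tab, 1)    # each cell divided by its row total: rows sum to 1
prop.table(tab, 2)    # each cell divided by its column total: columns sum to 1
```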
read.csv()
Reads csv files (comma-separated values, i.e. a basic table format) from your hard drive or the web into an R dataframe.
df=read.csv("https://www.dropbox.com/s/a2opu10e2hz0dps/brexit.csv?dl=1")
Loads the brexit.csv dataset.
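read.csv() works the same way with a file on your own computer. This self-contained sketch first writes a small made-up table to a temporary CSV file with write.csv() and then reads it back:

```r
# Create a temporary file path ending in .csv
tmp <- tempfile(fileext = ".csv")

# Write a small made-up table without row names
write.csv(data.frame(x = 1:3, y = c("a", "b", "c")), tmp, row.names = FALSE)

# Read it back into a dataframe and inspect it
df <- read.csv(tmp)
str(df)
```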
read_excel()
This command loads an Excel spreadsheet. It needs to be preceded by library(readxl). The data file name inside the brackets must be in quotation marks.
seq()
Creates a sequence of numbers; e.g. from 0 to 20 in steps of 2:
seq(0,20,2)
[1] 0 2 4 6 8 10 12 14 16 18 20
setwd()
Sets the working directory. E.g. setwd("c:/folder/folder2")
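A self-contained sketch that uses R's temporary directory (tempdir()) instead of a real folder, so it runs on any computer. R paths use forward slashes, even on Windows. A single backslash is treated as an escape character inside R strings, so either use forward slashes or double the backslashes.

```r
old <- getwd()          # remember the current working directory
setwd(tempdir())        # move to R's temporary directory for this session
getwd()                 # now prints the temporary directory
setwd(old)              # move back to where we started
```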
stargazer()
Possibly the most amazing function in R (from the stargazer package). It lets you produce very nice-looking descriptive statistics tables and regression tables.
str()
Will display the structure of an R object.
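A few examples of str() on different kinds of objects, using base R and the built-in mtcars data:

```r
str(1:5)                        # int [1:5] 1 2 3 4 5
str(list(a = 1, b = "text"))    # a list with a number and a string
str(mtcars)                     # dataframe: 32 obs. of 11 variables, one line per column
```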
subset()
The subset() function is the easiest way to select variables and observations.
E.g. in the following example, we select all rows that have a value of age greater than or equal to 20 or less than 10, and keep only the ID and Weight columns.
newdata <- subset(mydata, age >= 20 | age < 10, select=c(ID, Weight))
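The example above assumes a dataframe called mydata. Here is a runnable version of the same idea with the built-in mtcars data. It keeps the cars with mpg of at least 30 or more than 300 horsepower, and only the mpg and hp columns:

```r
# Rows: mpg >= 30 OR hp > 300; columns: mpg and hp only
small <- subset(mtcars, mpg >= 30 | hp > 300, select = c(mpg, hp))
small
```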
summarise()
Computes summary statistics (dplyr package); combined with group_by() it computes them separately for each group.
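A sketch, assuming dplyr is installed, that combines group_by() and summarise() on the built-in mtcars data. It computes average fuel efficiency and the number of cars for each number of cylinders:

```r
library(dplyr)

res <- mtcars %>%
  group_by(cyl) %>%                  # one group per number of cylinders
  summarise(mean_mpg = mean(mpg),    # average fuel efficiency in each group
            n = n())                 # number of cars in each group
res
```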
summary()
General command to provide summary information about an object.
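summary() adapts its output to the type of object it receives. A sketch with the built-in mtcars data:

```r
# Numeric variable: minimum, quartiles, mean and maximum
summary(mtcars$mpg)

# Factor: counts per level
summary(factor(mtcars$cyl))

# Regression object: coefficients, standard errors, t-tests and R-squared
fit <- lm(mpg ~ wt, data = mtcars)
summary(fit)
```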
table()
This command generates a contingency table. The levels of the first variable form the rows and the levels of the second variable form the columns.
E.g. table(data$call,data$black) will return a 2x2 table where the rows show callback (1=callback) and the columns show race (1=black); each cell counts the observations with that combination.
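A sketch with the built-in mtcars data instead of the callback data. It cross-tabulates number of cylinders against transmission type. Naming the arguments labels the table's dimensions.

```r
# Rows: number of cylinders; columns: transmission (0 = automatic, 1 = manual)
tab <- table(cyl = mtcars$cyl, am = mtcars$am)
tab
```

The resulting table can be passed straight to prop.table() to turn counts into proportions.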
For attribution, please cite this work as
Martin (2021, Sept. 28). Datastories Hub: Quick Guide - Glossary of R commands. Retrieved from https://mondpanther.github.io/datastorieshub/posts/quickguides/quickguide_Rcommands/
BibTeX citation
@misc{martin2021quick, author = {Martin, Ralf}, title = {Datastories Hub: Quick Guide - Glossary of R commands}, url = {https://mondpanther.github.io/datastorieshub/posts/quickguides/quickguide_Rcommands/}, year = {2021} }