R Basics
1. Running R Environments
For R programming, you can choose either a local or cloud environment. If you prefer to use a cloud environment, go to RStudio Cloud. To run R on a local environment, you need to install R and RStudio respectively.
R is a programming language for statistics, and is one of the most popular computer languages. R is easy to learn and has been widely used by data scientists, social scientists, and digital humanists.
- Figure 1: RStudio
Figure 1 is a screenshot of RStudio. RStudio comprises of four spaces: code editor
, R console
, workspace & history
, and plots & files
. In code editor
, you can load or save code. The file extension of the code notebook is .R
. You can execute code with cmd
+return
(macOS) or ctrl
+enter
(Windows). If you want to stop running code, use ctrl
+c
.
Values or loaded data are saved in workspace
on the top right. You can delete them with the rm
function. You can see the history of executed code in the History
tab.
All code execution happens in R console
. You can manage files in the Files
tab on the bottom right. All visualizations are printed in the Plots
tab. You can export visualizations through either code execution or the Export
button in the Plots
tab.
2. R Operations
In R, there are two types of operators: arithmetic and logical operators. Arithmetic operators are +
(addition), -
(subtraction), *
(multiplication), /
(division), and ^
/ **
(exponentiation). Logical operators are >
(greater than), >=
(greater than or equal to), ==
(equal to), and !=
(not equal to).
R uses data structures such as scalars, vectors (numerical, character, and logical), matrices, data frames, and lists.
2.1 Scalar
A scalar
is a single value. A scalar
can have three data types: numeric, character, or logical.
x = 10
y = 3
print(x)
print(y)
[1] 10
[1] 3
print(c(x,y))
[1] 10 3
z <- x * 3
print(z)
[1] 30
x <- 1.2
y <- "hello"
z <- TRUE
print(x)
print(y)
print(z)
[1] 1.2
[1] "hello"
[1] TRUE
2.2 Vector
Vectors are a combination of numeric, character, or logical values. All elements must have the same mode (numeric, character , or logical).
x <- c(1.1, 2.2, -5, 4.2, 2)
y <- c(TRUE, FALSE, TRUE)
z <- c("Howdy", "Aggies", "Whoop")
q <- 1.2:7.9
print(x)
print(y)
print(z)
print(q)
[1] 1.1 2.2 -5.0 4.2 2.0
[1] TRUE FALSE TRUE
[1] "Howdy" "Aggies" "Whoop"
[1] 1.2 2.2 3.2 4.2 5.2 6.2 7.2
2.3 Matrix
A matrix consists of rows and columns. All columns in a matrix must have the same type (numeric, character , or logical). A matrix is a homogenous collection of datasets.
x <- matrix(1:20, nrow=5,ncol=4)
print(x)
[,1] [,2] [,3] [,4]
[1,] 1 6 11 16
[2,] 2 7 12 17
[3,] 3 8 13 18
[4,] 4 9 14 19
[5,] 5 10 15 20
print(x[3,]) # 3rd row of matrix
print(x[,2]) # 2nd column of matrix
[1] 3 8 13 18
[1] 6 7 8 9 10
print(x[2:3,1:3]) # rows 2,3 of columns 1,2,3
[,1] [,2] [,3]
[1,] 2 7 12
[2,] 3 8 13
cells <- c(1812, 1819, 1870, 1880)
y <- matrix(cells, nrow=2, ncol=2, byrow=TRUE)
print(y)
[,1] [,2]
[1,] 1812 1819
[2,] 1870 1880
cells <- c(1812, 1819, 1870, 1880)
rnames <- c("Born", "Death")
cnames <- c("Charles Dickens", "George Eliot")
z <- matrix(cells, nrow=2, ncol=2, byrow=TRUE, dimnames=list(rnames, cnames))
print(z)
Charles Dickens George Eliot
Born 1812 1819
Death 1870 1880
2.4 List
A list
is a collection of elements. A list
can include different types.
x <- list(name="Charles Dickens", gender="M", nationality="English", born=1812, matrix_example=z)
print(x)
$name
[1] "Charles Dickens"
$gender
[1] "M"
$nationality
[1] "English"
$born
[1] 1812
$matrix_example
Charles Dickens George Eliot
Born 1812 1819
Death 1870 1880
2.5 Data Frame
A data frame can contain different data types. Data frames are mostly used for storing data. A data frame is similar to a table in Excel. The data mode must be numeric, character or logical. A data frame is a heterogeneous collection of datasets.
df <- data.frame(num=c(1:3), author=c("Charles Dickens", "George Eliot", "Wilkie Collins"), birth_date=as.Date(c("1812/2/7", "1819/11/22", "1824/1/8")), death_year = as.Date(c("1870/6/9", "1880/12/22", "1889/9/23")), children=c(10, 0, 3))
print(df)
num author birth_date death_year children
1 1 Charles Dickens 1812-02-07 1870-06-09 10
2 2 George Eliot 1819-11-22 1880-12-22 0
3 3 Wilkie Collins 1824-01-08 1889-09-23 3