A language tour
A language born in statistics. Vectors are the atoms. Data frames are the currency. Insight is the goal.
01 - Vectors First
In R, the scalar doesn't exist: a single number is a vector of length one. Arithmetic, comparisons, and most built-in functions are vectorised, so you rarely loop over elements; you describe what you want and R applies it to all of them at once.
"R is not a language designed by committee. It's a language designed by statisticians for statisticians, and that focus on its intended domain gives it a clarity that more general languages lack."
- Ross Ihaka, co-creator of R

    # A "scalar" is just a vector of length 1
    x <- 42
    length(x)                         # 1

    # Vectorised arithmetic: no loop needed
    scores  <- c(88, 72, 95, 61, 83)
    curved  <- scores * 1.1           # multiply every element
    passing <- scores[scores >= 70]   # logical indexing

    # Built-in statistics
    mean(scores)                      # 79.8
    sd(scores)                        # standard deviation
    summary(scores)                   # min, Q1, median, mean, Q3, max

    # seq and rep: building vectors elegantly
    seq(1, 10, by = 2)                # 1 3 5 7 9
    rep(c("A", "B"), times = 3)       # A B A B A B
In scores[scores >= 70], the comparison scores >= 70 creates a logical vector the same length as scores, and the square brackets then use it as a mask. This pattern, a vectorised comparison used as an index, is idiomatic R.
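To make the two steps visible, here is a minimal sketch that stores the mask before using it. The name mask is just an illustrative choice; the scores are the ones from the example above.

```r
scores <- c(88, 72, 95, 61, 83)

# Step 1: the comparison alone yields a logical vector, one value per element
mask <- scores >= 70
mask            # TRUE TRUE TRUE FALSE TRUE

# Step 2: indexing with the mask keeps the positions that are TRUE
passing <- scores[mask]
passing         # 88 72 95 83

# The same mask can count or locate matches
sum(mask)       # 4 (TRUE counts as 1)
which(mask)     # 1 2 3 5
```

Keeping the mask in a variable is handy when the same condition filters several vectors of equal length.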
02 - Data Frames
The data frame, a table where every column can have a different type, is R's native structure for tabular data. R had it in the 1990s, well before Python's pandas or Julia's DataFrames. It's the right shape for statistical data.

    # Create a data frame: like a spreadsheet in memory
    df <- data.frame(
      name  = c("Alice", "Bob", "Carol"),
      score = c(91, 78, 88),
      grade = c("A", "C", "B")
    )

    # Column access
    df$score               # numeric vector
    df[df$score > 80, ]    # rows where score > 80

    # Attach more columns
    df$percentile <- rank(df$score) / nrow(df) * 100

    # Aggregate: split, apply, combine
    aggregate(score ~ grade, data = df, FUN = mean)

The ~ tilde operator denotes a formula in R: "model score as a function of grade." This formula syntax threads throughout R's statistical modelling ecosystem.
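A formula is itself an object, which is why so many functions can accept one. The sketch below, using a small made-up data frame (the fourth row is an assumption added so that one grade has two scores), shows a formula being stored, inspected, and then passed to aggregate().

```r
# A formula stores symbols without evaluating them,
# so score and grade need not exist yet
f <- score ~ grade
class(f)      # "formula"
all.vars(f)   # "score" "grade"

# Illustrative data only: not taken from a real data set
df <- data.frame(
  score = c(91, 78, 88, 84),
  grade = c("A", "C", "B", "B")
)

# The stored formula is looked up inside df when the function runs
means <- aggregate(f, data = df, FUN = mean)
means         # one row per grade: A 91, B 86, C 78
```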
03 - Functional Style
R's apply family lets you apply functions over data structures without explicit loops. The tidyverse's purrr builds on this with a consistent API whose typed variants (map_dbl(), map_chr() and friends) always return the type they name. Functional data transformation is idiomatic R.

    # sapply: apply a function over a vector, simplify result
    words <- c("hello", "world", "r")
    sapply(words, nchar)               # 5 5 1

    # lapply: always returns a list
    matrices <- lapply(1:3, function(n) matrix(0, n, n))

    # Reduce: fold a list into a single value
    Reduce("+", 1:10)                  # 55, sum via fold

    # Anonymous function shorthand (R 4.1+)
    doubled <- sapply(1:5, \(x) x * 2)
    # [1] 2 4 6 8 10

    # tapply: apply by groups
    heights <- c(170, 182, 165, 175)
    groups  <- c("M", "M", "F", "F")
    tapply(heights, groups, mean)      # F: 170  M: 176

\(x) x * 2 is R 4.1's native lambda syntax, a shorthand for function(x) x * 2. Before 4.1, R had functions but no concise lambda form.
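As a small sketch of both points, the snippet below checks that the two spellings behave identically, then uses base R's vapply(), which works like sapply() but takes a template for the expected result. The names double_old, double_new and lengths_int are illustrative only.

```r
words <- c("hello", "world", "r")

# The two spellings define the same function
double_old <- function(x) x * 2
double_new <- \(x) x * 2            # requires R 4.1 or later
double_old(21) == double_new(21)    # TRUE

# vapply checks every result against the template (here: one integer),
# so the return type is known in advance
lengths_int <- vapply(words, nchar, integer(1))
lengths_int                         # named integer vector: 5 5 1

# USE.NAMES = FALSE drops the names taken from the character input
vapply(words, nchar, integer(1), USE.NAMES = FALSE)
```

vapply() stops with an error if any result does not match the template, which makes it a safer default than sapply() inside functions.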
04 - Statistical Modelling
R was built around the idea that fitting a statistical model should be as natural as writing the equation. The formula interface threads through linear models, GLMs, mixed effects models, and survival analysis: a consistent grammar of modelling.

    # Linear regression, formula: y ~ x1 + x2
    model <- lm(mpg ~ wt + hp, data = mtcars)
    summary(model)    # coefficients, R², p-values, residuals

    # Prediction with confidence intervals
    newcar <- data.frame(wt = 3.0, hp = 120)
    predict(model, newcar, interval = "confidence")

    # Logistic regression: just change the family
    glm_model <- glm(am ~ wt + hp, data = mtcars, family = binomial())

    # ANOVA in one call
    aov(mpg ~ factor(cyl), data = mtcars)

mtcars is one of R's built-in datasets: a 1974 Motor Trend data set present in every R installation. Having real data always available makes exploration and teaching effortless.
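A fitted model is an object you can query piece by piece. The following sketch refits the linear model from the example above and pulls out a few parts with standard accessor functions; model2 and r2 are hypothetical names introduced here to show a second formula.

```r
model <- lm(mpg ~ wt + hp, data = mtcars)

coef(model)                      # named vector: (Intercept), wt, hp
confint(model, level = 0.95)     # 95% interval for each coefficient
r2 <- summary(model)$r.squared   # proportion of variance explained

# Same data, slightly different formula: wt * hp adds an interaction term
model2 <- lm(mpg ~ wt * hp, data = mtcars)

# F-test comparing the two nested models
anova(model, model2)
```

Only the formula changes between the two fits, which is the consistency the formula interface is meant to provide.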
05 - Visualisation
R's plot() is remarkably capable for quick exploration. But ggplot2, a grammar of graphics, transformed what data visualisation code could look like. Layers, aesthetics, scales: a visual algebra that maps data to pixels declaratively.

    library(ggplot2)

    # ggplot builds a plot as layers: each + adds a layer
    ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
      geom_point(size = 3, alpha = 0.8) +
      geom_smooth(method = "lm", se = TRUE) +
      scale_colour_brewer(palette = "Set1") +
      labs(
        title  = "Weight vs Fuel Economy",
        x      = "Weight (1000 lbs)",
        y      = "Miles per Gallon",
        colour = "Cylinders"
      ) +
      theme_minimal()

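aes() maps data columns to visual aesthetics: position, colour, size. Adding geom_smooth(method = "lm") overlays a fitted regression line with a shaded confidence band in a single call. Because colour is mapped to factor(cyl) in the top-level aes(), this plot gets one line per cylinder group.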
06 - The Whole Picture
20,000+ packages on CRAN. If a statistical method exists, there's almost certainly an R package implementing it.
R Markdown puts prose, code, and output in one document. Analysis and communication are woven together, reproducible by design.
Factors hold categorical data with defined levels and ordering. A type designed for statistics, not an afterthought.
The tidyverse is a coherent ecosystem of packages sharing a philosophy: tidy data, consistent interfaces, human-readable code.
R is a leading language of peer-reviewed statistics. New methods often appear in R packages before they reach any other language.
Shiny turns an R analysis into a web app without writing HTML or JavaScript. Interactive dashboards in pure R.
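The point above about categorical data with defined levels and ordering describes R's factor type. A minimal sketch, using made-up survey responses purely for illustration:

```r
# Hypothetical survey responses, used only for illustration
responses <- c("medium", "low", "high", "medium", "low")

# An ordered factor fixes both the set of levels and their order
f <- factor(responses, levels = c("low", "medium", "high"), ordered = TRUE)

levels(f)        # "low" "medium" "high"
table(f)         # counts follow level order: low 2, medium 2, high 1
f >= "medium"    # TRUE FALSE TRUE TRUE FALSE
as.integer(f)    # underlying integer codes: 2 1 3 2 1
```

Because the order lives in the data rather than in each call, tables, plots, and models all present the levels the same way.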