📊

A language tour

The Eloquence of R

A language born in statistics. Vectors are the atoms. Data frames are the currency. Insight is the goal.


01 · Vectors First

Everything is a vector

In R there is no separate scalar type: a single number is a vector of length one. Arithmetic, comparisons and most built-in functions are vectorised, so you rarely write a loop over elements. You describe what you want and R applies it to every element at once.

"R is not a language designed by committee. It's a language designed by statisticians for statisticians, and that focus on its intended domain gives it a clarity that more general languages lack."

- Ross Ihaka, co-creator of R
vectors.R
# A "scalar" is just a vector of length 1
x <- 42
length(x)  # 1

# Vectorised arithmetic: no loop needed
scores <- c(88, 72, 95, 61, 83)
curved  <- scores * 1.1  # multiply every element
passing <- scores[scores >= 70]  # logical indexing

# Built-in statistics
mean(scores)    # 79.8
sd(scores)      # standard deviation
summary(scores) # min, Q1, median, mean, Q3, max

# seq and rep: building vectors concisely
seq(1, 10, by = 2)     # 1 3 5 7 9
rep(c("A", "B"), times = 3)  # A B A B A B

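In scores[scores >= 70], the comparison scores >= 70 first produces a logical vector the same length as scores (TRUE wherever the score is at least 70). The brackets then use that vector as a mask. This pattern, a vectorised comparison used as an index, is idiomatic R.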
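The sketch below makes the intermediate mask visible and reuses the scores vector from above. It also shows where "everything is vectorised" stops. An if statement needs a single TRUE/FALSE, so element-by-element choices go through ifelse() instead. The "pass"/"fail" labels are illustrative only.

```r
scores <- c(88, 72, 95, 61, 83)

# The comparison on its own returns a logical vector, one entry per score
mask <- scores >= 70
mask              # TRUE TRUE TRUE FALSE TRUE

# Indexing with the mask keeps the positions where it is TRUE
scores[mask]      # 88 72 95 83
which(mask)       # positions 1 2 3 5
sum(mask)         # TRUE counts as 1, so 4 scores pass

# `if` expects a single TRUE/FALSE; ifelse() chooses element by element
result <- ifelse(mask, "pass", "fail")
result            # "pass" "pass" "pass" "fail" "pass"
```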


02 · Data Frames

Tables as first-class citizens

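The data frame is a table in which each column is a vector and different columns can hold different types. It is R's standard structure for tabular data. R inherited it from its predecessor S and had it in the 1990s, long before Python's pandas or Julia's DataFrames. It's the right shape for statistical data: one row per observation, one column per variable.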

dataframes.R
# Create a data frame: like a spreadsheet in memory
df <- data.frame(
  name  = c("Alice", "Bob", "Carol"),
  score = c(91, 78, 88),
  grade = c("A", "C", "B")
)

# Column access
df$score         # numeric vector
df[df$score > 80, ]  # rows where score > 80

# Add a derived column
df$percentile <- rank(df$score) / nrow(df) * 100

# Aggregate: split, apply, combine
aggregate(score ~ grade, data = df, FUN = mean)

The tilde ~ builds a formula. In aggregate(), score ~ grade reads "summarise score within each grade"; in a model it reads "model score as a function of grade." The same formula syntax threads throughout R's statistical modelling ecosystem.
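A formula is an ordinary R object, so it can be stored and inspected before any function uses it. The sketch below rebuilds the three-row df from the example and checks the column types. It then reproduces the aggregate() result by hand with split() and sapply(), which spells out the split-apply-combine steps. The manual version is an illustration of the idea, not a claim about how aggregate() is implemented internally.

```r
df <- data.frame(
  name  = c("Alice", "Bob", "Carol"),
  score = c(91, 78, 88),
  grade = c("A", "C", "B")
)

# Each column keeps its own type (strings stay character in R 4.0+;
# earlier versions converted them to factors by default)
sapply(df, class)   # name "character", score "numeric", grade "character"

# A formula is a first-class object: nothing is evaluated yet
f <- score ~ grade
class(f)            # "formula"
all.vars(f)         # "score" "grade"

# Split: group scores by grade; apply: mean(); combine: a named vector
by_grade <- sapply(split(df$score, df$grade), mean)
by_grade            # A 91, B 88, C 78

# The same summary through the formula interface, returned as a data frame
aggregate(f, data = df, FUN = mean)
```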


03 · Functional Style

Apply, map, transform

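R's apply family (sapply, lapply, tapply and friends) applies a function over a data structure without an explicit loop. The tidyverse's purrr package builds on the same idea with a more consistent API. Its typed variants, such as map_dbl(), guarantee the type of their output. Functional data transformation is idiomatic R.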

functional.R
# sapply: apply a function over a vector, simplify result
words <- c("hello", "world", "r")
sapply(words, nchar)  # named: hello 5, world 5, r 1

# lapply: always returns a list
matrices <- lapply(1:3, function(n) matrix(0, n, n))

# Reduce: fold a list into a single value
Reduce("+", 1:10)  # 55: a sum computed as a fold

# Anonymous function shorthand (R 4.1+)
doubled <- sapply(1:5, \(x) x * 2)
# [1]  2  4  6  8 10

# tapply: apply by groups
heights <- c(170, 182, 165, 175)
groups  <- c("M", "M", "F", "F")
tapply(heights, groups, mean)  # F: 170  M: 176

\(x) x * 2 is R 4.1's native lambda syntax, a shorthand for function(x) x * 2. R always had anonymous functions, but before 4.1 they had to be spelled out with the full function keyword.
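The check below confirms that the lambda is only new syntax: both forms give the same results. It also uses vapply(), base R's stricter cousin of sapply(). You pass vapply() a template for each result, and it stops with an error if a result does not match. purrr's typed map functions offer a similar guarantee. The variable names are illustrative.

```r
words <- c("hello", "world", "r")

# The backslash lambda and function() behave the same
f1 <- \(x) x * 2
f2 <- function(x) x * 2
identical(sapply(1:5, f1), sapply(1:5, f2))   # TRUE

# vapply() declares the expected shape of each result: one integer
n_chars <- vapply(words, nchar, FUN.VALUE = integer(1))
n_chars       # named: hello 5, world 5, r 1

# A type mismatch fails loudly instead of silently changing the result type
res <- tryCatch(
  vapply(words, toupper, FUN.VALUE = integer(1)),
  error = function(e) "type error caught"
)
res           # "type error caught"
```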


04 · Statistical Modelling

Models in one line

R was built around the idea that fitting a statistical model should be as natural as writing the equation. The formula interface threads through linear models, GLMs, mixed-effects models and survival analysis: a consistent grammar of modelling.

models.R
# Linear regression with the formula y ~ x1 + x2
model <- lm(mpg ~ wt + hp, data = mtcars)
summary(model)   # coefficients, R², p-values, residuals

# Prediction with confidence intervals
newcar <- data.frame(wt = 3.0, hp = 120)
predict(model, newcar, interval = "confidence")

# Logistic regression: switch to glm() and choose the binomial family
glm_model <- glm(am ~ wt + hp, data = mtcars,
                  family = binomial())

# One-way ANOVA in one call; summary() prints the F-test table
summary(aov(mpg ~ factor(cyl), data = mtcars))

mtcars is one of R's built-in datasets: fuel consumption and design data for 32 cars, taken from the 1974 Motor Trend US magazine and shipped with every R installation. Having real data always at hand makes exploration and teaching easy.
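Because mtcars is always available, the models above can be checked interactively. The sketch below pulls individual numbers out of the fitted objects instead of printing full summaries. It uses predict(type = "response") to show that the logistic model returns probabilities rather than values on the linear-predictor scale. The name p_manual is illustrative.

```r
# mtcars ships with base R's datasets package
dim(mtcars)                    # 32 rows, 11 columns

model <- lm(mpg ~ wt + hp, data = mtcars)
coef(model)                    # intercept plus one slope per predictor
summary(model)$r.squared       # share of the variance in mpg explained

glm_model <- glm(am ~ wt + hp, data = mtcars, family = binomial())

# type = "response" gives fitted probabilities of a manual gearbox (am = 1)
p_manual <- predict(glm_model, type = "response")
range(p_manual)                # every value lies between 0 and 1
```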


05 · Visualisation

Plots as language

R's base plot() is remarkably capable for quick exploration. But ggplot2, an implementation of the grammar of graphics, transformed what data visualisation code could look like. Layers, aesthetics, scales: a visual algebra that maps data to pixels declaratively.

ggplot2.R
library(ggplot2)

# ggplot builds a plot from components: each + adds a layer, scale, labels or theme
ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point(size = 3, alpha = 0.8) +
  geom_smooth(method = "lm", se = TRUE) +
  scale_colour_brewer(palette = "Set1") +
  labs(
    title    = "Weight vs Fuel Economy",
    x        = "Weight (1000 lbs)",
    y        = "Miles per Gallon",
    colour   = "Cylinders"
  ) +
  theme_minimal()

aes() maps data columns to visual aesthetics such as position, colour and size. Because colour = factor(cyl) is set in the top-level aes(), geom_smooth(method = "lm") fits one regression line per cylinder group and draws each with a shaded confidence band, all in a single call.
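Here is a base-R sketch of the fits behind the smoothed lines. It assumes each line is an ordinary mpg ~ wt regression within one cylinder group, since method = "lm" uses the formula y ~ x by default. It needs no ggplot2 and only reproduces the coefficients, not the drawing.

```r
# Split mtcars by cylinder count and fit mpg ~ wt within each group
groups <- split(mtcars, mtcars$cyl)
fits   <- lapply(groups, function(d) lm(mpg ~ wt, data = d))

# One slope per group: these correspond to the separate smoothed lines
slopes <- sapply(fits, function(m) coef(m)[["wt"]])
slopes

# Number of cars behind each fitted line
sapply(groups, nrow)   # 4: 11, 6: 7, 8: 14
```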


06 · The Whole Picture

Why R endures

📦

CRAN

20,000+ packages on CRAN. If a statistical method exists, there's almost certainly an R package implementing it.

📓

R Markdown

Prose + code + output in one document. Analysis and communication woven together, reproducible by design.

🔢

Factor Types

Categorical data with defined levels and ordering. A type designed for statistics, not an afterthought.

🌊

The Tidyverse

A coherent ecosystem of packages sharing a philosophy: tidy data, consistent interfaces, human-readable code.

🧪

Academia

R is a primary language of academic statistics. Many new methods are published alongside an R package, often before an implementation exists in any other language.

🔗

Shiny

Turn any R analysis into a web app without writing HTML or JavaScript. Interactive dashboards in pure R.
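The Factor Types card can be made concrete. A factor stores categories as integer codes plus a fixed set of levels, and an ordered factor also knows which level ranks higher. The sketch below uses made-up survey answers; the responses and the name sat are illustrative only.

```r
answers <- c("low", "high", "medium", "low", "high")

# Levels are declared explicitly; for an ordered factor their order is the ranking
sat <- factor(answers, levels = c("low", "medium", "high"), ordered = TRUE)

levels(sat)          # "low" "medium" "high"
table(sat)           # counts in level order: low 2, medium 1, high 2
as.integer(sat)      # the underlying codes: 1 3 2 1 3

# Comparisons follow the declared ordering, not alphabetical order
sat > "low"          # FALSE TRUE TRUE FALSE TRUE

# A value outside the declared levels becomes NA instead of a new category
factor("extreme", levels = levels(sat))
```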