R's Normal Distribution Functions: rnorm and pals

Submitted by dylan on Wed, 2010-07-14 17:10.

The rnorm() function in R is a convenient way to simulate values from a normal distribution with a given mean and standard deviation. I hadn't used the associated functions dnorm() (density function), pnorm() (cumulative distribution function), and qnorm() (quantile function) before, so I made a simple demo. dnorm(), pnorm(), and qnorm() return exact theoretical values for a normal distribution with the specified parameters, while the corresponding functions density(), ecdf(), and quantile() compute empirical estimates from a sample. The following example could be extended to graphically describe departures from normality (or from some other distribution; see rt(), runif(), rcauchy(), etc.) in a data set.
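To make the relationships between these functions concrete, here is a small sketch using the same mean (5) and sd (2) as the example that follows. It checks a few exact properties: dnorm() at the mean gives the peak height 1/(sd * sqrt(2 * pi)), pnorm() at the mean is 0.5, and qnorm() inverts pnorm(). The seed, the test value 7.3, and the sample size are arbitrary illustration choices, not part of the original demo.

```r
# theoretical values for a normal distribution with mean 5 and sd 2
mu <- 5
sigma <- 2

# density at the mean is the peak height, 1 / (sigma * sqrt(2 * pi))
peak <- dnorm(mu, mean = mu, sd = sigma)

# cumulative probability at the mean is exactly 0.5
p.mean <- pnorm(mu, mean = mu, sd = sigma)

# the 0.975 quantile is roughly mu + 1.96 * sigma
q975 <- qnorm(0.975, mean = mu, sd = sigma)

# qnorm() inverts pnorm(): the round trip recovers the original value
v <- 7.3
v.back <- qnorm(pnorm(v, mean = mu, sd = sigma), mean = mu, sd = sigma)

# empirical counterparts from a simulated sample (seed chosen arbitrarily)
set.seed(1)
x <- rnorm(100, mean = mu, sd = sigma)
Fn <- ecdf(x)
p.emp <- Fn(median(x))       # empirical CDF at the sample median
q.emp <- quantile(x, 0.975)  # sample 0.975 quantile: near q975, but not equal

print(c(peak = peak, p.mean = p.mean, q975 = q975, v.back = v.back))
print(c(p.emp = p.emp, q.emp = unname(q.emp)))
```

The empirical values wobble around the theoretical ones from sample to sample, which is exactly the gap the plot in the example visualizes.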

Simple Example

# sample a normal distribution, with a mean of 5 and sd of 2, 100 times
x <- rnorm(100, mean=5, sd=2)

# sort in ascending order
x.sorted <- sort(x)

# compute the empirical cumulative distribution function
x.ecdf <- ecdf(x.sorted)

# plot the expected and actual probability density
plot(x.sorted, dnorm(x.sorted, mean=5, sd=2), type='l', ylim=c(0,1), ylab='Probability', xlab='Value', main='rnorm(), dnorm(), pnorm(), and qnorm()')
lines(density(x), col=1, lty=2)

# add the expected and actual cumulative probability
lines(x.sorted, pnorm(x.sorted, mean=5, sd=2), type='l', col=2)
lines(x.sorted, x.ecdf(x.sorted), type='l', col=2, lty=2)

# add the expected and actual p=0.5 (median) and p=0.95 quantiles
abline(v=qnorm(c(0.5, 0.95), mean=5, sd=2), col=3)
abline(v=quantile(x, probs=c(0.5, 0.95)), col=3, lty=2)

# add the original x values as tick marks along the x-axis
rug(x)

# annotate
legend('topleft', legend=c('Probability Density','Cumulative Probability','[0.5, 0.95] Quantiles'), lty=1, col=1:3, bty='n')

rnorm() and pals: Solid lines are expected values, dashed lines are actual values
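As mentioned at the top, the comparison can be extended to describe departures from normality. A minimal sketch, assuming the largest vertical gap between ecdf() and pnorm() is an acceptable single-number summary: this gap is the Kolmogorov-Smirnov statistic, which base R's ks.test() also reports. The helper name ecdf.gap(), the seed, and the choice of a Cauchy sample (with location 5 and scale 2) are illustrative assumptions.

```r
# hypothetical helper: largest vertical distance between the empirical CDF of x
# and a theoretical CDF, checked just before and just after each ECDF step
ecdf.gap <- function(x, cdf, ...) {
  x <- sort(x)
  n <- length(x)
  p <- cdf(x, ...)
  max(seq_len(n) / n - p, p - (seq_len(n) - 1) / n)
}

set.seed(42)
n <- 1000
x.norm   <- rnorm(n, mean = 5, sd = 2)               # truly normal
x.cauchy <- rcauchy(n, location = 5, scale = 2)      # heavy-tailed, same center

# both samples compared against the same normal CDF
gap.norm   <- ecdf.gap(x.norm, pnorm, mean = 5, sd = 2)
gap.cauchy <- ecdf.gap(x.cauchy, pnorm, mean = 5, sd = 2)
print(c(normal = gap.norm, cauchy = gap.cauchy))

# graphical version: ECDF of the Cauchy sample against the normal CDF
plot(ecdf(x.cauchy), verticals = TRUE, do.points = FALSE, xlim = c(-5, 15),
     main = 'ecdf(rcauchy()) vs. pnorm()', xlab = 'Value', ylab = 'Cumulative Probability')
curve(pnorm(x, mean = 5, sd = 2), add = TRUE, col = 2)
```

The normal sample should give a small gap, while the Cauchy sample's extra mass in the tails pulls its ECDF visibly away from the pnorm() curve.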


QQ plots

The human eye is not great at differentiating curves: any two vaguely bell-shaped curves look pretty similar. A better approach to visually checking the normality of sample data is a QQ plot, which is easy to make in R. Try the following with your example:
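A minimal sketch, assuming base R's qqnorm() and qqline() are the tools meant here: normal data fall close to a straight line, while heavy tails bend away from it at both ends. The seed and the rt(df = 2) comparison sample are illustrative choices, and x is regenerated so the block runs on its own.

```r
set.seed(1)
x <- rnorm(100, mean = 5, sd = 2)

# side-by-side QQ plots: the normal sample, and a heavy-tailed one for contrast
par(mfrow = c(1, 2))
qqnorm(x, main = 'rnorm(100, 5, 2)')
qqline(x, col = 2)   # reference line through the first and third quartiles

y <- rt(100, df = 2)
qqnorm(y, main = 'rt(100, df = 2)')
qqline(y, col = 2)
par(mfrow = c(1, 1))

# the plotted coordinates without drawing: theoretical quantiles in $x, data in $y
q <- qqnorm(x, plot.it = FALSE)
print(cor(q$x, q$y))  # close to 1 for normal data
```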


QQ Plots

Thanks for the tip, Sean. I agree, QQ plots are a much better approach for comparing distributions. This post was mostly about comparing the functions that return theoretical values with those that compute empirical estimates.