Normal distribution in R | Examples
The Normal distribution functions
dnorm(x, mean = 0, sd = 1, log = FALSE)
pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
rnorm(n, mean = 0, sd = 1)
If mean or sd are not specified they assume the default values of 0 and 1, respectively.
The normal distribution has density:
$f(x)=\Large \frac{1}{\sqrt{2 \pi \sigma^{2}}} e^{-\frac{(x-\mu)^{2}}{2 \sigma^{2}}}$
where $\mu$ is the mean of the distribution and $\sigma$ the standard deviation.
- dnorm gives the density (PDF)
- pnorm gives the distribution function (CDF)
- qnorm gives the quantile function
- rnorm generates random deviates
For rnorm the length of the result is determined by n; for the other functions it is the maximum of the lengths of the numerical arguments.
For sd = 0 this gives the limit as sd decreases to 0, a point mass at $\mu$. sd < 0 is an error and returns NaN.
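As a quick illustration of how the four functions relate, here is a minimal sketch. The variable names and the seed are arbitrary choices made for this example.

```r
# Density of the standard normal at its mean: 1 / sqrt(2 * pi)
d0 <- dnorm(0)

# CDF at the mean: half of the probability mass lies below it
p0 <- pnorm(0)

# qnorm() inverts pnorm(): take the 97.5% quantile and map it back
q975 <- qnorm(0.975)
p_back <- pnorm(q975)

# rnorm() draws random deviates; the seed is fixed only for reproducibility
set.seed(1)
x <- rnorm(5, mean = 150, sd = 20)

print(c(d0 = d0, p0 = p0, q975 = q975, p_back = p_back))
print(length(x))
```

The last line prints 5, since for rnorm the length of the result is given by n.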
Example: Area under PDF based on condition
What percentage of students are taller than 170 if height is normally distributed, $\mathcal N(150,20)$, with mean 150 and standard deviation 20?
This is easy to solve if we know that about 68.2% of the probability lies inside $[\mu-\sigma, \mu+\sigma]$; here $170 = \mu+\sigma$.
The remaining probability is split equally between the two tails, so our percentage is $(100-68.2)/2 \simeq 15.9$.
In R we can integrate dnorm to get the same result.
integrate(dnorm, mean=150, sd=20, lower= 170, upper= Inf, abs.tol = 0)$value
Out:
0.1586553
We get the same answer with:
1-pnorm(170, mean=150, sd=20)
dnorm is the PDF, pnorm is the CDF.
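The lower.tail argument from the pnorm signature offers another way to get the upper tail directly. A small sketch, with variable names chosen only for illustration:

```r
# Upper-tail probability P(X > 170) for X ~ N(150, sd = 20)
p_upper <- pnorm(170, mean = 150, sd = 20, lower.tail = FALSE)

# Same value via the complement of the lower tail
p_complement <- 1 - pnorm(170, mean = 150, sd = 20)

print(p_upper)
print(all.equal(p_upper, p_complement))
```

For probabilities far out in the tail, lower.tail = FALSE tends to be numerically safer than subtracting from 1.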
Example: Random variable difference
Men have a mean height of 178cm with a standard deviation of 8cm. Women have a mean height of 170cm with a standard deviation of 6cm. Male and female heights are normally distributed. What is the probability that a randomly chosen woman is taller than a randomly chosen man?
$M \sim \mathcal N(178, 8), \ W \sim \mathcal N(170, 6)$
We are interested in the new random variable $D = M-W$, the difference of the two heights.
Assuming $M$ and $W$ are independent, the means subtract and the variances add:
$\mu_D = \mu_M -\mu_W=8, \quad \sigma_D^2 = \sigma_M^2 +\sigma_W^2=64+36=100$
$\therefore \sigma_D = 10, \ D \sim \mathcal N(8,10)$
To calculate the probability that the woman is taller than the man:
$\mathbb P(W \gt M) = \mathbb P(M-W < 0) = \mathbb P( D \lt 0)$
pnorm(0, mean=8, sd=10 )
Out:
0.211855398583397
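The closed-form answer can be sanity-checked with a simulation. The sketch below assumes the two heights are independent; the seed and the sample size are arbitrary choices for this example.

```r
set.seed(42)   # arbitrary seed, only for reproducibility
n <- 1e6       # number of simulated pairs (an arbitrary choice)

men   <- rnorm(n, mean = 178, sd = 8)
women <- rnorm(n, mean = 170, sd = 6)

# Fraction of simulated pairs in which the woman is taller
p_sim <- mean(women > men)

# Closed-form value from D = M - W with mean 8 and sd 10
p_exact <- pnorm(0, mean = 8, sd = 10)

print(c(simulated = p_sim, exact = p_exact))
```

The simulated fraction should land close to the pnorm value, with the gap shrinking as n grows.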
Example: Combine two random variables
Summer drives to work and back. The amount of fuel he uses follows a normal distribution:
To work: $\quad \mu_{W}=10 \mathrm{~L} \quad \sigma_{W}=1.5 \mathrm{~L}$
To home: $\quad \mu_{H}=10 \mathrm{~L} \quad \sigma_{H}=2 \mathrm{~L}$
He has $25 \mathrm{~L}$ of fuel and intends to drive to work and back home. What is the probability that he runs out of fuel?
To calculate this we identify two random variables and, assuming they are independent, add them:
$W \sim \mathcal N(10, 1.5), \quad H \sim \mathcal N(10, 2)$
$\therefore B = W + H \sim \mathcal N(10+10, \sqrt{1.5^2+2^2}) = \mathcal N(20, 2.5)$
He runs out of fuel when $B > 25$, so we need $\mathbb P(B>25)$:
1- pnorm(25, mean=20, sd=2.5 )
Out:
0.0227501319481792
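The same distribution can be turned around with qnorm to ask how much fuel would be enough. The sketch below uses a hypothetical target of a 1% chance of running out; that threshold is an assumption made for illustration and is not part of the original problem.

```r
# Total round-trip consumption B ~ N(20, sd = 2.5), as derived above
mu_b <- 10 + 10
sd_b <- sqrt(1.5^2 + 2^2)

# Hypothetical target: at most a 1% chance of running out
target <- 0.01

# The fuel needed is the (1 - target) quantile of B
fuel_needed <- qnorm(1 - target, mean = mu_b, sd = sd_b)
print(fuel_needed)

# Check: the upper tail beyond that amount equals the target
print(pnorm(fuel_needed, mean = mu_b, sd = sd_b, lower.tail = FALSE))
```

Under these assumptions the answer comes out a little under 26 L.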
Example: Calculate the percentage within one $\sigma$ of the mean
Get the percentage of the area in $[\mu - \sigma, \mu + \sigma ]$
# lower tail: P(Z < -1)
bef <- pnorm(-1, mean=0, sd=1)
bef
# upper tail: P(Z > 1)
aft <- 1 - pnorm(1, mean=0, sd=1)
aft
# finally, the area between -1 and 1
1-(bef+aft)
Out:
0.158655253931457
0.158655253931457
0.682689492137086
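The same idea extends to intervals of two and three standard deviations. A minimal sketch, using the vectorization of pnorm (the names k and inside are chosen only for this example):

```r
k <- 1:3

# P(mu - k*sigma < X < mu + k*sigma), computed on the standard normal
inside <- pnorm(k) - pnorm(-k)

print(round(inside, 4))
# 0.6827 0.9545 0.9973
```

Because any normal variable can be standardized, these proportions hold for every choice of $\mu$ and $\sigma$.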
Example: Let $X \sim \mathcal N(35,7)$ be a random variable. Find the following probabilities:
- a) $\mathbb P(X<25)$
- b) $\mathbb P(X>42)$
- c) $\mathbb P(25<X<42)$
par(mfrow=c(1,2))
curve(dnorm(x,35,7), 10, 60, lwd=2, ylab="PDF", main="NORM(35,7)")
abline(h=0,col="green2"); abline(v = 25, col="red", lty="dashed")
curve(dnorm(x,35,7), 10, 60, lwd=2, ylab="PDF", main="NORM(35,7)")
abline(h=0,col="green2"); abline(v = 42, col="blue", lty="dashed")

For case a) we can integrate dnorm over the region $x<25$:
integrate(dnorm, mean=35, sd=7, lower= -Inf, upper=25, abs.tol = 0)$value
Out:
0.07656373
Calling pnorm(25, mean=35, sd=7) gives exactly the same result.
b) Again we can integrate, this time over the region $x>42$:
integrate(dnorm, mean=35, sd=7, lower=42, upper=Inf, abs.tol = 0)$value
Out:
0.1586553
We get the same result using the pnorm function:
1-pnorm(42, mean=35, sd=7)
par(mfrow=c(1,1))
curve(dnorm(x,35,7), 10, 60, lwd=2, ylab="PDF", main="NORM(35,7)")
abline(h=0,col="green2")
abline(v = c(25,42), col="darkgreen", lty="dashed")

c) To solve the region $25<x<42$ we may integrate dnorm again:
integrate(dnorm, mean=35, sd=7, lower=25, upper=42, abs.tol = 0)$value
Out:
0.764781
Or we may use:
pnorm(42, mean=35, sd=7) - pnorm(25, mean=35, sd=7)
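The same probability can also be obtained by standardizing the bounds and using the standard normal CDF. A short sketch, with variable names chosen for illustration:

```r
mu <- 35
sigma <- 7

# z-scores of the two bounds
z_low  <- (25 - mu) / sigma
z_high <- (42 - mu) / sigma

# P(25 < X < 42) = P(z_low < Z < z_high) for a standard normal Z
p_between <- pnorm(z_high) - pnorm(z_low)
print(p_between)
```

Here z_high is exactly 1, which is why case b) matches the one-sigma tail value 0.1586553.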
Example: Product of two normal densities
If we have two normal densities, one for $X \sim \mathcal N(\mu_1, \sigma_1)$ and one for $Y \sim \mathcal N(\mu_2, \sigma_2)$, their pointwise product is proportional to another normal density.
The effective $\mu$ and $\sigma$ of the product are given by:
\[\left(\sigma_{1}^{2}+\sigma_{2}^{2}\right) \mu=\mu_{1} \sigma_{2}^{2}+\mu_{2} \sigma_{1}^{2}, \quad \frac{1}{\sigma^{2}}=\frac{1}{\sigma_{1}^{2}}+\frac{1}{\sigma_{2}^{2}}\]
This follows from the fact that Gaussian exponents are quadratic:
\[\frac{1}{\sigma_{1}^{2}}\left(x-\mu_{1}\right)^{2}+\frac{1}{\sigma_{2}^{2}}\left(x-\mu_{2}\right)^{2}=\frac{1}{\sigma^{2}}(x-\mu)^{2}+C\]…
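The formulas for $\mu$ and $\sigma$ can be checked numerically: if the product of the two density functions is proportional to a normal density with those parameters, the ratio between them must be the same constant at every $x$. The sketch below uses arbitrary illustrative parameters that are not taken from the text.

```r
# Arbitrary illustrative parameters (assumptions for this example)
mu1 <- 1; s1 <- 2
mu2 <- 4; s2 <- 3

# Combined parameters from the formulas above
s  <- sqrt(1 / (1 / s1^2 + 1 / s2^2))
mu <- (mu1 * s2^2 + mu2 * s1^2) / (s1^2 + s2^2)

# Ratio of the product of the two densities to the combined density;
# it should take the same value on the whole grid
x <- seq(-5, 10, by = 0.5)
ratio <- dnorm(x, mu1, s1) * dnorm(x, mu2, s2) / dnorm(x, mu, s)

print(c(mu = mu, sigma = s))
print(range(ratio))
```

The two values printed by range(ratio) should agree up to floating-point error, which confirms that only the normalizing constant differs.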