Let's say you're reading a scientific paper and you see this value: \( 37.9 \pm 1.5\). What does that mean? What does that little plus/minus tell you and why is it there?
That plus/minus exists because no measurement is ever exact. If someone were to measure your height several times, they wouldn't get the same result every time. You're not growing and shrinking; there's just a source of error (in this case, human error) in the measurement. The plus/minus, or rather the number after it, tells you how uncertain that particular measurement is. In fact, it's sometimes called the uncertainty, the sigma (\(\sigma\)), or the standard deviation.
Sigma is a standardized convention in statistics for saying how far a given measurement lies from the mean, or mu (\(\mu\)).
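In practice, \(\mu\) and \(\sigma\) come straight from the repeated measurements. Here's a minimal Python sketch of that; the specific height values are made up for illustration:

```python
import numpy as np

# Hypothetical repeated height measurements, in inches (made up for illustration).
measurements = np.array([74.1, 73.8, 74.3, 74.0, 73.9, 74.2, 73.7, 74.1])

mu = measurements.mean()          # the mean, mu
sigma = measurements.std(ddof=1)  # the sample standard deviation, sigma

print(f"height = {mu:.1f} +/- {sigma:.1f} inches")
```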
Does this look familiar?
This is a gaussian curve, sometimes called a normal distribution. I'll tell you why it probably looks so familiar in a bit, but first I'll tell you what it means.
The y-axis of the curve above is frequency; it tells you how many times a specific value (the measurement values run along the x-axis) was recorded. Going back to the height example, the mean, or \(\mu\), is the average of all of the height measurements taken. If someone asked you how tall you are, you would probably tell them this value, because it's the one that comes up most often (for a gaussian, the mean is also the most frequent value). Similarly, when scientists report a value without an error, they're likely reporting \(\mu\).
There is an equation that relates \(\mu\), \(\sigma\), and these percentages; it gives the height of the curve, the probability density \(p\):
$p(A) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-\frac{1}{2} \left( \frac{A-\mu}{\sigma} \right)^{2}}$
where A is a specific measurement of, for example, your height.
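That equation translates directly into code. Here's a sketch in Python; the specific \(\mu\) and \(\sigma\) values (74 inches, 1.5 inches) are just assumptions for the example:

```python
import numpy as np

def gaussian_pdf(A, mu, sigma):
    """Height of the gaussian curve at measurement A, given mu and sigma."""
    coeff = 1.0 / (sigma * np.sqrt(2.0 * np.pi))
    return coeff * np.exp(-0.5 * ((A - mu) / sigma) ** 2)

# The curve is tallest at the mean and lower one sigma away from it.
print(gaussian_pdf(74.0, mu=74.0, sigma=1.5))  # at the mean (the peak)
print(gaussian_pdf(75.5, mu=74.0, sigma=1.5))  # one sigma above the mean
```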
In a specific form of this equation where \(\mu = 0\) and \(\sigma = 1\) (the "standard" normal), the areas under the curve between \(-1\) and \(1\), \(-2\) and \(2\), and \(-3\) and \(3\) give the percentages illustrated in the figure above (68.3%, 95.4%, and 99.7%).
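You can check those percentages numerically. Here's a quick sketch using scipy's built-in normal distribution (assuming scipy is installed):

```python
from scipy.stats import norm

# Standard normal: mu = 0, sigma = 1. The area under the curve between
# -k and +k sigma gives the familiar percentages.
for k in (1, 2, 3):
    area = norm.cdf(k) - norm.cdf(-k)
    print(f"within {k} sigma: {100 * area:.1f}%")  # 68.3%, 95.4%, 99.7%
```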
In its most general form, it gives you the relative probability (the density) of taking a certain measurement given a specific \(\mu\) and \(\sigma\). For example, the probability of a nurse telling you that you're 4' tall when you're usually measured to be 6'2" (\(\pm\) an inch or two) is really low.
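To put numbers on the nurse example, here's a sketch with heights converted to inches; the exact \(\mu\) and \(\sigma\) (74 and 1.5 inches) are assumptions:

```python
from scipy.stats import norm

mu, sigma = 74.0, 1.5  # 6'2" in inches, +/- an inch or two (assumed values)

print(norm.pdf(48.0, loc=mu, scale=sigma))  # density at 4' (48"): vanishingly small
print(norm.pdf(74.0, loc=mu, scale=sigma))  # density at the usual value, for comparison
```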
You might be able to tell by looking at the plot that the values of \(\mu\) and \(\sigma\) really affect the shape of the curve. (Increasing \(\sigma\) makes the curve wider and flatter: the same 68.3% of the data still falls within one sigma of the mean, but that range now spans more values.) Because of this, and because the area under a gaussian is so easy to calculate, gaussians are often used to "fit" or model data sets.
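Here's a sketch of what that fitting looks like in practice, using scipy's curve_fit on simulated height data (all the numbers are made up):

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, amp, mu, sigma):
    # Same shape as the equation above, with a free amplitude for histogram counts.
    return amp * np.exp(-0.5 * ((x - mu) / sigma) ** 2)

rng = np.random.default_rng(0)
data = rng.normal(loc=74.0, scale=1.5, size=5000)  # simulated height data, inches

counts, edges = np.histogram(data, bins=40)
centers = 0.5 * (edges[:-1] + edges[1:])  # bin centers for the x-axis

popt, _ = curve_fit(gaussian, centers, counts, p0=[counts.max(), 70.0, 2.0])
amp, mu_fit, sigma_fit = popt
print(f"fitted mu = {mu_fit:.2f}, sigma = {abs(sigma_fit):.2f}")
```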
This might seem kind of naive, assuming that a shape so simple can be used to model complex natural systems, but it's really not! Gaussians actually occur all over the place in nature because of something called the Central Limit Theorem. The theorem states that, under fairly general conditions, the mean (or sum) of a sufficiently large number of independent random variables, each with a well-defined expected value and variance, will be approximately normally distributed.
In other, simpler words: say a nurse measures your height 50,000 times, with each measurement independent of all the others. Each reading is your true height plus the sum of many small, independent errors (posture, tape placement, rounding), and sums like that tend toward a gaussian. So if you were to plot those height measurements against their frequency (how many times each specific value was recorded), the plot would look like a gaussian.
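Here's a small simulation of that idea. Each measurement is built from a dozen small, independent errors, none of which is gaussian on its own (each is uniform), and the error model is entirely made up:

```python
import numpy as np

rng = np.random.default_rng(0)
true_height = 74.0  # inches; a made-up "true" value

# Each of the 50,000 measurements is the true height plus the sum of
# twelve small, independent errors (posture, tape placement, rounding).
errors = rng.uniform(-0.25, 0.25, size=(50_000, 12)).sum(axis=1)
measurements = true_height + errors

# The summed errors pile up into an approximately gaussian shape;
# np.histogram(measurements, bins=50) would show the bell curve.
print(f"mean = {measurements.mean():.3f}, sigma = {measurements.std():.3f}")
```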
That's why the gaussian probably looked so familiar to you! Because it exists all around you. I mean, it's actually because gaussians are in all of the books on math, science, and statistics, but I like the first reason more.