Never trust statistics
What to look for, not in any particular order
This YouTube clip and my blog post will overlap, but I didn’t check the clip.
I learned from Professor Jan van Gool (Pathophysiology, Wilhelmina Gasthuis, Amsterdam) that large studies can mislead, while the study of one sick person can find a causality you could never find with statistics. Don’t underestimate the power of following one patient.
You don’t need to be really smart to judge most statistics; common sense and general knowledge will be enough.
You don’t need to be great at math or calculus; it is enough not to be totally innumerate. Don’t get intimidated by an accuracy of plus or minus 0.001%. A 95% confidence interval doesn’t mean there is a 95% chance that the conclusion is accurate. Rather, it only means that if the sampling process were repeated many times, about 95% of the calculated intervals would contain the true value. But don’t worry about this. Generally, these numbers are worked out and checked by statisticians, and that’s not where the mistakes are.
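For those who like to see it happen, here is a minimal Python sketch of what the 95% refers to. It assumes a made-up population with a known average (the numbers 50 and 10 are invented for illustration), draws many samples, and counts how often the usual interval around the sample average contains the true average.

```python
import random
import statistics

def coverage(true_mean=50.0, sd=10.0, n=100, repeats=2000, seed=1):
    """Fraction of 95% intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        mean = statistics.mean(sample)
        # Standard error of the mean, estimated from the sample itself.
        se = statistics.stdev(sample) / n ** 0.5
        low, high = mean - 1.96 * se, mean + 1.96 * se
        if low <= true_mean <= high:
            hits += 1
    return hits / repeats

print(f"{coverage():.1%} of the intervals contain the true mean")
```

The result should land close to 95%. Any single interval either contains the true value or it doesn’t; the 95% describes the procedure, not one outcome.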
It helps if you like doing puzzles.
Check where the numbers come from. Does the funder, researcher, source, or reporter have a stake or interest in the outcome? Be almost paranoid.
How random was the sampling? Was the randomness checked? I was once phoned for a poll. When I gave my age, I was excluded. ‘We already have too many in your age group.’ Those who happened to be called first got in, so the sample was no longer random. Neither the one paying for the poll nor anyone else could ever discover this deceit, as the age representation ‘was correct.’
Was the poll done over the phone, in person, anonymously, etc.? How would that influence the findings? Did the subjects feel safe to talk about personal or sensitive things? Going by calling or ‘random’ street interviews selects a certain section of the population. Polls by certain websites or publications say something about their consumers and not more.
Then, how exact are the numbers? Typically, they are too precise in order to feign accuracy. It’s ridiculous to say that 79.67% of the elderly use eyeglasses for reading. Some 80% would be fine.
Do the numbers come from experts, people self-reporting, or a poll? How accurate are these judgments? Were the questions neutral or steering? What definitions were used? In the previous paragraph, who are defined as elderly? How is depression defined? How is blood pressure measured?
How large was the sample? Are any conclusions justified with such small samples? If 1 in 3 Americans think something, does that still matter if you asked only 6 people? Be extra suspicious if the numbers are huge. Computer databanks now hold millions of people’s data, but sampling them is highly questionable, especially in medical research, because subjects aren’t randomly split into two groups, of which one gets placebos.
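How much a small sample can say is easy to estimate. A minimal sketch, assuming a truly random sample and using the common textbook approximation for the 95% margin of error of a percentage (the approximation is itself rough for very small samples):

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p measured on n people."""
    return z * math.sqrt(p * (1 - p) / n)

# '1 in 3' measured on samples of different sizes.
for n in (6, 100, 1000):
    moe = margin_of_error(1 / 3, n)
    print(f"n={n}: 33% plus or minus {moe:.0%}")
```

With 6 people, the margin comes out at roughly 38 percentage points, so ‘1 in 3’ could be almost anything. With 1,000 people, it is about 3 points.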
Small doesn’t mean insignificant. Researchers studying molecules in the blood that could cause atherosclerosis overlooked one because its level was low. Later, they found that although the blood level was low, the turnover was large: much was used up in the tissue and rapidly replaced, which made it highly important.
Large samples can hide what you’re looking for because the different effects in different groups can cancel out. But small samples can overvalue the importance of outliers or errors in data collection. Faulty assumptions or ignorance about the sections of the population can mess up the results. It is known that right-wingers always underreport. Rich and poor people don’t just differ in income. You can’t just measure taller/shorter, thinner/fatter, and younger/older people by the same yardstick.
The gold standard is the double-blind trial. This means you take a population, divide it randomly into two groups, and give one a treatment and the other a placebo, without the provider or the subject knowing who belongs to which group; that is revealed only after the experiment. The placebo group shows how much of the healing does not come from the treatment. The placebo effect is not fake. Hope can help to heal or improve illnesses. But selecting a proper control group is one of the easiest things to get wrong in statistics. One researcher claimed that after October 7, a large percentage of Israelis are sleep deprived. They didn’t compare. Before that attack, most Israelis also slept too little. Testing sperm donors and finding they’re all mature males is a self-fulfilling result.
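The random split itself is simple. A minimal sketch, where the function name and the neutral labels A and B are my own choices for illustration; in a real trial, the key saying which label gets the treatment is held by someone outside the experiment:

```python
import random

def assign_groups(subject_ids, seed=None):
    """Randomly split subjects into two groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    # Only neutral labels are used here; which one is the treatment
    # is revealed only after the experiment.
    return {"A": sorted(shuffled[:half]), "B": sorted(shuffled[half:])}

groups = assign_groups(range(1, 21), seed=42)
print(groups)
```

The point is that chance decides who lands in which group, not the researcher, the doctor, or the patient.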
Retrospectively or prospectively? It can make a difference whether you look back at what has already happened to a group of subjects (retrospectively) or follow a group and see what will happen (prospectively). E.g., for years, medical science held that busybodies were prone to heart attacks. This turns out to be untrue. Yes, you find far more busybodies among survivors of heart attacks. But if you follow a group of men and look at who gets heart attacks, you find that being busy doesn’t matter statistically. But of those who have an attack, busybodies have a better chance of surviving!
Is a margin of error given? E.g., the number is 12% plus or minus 2%. Polls are worthless if they say that half the parties participating in the elections will get 3 seats plus or minus 5 seats when the electoral threshold is at 3 seats. Larger polls give more accuracy but are also more expensive and take longer to hold. Pooling different research projects to get to big numbers is often on shaky ground because each project had different definitions and locations, so you can’t just add them up.
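Why accuracy is expensive can be shown with the same textbook approximation for a random sample, turned around: how many respondents are needed for a given margin? A minimal sketch, assuming the worst case of a 50/50 split:

```python
import math

def sample_size(margin, p=0.5, z=1.96):
    """People needed for a given 95% margin of error on a proportion (worst case p=0.5)."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

for margin in (0.04, 0.02, 0.01):
    print(f"plus or minus {margin:.0%}: about {sample_size(margin)} respondents")
```

Halving the margin takes about four times as many respondents, which is why small margins are rare in quick polls.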
Watch out that finding no significant correlation doesn’t prove there’s no causality. Maybe only a larger sample would find it.
Political (un)popularity polls are held in Israel by different institutions several times a week. The reports then say who went up and who went down, but they compare apples and oranges. If any of it has any value to begin with, you need to compare the numbers from one source over time.
A side effect is the term for an undesired result. It doesn’t mean a minor outcome. If medication works well for 30% of the patients, how many are worse off, and how bad are the downsides of the drug? If a new test finds 99.84% of all carriers, then what is the false positive rate, meaning those wrongly flagged as carriers? Not reporting this is 100% bad science.
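Why the false positive rate matters so much can be worked out in a few lines. In this sketch, the 99.84% comes from the example above; the share of carriers in the population (1 in 1,000) and the false positive rate (1%) are hypothetical values I picked only for illustration.

```python
def share_of_true_carriers(prevalence, sensitivity, false_positive_rate):
    """Of everyone who tests positive, the fraction who really are carriers."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * false_positive_rate
    return true_pos / (true_pos + false_pos)

# Sensitivity 99.84% is from the text; prevalence and false positive rate
# are hypothetical values chosen only for illustration.
ppv = share_of_true_carriers(prevalence=0.001, sensitivity=0.9984,
                             false_positive_rate=0.01)
print(f"{ppv:.1%} of positive results are real carriers")
```

Under these assumptions, only about 9% of those who test positive are really carriers, even though the test finds almost every carrier.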
Testing and testing until you get results you want elevates chance findings to false significance. So, ask if this test was a one-off or one of many tests.
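How fast chance findings pile up is simple arithmetic. A minimal sketch, assuming the tests are independent and each uses the customary 5% threshold for significance:

```python
def chance_of_fluke(tests, alpha=0.05):
    """Probability that at least one of several independent tests
    looks significant by chance alone."""
    return 1 - (1 - alpha) ** tests

for k in (1, 5, 20, 100):
    print(f"{k} tests: {chance_of_fluke(k):.0%}")
```

With 20 tests, the chance of at least one false ‘finding’ is already about 64%.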
Common sense. They ‘found’ by searching among half a million Britons in a large database that drinking decaf is less healthy than coffee from freshly milled coffee beans. Then you know the difference lies not in what they drank because it has the same chemical composition. How can it be? Well, who has time to mill beans, use a filter, and wait until the coffee is ready, and who has money for that? Perfectionists, people working from home, and the richer part of the population. If you work a manual job or have less free time or money, you’ll drink decaf. The richer are healthier. Surprised? Don’t expect common sense as part of Artificial Intelligence.
Causality. Some things may correlate, but that doesn’t mean that one causes the other. Causality must have a logical vector. My father used to say, ‘When the number of storks diminished, also the average number of children per household fell. That doesn’t prove storks bring babies.’
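Two things that both decline over the years will correlate strongly even when they have nothing to do with each other. A minimal sketch with invented numbers (no real stork or birth data were used):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented numbers: both series simply decline over ten years,
# each with a little unrelated wobble.
years = list(range(10))
storks = [200 - 12 * t + (5 if t % 2 else -5) for t in years]
children = [3.0 - 0.15 * t + (0.05 if t % 3 else -0.05) for t in years]

print(f"correlation storks vs. children: {pearson(storks, children):.2f}")
```

The correlation comes out close to 1, and the only thing the two series share is the passing of time.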
Wishful thinking. Is the proposed causality the only possible explanation? Maybe a simpler or more obvious variable is overlooked.
One-off. A statistical correlation means nothing if it can’t be repeated.
Implications matter. If there is ‘only’ a 0.0003% chance that something would break, but that would bring down a plane, and there are 35 million flights yearly, that would cause about 100 plane crashes a year, which is quite unacceptable.
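The expected number of failures is simply the chance per flight times the number of flights. A minimal sketch, using the 35 million flights from the text and two illustrative chances:

```python
def expected_events(chance_percent, trials):
    """Expected number of events when each trial has the given chance (in percent)."""
    return chance_percent / 100 * trials

flights_per_year = 35_000_000  # figure from the text
for chance_percent in (0.000003, 0.0003):
    crashes = expected_events(chance_percent, flights_per_year)
    print(f"{chance_percent:.6f}% per flight: about {crashes:.0f} crashes a year")
```

A tiny percentage only sounds reassuring until you multiply it by how often the thing happens.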
Here is a report from today about young people not seeking work in Israel. It has many of the above-mentioned flaws. But if you think about it for one minute with a little general knowledge, you’ll notice also that the hundreds of thousands of Chareidi young adult men are completely ignored.
Never trust statistics. Treat them like a game of Where’s Wally? and find the mistakes.
