Intro
Statistics is a subfield of mathematics. It refers to a collection of methods for working with data and using data to answer questions.
Eyeball statistics
Statistics drives practice, policy and laws
- In almost every aspect of our lives, practice and policy are determined by statistics
- In fiscal policy, governments decide taxation and spending, based upon statistical assessments of their effects on the economy
- In health care, decisions about which drugs should be approved and what treatment patients should get are based on statistics (often from medical research)
- In our personal lives, our choices of where to work and live, and how to save and invest, are at least loosely driven by statistics
- Making good policy and personal decisions requires an understanding of statistics and data. If we misread the statistics, or the statistics are unreliable, policy will be flawed as well
Lying with Statistics
- Agenda-driven data: As access to data has increased, so has its misuse, especially by people with agendas they want to further. These people mislead without technically lying, by selectively picking which data they use and how they present it
- Social media as magnifier: Bad data can take on a life of its own, with social media acting as a weapon that widens its reach
- Gresham's Law
  - A monetary principle stating that "bad money drives out good". It is primarily applied to currency markets, and was originally based on the composition of minted coins and the value of the precious metals used in them. By analogy, bad data can drive out good data.
- Caveat emptor: As people weaponize data and present selective, slanted statistics based on it, we need to be able to protect ourselves from misinformation
- Understanding statistics allows us to
  - look for red flags that can be used to detect data manipulation
  - ask the right questions to separate fact from fiction
Mathematical Thinking
Mathematical thinking is about seeing the world in a different way, which sometimes means seeing beyond our intuition or gut feeling.
Population
- The collection of all items of interest in our study
- Its size is denoted by N
- Numbers obtained are called parameters
Sample
- A subset of the population
- Its size is denoted by n
- Numbers obtained are called statistics
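A minimal Python sketch of the parameter/statistic distinction, assuming a hypothetical population of N = 10,000 simulated values (normal with mean 50 and standard deviation 10, chosen only for illustration). The population mean is a parameter because it uses every item; the mean of a random sample of n items is a statistic that estimates it.

```python
import random
import statistics


def parameter_vs_statistic(N=10_000, n=100, seed=0):
    rng = random.Random(seed)
    # Hypothetical population of N simulated measurements.
    population = [rng.gauss(50, 10) for _ in range(N)]
    # Parameter: computed from every item in the population.
    mu = statistics.mean(population)
    # Statistic: computed from a random sample of n items, drawn without replacement.
    sample = rng.sample(population, n)
    x_bar = statistics.mean(sample)
    return len(population), len(sample), mu, x_bar


if __name__ == "__main__":
    N, n, mu, x_bar = parameter_vs_statistic()
    print(f"N={N}, population mean (parameter) = {mu:.2f}")
    print(f"n={n}, sample mean (statistic)     = {x_bar:.2f}")
```

Rerunning with different seeds changes the sample mean (the statistic) from draw to draw, while the population mean (the parameter) stays fixed for a given population.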
Time Series vs Cross Section
The data you are studying can be a phenomenon observed over time (time series data) or across different subjects at a point in time (cross-sectional data)
- Time series example: If stock returns over time are your population, stock returns from 1960 to 2021 are a sample
- Cross-sectional example: If all publicly traded companies are your population, looking only at US companies, or only at companies with market caps exceeding $10 million, gives a sample
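The two examples above can be sketched as simple filters in Python. The `Company` record type, the company names, and all numbers below are hypothetical placeholders, not real returns or market caps; the point is only that a time-series sample restricts one series to a window of time, while a cross-sectional sample restricts a set of subjects observed at one point in time.

```python
from dataclasses import dataclass


@dataclass
class Company:
    # Hypothetical record for one company observed at a single point in time.
    name: str
    country: str
    market_cap_usd: float


def time_series_sample(returns_by_year, start=1960, end=2021):
    """Sample a time series by keeping only the years in [start, end]."""
    return {year: r for year, r in returns_by_year.items() if start <= year <= end}


def cross_sectional_samples(companies, min_cap_usd=10_000_000):
    """Draw two cross-sectional samples from companies observed at one point in time."""
    us_only = [c for c in companies if c.country == "US"]
    large_caps = [c for c in companies if c.market_cap_usd > min_cap_usd]
    return us_only, large_caps


if __name__ == "__main__":
    # Placeholder values only.
    returns = {1950: 0.01, 1965: 0.02, 2021: 0.03, 2022: 0.04}
    companies = [
        Company("Alpha", "US", 5e6),
        Company("Beta", "US", 2e10),
        Company("Gamma", "DE", 3e9),
    ]
    print(sorted(time_series_sample(returns)))  # [1965, 2021]
    us_only, large_caps = cross_sectional_samples(companies)
    print([c.name for c in us_only], [c.name for c in large_caps])  # ['Alpha', 'Beta'] ['Beta', 'Gamma']
```

Both samples here are chosen by a rule rather than at random, so they may not be representative of the full population.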
Regression toward the mean
In statistics, regression toward the mean (also called regression to the mean, reversion to the mean, and reversion to mediocrity) is the phenomenon where, if a sample point of a random variable is extreme (nearly an outlier), a future point is likely to be closer to the mean or average. To avoid making incorrect inferences, regression toward the mean must be considered when designing scientific experiments and interpreting data.
https://en.wikipedia.org/wiki/Regression_toward_the_mean
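A small simulation can make the effect concrete. This sketch assumes a hypothetical model in which each subject's observed score is a fixed true ability plus independent random noise on each test (all distribution parameters are arbitrary choices for illustration). Subjects selected for an extreme first score tend, on average, to score closer to the overall mean on the second test, even though nothing about them has changed.

```python
import random
import statistics


def regression_to_mean_demo(n=10_000, top_fraction=0.05, seed=42):
    rng = random.Random(seed)
    # Assumed model: observed score = fixed true ability + independent noise per test.
    ability = [rng.gauss(100, 10) for _ in range(n)]
    test1 = [a + rng.gauss(0, 10) for a in ability]
    test2 = [a + rng.gauss(0, 10) for a in ability]
    # Select subjects whose FIRST score was extreme (top `top_fraction`).
    cutoff = sorted(test1)[int((1 - top_fraction) * n)]
    top = [i for i in range(n) if test1[i] >= cutoff]
    first = statistics.mean([test1[i] for i in top])
    second = statistics.mean([test2[i] for i in top])
    overall = statistics.mean(test2)
    return first, second, overall


if __name__ == "__main__":
    first, second, overall = regression_to_mean_demo()
    print(f"top group, test 1: {first:.1f}")
    print(f"top group, test 2: {second:.1f}")
    print(f"everyone,  test 2: {overall:.1f}")
```

The top group's second-test average falls between its first-test average and the overall mean: part of its extreme first score was luck (noise), which does not repeat.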
Controlled Experiments
- Randomness
- Allocation bias and selection bias
- Randomized block design (where the number of items in each group can be forced to be equal)
- Control groups
- Placebo effects (placebo is Latin for "I shall please")
- Single-blind study
- Double-blind study
- Matched-pair experiments
- Repeated measures design
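One simple way to randomize within blocks, sketched in Python under the assumption that subjects have already been grouped into blocks by a shared characteristic. The `block_randomize` helper, the age bands, and the subject IDs are hypothetical. Shuffling each block and then alternating assignments keeps the treatment and control groups within one item of each other in every block, which plain coin-flip randomization does not guarantee.

```python
import random
from collections import defaultdict


def block_randomize(subjects, block_of, seed=None):
    """Assign each subject to 'treatment' or 'control', randomizing within blocks.

    subjects: iterable of hashable subject IDs
    block_of: function mapping a subject ID to its block label
    Returns a dict {subject_id: group}.
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for s in subjects:
        blocks[block_of(s)].append(s)

    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)  # random order within this block
        # Randomly decide which group receives the first (possibly extra) slot.
        first, second = rng.sample(["treatment", "control"], 2)
        # Alternate after shuffling: group sizes in a block differ by at most 1.
        for i, s in enumerate(members):
            assignment[s] = first if i % 2 == 0 else second
    return assignment


if __name__ == "__main__":
    # Hypothetical subjects, blocked by age band.
    ages = {"s1": 25, "s2": 31, "s3": 67, "s4": 72, "s5": 29, "s6": 70}
    groups = block_randomize(ages, lambda s: "under_50" if ages[s] < 50 else "50_plus", seed=1)
    print(groups)
```

Randomly choosing which group gets the first slot in each block avoids systematically giving the extra subject of odd-sized blocks to the treatment group.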
Placebo Effect
The Expectation Effect by David Robson | Core Message (YouTube)
Henrietta Lacks, the Tuskegee Experiment, & Ethical Data Collection
- Informed Consent
- Nuremberg Code
- Beneficence
Outliers
Courses
https://www.youtube.com/watch?v=VPZD_aij8H0
https://www.youtube.com/watch?v=xxpc-HPKN28
https://www.youtube.com/watch?v=Vfo5le26IhY
http://people.stern.nyu.edu/adamodar/New_Home_Page/webcaststatistics.htm
https://365datascience.teachable.com/courses/enrolled/233979
https://www.khanacademy.org/math/ap-statistics
Outline
References
Abraham Wald and the Missing Bullet Holes
https://datascienceprep.com/blog/stat-guide-for-data-science-interviews