Intro

Statistics is a subfield of mathematics. It refers to a collection of methods for working with data and using data to answer questions.

Eyeball statistics

Statistics drives practice, policy and laws

  • In almost every aspect of our lives, practice and policy are determined by statistics
    • In fiscal policy, governments decide taxation and spending based upon statistical assessments of their effects on the economy
    • In health care, questions of which drugs should be approved and what treatment patients should get are decided based upon statistics (often from medical research)
    • In our personal lives, our choices of where to work and live and how to save/invest are at least loosely driven by statistics
  • Making good policy and personal decisions requires an understanding of statistics and data. If we misread the statistics or the statistics are unreliable, the policy built on them will be unreliable as well

Lying with Statistics

  • Agenda-driven data: As access to data has increased, the misuse of that data has also gone up, especially when people have agendas and want to further them. These people mislead, without technically lying, as they selectively pick and choose which data they use and how they present that data
  • Social media as magnifier: Bad data can take on a life of its own, especially with social media operating as a weapon to widen its reach
    • Gresham's Law
      • A monetary principle stating that "bad money drives out good". It is primarily used for consideration and application in currency markets. Gresham's law was originally based on the composition of minted coins and the value of the precious metals used in them.
  • Caveat emptor: As people weaponize data and use selective and slanted statistics, based upon that data, we need to be able to protect ourselves from misinformation
    • Understanding statistics allows us to
      • Look for red flags that can be used to detect data manipulation
      • Ask the right questions to separate fact from fiction

Mathematical Thinking

Mathematical thinking is about seeing the world in a different way, which sometimes means seeing beyond our intuition or gut feeling.

Population

  • Collection of all items of interest in our study
  • Its size is denoted by N
  • Numbers computed from a population are called parameters

Sample

  • A subset of the population
  • Its size is denoted by n
  • Numbers computed from a sample are called statistics
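
To make the population/sample distinction concrete, here is a minimal sketch using only the Python standard library. The population is synthetic (10,000 values drawn from a normal distribution with mean 170 and standard deviation 10), and the helper names `make_population` and `draw_sample` are made up for illustration. It computes parameters from the whole population and statistics from a random sample of it.

```python
import random
import statistics

def make_population(size, seed=0):
    """Build a synthetic population (illustrative values only)."""
    rng = random.Random(seed)
    return [rng.gauss(170, 10) for _ in range(size)]

def draw_sample(population, n, seed=1):
    """Draw a simple random sample of n items without replacement."""
    rng = random.Random(seed)
    return rng.sample(population, n)

population = make_population(10_000)
sample = draw_sample(population, 100)

N = len(population)  # population size
n = len(sample)      # sample size

# Parameters: computed from every item in the population.
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

# Statistics: computed from the sample, used to estimate the parameters.
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)

print(f"N={N}, mu={mu:.2f}, sigma={sigma:.2f}")
print(f"n={n}, x_bar={x_bar:.2f}, s={s:.2f}")
```

Note that `pstdev` divides by N (population formula) while `stdev` divides by n - 1 (sample formula), which mirrors the parameter/statistic split.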

Time Series vs Cross Section

The data that you are trying to study can be a phenomenon that you observe over time (time series data) or across different subjects at a point in time (cross sectional data)

  • Time series example: If stock returns over all time are your population, stock returns from 1960-2021 are a sample
  • Cross sectional example: If all publicly traded companies are your population, looking at only US companies, or only companies with market caps that exceed $10 million, gives you a sample
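
The two ways of slicing the same data can be sketched as follows. The returns below are made-up numbers for three hypothetical companies (A, B, C), and `time_series` and `cross_section` are illustrative helper names, not part of any library.

```python
# Made-up annual returns keyed by (company, year); purely illustrative.
returns = {
    ("A", 2019): 0.05, ("A", 2020): -0.02, ("A", 2021): 0.08,
    ("B", 2019): 0.01, ("B", 2020): 0.04, ("B", 2021): 0.03,
    ("C", 2019): -0.03, ("C", 2020): 0.06, ("C", 2021): 0.02,
}

def time_series(data, company):
    """One subject observed over time, ordered by year."""
    return [value for (c, _year), value in sorted(data.items()) if c == company]

def cross_section(data, year):
    """Many subjects observed at a single point in time."""
    return {c: value for (c, y), value in data.items() if y == year}

print(time_series(returns, "A"))     # company A across 2019-2021
print(cross_section(returns, 2020))  # all companies in 2020
```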

Regression toward the mean

In statistics, regression toward the mean (also called regression to the mean, reversion to the mean, and reversion to mediocrity) is the phenomenon whereby, if a sample point of a random variable is extreme (nearly an outlier), a future point is likely to be closer to the mean or average. To avoid making incorrect inferences, regression toward the mean must be considered when designing scientific experiments and interpreting data.

https://en.wikipedia.org/wiki/Regression_toward_the_mean
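
A small simulation can make regression toward the mean visible. This is a sketch under assumed conditions, not a model of any real data: each subject has a fixed "skill" plus independent random noise on each of two tests (both drawn from normal distributions with made-up parameters). The subjects who score in the top 10% on the first test tend to score lower, though still above average, on the second.

```python
import random
import statistics

def simulate(num_subjects=10_000, seed=42):
    """Two noisy measurements of the same underlying skill per subject."""
    rng = random.Random(seed)
    skill = [rng.gauss(100, 10) for _ in range(num_subjects)]
    first = [s + rng.gauss(0, 10) for s in skill]   # skill + luck on test 1
    second = [s + rng.gauss(0, 10) for s in skill]  # skill + fresh luck on test 2
    return first, second

first, second = simulate()

# Select the extreme group: top 10% of scores on the first test.
cutoff = sorted(first)[int(0.9 * len(first))]
top = [i for i, score in enumerate(first) if score >= cutoff]

overall_mean = statistics.fmean(first)
top_first_mean = statistics.fmean(first[i] for i in top)
top_second_mean = statistics.fmean(second[i] for i in top)

print(f"overall mean on test 1:      {overall_mean:.1f}")
print(f"top group's mean on test 1:  {top_first_mean:.1f}")
print(f"top group's mean on test 2:  {top_second_mean:.1f}")
```

Nothing was done to the top group between the two tests, so the drop is purely an artifact of selecting on an extreme value. This is why an experiment that treats only extreme cases needs a control group.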

Controlled Experiments

  • Randomness
  • Allocation bias and Selection bias
  • Randomized block design (where number of items in each group can be forced to be equal)
  • Control groups
  • Placebo effects (placebo is Latin for "I shall please")
  • Single blind study
  • Double blind study
  • Matched-pair experiments
  • Repeated measures design
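
As a rough illustration of randomness combined with blocking, the sketch below groups subjects into blocks, shuffles within each block, and splits each block evenly between treatment and control. The function name `block_randomize`, the age-band blocks, and the subject IDs are all hypothetical; real trials typically use more elaborate allocation schemes.

```python
import random
from collections import defaultdict

def block_randomize(subjects, block_of, seed=0):
    """Randomly assign subjects to treatment/control within each block.

    Shuffling inside a block removes allocation bias; splitting each
    block in half forces the group sizes to be equal (for even blocks).
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for subject in subjects:
        blocks[block_of(subject)].append(subject)

    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)
        half = len(members) // 2
        for subject in members[:half]:
            assignment[subject] = "treatment"
        for subject in members[half:]:
            assignment[subject] = "control"
    return assignment

# Hypothetical subjects: (id, age band). Ten subjects per age band.
subjects = [("s%02d" % i, "under_40" if i % 2 == 0 else "40_plus") for i in range(20)]
assignment = block_randomize(subjects, block_of=lambda s: s[1])

for subject in subjects[:4]:
    print(subject, "->", assignment[subject])
```

Because the split happens inside each block, both age bands are equally represented in the treatment and control groups, which a single unrestricted shuffle would not guarantee.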

Placebo Effect

THE EXPECTATION EFFECT by David Robson | Core Message - YouTube

Henrietta Lacks, the Tuskegee Experiment, & Ethical Data Collection

  • Informed Consent
  • Nuremberg code
  • Beneficence

Outliers

https://www.freecodecamp.org/news/what-is-an-outlier-definition-and-how-to-find-outliers-in-statistics

Courses

https://www.youtube.com/watch?v=VPZD_aij8H0

https://www.youtube.com/watch?v=xxpc-HPKN28

https://www.youtube.com/watch?v=Vfo5le26IhY

Statistics 101

http://people.stern.nyu.edu/adamodar/New_Home_Page/webcaststatistics.htm

https://365datascience.teachable.com/courses/enrolled/233979

https://www.khanacademy.org/math/ap-statistics

Outline

References

Abraham Wald and the Missing Bullet Holes

https://medium.com/@penguinpress/an-excerpt-from-how-not-to-be-wrong-by-jordan-ellenberg-664e708cfc3d

https://datascienceprep.com/blog/stat-guide-for-data-science-interviews