data8

My notes for Data8 - Fall 2019

Notes about Statistics

The stats part of data science

Probability

Multiplication Rule: P(A and B) = P(A) * P(B|A)

Addition Rule: P(A or B) = P(A) + P(B) - P(A and B)

Complement Rule: P(A) = 1 - P(Aᶜ), where Aᶜ is the event that A does not occur

Bayes’ Rule: P(A|B) = P(B|A) * P(A) / P(B)
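A quick way to sanity-check the four rules above is to enumerate a small sample space and compare both sides exactly. This is a minimal sketch, assuming two fair dice; the events `A` and `B` and the helpers `prob` and `cond_prob` are made up for illustration and are not from the course.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a predicate on a single outcome."""
    favorable = [o for o in outcomes if event(o)]
    return Fraction(len(favorable), len(outcomes))

def cond_prob(event, given):
    """P(event | given), computed by restricting the sample space to `given`."""
    restricted = [o for o in outcomes if given(o)]
    favorable = [o for o in restricted if event(o)]
    return Fraction(len(favorable), len(restricted))

# Illustrative events (assumptions, chosen only for this example).
A = lambda o: o[0] == 6            # first die shows 6
B = lambda o: o[0] + o[1] >= 10    # total is at least 10

A_and_B = lambda o: A(o) and B(o)
A_or_B = lambda o: A(o) or B(o)
not_A = lambda o: not A(o)

# Each flag compares the left and right side of one rule using exact fractions.
multiplication_holds = prob(A_and_B) == prob(A) * cond_prob(B, A)
addition_holds = prob(A_or_B) == prob(A) + prob(B) - prob(A_and_B)
complement_holds = prob(A) == 1 - prob(not_A)
bayes_holds = cond_prob(A, B) == cond_prob(B, A) * prob(A) / prob(B)

print(multiplication_holds, addition_holds, complement_holds, bayes_holds)
```

Using `Fraction` instead of floats keeps the comparisons exact, so an equality check is meaningful here.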

Sampling

Distributions

Inference

Models

Tools for Making Inferences

Hypothesis Testing

A/B Testing

Randomized Controlled Experiment:

Random Assignment Vs. Shuffling

| Data Generation | Sample Data | Hypothesis Testing | Conclusions |
| --- | --- | --- | --- |
| Observational Sample | Numerical Data with 2 Samples | Shuffle labels to simulate from null | Association |
| Randomized Controlled Experiment | Numerical Data with 2 Samples | Shuffle labels to simulate from null | Causation |
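The "shuffle labels to simulate from null" step can be sketched as a permutation test on the difference of group means. This is a rough sketch using plain NumPy rather than the course's own library; the function name `permutation_test`, the choice of test statistic, and the example numbers are all assumptions.

```python
import numpy as np

def permutation_test(group_a, group_b, num_shuffles=5000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Under the null hypothesis the group labels carry no information,
    so shuffling the pooled values simulates the null distribution.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    observed = a.mean() - b.mean()

    pooled = np.concatenate([a, b])
    simulated = np.empty(num_shuffles)
    for i in range(num_shuffles):
        shuffled = rng.permutation(pooled)
        # The first len(a) shuffled values play the role of group A.
        simulated[i] = shuffled[:len(a)].mean() - shuffled[len(a):].mean()

    # Share of shuffles at least as extreme as the observed difference.
    p_value = np.mean(np.abs(simulated) >= abs(observed))
    return observed, p_value

# Made-up measurements, for illustration only.
treatment = [10, 11, 12, 13, 14, 15]
control = [1, 2, 3, 4, 5, 6]
observed_diff, p = permutation_test(treatment, control)
print(observed_diff, p)
```

The test itself is the same for both rows of the table. What changes is the conclusion: a small p-value supports causation only when the groups were formed by random assignment.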

Quantifying Uncertainty

Estimation: A census of the whole population is usually unrealistic, so we estimate a population parameter from a statistic computed on a random sample.

Bootstrap: A technique for simulating repeated random sampling by resampling, with replacement, from the original sample
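A minimal sketch of the bootstrap, assuming the statistic of interest is the median and using plain NumPy. The helper name `bootstrap_statistics` and the sample values are made up for illustration.

```python
import numpy as np

def bootstrap_statistics(sample, statistic=np.median, num_resamples=2000, seed=0):
    """Return the statistic computed on many bootstrap resamples."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample, dtype=float)
    n = len(sample)
    stats = np.empty(num_resamples)
    for i in range(num_resamples):
        # Resample with replacement, keeping the original sample size.
        resample = rng.choice(sample, size=n, replace=True)
        stats[i] = statistic(resample)
    return stats

# Made-up sample, for illustration only.
sample = [3, 5, 7, 8, 12, 13, 14, 18, 21, 25]
medians = bootstrap_statistics(sample)
print(medians.min(), medians.max())
```

Each resample has the same size as the original sample, so the spread of `medians` approximates how much the sample median would vary from one random sample to the next.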

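Confidence Intervals: An x% confidence interval is a range of estimates for a parameter, built by a process that captures the true value about x% of the time. The x% describes how reliable the process is, not the chance that the parameter lies in one particular interval.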
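One way to see what the confidence level refers to is to build many intervals from fresh samples and count how often they contain a known true value. The sketch below assumes a normal population with a chosen mean and uses bootstrap percentile intervals for the mean. The function names and all parameter values are illustrative assumptions.

```python
import numpy as np

def percentile_interval(sample, rng, level=95, num_resamples=500):
    """Bootstrap percentile confidence interval for the population mean."""
    n = len(sample)
    # Draw all resamples at once: each row indexes one resample with replacement.
    idx = rng.integers(0, n, size=(num_resamples, n))
    means = sample[idx].mean(axis=1)
    tail = (100 - level) / 2
    return np.percentile(means, tail), np.percentile(means, 100 - tail)

def coverage(true_mean=50.0, sd=10.0, n=100, num_intervals=200, seed=0):
    """Fraction of intervals, each from a fresh sample, containing true_mean."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(num_intervals):
        # Assumed population: normal with known mean, so coverage can be checked.
        sample = rng.normal(true_mean, sd, size=n)
        low, high = percentile_interval(sample, rng)
        if low <= true_mean <= high:
            hits += 1
    return hits / num_intervals

estimated_coverage = coverage()
print(estimated_coverage)
```

The printed fraction should land near the chosen level, though it will not match it exactly because both the simulation and the bootstrap are approximate.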

Definitions of Center and Spread:

Normal Distributions: Bell-shaped distributions, symmetric around the mean.

Central limit theorem:

Sample Proportions

Sample SD and mean: As the sample size approaches the population size:

Classification and Regression: