A Comprehensive Guide to Probability & Statistics for Data Science
This is a long blog post, so I have divided it into parts. This first part sets the context and gives the table of contents for the overall post. We will mainly cover these topics:
- Probability
- Descriptive Statistics
- Inferential Statistics
- Bayesian Statistics
- Statistical Learning
When I look at the literature available on probability & statistics, I find it too theoretical and generalized. I have felt that there should be some content on probability & statistics specifically focused on data science.
I want to cover here everything about probability & statistics from basics to statistical learning. I would like to mention that my focus in these posts would be to give intuition on every topic and how it relates to data science rather than going deep into mathematical formulas or their implementation in the real world.
This blog post series contains six parts; this first one gives an overview and sets the context for the subsequent parts.
The second part will cover probability & its types, random variables & probability distributions, and how they are important from a data science perspective.
Probability
- Introduction
- Conditional Probability
- Random Variables
- Probability Distributions
The third, fourth & fifth parts will cover every topic related to statistics & its significance in data science.
Statistics
- Introduction
- Descriptive Statistics
- Inferential Statistics
- Bayesian Statistics
The sixth (final) part will cover statistical learning; it will look at machine learning and data science from a statistical perspective.
Statistical Learning
- Introduction
- Prediction & Inference
- Parametric & Non-parametric methods
- Prediction Accuracy and Model Interpretability
- Bias-Variance Trade-Off
This is the second part of the blog post series ‘Probability & Statistics for Data Science’; it covers the following topics related to probability and their significance in data science.
- Introduction
- Conditional Probability
- Random Variables
- Probability Distributions
Probability
Probability is the chance that something will happen — how likely it is that some event will occur.
Probability of an event E: P(E) = n(E) / n(T), where n(E) is the number of ways the event can happen and n(T) is the total number of possible outcomes.
Probability is the measure of the likelihood that an event will occur. It is quantified as a number between 0 and 1, where 0 indicates impossibility and 1 indicates certainty.
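To make the counting formula concrete, here is a minimal sketch (in Python, a tooling choice of this illustration rather than something the post prescribes): the probability of rolling an even number with a fair die, computed from counts and checked against a quick simulation.

```python
# Minimal sketch: P(E) = n(E) / n(T) for rolling an even number with a fair die.
import random

outcomes = [1, 2, 3, 4, 5, 6]                      # n(T) = 6 equally likely outcomes
event = [x for x in outcomes if x % 2 == 0]        # n(E) = 3 ways to roll an even number

p_theoretical = len(event) / len(outcomes)         # 3 / 6 = 0.5

rolls = [random.choice(outcomes) for _ in range(100_000)]
p_simulated = sum(r % 2 == 0 for r in rolls) / len(rolls)

print(p_theoretical, round(p_simulated, 3))        # both ≈ 0.5
```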
Why is probability important?
Uncertainty and randomness occur in many aspects of our daily lives, and a good knowledge of probability helps us make sense of these uncertainties. Learning about probability helps us make informed judgments about what is likely to happen, based on a pattern of data collected previously or an estimate.
How is probability used in data science?
Data science often uses statistical inference to predict or analyze trends from data, and statistical inference relies on the probability distributions of data. Hence, knowing probability and its applications is important for working effectively on data science problems.
Conditional Probability
Conditional probability is a measure of the probability of an event (some particular situation occurring) given that (by assumption, presumption, assertion, or evidence) another event has occurred.
The probability of event B given event A equals the probability of both events occurring divided by the probability of event A: P(B|A) = P(A and B) / P(A).
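For instance, here is a tiny sketch with invented numbers (a hypothetical class of students, not data from the post):

```python
# Hypothetical example: in a class of 100 students, 40 play football (event A)
# and 15 play both football and cricket (event A and B).
p_a = 40 / 100          # P(A)
p_a_and_b = 15 / 100    # P(A and B)

p_b_given_a = p_a_and_b / p_a   # P(B|A) = P(A and B) / P(A)
print(p_b_given_a)              # 0.375
```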
How is conditional probability used in data science?
Many data science techniques (i.e. Naive Bayes) rely on Bayes’ theorem.
Bayes’ theorem is a formula that describes how to update the probabilities of hypotheses when given evidence.
Using Bayes’ theorem, it is possible to build a learner that predicts the probability of the response variable belonging to some class, given a new set of attributes.
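As a hedged illustration of that update, the sketch below plugs invented rates into Bayes’ theorem to get the probability that a message is spam given that it contains a particular word:

```python
# Hypothetical sketch of Bayes' theorem: P(spam | word) from assumed rates.
p_spam = 0.2                 # prior P(spam)
p_word_given_spam = 0.6      # likelihood P(word | spam)
p_word_given_ham = 0.05      # likelihood P(word | not spam)

# Evidence P(word) via the law of total probability
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Posterior: P(spam | word) = P(word | spam) * P(spam) / P(word)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(round(p_spam_given_word, 3))   # ≈ 0.75
```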
Random Variables
A random variable is a set of possible values from a random experiment.
A random variable (random quantity, aleatory variable, or stochastic variable) is a variable whose possible values are outcomes of a random phenomenon.
Random variables can be discrete or continuous. Discrete random variables can only take certain values while continuous random variables can take any value (within a range).
Probability Distributions
The probability distribution for a random variable describes how the probabilities are distributed over the values of the random variable.
For a discrete random variable, x, the probability distribution is defined by a probability mass function, denoted by f(x). This function provides the probability for each value of the random variable.
For a continuous random variable, since there is an infinite number of values in any interval, the probability that a continuous random variable will lie within a given interval is considered. So here, the probability distribution is defined by the probability density function, also denoted by f(x).
Both probability functions must satisfy two requirements:
- (1) f(x) must be non-negative for each value of the random variable, and
- (2) the sum of the probabilities for each value (or the integral over all values) of the random variable must equal one.
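A quick sketch of both requirements, assuming scipy is available (a tooling choice of this illustration, not something the post prescribes):

```python
# Sketch: checking that a PMF is non-negative and sums to 1, and that a PDF integrates to 1.
import numpy as np
from scipy import stats
from scipy.integrate import quad

# Discrete case: binomial PMF values
x = np.arange(0, 11)
pmf = stats.binom.pmf(x, n=10, p=0.3)
print(pmf.min() >= 0, np.isclose(pmf.sum(), 1.0))   # True True

# Continuous case: the standard normal PDF
area, _ = quad(stats.norm.pdf, -np.inf, np.inf)
print(np.isclose(area, 1.0))                        # True
```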
Types of probability distributions
A binomial distribution describes the number of successes in a statistical experiment that has the following properties:
- The experiment consists of n repeated trials.
- Each trial can result in just two possible outcomes; we call one of these outcomes a success and the other a failure.
- The probability of success, denoted by P, is the same on every trial.
The normal distribution, also known as the Gaussian distribution, is a probability distribution that is symmetric about the mean, showing that data near the mean are more frequent in occurrence than data far from the mean. It has the following properties:
- The normal curve is symmetrical about the mean μ;
- The mean is at the middle and divides the area into halves;
- The total area under the curve is equal to 1;
- It is completely determined by its mean μ and standard deviation σ (or variance σ²).
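As a small sanity check (again assuming numpy, which the post does not prescribe), we can draw samples from both distributions and confirm that the sample statistics match the parameters:

```python
# Sketch: sampling from binomial and normal distributions and checking the parameters.
import numpy as np

rng = np.random.default_rng(42)

# Binomial: n = 10 trials, success probability p = 0.3 on every trial
binom_samples = rng.binomial(n=10, p=0.3, size=100_000)
print(binom_samples.mean())                           # ≈ n * p = 3.0

# Normal: completely determined by its mean and standard deviation
normal_samples = rng.normal(loc=5.0, scale=2.0, size=100_000)
print(normal_samples.mean(), normal_samples.std())    # ≈ 5.0 and ≈ 2.0
```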
Other common probability distributions are Bernoulli, Uniform, Poisson, and Exponential distributions, which are not in the scope of this blog post.
How are random variables & probability distributions used in data science?
Data science often uses statistical inference to predict or analyze trends from data, and statistical inference relies on the probability distributions of data. Hence, knowing random variables & their probability distributions is important for working effectively on data science problems.
This is the third part of the blog post series ‘Probability & Statistics for Data Science’; it covers the following topics related to descriptive statistics and their significance in data science.
- Introduction to Statistics
- Descriptive Statistics
- Univariate Analysis
- Bivariate Analysis
- Multivariate Analysis
- Function Models
- Significance in Data Science
Statistics Introduction
Statistics is a mathematical body of science that pertains to the collection, analysis, interpretation or explanation, and presentation of data.
Statistics, in short, is the study of data. It includes descriptive statistics (the study of methods and tools for collecting data, and mathematical models to describe and interpret data) and inferential statistics (the systems and techniques for making probability-based decisions and accurate predictions).
Population vs Sample
Population means the aggregate of all elements under study having one or more common characteristics, while a sample is a part of the population chosen at random for participation in the study.
Descriptive Statistics
A descriptive statistic is a summary statistic that quantitatively describes or summarizes features of a collection of information. Descriptive statistics are just descriptive. They do not involve generalizing beyond the data at hand.
Types of Variable
Dependent and Independent Variables: An independent variable (experimental or predictor) is a variable that is being manipulated in an experiment in order to observe the effect on a dependent variable (outcome).
Categorical and Continuous Variables: Categorical variables (qualitative) represent types of data that may be divided into groups. Categorical variables can be further categorized as either nominal, ordinal or dichotomous. Continuous variables (quantitative) can take any value. Continuous variables can be further categorized as either interval or ratio variables.
Central Tendency
Central tendency is a central or typical value for distribution. It may also be called a center or location of the distribution. The most common measures of central tendency are the arithmetic mean, the median, and the mode.
The mean is the numerical average of all values, the median is the value directly in the middle of the data set, and the mode is the most frequent value in the data set.
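A minimal sketch with Python’s built-in statistics module on an invented data set:

```python
# Sketch: the three measures of central tendency.
import statistics

data = [2, 3, 3, 5, 7, 8, 9]

print(statistics.mean(data))     # arithmetic mean ≈ 5.29
print(statistics.median(data))   # middle value = 5
print(statistics.mode(data))     # most frequent value = 3
```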
Spread or Variance
Spread (dispersion or variability) is the extent to which a distribution is stretched or squeezed. Common examples of measures of statistical dispersion are the variance, standard deviation, and inter-quartile range (IQR).
The inter-quartile range (IQR) is the distance between the 1st quartile and the 3rd quartile and gives us the range of the middle 50% of our data. Variance is the average of the squared differences from the mean, while the standard deviation is the square root of the variance.
Upper outliers: values above Q3 + 1.5 · IQR
Lower outliers: values below Q1 − 1.5 · IQR
Standard Score or Z score: for an observed value x, the Z score gives the number of standard deviations x is away from the mean: z = (x − μ) / σ.
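The sketch below computes these spread measures, the outlier fences, and z-scores with numpy (assumed tooling) on an invented data set:

```python
# Sketch: spread measures, outlier fences, and z-scores.
import numpy as np

data = np.array([4, 7, 8, 10, 12, 13, 15, 18, 21, 40])

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
print("IQR:", iqr)
print("Outlier fences:", q1 - 1.5 * iqr, q3 + 1.5 * iqr)

print("Variance:", data.var(ddof=1), "Std dev:", data.std(ddof=1))

# Z-score: how many standard deviations each value is away from the mean
z = (data - data.mean()) / data.std(ddof=1)
print("Z-score of 40:", round(z[-1], 2))
```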
Univariate Analysis
In univariate analysis, the appropriate statistics depend on the level of measurement:
- For nominal variables, a frequency table and a listing of the mode(s) are sufficient.
- For ordinal variables, the median can be calculated as a measure of central tendency and the range (and variations of it) as a measure of dispersion.
- For interval-level variables, the arithmetic mean (average) and standard deviation are added to the toolbox.
- For ratio-level variables, we add the geometric mean and harmonic mean as measures of central tendency and the coefficient of variation as a measure of dispersion.
For interval and ratio level data, further descriptors include the variable’s skewness and kurtosis. Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point. Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution.
Mainly, bar graphs, pie charts, and histograms are used for univariate analysis.
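A short pandas sketch of these univariate summaries (pandas is an assumed tooling choice, and the data frame and column names are made up for illustration):

```python
# Sketch: univariate summaries for a nominal and a ratio-level variable.
import pandas as pd

df = pd.DataFrame({
    "city": ["A", "B", "A", "C", "A", "B"],     # nominal variable
    "income": [42, 55, 61, 48, 52, 70],         # ratio-level variable
})

print(df["city"].value_counts())                # frequency table for the nominal variable
print(df["city"].mode())                        # its mode(s)
print(df["income"].describe())                  # mean, std, quartiles for the numeric variable
print(df["income"].skew(), df["income"].kurt()) # skewness and kurtosis
```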
Bivariate Analysis
Bivariate analysis involves the analysis of two variables (often denoted as X, Y), for the purpose of determining the empirical relationship between them.
For two continuous variables, a scatter plot is a common graph. When one variable is categorical and the other continuous, a box-plot or violin-plot (also Z-test and t-test) is common, and when both are categorical, a mosaic plot is common (also chi-square test).
Multivariate Analysis
Multivariate analysis involves observation and analysis of more than one statistical outcome variable at a time. Multivariate scatter plots, grouped box-plots (or grouped violin-plots), and heat-maps are used for multivariate analysis.
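A sketch of these common plots, assuming seaborn/matplotlib and seaborn’s example ‘tips’ dataset (library and dataset are choices made for this illustration, not by the post):

```python
# Sketch: bivariate and multivariate plots on an example dataset.
import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")

sns.scatterplot(data=tips, x="total_bill", y="tip")          # two continuous variables
plt.show()

sns.boxplot(data=tips, x="day", y="total_bill")              # categorical vs continuous
plt.show()

sns.heatmap(tips[["total_bill", "tip", "size"]].corr(), annot=True)   # multivariate view
plt.show()
```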
Function Models
A function can be expressed as an equation, for example y = f(x), where f represents the function name, x the independent variable, and y the dependent variable.
A linear function has the same average rate of change on every interval. When a linear model is used to describe data, it assumes a constant rate of change.
Exponential functions have a variable that appears in the exponent (or power) instead of the base.
The logistic function has an upper bound that limits growth: the curve grows roughly exponentially at first, then slows down and hardly grows at all.
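A small numpy sketch of the three function models, with arbitrary parameter values chosen only for illustration:

```python
# Sketch: linear, exponential, and logistic function models.
import numpy as np

x = np.linspace(0, 10, 11)

linear = 2 * x + 1                       # constant rate of change
exponential = 3 * np.exp(0.5 * x)        # the variable appears in the exponent
logistic = 100 / (1 + np.exp(-(x - 5)))  # grows quickly at first, then levels off near 100

print(linear[:3], exponential[:3], logistic[:3])
```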
Significance in Data Science
Descriptive Statistics helps you to understand your data and is the initial & very important step of Data Science. This is due to the fact that Data Science is all about making predictions and you can’t predict if you can’t understand the patterns in existing data.
This is the fourth part of the blog post series ‘Probability & Statistics for Data Science’; it covers the following topics related to inferential statistics and their significance in data science.
- Inferential Statistics
- Sampling Distributions & Estimation
- Hypothesis Testing (One and Two Group Means)
- Hypothesis Testing (Categorical Data)
- Hypothesis Testing (More Than Two Group Means)
- Quantitative Data (Correlation & Regression)
- Significance in Data Science
Inferential Statistics
Inferential statistics allows you to make inferences about the population from the sample data.
Population & Sample
A sample is a representative subset of a population. Conducting a census of the entire population is ideal but impractical in most cases. Sampling is much more practical; however, it is prone to sampling error. When a sample is not representative of the population, it is said to be biased, and a sampling method that systematically produces such samples suffers from sampling bias. Convenience bias, judgment bias, size bias, and response bias are the main types of sampling bias. The best technique for reducing bias in sampling is randomization. Simple random sampling is the simplest randomization technique; cluster sampling & stratified sampling are other common techniques.
Sampling Distributions
Sample means become more and more normally distributed around the true mean (the population parameter) as we increase our sample size. The variability of the sample means decreases as the sample size increases.
Central Limit Theorem
The Central Limit Theorem is used to help us understand the following facts regardless of whether the population distribution is normal or not:
- the mean of the sample means is the same as the population mean.
- the standard deviation of the sample means is always equal to the standard error.
- the distribution of sample means will become increasingly more normal as the sample size increases.
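A simulation sketch of these statements, assuming numpy (a tooling choice of this illustration): sample means drawn from a decidedly non-normal (exponential) population still behave as the CLT describes.

```python
# Sketch: the sampling distribution of the mean from a skewed population.
import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=2.0, size=1_000_000)    # decidedly non-normal

n = 50
sample_means = rng.choice(population, size=(10_000, n)).mean(axis=1)

print(population.mean(), sample_means.mean())              # ≈ equal (≈ 2.0)
print(population.std() / np.sqrt(n), sample_means.std())   # std of sample means ≈ standard error
```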
Confidence Intervals
A sample mean can be referred to as a point estimate of a population mean. A confidence interval is always centered around the mean of your sample. To construct the interval, you add a margin of error. The margin of error is found by multiplying the standard error of the mean by the z-score of the chosen confidence level: margin of error = z* · σ/√n.
The confidence level indicates the number of times out of 100 that the mean of the population will be within the given interval of the sample mean.
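A minimal sketch of constructing a 95% confidence interval, assuming numpy/scipy and synthetic data (choices made for this illustration):

```python
# Sketch: a 95% confidence interval for a mean.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(loc=100, scale=15, size=200)

mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(len(sample))   # standard error of the mean
z = stats.norm.ppf(0.975)                        # z-score for 95% confidence

print(mean - z * se, mean + z * se)              # interval centered on the sample mean
```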
Hypothesis Testing
Hypothesis testing is a kind of statistical inference that involves asking a question, collecting data, and then examining what the data tell us about how to proceed. The hypothesis to be tested is called the null hypothesis and is given the symbol H₀. We test the null hypothesis against an alternative hypothesis, which is given the symbol Hₐ.
When a hypothesis is tested, we must decide on how much of a difference between means is necessary in order to reject the null hypothesis. Statisticians first choose a level of significance or alpha(α) level for their hypothesis test.
Critical values are the values that indicate the edge of the critical region. Critical regions describe the entire area of values that indicate you reject the null hypothesis.
These are the four basic steps we follow for (one & two group means) hypothesis testing:
- State the null and alternative hypotheses.
- Select the appropriate significance level and check the test assumptions.
- Analyze the data and compute the test statistic.
- Interpret the result.
Hypothesis Testing (One and Two Group Means)
Hypothesis Test on One Sample Mean When the Population Parameters are Known
We find the z-statistic of our sample mean in the sampling distribution and determine if that z-score falls within the critical (rejection) region or not. This test is only appropriate when you know the true mean and standard deviation of the population.
Hypothesis Tests When You Don’t Know Your Population Parameters
The Student’s t-distribution is similar to the normal distribution, except it is more spread out and wider in appearance, and has thicker tails. The differences between the t-distribution and the normal distribution are more exaggerated when there are fewer data points and therefore fewer degrees of freedom.
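A sketch of a one-sample t-test with scipy (a tooling choice of this illustration; the sample values and the hypothesized mean of 50 are invented):

```python
# Sketch: one-sample t-test against a hypothesized population mean of 50.
import numpy as np
from scipy import stats

sample = np.array([52.1, 48.3, 55.0, 51.2, 49.8, 53.4, 50.9, 54.2])

t_stat, p_value = stats.ttest_1samp(sample, popmean=50)
print(t_stat, p_value)   # reject H0 at alpha = 0.05 only if p_value < 0.05
```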
Estimation as a follow-up to a Hypothesis Test
When a hypothesis is rejected, it is often useful to turn to estimation to try to capture the true value of the population mean.
Two-Sample T-Tests
Independent Vs Dependent Samples
When we have independent samples we assume that the scores of one sample do not affect the other.
In two dependent samples of data, each score in one sample is paired with a specific score in the other sample.
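A sketch of both cases with scipy (the scores are invented for illustration):

```python
# Sketch: independent vs dependent (paired) two-sample t-tests.
import numpy as np
from scipy import stats

group_a = np.array([23, 25, 28, 30, 27, 26])
group_b = np.array([31, 29, 34, 33, 30, 32])
print(stats.ttest_ind(group_a, group_b))   # independent samples

before = np.array([80, 75, 90, 85, 70])
after = np.array([85, 78, 93, 88, 74])
print(stats.ttest_rel(before, after))      # dependent (paired) samples
```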
Hypothesis Testing (Categorical Data)
Chi-square test is used for categorical data and it can be used to estimate how closely the distribution of a categorical variable matches an expected distribution (the goodness-of-fit test), or to estimate whether two categorical variables are independent of one another (the test of independence).
Goodness-of-fit test: degrees of freedom (df) = number of categories (c) − 1
Test of independence: degrees of freedom (df) = (rows − 1)(columns − 1)
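A sketch of both chi-square tests with scipy, using invented counts:

```python
# Sketch: chi-square goodness-of-fit and test of independence.
from scipy import stats

# Goodness-of-fit: do observed counts match the expected distribution? (df = c - 1)
observed = [18, 22, 20, 40]
expected = [25, 25, 25, 25]
print(stats.chisquare(f_obs=observed, f_exp=expected))

# Test of independence on a contingency table (df = (rows - 1)(columns - 1))
table = [[30, 10],
         [20, 40]]
chi2, p, dof, expected_counts = stats.chi2_contingency(table)
print(chi2, p, dof)
```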
Hypothesis Testing (More Than Two Group Means)
Analysis of Variance (ANOVA) allows us to test the hypothesis that multiple population means are equal (assuming equal variances of scores). We could conduct a series of t-tests instead of ANOVA, but that would be tedious and would inflate the chance of a Type I error.
We follow a series of steps to perform ANOVA:
- Calculate the total sum of squares (SST).
- Calculate the sum of squares between groups (SSB).
- Find the sum of squares within groups (SSW) by subtraction: SSW = SST − SSB.
- Next, solve for the degrees of freedom for the test.
- Using these values, calculate the Mean Squares Between (MSB = SSB / df between) and Mean Squares Within (MSW = SSW / df within).
- Finally, calculate the F statistic: F = MSB / MSW.
- It is easy to fill in the ANOVA table from here — once the SS and df columns are filled in, the remaining MS and F values are simple calculations.
- Find F-critical.
If the F-value from the ANOVA test is greater than the F-critical value, we reject the null hypothesis.
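A sketch of a one-way ANOVA with scipy (assumed tooling), which computes the F statistic and p-value directly from invented group scores:

```python
# Sketch: one-way ANOVA across three groups.
from scipy import stats

group1 = [85, 86, 88, 75, 78, 94]
group2 = [91, 92, 93, 85, 87, 84]
group3 = [79, 78, 88, 94, 92, 85]

f_stat, p_value = stats.f_oneway(group1, group2, group3)
print(f_stat, p_value)   # reject the null hypothesis of equal means if p_value < alpha
```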
One-Way ANOVA
The one-way ANOVA method is the procedure for testing the null hypothesis that the population means across the levels of a single independent variable are equal.
Two-Way ANOVA
The two-way ANOVA method is the procedure for testing the null hypothesis that the population means across the levels of two independent variables are equal. With this method, we are not only able to study the effect of each independent variable, but also the interaction between these variables.
We could also run two separate one-way ANOVAs, but two-way ANOVA gives us efficiency, control & the ability to study interaction.
Quantitative Data (Correlation & Regression)
Correlation
Correlation refers to a mutual relationship or association between quantitative variables. It can help in predicting one quantity from another. It may suggest, but does not by itself establish, a causal relationship. It is used as a basic quantity and foundation for many other modeling techniques.
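A sketch of the Pearson correlation coefficient with scipy, on invented data:

```python
# Sketch: Pearson correlation between two quantitative variables.
import numpy as np
from scipy import stats

hours_studied = np.array([1, 2, 3, 4, 5, 6, 7, 8])
exam_score = np.array([52, 55, 61, 64, 70, 72, 80, 85])

r, p_value = stats.pearsonr(hours_studied, exam_score)
print(r, p_value)        # r close to +1 indicates a strong positive association
```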
Regression
Regression analysis is a set of statistical processes for estimating the relationships among variables.
Simple Regression
This method uses a single independent variable to predict a dependent variable by fitting the best relationship.
Multiple Regression
This method uses more than one independent variable to predict a dependent variable by fitting the best relationship.
It works best when multicollinearity is absent. Multicollinearity is a phenomenon in which two or more predictor variables are highly correlated.
Nonlinear Regression
In this method, observational data are modeled by a function which is a nonlinear combination of the model parameters and depends on one or more independent variables.
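A sketch of simple and multiple linear regression with scikit-learn (a tooling choice of this illustration), fitted on synthetic data whose true coefficients we know:

```python
# Sketch: simple (one predictor) and multiple (two predictors) linear regression.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
X = rng.uniform(0, 10, size=(100, 2))              # two predictor variables
y = 3 * X[:, 0] - 2 * X[:, 1] + 5 + rng.normal(0, 1, 100)

simple = LinearRegression().fit(X[:, [0]], y)      # single independent variable
multiple = LinearRegression().fit(X, y)            # more than one independent variable

print(simple.coef_, simple.intercept_)
print(multiple.coef_, multiple.intercept_)         # ≈ [3, -2] and ≈ 5
```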
Significance in Data Science
In data science, inferential statistics is used in many ways:
- Making inferences about the population from the sample.
- Concluding whether a sample is significantly different from the population.
- Determining whether adding or removing a feature from a model will really help to improve the model.
- Determining whether one model is significantly better than another.
- Hypothesis testing in general.
This is the fifth part of the blog post series ‘Probability & Statistics for Data Science’; it covers the following topics related to Bayesian statistics and their significance in data science.
- Frequentist Vs Bayesian Statistics
- Bayesian Inference
- Test for Significance
- Significance in Data Science
Frequentist Vs Bayesian Statistics
Frequentist statistics tests whether an event (hypothesis) occurs or not. It calculates the probability of an event in the long run of the experiment. A very common flaw in the frequentist approach is the dependence of the result of an experiment on the number of times the experiment is repeated.
Frequentist statistics suffers from some serious flaws in its design and interpretation, which pose a concern in many real-life problems:
- p-value & Confidence Interval (C.I) depend heavily on the sample size.
- Confidence Intervals (C.I) are not probability distributions
Bayesian statistics is a mathematical procedure that applies probabilities to statistical problems. It provides the tools to update beliefs in the light of new data (evidence).
Bayesian Inference
To understand Bayesian Inference, you need to understand Conditional Probability & Bayes Theorem, if you want to review these concepts, please refer to my earlier post in this series.
Bayesian inference is a method of statistical inference in which Bayes’ theorem is used to update the probability for a hypothesis as more evidence or information becomes available.
An important part of Bayesian inference is the establishment of parameters and models. Models are the mathematical formulation of the observed events. Parameters are the factors in the models affecting the observed data. To define our model correctly, we need two mathematical models beforehand: one represents the likelihood function, and the other represents the distribution of prior beliefs. The product of these two gives the posterior belief distribution.
Likelihood Function
A likelihood function is a function of the parameters of a statistical model, given specific observed data. Probability describes the plausibility of a random outcome, without reference to any observed data while Likelihood describes the plausibility of a model parameter value, given specific observed data.
Prior & Posterior Belief distribution
The prior belief distribution represents the strength of our beliefs about the parameters based on previous experience. The posterior belief distribution is derived by multiplying the likelihood function & the prior belief distribution.
As we collect more data, our posterior belief distribution moves away from the prior belief and towards the likelihood (the evidence from the data).
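A sketch of this updating with a Beta-Binomial model, assuming scipy (a tooling choice of this illustration); the prior parameters and the observed flips are invented:

```python
# Sketch: Beta-Binomial update of beliefs about a coin's bias theta.
from scipy import stats

# Prior belief: Beta(2, 2), mildly centered on fairness (theta = 0.5)
prior_a, prior_b = 2, 2

# Observed data: 8 heads in 10 flips (the likelihood comes from the binomial model)
heads, flips = 8, 10

# Posterior belief: Beta(prior_a + heads, prior_b + tails)
post_a, post_b = prior_a + heads, prior_b + (flips - heads)
posterior = stats.beta(post_a, post_b)

print(posterior.mean())          # ≈ 0.71, pulled from the prior (0.5) toward the data (0.8)
print(posterior.interval(0.95))  # a 95% credible interval for theta
```

With more flips, the posterior mean would move still closer to the observed proportion of heads.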
Test for Significance
Bayes factor
The Bayes factor is the equivalent of the p-value in the Bayesian framework. The null hypothesis in the Bayesian framework places all of its probability mass at a particular value of a parameter (say θ = 0.5) and zero probability elsewhere. The alternative hypothesis is that all values of θ are possible, hence a flat curve representing the distribution.
Using Bayes Factor instead of p-values is more beneficial in many cases since they are independent of intentions and sample size.
High Density Interval (HDI)
The High Density Interval (HDI), or credibility interval, is the equivalent of the Confidence Interval (CI) in the Bayesian framework. The HDI is formed from the posterior distribution after observing the new data.
Using High Density Interval (HDI) instead of Confidence Interval (CI) is more beneficial since they are independent of intentions and sample size.
Moreover, there is a nice article published on AnalyticsVidhya that elaborates on these concepts with examples.
Significance in Data Science
Bayesian statistics encompasses a specific class of models that could be used for Data Science. Typically, one draws on Bayesian models for one or more of a variety of reasons, such as:
- having relatively few data points
- having strong prior intuitions
- having high levels of uncertainty
This is the sixth & last post of the blog post series ‘Probability & Statistics for Data Science’; it covers the following topics related to statistical learning and their significance in data science.
- Introduction
- Prediction & Inference
- Parametric & Non-parametric methods
- Prediction Accuracy and Model Interpretability
- Bias-Variance Trade-Off
Introduction
Statistical learning is a framework for understanding data based on statistics, which can be classified as supervised or unsupervised. Supervised statistical learning involves building a statistical model for predicting, or estimating, an output based on one or more inputs, while in unsupervised statistical learning, there are inputs but no supervising output; but we can learn relationships and structure from such data.
One of the simple ways to understand statistical learning is to determine the association between the predictors (independent variables, features) & the response (dependent variable) and to develop an accurate model that can predict the response variable (Y) on the basis of the predictor variables (X).
Y = f(X) + ε, where X = (X₁, X₂, . . ., Xₚ), f is an unknown function & ε is a random error term; the error in estimating f is reducible, while the error due to ε is irreducible.
Prediction & Inference
In situations where a set of inputs X are readily available, but the output Y is not known, we often treat f as a black box (not concerned with the exact form of f), as long as it yields accurate predictions for Y. This is prediction.
There are situations where we are interested in understanding the way that Y is affected as X changes. In this situation we wish to estimate f, but our goal is not necessarily to make predictions for Y; here we are more interested in understanding the relationship between X and Y. Now f cannot be treated as a black box, because we need to know its exact form. This is inference.
In real life, we will see a number of problems that fall into the prediction setting, the inference setting, or a combination of the two.
Parametric & Non-parametric methods
When we make an assumption about the functional form of f and try to estimate f by estimating the set of parameters, these methods are called parametric methods.
f(X) = β₀ + β₁X₁ + β₂X₂ + . . . + βₚXₚ
Non-parametric methods do not make explicit assumptions about the form of f, instead, they seek an estimate of f that gets as close to the data points as possible.
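A sketch contrasting the two approaches with scikit-learn (a tooling choice of this illustration): a parametric linear regression and a non-parametric k-nearest-neighbors fit on the same synthetic data.

```python
# Sketch: parametric (linear regression) vs non-parametric (k-nearest neighbors) fits.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(3)
X = np.sort(rng.uniform(0, 10, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(0, 0.2, 80)     # a clearly non-linear f

parametric = LinearRegression().fit(X, y)                       # assumes a fixed functional form
non_parametric = KNeighborsRegressor(n_neighbors=5).fit(X, y)   # follows the data points

print(parametric.score(X, y), non_parametric.score(X, y))       # R^2; KNN tracks this shape better
```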
Prediction Accuracy and Model Interpretability
Of the many methods that we use for statistical learning, some are less flexible, or more restrictive. When inference is the goal, there are clear advantages to using simple and relatively inflexible statistical learning methods. When we are only interested in prediction, we can use the most flexible models available.
Assessing Model Accuracy
There is no free lunch in statistics, which means no one method dominates all others over all possible data sets. In the regression setting, the most commonly-used measure is the mean squared error (MSE). In the classification setting, the most commonly-used measure is the confusion matrix. The fundamental property of statistical learning is that, as model flexibility increases, training error will decrease, but the test error may not.
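A sketch of that property, assuming scikit-learn and synthetic data: as polynomial degree (flexibility) increases, training MSE keeps falling while test MSE typically rises again for very flexible fits.

```python
# Sketch: training vs test MSE as model flexibility (polynomial degree) increases.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(5)
X = rng.uniform(-3, 3, size=(200, 1))
y = X.ravel() ** 2 + rng.normal(0, 1, 200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 2, 10, 20):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_tr, y_tr)
    print(degree,
          mean_squared_error(y_tr, model.predict(X_tr)),   # training error keeps decreasing
          mean_squared_error(y_te, model.predict(X_te)))   # test error typically rises again
```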
Bias & Variance
Bias is the set of simplifying assumptions made by a model to make the target function easier to learn. Parametric models have a high bias, making them fast to learn and easier to understand but generally less flexible. Decision Trees, k-Nearest Neighbors, and Support Vector Machines are low-bias machine learning algorithms. Linear Regression, Linear Discriminant Analysis, and Logistic Regression are high-bias machine learning algorithms.
Variance is the amount that the estimate of the target function will change if different training data was used. Non-parametric models that have a lot of flexibility have a high variance. Linear Regression, Linear Discriminant Analysis, and Logistic Regression are low-variance machine learning algorithms. Decision Trees, k-Nearest Neighbors, and Support Vector Machines are high-variance machine learning algorithms.
Bias-Variance Trade-Off
The relationship between bias and variance in statistical learning is such that:
- Increasing bias will decrease variance.
- Increasing variance will decrease bias.
There is a trade-off at play between these two concerns: the models we choose, and the way we configure them, strike different balances in this trade-off for our problem.
In both the regression and classification settings, choosing the correct level of flexibility is critical to the success of any statistical learning method. The bias-variance trade-off, and the resulting U-shape in the test error, can make this a difficult task.
Ankit Rathi is a Principal Data Scientist, published author & well-known speaker. His interest lies primarily in building end-to-end AI applications/products following best practices of Data Engineering and Architecture.