Regression and Correlation Study Cards

Flash cards for quick review of key regression and correlation concepts.

Regression

A statistical method used to model the relationship between a dependent variable and one or more independent variables.

Correlation

A statistical measure that describes the strength and direction of a relationship between two variables.

Simple Linear Regression

A regression model that assumes a linear relationship between the dependent variable and a single independent variable.
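
For example, a minimal Python sketch of an ordinary least squares fit, using made-up data values:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

    # Least-squares estimates: slope b1 = cov(x, y) / var(x),
    # intercept b0 = mean(y) - b1 * mean(x)
    b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    b0 = y.mean() - b1 * x.mean()

    y_hat = b0 + b1 * x  # fitted values on the estimated line
    print(b0, b1)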

Multiple Linear Regression

A regression model that assumes a linear relationship between the dependent variable and multiple independent variables.
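
One sketch, assuming the statsmodels library is available; the data are simulated for illustration:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))    # two independent variables
    y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=100)

    X_design = sm.add_constant(X)    # add an intercept column
    model = sm.OLS(y, X_design).fit()
    print(model.params)              # estimated intercept and slopes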

Correlation Coefficient

A numerical measure that quantifies the strength and direction of the linear relationship between two variables.
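
A small Python sketch of the Pearson correlation coefficient, with made-up data:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    # Pearson r: covariance of x and y divided by the product
    # of their standard deviations
    r_manual = np.cov(x, y, ddof=1)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))
    r_builtin = np.corrcoef(x, y)[0, 1]  # same value from NumPy's correlation matrix

    print(r_manual, r_builtin)           # close to 1 for this nearly linear data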

Scatterplot

A graphical representation of the relationship between two variables, where each data point is plotted on a Cartesian plane.
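
A minimal matplotlib sketch with simulated data:

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)
    x = rng.uniform(0, 10, 50)
    y = 2 * x + rng.normal(scale=2, size=50)

    plt.scatter(x, y)  # each point is one (x, y) observation
    plt.xlabel("x (independent variable)")
    plt.ylabel("y (dependent variable)")
    plt.show()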

Residual Analysis

The examination of the differences between observed and predicted values in a regression model to assess the model's fit.
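
For example, a residuals-versus-fitted plot, which should show no obvious pattern if the model fits well (data simulated for illustration):

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(2)
    x = rng.uniform(0, 10, 60)
    y = 3 + 1.5 * x + rng.normal(size=60)

    b1, b0 = np.polyfit(x, y, deg=1)  # coefficients, highest degree first
    fitted = b0 + b1 * x
    residuals = y - fitted            # observed minus predicted

    plt.scatter(fitted, residuals)
    plt.axhline(0, color="gray")
    plt.xlabel("Fitted values")
    plt.ylabel("Residuals")
    plt.show()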

Interpretation of Regression Results

The process of analyzing the coefficients, p-values, and other statistics in a regression model to draw conclusions about the relationship between variables.

Assumptions of Regression

The conditions that must be met for regression analysis to produce valid and reliable results, including linearity, independence, and homoscedasticity.

Regression Diagnostics

The process of evaluating the assumptions and checking for potential issues in a regression model, such as multicollinearity and influential observations.

Correlation vs. Causation

The distinction between a correlation, which indicates a relationship between variables, and causation, which implies that one variable directly affects the other.

Coefficient of Determination

A measure, also known as R-squared, that represents the proportion of the variance in the dependent variable that can be explained by the independent variables in a regression model.
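
A short sketch of the calculation, assuming observed values y and model predictions y_hat are already available:

    import numpy as np

    y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])       # observed values (made up)
    y_hat = np.array([2.1, 4.0, 6.0, 7.9, 10.0])  # predictions from some fitted model

    ss_res = np.sum((y - y_hat) ** 2)             # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)          # total variation
    r_squared = 1 - ss_res / ss_tot
    print(r_squared)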

Outliers

Extreme values that deviate significantly from the other data points, potentially influencing the results of a regression analysis.

Heteroscedasticity

A violation of the assumption of homoscedasticity, where the variability of the residuals differs across the range of the independent variable.
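
One common check is the Breusch-Pagan test; a sketch assuming statsmodels, with data simulated so the error spread grows with x:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import het_breuschpagan

    rng = np.random.default_rng(3)
    x = rng.uniform(1, 10, 200)
    y = 2 + 3 * x + rng.normal(scale=x)  # residual spread increases with x

    X = sm.add_constant(x)
    results = sm.OLS(y, X).fit()

    # A small p-value suggests non-constant residual variance
    lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(results.resid, X)
    print(lm_pvalue)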

Multicollinearity

A situation where two or more independent variables in a regression model are highly correlated, leading to issues with interpretation and estimation of coefficients.

Interaction Effect

The combined effect of two or more independent variables on the dependent variable, which is not simply the sum of their individual effects.
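
A sketch using the statsmodels formula interface, where 'x1 * x2' expands to the main effects plus the interaction term x1:x2 (data simulated):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(4)
    df = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100)})
    df["y"] = 1 + 2 * df.x1 + 3 * df.x2 + 4 * df.x1 * df.x2 + rng.normal(size=100)

    model = smf.ols("y ~ x1 * x2", data=df).fit()
    print(model.params)  # includes a coefficient for the x1:x2 interaction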

Standard Error

A measure of the variability of the estimated regression coefficients, indicating the precision of the estimates.

Null Hypothesis

The hypothesis that there is no relationship between the independent and dependent variables in a regression model.

Alternative Hypothesis

The hypothesis that there is a relationship between the independent and dependent variables in a regression model.

Confidence Interval

A range of values, computed from sample data, that is constructed to contain the true population parameter with a specified level of confidence.
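
For example, confidence intervals for regression coefficients in statsmodels (simulated data):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(10)
    x = rng.normal(size=100)
    y = 1 + 2 * x + rng.normal(size=100)

    results = sm.OLS(y, sm.add_constant(x)).fit()
    print(results.conf_int(alpha=0.05))  # 95% intervals for intercept and slope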

P-value

A measure of the strength of evidence against the null hypothesis: the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true.

Type I Error

The error of rejecting the null hypothesis when it is actually true, indicating a false positive result.

Type II Error

The error of failing to reject the null hypothesis when it is actually false, indicating a false negative result.

Confounding Variable

An extraneous variable that is related to both the independent and dependent variables, leading to a spurious relationship.

Covariate

A variable that is included in a regression model to control for its potential influence on the relationship between the independent and dependent variables.

Homoscedasticity

The assumption that the variability of the residuals is constant across the range of the independent variable.

Independence

The assumption that the observations in a regression model are independent of each other, with no systematic relationship or influence.

Normality

The assumption that the residuals in a regression model are normally distributed, allowing for valid statistical inference.

Linearity

The assumption that the relationship between the independent and dependent variables in a regression model can be adequately represented by a straight line.

R-squared

A measure of the proportion of the total variation in the dependent variable that is explained by the independent variables in a regression model.

Adjusted R-squared

A modified version of R-squared that adjusts for the number of independent variables in a regression model, penalizing predictors that add little explanatory power.
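
The adjustment itself is simple arithmetic; a sketch with illustrative (made-up) values:

    n, p = 100, 3        # sample size and number of independent variables
    r_squared = 0.80     # assumed R-squared from some fitted model

    adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - p - 1)
    print(adj_r_squared) # slightly below 0.80: the penalty for extra predictors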

F-statistic

A test statistic used to assess the overall significance of a regression model, comparing the model's fit against the null hypothesis that all slope coefficients are zero.

Durbin-Watson Test

A test for the presence of autocorrelation in the residuals of a regression model, indicating whether successive residuals are systematically related.
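
A sketch assuming statsmodels (simulated data); the statistic is near 2 when residuals show little autocorrelation:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.stattools import durbin_watson

    rng = np.random.default_rng(5)
    x = rng.normal(size=100)
    y = 1 + 2 * x + rng.normal(size=100)

    results = sm.OLS(y, sm.add_constant(x)).fit()
    # Values toward 0 or 4 suggest positive or negative autocorrelation
    print(durbin_watson(results.resid))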

Variance Inflation Factor (VIF)

A measure of multicollinearity that quantifies how much the variance of the estimated regression coefficients is inflated due to high correlation between independent variables.
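
A sketch assuming statsmodels, with x2 simulated to nearly duplicate x1 so its VIF comes out large:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    rng = np.random.default_rng(6)
    x1 = rng.normal(size=200)
    x2 = x1 + rng.normal(scale=0.1, size=200)  # nearly collinear with x1
    X = sm.add_constant(np.column_stack([x1, x2]))

    # Rule of thumb: VIF above roughly 5-10 signals problematic multicollinearity
    for i in (1, 2):                           # skip column 0, the constant
        print(variance_inflation_factor(X, i))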

Standardized Residuals

The residuals of a regression model that have been transformed to have a mean of zero and a standard deviation of one, allowing for easier interpretation and comparison.
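
In the simple z-score form described above, the transformation is as follows (residual values made up; some software additionally adjusts each residual for its leverage):

    import numpy as np

    residuals = np.array([0.5, -1.2, 0.3, 2.0, -0.8, -0.6])

    standardized = (residuals - residuals.mean()) / residuals.std(ddof=1)
    print(standardized)  # values beyond roughly +/-2 or +/-3 merit a closer look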

Cook's Distance

A measure of the influence of each observation on the regression coefficients, indicating how much the coefficients would change if the observation were removed.
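
A sketch assuming statsmodels, with one observation deliberately perturbed to be influential:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import OLSInfluence

    rng = np.random.default_rng(7)
    x = rng.normal(size=50)
    y = 1 + 2 * x + rng.normal(size=50)
    y[0] += 10  # make the first observation influential

    results = sm.OLS(y, sm.add_constant(x)).fit()
    cooks_d, _ = OLSInfluence(results).cooks_distance
    print(cooks_d[:5])  # the first value stands out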

Hypothesis Testing

The process of using sample data to make inferences about the population parameters, such as testing the significance of regression coefficients.

Confounding Bias

A bias that occurs when the relationship between the independent and dependent variables is distorted by the presence of a confounding variable.

Sampling Bias

A bias that occurs when the sample used in a study is not representative of the population, leading to inaccurate or misleading results.

Cross-Validation

A technique used to assess the performance of a regression model by splitting the data into training and testing sets, allowing for evaluation of the model's predictive ability.
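
A sketch assuming scikit-learn (simulated data):

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(8)
    X = rng.normal(size=(100, 2))
    y = 1 + 2 * X[:, 0] - X[:, 1] + rng.normal(size=100)

    # 5-fold cross-validation: fit on four folds, score (R-squared)
    # on the held-out fold
    scores = cross_val_score(LinearRegression(), X, y, cv=5)
    print(scores.mean())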

Overfitting

A situation where a regression model is too complex and captures noise or random fluctuations in the data, resulting in poor generalization to new data.

Underfitting

A situation where a regression model is too simple and fails to capture the underlying patterns or relationships in the data, resulting in poor predictive performance.

Collinearity

A situation where two or more independent variables in a regression model are highly correlated, making it difficult to distinguish their individual effects on the dependent variable.

Confidence Level

The long-run proportion of confidence intervals, constructed by the same procedure, that contain the true population parameter, often expressed as a percentage (e.g., 95%).

Hypothesis

A statement or assumption about the relationship between variables, which can be tested using statistical methods.

Significance Level

The threshold used to determine whether a result is statistically significant, typically set at 0.05 or 0.01.

Power

The probability of correctly rejecting the null hypothesis when it is false, indicating the ability of a statistical test to detect a true relationship.

Sampling Distribution

The distribution of a statistic, such as the mean or regression coefficient, calculated from multiple samples drawn from the same population.

Central Limit Theorem

A fundamental concept in statistics that states that the sampling distribution of the mean approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution.
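
A quick simulation illustrating the theorem, drawing samples from a strongly skewed (exponential) population:

    import numpy as np

    rng = np.random.default_rng(9)

    # 10,000 samples of size 50; take the mean of each sample
    sample_means = rng.exponential(scale=1.0, size=(10_000, 50)).mean(axis=1)

    # The means cluster near the population mean (1.0), with spread close to
    # sigma / sqrt(n) = 1 / sqrt(50), despite the skewed population
    print(sample_means.mean(), sample_means.std(ddof=1), 1 / np.sqrt(50))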