# What is regression analysis? Definition and meaning

In statistical modeling, regression analysis is a way of mathematically sorting out which variables have an impact on an outcome and how they relate to one another. In other words, regression analysis helps you determine which factors matter most and which can be ignored. It also helps you determine which factors interact with each other and, perhaps most importantly, how certain you can be about each of the factors you are examining.

Regression analysis is a statistical technique used in investing, finance, sales, marketing, science, mathematics, and a number of other disciplines. It tries to determine how strongly one *dependent variable* is related to a series of other changing variables, usually referred to as *independent variables*.

The *dependent variable* is the one that you focus on – you want to know whether it is being affected, by how much, and by what. *Independent variables* are the factors that may or may not affect the dependent variable (the *dependent* variable receives the impact, while the *independent* variables provide it, or not).

Financial and investment managers say that it helps them value assets and understand how different variables are related, such as the price of commodities and the shares of companies that deal in those commodities.

*The dependent variable (e.g. sales figures) is on the y axis, and the independent variable (e.g. price) is on the x axis. We try to form a relationship between these two variables and draw a line. A positive relationship is one where both the independent and dependent variables go up and down together. You get a negative relationship when they move in opposite directions.*

**Regression analysis in sales**

Imagine you are a sales manager and you are trying to predict next month’s figures. You know that there are dozens, and maybe even hundreds, of factors that can affect the number: the time of year, rumors that a new improved model is about to be unveiled, a competitor’s promotion, and so on.

Maybe work colleagues add their own variables to the mix, by saying that when it snows the company sells more, or that sales take a nosedive about six weeks after a competitor’s promotion.

Regression analysis helps you determine which factors really matter, how those factors are related, and what their effects are on sales figures.

We call all these factors *variables*. There is a *dependent variable* – the main factor that we are trying to predict or understand. In your case as the sales manager, the dependent variable is monthly sales.

*This regression analysis chart relates to the fictitious situation described in this text in which you are a sales manager. It appears that the rumor among your colleagues that snowfall has an impact on sales figures was accurate. Each red dot represents one month’s worth of data – how many sales were made and how much it snowed that same month.*

There are also *independent variables*. These are the other factors that you believe may have an impact on the dependent variable.

For your regression analysis, you have to gather all the information on the variables. You collect all data on your monthly sales numbers for the past quarter, half year, year or three years, plus any data on the independent variables that you want to consider.

For example, if you think snow might impact sales, you will need snowfall data for the past three years. You then plot all that information on a graph.
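The snowfall example can be sketched in a few lines of Python. This is a minimal illustration, not a full analysis: the snowfall and sales numbers are invented, the variable names are hypothetical, and it assumes a single independent variable and a straight-line relationship. It uses NumPy's `polyfit` to fit the line and then computes R-squared, a common measure of how much of the variation in sales the line explains.

```python
import numpy as np

# Hypothetical data: monthly snowfall (cm) and units sold over 12 months.
# These numbers are invented purely for illustration.
snowfall_cm = np.array([0, 2, 5, 8, 12, 15, 20, 22, 25, 30, 35, 40], dtype=float)
units_sold = np.array([110, 118, 121, 135, 140, 152, 160, 171, 175, 190, 204, 215], dtype=float)

# Fit a straight line: units_sold ≈ slope * snowfall_cm + intercept.
# polyfit returns coefficients from the highest degree down.
slope, intercept = np.polyfit(snowfall_cm, units_sold, deg=1)

# R-squared: the share of the variation in sales explained by the line.
predicted = slope * snowfall_cm + intercept
ss_res = np.sum((units_sold - predicted) ** 2)
ss_tot = np.sum((units_sold - units_sold.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r_squared:.3f}")
```

A positive slope here would correspond to the positive relationship described earlier (more snow, more sales). With real data you would normally include several independent variables and check how uncertain each estimate is before drawing conclusions.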

In an article published in the *Harvard Business Review* in November 2015, *A Refresher on Regression Analysis*, Amy Gallo wrote:

“Most companies use regression analysis to explain a phenomenon they want to understand (e.g. why did customer service calls drop last month?); predict things about the future (e.g. what will sales look like over the next six months?); or to decide what to do (e.g. should we go with this promotion or a different one?).”

*According to the Environmental Systems Research Institute (ESRI): “Regressive analyses attempt to demonstrate the degree to which one or more variables potentially promote positive or negative change in another variable.” (Image: adapted from: resources.esri.com)*

According to *BusinessDictionary.com*, regression analysis (RA) by definition is:

“Statistical approach to forecasting change in a dependent variable (sales revenue, for example) on the basis of change in one or more independent variables (population and income, for example).”

“Known also as curve fitting or line fitting because a regression analysis equation can be used in fitting a curve or line to data points, in a manner such that the differences in the distances of data points from the curve or line are minimized.”
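One common way to make "minimized" precise is ordinary least squares for a straight line. The notation below is an assumption introduced for illustration and does not come from the quoted definition: $x_i$ and $y_i$ are the observed values of the independent and dependent variable, $\beta_0$ is the intercept, and $\beta_1$ is the slope. The line is chosen to minimize the sum of squared vertical distances between the data points and the line:

$$
\min_{\beta_0,\,\beta_1} \; \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2
$$

Under this criterion the solution has a closed form, where $\bar{x}$ and $\bar{y}$ are the sample means:

$$
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
$$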

**History of regression analysis**

The earliest known form of regression was published by the French mathematician Adrien-Marie Legendre (1752-1833) in 1805 and by the German mathematician Johann Carl Friedrich Gauss (1777-1855) in 1809. Both wrote about the *method of least squares*, a standard approach in regression analysis when there are more equations than unknowns.
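To make "more equations than unknowns" concrete, here is a small sketch using NumPy's `np.linalg.lstsq`. The data values and variable names are invented for illustration. It sets up five equations of the form `a * x_i + b = y_i` in two unknowns and asks for the pair `(a, b)` that comes closest to satisfying all of them in the least-squares sense.

```python
import numpy as np

# Hypothetical overdetermined system: 5 equations, 2 unknowns (a, b),
# each of the form a * x_i + b = y_i. The values are invented.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Design matrix: one row per equation, one column per unknown.
A = np.column_stack([x, np.ones_like(x)])

# No (a, b) satisfies all five equations exactly, so least squares picks
# the pair that minimizes the sum of squared residuals ||A @ coef - y||^2.
coef, residual_ss, rank, _ = np.linalg.lstsq(A, y, rcond=None)
a, b = coef

print(f"a={a:.3f}, b={b:.3f}, rank={rank}")
```

The same coefficients would come out of a straight-line fit to the points `(x, y)`, which is why least squares is the usual computational core of a linear regression.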

Gauss and Legendre applied the method to the problem of determining, from astronomical observations, the orbits of various celestial bodies, mainly comets, around the Sun.

In 1821, Gauss published a further development of the *theory of least squares*, which included a version of what we now call the *Gauss-Markov theorem*.

Sir Francis Galton (1822-1911), a British statistician who was an expert in several scientific and political fields, coined the term *regression* in the 19th century to describe his observation that the heights of descendants of very tall ancestors tended to move downward towards a normal average – what we call *regression toward the mean*.

As far as Galton was concerned, regression had only this biological meaning: it described the phenomenon that he had discovered. However, his work was later extended to a more general statistical context by Karl Pearson (1857-1936), an influential British mathematician and biostatistician, and George Udny Yule (1871-1951), a Scottish statistician.

By the middle of the 20th century, economists were using electromechanical desk calculators for regression analysis calculations. Before 1970, it could take up to twenty-four hours to obtain the result from one regression.

Today, regression methods are still being actively researched. Over the past few decades, statisticians have developed new methods for:

– **Robust Regression and Correlated Responses:** robust regression, and regression involving responses that correlate, such as growth curves and time series.

– **More Complex Regression:** regression in which the independent variables (the predictors) or the response variables are images, curves, graphs, or other complex data items.

– **Methods that Address Data Problems:** such as Bayesian methods for regression, non-parametric regression, regression with a greater number of predictor variables than observations, regression in which the predictor variables are incorrectly measured, and causal inference with regression.

**Regressive Analysis in other languages:** Analyse régressive (French), Análisis regresivo (Spanish), Regressive Analyse (German), Análise regressiva (Portuguese), analisi regressiva (Italian), регрессионный анализ (Russian), regressiv analys (Swedish, Danish & Norwegian), regressieve analyse (Dutch), analiza regresyjna (Polish), regressiivinen analyysi (Finnish), 退行的分析 (Japanese), 回归分析 (Chinese), تحليل ارتدادي (Arabic), प्रतिगामी विश्लेषण (Hindi), analisis regresif (Malay), analisis regresi (Indonesian), umuurong analysis (Filipino), and uchambuzi regressive (Swahili).

**Video – Regression Analysis – Definition and Meaning**

In this *Statistics is Fun* video, the tutor explains what regression analysis is using simple language and easy-to-understand examples.