Correlation Analysis

In a statistical survey, variables are observed. If the observations concern a single variable, the resulting data are univariate. If they concern two or more variables, the data are multivariate; in the case of exactly two variables, the data are bivariate. Eg: Height and weight of students, price and demand of a product, etc. In these cases the data consist of paired observations of the variables x and y. If the number of such pairs is large, the data can be tabulated, and the resulting frequency distribution is called a bivariate frequency distribution.
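The tabulation described above can be sketched in a few lines of Python. This is only an illustration: the height and weight figures are made-up data, and the helper name `bivariate_table` is hypothetical.

```python
from collections import Counter

# Hypothetical paired observations: (height in cm, weight in kg) of ten students.
pairs = [(150, 45), (150, 45), (155, 50), (155, 50), (155, 55),
         (160, 55), (160, 55), (160, 60), (165, 60), (165, 60)]

# Count how often each (x, y) pair occurs.
freq = Counter(pairs)

x_values = sorted({x for x, _ in pairs})
y_values = sorted({y for _, y in pairs})

def bivariate_table(freq, x_values, y_values):
    """Return one row of frequencies per x value, one column per y value."""
    return [[freq.get((x, y), 0) for y in y_values] for x in x_values]

table = bivariate_table(freq, x_values, y_values)

# Print the table with the y values as column headers.
print("x\\y  " + "".join(f"{y:>4}" for y in y_values))
for x, row in zip(x_values, table):
    print(f"{x:<5}" + "".join(f"{f:>4}" for f in row))
```

Each cell of the printed table is the number of students having that particular combination of height and weight, and the cells add up to the total number of pairs.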

[CORRELATION]

[When data on two or more variables are available, we may study the related variation of these variables. For example, in data on the height (x) and weight (y) of the students of a college, we find that taller students tend to weigh more and shorter students tend to weigh less. This type of related variation among variables is called correlation.

Two variables are said to be correlated if they vary such that

1) The higher values of one variable correspond to the higher values of the other, and the lower values of one correspond to the lower values of the other.
or
2) The higher values of one variable correspond to the lower values of the other and vice versa.

Generally, taller persons weigh more and shorter persons weigh less. Thus, the height (x) and weight (y) of persons vary in the same direction, as in case 1. On the other hand, the production (x) and price (y) of vegetables vary in opposite directions: the higher the production, the lower the price, as in case 2. In both examples the variables x and y show related variation, so they are correlated.

Definition

“Correlation is concerned with describing the degree of relation between two variables” - Ferguson

“If two or more quantities vary in sympathy, so that movements in one tend to be accompanied by corresponding movements in the others, then they are said to be correlated” - L R Conner]

[Types of Correlation

On the basis of Direction

Positive/Direct Correlation: If the variables vary in the same direction, that is, if they increase or decrease together, the correlation is said to be positive. Eg: Height and weight of students.

Negative/Indirect Correlation: If the variables vary in opposite directions, that is, if one increases as the other decreases, the correlation is said to be negative. It is also known as inverse correlation. Eg: Demand and price of a product.

Zero Correlation: If the variables do not show related variation, they are said to be non-correlated. Eg: Weight and color of a person.]

On the basis of Number of Sets

Simple Correlation: The relationship between only two variables is studied.

Multiple Correlation: The relationship is studied among three or more variables.

Multiple Correlation may be Partial or Total correlation.

Partial: The relationship among three or more variables is studied in such a way that only one dependent variable and one independent variable are considered, while the other variables are kept constant.
Total: The relationship among all the variables is studied together.
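As an illustration of keeping another variable constant, the sketch below uses the standard first-order partial correlation formula, which is not stated in this note and is brought in here as an assumption: r12.3 = (r12 - r13·r23) / √[(1 - r13²)(1 - r23²)]. The function name `partial_r` and the three simple correlation values are hypothetical.

```python
from math import sqrt

def partial_r(r12, r13, r23):
    """First-order partial correlation between variables 1 and 2,
    with variable 3 held constant (standard textbook formula)."""
    return (r12 - r13 * r23) / sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

# Hypothetical simple correlations among three variables,
# e.g. 1 = crop yield, 2 = fertiliser used, 3 = rainfall.
r12, r13, r23 = 0.8, 0.6, 0.5

r12_3 = partial_r(r12, r13, r23)
print(f"r12 = {r12}, r12.3 = {r12_3:.4f}")
```

If variable 3 is unrelated to both of the others (r13 = r23 = 0), the partial coefficient reduces to the simple coefficient r12.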

On the basis of Change

Linear Correlation: When the amount of change in one variable tends to keep a constant ratio to the amount of change in the other variable, the correlation is said to be linear.

Non-Linear Correlation: When the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable, the correlation is said to be non-linear.
This distinction is based upon the consistency of the ratio of change between the variables.
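The constant-ratio idea can be checked numerically. The following is a minimal sketch with made-up data in which the relationships are exact (real data would only tend towards a constant ratio); the helper names `change_ratios` and `is_linear` are hypothetical, and the x values are assumed to be distinct.

```python
def change_ratios(x, y):
    """Ratio of the change in y to the change in x between consecutive observations."""
    return [(y[i + 1] - y[i]) / (x[i + 1] - x[i]) for i in range(len(x) - 1)]

def is_linear(x, y, tol=1e-9):
    """True when every ratio of change equals the first one (within a tolerance)."""
    ratios = change_ratios(x, y)
    return all(abs(r - ratios[0]) <= tol for r in ratios)

x = [1, 2, 3, 4, 5]
y_linear = [2 * v + 1 for v in x]   # y = 2x + 1
y_curved = [v ** 2 for v in x]      # y = x squared

print(change_ratios(x, y_linear), is_linear(x, y_linear))
print(change_ratios(x, y_curved), is_linear(x, y_curved))
```

For the first series the ratio of change stays the same throughout, so the relation is linear; for the second the ratio keeps growing, so the relation is non-linear.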


Correlation & Causation

A ‘cause and effect’ relationship between two variables is called causation. Eg: The production and price of electronic goods have a cause and effect relationship, because an increase in production causes a decrease in price. However, variables may show correlation even in the absence of causation. Such correlation is called nonsense correlation or spurious correlation. Eg: The populations of two countries. Correlation without causation may arise for reasons such as: pure chance; the correlated variables both being influenced by one or more other variables; or the variables mutually influencing each other, so that neither can be called the cause of the other.
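The population example can be illustrated with a small sketch. The figures below are invented for illustration only, and `pearson_r` is a hypothetical helper that computes the product moment coefficient of correlation from deviations about the means.

```python
from math import sqrt

def pearson_r(x, y):
    """Product moment coefficient of correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical populations (in millions) of two unrelated countries
# recorded in the same six years. Both simply grow with time.
country_a = [50, 52, 55, 57, 60, 63]
country_b = [120, 123, 128, 131, 137, 141]

r = pearson_r(country_a, country_b)
print(f"r = {r:.4f}")
```

The coefficient comes out very close to +1, yet neither population causes the other. Both series are influenced by a third factor, the passage of time.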

Measurement of Correlation

1. Scatter diagram or dot diagram method.
2. Karl Pearson’s or Product moment coefficient of correlation method.
3. Spearman’s coefficient of rank correlation method.

Scatter diagram or dot diagram method

The graphical presentation of bivariate data is called a scatter diagram or dot diagram. The two variables are taken along the two axes, and every pair of values in the data is represented by a point on the graph.

Figure: Scatter diagrams of correlation (Deekshith Kumar)

If the points lie on a line with positive slope, the variables are perfectly positively correlated (r = +1).
If the points lie on a line with negative slope, the variables are perfectly negatively correlated (r = -1).
If the points cluster around a line with positive slope, the variables are positively correlated (r > 0).
If the points cluster around a line with negative slope, the variables are negatively correlated (r < 0).
Any curved pattern in the spread of points indicates a curvilinear relation between the variables.
If the points are spread all over the graph with no pattern, the variables are non-correlated (r = 0).
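A rough scatter diagram can even be drawn as text. The sketch below is only an illustration: the data are made up, the function name `scatter_text` is hypothetical, and it assumes that neither variable is constant. In practice a plotting library such as matplotlib would normally be used instead.

```python
def scatter_text(x, y, width=20, height=10):
    """Return text rows that plot the (x, y) points as '*' on a character grid."""
    x_min, x_max = min(x), max(x)
    y_min, y_max = min(y), max(y)
    grid = [[" "] * width for _ in range(height)]
    for a, b in zip(x, y):
        col = round((a - x_min) / (x_max - x_min) * (width - 1))
        row = round((b - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"   # row 0 is the top of the plot
    return ["".join(r) for r in grid]

# Hypothetical heights (cm) and weights (kg) of eight students.
heights = [150, 153, 156, 160, 163, 167, 170, 174]
weights = [48, 50, 53, 55, 60, 62, 66, 70]

rows = scatter_text(heights, weights)
for line in rows:
    print("|" + line)          # vertical axis: weight
print("+" + "-" * 20)          # horizontal axis: height
```

For this data the stars rise from the bottom left to the top right, which is the pattern of positive correlation described above.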

[Karl Pearson’s coefficient of correlation

Karl Pearson (1857-1936) was an English mathematician and biostatistician. The coefficient of correlation named after him is a measure of the linear relationship between two variables and indicates the degree of correlation between them. The coefficient of correlation between x and y is denoted by ‘rxy’ or ‘r’.

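Coefficient of correlation between two variables x and y is

$$ r = \frac{\operatorname{Cov}(x, y)}{\sigma_x \, \sigma_y} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}} $$

Simplified formula

$$ r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left[ n \sum x^2 - \left( \sum x \right)^2 \right] \left[ n \sum y^2 - \left( \sum y \right)^2 \right]}} $$

where n is the number of pairs of observations.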

Interpretation of coefficient of correlation

  • A positive value of ‘r’ indicates positive correlation.
  • A negative value of ‘r’ indicates negative correlation.
  • r = +1 means the correlation is perfect positive.
  • r = -1 means the correlation is perfect negative.
  • r = 0 (or close to 0) means the variables are non-correlated.
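The calculation and interpretation of r can be sketched as follows. The code assumes the standard product moment definition, in one form using deviations from the means and in one form using raw sums. The price and demand figures are made up, and the function names are hypothetical. How low a value must be to count as "close to 0" is a matter of judgement, so `interpret` only treats values that are numerically zero as non-correlated.

```python
from math import sqrt, isclose

def pearson_r(x, y):
    """r from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def pearson_r_simplified(x, y):
    """r from raw sums only (no means needed)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    return (n * sum_xy - sum_x * sum_y) / sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))

def interpret(r, tol=1e-9):
    """Map a value of r to the verbal interpretation listed above."""
    if isclose(r, 1.0, abs_tol=tol):
        return "perfect positive"
    if isclose(r, -1.0, abs_tol=tol):
        return "perfect negative"
    if isclose(r, 0.0, abs_tol=tol):
        return "non-correlated"
    return "positive" if r > 0 else "negative"

# Hypothetical price (x) and demand (y) of a product.
price = [10, 12, 14, 16, 18]
demand = [40, 38, 35, 30, 27]

r1 = pearson_r(price, demand)
r2 = pearson_r_simplified(price, demand)
print(f"r = {r1:.4f} (deviations), r = {r2:.4f} (raw sums): {interpret(r1)}")
```

Both forms give the same value, as they should, and the negative sign agrees with the inverse relation between price and demand.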

Properties of coefficient of correlation

  • The coefficient of correlation is independent of the units of measurement of the variables.
  • The coefficient of correlation is independent of the origin and scale of measurement of the variables.
  • The coefficient of correlation always lies between -1 and +1. ]
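These properties can be verified numerically. The sketch below uses made-up data and a hypothetical helper `pearson_r`. It shifts the origin and changes the scale of both variables using positive scale factors (a negative scale factor on one variable would reverse the sign of r), and then compares the two coefficients.

```python
from math import sqrt, isclose

def pearson_r(x, y):
    """Product moment coefficient of correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired observations.
x = [2, 4, 6, 8, 10]
y = [1, 3, 4, 7, 9]
r_original = pearson_r(x, y)

# Change of origin and scale: u = (x - 6) / 2 and v = (y - 4) * 10.
u = [(a - 6) / 2 for a in x]
v = [(b - 4) * 10 for b in y]
r_transformed = pearson_r(u, v)

print(isclose(r_original, r_transformed, rel_tol=1e-9))  # same coefficient
print(-1 <= r_original <= 1)                             # always within the limits
```

A change of units, such as measuring height in metres instead of centimetres, is simply a change of scale, which is why r does not depend on the units used.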




Deekshith Kumar,
Assistant Professor of Commerce




