Correlation Analysis
In a statistical survey, variables are observed. If the observations concern a single variable, the resulting data is univariate data. If they concern two or more variables, the resulting data is multivariate data; in the case of exactly two variables, it is bivariate data. Eg: Height and Weight of students, Price and Demand of a product, etc. In these examples, the data consists of paired observations of variables x and y. If the number of such pairs is large, the data can be tabulated. The resulting frequency distribution is called a bivariate frequency distribution.
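As a rough illustration of how paired observations can be tabulated, here is a minimal Python sketch. The height and weight figures are made up for the example, and it counts individual (x, y) values rather than class intervals to keep things short.

```python
from collections import Counter

# Hypothetical paired observations: (height in cm, weight in kg) for 8 students.
pairs = [(150, 45), (150, 45), (155, 50), (160, 50),
         (160, 55), (160, 55), (165, 60), (165, 60)]

# Count how often each (x, y) pair occurs.
frequency = Counter(pairs)

def print_bivariate_table(freq):
    """Print a bivariate frequency table: x values as rows, y values as columns."""
    xs = sorted({x for x, _ in freq})
    ys = sorted({y for _, y in freq})
    print("x\\y".ljust(6) + "".join(str(y).rjust(5) for y in ys))
    for x in xs:
        cells = "".join(str(freq.get((x, y), 0)).rjust(5) for y in ys)
        print(str(x).ljust(6) + cells)

print_bivariate_table(frequency)
```

Each cell of the printed table is the number of students having that particular combination of height and weight.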
[CORRELATION]
[When data regarding two or more variables are available, we may study the related variation of these variables. For example, in data regarding the height (x) and weight (y) of students of a college, we find that students with greater height tend to have greater weight, and students with lesser height tend to have lesser weight. This type of related variation among variables is called correlation.
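Two variables are said to be correlated if they vary such that higher values of one variable correspond either to higher values of the other or to lower values of the other.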
Generally, those who are tall tend to have greater weight, and those who are short tend to have lesser weight. Thus, the height (x) and weight (y) of persons show related variation, and so they are correlated. On the other hand, the production (x) and price (y) of vegetables vary in opposite directions: the higher the production, the lower the price. In both examples, the variables x and y show related variation, and so they are correlated.
Definition
“Correlation is concerned with describing the degree of relation between two variables” - Ferguson
“If two or more quantities vary in sympathy so that movements in one tend to be accompanied by corresponding movements in the others, then they are said to be correlated” - L R Connor]
[Types of Correlation
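Positive/Direct Correlation: If variables vary in the same direction, that is, if they increase and decrease together, the correlation is said to be positive. Eg: Height and Weight of students.
Negative/Inverse Correlation: If variables vary in opposite directions, that is, if one increases as the other decreases, the correlation is said to be negative. Eg: Production and Price of vegetables.
Zero Correlation: If variables do not show related variation, they are said to be non-correlated. Eg: Weight and Color of a person.]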
On the basis of Number of Sets
Simple Correlation: The relationship between only two variables is studied.
Multiple Correlation: The relationship is studied among three or more variables.
Multiple Correlation may be Partial or Total correlation.
Partial: The relationship between two variables, one dependent and one independent, is studied while the other variables are kept constant.
Total: The relationship among all the variables is studied.
On the basis of Change
Linear Correlation: When the amount of change in one variable tends to keep a constant ratio to the amount of change in the other variable, the correlation is said to be linear.
Non-Linear Correlation: When the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable, the correlation is said to be non-linear.
This distinction is based upon the consistency of the ratio of change between the variables.
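The "constant ratio of change" idea can be checked numerically. The sketch below uses made-up data and hypothetical helper names (`change_ratios`, `is_linear`); it compares the ratio of change between consecutive points for an exactly linear series and a curved one.

```python
def change_ratios(xs, ys):
    """Return (change in y) / (change in x) for each pair of consecutive points."""
    points = list(zip(xs, ys))
    return [(y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in zip(points, points[1:])]

def is_linear(xs, ys, tol=1e-9):
    """True when every ratio of change equals the first one (within tol)."""
    ratios = change_ratios(xs, ys)
    return all(abs(r - ratios[0]) <= tol for r in ratios)

x = [1, 2, 3, 4, 5]
y_linear = [3, 5, 7, 9, 11]    # y = 2x + 1, ratio of change is always 2
y_curved = [1, 4, 9, 16, 25]   # y = x squared, ratio of change keeps growing

print(change_ratios(x, y_linear))  # [2.0, 2.0, 2.0, 2.0]
print(change_ratios(x, y_curved))  # [3.0, 5.0, 7.0, 9.0]
print(is_linear(x, y_linear), is_linear(x, y_curved))  # True False
```

Real data rarely keeps an exactly constant ratio, so in practice the ratios would only be roughly equal for a linear relationship; the strict tolerance here is a simplification.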
A ‘cause and effect’ relationship between two variables is called causation. Eg: Production and price of electronic goods have a cause and effect relationship, because an increase in production causes a decrease in price. But in some cases, variables may show correlation even in the absence of causation. Such correlation is called nonsense correlation or spurious correlation. Eg: Population of two countries. It may arise for reasons such as: pure chance; the correlated variables both being influenced by one or more other variables; or the variables mutually influencing each other so that neither can be called the cause of the other.
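To see how two causally unrelated series can still be strongly correlated, consider this small sketch. The population figures are invented for illustration, and `pearson_r` is a helper written for this example using the usual deviation-from-mean formula.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's coefficient of correlation from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical populations (in millions) of two countries over six census years.
# Neither population causes the other; both simply grow with time.
population_a = [10.0, 11.2, 12.5, 13.9, 15.6, 17.4]
population_b = [50.0, 53.1, 57.0, 60.2, 64.5, 68.0]

r = pearson_r(population_a, population_b)
print(round(r, 3))  # very close to +1
```

The coefficient comes out close to +1 only because both series rise over time, which is the "influenced by another variable" case mentioned above.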
Measurement of Correlation
Scatter diagram or dot diagram method.
The graphical presentation of bivariate data is called a scatter diagram or dot diagram. The two variables are taken along the two axes, and every pair of values in the data is represented by a point on the graph.
Scatter diagrams of correlation
If the points lie on a straight line with negative slope, the variables are perfectly negatively correlated (r = -1).
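A scatter diagram can be sketched even without a plotting library. The following is a minimal text-based version, assuming made-up production and price figures that fall exactly on a line with negative slope; `ascii_scatter` is a hypothetical helper, not a standard function.

```python
def ascii_scatter(xs, ys, width=20, height=10):
    """Return a text scatter diagram: x along the horizontal axis, y along the vertical.

    Assumes xs and ys each contain at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width

# Hypothetical data: price falls steadily as production rises.
production = [1, 2, 3, 4, 5]
price = [10, 8, 6, 4, 2]

plot = ascii_scatter(production, price)
print(plot)
```

The points run from the top left to the bottom right, which is the pattern of perfect negative correlation.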
[Karl Pearson’s coefficient of correlation
Karl Pearson (1857-1936) was an English mathematician and biostatistician. His coefficient of correlation is a measure of the linear relationship between two variables. It indicates the degree of correlation between them. The coefficient of correlation between x and y is denoted by ‘rxy’ or ‘r’.
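The coefficient of correlation between two variables x and y is

$$r = \frac{\mathrm{Cov}(x, y)}{\sigma_x \, \sigma_y} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}$$

Simplified formula

$$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left[ n \sum x^2 - \left( \sum x \right)^2 \right] \left[ n \sum y^2 - \left( \sum y \right)^2 \right]}}$$

where n is the number of paired observations.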
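As a worked illustration, the sketch below computes r in two ways: from deviations about the means, and from the raw sums of x, y, xy, x squared and y squared. Both are the standard textbook forms of Pearson's coefficient; the function names and the height and weight data are assumptions made for this example.

```python
from math import sqrt

def pearson_r_deviation(xs, ys):
    """r = sum((x - mean_x)(y - mean_y)) / sqrt(sum((x - mean_x)^2) * sum((y - mean_y)^2))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def pearson_r_raw_sums(xs, ys):
    """r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2)), using raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical heights (cm) and weights (kg) of five students.
heights = [150, 155, 160, 165, 170]
weights = [48, 52, 55, 61, 64]

r_dev = pearson_r_deviation(heights, weights)
r_raw = pearson_r_raw_sums(heights, weights)
print(round(r_dev, 4), round(r_raw, 4))  # the two values agree
```

The raw-sum form avoids computing the means first, which is why it is convenient for hand calculation.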
Interpretation of coefficient of correlation
- A positive value of ‘r’ indicates positive correlation.
- A negative value of ‘r’ indicates negative correlation.
- r = +1 means the correlation is perfect positive.
- r = -1 means the correlation is perfect negative.
- r = 0 (or close to 0) means the variables show no linear correlation.
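The interpretation rules above can be written as a small function. This is only a sketch: `describe_correlation` is a hypothetical name, and because the text gives no numeric cut-off for a "low" value, only r = 0 itself is labelled as no correlation here.

```python
def describe_correlation(r, tol=1e-12):
    """Describe a coefficient of correlation r in words."""
    if not (-1 - tol <= r <= 1 + tol):
        raise ValueError("r must lie between -1 and +1")
    if abs(r - 1) <= tol:
        return "perfect positive"
    if abs(r + 1) <= tol:
        return "perfect negative"
    if abs(r) <= tol:
        return "no linear correlation"
    return "positive" if r > 0 else "negative"

for value in (1.0, 0.6, 0.0, -0.3, -1.0):
    print(value, "->", describe_correlation(value))
```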
Properties of coefficient of correlation
- The coefficient of correlation is independent of the units of measurement of the variables.
- The coefficient of correlation is independent of the origin and scale of measurement of the variables.
- The coefficient of correlation is a value between -1 and +1. ]
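These properties can be verified numerically. In the sketch below, made-up heights and weights are shifted and rescaled (with positive scale factors, which is an assumption of this example) and converted to other units; `pearson_r` is a helper written for the illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's coefficient of correlation from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical heights (cm) and weights (kg) of five students.
heights_cm = [150, 155, 160, 165, 170]
weights_kg = [48, 52, 55, 61, 64]

# Change of origin and scale: u = (x - 150) / 5, v = (y - 50) / 2.
u = [(x - 150) / 5 for x in heights_cm]
v = [(y - 50) / 2 for y in weights_kg]

# Change of units: centimetres to inches, kilograms to pounds.
heights_in = [x / 2.54 for x in heights_cm]
weights_lb = [y * 2.20462 for y in weights_kg]

r_original = pearson_r(heights_cm, weights_kg)
r_shifted = pearson_r(u, v)
r_units = pearson_r(heights_in, weights_lb)

print(abs(r_original - r_shifted) < 1e-9)  # True: origin and scale do not matter
print(abs(r_original - r_units) < 1e-9)    # True: units do not matter
print(-1 <= r_original <= 1)               # True: r lies between -1 and +1
```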
Redrafted for Educational Purpose.
Book Reference:
1. Business Statistics by Raj Mohan
2. Business Statistics by S P Gupta and M P Gupta
3. Elementary Statistical methods by S P Gupta