How do you manually calculate correlation coefficient?

There are many questions to ask when looking at a scatterplot. One of the most common is wondering how well a straight line approximates the data. To help answer this, there is a descriptive statistic called the correlation coefficient. We will see how to calculate this statistic.

The Correlation Coefficient

The correlation coefficient, denoted by r, tells us how closely data in a scatterplot fall along a straight line. The closer the absolute value of r is to one, the better the data are described by a linear equation. If r = 1 or r = -1, the data points lie exactly on a straight line. Data sets with values of r close to zero show little to no straight-line relationship.

Due to the lengthy calculations, it is best to calculate r with the use of a calculator or statistical software. However, it is always a worthwhile endeavor to know what your calculator is doing when it is calculating. What follows is a process for calculating the correlation coefficient mainly by hand, with a calculator used for the routine arithmetic steps.

Steps for Calculating r

We will begin by listing the steps to the calculation of the correlation coefficient. The data we are working with are paired data, each pair of which will be denoted by (xi,yi).

  1. We begin with a few preliminary calculations. The quantities from these calculations will be used in subsequent steps of our calculation of r:
    1. Calculate x̄, the mean of all of the first coordinates of the data xi.
    2. Calculate ȳ, the mean of all of the second coordinates of the data yi.
    3. Calculate sx, the sample standard deviation of all of the first coordinates of the data xi.
    4. Calculate sy, the sample standard deviation of all of the second coordinates of the data yi.
  2. Use the formula (zx)i = (xi – x̄) / sx and calculate a standardized value for each xi.
  3. Use the formula (zy)i = (yi – ȳ) / sy and calculate a standardized value for each yi.
  4. Multiply corresponding standardized values: (zx)i(zy)i
  5. Add the products from the last step together.
  6. Divide the sum from the previous step by n – 1, where n is the total number of points in our set of paired data. The result of all of this is the correlation coefficient r.

This process is not hard, and each step is fairly routine, but the collection of all of these steps is quite involved. The calculation of the standard deviation is tedious enough on its own. But the calculation of the correlation coefficient involves not only two standard deviations, but a multitude of other operations.
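The steps listed above can be sketched in a few lines of Python. This is only an illustrative sketch, not the routine any particular calculator uses; the function name `pearson_r_zscores` and its argument names are made up for this example.

```python
import math

def pearson_r_zscores(xs, ys):
    """Correlation coefficient via standardized values (hypothetical helper)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")

    # Step 1: means and sample standard deviations (divide by n - 1).
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))

    # Steps 2 and 3: standardized values for each coordinate.
    z_x = [(x - x_bar) / s_x for x in xs]
    z_y = [(y - y_bar) / s_y for y in ys]

    # Steps 4 to 6: multiply, add, and divide by n - 1.
    return sum(a * b for a, b in zip(z_x, z_y)) / (n - 1)

# Points lying exactly on the line y = 2x should give r very close to 1.
r_line = pearson_r_zscores([1, 2, 3, 4], [2, 4, 6, 8])
print(r_line)
```

The sketch raises an error when a standard deviation cannot be computed from fewer than two points; it does not guard against a standard deviation of zero, which would cause a division by zero.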

An Example

To see exactly how the value of r is obtained we look at an example. Again, it is important to note that for practical applications we would want to use our calculator or statistical software to calculate r for us.

We begin with a listing of paired data: (1, 1), (2, 3), (4, 5), (5, 7). The mean of the x values 1, 2, 4, and 5 is x̄ = 3, and the mean of the y values is ȳ = 4. The sample standard deviation of the x values is sx = 1.83, and that of the y values is sy = 2.58. The table below summarizes the other calculations needed for r. The sum of the products in the rightmost column is 2.969848. Since there are a total of four points and 4 – 1 = 3, we divide the sum of the products by 3. This gives us a correlation coefficient of r = 2.969848/3 = 0.989949.

Table for Example of Calculation of Correlation Coefficient

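| x | y | zx | zy | zx·zy |
|---|---|----|----|-------|
| 1 | 1 | -1.09544503 | -1.161895958 | 1.272792057 |
| 2 | 3 | -0.547722515 | -0.387298319 | 0.212132009 |
| 4 | 5 | 0.547722515 | 0.387298319 | 0.212132009 |
| 5 | 7 | 1.09544503 | 1.161895958 | 1.272792057 |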
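As a check on the hand calculation, the rows of the table can be regenerated with a short script. This is a sketch that uses Python's standard `statistics` module; the variable names are chosen here for illustration only.

```python
import statistics

xs = [1, 2, 4, 5]
ys = [1, 3, 5, 7]
n = len(xs)

x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)  # sample standard deviations

# One row per data point: x, y, zx, zy, and the product zx * zy.
rows = []
for x, y in zip(xs, ys):
    z_x = (x - x_bar) / s_x
    z_y = (y - y_bar) / s_y
    rows.append((x, y, z_x, z_y, z_x * z_y))

for row in rows:
    print("{:>2} {:>2} {:>10.6f} {:>10.6f} {:>10.6f}".format(*row))

total = sum(row[4] for row in rows)
r = total / (n - 1)
print(f"sum of products = {total:.6f}, r = {r:.6f}")
```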

Taylor, Courtney. "Calculating the Correlation Coefficient." ThoughtCo. https://www.thoughtco.com/how-to-calculate-the-correlation-coefficient-3126228 (accessed January 2, 2023).

The Pearson correlation coefficient, also known as Pearson's r, measures the strength and direction of the linear relationship between two variables. Whenever a statistical test is conducted between two variables, it is a good idea for the analyst to calculate the correlation coefficient to know how strong the relationship between them is.

Pearson's correlation coefficient returns a value between -1 and 1. The interpretation of the correlation coefficient is as follows:

  • If the correlation coefficient is -1, it indicates a perfect negative linear relationship between the variables.
  • If the correlation coefficient is 0, it indicates no linear relationship.
  • If the correlation coefficient is 1, it indicates a perfect positive linear relationship between the variables.

A higher absolute value of the correlation coefficient indicates a stronger relationship between variables. Thus, a correlation coefficient of 0.78 indicates a stronger positive correlation than a value of 0.36. Similarly, a correlation coefficient of -0.87 indicates a stronger negative correlation than a correlation coefficient of -0.40.

In other words, if the value is positive, the variables are positively correlated and tend to increase or decrease together. If the value is negative, the variables are negatively correlated and tend to move in opposite directions.

Pearson Correlation Coefficient Formula

Pearson's correlation coefficient formula is as follows:

r = (n∑xy - ∑x∑y) / √([n∑x² - (∑x)²] * [n∑y² - (∑y)²])

Where,

  • r = Pearson coefficient
  • n = number of pairs of scores
  • ∑xy = sum of the products of the paired scores
  • ∑x = sum of the x scores
  • ∑y = sum of the y scores
  • ∑x² = sum of the squared x scores
  • ∑y² = sum of the squared y scores

Explanation

The steps to calculate Pearson correlation coefficient are as follows.

  1. Find the number of pairs of values, denoted by n. Suppose x consists of three values: 6, 8, and 10. Suppose y consists of the corresponding three values: 12, 10, and 20. Here n = 3.
  2. List the values in two columns.
  3. Find the product of x and y for each pair in the 3rd column: 72, 80, and 200.
  4. Find the sums of the x and y values and write the results at the bottom of the 1st and 2nd columns: ∑x = 24 and ∑y = 42. Then write the sum of x*y at the bottom of the 3rd column: ∑xy = 352.
  5. Find x² and y² in the 4th and 5th columns and their sums at the bottom of the columns: ∑x² = 200 and ∑y² = 644.
  6. Insert the values found above in the formula and solve it.

    r = (3*352 - 24*42) / √((3*200 - 24^2) * (3*644 - 42^2))
    = 48 / √4032
    = 0.7559
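The six steps above translate fairly directly into code. The following is a minimal sketch, assuming plain Python lists as input; `pearson_r_sums` is a name invented for this example.

```python
import math

def pearson_r_sums(xs, ys):
    """Pearson's r from the raw sums (hypothetical helper)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # 3rd column total
    sum_x2 = sum(x * x for x in xs)               # 4th column total
    sum_y2 = sum(y * y for y in ys)               # 5th column total

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = pearson_r_sums([6, 8, 10], [12, 10, 20])
print(round(r, 4))  # 0.7559
```

Working from the column totals mirrors the hand calculation. With integer inputs the numerator and the two bracketed terms are computed exactly before the square root is taken.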

Example of Pearson Correlation Coefficient R

Example #1

The table below gives the ages and weights of six people. Use these details to calculate the value of Pearson's r.

| Sr No | Age (x) | Weight (y) |
|-------|---------|------------|
| 1 | 40 | 78 |
| 2 | 21 | 70 |
| 3 | 25 | 60 |
| 4 | 31 | 55 |
| 5 | 38 | 80 |
| 6 | 47 | 66 |

Solution:

For the calculation of the Pearson correlation coefficient, we will first calculate the following values:

∑x = 202, ∑y = 409, ∑xy = 13937, ∑x² = 7280, ∑y² = 28365

Here the total number of people is 6, so n = 6.

Now the calculation of the Pearson R is as follows,

  • r = (n∑xy - ∑x∑y) / √([n∑x² - (∑x)²] * [n∑y² - (∑y)²])
  • r = (6 * 13937 - 202 * 409) / √([6 * 7280 - 202²] * [6 * 28365 - 409²])
  • r = (83622 - 82618) / √([43680 - 40804] * [170190 - 167281])
  • r = 1004 / √(2876 * 2909)
  • r = 1004 / √8366284
  • r = 1004 / 2892.452938
  • r = 0.35

Thus the value of the Pearson correlation coefficient is 0.35
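The intermediate sums in this example are easy to get wrong by hand, so it can help to recompute them. The script below is a sketch for checking the arithmetic; the variable names are illustrative.

```python
import math

ages = [40, 21, 25, 31, 38, 47]      # x
weights = [78, 70, 60, 55, 80, 66]   # y
n = len(ages)

sum_x = sum(ages)
sum_y = sum(weights)
sum_xy = sum(x * y for x, y in zip(ages, weights))
sum_x2 = sum(x * x for x in ages)
sum_y2 = sum(y * y for y in weights)

numerator = n * sum_xy - sum_x * sum_y   # 83622 - 82618
term_x = n * sum_x2 - sum_x ** 2         # 43680 - 40804
term_y = n * sum_y2 - sum_y ** 2         # 170190 - 167281

r = numerator / math.sqrt(term_x * term_y)
print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)
print(numerator, term_x, term_y, round(r, 2))
```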

Example #2

There are 2 stocks – A and B. Their share prices on particular days are as follows:

| Stock A (x) | Stock B (y) |
|-------------|-------------|
| 45 | 9 |
| 50 | 8 |
| 53 | 8 |
| 58 | 7 |
| 60 | 5 |

Find out the Pearson correlation coefficient from the above data.

Solution:

First, we will calculate the following values:

n = 5, ∑x = 266, ∑y = 37, ∑xy = 1935, ∑x² = 14298, ∑y² = 283

The calculation of the Pearson coefficient is as follows,

  • r = (5*1935 - 266*37) / ((5*14298 - 266^2) * (5*283 - 37^2))^0.5
  • r = -0.9088

Therefore the Pearson correlation coefficient between the two stocks is -0.9088.
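The raw-sums formula used here and the standardized-value (z-score) method are two routes to the same number. The sketch below runs both on the stock prices as a cross-check; it is an illustration with made-up variable names, not a recommended production routine.

```python
import math
import statistics

stock_a = [45, 50, 53, 58, 60]  # x
stock_b = [9, 8, 8, 7, 5]       # y
n = len(stock_a)

# Method 1: raw-sums formula.
sum_x, sum_y = sum(stock_a), sum(stock_b)
sum_xy = sum(a * b for a, b in zip(stock_a, stock_b))
sum_x2 = sum(a * a for a in stock_a)
sum_y2 = sum(b * b for b in stock_b)
r_sums = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

# Method 2: average product of standardized values, dividing by n - 1.
mean_a, mean_b = statistics.mean(stock_a), statistics.mean(stock_b)
sd_a, sd_b = statistics.stdev(stock_a), statistics.stdev(stock_b)
r_z = sum(
    ((a - mean_a) / sd_a) * ((b - mean_b) / sd_b)
    for a, b in zip(stock_a, stock_b)
) / (n - 1)

print(round(r_sums, 4), round(r_z, 4))
```

Both methods should agree up to floating-point rounding, because the factors of n - 1 in the covariance and in the two standard deviations cancel.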

Advantages

  • It helps in knowing how strong the relationship between the two variables is. The Pearson correlation coefficient indicates the presence or absence of correlation between the two variables, and it also quantifies the extent to which those variables are correlated.
  • Using this method, one can ascertain the direction of correlation, i.e., whether the correlation between two variables is negative or positive.

Disadvantages

  • The Pearson correlation coefficient cannot distinguish between dependent and independent variables, because the coefficient is symmetric. For example, a researcher studying high stress and blood pressure might find a high correlation and be tempted to read it as showing that high stress causes high blood pressure. If the variables are switched around, the coefficient is exactly the same, so it would equally well support the reading that high blood pressure causes stress. The coefficient alone cannot settle the direction of the relationship, so the researcher should be aware of the data being used for the analysis.
  • This method gives no information about the slope of the line; it only indicates the strength and direction of the linear relationship between the two variables.
  • The Pearson correlation coefficient can easily be misinterpreted, especially in the case of homogeneous data.
  • Compared with other calculation methods, this method takes considerable time to arrive at the results when done by hand.
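The point about slope can be illustrated with two made-up data sets that lie on lines of very different steepness. In this sketch the helper `pearson_r` and the data are assumptions introduced for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from raw sums (hypothetical helper)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    return (n * sum_xy - sum_x * sum_y) / math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )

xs = [1, 2, 3, 4, 5]
gentle = [2 * x for x in xs]    # points on y = 2x
steep = [10 * x for x in xs]    # points on y = 10x

r_gentle = pearson_r(xs, gentle)
r_steep = pearson_r(xs, steep)
print(r_gentle, r_steep)
```

Both data sets are perfectly linear, so both coefficients come out at 1 (up to rounding) even though one line is five times steeper than the other.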

Important Points

  • The values can range from +1 to -1, where +1 indicates a perfect positive linear relationship between the variables considered, -1 indicates a perfect negative linear relationship, and 0 indicates that no linear relationship exists between them.
  • It is independent of the units of measurement of the variables. For example, if one variable is measured in years and the other in kilograms, converting either variable to different units does not change the value of the coefficient.
  • The correlation coefficient between the variables is symmetric, which means that the value of the correlation coefficient between Y and X or X and Y will remain the same.
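The unit-independence and symmetry properties can be checked numerically. The sketch below reuses the age and weight data from Example #1; the helper `pearson_r` and the conversion factor of roughly 2.20462 pounds per kilogram are assumptions introduced for this illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from raw sums (hypothetical helper)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    return (n * sum_xy - sum_x * sum_y) / math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )

ages_years = [40, 21, 25, 31, 38, 47]
weights_kg = [78, 70, 60, 55, 80, 66]

r_original = pearson_r(ages_years, weights_kg)

# Symmetry: swapping the roles of x and y.
r_swapped = pearson_r(weights_kg, ages_years)

# Unit independence: years -> months, kilograms -> pounds (approximate factor).
ages_months = [12 * a for a in ages_years]
weights_lb = [2.20462 * w for w in weights_kg]
r_converted = pearson_r(ages_months, weights_lb)

print(math.isclose(r_original, r_swapped))
print(math.isclose(r_original, r_converted))
```

Rescaling by a positive constant multiplies the numerator and the denominator of the formula by the same factor, which is why the coefficient is unchanged.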

Conclusion

The Pearson correlation coefficient represents the relationship between two variables measured on an interval or ratio scale. It measures the strength of the linear relationship between two continuous variables.

It not only states the presence or absence of correlation between the two variables but also determines the extent to which those variables are correlated. It is independent of the units of measurement of the variables, and its values can range from +1 to -1. However, it is insufficient to tell the difference between the dependent and independent variables.

How to calculate a correlation coefficient?

The correlation coefficient is calculated by determining the covariance of the variables and dividing that number by the product of those variables' standard deviations.

What is the formula of simple correlation coefficient?

The correlation coefficient formula is: r = (n∑XY - ∑X∑Y) / √([n∑X² - (∑X)²] * [n∑Y² - (∑Y)²]).

How to calculate correlation coefficient from mean and standard deviation?

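Pearson's correlation coefficient is r = covariance / ((standard deviation of x) * (standard deviation of y)), or equivalently r = Sxy / (Sx * Sy), where Sxy is the sample covariance and Sx and Sy are the sample standard deviations. Note that the denominator uses the standard deviations, not the variances.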
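The covariance form can be sketched as follows. This is an illustration that assumes sample statistics (dividing by n - 1) throughout; `pearson_r_cov` is a hypothetical name.

```python
import statistics

def pearson_r_cov(xs, ys):
    """Pearson's r as sample covariance over the product of sample standard deviations."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    # Sample covariance Sxy, dividing by n - 1.
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    return cov_xy / (statistics.stdev(xs) * statistics.stdev(ys))

r = pearson_r_cov([6, 8, 10], [12, 10, 20])
print(round(r, 4))  # 0.7559
```

For the three-point data set used earlier, the sample covariance is 8, Sx is 2, and Sy is √28, so r = 8 / (2 * √28), which matches the value from the raw-sums formula.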