Introduction
A few weeks ago, I needed to find the correlation between two variables at work. I began searching the Net for the words “correlation” and “pearson” but couldn’t find any decent piece of code.
Let me define the word correlation first:
“A correlation gives the strength of the relationship between variables” (mathworld.wolfram.com).
A normalized covariance is called the Pearson value (also known as the Pearson correlation coefficient). The Pearson value ranges from -1 to +1. A value of +1 means that there is a perfect positive linear relationship between the variables. If it’s -1, there is a perfect negative linear relationship. 0 means no linear relationship at all.
To get the covariance and the Pearson value, you first need a few building blocks: the average, the variance and the standard deviation.
Implementation
Average
You sum up all values and divide the sum by the number of values.
/// <summary>
/// Get average
/// </summary>
public static double GetAverage( double[] data )
{
    int len = data.Length;

    if ( len == 0 )
        throw new Exception("No data");

    double sum = 0;

    for ( int i = 0; i < len; i++ )
        sum += data[i];

    return sum / len;
}
Variance & Standard Deviation
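The variance is the average of the squared differences from the average. The standard deviation is the square root of the variance. The code below divides by the number of values, so it computes the population variance.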
/// <summary>
/// Get variance
/// </summary>
public static double GetVariance( double[] data )
{
    int len = data.Length;

    // Get average
    double avg = GetAverage( data );

    double sum = 0;

    for ( int i = 0; i < len; i++ )
        sum += Math.Pow( ( data[i] - avg ), 2 );

    return sum / len;
}

/// <summary>
/// Get standard deviation
/// </summary>
public static double GetStdev( double[] data )
{
    return Math.Sqrt( GetVariance( data ) );
}
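As a quick sanity check, the average, variance and standard deviation can be computed for a small data set whose statistics are easy to verify by hand. The sketch below is self-contained, so it restates the calculations in a hypothetical `StatsDemo` class instead of calling the methods above; the class name, method names and sample values are made up for illustration. Like the code above, it divides by the number of values.

```csharp
using System;

// Hypothetical demo class; names and data are illustrative only.
public static class StatsDemo
{
    // Arithmetic mean: sum of the values divided by their count.
    public static double Average(double[] data)
    {
        if (data.Length == 0)
            throw new ArgumentException("No data");

        double sum = 0;
        foreach (double value in data)
            sum += value;

        return sum / data.Length;
    }

    // Population variance: mean of the squared deviations from the average.
    public static double Variance(double[] data)
    {
        double avg = Average(data);

        double sum = 0;
        foreach (double value in data)
            sum += (value - avg) * (value - avg);

        return sum / data.Length;
    }

    // Standard deviation: square root of the variance.
    public static double Stdev(double[] data)
    {
        return Math.Sqrt(Variance(data));
    }

    public static void Main()
    {
        // Sum is 40, so the average is 5; the squared deviations sum to 32.
        double[] data = { 2, 4, 4, 4, 5, 5, 7, 9 };

        Console.WriteLine(Average(data));   // 5
        Console.WriteLine(Variance(data));  // 4
        Console.WriteLine(Stdev(data));     // 2
    }
}
```

The values were chosen so that every intermediate result is a whole number, which makes it easy to spot a mistake such as forgetting to square the differences.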
Covariance & Pearson
To calculate the covariance, you need the average of each variable. You sum the products of x - Avg(x) and y - Avg(y) over all pairs of values and divide the sum by the number of pairs. To get the Pearson value, you divide the covariance by the product of stdevX and stdevY, the standard deviations of the two variables.
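For reference, the description above can be written as formulas, shown here in LaTeX. The symbols are notation chosen for this sketch and do not appear in the code: n is the number of value pairs, \bar{x} and \bar{y} are the averages, \sigma_x and \sigma_y are the standard deviations, and r is the Pearson value. The division by n follows the description above rather than the n - 1 used for sample statistics.

```latex
\operatorname{cov}(x, y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
\qquad
r = \frac{\operatorname{cov}(x, y)}{\sigma_x \, \sigma_y}
```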
/// <summary>
/// Get correlation
/// </summary>
public static void GetCorrelation( double[] x, double[] y, ref double covXY, ref double pearson)
{
    if ( x.Length != y.Length )
        throw new Exception("Length of sources is different");

    double avgX = GetAverage( x );
    double stdevX = GetStdev( x );
    double avgY = GetAverage( y );
    double stdevY = GetStdev( y );

    int len = x.Length;

    // Reset the accumulator so a non-zero value passed in by the caller
    // does not leak into the result
    covXY = 0;

    for ( int i = 0; i < len; i++ )
        covXY += ( x[i] - avgX ) * ( y[i] - avgY );

    covXY /= len;

    // Note: if either variable is constant, its standard deviation is 0
    // and the result is not a valid number (NaN)
    pearson = covXY / ( stdevX * stdevY );
}
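To see the two extremes of the Pearson value in practice, the calculation can be run on one perfectly rising and one perfectly falling data set. The following is a minimal, self-contained sketch: the `CorrelationDemo` class, its `Correlate` method and the sample arrays are hypothetical names and values chosen for illustration, and it uses `out` parameters rather than `ref` so the caller does not have to initialise the results.

```csharp
using System;

// Hypothetical demo class; names and sample data are illustrative only.
public static class CorrelationDemo
{
    static double Average(double[] data)
    {
        double sum = 0;
        foreach (double value in data)
            sum += value;
        return sum / data.Length;
    }

    // Computes the population covariance and the Pearson value of x and y.
    public static void Correlate(double[] x, double[] y,
                                 out double covXY, out double pearson)
    {
        if (x.Length != y.Length)
            throw new ArgumentException("Length of sources is different");
        if (x.Length == 0)
            throw new ArgumentException("No data");

        double avgX = Average(x);
        double avgY = Average(y);

        // Accumulate the cross products and the squared deviations in one pass.
        double sumXY = 0, sumXX = 0, sumYY = 0;
        for (int i = 0; i < x.Length; i++)
        {
            double dx = x[i] - avgX;
            double dy = y[i] - avgY;
            sumXY += dx * dy;
            sumXX += dx * dx;
            sumYY += dy * dy;
        }

        int n = x.Length;
        covXY = sumXY / n;

        // Covariance divided by the product of the two standard deviations.
        pearson = covXY / (Math.Sqrt(sumXX / n) * Math.Sqrt(sumYY / n));
    }

    public static void Main()
    {
        double[] x = { 1, 2, 3, 4, 5 };
        double[] rising = { 2, 4, 6, 8, 10 };   // y = 2x
        double[] falling = { 10, 8, 6, 4, 2 };  // y = 12 - 2x

        double cov, pearson;

        Correlate(x, rising, out cov, out pearson);
        Console.WriteLine("rising:  cov = " + cov + ", pearson = " + pearson);

        Correlate(x, falling, out cov, out pearson);
        Console.WriteLine("falling: cov = " + cov + ", pearson = " + pearson);
    }
}
```

For these inputs the covariance is 4 and -4 respectively, and the Pearson value comes out at +1 and -1 up to floating-point rounding, so any comparison against those values should allow a small tolerance.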