Tuesday, November 19, 2013

Spearman's Rank Correlation Coefficient

In statistics, Spearman's rank correlation coefficient or Spearman's rho, named after Charles Spearman and often denoted by the Greek letter \rho (rho) or as r_s, is a nonparametric measure of statistical dependence between two variables. It assesses how well the relationship between two variables can be described using a monotonic function. If there are no repeated data values, a perfect Spearman correlation of +1 or −1 occurs when each of the variables is a perfect monotone function of the other.
Spearman's coefficient, like any correlation calculation, is appropriate for both continuous and discrete variables, including ordinal variables.

Definition and Calculation
The Spearman correlation coefficient is defined as the Pearson correlation coefficient between the ranked variables. For a sample of size n, the n raw scores X_i, Y_i are converted to ranks x_i, y_i, and ρ is computed from these:
 \rho = \frac{\sum_i(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_i (x_i-\bar{x})^2 \sum_i(y_i-\bar{y})^2}}
Identical values (rank ties or value duplicates) are assigned a rank equal to the average of their positions in the ascending order of the values. In the table below, notice how the rank of values that are the same is the mean of what their ranks would otherwise be:
Variable X_i | Position in the ascending order | Rank x_i
0.8          | 1                               | 1
1.2          | 2                               | \frac{2+3}{2} = 2.5
1.2          | 3                               | \frac{2+3}{2} = 2.5
2.3          | 4                               | 4
18           | 5                               | 5
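The tie rule can be sketched in a few lines of Python. This is only an illustration: the function name average_ranks is made up for this post, and the input is the X_i column from the table above.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values get the mean of their positions."""
    # Indices of the values, listed in ascending order of the values.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the run while the next sorted value equals the current one.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Positions are 1-based, so this run covers start+1 .. end+1.
        mean_position = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = mean_position
        start = end + 1
    return ranks


print(average_ranks([0.8, 1.2, 1.2, 2.3, 18]))  # [1.0, 2.5, 2.5, 4.0, 5.0]
```

The two 1.2 values occupy positions 2 and 3, so each receives rank 2.5, matching the table.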
In applications where duplicate values (ties) are known to be absent, a simpler procedure can be used to calculate ρ. Differences d_i = x_i - y_i between the ranks of each observation on the two variables are calculated, and ρ is given by:
 \rho = 1- {\frac {6 \sum d_i^2}{n(n^2 - 1)}}.
Note that this latter method should not be used in cases where the data set is truncated; that is, when the Spearman correlation coefficient is desired for the top X records (whether by pre-change rank or post-change rank, or both), the user should use the Pearson correlation coefficient formula given above.
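As a rough check that the two formulas agree when there are no ties, here is a small Python sketch. The function names and the sample data are invented for illustration, and the ranking helper assumes all values are distinct.

```python
import math


def ranks_no_ties(values):
    """1-based ranks, assuming all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def spearman_shortcut(xs, ys):
    """rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), valid only without ties."""
    n = len(xs)
    rx, ry = ranks_no_ties(xs), ranks_no_ties(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def spearman_pearson_on_ranks(xs, ys):
    """Pearson correlation coefficient computed on the ranks."""
    rx, ry = ranks_no_ties(xs), ranks_no_ties(ys)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    denominator = math.sqrt(
        sum((a - mean_x) ** 2 for a in rx) * sum((b - mean_y) ** 2 for b in ry)
    )
    return numerator / denominator


# Hypothetical sample with no repeated values in either variable.
x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]
print(spearman_shortcut(x, y))
print(spearman_pearson_on_ranks(x, y))
```

For this sample the rank differences are 0, −1, 1, −1, 1, so the sum of squared differences is 4 and both functions give a value of about 0.8.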
The standard error of the coefficient (σ) was determined by Pearson in 1907 and Gosset in 1920. It is
 \sigma = \frac{0.6325}{\sqrt{n - 1}}
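A quick numerical illustration, taking the standard error in its square-root form 0.6325 / sqrt(n − 1). The function name is hypothetical and the sample sizes are arbitrary.

```python
import math


def spearman_standard_error(n):
    """Standard error of Spearman's rho, assumed here to be 0.6325 / sqrt(n - 1)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return 0.6325 / math.sqrt(n - 1)


for n in (11, 26, 101):
    print(n, round(spearman_standard_error(n), 5))
```

The value shrinks as the sample grows: for n = 101 the denominator is exactly 10, so the standard error is 0.06325.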
