This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
## Correlation matrix with p-values. See http://goo.gl/nahmV for documentation of this function | |
cor.prob <- function (X, dfr = nrow(X) - 2) { | |
R <- cor(X, use="pairwise.complete.obs") | |
above <- row(R) < col(R) | |
r2 <- R[above]^2 | |
Fstat <- r2 * dfr/(1 - r2) | |
R[above] <- 1 - pf(Fstat, 1, dfr) | |
R[row(R) == col(R)] <- NA | |
R | |
} | |
## Use this to dump the cor.prob output to a 4 column matrix | |
## with row/column indices, correlation, and p-value. | |
## See StackOverflow question: http://goo.gl/fCUcQ | |
flattenSquareMatrix <- function(m) { | |
if( (class(m) != "matrix") | (nrow(m) != ncol(m))) stop("Must be a square matrix.") | |
if(!identical(rownames(m), colnames(m))) stop("Row and column names must be equal.") | |
ut <- upper.tri(m) | |
data.frame(i = rownames(m)[row(m)[ut]], | |
j = rownames(m)[col(m)[ut]], | |
cor=t(m)[ut], | |
p=m[ut]) | |
} | |
# get some data from the mtcars built-in dataset | |
mydata <- mtcars[, c(1,3,4,5,6)] | |
# correlation matrix | |
cor(mydata) | |
# correlation matrix with p-values | |
cor.prob(mydata) | |
# "flatten" that table | |
flattenSquareMatrix(cor.prob(mydata)) | |
# plot the data | |
library(PerformanceAnalytics) | |
chart.Correlation(mydata) |
The cor() function will produce a basic correlation matrix. 12 years ago Bill Venables provided a function on the R help mailing list for replacing the upper triangle of the correlation matrix with the p-values for those correlations (based on the known relationship between t and r). The cor.prob() function will produce this matrix.
Finally, the flattenSquareMatrix() function will "flatten" this matrix to four columns: one column for variable i, one for variable j, one for their correlation, and another for their p-value (thanks to Chris Wallace on StackOverflow for helping out with this one).
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
> cor(mydata) | |
mpg disp hp drat wt | |
mpg 1.0000000 -0.8475514 -0.7761684 0.6811719 -0.8676594 | |
disp -0.8475514 1.0000000 0.7909486 -0.7102139 0.8879799 | |
hp -0.7761684 0.7909486 1.0000000 -0.4487591 0.6587479 | |
drat 0.6811719 -0.7102139 -0.4487591 1.0000000 -0.7124406 | |
wt -0.8676594 0.8879799 0.6587479 -0.7124406 1.0000000 | |
> cor.prob(mydata) | |
mpg disp hp drat wt | |
mpg NA 9.380327e-10 1.787835e-07 1.776240e-05 1.293958e-10 | |
disp -0.8475514 NA 7.142679e-08 5.282022e-06 1.222322e-11 | |
hp -0.7761684 7.909486e-01 NA 9.988772e-03 4.145827e-05 | |
drat 0.6811719 -7.102139e-01 -4.487591e-01 NA 4.784260e-06 | |
wt -0.8676594 8.879799e-01 6.587479e-01 -7.124406e-01 NA | |
> flattenSquareMatrix(cor.prob(mydata)) | |
i j cor p | |
1 mpg disp -0.8475514 9.380327e-10 | |
2 mpg hp -0.7761684 1.787835e-07 | |
3 disp hp 0.7909486 7.142679e-08 | |
4 mpg drat 0.6811719 1.776240e-05 | |
5 disp drat -0.7102139 5.282022e-06 | |
6 hp drat -0.4487591 9.988772e-03 | |
7 mpg wt -0.8676594 1.293958e-10 | |
8 disp wt 0.8879799 1.222322e-11 | |
9 hp wt 0.6587479 4.145827e-05 | |
10 drat wt -0.7124406 4.784260e-06 |
Finally, the chart.Correlation() function from the PerformanceAnalytics package produces a very nice scatterplot matrix, with histograms, kernel density overlays, absolute correlations, and significance asterisks (0.05, 0.01, 0.001):