CCA¶
- class hyppo.independence.CCA¶
Canonical Correlation Analysis (CCA) test statistic and p-value.
This test can be thought of as inferring information from cross-covariance matrices [1]. It has been argued that virtually all parametric tests of significance can be treated as special cases of CCA [2]. The method was first introduced by Hotelling [3].
Notes
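The statistic can be derived as follows [4]:
Let $x$ and $y$ be $(n, p)$ samples of random variables $X$ and $Y$. We can center $x$ and $y$ and then calculate the sample covariance matrix $\hat{\Sigma}_{xy} = x^T y$; the variance matrices $\hat{\Sigma}_{xx}$ and $\hat{\Sigma}_{yy}$ for $x$ and $y$ are defined similarly. Then, the CCA test statistic is found by calculating vectors $a \in \mathbb{R}^p$ and $b \in \mathbb{R}^p$ that maximize

$$\mathrm{CCA}_n(x, y) = \max_{a, b \in \mathbb{R}^p} \frac{a^T \hat{\Sigma}_{xy} b}{\sqrt{a^T \hat{\Sigma}_{xx} a} \, \sqrt{b^T \hat{\Sigma}_{yy} b}}$$

The p-value returned is calculated with a permutation test using hyppo.tools.perm_test.
References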
- [1] Wolfgang Karl Härdle and Léopold Simar. Canonical Correlation Analysis. In Wolfgang Karl Härdle and Léopold Simar, editors, Applied Multivariate Statistical Analysis, pages 443–454. Springer, Berlin, Heidelberg, 2015. doi:10.1007/978-3-662-45171-7_16.
- [2] Thomas R. Knapp. Canonical correlation analysis: A general parametric significance-testing system. Psychological Bulletin, 85(2):410–416, 1978. doi:10.1037/0033-2909.85.2.410.
- [3] Harold Hotelling. Relations Between Two Sets of Variates, pages 162–190. Springer New York, New York, NY, 1992. doi:10.1007/978-1-4612-4380-9_14.
- [4] David R. Hardoon, Sandor Szedmak, and John Shawe-Taylor. Canonical Correlation Analysis: An Overview with Application to Learning Methods. Neural Computation, 16(12):2639–2664, December 2004. doi:10.1162/0899766042321814.
Methods Summary
| CCA.statistic(x, y) | Helper function that calculates the CCA test statistic. |
| CCA.test(x, y, reps=1000, workers=1, random_state=None) | Calculates the CCA test statistic and p-value. |
- CCA.statistic(x, y)¶
Helper function that calculates the CCA test statistic.
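As a rough illustration of the maximization described in the Notes, the sketch below computes the largest canonical correlation between two data matrices with NumPy. It is not hyppo's implementation, and the value returned by `CCA.statistic` may differ. The function name `first_canonical_correlation` is hypothetical, and the approach (QR decomposition of the centered data followed by an SVD) assumes that the centered matrices have full column rank.

```python
import numpy as np


def first_canonical_correlation(x, y):
    """Largest canonical correlation between x and y (hypothetical helper).

    Assumes the centered inputs have full column rank.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Treat 1-D input as a single-column matrix.
    if x.ndim == 1:
        x = x[:, None]
    if y.ndim == 1:
        y = y[:, None]

    # Center each column.
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)

    # Orthonormal bases for the column spaces of the centered data.
    qx, _ = np.linalg.qr(xc)
    qy, _ = np.linalg.qr(yc)

    # The singular values of qx^T qy are the canonical correlations;
    # the largest one is the maximum of the ratio over a and b.
    s = np.linalg.svd(qx.T @ qy, compute_uv=False)

    # Guard against rounding slightly above 1.
    return min(float(s[0]), 1.0)
```

For one-dimensional inputs this quantity reduces to the absolute value of the Pearson correlation, so identical inputs give a value of 1 up to rounding.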
- CCA.test(x, y, reps=1000, workers=1, random_state=None)¶
Calculates the CCA test statistic and p-value.
- Parameters
x, y (ndarray of float) -- Input data matrices. x and y must have the same number of samples and dimensions. That is, the shapes must be (n, p), where n is the number of samples and p is the number of dimensions.
reps (int, default: 1000) -- The number of replications used to estimate the null distribution in the permutation test that computes the p-value.
workers (int, default: 1) -- The number of cores to parallelize the p-value computation over. Supply -1 to use all cores available to the process.
- Returns
stat (float) -- The computed CCA statistic.
pvalue (float) -- The computed CCA p-value.
Examples
>>> import numpy as np
>>> from hyppo.independence import CCA
>>> x = np.arange(7)
>>> y = x
>>> stat, pvalue = CCA().test(x, y)
>>> '%.1f, %.2f' % (stat, pvalue)
'1.0, 0.00'
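The permutation procedure behind the p-value can be sketched in plain Python. This is a minimal illustration of the general idea, not a reproduction of `hyppo.tools.perm_test` or its signature. The helpers `abs_pearson` and `permutation_pvalue` are hypothetical, the example is restricted to non-constant one-dimensional data (where the CCA ratio reduces to the absolute Pearson correlation), and the `(1 + count) / (1 + reps)` p-value is an assumed, commonly used convention.

```python
import math
import random


def abs_pearson(x, y):
    """Absolute Pearson correlation of two equal-length 1-D sequences.

    Assumes neither sequence is constant.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / math.sqrt(sxx * syy)


def permutation_pvalue(x, y, statistic, reps=1000, seed=None):
    """Estimate a p-value by shuffling y to break any dependence on x."""
    rng = random.Random(seed)
    observed = statistic(x, y)

    y_perm = list(y)
    exceed = 0
    for _ in range(reps):
        # Each shuffle gives one draw from the null distribution.
        rng.shuffle(y_perm)
        if statistic(x, y_perm) >= observed:
            exceed += 1

    # Assumed convention: count the observed statistic as one replicate,
    # so the estimate is never exactly zero.
    pvalue = (1 + exceed) / (1 + reps)
    return observed, pvalue


x = list(range(7))
stat, pvalue = permutation_pvalue(x, x, abs_pearson, reps=200, seed=0)
```

Larger values of `reps` give a finer-grained estimate of the null distribution at the cost of more statistic evaluations, which is the trade-off that the `reps` and `workers` parameters of `CCA.test` control.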