CCA

class hyppo.independence.CCA

Canonical Correlation Analysis (CCA) test statistic and p-value.

This test can be thought of as inferring information from cross-covariance matrices [1]. It has been argued that virtually all parametric tests of significance can be treated as special cases of CCA [2]. The method was first introduced by Hotelling [3].

Notes

The statistic can be derived as follows [4]:

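Let $x$ and $y$ be $(n, p)$ samples of random variables $X$ and $Y$. We can center $x$ and $y$ and then calculate the sample covariance matrix $\hat{\Sigma}_{xy} = x^T y$; the variance matrices $\hat{\Sigma}_{xx}$ and $\hat{\Sigma}_{yy}$ are defined similarly. Then, the CCA test statistic is found by calculating vectors $a \in \mathbb{R}^p$ and $b \in \mathbb{R}^q$ that maximize

$$\mathrm{CCA}_n(x, y) = \max_{a \in \mathbb{R}^p,\, b \in \mathbb{R}^q} \frac{a^T \hat{\Sigma}_{xy} b}{\sqrt{a^T \hat{\Sigma}_{xx} a}\, \sqrt{b^T \hat{\Sigma}_{yy} b}}$$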
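The maximization above can be illustrated with a short NumPy sketch. This is not hyppo's implementation, and the value that CCA.statistic returns may be computed differently. The function name first_canonical_correlation is hypothetical, and the sketch assumes the centered inputs have full column rank. It relies on the standard result that the maximum of the ratio equals the largest singular value of the product of orthonormal bases for the two centered data matrices.

```python
import numpy as np


def first_canonical_correlation(x, y):
    """Largest canonical correlation between the columns of x and y.

    Hypothetical helper for illustration only; assumes the centered
    inputs have full column rank.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Treat 1-D inputs as a single column.
    if x.ndim == 1:
        x = x[:, np.newaxis]
    if y.ndim == 1:
        y = y[:, np.newaxis]

    # Center each column, as in the derivation.
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)

    # Orthonormal bases for the column spaces of the centered data.
    qx, _ = np.linalg.qr(xc)
    qy, _ = np.linalg.qr(yc)

    # The singular values of qx.T @ qy are the canonical correlations;
    # the largest one is the maximum of the ratio in the formula.
    singular_values = np.linalg.svd(qx.T @ qy, compute_uv=False)
    # Clip to guard against floating-point round-off slightly above 1.
    return float(np.clip(singular_values[0], 0.0, 1.0))


rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))
# y is a noisy linear function of x, so the two are dependent.
y = x @ rng.normal(size=(3, 3)) + 0.5 * rng.normal(size=(100, 3))
rho = first_canonical_correlation(x, y)
print(rho)
```

For one-dimensional inputs this quantity reduces to the absolute value of the Pearson correlation, which is a convenient sanity check.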

The returned p-value is computed with a permutation test via hyppo.tools.perm_test.
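The idea behind a permutation test can be sketched generically as follows. This is not the code of hyppo.tools.perm_test, whose internals are not described here. The names permutation_pvalue and abs_pearson are hypothetical, the absolute Pearson correlation is only a stand-in statistic, and the add-one correction is a common convention assumed for this sketch.

```python
import numpy as np


def permutation_pvalue(x, y, statistic, reps=1000, seed=None):
    """Estimate a p-value by permuting the rows of y (illustrative only)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    y = np.asarray(y)

    observed = statistic(x, y)
    null_stats = np.empty(reps)
    for i in range(reps):
        # Shuffling y along its first axis breaks the pairing with x,
        # which simulates the null hypothesis of independence.
        null_stats[i] = statistic(x, rng.permutation(y))

    # Assumed convention: the add-one correction keeps the estimate
    # strictly positive.
    pvalue = (1 + np.sum(null_stats >= observed)) / (1 + reps)
    return float(observed), float(pvalue)


def abs_pearson(x, y):
    # Stand-in statistic for 1-D inputs; any dependence measure works.
    return abs(np.corrcoef(x, y)[0, 1])


x = np.arange(20, dtype=float)
stat, pvalue = permutation_pvalue(x, x, abs_pearson, reps=200, seed=0)
print(stat, pvalue)
```

With more replications the estimated null distribution becomes finer, so the smallest attainable p-value under this convention is 1 / (1 + reps).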

References

[1] Wolfgang Karl Härdle and Léopold Simar. Canonical Correlation Analysis. In Wolfgang Karl Härdle and Léopold Simar, editors, Applied Multivariate Statistical Analysis, pages 443–454. Springer, Berlin, Heidelberg, 2015. doi:10.1007/978-3-662-45171-7_16.

[2] Thomas R. Knapp. Canonical correlation analysis: A general parametric significance-testing system. Psychological Bulletin, 85(2):410–416, 1978. doi:10.1037/0033-2909.85.2.410.

[3] Harold Hotelling. Relations Between Two Sets of Variates, pages 162–190. Springer New York, New York, NY, 1992. doi:10.1007/978-1-4612-4380-9_14.

[4] David R. Hardoon, Sandor Szedmak, and John Shawe-Taylor. Canonical Correlation Analysis: An Overview with Application to Learning Methods. Neural Computation, 16(12):2639–2664, December 2004. doi:10.1162/0899766042321814.

Methods Summary

CCA.statistic(x, y)

Helper function that calculates the CCA test statistic.

CCA.test(x, y[, reps, workers, random_state])

Calculates the CCA test statistic and p-value.


CCA.statistic(x, y)

Helper function that calculates the CCA test statistic.

Parameters

x,y (ndarray of float) -- Input data matrices. x and y must have the same number of samples and dimensions. That is, the shapes must be (n, p) where n is the number of samples and p is the number of dimensions.

Returns

stat (float) -- The computed CCA statistic.

CCA.test(x, y, reps=1000, workers=1, random_state=None)

Calculates the CCA test statistic and p-value.

Parameters
  • x,y (ndarray of float) -- Input data matrices. x and y must have the same number of samples and dimensions. That is, the shapes must be (n, p) where n is the number of samples and p is the number of dimensions.

  • reps (int, default: 1000) -- The number of replications used to estimate the null distribution in the permutation test that computes the p-value.

  • workers (int, default: 1) -- The number of cores over which to parallelize the p-value computation. Supply -1 to use all cores available to the process.

Returns

  • stat (float) -- The computed CCA statistic.

  • pvalue (float) -- The computed CCA p-value.

Examples

>>> import numpy as np
>>> from hyppo.independence import CCA
>>> x = np.arange(7)
>>> y = x
>>> stat, pvalue = CCA().test(x, y)
>>> '%.1f, %.2f' % (stat, pvalue)
'1.0, 0.00'

Examples using hyppo.independence.CCA