Tuesday, September 11, 2012

New version of phyl.cca and some other updates

First off, I've had a small number of user comments about phyl.cca (my function for phylogenetic canonical correlation analysis; see Revell & Harrison 2008). One is that the rows of the input data matrices have to be in the same order as the tip labels in the tree. I can find no reason why this should be the case (and was not able to replicate the problem with simulating and randomly permuted data), but I have made some changes to the function in the way that the rows of X & Y are sorted, so hopefully this eliminates the problem.

In troubleshooting this bug, however, I also discovered a more serious issue. phyl.cca can perform joint estimation of λ prior to CCA. The problem is that in previous version of the function I computed log(det(V)) instead of determinant(V,logarithm=TRUE). Why is this significant? Well, for trees of even a modest size (and depending on the scale of the tree), det(vcv.phylo(tree)) goes to 0 or Inf pretty easily. This makes log(det(vcv.phylo(tree))) -Inf or Inf, respectively. Since this can sometimes be true over the entire range of λ, this means we will have no power to estimate λ. I've realized that this can be a problem in likelihood calculation for quite some time, so all my other functions computing the log-likelihood use determinant(...,logarithm=TRUE), but until recently I didn't realize this might not have been fixed in phyl.cca. (It is also a problem in phyl.pca since both functions use the same function for computing the likleihood.)

A quick note to users - R automatically spits errors for these calculations at the limit of numerical precision, so you would probably have to have ignored these for this to have been a problem in your analyses. This bug was still somewhat difficult to detect - particularly for phyl.cca, as this function did not previously return the log-likelihood of the fitted λ model.

I also moved the code for the likelihood function for joint λ estimation out of the source files phyl.cca.R & phyl.pca.R, and into a common source file utilities.R. All are available on my phytools page, but a better move for updating the functions would be to install the latest non-static (i.e., non-CRAN) version of phytools, v0.1-93 (here). Remember, to install from source, just download the package zipped tarball, and then type:

> install.packages("phytools_0.1-93.tar.gz", type="source", repos=NULL)

All the affected functions should work just as they did before - if users find that this is not the case, please do report it! The only difference should be that (1) the bug is fixed; and (2) that phyl.cca now reports the log-likelihood of the fitted joint λ model (as mentioned above). To give you a sense of what this looks like, I'll demo the function with some simulated (in this case, uncorrelated) data.

> require(phytools)
> require(geiger)
> tree<-pbtree(n=100)
> X<-fastBM(lambdaTree(tree,0.7),nsim=5)
> Y<-fastBM(lambdaTree(tree,0.7),nsim=4)
> cca<-phyl.cca(tree,X,Y,fixed=FALSE)
> cca
$cor
[1] 0.31327528 0.24224365 0.19474713 0.05333107
$xcoef
           CA1        CA2        CA3         CA4
[1,] -0.5431358  0.2006402 -0.2864722 -0.66959452
[2,] -0.5769728 -0.7635696 -0.4518404  0.39029667
...
$lambda
      lambda          logL
   0.6496061 -1502.6777738
$chisq
[1] 19.296734  9.586766  3.902187  0.267736
$p
[1] 0.5026200 0.6521626 0.6899113 0.8747055


One thing worthy of noting is that (as with many comparative tests) we tend to get very high type I error rates both if phylogenetic signal is simulated and ignored; or if phylogenetic simulation is not simulated, but assumed. For example:

> # first, simulate with phylogenetic signal
> # but conduct standard (i.e., non-phylogenetic) CCA
> X<-fastBM(tree,nsim=5)
> Y<-fastBM(tree,nsim=4)
> cca<-phyl.cca(tree,X,Y,lambda=0,fixed=TRUE)
> cca
$cor
[1] 0.4475067 0.4013825 0.2543455 0.1886160
...
$lambda
 lambda     logL
   0.00 -1582.95
$chisq
[1] 47.211293 26.204985  9.691704  3.405079
$p
[1] 0.0005482501 0.0100392599 0.1382499000 0.1822202083

> # now incorporate phylogeny
> cca<-phyl.cca(tree,X,Y)
> cca
$cor
[1] 0.3510597 0.2884059 0.1348753 0.1122388
...
$lambda
  lambda      logL
   1.000 -1131.117
$chisq
[1] 23.443939 11.080562  2.917425  1.191692
$p
[1] 0.2675254 0.5220274 0.8191373 0.5510960

> # now do the converse - simulating without signal
> X<-fastBM(lambdaTree(tree,0),nsim=5)
> Y<-fastBM(lambdaTree(tree,0),nsim=4)
> cca<-phyl.cca(tree,X,Y)
> cca
$cor
[1] 0.78268746 0.50448553 0.33628950 0.09279904
...
$lambda
  lambda      logL
   1.000 -2276.737
$chisq
[1] 128.8425842  39.7027005  12.0941745   0.8130019
$p
[1] 6.424246e-18 8.055169e-05 5.990056e-02 6.659765e-01

> # now estimate using standard (non-phylogenetic) CCA
> cca<-phyl.cca(tree,X,Y,lambda=0,fixed=TRUE)
> cca
$cor
[1] 0.38150517 0.23933981 0.11302862 0.04147243
...
$lambda
  lambda      logL
   0.000 -1765.266
$chisq
[1] 21.7010083  6.9154811  1.3704468  0.1618157
$p
[1] 0.3569526 0.8631480 0.9676315 0.9222787


The final thing I did was add the likelihood function for joint λ to the NAMESPACE of phytools. This enables the phytools user to write an incredibly simple function for joint λ estimation. I will detail this in a separate post (to improve search).

No comments:

Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.