Tuesday, August 16, 2011

Discrepancy between read.simmap() and SIMMAP

Sam Price yesterday pointed out to me that my functions read.simmap() and write.simmap() assume the opposite order of the edge states along internal branches created by Jonathan Bollback's program SIMMAP. Yikes! This was news to me, but evidently the mapping ":{1,0.10:0,0.90}" in SIMMAP indicates that the branch started in state 0 and then changed to state 1. Quelle surprise! I had assumed the contrary - that the branch, in this case, started in state 1 (in which it remained for 0.10 units of time) before changing to state 2 for the remainder of the edge.

The good news is that misordering the states along the branches is irrelevant for BM based analyses, such as those implemented in brownie.lite() and evol.vcv(). In fact, in the earliest versions of read.simmap(), I discarded this information entirely. However, trees misread in this way look extremely peculiar when plotted using plotSimmap(). For instance, consider the stochastic map below which was created by a colleague using SIMMAPv1.0 and read into R using read.simmap() from the present version of "phytools." Oops!



(In case it's not immediately apparent what is wrong with this image, close examination of almost any branch with a transition shows that this transition occurs at the node, not along the edge as it should.)

Fortunately, this can easily be solved and I have added it to the latest version of read.simmap(). For backward compatibility, and because I think this ordering makes more sense anyway (no offense intended to JB), I have also retained the option of reading in trees in which the mapped states along an edge are order from root to tip in a left-to-right fashion. This alternative format (still produced by write.simmap() until I fix that too) can be read in by setting the option rev.order=FALSE (default is TRUE).

The fix was very easy to code. Basically, I apply it after reading the tree as before in the following way:

tree$maps<-lapply(tree$maps,function(x) x<-x[length(x):1])

All this does is goes through the list of mapped edges and reverses the ordering of the states, and time spent in each state, for each edge.

This can also be used to convert any tree that has already been read into memory, or even to modify a tree in memory so that it can be written with write.simmap() and then read into another program that accepts standard Newick style SIMMAP format trees.

The new version of read.simmap() is available from my webpage (direct link to code here). I will also put this in the next version of "phytools."

1 comment:

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.