Friday, March 4, 2011

Prune tree to a list of taxa

Using the {ape} function drop.tip() we can easily excise a single taxon or a list of taxa from our "phylo" tree object in R. However, it is not immediately obvious how to prune the tree to include, rather than exclude, a specific list of tips. Trina Roberts (now at NESCent) shared a trick to do this with me some time ago, and I thought I'd pass it along to the readers of this blog.

First, let's start with a tree of 10 species:

> tree<-rtree(10)
> write.tree(tree)
[1] "(t8:0.22,((((t3:0.9,(t7:0.48,t2:0.5):0.12):0.47,t6:0.55):0.08,(t5:0.49,(t9:0.71,t10:0.13):0.15):0.7):0.87,(t1:0.72,t4:0.62):0.55):0.47);"


Now, say we want to keep the species t2, t4, t6, t8, and t10 in our pruned tree, we just put these tip names into a vector:

> species<-c("t2","t4","t6","t8","t10")

[More commonly, this vector will probably come from the row names in our data matrix, or we might read it from a text file.]

We create the pruned tree with one command:

> pruned.tree<-drop.tip(tree,tree$tip.label[-match(species, tree$tip.label)])

Now we have our pruned tree, as desired:

> write.tree(pruned.tree)
[1] "(t8:0.22,(((t2:1.09,t6:0.55):0.08,t10:0.98):0.87,t4:1.17):0.47);"

31 comments:

  1. If there are tips in the "species" vector that are not in the tree, match(species,tree$tip.label) will one or mulitple NAs, and the procedure will fail. To avoid this problem, one can just do:
    > pruned.tree<-drop.tip(tree, tree$tip.label[-na.omit(match(species, tree$tip.label))])

    ReplyDelete
  2. Hi Liam-

    Even less code than the -match trick:

    pruned.tree<-drop.tip(tree, setdiff(tree$tip.label, species));

    setdiff is very handy....(as is intersect and %in%)

    ReplyDelete
  3. Cool! Very handy indeed.

    Dan's method will also work even if some of the labels in "species" are not in "tree."

    ReplyDelete
  4. Hey Liam, congratulations for this great blog, just one question related to this, suppose you want to prune a bunch of taxa from a large amount of trees (for example, you want to use the trees from the posterior of BEAST but you want to get rid of some taxa for all the trees). I am a beginner in R and I have no clue of how to do it.

    Any idea?

    thanks a lot!

    J.

    ReplyDelete
  5. Hi. Yes, this is pretty easy. I will write a quick post about this now.

    ReplyDelete
  6. Hi Liam,
    I have a bit of a tricky question. When I use the drop.tip function, the (dropped-tip) tree object will keep all elements from the (original) lists, and therefore these elements (e.g. 95% HPD or ancestral state reconstructions on nodes) do not match the new object anymore (i.e. they do not change along). Do you know a way to make these lists change according to the dropped tips and the new node numbers? I haven't been able to figure it out, unfortunately. Cheers and best wishes, Renske

    ReplyDelete
    Replies
    1. Yup. This is an issue. I have an idea, but it depends how the node information is stored in the 'phylo' object. If you email me your saved workspace (.Rdata) then I will investigate it. Or you could post your question to R-sig-phylo since it doesn't have anything in particular to do with phytools.

      Delete
    2. Hi Renske and Liam, I have the same question and came across this post. Has there been any clarification on this topic? Thanks! All the best, Sara

      Delete
    3. Hi Liam,

      I am still having the same issue in that when i trim my tree the list of node labels is not trimmed along with it. Do you know of a way to trim the node labels as well? I want to plot posterior probabilities in my phylo object onto my tree. Thanks!

      Delete
    4. Hi Kayleigh,
      I solved it in this way. I used the drop.tip in the file with all the trees (if BEAST2 is used it'll be the one from LogCombiner after removing the burn-in). Then you just follow this other post: http://blog.phytools.org/2011/09/dropping-same-set-of-taxa-from-set-of.html
      Then, you have a new file ready for TreeAnnotator which will calculate all you need.

      Delete
  7. The package phyloch with its function read.beast and drop.tips2 used to do the trick… I'm trying to do it again and it doesnt work anymore, apparently due to a failure (hopefully easy to fix) of the function readBeasttable (embedded in read.beast) on last versions of .trees files…

    ReplyDelete
    Replies
    1. Or perhaps it is just because I used the BSSVS procedure that eliminates zero-rates in the rate matrix… Making the length of the "characters state" string inconsistent and thus preventing readBeasttable to properly parse the file…

      Delete
  8. Is it possible to remove a set of labels matching some regular expression - say, a set of labels with a given prefix?

    ReplyDelete
    Replies
    1. Hi Steven.

      Yes - this is possible. I have posted about this here. Let me know if this is what you want to do.

      - Liam

      Delete
  9. This comment has been removed by the author.

    ReplyDelete
  10. A quick question: some of my species' names contain dot, underscore, and number. E.g; Species.n.sp_2. After performing prune; those species' names with dot, underscore and number are not included in the final result and that was my attention. I want to keep these taxa. Thanks for your time and help!

    ReplyDelete
  11. This comment has been removed by the author.

    ReplyDelete
  12. Thanks for this Liam! Is there a way to do this on a multiPhylo object? I tried:
    pruned.tree <- lapply(tree, drop.tip(tree,tree$tip.label[-match(species, tree$tip.label)]))

    and also Dan's approach:
    pruned.tree<-lapply(tree, drop.tip(tree, setdiff(tree$tip.label, species)))

    But keep getting this error: Error in drop.tip(tree, tree$tip.label[-match(species, tree$tip.label)]) :
    object "phy" is not of class "phylo"

    Any ideas?

    Thanks!

    Simon

    ReplyDelete
    Replies
    1. Sure, but you have to make the second argument in your lapply call a function. For instance:

      tips ## my tips to keep
      pruned<-lapply(trees,function(t,tips)
      drop.tip(t,setdiff(t$tip.label,tips)),tips=tips)
      class(pruned)<-"multiPhylo"

      That should work.

      - Liam

      Delete
    2. Perfect, Liam! It works great except when `tips` has just a few species. Then it complains with:

      Error in phy$edge[, 2] : incorrect number of dimensions
      In addition: Warning message:
      In drop.tip(t, setdiff(t$tip.label, tips)) :
      drop all tips of the tree: returning NULL

      It worked for what I needed it, just wanted to warn others who might find this later on.

      Simon

      Delete
  13. Hi Liam,
    I am trying to visualize species tree of 1200+ species with characters mapped on them and am unable to see it clearly due to large number of taxas. Could you please suggest some way to deal with it?
    I have already tried exporting it PNG and PDF formats but those didn't help.

    ReplyDelete
  14. Thanks, Liam. This was really HANDY!

    ReplyDelete
  15. Hi Liam,
    Thank you for your great blog. While I was dropping the selected taxa, faced such an error:
    In cbind(e1, e2, deparse.level = 0) :
    number of rows of result is not a multiple of vector length (arg 1)
    Would you please guide me? there might be a problem with the tree, but not sure and also do now know how to fix it. Thank you in advance.
    >

    ReplyDelete
    Replies
    1. Hi again, Thank to your post in http://blog.phytools.org/2015/12/small-update-to-readnewick.html
      I could fix the error. Thanks Liam!

      Delete
  16. Hi Liam, I have been trying to create an object using my pruned tree, and data. But I get error in ace(x, a, method= "pic": length of phenotype and of phylogenetic data do not match. Any idea of what to do?

    ReplyDelete
    Replies
    1. You could try the name.check( ) function in geiger to see if that tells you where the discrepancy is.

      x should be a named vector & a is your tree. You can try:

      setdiff(a$tip.label,names(x))
      setdiff(names(x),a$tip.label)

      This might also help show you were the problem lies.

      Delete
  17. Hello,

    I am somewhat new to R, the code works great! I am wondering, are branch lengths are preserved this way?

    ReplyDelete
    Replies
    1. Hi!

      Did you finally get an answer about the branch length? If so, I would be interested to know

      Delete
    2. Yes, drop.tip preserves branch lengths.

      Delete
  18. Hello friends,
    I'm trying to plot LTT graph using Ape, but my results output are not corret, the graph I have obtained is not showing the diversification rate. can some help me out

    ReplyDelete

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.