bootscore: A Bootstrap Tree Scoring Utility
bootscore is a program that maps non-parametric bootstrap support for nodes/branches onto a topology (typically a topology resulting from a best tree search on phylogenetic data).
bootscore has now been superseded by SumTrees.
SumTrees does everything bootscore does, but incorporates numerous bugfixes, a more robust NEXUS parser, as well as additional features, such as consensus tree building, burn-in, etc.
The latest version works in one of two modes. In its default bipartition-counting mode, it identifies all distinct bipartitions in the tree to be evaluated, and then scans through a file of bootstrap replicates to determine the percentage or proportional frequency of occurrence of each of those bipartitions among the bootstrap trees. It can also operate in a clade-counting mode, in which it identifies all distinct monophyletic groups in the tree being assessed, and then counts the number of bootstrap trees in which each of those monophyletic groups is recovered. The program outputs a NEXUS treefile with the topology of the given tree and with the bipartition/clade support indicated via node labels or branch lengths (in the former case, the branch lengths of the original tree are retained).
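As a rough illustration of the clade-counting mode described above, here is a minimal Python 3 sketch. It is not bootscore's actual code: the nested-tuple tree representation and the function names "clades" and "clade_support" are assumptions made purely for the example, and all trees are assumed to be rooted and to share the same taxon labels.

```python
def clades(tree):
    """Return the non-trivial clades of a rooted tree.

    The tree is a nested tuple of tip-label strings (an assumed,
    simplified representation). Each clade is the frozenset of tip
    labels below one internal node; the root clade, which contains
    every taxon, is left out because it is recovered trivially.
    """
    found = set()

    def tips(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(tips(child) for child in node))
        found.add(members)
        return members

    all_tips = tips(tree)
    found.discard(all_tips)
    return found


def clade_support(target, replicates):
    """Percentage of replicate trees that recover each clade of the target."""
    counts = {clade: 0 for clade in clades(target)}
    for replicate in replicates:
        recovered = clades(replicate)
        for clade in counts:
            if clade in recovered:
                counts[clade] += 1
    return {clade: 100.0 * n / len(replicates) for clade, n in counts.items()}


# Hypothetical five-taxon example with four bootstrap replicates.
target = ((("A", "B"), "C"), ("D", "E"))
replicates = [
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "C"), "B"), ("D", "E")),
    (("A", "B"), ("C", ("D", "E"))),
    ((("A", "B"), "C"), ("D", "E")),
]
support = clade_support(target, replicates)
print(support[frozenset("AB")])   # 75.0
print(support[frozenset("DE")])   # 100.0
```

In this toy data set the clade (A, B) is recovered in three of the four replicates, and (D, E) in all four.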
It currently does not take into account tree weight comments in assessing support, though this feature will probably be added in the next version. And no, I have not yet written a GUI for it, and probably will not get around to writing one for it anytime soon.
How to Get the Program
The latest public release of this program can be downloaded from:
How to Set Up the Program
Install Python If You Do Not Already Have It
The program is a Python script, and, as such, you will need to have a Python interpreter installed on your system. If you are using a Linux, Unix or Macintosh (OS X) operating system, you should already have a Python interpreter ready to go, and you can skip this step. Otherwise, you must download and install Python from: http://www.python.org. Microsoft Windows users should also refer to the Python Windows FAQ (http://www.python.org/doc/faq/windows.html) after installing Python, and pay particular attention to the "How do I run a Python program under Windows?" section, as it will help them greatly in getting Python up and running on the system path.
I have developed and tested it using Python version 2.4. The previous versions of the program did not work under Python 2.3, but this one should.
Extract the Script File
This step varies depending on the operating system and the particular programs that you have installed. In most cases, simply double-clicking on the file that you have downloaded should kick off the process. Otherwise, open up a terminal window, navigate to the directory in which you have downloaded the package, and then type:
$ tar xvf bootscore-3.0.tar.gz
This should create a folder called "bootscore-3.0" with the main program script file, "bootscore.py", and supporting files.
Install the Script as an Executable on the System Path
You will make life easier for yourself by making the script executable and placing it on the system path. On Linux and Macintosh systems, the following commands will do the trick, assuming that you have administrator privileges:
$ chmod a+x bootscore.py
$ sudo cp bootscore.py /usr/bin
You will be prompted for your password, after which a copy of the script file will be placed on the system path, meaning that you will be able to invoke the program from any location on your computer without needing a local copy in your current folder.
Microsoft Windows users should refer to "How do I make python scripts executable?" in the Python Windows FAQ for details on how to achieve the same ends.
Data Required by the Program
You will need to provide:
1. A tree file specifying the topology (or topologies) that you want to evaluate, in Newick or NEXUS format.
2. A file of bootstrap trees, also in Newick or NEXUS format.
Typically, (1) will be the result of your best tree search or searches, while (2) will be the result of your non-parametric bootstrap runs. You can have more than one tree specified in the treefile given in (1); "bootscore" will score each tree independently, but save them all in a single tree block in the same output file. "bootscore" can handle TRANSLATE blocks without a problem, but apart from that, the labels for the same taxa must be identical, down to the last character and including letter case, across all tree statements for the analysis to be valid.
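Because mismatched labels silently invalidate the analysis, it can be worth checking them before a run. The following Python 3 sketch is one way to do that; it is not part of bootscore. The function names and the example labels are hypothetical, and the regular expression only copes with simple Newick strings (no quoted labels, no comments, no TRANSLATE blocks).

```python
import re


def tip_labels(newick):
    """Extract tip labels from a simple Newick string.

    A tip label is taken to be the token that directly follows "(" or ",".
    Internal node labels follow ")" and branch lengths follow ":", so
    neither is picked up.
    """
    return set(re.findall(r"[(,]\s*([^(),:;\s]+)", newick))


def label_mismatches(target_newick, bootstrap_newicks):
    """List the bootstrap trees whose tip labels differ from the target's.

    Each entry is (tree number, labels missing from that tree,
    unexpected labels found in that tree).
    """
    expected = tip_labels(target_newick)
    problems = []
    for number, newick in enumerate(bootstrap_newicks, start=1):
        labels = tip_labels(newick)
        if labels != expected:
            problems.append(
                (number, sorted(expected - labels), sorted(labels - expected))
            )
    return problems


# Hypothetical labels; the second bootstrap tree differs only in case.
target = "((Hyla_A:0.1,Hyla_B:0.2)95:0.05,Hyla_C:0.3);"
boots = [
    "((Hyla_A,Hyla_B),Hyla_C);",
    "((hyla_a,Hyla_B),Hyla_C);",
]
problems = label_mismatches(target, boots)
print(problems)
```

An empty list means that every bootstrap tree uses exactly the same set of labels as the tree being assessed.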
In version 2.0, the NEXUS file-parsing engine was rewritten from scratch, and it can now handle any NEXUS or Newick compliant files, including those containing comment blocks, special characters in identifier names, etc.
Program Usage Description and Examples
The current version of the program is a command-line utility that is used as follows:
$ bootscore.py [OPTIONS] -t <TREEFILE> -b <BOOTSTRAPFILE> -o <OUTPUTFILE>
The above assumes that you have set up bootscore as an executable on the system path. If not, you will need to pass the script filename to the Python interpreter:
$ python bootscore.py [OPTIONS] -t <TREEFILE> -b <BOOTSTRAPFILE> -o <OUTPUTFILE>
(Assuming that the "bootscore.py" script file is located in the current folder as well.)
You can also use the long version of the options or parameters, which involve more typing, but are less cryptic and easier to remember. The long versions of the options are preceded by two dashes instead of one, and are followed by an equals sign before the option value (if any) is specified:
$ bootscore.py [options] --tree=<TREEFILE> --bootstraps=<BOOTSTRAPFILE> --output=<OUTPUTFILE>
Or, if you have not installed the bootscore script as an executable on the system path, but have placed it in the current folder:
$ python bootscore.py [options] --tree=<TREEFILE> --bootstraps=<BOOTSTRAPFILE> --output=<OUTPUTFILE>
By default, bootscore outputs a treefile where the topology and branch lengths correspond to those given in the original treefile, and with bipartition or clade support in terms of percentages given by internal node labels.
So, for example, assuming that you have a best estimated tree topology file given by "hyla16s.best.tre", and a set of non-parametric bootstrap trees given by "hyla16s.bs100.tre", and copies of both files are sitting directly in the current folder, then the following command will create a treefile "hyla16s.bestbs.tre" in the same folder, with bootstrap support for each clade (i.e., the percentage of bootstrap trees in which that clade was found) indicated by node labels:
$ bootscore.py -t hyla16s.best.tre -b hyla16s.bs100.tre -o hyla16s.bestbs.tre
Of course, the data files do not have to reside in the same folder, as long as you provide the full path (relative or absolute) from the current directory for each of the files:
$ bootscore.py -t /home/jeet/data/hyla/hsearch/hyla16s.best.tre -b /home/jeet/data/hyla/boots/hyla16s.bs100.tre -o /home/jeet/data/hyla/support/hyla16s.bestbs.tre
If you want the support values indicated by proportional frequencies instead of percentages, use the proportion option, "-p", or "--proportions":
$ bootscore.py -t hyla16s.best.tre -b hyla16s.bs100.tre -o hyla16s.bestbs.tre -p
$ bootscore.py --tree=hyla16s.best.tre --bootstraps=hyla16s.bs100.tre --output=hyla16s.bestbs.tre --proportions
If you want the support values indicated by branch lengths instead of node labels, use the branch length option, "-v", or "--support-as-lengths":
$ bootscore.py -t hyla16s.best.tre -b hyla16s.bs100.tre -o hyla16s.bestbs.tre -v
$ bootscore.py --tree=hyla16s.best.tre --bootstraps=hyla16s.bs100.tre --output=hyla16s.bestbs.tre --support-as-lengths
Or, combining the parameters:
$ bootscore.py -t hyla16s.best.tre -b hyla16s.bs100.tre -o hyla16s.bestbs.tre -v -p
$ bootscore.py --tree=hyla16s.best.tre --bootstraps=hyla16s.bs100.tre --output=hyla16s.bestbs.tre --support-as-lengths --proportions
Other Options and Settings
As noted, the current version of the program counts support in terms of bipartitions or splits shared between the tree being assessed and the bootstrap trees, while the previous versions counted clades. If you wish to assess support in terms of clades, then you can revert to the previous metric by using the "--clade-support-mode" option.
Other option settings allow you to specify the decimal places of precision with which to report support values ("-d" or "--decimals"), disable the inclusion of taxa blocks in the results tree file ("--no-taxa-block"), save the results as a PHYLIP-format file rather than NEXUS ("--phylip"), automatically overwrite the output file if it already exists ("-r" or "--replace"), or run without progress messages ("-q" or "--quiet"). Finally, invoking the help option ("--help") provides a summary of all the options and parameters.
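To make the effect of the proportion, decimals and branch length options more concrete, here is a small Python 3 sketch of how a support value might be formatted and attached to a node. This is an assumption-laden illustration rather than bootscore's own code: the function names, the default of zero decimal places, and the exact layout of the output are all hypothetical. The only firm fact used is the Newick convention that a node label follows the closing parenthesis and a branch length follows a colon.

```python
def format_support(count, total, proportions=False, decimals=0):
    """Format a support value as a percentage or a proportional frequency.

    count is the number of bootstrap trees recovering the group and
    total is the number of bootstrap trees examined.
    """
    value = count / total
    if not proportions:
        value *= 100
    return f"{value:.{decimals}f}"


def annotate_node(subtree, support, branch_length=None, as_length=False):
    """Attach a support value to a Newick subtree string.

    As a node label the value goes straight after the ")" and any
    original branch length is kept; as a branch length it goes after
    the ":" and replaces the original length.
    """
    if as_length:
        return f"{subtree}:{support}"
    if branch_length is None:
        return f"{subtree}{support}"
    return f"{subtree}{support}:{branch_length}"


percent = format_support(87, 100)
proportion = format_support(87, 100, proportions=True, decimals=2)

print(annotate_node("(A,B)", percent, branch_length="0.05"))  # (A,B)87:0.05
print(annotate_node("(A,B)", proportion, as_length=True))     # (A,B):0.87
```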
How the Program Works
The first version of the program (1.0) was essentially a lexical processor: trees were maintained and manipulated as simple strings (the tree statement). This was slow, unwieldy, error-prone and relatively inflexible. Subsequent versions of the program, however, employ a full-fledged n-ary tree data model, which not only makes processing much faster, but also makes programming and debugging a lot easier.
The third version of the program changed how support was assessed, to bring it in line with the PAUP* scoring model. Currently, the set of bootstrap trees is examined to see what proportion of them contain internal nodes that recover the same bipartitions as the internal nodes on the tree to be assessed. A "bipartition" is defined by the two groups of terminals formed if the (unrooted) tree is bisected or split at the given node. This procedure yields the exact same support values as the PAUP* model. The previous versions of the program employed a different metric, counting clades (i.e. monophyletic groups defined by each node) rather than bipartitions (splits).
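The difference between bipartitions and clades can be sketched in a few lines of Python 3. The code below is an illustrative approximation, not the program's own implementation: trees are represented as nested tuples of tip labels, the function names are made up for the example, and every tree is assumed to carry the same set of taxa. Each split is stored as the group of terminals that does not contain a fixed reference taxon, so that the same split gets the same key however the tree happens to be rooted.

```python
def bipartitions(tree):
    """Return the non-trivial bipartitions (splits) of a nested-tuple tree."""
    groups = []

    def tips(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(tips(child) for child in node))
        groups.append(members)
        return members

    all_tips = tips(tree)
    reference = min(all_tips)  # arbitrary but consistent reference taxon
    splits = set()
    for group in groups:
        # Keep the side of the split that lacks the reference taxon.
        side = all_tips - group if reference in group else group
        # Skip trivial splits that separate fewer than two taxa.
        if 2 <= len(side) <= len(all_tips) - 2:
            splits.add(side)
    return splits


def bipartition_support(target, replicates):
    """Proportion of replicate trees that recover each split of the target."""
    target_splits = bipartitions(target)
    counts = dict.fromkeys(target_splits, 0)
    for replicate in replicates:
        for split in target_splits & bipartitions(replicate):
            counts[split] += 1
    return {split: n / len(replicates) for split, n in counts.items()}


# Two rootings of the same unrooted tree give identical bipartitions,
# even though the clade (A, B, C) exists only in the first rooting.
rooting_1 = ((("A", "B"), "C"), ("D", "E"))
rooting_2 = (("A", "B"), ("C", ("D", "E")))
same_splits = bipartitions(rooting_1) == bipartitions(rooting_2)

replicates = [
    rooting_1,
    ((("A", "C"), "B"), ("D", "E")),
    rooting_2,
    rooting_1,
]
support = bipartition_support(rooting_1, replicates)
print(same_splits)                  # True
print(support[frozenset("CDE")])    # 0.75
print(support[frozenset("DE")])     # 1.0
```

Under clade counting, the third replicate would not count toward the group (A, B, C), because that group is not monophyletic given its rooting. Under bipartition counting it does count, because the corresponding split, (A, B, C) versus (D, E), is present in the unrooted tree.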
Limitations
While I do not consider it a limitation in any way, some people might find it discouraging that "bootscore" is a command-line program rather than one with a graphical interface. Unfortunately, I find GUI programming tedious and time-consuming, taking several times longer to code and debug than the part that does the actual work, as well as being less interesting intellectually. As such, while I am entertaining the idea of eventually writing a graphical front-end for this program, I do not see this happening any time soon. Also, the program is rather slow. Some of this is probably due to the inherent slowness of an interpreted language like Python, though I have no doubt that the code itself could use a healthy dose of refactoring and optimization.
Bugs, Suggestions, Comments, etc.
If you encounter any problems, errors, crashes etc. while using this program, please let me know at jeetsukumaran@frogweb.org. If you include the term "bootscore" anywhere on the subject line (e.g. "Problem such-and-such with bootscore"), it would help greatly with getting through the spam filter. Please include all the datafiles involved, as well as the complete command used (with all the options and parameters) and the complete error message returned (simply cutting-and-pasting the terminal text should work fine). Please feel free to contact me if you have any other questions, suggestions or comments as well.
How to Cite this Program
If you use this program in your analysis, please cite it as:
Copyright, License and Warranty
© 2007 Jeet Sukumaran.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.