88 research outputs found

    PHY·FI: fast and easy online creation and manipulation of phylogeny color figures

    Get PDF
    BACKGROUND: The need to depict a phylogeny, or some other kind of abstract tree, is very frequently experienced by researchers from a broad range of biological and computational disciplines. Thousands of papers and talks include phylogeny figures, and often during everyday work, one would like to quickly get a graphical display of, e.g., the phylogenetic relationship between a set of sequences as calculated by an alignment program such as ClustalW or the phylogenetic package Phylip. A wealth of software tools capable of tree drawing exists; most are comprehensive packages that also perform various types of analysis, and hence they are available only for download and installing. Some online tools exist, too. RESULTS: This paper presents an online tool, PHY·FI, which encompasses all the qualities of existing online programs and adds functionality to hopefully eliminate the need for post-processing the phylogeny figure in some other general-purpose graphics program. PHY·FI is versatile, easy-to-use and fast, and supports comprehensive graphical control, several download image formats, and the possibility of dynamically collapsing groups of nodes into named subtrees (e.g. "Primates"). The user can create a color figure from any phylogeny, or other kind of tree, represented in the widely used parenthesized Newick format. CONCLUSION: PHY·FI is fast and easy to use, yet still offers full color control, tree manipulation, and several image formats. It does not require any downloading and installing, and thus any internet user regardless of computer skills, and computer platform, can benefit from it. PHY·FI is free for all and is available from this web address

    A survey on feature weighting based K-Means algorithms

    Get PDF
    This is a pre-copyedited, author-produced PDF of an article accepted for publication in Journal of Classification [de Amorim, R. C., 'A survey on feature weighting based K-Means algorithms', Journal of Classification, Vol. 33(2): 210-242, August 25, 2016]. Subject to embargo. Embargo end date: 25 August 2017. The final publication is available at Springer via http://dx.doi.org/10.1007/s00357-016-9208-4 © Classification Society of North America 2016In a real-world data set there is always the possibility, rather high in our opinion, that different features may have different degrees of relevance. Most machine learning algorithms deal with this fact by either selecting or deselecting features in the data preprocessing phase. However, we maintain that even among relevant features there may be different degrees of relevance, and this should be taken into account during the clustering process. With over 50 years of history, K-Means is arguably the most popular partitional clustering algorithm there is. The first K-Means based clustering algorithm to compute feature weights was designed just over 30 years ago. Various such algorithms have been designed since but there has not been, to our knowledge, a survey integrating empirical evidence of cluster recovery ability, common flaws, and possible directions for future research. This paper elaborates on the concept of feature weighting and addresses these issues by critically analysing some of the most popular, or innovative, feature weighting mechanisms based in K-Means.Peer reviewedFinal Accepted Versio

    Armadillo 1.1: An Original Workflow Platform for Designing and Conducting Phylogenetic Analysis and Simulations

    Get PDF
    In this paper we introduce Armadillo v1.1, a novel workflow platform dedicated to designing and conducting phylogenetic studies, including comprehensive simulations. A number of important phylogenetic and general bioinformatics tools have been included in the first software release. As Armadillo is an open-source project, it allows scientists to develop their own modules as well as to integrate existing computer applications. Using our workflow platform, different complex phylogenetic tasks can be modeled and presented in a single workflow without any prior knowledge of programming techniques. The first version of Armadillo was successfully used by professors of bioinformatics at Université du Quebec à Montreal during graduate computational biology courses taught in 2010–11. The program and its source code are freely available at: <http://www.bioinfo.uqam.ca/armadillo>

    PhyloNet: a software package for analyzing and reconstructing reticulate evolutionary relationships

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Phylogenies, i.e., the evolutionary histories of groups of taxa, play a major role in representing the interrelationships among biological entities. Many software tools for reconstructing and evaluating such phylogenies have been proposed, almost all of which assume the underlying evolutionary history to be a tree. While trees give a satisfactory first-order approximation for many families of organisms, other families exhibit evolutionary mechanisms that cannot be represented by trees. Processes such as horizontal gene transfer (HGT), hybrid speciation, and interspecific recombination, collectively referred to as <it>reticulate evolutionary events</it>, result in <it>networks</it>, rather than trees, of relationships. Various software tools have been recently developed to analyze reticulate evolutionary relationships, which include SplitsTree4, LatTrans, EEEP, HorizStory, and T-REX.</p> <p>Results</p> <p>In this paper, we report on the PhyloNet software package, which is a suite of tools for analyzing reticulate evolutionary relationships, or <it>evolutionary networks</it>, which are rooted, directed, acyclic graphs, leaf-labeled by a set of taxa. These tools can be classified into four categories: (1) evolutionary network representation: reading/writing evolutionary networks in a newly devised compact form; (2) evolutionary network characterization: analyzing evolutionary networks in terms of three basic building blocks – trees, clusters, and tripartitions; (3) evolutionary network comparison: comparing two evolutionary networks in terms of topological dissimilarities, as well as fitness to sequence evolution under a maximum parsimony criterion; and (4) evolutionary network reconstruction: reconstructing an evolutionary network from a species tree and a set of gene trees.</p> <p>Conclusion</p> <p>The software package, PhyloNet, offers an array of utilities to allow for efficient and accurate analysis of evolutionary networks. The software package will help significantly in analyzing large data sets, as well as in studying the performance of evolutionary network reconstruction methods. Further, the software package supports the proposed eNewick format for compact representation of evolutionary networks, a feature that allows for efficient interoperability of evolutionary network software tools. Currently, all utilities in PhyloNet are invoked on the command line.</p

    Subsampling effects in neuronal avalanche distributions recorded in vivo

    Get PDF
    Background Many systems in nature are characterized by complex behaviour where large cascades of events, or avalanches, unpredictably alternate with periods of little activity. Snow avalanches are an example. Often the size distribution f(s) of a system's avalanches follows a power law, and the branching parameter sigma, the average number of events triggered by a single preceding event, is unity. A power law for f(s), and sigma=1, are hallmark features of self-organized critical (SOC) systems, and both have been found for neuronal activity in vitro. Therefore, and since SOC systems and neuronal activity both show large variability, long-term stability and memory capabilities, SOC has been proposed to govern neuronal dynamics in vivo. Testing this hypothesis is difficult because neuronal activity is spatially or temporally subsampled, while theories of SOC systems assume full sampling. To close this gap, we investigated how subsampling affects f(s) and sigma by imposing subsampling on three different SOC models. We then compared f(s) and sigma of the subsampled models with those of multielectrode local field potential (LFP) activity recorded in three macaque monkeys performing a short term memory task. Results Neither the LFP nor the subsampled SOC models showed a power law for f(s). Both, f(s) and sigma, depended sensitively on the subsampling geometry and the dynamics of the model. Only one of the SOC models, the Abelian Sandpile Model, exhibited f(s) and sigma similar to those calculated from LFP activity. Conclusions Since subsampling can prevent the observation of the characteristic power law and sigma in SOC systems, misclassifications of critical systems as sub- or supercritical are possible. Nevertheless, the system specific scaling of f(s) and sigma under subsampling conditions may prove useful to select physiologically motivated models of brain function. Models that better reproduce f(s) and sigma calculated from the physiological recordings may be selected over alternatives

    Genetic Networking of the Bemisia tabaci Cryptic Species Complex Reveals Pattern of Biological Invasions

    Get PDF
    BACKGROUND: A challenge within the context of cryptic species is the delimitation of individual species within the complex. Statistical parsimony network analytics offers the opportunity to explore limits in situations where there are insufficient species-specific morphological characters to separate taxa. The results also enable us to explore the spread in taxa that have invaded globally. METHODOLOGY/PRINCIPAL FINDINGS: Using a 657 bp portion of mitochondrial cytochrome oxidase 1 from 352 unique haplotypes belonging to the Bemisia tabaci cryptic species complex, the analysis revealed 28 networks plus 7 unconnected individual haplotypes. Of the networks, 24 corresponded to the putative species identified using the rule set devised by Dinsdale et al. (2010). Only two species proposed in Dinsdale et al. (2010) departed substantially from the structure suggested by the analysis. The analysis of the two invasive members of the complex, Mediterranean (MED) and Middle East - Asia Minor 1 (MEAM1), showed that in both cases only a small number of haplotypes represent the majority that have spread beyond the home range; one MEAM1 and three MED haplotypes account for >80% of the GenBank records. Israel is a possible source of the globally invasive MEAM1 whereas MED has two possible sources. The first is the eastern Mediterranean which has invaded only the USA, primarily Florida and to a lesser extent California. The second are western Mediterranean haplotypes that have spread to the USA, Asia and South America. The structure for MED supports two home range distributions, a Sub-Saharan range and a Mediterranean range. The MEAM1 network supports the Middle East - Asia Minor region. CONCLUSION/SIGNIFICANCE: The network analyses show a high level of congruence with the species identified in a previous phylogenetic analysis. The analysis of the two globally invasive members of the complex support the view that global invasion often involve very small portions of the available genetic diversity

    CMS physics technical design report : Addendum on high density QCD with heavy ions

    Get PDF
    Peer reviewe

    Data-analysis strategies for image-based cell profiling

    Get PDF
    Image-based cell profiling is a high-throughput strategy for the quantification of phenotypic differences among a variety of cell populations. It paves the way to studying biological systems on a large scale by using chemical and genetic perturbations. The general workflow for this technology involves image acquisition with high-throughput microscopy systems and subsequent image processing and analysis. Here, we introduce the steps required to create high-quality image-based (i.e., morphological) profiles from a collection of microscopy images. We recommend techniques that have proven useful in each stage of the data analysis process, on the basis of the experience of 20 laboratories worldwide that are refining their image-based cell-profiling methodologies in pursuit of biological discovery. The recommended techniques cover alternatives that may suit various biological goals, experimental designs, and laboratories' preferences.Peer reviewe

    Biogeography of Amazonian fishes: deconstructing river basins as biogeographic units

    Full text link
    corecore