Tree pruner: An efficient tool for selecting data from a biased genetic database
Background: Large databases of genetic data are often biased in their representation. Thus, selection of genetic data with desired properties, such as evolutionary representation or shared genotypes, is problematic. Selection on the basis of epidemiological variables may not achieve the desired properties. Available automated approaches to the selection of influenza genetic data make a tradeoff between speed and simplicity on the one hand and control over quality and contents of the dataset on the other hand. A poorly chosen dataset may be detrimental to subsequent analyses. Results: We developed a tool, Tree Pruner, for obtaining a dataset with desired evolutionary properties from a large, biased genetic database. Tree Pruner provides the user with an interactive phylogenetic tree as a means of editing the initial dataset from which the tree was inferred. The tree visualization changes dynamically, using colors and shading, reflecting Tree Pruner actions. At the end of a Tree Pruner session, the editing actions are implemented in the dataset. Currently, Tree Pruner is implemented on the Influenza Research Database (IRD). The data management capabilities of the IRD allow the user to store a pruned dataset for additional pruning or for subsequent analysis. Tree Pruner can be easily adapted for use with other organisms. Conclusions: Tree Pruner is an efficient, manual tool for selecting a high-quality dataset with desired evolutionary properties from a biased database of genetic sequences. It offers an important alternative to automated approaches to the same goal, by providing the user with a dynamic, visual guide to the ongoing selection process and ultimate control over the contents (and therefore quality) of the dataset.
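To make the last step concrete (editing actions on the tree being applied to the dataset), here is a minimal Python sketch under assumed data structures: the tree is a dict mapping each internal node to its children, leaves are sequence IDs, and the user's actions are a set of nodes (single leaves or whole clades) marked for removal. The names `leaves_under` and `apply_pruning` are hypothetical; this is not Tree Pruner's code.

```python
from typing import Dict, List, Set


def leaves_under(tree: Dict[str, List[str]], node: str) -> Set[str]:
    """Return every leaf (sequence ID) in the subtree rooted at `node`."""
    children = tree.get(node)
    if not children:  # a node with no children is treated as a leaf
        return {node}
    found: Set[str] = set()
    for child in children:
        found |= leaves_under(tree, child)
    return found


def apply_pruning(tree: Dict[str, List[str]], dataset: Dict[str, str],
                  pruned_nodes: Set[str]) -> Dict[str, str]:
    """Drop every sequence lying under a pruned node; keep the rest in input order."""
    removed: Set[str] = set()
    for node in pruned_nodes:
        removed |= leaves_under(tree, node)
    return {sid: seq for sid, seq in dataset.items() if sid not in removed}


if __name__ == "__main__":
    # Toy tree: clade A is over-represented by near-identical sequences.
    tree = {"root": ["cladeA", "s4"], "cladeA": ["s1", "subA"], "subA": ["s2", "s3"]}
    dataset = {"s1": "ACGT", "s2": "ACGA", "s3": "ACGA", "s4": "TTGT"}
    # Prune the redundant sub-clade, keeping s1 as the clade's representative.
    print(sorted(apply_pruning(tree, dataset, {"subA"})))
```

Pruning a whole clade with one action is what keeps manual editing of a large, biased tree tractable.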
The 2-allocation p-hub median problem and a modified Benders decomposition method for solving hub location problems
We study the uncapacitated 2-allocation p-hub median problem (U2ApHMP), which is a special case of the well-studied hub median problem. The hub median problem designs a hub network in which the location of p hubs needs to be decided (the hubs are fully interconnected). The other nodes (known as access nodes) in the hub median problem are then allocated to one or many hubs. In the U2ApHMP, each access node is allocated to exactly two hubs. We discuss how this problem provides an alternative network design option for well-known p-hub median problems. We show its relevance and usefulness in the context of survivable network design and show that it addresses network survivability, a feature that has largely been overlooked in hub network design research to date. We show that U2ApHMP is NP-hard even for a fixed/known set of hubs. We propose a mathematical formulation and develop a modified Benders decomposition method for this problem. In this method, we convert the corresponding subproblems to minimum cost network flow problems, which allows us to solve large instances efficiently. We believe that our method, besides solving the U2ApHMP efficiently, is generalisable and can potentially be employed for solving other classes and types of hub location problems.
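For intuition about why U2ApHMP calls for a decomposition method, the following hypothetical brute-force solver enumerates every hub set and every 2-hub allocation on a toy instance. It assumes the standard p-hub median cost model with collection, transfer and distribution factors `chi`, `alpha` and `delta` (the default discount `alpha=0.75` is illustrative), that every node, hubs included, is allocated to two distinct open hubs for simplicity, and that each O-D flow takes the cheapest hub pair available to its endpoints. It is a reference evaluator only, not the paper's Benders method.

```python
from itertools import combinations, product


def route_cost(d, w, alloc, chi, alpha, delta):
    """Total cost when every O-D flow takes the cheapest hub pair allowed by `alloc`."""
    n = len(d)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if w[i][j] == 0:
                continue  # no flow, no cost
            # Collection i->k, discounted inter-hub transfer k->m, distribution m->j.
            best = min(chi * d[i][k] + alpha * d[k][m] + delta * d[m][j]
                       for k in alloc[i] for m in alloc[j])
            total += w[i][j] * best
    return total


def solve_u2aphmp(d, w, p, chi=1.0, alpha=0.75, delta=1.0):
    """Exhaustive search over hub sets and 2-hub allocations (toy sizes only)."""
    if p < 2:
        raise ValueError("each node needs two distinct hubs, so p must be >= 2")
    n = len(d)
    best_cost, best_sol = float("inf"), None
    for hubs in combinations(range(n), p):
        pairs = list(combinations(hubs, 2))  # every possible hub pair for one node
        # One pair per node: C(p, 2) ** n allocations for this hub set alone.
        for alloc in product(pairs, repeat=n):
            cost = route_cost(d, w, alloc, chi, alpha, delta)
            if cost < best_cost:
                best_cost, best_sol = cost, (hubs, alloc)
    return best_cost, best_sol


if __name__ == "__main__":
    # Four nodes on a line, unit flow between every distinct pair.
    d = [[abs(i - j) for j in range(4)] for i in range(4)]
    w = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
    print(solve_u2aphmp(d, w, p=3))
```

The search space grows as C(n, p) × C(p, 2)^n, so exhaustive search is hopeless beyond toy sizes; this is the gap that exact methods such as the modified Benders decomposition fill.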
Practical algorithms for multivariate rational approximation
17 USC 105 interim-entered record; under review. The article of record as published may be found at https://doi.org/10.1016/j.cpc.2020.107663
We present two approaches for computing rational approximations to multivariate functions, motivated by their effectiveness as surrogate models for high-energy physics (HEP) applications. Our first approach builds on the Stieltjes process to efficiently and robustly compute the coefficients of the rational approximation. Our second approach is based on an optimization formulation that allows us to include structural constraints on the rational approximation (in particular, constraints demanding the absence of singularities), resulting in a semi-infinite optimization problem that we solve using an outer approximation approach. We present results for synthetic and real-life HEP data, and we compare the approximation quality of our approaches with that of traditional polynomial approximations.
This work was supported by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research, under Contract DE-AC02-06CH11357. Support for this work was provided through the SciDAC program funded by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research. This work was also supported by the U.S. Department of Energy through grant DE-FG02-05ER25694, and by Fermi Research Alliance, LLC, United States of America under Contract No. DE-AC02-07CH11359 with the U.S. Department of Energy, Office of Science, Office of High Energy Physics. This work was supported in part by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research and Office of Nuclear Physics, Scientific Discovery through Advanced Computing (SciDAC) program through the FASTMath Institute under Contract No. DE-AC02-05CH11231 at Lawrence Berkeley National Laboratory.
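The paper's two methods are not reproduced here. As a simpler point of comparison, the sketch below fits a bivariate rational function p(x, y)/q(x, y) by linearized least squares: the constant term of q is fixed to 1, and p(x_i) - f_i q(x_i) ≈ 0 is solved as one linear system with `numpy.linalg.lstsq`. This is a swapped-in textbook technique, not the Stieltjes process or the semi-infinite formulation, and it places no pole-free constraint on q, which is the kind of structural constraint the second approach adds. The names `monomials`, `basis` and `fit_rational` are hypothetical.

```python
import numpy as np


def monomials(deg):
    """Exponent pairs (a, b) with a + b <= deg; the first one is always (0, 0)."""
    return [(a, b) for a in range(deg + 1) for b in range(deg + 1 - a)]


def basis(X, exps):
    """Evaluate x**a * y**b for each exponent pair: shape (n_points, n_terms)."""
    return np.column_stack([X[:, 0] ** a * X[:, 1] ** b for a, b in exps])


def fit_rational(X, f, m, n):
    """Fit p/q with total degrees m (numerator) and n (denominator); q's constant term is 1."""
    pe, qe = monomials(m), monomials(n)
    P, Q = basis(X, pe), basis(X, qe)
    # p(x) - f * (1 + sum_k b_k q_k(x)) = 0  rearranges to  [P, -f*Q[:, 1:]] @ [a; b] = f
    A = np.hstack([P, -f[:, None] * Q[:, 1:]])
    coef, *_ = np.linalg.lstsq(A, f, rcond=None)
    a = coef[:len(pe)]
    b = np.concatenate([[1.0], coef[len(pe):]])

    def approx(Xnew):
        # Note: nothing prevents q from vanishing somewhere in the domain.
        return (basis(Xnew, pe) @ a) / (basis(Xnew, qe) @ b)

    return approx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(50, 2))
    f = 1.0 / (1.0 + X[:, 0] ** 2 + X[:, 1] ** 2)  # exactly rational with m=0, n=2
    r = fit_rational(X, f, m=0, n=2)
    print(np.max(np.abs(r(X) - f)))
```

Because the target here is itself rational with matching degrees, the linear system has an exact solution; for general HEP data the fit is only approximate and may introduce spurious poles.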
BROOD: Bilevel and Robust Optimization and Outlier Detection for Efficient Tuning of High-Energy Physics Event Generators
The parameters in Monte Carlo (MC) event generators are tuned on experimental measurements by evaluating the goodness of fit between the data and the MC predictions. The relative importance of each measurement is adjusted manually in an often time-consuming, iterative process to meet different experimental needs. In this work, we introduce several optimization formulations and algorithms with new decision criteria for streamlining and automating this process. These algorithms are designed for two formulations: bilevel optimization and robust optimization. Both formulations are applied to the datasets used in the ATLAS A14 tune and to the dedicated hadronization datasets generated by the SHERPA generator, respectively. The corresponding tuned generator parameters are compared using three metrics. We compare the quality of our automatic tunes to the published ATLAS A14 tune. Moreover, we analyze the impact of a pre-processing step that excludes data that cannot be described by the physics models used in the MC event generators.
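A minimal sketch of the kind of pre-processing step described above, under an assumed criterion: a bin is treated as describable if at least one sampled generator-parameter point predicts it within `k` combined standard deviations of the measurement. The criterion, the function name `describable_bins` and the default `k=3.0` are hypothetical and are not BROOD's decision rule.

```python
import numpy as np


def describable_bins(data, data_err, mc_pred, mc_err, k=3.0):
    """Boolean mask over bins: True if some parameter point fits the bin within k sigma.

    data, data_err: arrays of shape (n_bins,)
    mc_pred, mc_err: arrays of shape (n_param_points, n_bins)
    """
    # Combine experimental and MC uncertainties in quadrature, per point and bin.
    sigma = np.sqrt(data_err[None, :] ** 2 + mc_err ** 2)
    pulls = np.abs(mc_pred - data[None, :]) / sigma
    # A bin survives if at least one sampled parameter point comes close enough.
    return (pulls <= k).any(axis=0)


if __name__ == "__main__":
    data = np.array([1.0, 1.0, 1.0])
    err = np.array([0.1, 0.1, 0.1])
    mc = np.array([[1.05, 2.0, 1.0],
                   [0.90, 3.0, 1.2]])  # two sampled parameter points
    print(describable_bins(data, err, mc, np.zeros_like(mc)))
```

Bins flagged `False` would be left out of the goodness-of-fit before tuning, so that the optimizer is not pulled toward data the model cannot reproduce at any parameter value.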
Apprentice for Event Generator Tuning
Apprentice is a tool developed for event generator tuning. It contains a range of conceptual improvements and extensions over the tuning tool Professor. Its core functionality remains the construction of a multivariate analytic surrogate model of computationally expensive Monte-Carlo event generator predictions. The surrogate model is used for numerical optimization in chi-square minimization and likelihood evaluation. Apprentice also introduces algorithms to automate the selection of observable weights to minimize the effect of mis-modeling in the event generators. We illustrate our improvements for the task of MC-generator tuning and limit setting.
Comment: 9 pages, 2 figures, submitted to the 25th International Conference on Computing in High-Energy and Nuclear Physics
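To illustrate the surrogate workflow, here is a one-parameter toy in Python: a quadratic fitted per bin with `np.polyfit` stands in for Apprentice's multivariate surrogate, `expensive_generator` is a hypothetical placeholder for an MC run, and a dense grid search replaces a real numerical optimizer. The observable `weights` enter the chi-square as per-bin multipliers. None of these names are Apprentice's API.

```python
import numpy as np


def expensive_generator(theta):
    """Hypothetical stand-in for an MC run: predictions for three observable bins."""
    return np.array([1.0 + 0.5 * theta, 2.0 - theta ** 2, 0.3 * theta + 0.1])


def build_surrogates(thetas, deg=2):
    """Run the 'generator' at a few anchor points and fit one polynomial per bin."""
    runs = np.array([expensive_generator(t) for t in thetas])  # (n_runs, n_bins)
    return [np.polyfit(thetas, runs[:, b], deg) for b in range(runs.shape[1])]


def chi2(theta, surrogates, data, err, weights):
    """Weighted chi-square evaluated on the cheap surrogate, not the generator."""
    pred = np.array([np.polyval(c, theta) for c in surrogates])
    return float(np.sum(weights * ((pred - data) / err) ** 2))


def tune(surrogates, data, err, weights, lo=-2.0, hi=2.0, n=4001):
    """Grid-search minimizer of the weighted chi-square over [lo, hi]."""
    grid = np.linspace(lo, hi, n)
    values = [chi2(t, surrogates, data, err, weights) for t in grid]
    return float(grid[int(np.argmin(values))])


if __name__ == "__main__":
    thetas = np.linspace(-2.0, 2.0, 7)        # only 7 'expensive' runs
    surrogates = build_surrogates(thetas)
    data = expensive_generator(0.7)           # pseudo-data at a known parameter
    print(tune(surrogates, data, np.full(3, 0.05), np.ones(3)))
```

The design point is that the thousands of chi-square evaluations in the optimizer hit only the polynomial surrogate; the expensive generator is called just once per anchor point.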
Identification of broadly neutralizing antibody epitopes in the HIV-1 envelope glycoprotein using evolutionary models
Background: Identification of the epitopes targeted by antibodies that can neutralize diverse HIV-1 strains can provide important clues for the design of a preventative vaccine. Methods: We have developed a computational approach that can identify key amino acids within the HIV-1 envelope glycoprotein that influence sensitivity to broadly cross-neutralizing antibodies. Given a sequence alignment and neutralization titers for a panel of viruses, the method works by fitting a phylogenetic model that allows the amino acid frequencies at each site to depend on neutralization sensitivities. Sites at which viral evolution influences neutralization sensitivity were identified using Bayes factors (BFs) to compare the fit of this model to that of a null model in which sequences evolved independently of antibody sensitivity. Conformational epitopes were identified with a Metropolis algorithm that searched for a cluster of sites with large Bayes factors on the tertiary structure of the viral envelope. Results: We applied our method to ID50 neutralization data generated from seven HIV-1 subtype C serum samples with neutralization breadth that had been tested against a multi-clade panel of 225 pseudoviruses for which envelope sequences were also available. For each sample, between two and four sites were identified that were strongly associated with neutralization sensitivity (2ln(BF) > 6), a subset of which were experimentally confirmed using site-directed mutagenesis. Conclusions: Our results provide strong support for the use of evolutionary models applied to cross-sectional viral neutralization data to identify the epitopes of serum antibodies that confer neutralization breadth
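The threshold 2ln(BF) > 6 is equivalent to BF > e^3 ≈ 20.1. The Metropolis search for a conformational epitope can be sketched as follows, under simplifying assumptions that are not the authors' model: each site has 3-D coordinates and a score s = 2ln(BF), a candidate epitope is the set of sites within `radius` of a center site, its score is the sum of member scores, and each proposal moves the center to another site chosen uniformly at random (a symmetric proposal, so the plain Metropolis acceptance rule applies). The function names are hypothetical.

```python
import math
import random


def cluster(coords, center, radius):
    """Indices of all sites within `radius` of the site at index `center`."""
    c = coords[center]
    return [i for i, x in enumerate(coords) if math.dist(x, c) <= radius]


def metropolis_epitope(coords, scores, radius, n_iter=2000, temp=1.0, seed=0):
    """Return the best-scoring cluster visited by a Metropolis walk over centers."""
    rng = random.Random(seed)

    def score(center):
        return sum(scores[i] for i in cluster(coords, center, radius))

    current = rng.randrange(len(coords))
    cur_score = score(current)
    best, best_score = current, cur_score
    for _ in range(n_iter):
        # Uniform proposal over the other sites (symmetric, requires >= 2 sites).
        proposal = rng.randrange(len(coords) - 1)
        if proposal >= current:
            proposal += 1
        new_score = score(proposal)
        # Metropolis rule: always accept uphill moves; accept downhill moves
        # with probability exp((new - current) / temp).
        if new_score >= cur_score or rng.random() < math.exp((new_score - cur_score) / temp):
            current, cur_score = proposal, new_score
            if cur_score > best_score:
                best, best_score = current, cur_score
    return cluster(coords, best, radius), best_score


if __name__ == "__main__":
    coords = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (20, 0, 0), (21, 0, 0), (20, 1, 0)]
    scores = [8.0, 7.0, 9.0, 0.5, 1.0, 0.2]  # hypothetical 2ln(BF) values
    print(metropolis_epitope(coords, scores, radius=2.0))
```

The temperature `temp` controls how readily the walk leaves a good cluster; a real search over an envelope structure would use a richer cluster definition than a fixed-radius ball.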
Recurrent Signature Patterns in HIV-1 B Clade Envelope Glycoproteins Associated with either Early or Chronic Infections
Here we have identified HIV-1 B clade Envelope (Env) amino acid signatures from early in infection that may be favored at transmission, as well as patterns of recurrent mutation in chronic infection that may reflect common pathways of immune evasion. To accomplish this, we compared thousands of sequences derived by single genome amplification from several hundred individuals that were sampled either early in infection or were chronically infected. Samples were divided at the outset into hypothesis-forming and validation sets, and we used phylogenetically corrected statistical strategies to identify signatures, systematically scanning all of Env. Signatures included single amino acids, glycosylation motifs, and multi-site patterns based on functional or structural groupings of amino acids. We identified signatures near the CCR5 co-receptor-binding region, near the CD4 binding site, and in the signal peptide and cytoplasmic domain, which may influence Env expression and processing. Two signature patterns associated with transmission were particularly interesting. The first was the most statistically robust signature, located at position 12 in the signal peptide. The second was the loss of an N-linked glycosylation site at positions 413–415; the presence of this site has been recently found to be associated with escape from potent and broad neutralizing antibodies, consistent with enabling a common pathway for immune escape during chronic infection. Its recurrent loss in early infection suggests it may impact fitness at the time of transmission or during early viral expansion. The signature patterns we identified implicate Env expression levels in selection at viral transmission or in early expansion, and suggest that immune evasion patterns that recur in many individuals during chronic infection when antibodies are present can be selected against when the infection is being established prior to the adaptive immune response.
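N-linked glycosylation sites follow the sequon N-X-S/T, where X is any residue except proline. A minimal Python scan for such sites is sketched below, assuming 1-based positions in an unaligned protein sequence; mapping to reference coordinates such as positions 413–415 would require an alignment, which this sketch omits. The names `sequon_starts` and `has_sequon_at` are hypothetical.

```python
import re

# N, then any residue except P, then S or T. The zero-width lookahead lets
# overlapping sequons (e.g. "NNST") all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_starts(protein):
    """1-based positions of the N in every N-X-S/T (X != P) sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


def has_sequon_at(protein, start):
    """True if a sequon begins at 1-based position `start`."""
    return start in sequon_starts(protein)


if __name__ == "__main__":
    print(sequon_starts("ANKTANPS"))  # the second N is followed by P, so not a sequon
    print(sequon_starts("NNST"))
```

Comparing `has_sequon_at` between sequences at the same aligned position is one simple way to spot the kind of glycosylation-site loss described above.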