78 research outputs found

    A generalization of the concept of distance based on the simplex inequality

    We introduce and discuss the concept of n-distance, a generalization to n elements of the classical notion of distance obtained by replacing the triangle inequality with the so-called simplex inequality $d(x_1,\ldots,x_n) \le K \sum_{i=1}^{n} d(x_1,\ldots,x_n)_i^z$ for all $x_1,\ldots,x_n,z \in X$, where $K=1$. Here $d(x_1,\ldots,x_n)_i^z$ is obtained from the function $d(x_1,\ldots,x_n)$ by setting its $i$th variable to $z$. We provide several examples of n-distances, and for each of them we investigate the infimum of the set of real numbers $K \in \,]0,1]$ for which the inequality above holds. We also introduce a generalization of the concept of n-distance obtained by replacing the sum function in the simplex inequality with an arbitrary symmetric function.
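
The simplex inequality in the abstract above can be checked by brute force on a small finite set. The following is a minimal sketch, not code from the paper: the function names and the two candidate functions (`diameter`, `squared_gap`) are illustrative assumptions chosen for this example.

```python
from itertools import product


def simplex_inequality_holds(d, points, n, K=1.0):
    """Brute-force check of d(x) <= K * sum_i d(x with entry i replaced by z).

    `d` takes n positional arguments; `points` is a finite sample of X.
    Passing the check on a sample does not prove the inequality on all of X.
    """
    for xs in product(points, repeat=n):
        lhs = d(*xs)
        for z in points:
            # Sum over i of d evaluated with the i-th variable set to z.
            rhs = sum(d(*(xs[:i] + (z,) + xs[i + 1:])) for i in range(n))
            if lhs > K * rhs + 1e-12:
                return False
    return True


def diameter(*xs):
    """Hypothetical candidate: spread of real numbers, max minus min."""
    return max(xs) - min(xs)


def squared_gap(x, y):
    """Hypothetical two-variable candidate that violates the inequality."""
    return (x - y) ** 2
```

For n = 2 the check reduces to the ordinary triangle inequality, which is why `squared_gap` fails it (take x = 0, y = 2, z = 1).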

    The Bregman chord divergence

    Distances are fundamental primitives whose choice significantly impacts the performance of algorithms in machine learning and signal processing. However, selecting the most appropriate distance for a given task is a difficult endeavor. Instead of testing one by one the entries of an ever-expanding dictionary of ad hoc distances, one prefers to consider parametric classes of distances that are exhaustively characterized by axioms derived from first principles. Bregman divergences are such a class. However, fine-tuning a Bregman divergence is delicate since it requires smoothly adjusting a functional generator. In this work, we propose an extension of Bregman divergences called the Bregman chord divergences. This new class of distances does not require gradient calculations, uses two scalar parameters that can be easily tailored in applications, and generalizes asymptotically Bregman divergences. Comment: 10 pages
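
To make the gradient-free idea concrete, here is a minimal scalar sketch. The abstract does not give the formula, so the parameterization below is an assumption: the tangent line at q used by the ordinary Bregman divergence is replaced by the chord of F through the two interpolated points (1-alpha)p + alpha*q and (1-beta)p + beta*q, and the divergence is the gap between F(p) and that chord evaluated at p. The function names are hypothetical.

```python
def bregman_divergence(F, dF, p, q):
    """Ordinary scalar Bregman divergence; needs the derivative dF of F."""
    return F(p) - F(q) - (p - q) * dF(q)


def bregman_chord_divergence(F, p, q, alpha, beta):
    """Assumed chord variant: no derivative, two scalar parameters.

    The chord passes through (a, F(a)) and (b, F(b)), where a and b are
    linear interpolations between p and q (an assumption of this sketch).
    """
    a = (1.0 - alpha) * p + alpha * q
    b = (1.0 - beta) * p + beta * q
    if a == b:
        raise ValueError("alpha and beta must give two distinct points")
    slope = (F(a) - F(b)) / (a - b)
    chord_at_p = F(a) + slope * (p - a)
    return F(p) - chord_at_p
```

Under this assumed form, F(x) = x^2 gives alpha * beta * (p - q)^2, which approaches the ordinary Bregman divergence (p - q)^2 as both parameters tend to 1.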

    Unsupervised extremely randomized trees

    In this paper we present a method to compute dissimilarities on unlabeled data, based on extremely randomized trees. This method, Unsupervised Extremely Randomized Trees, is used jointly with a novel randomized labeling scheme that we describe here and call AddCl3. Unlike existing methods such as AddCl1 and AddCl2, no synthetic instances are generated, which avoids increasing the size of the dataset. The empirical study of this method shows that Unsupervised Extremely Randomized Trees with AddCl3 provides competitive results regarding the quality of the resulting clusterings, while clearly outperforming previous similar methods in terms of running time.

    Discovery of potent, novel, non-toxic anti-malarial compounds via quantum modelling, virtual screening and in vitro experimental validation

    Background: Developing resistance towards existing anti-malarial therapies emphasizes the urgent need for new therapeutic options. Additionally, many malaria drugs in use today have high toxicity and low therapeutic indices. Gradient Biomodeling, LLC has developed a quantum-model search technology that uses quantum similarity and does not depend explicitly on chemical structure, as molecules are rigorously described in fundamental quantum attributes related to individual pharmacological properties. Therapeutic activity, as well as toxicity and other essential properties, can be analysed and optimized simultaneously, independently of one another. Such methodology is suitable for a search of novel, non-toxic, active anti-malarial compounds.
    Methods: A set of innovative algorithms is used for the fast calculation and interpretation of electron-density attributes of molecular structures at the quantum level for rapid discovery of prospective pharmaceuticals. Potency and efficacy, as well as additional physicochemical, metabolic, pharmacokinetic, safety, permeability and other properties, were characterized by the procedure. Once quantum models are developed and experimentally validated, the methodology provides a straightforward implementation for lead discovery, compound optimization and de novo molecular design.
    Results: Starting with a diverse training set of 26 well-known anti-malarial agents combined with 1730 moderately active and inactive molecules, novel compounds that have strong anti-malarial activity, low cytotoxicity and structural dissimilarity from the training set were discovered and experimentally validated. Twelve compounds were identified in silico and tested in vitro; eight of them showed anti-malarial activity (IC50 ≤ 10 μM), with six being very effective (IC50 ≤ 1 μM), and four exhibiting low nanomolar potency. The most active compounds were also tested for mammalian cytotoxicity and found to be non-toxic, with a therapeutic index of more than 6,900 for the most active compound.
    Conclusions: Gradient's metric modelling approach and electron-density molecular representations can be powerful tools in the discovery and design of novel anti-malarial compounds. Since the quantum models are agnostic of the particular biological target, the technology can account for different mechanisms of action and be used for de novo design of small molecules with activity against not only the asexual phase of the malaria parasite, but also against the liver stage of the parasite development, which may lead to true causal prophylaxis.

    Highly symmetric POVMs and their informational power

    We discuss the dependence of the Shannon entropy of normalized finite rank-1 POVMs on the choice of the input state, looking for the states that minimize this quantity. To distinguish the class of measurements where the problem can be solved analytically, we introduce the notion of highly symmetric POVMs and classify them in dimension two (for qubits). In this case we prove that the entropy is minimal, and hence the relative entropy (informational power) is maximal, if and only if the input state is orthogonal to one of the states constituting a POVM. The method used in the proof, employing the Michel theory of critical points for group action, the Hermite interpolation and the structure of invariant polynomials for unitary-antiunitary groups, can also be applied in higher dimensions and for other entropy-like functions. The links between entropy minimization and entropic uncertainty relations, the Wehrl entropy and the quantum dynamical entropy are described. Comment: 40 pages, 3 figures
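
As a numerical illustration of the qubit statement above, the sketch below computes the Shannon entropy of the outcome distribution for a simple rank-1 POVM. The setup is an assumption made for this example and is not taken from the paper: the POVM consists of k real unit vectors at angles j*pi/k (k = 3 is the so-called trine), each element weighted by 2/k, and the input is a real pure state at angle theta. The function names are hypothetical.

```python
import math


def povm_outcome_probs(theta, k=3):
    """Outcome probabilities p_j = (2/k) * |<phi_j|psi>|^2.

    psi = (cos theta, sin theta) and phi_j = (cos(j*pi/k), sin(j*pi/k));
    both are real qubit states, an assumption of this sketch.
    """
    return [(2.0 / k) * math.cos(theta - j * math.pi / k) ** 2
            for j in range(k)]


def shannon_entropy_bits(probs):
    """Shannon entropy in bits, skipping numerically zero probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 1e-15)
```

Scanning theta over a grid and comparing `shannon_entropy_bits(povm_outcome_probs(theta))` at theta = pi/2 (a state orthogonal to phi_0) with other angles gives a quick sanity check of the minimization claim for this one example.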

    svclassify: a method to establish benchmark structural variant calls

    The human genome contains variants ranging in size from small single nucleotide polymorphisms (SNPs) to large structural variants (SVs). High-quality benchmark small variant calls for the pilot National Institute of Standards and Technology (NIST) Reference Material (NA12878) have been developed by the Genome in a Bottle Consortium, but no similar high-quality benchmark SV calls exist for this genome. Since SV callers output highly discordant results, we developed methods to combine multiple forms of evidence from multiple sequencing technologies to classify candidate SVs into likely true or false positives. Our method (svclassify) calculates annotations from one or more aligned bam files from many high-throughput sequencing technologies, and then builds a one-class model using these annotations to classify candidate SVs as likely true or false positives. We first used pedigree analysis to develop a set of high-confidence breakpoint-resolved large deletions. We then used svclassify to cluster and classify these deletions as well as a set of high-confidence deletions from the 1000 Genomes Project and a set of breakpoint-resolved complex insertions from Spiral Genetics. We find that likely SVs cluster separately from likely non-SVs based on our annotations, and that the SVs cluster into different types of deletions. We then developed a supervised one-class classification method that uses a training set of random non-SV regions to determine whether candidate SVs have abnormal annotations different from most of the genome. To test this classification method, we use our pedigree-based breakpoint-resolved SVs, SVs validated by the 1000 Genomes Project, and assembly-based breakpoint-resolved insertions, along with semi-automated visualization using svviz. 
We find that candidate SVs with high scores from multiple technologies have high concordance with PCR validation and an orthogonal consensus method MetaSV (99.7% concordant), and candidate SVs with low scores are questionable. We distribute a set of 2676 high-confidence deletions and 68 high-confidence insertions with high svclassify scores from these call sets for benchmarking SV callers. We expect these methods to be particularly useful for establishing high-confidence SV calls for benchmark samples that have been characterized by multiple technologies. https://doi.org/10.1186/s12864-016-2366-