A generalization of the concept of distance based on the simplex inequality
We introduce and discuss the concept of n-distance, a generalization to n elements of the classical notion of distance obtained by replacing the triangle inequality with the so-called simplex inequality
$$d(x_1,\ldots,x_n)\leq K\sum_{i=1}^{n} d(x_1,\ldots,x_n)_i^z,\qquad x_1,\ldots,x_n,z\in X,$$
where $K=1$. Here $d(x_1,\ldots,x_n)_i^z$ is obtained from the function $d(x_1,\ldots,x_n)$ by setting its $i$th variable to $z$. We provide several examples of n-distances, and for each of them we investigate the infimum of the set of real numbers $K\in\,]0,1]$ for which the inequality above holds. We also introduce a generalization of the concept of n-distance obtained by replacing the sum function in the simplex inequality with an arbitrary symmetric function.
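To make the simplex inequality concrete, here is a small Python sketch. It uses one illustrative n-distance on the real line that we chose ourselves, the range $d(x_1,\ldots,x_n)=\max_i x_i-\min_i x_i$ (not necessarily one of the paper's examples). For this choice the inequality holds with $K=1$: replacing the maximizer and the minimizer by $z$ already contributes at least $|z-\min|+|\max-z|\geq\max-\min$. The sketch also records the largest ratio of the left-hand side to the sum on the right over random samples; that value is only a sampled lower bound on the infimum of admissible constants $K$, not its exact value. All helper names are hypothetical.

```python
import random
from typing import Callable, List, Sequence

NDistance = Callable[[Sequence[float]], float]


def range_distance(xs: Sequence[float]) -> float:
    """Illustrative n-distance on the reals (our choice): max minus min."""
    return max(xs) - min(xs)


def substitute(xs: Sequence[float], i: int, z: float) -> List[float]:
    """Arguments of d(x_1,...,x_n)_i^z: a copy of xs with the i-th variable set to z."""
    ys = list(xs)
    ys[i] = z
    return ys


def simplex_ratio(d: NDistance, xs: Sequence[float], z: float) -> float:
    """d(x) / sum_i d(x)_i^z; the simplex inequality with constant K holds here iff ratio <= K."""
    lhs = d(xs)
    rhs = sum(d(substitute(xs, i, z)) for i in range(len(xs)))
    if rhs == 0.0:
        # With rhs = 0, no finite K works unless lhs = 0 as well.
        return 0.0 if lhs == 0.0 else float("inf")
    return lhs / rhs


def sampled_lower_bound_on_k(d: NDistance, n: int, trials: int = 20_000, seed: int = 0) -> float:
    """Largest ratio seen over random (x, z): a lower bound on the best constant K."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        z = rng.uniform(-1.0, 1.0)
        worst = max(worst, simplex_ratio(d, xs, z))
    return worst


if __name__ == "__main__":
    print(simplex_ratio(range_distance, [0.0, 1.0, 2.0], 1.0))  # 2 / (1 + 2 + 1) = 0.5
    print(sampled_lower_bound_on_k(range_distance, n=3))
```

Sampling cannot certify the infimum; it only shows that no sampled configuration violates the inequality for a given $K$.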
The Bregman chord divergence
Distances are fundamental primitives whose choice significantly impacts the
performance of algorithms in machine learning and signal processing. However,
selecting the most appropriate distance for a given task is a difficult endeavor.
Instead of testing one by one the entries of an ever-expanding dictionary of
ad hoc distances, it is preferable to consider parametric classes of
distances that are exhaustively characterized by axioms derived from first
principles. Bregman divergences are such a class. However, fine-tuning a Bregman
divergence is delicate, since it requires smoothly adjusting a functional
generator. In this work, we propose an extension of Bregman divergences called
the Bregman chord divergences. This new class of distances does not require
gradient calculations, uses two scalar parameters that can be easily tailored
in applications, and asymptotically generalizes Bregman divergences.
Comment: 10 pages
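The gradient-free idea can be illustrated with a short Python sketch under our own parameterization, which may differ from the paper's conventions. Restrict a convex generator F to the segment from q to p via f(t) = F(q + t(p - q)). The Bregman divergence F(p) - F(q) - <p - q, grad F(q)> equals f(1) minus the tangent line of f at t = 0, evaluated at t = 1. Replacing that tangent by the chord through t = a and t = b (0 <= a < b < 1) needs only evaluations of F, stays nonnegative for convex F, and tends to the tangent as a, b -> 0. For the squared Euclidean generator this works out to (1 - a)(1 - b)||p - q||^2. The function name `chord_divergence` and the parameters `a`, `b` are hypothetical.

```python
from typing import Callable, List, Sequence

Generator = Callable[[Sequence[float]], float]


def chord_divergence(F: Generator, p: Sequence[float], q: Sequence[float], a: float, b: float) -> float:
    """Gradient-free chord variant of the Bregman divergence B_F(p:q) (our parameterization).

    The tangent of t -> F(q + t(p - q)) at t = 0 is replaced by the chord through t = a and t = b.
    """
    if not 0.0 <= a < b < 1.0:
        raise ValueError("expected 0 <= a < b < 1")

    def f(t: float) -> float:
        # F restricted to the line through q (t = 0) and p (t = 1).
        point: List[float] = [qi + t * (pi - qi) for pi, qi in zip(p, q)]
        return F(point)

    slope = (f(b) - f(a)) / (b - a)          # chord slope, no gradient needed
    chord_at_p = f(a) + (1.0 - a) * slope    # chord extended to t = 1, i.e. to p
    return f(1.0) - chord_at_p


def squared_norm(x: Sequence[float]) -> float:
    """Generator whose Bregman divergence is the squared Euclidean distance."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    p, q = [1.0, 2.0], [0.0, 0.0]
    print(chord_divergence(squared_norm, p, q, 0.25, 0.5))   # (1 - 0.25)(1 - 0.5) * 5 = 1.875
    print(chord_divergence(squared_norm, p, q, 1e-6, 2e-6))  # close to ||p - q||^2 = 5
```

The two scalars a and b play the role of tunable parameters here: moving them toward 0 recovers the ordinary Bregman divergence, while larger values give a coarser, chord-based comparison.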
Unsupervised extremely randomized trees
In this paper we present a method, based on extremely randomized trees, to compute dissimilarities on unlabeled data. This method, Unsupervised Extremely Randomized Trees, is used jointly with a novel randomized labeling scheme, introduced here, that we call AddCl3. Unlike existing methods such as AddCl1 and AddCl2, AddCl3 generates no synthetic instances, thus avoiding an increase in the size of the dataset. Our empirical study shows that Unsupervised Extremely Randomized Trees with AddCl3 yield clusterings of competitive quality, while clearly outperforming previous similar methods in terms of running time.
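The general idea of a tree-based dissimilarity can be sketched with standard-library Python. As a simplification of our own, every split picks a random feature and a random threshold (Extra-Trees with a single candidate split), so no class labels, and hence no labeling scheme such as AddCl3, are needed; the paper's method differs on this point. Two instances are considered similar in proportion to the number of trees in which they land in the same leaf. The function names and the parameters `n_trees` and `min_leaf` are hypothetical.

```python
import random
from typing import List, Sequence

Matrix = List[List[float]]


def random_tree_leaves(X: Sequence[Sequence[float]], idx: List[int],
                       rng: random.Random, min_leaf: int) -> List[List[int]]:
    """Partition the instances in idx with random axis-aligned splits; return the leaves."""
    if len(idx) <= min_leaf:
        return [idx]
    features = list(range(len(X[0])))
    rng.shuffle(features)
    for j in features:
        lo = min(X[i][j] for i in idx)
        hi = max(X[i][j] for i in idx)
        if lo == hi:
            continue  # constant feature at this node: cannot split on it
        t = rng.uniform(lo, hi)  # random threshold, no label-based split score
        left = [i for i in idx if X[i][j] < t]
        right = [i for i in idx if X[i][j] >= t]
        if left and right:
            return (random_tree_leaves(X, left, rng, min_leaf)
                    + random_tree_leaves(X, right, rng, min_leaf))
    return [idx]  # no feature produced a non-trivial split


def tree_dissimilarity(X: Sequence[Sequence[float]], n_trees: int = 50,
                       min_leaf: int = 2, seed: int = 0) -> Matrix:
    """1 - (fraction of trees in which two instances share a leaf)."""
    rng = random.Random(seed)
    n = len(X)
    together = [[0] * n for _ in range(n)]
    for _ in range(n_trees):
        for leaf in random_tree_leaves(X, list(range(n)), rng, min_leaf):
            for i in leaf:
                for k in leaf:
                    together[i][k] += 1
    return [[1.0 - together[i][k] / n_trees for k in range(n)] for i in range(n)]


if __name__ == "__main__":
    X = [[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.1, 10.0]]
    for row in tree_dissimilarity(X):
        print(["%.2f" % v for v in row])
```

The resulting matrix can be passed to any clustering algorithm that accepts precomputed dissimilarities; since no synthetic instances are added, its size is n by n for n input instances.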
Discovery of potent, novel, non-toxic anti-malarial compounds via quantum modelling, virtual screening and in vitro experimental validation
Background: Developing resistance towards existing anti-malarial therapies emphasizes the urgent need for new therapeutic options. Additionally, many malaria drugs in use today have high toxicity and low therapeutic indices. Gradient Biomodeling, LLC has developed a quantum-model search technology that uses quantum similarity and does not depend explicitly on chemical structure, as molecules are rigorously described by fundamental quantum attributes related to individual pharmacological properties. Therapeutic activity, toxicity and other essential properties can be analysed and optimized simultaneously and independently of one another. Such a methodology is suitable for a search for novel, non-toxic, active anti-malarial compounds.
Methods: A set of innovative algorithms is used for the fast calculation and interpretation of electron-density attributes of molecular structures at the quantum level, enabling rapid discovery of prospective pharmaceuticals. Potency and efficacy, as well as additional physicochemical, metabolic, pharmacokinetic, safety, permeability and other properties, were characterized by the procedure. Once quantum models are developed and experimentally validated, the methodology provides a straightforward implementation for lead discovery, compound optimization and de novo molecular design.
Results: Starting with a diverse training set of 26 well-known anti-malarial agents combined with 1730 moderately active and inactive molecules, novel compounds that have strong anti-malarial activity, low cytotoxicity and structural dissimilarity from the training set were discovered and experimentally validated. Twelve compounds were identified in silico and tested in vitro; eight of them showed anti-malarial activity (IC50 ≤ 10 μM), with six being very effective (IC50 ≤ 1 μM) and four exhibiting low nanomolar potency. The most active compounds were also tested for mammalian cytotoxicity and found to be non-toxic, with a therapeutic index of more than 6,900 for the most active compound.
Conclusions: Gradient's metric modelling approach and electron-density molecular representations can be powerful tools in the discovery and design of novel anti-malarial compounds. Since the quantum models are agnostic of the particular biological target, the technology can account for different mechanisms of action and can be used for de novo design of small molecules active not only against the asexual phase of the malaria parasite but also against the liver stage of parasite development, which may lead to true causal prophylaxis.
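The potency thresholds quoted in the Results, and the therapeutic index, can be made explicit with a small Python helper. We assume the common convention that the therapeutic (selectivity) index is the mammalian cytotoxic concentration divided by the anti-malarial IC50, both in the same units; the abstract does not spell out its definition. The function names and the input values in the usage line are hypothetical, not data from the study.

```python
def potency_class(ic50_um: float) -> str:
    """Bin an anti-malarial IC50 (micromolar) using the thresholds quoted in the abstract."""
    if ic50_um <= 1.0:
        return "very effective"  # IC50 <= 1 uM
    if ic50_um <= 10.0:
        return "active"  # IC50 <= 10 uM
    return "not active at the 10 uM cut-off"


def therapeutic_index(cytotoxic_conc_um: float, ic50_um: float) -> float:
    """Assumed definition: mammalian cytotoxic concentration / parasite IC50 (same units)."""
    if ic50_um <= 0.0:
        raise ValueError("IC50 must be positive")
    return cytotoxic_conc_um / ic50_um


if __name__ == "__main__":
    # Hypothetical compound: IC50 = 0.005 uM (5 nM), cytotoxic concentration = 40 uM.
    print(potency_class(0.005), therapeutic_index(40.0, 0.005))
```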
Highly symmetric POVMs and their informational power
We discuss the dependence of the Shannon entropy of normalized finite rank-1
POVMs on the choice of the input state, looking for the states that minimize
this quantity. To distinguish the class of measurements where the problem can
be solved analytically, we introduce the notion of highly symmetric POVMs and
classify them in dimension two (for qubits). In this case we prove that the
entropy is minimal, and hence the relative entropy (informational power) is
maximal, if and only if the input state is orthogonal to one of the states
constituting the POVM. The method used in the proof, employing the Michel theory
of critical points for group actions, Hermite interpolation and the
structure of invariant polynomials for unitary-antiunitary groups, can also be
applied in higher dimensions and for other entropy-like functions. The links
between entropy minimization and entropic uncertainty relations, the Wehrl
entropy and the quantum dynamical entropy are described.
Comment: 40 pages, 3 figures
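As a numerical illustration in Python, take the tetrahedral SIC-POVM on a qubit, with elements $E_j=\tfrac12|\psi_j\rangle\langle\psi_j|$ whose Bloch vectors $n_j$ form a regular tetrahedron; we assume it belongs to the highly symmetric POVMs covered by the classification. For a pure input state with Bloch vector $r$, the outcome probabilities are $p_j=(1+r\cdot n_j)/4$. Choosing $r=-n_k$ (the state orthogonal to $|\psi_k\rangle$) gives $p_k=0$ and $p_j=1/3$ for $j\neq k$, so $H=\ln 3$; a random search over the Bloch sphere should find nothing lower, consistent with the statement above. The function names are ours.

```python
import math
import random
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

_S = 1.0 / math.sqrt(3.0)
# Bloch vectors of the tetrahedral SIC-POVM (pairwise dot products -1/3).
TETRA: List[Vec3] = [(_S, _S, _S), (_S, -_S, -_S), (-_S, _S, -_S), (-_S, -_S, _S)]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def outcome_probs(r: Sequence[float]) -> List[float]:
    """p_j = Tr(E_j rho) = (1 + r . n_j) / 4 for a qubit state with Bloch vector r."""
    return [(1.0 + dot(r, n)) / 4.0 for n in TETRA]


def shannon(p: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability outcomes contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def random_pure_state(rng: random.Random) -> Vec3:
    """Uniform point on the Bloch sphere (normalized Gaussian vector)."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(dot(v, v))
        if norm > 1e-12:
            return (v[0] / norm, v[1] / norm, v[2] / norm)


if __name__ == "__main__":
    orthogonal_to_first = tuple(-c for c in TETRA[0])
    print(shannon(outcome_probs(orthogonal_to_first)), math.log(3.0))  # both about 1.0986
    rng = random.Random(0)
    best = min(shannon(outcome_probs(random_pure_state(rng))) for _ in range(20_000))
    print(best)  # sampled minimum; not expected to fall below ln 3
```

Since the outcome distribution of the maximally mixed state is uniform here, the corresponding relative entropy is $\ln 4 - H$ (our reading), so minimizing $H$ maximizes it.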
svclassify: a method to establish benchmark structural variant calls
The human genome contains variants ranging in size from small single nucleotide polymorphisms (SNPs) to large structural variants (SVs). High-quality benchmark small variant calls for the pilot National Institute of Standards and Technology (NIST) Reference Material (NA12878) have been developed by the Genome in a Bottle Consortium, but no similar high-quality benchmark SV calls exist for this genome. Since SV callers output highly discordant results, we developed methods to combine multiple forms of evidence from multiple sequencing technologies to classify candidate SVs as likely true or false positives. Our method (svclassify) calculates annotations from one or more aligned BAM files from many high-throughput sequencing technologies, and then builds a one-class model using these annotations to classify candidate SVs as likely true or false positives. We first used pedigree analysis to develop a set of high-confidence breakpoint-resolved large deletions. We then used svclassify to cluster and classify these deletions, as well as a set of high-confidence deletions from the 1000 Genomes Project and a set of breakpoint-resolved complex insertions from Spiral Genetics. We find that likely SVs cluster separately from likely non-SVs based on our annotations, and that the SVs cluster into different types of deletions. We then developed a supervised one-class classification method that uses a training set of random non-SV regions to determine whether candidate SVs have abnormal annotations different from most of the genome. To test this classification method, we used our pedigree-based breakpoint-resolved SVs, SVs validated by the 1000 Genomes Project, and assembly-based breakpoint-resolved insertions, along with semi-automated visualization using svviz.
We find that candidate SVs with high scores from multiple technologies have high concordance with PCR validation and with an orthogonal consensus method, MetaSV (99.7% concordant), and that candidate SVs with low scores are questionable. We distribute a set of 2676 high-confidence deletions and 68 high-confidence insertions with high svclassify scores from these call sets for benchmarking SV callers. We expect these methods to be particularly useful for establishing high-confidence SV calls for benchmark samples that have been characterized by multiple technologies.
https://doi.org/10.1186/s12864-016-2366-
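The one-class idea (score each candidate by how far its annotations lie from those of random non-SV regions) can be sketched in Python. As a stand-in for the paper's model, which we do not reproduce, this sketch fits a per-annotation mean and standard deviation on background regions and scores a candidate by the Euclidean norm of its z-scores, i.e. a diagonal Mahalanobis distance. The function names and the annotation values in the usage lines (imagined as, say, read depth and discordant read pairs) are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Model = Tuple[List[float], List[float]]


def fit_background(rows: Sequence[Sequence[float]]) -> Model:
    """Per-annotation mean and standard deviation over random non-SV regions."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    # Guard against zero variance so the z-score below stays finite.
    sds = [pstdev(c) or 1.0 for c in cols]
    return mus, sds


def anomaly_score(x: Sequence[float], model: Model) -> float:
    """Norm of z-scores: larger means the candidate looks less like the background."""
    mus, sds = model
    return math.sqrt(sum(((xi - m) / s) ** 2 for xi, m, s in zip(x, mus, sds)))


if __name__ == "__main__":
    background = [[30.0, 1.0], [32.0, 0.0], [28.0, 2.0], [30.0, 1.0]]  # hypothetical annotations
    model = fit_background(background)
    print(anomaly_score([30.0, 1.0], model))   # typical region: score 0
    print(anomaly_score([10.0, 20.0], model))  # unusual region: large score
```

A threshold on such a score would then separate "abnormal" candidates from background-like ones; how svclassify chooses its own scores and cut-offs is described in the paper, not here.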