
Feature subset selection using support-vector machines by averaging over probabilistic genotype data

Author: Herrera Luque Francisco José
Publication date: 30/10/2012

Despite the grand promises of the postgenomic era, such as personalized prevention, diagnosis, drugs, and treatments, the landscape of biomedicine looks more and more complex. The fulfillment of these promises for diseases significant in public health requires new approaches to induction for statistical and causal inferences from observations and interventions. Within the biomedical world an important response to this challenge is the mapping and relatively cheap measuring of genetic variations, such as single nucleotide polymorphisms (SNPs). The recent mapping of genetic variations has opened a new dimension in postgenomic research at all phenotypic levels, such as genomic, proteomic, and clinical, and it has sparked a series of Genetic Association Studies (GAS), based on the application of machine learning and data mining techniques. To overcome such problems, different strategies are being investigated within the research community. The aim of this thesis work is to contribute to progress in this field by taking a step towards the solution. I have investigated the machine learning and data mining algorithms suitable for this task and the state of the art of their currently available implementations intended for biomedical research applications. As a result I have proposed a solution strategy, and chosen and extended the functionality of the Java-ML library, an open source machine learning library written in Java, implementing some missing algorithms and functionality that is necessary for the proposed approach. This thesis work is structured into three main blocks. Section 3 “An approach to the use of machine learning techniques with genotype data” addresses the problem faced and the proposed solution.
It begins with the definition of some introductory GAS concepts and the description of the solution strategy, and elaborates in subsequent subsections on the theoretical underpinnings of the algorithms that make up the solution. Specifically, the first subsection, “The feature selection problem in the bioinformatics domain”, justifies the necessity of reducing the dimensionality of data sets in order to allow for acceptable performance when machine learning techniques are applied in the broader field of bioinformatics, and establishes a comparative taxonomy of the currently available techniques. In the second subsection, entitled “Feature selection using support-vector machines”, the idea behind support-vector machine classifiers and their application to feature subset selection is defined, while the third subsection, “Ranking fusion as averaging technique: Markov chain based algorithms”, describes the ranking fusion algorithms whose implementation has been chosen for the combination of the feature subsets obtained from different data sets. Section 4 “Analysis of available tools for experimental design” analyses the available tools suitable for experimental design in GAS based on machine learning techniques. In the first subsection, “Advantages of high level languages for machine learning algorithms”, the convenience of using high level languages for the kind of applications we are working on is discussed. In the second subsection, “Machine learning algorithms implementations in Java”, the choice of the Java language is justified, followed by an analysis of the currently available implementations of machine learning algorithms in this language that are worth considering for our purposes, namely WEKA, RapidMiner and Java-ML.
Section 5 “Implemented extensions to the Java-ML library” describes the functionalities that have been added to provide a framework suitable for the design of GAS experiments that test the proposed approach. The “Missing values imputation: the dataset.tools package” subsection focuses on data set handling functionalities, while the “Averaging through ranking fusion: rankingfusion and rankingfusion.scoring package” subsection details the implementations of the ranking fusion algorithms. Finally, the “How to use the code” subsection is a tutorial on how to use both the library and its extension for the development of applications. In addition to these main blocks, a final section called “Future Work” reflects on how the developed work can be used by GAS domain experts to evaluate the usefulness of the proposed technique.

Ingeniería de Telecomunicació
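As an illustration of Markov chain based ranking fusion, the sketch below merges several feature rankings into one. The abstract does not say which Markov chain variant the thesis implements, so this uses an MC4-style chain as an assumption: from the current item, pick another item uniformly at random and move to it only if a majority of the input rankings place it higher. The function name `mc4_fuse`, the `damping` teleport factor, and the fixed iteration count are hypothetical choices for this example and are not taken from Java-ML.

```python
def mc4_fuse(rankings, damping=0.85, iters=200):
    """Fuse full rankings (best item first) with an MC4-style Markov chain.

    Assumes every ranking contains exactly the same items.
    """
    items = sorted(rankings[0])
    n = len(items)
    # Position of each item in each input ranking (0 = best).
    pos = [{item: p for p, item in enumerate(r)} for r in rankings]

    # Transition matrix: from item a, a uniformly chosen item b is accepted
    # only if a strict majority of rankings place b above a.
    P = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(items):
        for j, b in enumerate(items):
            if i != j:
                wins = sum(1 for p in pos if p[b] < p[a])
                if 2 * wins > len(rankings):
                    P[i][j] = 1.0 / n
        P[i][i] = 1.0 - sum(P[i])  # remaining mass: stay on the current item

    # Power iteration with uniform teleportation so the chain is ergodic.
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [
            (1.0 - damping) / n
            + damping * sum(pi[i] * P[i][j] for i in range(n))
            for j in range(n)
        ]

    # Items with more stationary mass are ranked higher in the fused list.
    order = sorted(range(n), key=lambda k: -pi[k])
    return [items[k] for k in order]


fused = mc4_fuse([["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]])
```

In a feature selection setting, each input ranking would be the feature order produced on one data set, and the fused list would serve as the averaged ranking.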

Universidad Carlos III de Madrid e-Archivo

Fast Incremental SVDD Learning Algorithm with the Gaussian Kernel

Author: Chaudhuri Arin
Hu Wenhao
Jiang Hansi
Kakde Deovrat
Wang Haoyu
Publication date: 01/11/2018

Support vector data description (SVDD) is a machine learning technique that is used for single-class classification and outlier detection. The idea of SVDD is to find a set of support vectors that defines a boundary around data. When dealing with online or large data, existing batch SVDD methods have to be rerun in each iteration. We propose an incremental learning algorithm for SVDD that uses the Gaussian kernel. This algorithm builds on the observation that all support vectors on the boundary have the same distance to the center of sphere in a higher-dimensional feature space as mapped by the Gaussian kernel function. Each iteration involves only the existing support vectors and the new data point. Moreover, the algorithm is based solely on matrix manipulations; the support vectors and their corresponding Lagrange multipliers $\alpha_i$ are automatically selected and determined in each iteration. It can be seen that the complexity of our algorithm in each iteration is only $O(k^2)$, where $k$ is the number of support vectors. Experimental results on some real data sets indicate that FISVDD demonstrates significant gains in efficiency with almost no loss in either outlier detection accuracy or objective function value.

Comment: 18 pages, 1 table, 4 figures
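The observation about equal distances can be made concrete with the standard kernel form of the SVDD distance to the center. The sketch below is not the FISVDD update itself; it only evaluates the squared feature-space distance for given support vectors and multipliers. The kernel parameterization $\exp(-\|x-y\|^2 / (2s^2))$ and the names `gaussian_kernel` and `svdd_dist2` are assumptions made for this example.

```python
import math


def gaussian_kernel(x, y, s):
    """Gaussian kernel with bandwidth s (assumed parameterization)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * s * s))


def svdd_dist2(z, svs, alphas, s):
    """Squared feature-space distance from z to the SVDD center.

    The center is sum_i alpha_i * phi(x_i), so
    dist^2 = K(z, z) - 2 * sum_i alpha_i K(x_i, z)
             + sum_ij alpha_i alpha_j K(x_i, x_j),
    and K(z, z) = 1 for the Gaussian kernel.
    """
    cross = sum(a * gaussian_kernel(x, z, s) for a, x in zip(alphas, svs))
    const = sum(
        ai * aj * gaussian_kernel(xi, xj, s)
        for ai, xi in zip(alphas, svs)
        for aj, xj in zip(alphas, svs)
    )
    return 1.0 - 2.0 * cross + const


# Two support vectors with equal multipliers (symmetric toy case).
svs = [(0.0, 0.0), (2.0, 0.0)]
alphas = [0.5, 0.5]
d_first = svdd_dist2(svs[0], svs, alphas, 1.0)
d_second = svdd_dist2(svs[1], svs, alphas, 1.0)
d_mid = svdd_dist2((1.0, 0.0), svs, alphas, 1.0)
```

In this toy case both support vectors lie at the same distance from the center, while the midpoint between them lies strictly inside the boundary.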

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

On functional module detection in metabolic networks

Author: Adleman
Backhaus
Berthelot
Billington
Bortfeldt
Colom
Cormen
Deuflhard
Durbin
Farkas
Fourier
Gardiner
Garey
Grunwald
Haken
Klee
Knuth
Koch
Lipton
Murray
Paun
Pérès
Schuster
Shor
Starke
Steinhausen
Su
Zhou
Publication date: 01/08/2013

Functional modules of metabolic networks are essential for understanding the metabolism of an organism as a whole. With the vast amount of experimental data and the construction of complex and large-scale, often genome-wide, models, the computer-aided identification of functional modules becomes more and more important. Since steady states play a key role in biology, many methods have been developed in that context, for example, elementary flux modes, extreme pathways, transition invariants and place invariants. Metabolic networks can also be studied from the point of view of graph theory, and algorithms for graph decomposition have been applied to the identification of functional modules. A prominent and currently intensively discussed family of methods in graph theory addresses Q-modularity. In this paper, we recall known concepts of module detection based on the steady-state assumption, focusing on transition invariants (elementary modes) and their computation as minimal solutions of systems of Diophantine equations. We present the Fourier-Motzkin algorithm in detail. Afterwards, we introduce Q-modularity as an example of a useful non-steady-state method and its application to metabolic networks. To illustrate and discuss the concepts of invariants and Q-modularity, we use a part of the central carbon metabolism in potato tubers (Solanum tuberosum) as a running example. The intention of the paper is to give a compact presentation of known steady-state concepts from a graph-theoretical viewpoint in the context of network decomposition and reduction and to introduce the application of Q-modularity to metabolic Petri net models.
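For readers unfamiliar with Q-modularity, the following minimal sketch computes Newman's modularity score for a given partition. It assumes an undirected, unweighted graph given as an edge list, which is a simplification; the paper applies the concept to metabolic Petri net models, and that mapping is not reproduced here. The function name `modularity` and the toy graph are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  e_c / m - (d_c / (2 m))^2,
    where m is the number of edges, e_c the number of edges inside c,
    and d_c the sum of the degrees of the nodes in c.
    """
    m = len(edges)
    internal = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2.0 * m)) ** 2 for c in degree_sum
    )


# Toy graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_modules = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_module = {node: "A" for node in range(6)}
q_split = modularity(edges, two_modules)
q_single = modularity(edges, one_module)
```

Splitting the toy graph into its two triangles gives a higher score than treating the whole graph as one module, which is the behavior a modularity-based decomposition exploits.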

Crossref

Directory of Open Access Journals

PubMed Central

Hochschulschriftenserver - Universität Frankfurt am Main

Hadronic decays of the tau lepton: Theoretical overview

Author: Abreu
Ackerstaff
Akhmetshin
Alemany
Aloisio
Amorós
Anderson
Asner
Bando
Barate
Berger
Bijnens
Braaten
Brodsky
Browder
Bruch
Cirigliano
Coan
Colangelo
Coleman
de Trocóniz
Decker
Dominguez
Ecker
Edwards
Espriu
Feindt
Finkemeier
Fischer
Floratos
Gasser
Georgi
Ghozzi
Girlanda
Gounaris
Guerrero
Gómez Dumm
Knecht
Kühn
Lepage
Leutwyler
Liu
Meissner
Mirkes
Oiler
Passera
Pich
Portolés
Resell
Rougé
Ruiz-Femenía
Sanz-Cillero
Sobie
t'Hooft
Weinberg
Witten
Publication venue: 'Elsevier BV'
Publication date: 01/01/2004

Exclusive hadronic decays of the tau lepton provide an excellent framework to study the hadronization of QCD currents in a non-perturbative energy region populated by many resonances. I give a short review of both the main theoretical tools employed to analyse experimental data and how theory compares with experiment.

Comment: 13 pages, 4 figures. Invited talk given at the International Workshop on Tau Lepton Physics, TAU04 (14th-17th September 2004), Nara (Japan)

arXiv.org e-Print Archive

CiteSeerX

Crossref

CERN Document Server

Dispersive analysis of omega --> 3pi and phi --> 3pi decays

Author: Kubis Bastian
Niecknig Franz
Schneider Sebastian P.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 25/05/2012

We study the three-pion decays of the lightest isoscalar vector mesons, omega and phi, in a dispersive framework that allows for a consistent description of final-state interactions between all three pions. Our results are solely dependent on the phenomenological input for the pion-pion P-wave scattering phase shift. We predict the Dalitz plot distributions for both decays and compare our findings to recent measurements of the phi --> 3pi Dalitz plot by the KLOE and CMD-2 collaborations. Dalitz plot parameters for future precision measurements of omega --> 3pi are predicted. We also calculate the pi-pi P-wave inelasticity contribution from omega-pi intermediate states.

Comment: 23 pages, 18 figures; discussion extended, Appendix D added, matches version published in EPJ

arXiv.org e-Print Archive

EDP Sciences OAI-PMH repository (1.2.0)