
    Feature subset selection using support-vector machines by averaging over probabilistic genotype data

    Despite the grand promises of the postgenomic era, such as personalized prevention, diagnosis, drugs, and treatments, the landscape of biomedicine looks more and more complex. The fulfillment of these promises for diseases significant in public health requires new approaches to induction for statistical and causal inferences from observations and interventions. Within the biomedical world, an important response to this challenge is the mapping and relatively cheap measuring of genetic variations, such as single nucleotide polymorphisms (SNPs). The recent mapping of genetic variations has opened a new dimension in postgenomic research at all phenotypic levels, such as genomic, proteomic, and clinical, and it has sparked a series of Genetic Association Studies (GAS) based on the application of machine learning and data mining techniques. To overcome such problems, different strategies are being investigated within the research community. The aim of this thesis work is to contribute to progress in this field by taking a step toward a solution. I have investigated the machine learning and data mining algorithms suitable for this task and the state of the art of their currently available implementations intended for biomedical research applications. As a result, I have proposed a solution strategy, and I have chosen and extended the Java-ML library, an open source machine learning library written in Java, implementing some missing algorithms and functionality necessary for the proposed approach. This thesis work is structured into three main blocks. Section 3, "An approach to the use of machine learning techniques with genotype data", addresses the problem faced and the proposed solution.
It begins with the definition of some introductory GAS concepts and the description of the solution strategy, and its subsections elaborate on the theoretical underpinnings of the algorithms that make up the solution. Specifically, the first subsection, "The feature selection problem in the bioinformatics domain", justifies the need to reduce the dimensionality of data sets in order to obtain acceptable performance when machine learning techniques are applied in the broader field of bioinformatics, and establishes a comparative taxonomy of the currently available techniques. The second subsection, "Feature selection using support-vector machines", defines the idea behind support-vector machine classifiers and their application to feature subset selection, while the third subsection, "Ranking fusion as averaging technique: Markov chain based algorithms", describes the ranking fusion algorithms whose implementation has been chosen for combining the feature subsets obtained from different data sets. Section 4, "Analysis of available tools for experimental design", analyses the available tools suitable for experimental design in GAS based on machine learning techniques. Its first subsection, "Advantages of high level languages for machine learning algorithms", discusses the convenience of using high level languages for the kind of applications we are working on. Its second subsection, "Machine learning algorithms implementations in Java", justifies the choice of the Java language and then analyses the currently available implementations of machine learning algorithms in this language that are worth considering for our purposes, namely WEKA, RapidMiner and Java-ML.
Section 5, "Implemented extensions to the Java-ML library", describes the functionality that has been added to provide a framework suitable for designing GAS experiments that test the proposed approach. The subsection "Missing values imputation: the dataset.tools package" focuses on data set handling functionality, while the subsection "Averaging through ranking fusion: rankingfusion and rankingfusion.scoring package" details the implementations of the ranking fusion algorithms. Finally, the subsection "How to use the code" is a tutorial on how to use both the library and its extension for the development of applications. In addition to these main blocks, a final section called "Future Work" reflects on how the developed work can be used by GAS domain experts to evaluate the usefulness of the proposed technique. Ingeniería de Telecomunicación (Telecommunications Engineering)

    Fast Incremental SVDD Learning Algorithm with the Gaussian Kernel

    Support vector data description (SVDD) is a machine learning technique that is used for single-class classification and outlier detection. The idea of SVDD is to find a set of support vectors that defines a boundary around data. When dealing with online or large data, existing batch SVDD methods have to be rerun in each iteration. We propose an incremental learning algorithm for SVDD that uses the Gaussian kernel. This algorithm builds on the observation that all support vectors on the boundary have the same distance to the center of the sphere in a higher-dimensional feature space as mapped by the Gaussian kernel function. Each iteration involves only the existing support vectors and the new data point. Moreover, the algorithm is based solely on matrix manipulations; the support vectors and their corresponding Lagrange multipliers $\alpha_i$ are automatically selected and determined in each iteration. It can be seen that the complexity of our algorithm in each iteration is only $O(k^2)$, where $k$ is the number of support vectors. Experimental results on some real data sets indicate that FISVDD demonstrates significant gains in efficiency with almost no loss in either outlier detection accuracy or objective function value. Comment: 18 pages, 1 table, 4 figures
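The equal-distance observation in this abstract can be illustrated with a minimal sketch. It is not the FISVDD algorithm itself: it only evaluates the squared feature-space distance to the SVDD center, $K(z,z) - 2\sum_i \alpha_i K(x_i,z) + \sum_{i,j} \alpha_i \alpha_j K(x_i,x_j)$, for a toy two-point data set. The kernel parametrization, the function names, the `bandwidth` value, and the multipliers (set to 0.5 each by symmetry) are assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth):
    """Gaussian kernel K(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2)) (assumed form)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * bandwidth ** 2)))

def squared_distance_to_center(z, support_vectors, alphas, bandwidth):
    """Squared feature-space distance from z to the center sum_i alpha_i * phi(x_i)."""
    cross = sum(
        alpha * gaussian_kernel(x, z, bandwidth)
        for x, alpha in zip(support_vectors, alphas)
    )
    center_norm = sum(
        a_i * a_j * gaussian_kernel(x_i, x_j, bandwidth)
        for x_i, a_i in zip(support_vectors, alphas)
        for x_j, a_j in zip(support_vectors, alphas)
    )
    # K(z, z) equals 1 for the Gaussian kernel, but it is computed for clarity.
    return gaussian_kernel(z, z, bandwidth) - 2.0 * cross + center_norm

# Toy data (hypothetical): two points, both on the boundary, equal multipliers.
support_vectors = np.array([[0.0, 0.0], [2.0, 0.0]])
alphas = np.array([0.5, 0.5])
bandwidth = 1.0

radii_sq = [
    squared_distance_to_center(x, support_vectors, alphas, bandwidth)
    for x in support_vectors
]
outlier_sq = squared_distance_to_center([5.0, 5.0], support_vectors, alphas, bandwidth)
```

Both entries of `radii_sq` coincide, which is the property the abstract builds on, while a point far from the data such as `[5.0, 5.0]` lies at a larger distance and would be flagged as an outlier.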

    On functional module detection in metabolic networks

    Functional modules of metabolic networks are essential for understanding the metabolism of an organism as a whole. With the vast amount of experimental data and the construction of complex and large-scale, often genome-wide, models, the computer-aided identification of functional modules becomes more and more important. Since steady states play a key role in biology, many methods have been developed in that context, for example, elementary flux modes, extreme pathways, transition invariants and place invariants. Metabolic networks can also be studied from the point of view of graph theory, and algorithms for graph decomposition have been applied to the identification of functional modules. A prominent and currently intensively discussed family of methods in graph theory addresses the Q-modularity. In this paper, we recall known concepts of module detection based on the steady-state assumption, focusing on transition invariants (elementary modes) and their computation as minimal solutions of systems of Diophantine equations. We present the Fourier-Motzkin algorithm in detail. Afterwards, we introduce the Q-modularity as an example of a useful non-steady-state method and its application to metabolic networks. To illustrate and discuss the concepts of invariants and Q-modularity, we use a part of the central carbon metabolism in potato tubers (Solanum tuberosum) as a running example. The intention of the paper is to give a compact presentation of known steady-state concepts from a graph-theoretical viewpoint in the context of network decomposition and reduction, and to introduce the application of Q-modularity to metabolic Petri net models.
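As a rough illustration of the Q-modularity mentioned in this abstract, the sketch below evaluates the standard Newman-Girvan modularity, $Q = \sum_c \left[ e_c/m - (d_c/2m)^2 \right]$, for a given partition of an undirected, unweighted graph, where $e_c$ is the number of edges inside community $c$, $d_c$ is the sum of degrees in $c$, and $m$ is the total number of edges. Treating the network as a simple undirected graph is an assumption; the paper works with Petri net models, and the function name and toy graph here are hypothetical.

```python
from collections import defaultdict

def q_modularity(edges, community_of):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to its community label.
    """
    m = len(edges)
    internal_edges = defaultdict(int)  # e_c: edges with both ends in community c
    degree_sum = defaultdict(int)      # d_c: total degree of nodes in community c
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal_edges[community_of[u]] += 1
    return sum(
        internal_edges[c] / m - (degree_sum[c] / (2.0 * m)) ** 2
        for c in degree_sum
    )

# Toy graph (hypothetical): two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_modules = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_module = {node: "A" for node in range(6)}

q_split = q_modularity(edges, two_modules)
q_single = q_modularity(edges, one_module)
```

Splitting the toy graph into its two triangles gives a positive score (5/14), whereas placing all nodes in one module gives zero, which is why maximizing Q favors densely connected groups as candidate modules.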

    Hadronic decays of the tau lepton: Theoretical overview

    Exclusive hadronic decays of the tau lepton provide an excellent framework to study the hadronization of QCD currents in a non-perturbative energy region populated by many resonances. I give a short review of both the main theoretical tools employed to analyse experimental data and how theory compares with experiment. Comment: 13 pages, 4 figures. Invited talk given at the International Workshop on Tau Lepton Physics, TAU04 (14th-17th September 2004), Nara (Japan)

    Dispersive analysis of omega --> 3pi and phi --> 3pi decays

    We study the three-pion decays of the lightest isoscalar vector mesons, omega and phi, in a dispersive framework that allows for a consistent description of final-state interactions between all three pions. Our results are solely dependent on the phenomenological input for the pion-pion P-wave scattering phase shift. We predict the Dalitz plot distributions for both decays and compare our findings to recent measurements of the phi --> 3pi Dalitz plot by the KLOE and CMD-2 collaborations. Dalitz plot parameters for future precision measurements of omega --> 3pi are predicted. We also calculate the pi-pi P-wave inelasticity contribution from omega-pi intermediate states. Comment: 23 pages, 18 figures; discussion extended, Appendix D added, matches version published in EPJ