55,756 research outputs found

Genetic algorithm based two-mode clustering of metabolomics data

Author: Berg R.A., van den
Hageman J.A.
Smilde A.K.
Werf M.J., van der
Westerhuis J.A.
Publication date: 01/01/2008

Metabolomics and other omics tools are generally characterized by large data sets with many variables obtained under different environmental conditions. Clustering methods, and more specifically two-mode clustering methods, are excellent tools for analyzing this type of data. Two-mode clustering methods allow for analysis of the behavior of subsets of metabolites under different experimental conditions. In addition, the results are easily visualized. In this paper we introduce a two-mode clustering method based on a genetic algorithm that uses a criterion that searches for homogeneous clusters. Furthermore, we introduce a cluster stability criterion to validate the clusters and we provide an extended knee plot to select the optimal number of clusters in both experimental and metabolite modes. The genetic algorithm-based two-mode clustering gave biologically relevant results when it was applied to two real-life metabolomics data sets. It was, for instance, able to identify a catabolic pathway for growth on several of the carbon sources.
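As a rough illustration of what a homogeneity criterion for two-mode clustering can look like, the sketch below scores a candidate partition of rows (for example metabolites) and columns (for example experimental conditions) by the sum of squared deviations of each cell from its block mean. The abstract does not state the paper's actual criterion, so this within-block sum of squares, the function name `block_sse`, and the toy matrix are all assumptions made for illustration.

```python
import numpy as np


def block_sse(X, row_labels, col_labels):
    """Within-block sum of squares for a two-mode partition (assumed criterion).

    Every (row cluster, column cluster) pair defines one block of X.
    Lower scores mean more homogeneous blocks.
    """
    X = np.asarray(X, dtype=float)
    row_labels = np.asarray(row_labels)
    col_labels = np.asarray(col_labels)
    total = 0.0
    for r in np.unique(row_labels):
        for c in np.unique(col_labels):
            # Select the sub-matrix belonging to this block.
            block = X[np.ix_(row_labels == r, col_labels == c)]
            if block.size:
                total += ((block - block.mean()) ** 2).sum()
    return total


# Hypothetical toy data: 3 "metabolites" x 4 "conditions".
X_toy = [[1, 1, 5, 5],
         [1, 1, 5, 5],
         [9, 9, 2, 2]]

# A partition that matches the block structure scores zero ...
print(block_sse(X_toy, [0, 0, 1], [0, 0, 1, 1]))
# ... while a mismatched row partition scores higher.
print(block_sse(X_toy, [0, 1, 1], [0, 0, 1, 1]))
```

A genetic algorithm in this setting would encode the two label vectors as a chromosome and search for the partition that minimizes such a score; the encoding and operators are not described in the abstract.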

Springer - Publisher Connector

Wageningen University & Research Publications

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Sparse Probit Linear Mixed Model

Author: Cunningham John P.
Kloft Marius
Lippert Christoph
Mandt Stephan
Nakajima Shinichi
Wenzel Florian
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 17/07/2017

Linear Mixed Models (LMMs) are important tools in statistical genetics. When used for feature selection, they make it possible to find a sparse set of genetic traits that best predict a continuous phenotype of interest, while simultaneously correcting for various confounding factors such as age, ethnicity and population structure. Formulated as models for linear regression, LMMs have been restricted to continuous phenotypes. We introduce the Sparse Probit Linear Mixed Model (Probit-LMM), where we generalize the LMM modeling paradigm to binary phenotypes. As a technical challenge, the model no longer possesses a closed-form likelihood function. In this paper, we present a scalable approximate inference algorithm that lets us fit the model to high-dimensional data sets. We show on three real-world examples from different domains that, in the setup of binary labels, our algorithm leads to better prediction accuracies and also selects features which show less correlation with the confounding factors.
Comment: Published version, 21 pages, 6 figures

arXiv.org e-Print Archive

MDC Repository

Feature Selection of Post-Graduation Income of College Students in the United States

Author: C Avery
D Goldberg
D Witteveen
DH Autor
JE Beasley
M Hall
M Hout
P Beaudry
R Chetty
RD Putnam
SR Lucas
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 28/05/2018

This study investigated the most important attributes of the 6-year post-graduation income of college graduates who used financial aid during their time at college in the United States. The latest data released by the United States Department of Education was used. Specifically, 1,429 cohorts of graduates from three years (2001, 2003, and 2005) were included in the data analysis. Three attribute selection methods, including filter methods, forward selection, and Genetic Algorithm, were applied to the attribute selection from 30 relevant attributes. Five groups of machine learning algorithms were applied to the dataset for classification using the best selected attribute subsets. Based on our findings, we discuss the role of neighborhood professional degree attainment, parental income, SAT scores, and family college education in post-graduation incomes and the implications for social stratification.
Comment: 14 pages, 6 tables, 3 figures
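Of the three attribute selection approaches named in the abstract, forward selection is the simplest to sketch: start with no attributes and repeatedly add the one that most improves a fit score. The version below is a minimal sketch under stated assumptions, not the study's procedure: it uses the residual sum of squares of an ordinary least-squares fit as the score (the study itself evaluates classifiers), and the names `fit_sse`, `forward_select`, and the synthetic data are hypothetical.

```python
import numpy as np


def fit_sse(X_subset, y):
    """Residual sum of squares of a least-squares fit with an intercept."""
    design = np.column_stack([np.ones(len(y)), X_subset])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ coef
    return float(residual @ residual)


def forward_select(X, y, k):
    """Greedily pick k column indices of X, one per round (assumed scoring)."""
    selected = []
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < k:
        # Score every candidate attribute when added to the current subset.
        scores = {j: fit_sse(X[:, selected + [j]], y) for j in remaining}
        best = min(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected


# Synthetic stand-in data: 5 attributes, only columns 2 and 0 matter.
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((200, 5))
y_demo = 3.0 * X_demo[:, 2] + 0.5 * X_demo[:, 0] + 0.01 * rng.standard_normal(200)

print(forward_select(X_demo, y_demo, k=2))
```

Swapping `fit_sse` for a cross-validated classification score would bring the sketch closer to a wrapper method for a classification task like the one the abstract describes.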

arXiv.org e-Print Archive

Crossref

Data mining as a tool for environmental scientists

Author: Athanasiadis Ioannis
Comas Joaquim
Frank Eibe
Gibert Karina
Letcher Rebecca
Spate Jessica
Sànchez-Marrè Miquel
Publication venue: International Environmental Modelling and Software Society
Publication date: 01/01/2006

Over recent years a huge library of data mining algorithms has been developed to tackle a variety of problems in fields such as medical imaging and network traffic analysis. Many of these techniques are far more flexible than more classical modelling approaches and could be usefully applied to data-rich environmental problems. Certain techniques such as Artificial Neural Networks, Clustering, Case-Based Reasoning and more recently Bayesian Decision Networks have found application in environmental modelling, while other methods, for example classification and association rule extraction, have not yet been taken up on any wide scale. We propose that these and other data mining techniques could be usefully applied to difficult problems in the field. This paper introduces several data mining concepts and briefly discusses their application to environmental modelling, where data may be sparse, incomplete, or heterogeneous.

Research Commons@Waikato

Biomedical Informatics Applications for Precision Management of Neurodegenerative Diseases

Author: Jimenez-Maggiora Gustavo
Lombardo Joseph
Miller Justin B.
Shan Guogen
Publication venue: Digital Scholarship@UNLV
Publication date: 01/01/2018

Modern medicine is in the midst of a revolution driven by “big data,” rapidly advancing computing power, and broader integration of technology into healthcare. Highly detailed and individualized profiles of both health and disease states are now possible, including biomarkers, genomic profiles, cognitive and behavioral phenotypes, high-frequency assessments, and medical imaging. Although these data are incredibly complex, they can potentially be used to understand multi-determinant causal relationships, elucidate modifiable factors, and ultimately customize treatments based on individual parameters. Especially for neurodegenerative diseases, where an effective therapeutic agent has yet to be discovered, there remains a critical need for an interdisciplinary perspective on data and information management due to the number of unanswered questions. Biomedical informatics is a multidisciplinary field at the intersection of information technology, computer and data science, engineering, and healthcare. It will be instrumental for uncovering novel insights into neurodegenerative disease research, including both causal relationships and therapeutic targets, and for maximizing the utility of both clinical and research data. The present study aims to provide a brief overview of biomedical informatics and how clinical data applications such as clinical decision support tools can be developed to derive new knowledge from the wealth of available data to advance clinical care and scientific research of neurodegenerative diseases in the era of precision medicine.

University of Nevada, Las Vegas Repository

Distributed classifier based on genetically engineered bacterial cell cultures

Author: Didovyk Andriy
Hasty Jeff
Huerta Ramón
Ivanchenko Mikhail V.
Kanakov Oleg I.
Tsimring Lev
Publication date: 21/05/2014

We describe a conceptual design of a distributed classifier formed by a population of genetically engineered microbial cells. The central idea is to create a complex classifier from a population of weak or simple classifiers. We create a master population of cells with randomized synthetic biosensor circuits that have a broad range of sensitivities towards chemical signals of interest that form the input vectors subject to classification. The randomized sensitivities are achieved by constructing a library of synthetic gene circuits with randomized control sequences (e.g. ribosome-binding sites) in the front element. The training procedure consists of re-shaping the master population in such a way that it collectively responds to the "positive" patterns of input signals by producing above-threshold output (e.g. fluorescent signal), and below-threshold output in the case of the "negative" patterns. The population re-shaping is achieved by presenting sequential examples and pruning the population using either graded selection/counterselection or fluorescence-activated cell sorting (FACS). We demonstrate the feasibility of experimental implementation of such a system computationally using a realistic model of the synthetic sensing gene circuits.
Comment: 31 pages, 9 figures

arXiv.org e-Print Archive

FigShare

Accelerated Particle Swarm Optimization and Support Vector Machine for Business Optimization and Applications

Author: A. Chatterjee
B. Scholkopf
C. Blum
D.E. Goldberg
G.R. Shi
J. Kennedy
J.C. Plate
K. Kim
L.-X. Liu
M. Clerc
N. Lu
P.F. Pai
R. Kohavi
R. Kolisch
S. Hartmann
T. Howley
V. Vapnik
X.-S. Yang
Publication date: 01/01/2011

Business optimization is becoming increasingly important because all business activities aim to maximize the profit and performance of products and services, under limited resources and appropriate constraints. Recent developments in support vector machines and metaheuristics show many advantages of these techniques. In particular, particle swarm optimization is now widely used in solving tough optimization problems. In this paper, we use a combination of a recently developed Accelerated PSO and a nonlinear support vector machine to form a framework for solving business optimization problems. We first apply the proposed APSO-SVM to production optimization, and then use it for income prediction and project scheduling. We also carry out some parametric studies and discuss the advantages of the proposed metaheuristic SVM.
Comment: 12 pages
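For readers unfamiliar with Accelerated PSO, the sketch below shows the single-step position update commonly given for it, in which each particle is pulled toward the current global best and perturbed by decaying random noise, with no separate velocity or personal-best term. The abstract does not spell out the update rule or any parameter values, so the formula as coded here, the function name `apso_minimize`, the defaults for `alpha0`, `beta`, and `gamma`, and the sphere test function are all assumptions; the SVM coupling described in the paper is not shown.

```python
import numpy as np


def apso_minimize(f, dim, n_particles=20, n_iter=100,
                  alpha0=0.5, beta=0.5, gamma=0.95, seed=0):
    """Minimize f with a basic accelerated-PSO-style update (illustrative)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5.0, 5.0, size=(n_particles, dim))
    values = np.apply_along_axis(f, 1, x)
    best_idx = values.argmin()
    g_best, g_val = x[best_idx].copy(), values[best_idx]

    for t in range(n_iter):
        alpha = alpha0 * gamma ** t          # randomness shrinks over time
        noise = rng.standard_normal(size=x.shape)
        # Move every particle toward the global best, plus a random kick.
        x = (1.0 - beta) * x + beta * g_best + alpha * noise
        values = np.apply_along_axis(f, 1, x)
        if values.min() < g_val:             # keep the best point seen so far
            best_idx = values.argmin()
            g_best, g_val = x[best_idx].copy(), values[best_idx]
    return g_best, g_val


def sphere(v):
    """Simple convex test function with its minimum at the origin."""
    return float(np.sum(v ** 2))


best_point, best_value = apso_minimize(sphere, dim=3)
print(best_point, best_value)
```

In a combined APSO-SVM setup, `f` would typically be a validation error of an SVM as a function of its hyperparameters, though the exact objective used in the paper is not given in this listing.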

arXiv.org e-Print Archive

CiteSeerX

Crossref

Middlesex University Research Repository