13 research outputs found

    A New Discrete Particle Swarm Algorithm Applied to Attribute Selection in a Bioinformatics Data Set

    Many data mining applications involve the task of building a model for predictive classification. The goal of such a model is to classify examples (records or data instances) into classes or categories of the same type. The use of variables (attributes) not related to the classes can reduce the accuracy and reliability of a classification or prediction model. Superfluous variables can also increase the costs of building a model, particularly on large data sets. We propose a discrete Particle Swarm Optimization (PSO) algorithm designed for attribute selection. The proposed algorithm deals with discrete variables, and its population of candidate solutions contains particles of different sizes. The performance of this algorithm is compared with the performance of a standard binary PSO algorithm on the task of selecting attributes in a bioinformatics data set. The criteria used for comparison are: (1) maximizing predictive accuracy; and (2) finding the smallest subset of attributes

    COordination of Standards in MetabOlomicS (COSMOS): facilitating integrated metabolomics data access

    Metabolomics has become a crucial phenotyping technique in a range of research fields including medicine, the life sciences, biotechnology and the environmental sciences. This necessitates the transfer of experimental information between research groups, as well as potentially to publishers and funders. After the initial efforts of the metabolomics standards initiative, minimum reporting standards were proposed which included the concepts for metabolomics databases. Built by the community, standards and infrastructure for metabolomics are still needed to allow storage, exchange, comparison and re-utilization of metabolomics data. The Framework Programme 7 EU Initiative ‘coordination of standards in metabolomics’ (COSMOS) is developing a robust data infrastructure and exchange standards for metabolomics data and metadata. This is to support workflows for a broad range of metabolomics applications within the European metabolomics community and the wider metabolomics and biomedical communities’ participation. Here we announce our concepts and efforts asking for re-engagement of the metabolomics community, academics and industry, journal publishers, software and hardware vendors, as well as those interested in standardisation worldwide (addressing missing metabolomics ontologies, complex-metadata capturing and XML based open source data exchange format), to join and work towards updating and implementing metabolomics standards

    Glycemia but not the Metabolic Syndrome is Associated with Cognitive Decline: Findings from the European Male Ageing Study

    © 2017 American Association for Geriatric Psychiatry. Objective: Previous research has indicated that components of the metabolic syndrome (MetS), such as hyperglycemia and hypertension, are negatively associated with cognition. However, evidence that MetS itself is related to cognitive performance has been inconsistent. This longitudinal study investigates whether MetS or its components affect cognitive decline in aging men and whether any interaction with inflammation exists. Methods: Over a mean of 4.4 years (SD ± 0.3), men aged 40–79 years from the multicenter European Male Ageing Study were recruited. Cognitive functioning was assessed using the Rey-Osterrieth Complex Figure (ROCF), the Camden Topographical Recognition Memory (CTRM) task, and the Digit Symbol Substitution Test (DSST). High-sensitivity C-reactive protein (hs-CRP) levels were measured using a chemiluminescent immunometric assay. Results: Overall, 1,913 participants contributed data to the ROCF analyses and 1,965 subjects contributed to the CTRM and DSST analyses. In multiple regression models the presence of baseline MetS was not associated with cognitive decline over time (p > 0.05). However, logistic ordinal regressions indicated that high glucose levels were related to a greater risk of decline on the ROCF Copy (β = −0.42, p < 0.05) and the DSST (β = −0.39, p < 0.001). There was neither a main effect of hs-CRP levels nor an interaction effect of hs-CRP and MetS at baseline on cognitive decline. Conclusion: No evidence was found for a relationship between MetS or inflammation and cognitive decline in this sample of aging men. However, glycemia was negatively associated with visuoconstructional abilities and processing speed

    Integrating Bayesian networks and Simpson's paradox in data mining

    This paper proposes to integrate two very different kinds of methods for data mining, namely the construction of Bayesian networks from data and the detection of occurrences of Simpson’s paradox. The former aims at discovering potentially causal knowledge in the data, whilst the latter aims at detecting surprising patterns in the data. By integrating these two kinds of methods we can hopefully discover patterns which are more likely to be useful to the user, a challenging data mining goal which is under-explored in the literature. The proposed integration method involves two approaches. The first approach uses the detection of occurrences of Simpson’s paradox as a preprocessing step for a more effective construction of Bayesian networks, whilst the second approach uses the construction of a Bayesian network from data as a preprocessing step for the detection of occurrences of Simpson’s paradox

    Particle swarm for attribute selection in Bayesian classification: an application to protein function prediction

    The discrete particle swarm optimization (DPSO) algorithm is an optimization technique which belongs to the fertile paradigm of Swarm Intelligence. Designed for the task of attribute selection, the DPSO deals with discrete variables in a straightforward manner. This work empowers the DPSO algorithm by extending it in two ways. First, it enables the DPSO to select attributes for a Bayesian network algorithm, which is more sophisticated than the Naive Bayes classifier previously used by the original DPSO algorithm. Second, it applies the DPSO to a set of challenging protein functional classification data, involving a large number of classes to be predicted. The work then compares the performance of the DPSO algorithm against the performance of a standard Binary PSO algorithm on the task of selecting attributes on those data sets. The criteria used for this comparison are (1) maximizing predictive accuracy, and (2) finding the smallest subset of attributes

    Particle swarm and bayesian networks applied to attribute selection for protein functional classification

    No full text
    The Discrete Particle Swarm (DPSO) algorithm is an optimization method that belongs to the fertile paradigm of Swarm Intelligence. The DPSO was designed for the task of attribute selection and it deals with discrete variables in a straightforward manner. This work extends the DPSO algorithm in two ways. First, we enable the DPSO to select attributes for a Bayesian network algorithm, which is a much more sophisticated algorithm than the Naive Bayes classifier previously used by this algorithm. Second, we apply the DPSO to a challenging protein functional classification data set, involving a large number of classes to be predicted. The performance of the DPSO is compared to the performance of a Binary PSO on the task of selecting attributes in this challenging data set. The criteria used for comparison are: (1) maximizing predictive accuracy; and (2) finding the smallest subset of attributes

    A genetic algorithm for solving a capacitated p-median problem

    No full text
    Facility-location problems have several applications, such as telecommunications, industrial transportation and distribution. One of the most well-known facility-location problems is the p-median problem. This work addresses an application of the capacitated p-median problem to a real-world problem. We propose a genetic algorithm (GA) to solve the capacitated p-median problem. The proposed GA uses not only conventional genetic operators, but also a new heuristic "hypermutation" operator suggested in this work. The proposed GA is compared with a tabu search algorithm

    Integrating multiple analytical platforms and chemometrics for comprehensive metabolic profiling: application to meat spoilage detection

    No full text
    Untargeted metabolic profiling has become a common approach to attempt to understand biological systems. However, due to the large chemical diversity in the metabolites it is generally necessary to employ multiple analytical platforms so as to encompass a wide range of metabolites. Thus it is beneficial to find chemometrics approaches which can effectively integrate data generated from multiple platforms and ideally combine the strength of each platform and overcome their inherent weaknesses, most pertinently with respect to limited chemistries. We have reported a few studies using untargeted metabolic profiling techniques to monitor the natural spoilage process in pork and also to detect specific metabolites associated with contamination with the pathogen Salmonella typhimurium. One method used was to analyse the volatile organic compounds (VoCs) generated throughout the spoilage process while the other was to analyse the soluble small molecule metabolites (SMM) extracted from the microbial community, as well as from the surface of the spoiled/contaminated meat. In this study, we exploit multi-block principal component analysis (MB-PCA) and multi-block partial least squares (MB-PLS) to combine the VoCs and SMM data together and compare the results with those obtained by analysing each data set individually. We show that by combining the two data sets and applying appropriate chemometrics, a model with much better prediction and, importantly, with improved interpretability was obtained. The MB-PCA model was able to combine the strength of both platforms together and generated a model with high consistency with the biological expectations, despite its unsupervised nature. MB-PLS models also achieved the best overall performance in modelling the spoilage progression and discriminating between the naturally spoiled samples and the pathogen-contaminated samples. Correlation analysis and Bayesian network analysis were also performed to elucidate which metabolites were correlated strongly in the two data sets; such information could aid in understanding the meat spoilage process