Data mining as a tool for environmental scientists

Author: Athanasiadis Ioannis
Comas Joaquim
Frank Eibe
Gibert Karina
Letcher Rebecca
Spate Jessica
Sànchez-Marrè Miquel
Publication venue: International Environmental Modelling and Software Society
Publication date: 01/01/2006

Over recent years a huge library of data mining algorithms has been developed to tackle a variety of problems in fields such as medical imaging and network traffic analysis. Many of these techniques are far more flexible than classical modelling approaches and could be usefully applied to data-rich environmental problems. Certain techniques, such as Artificial Neural Networks, Clustering, Case-Based Reasoning and, more recently, Bayesian Decision Networks, have found application in environmental modelling, while other methods, for example classification and association rule extraction, have not yet been taken up on any wide scale. We propose that these and other data mining techniques could be usefully applied to difficult problems in the field. This paper introduces several data mining concepts and briefly discusses their application to environmental modelling, where data may be sparse, incomplete, or heterogeneous.

Research Commons@Waikato

Machine learning and data mining frameworks for predicting drug response in cancer: An overview and a novel in silico screening process based on association rule mining

Author: Aas
Abrams
Agrawal
Alexander Polyzos
Alexandrov
Ali
Aliper
Ammad-ud-din
Andersson
Andreas Ntargaras
Antoniou
Aristotelis Tsirigos
Athanassios Kotsinas
Azuaje
Baldari
Barretina
Bartkova
Beesley
Bengio
Bertacchini
Bishop
Blachly
Blumenschein
Breiman
Breiman
Brookshear
Byers
Byron
Campbell
Canela
Caponigro
Carracedo
Chang
Chen
Chen
Chiu
Cortes
Cortés-Ciriano I., van Westen, G.J., Bouvier, G., et al.
Costello
Coudray
Crespi
Creswell
Cuadrado
Daemen
Das
Das Thakur
Day
Dev
Dhillon
Di Micco
Dietterich
Dimitris Thanos
Eisfeld
Enslen
Evangelou
Falgreen
Fang
Fey
Filippos Koinis
Forbes
Friedman
Frismantas
Galanos
Galanos
Garnett
Geeleher
George-Romanos P. Foukas
Gillet
Gorgoulis
Guinney
Gupta
Haar
Haeuw
Halazonetis
Hanahan
Hanahan
Hastie
Henderson
Hills
Hinton
Hinton
Hinton
Hoadley
Holland
Hua Zhou
Hui
Hussmann
Iannis Aifantis
Iorio
James
Jang
Jin
Jiri Bartek
Kanda
Karakaidos
Kastenhuber
Kelland
Kiaris
Kim
Kim
Kleppmann
Knudson
Koinis
Komseli
Konstantinos Vougas
Kragelj
Lacombe
Laplante
LeCun
Lee
Leonidas Alexopoulos
Li
Li
Liang
Libbrecht
Liontos
Lior
Liu
Liu
Logue
Long
Lovitt
Lu
Lunn
Luo
Maron
Masica
McCain
McCulloch
Mehta
Mendelsohn
Menden
Menden
Meng
Milligan
Min
Mirman
Moghaddas Gholami
Muller
Murase
Negrini
Nelder
Neto
Nicolau
Nidheesh
Niepel
Noll
Noordermeer
Núñez-Enríquez
O'Connor
Padovano
Palmirotta
Park
Paul A. Townsend
Pearson
Pemovska
Pereira
Perez
Petrakis
Petros Sfikakis
Planchard
Popovics
Porter
Pritchard
Pritchard
Rampášek
Rangel
Rebecca Fitzgerald
Rickardson
Rodriguez-Escudero
Roidl
Ross
Ruder
Rusnak
Russell Petty
Sahai
Sami
Santana-Codina
Schmidhuber
Schreuer
Seashore-Ludlow
Sethi
Shoemaker
Sideridou
Siolas
Sonali Narang
Steckel
Stone
Stransky
Su
Sueoka
Sun
Taghanaki
Talwar
Tan
Tan
Tentler
Theodore Sakellaropoulos
Tominaga
Tran
Triantaphyllou
Trilla-Fuertes
Turajlic
Turki
Tyner
Ulivi
van de Schoot
van der Maaten
van't Veer
Varmus
Vassilios Myrianthopoulos
Vassilis G. Gorgoulis
Vassilis Georgoulias
Wang
Wang
Wang
Wang
Wang
Weinstein
Weiss
Wu
Wu
Xu
Xu
Yamada
Yan
Yang
Yang
Yeh
Zhang
Zhang
Zhang
Zhao
Zhao
Zheng
Zhong
Zhong
Publication venue: 'Elsevier BV'
Publication date: 01/11/2019

Crossref

The University of Manchester - Institutional Repository

University of Dundee Online Publications

A Comparative Analysis of Ensemble Classifiers: Case Studies in Genomics

Author: Pandey Gaurav
Whalen Sean
Publication date: 19/09/2013

The combination of multiple classifiers using ensemble methods is increasingly important for making progress in a variety of difficult prediction problems. We present a comparative analysis of several ensemble methods through two case studies in genomics, namely the prediction of genetic interactions and protein functions, to demonstrate their efficacy on real-world datasets and draw useful conclusions about their behavior. These methods include simple aggregation, meta-learning, cluster-based meta-learning, and ensemble selection using heterogeneous classifiers trained on resampled data to improve the diversity of their predictions. We present a detailed analysis of these methods across 4 genomics datasets and find that the best of these methods offer statistically significant improvements over the state of the art in their respective domains. In addition, we establish a novel connection between ensemble selection and meta-learning, demonstrating how both of these disparate methods establish a balance between ensemble diversity and performance. Comment: 10 pages, 3 figures, 8 tables, to appear in Proceedings of the 2013 International Conference on Data Mining
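
As a rough illustration of the "simple aggregation" family named in this abstract, the sketch below combines the outputs of several base classifiers by averaging their scores and by majority vote. It is a minimal example under stated assumptions, not the authors' implementation: the function names, the 0.5 threshold, and the base-classifier scores are all hypothetical.

```python
from statistics import mean

def aggregate_mean(prob_lists):
    """Average per-example positive-class scores across base classifiers."""
    return [mean(column) for column in zip(*prob_lists)]

def aggregate_vote(prob_lists, threshold=0.5):
    """Majority vote over thresholded base predictions (ties count as negative)."""
    n_classifiers = len(prob_lists)
    votes = [[p >= threshold for p in probs] for probs in prob_lists]
    return [2 * sum(column) > n_classifiers for column in zip(*votes)]

# Hypothetical scores from three base classifiers on four examples.
base_predictions = [
    [0.9, 0.2, 0.6, 0.4],
    [0.8, 0.4, 0.3, 0.6],
    [0.7, 0.1, 0.7, 0.2],
]

mean_scores = aggregate_mean(base_predictions)
vote_labels = aggregate_vote(base_predictions)
print(mean_scores)
print(vote_labels)
```

Meta-learning and ensemble selection, also mentioned in the abstract, would replace these fixed rules with a learned combiner or a chosen subset of base classifiers; that is not shown here.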

arXiv.org e-Print Archive

Crossref

Machine learning for the prediction of protein-protein interactions

Author: Reyes José Antonio
Publication venue
Publication date: 01/01/2009

The prediction of protein-protein interactions (PPI) has recently emerged as an important problem in bioinformatics and systems biology, because most essential cellular processes are mediated by these kinds of interactions. In this thesis we focused on the prediction of co-complex interactions, where the objective is to identify and characterize protein pairs which are members of the same protein complex. Although high-throughput methods for the direct identification of PPI have been developed in recent years, it has been demonstrated that the data obtained by these methods is often incomplete and suffers from high false-positive and false-negative rates. In order to deal with this technology-driven problem, several machine learning techniques have been employed in the past to improve the accuracy and reliability of predicted protein interacting pairs, demonstrating that the combined use of direct and indirect biological insights can improve the quality of predictive PPI models. This task has been commonly viewed as a binary classification problem. However, the nature of the data creates two major problems. Firstly, the classes are imbalanced, because the number of positive examples (pairs of proteins which really interact) is much smaller than the number of negative ones. Secondly, the selection of negative examples is based on some unreliable assumptions which could introduce bias into the classification results. The first part of this dissertation addresses these drawbacks by exploring the use of one-class classification (OCC) methods for the prediction of PPI. OCC methods use examples of just one class to generate a predictive model, which is consequently independent of the kind of negative examples selected; additionally, these approaches are known to cope with imbalanced class problems. We designed and carried out a performance evaluation study of several OCC methods for this task.
We also undertook a comparative performance evaluation with several conventional learning techniques. Furthermore, we pay attention to a new potential drawback which appears to affect the performance of PPI prediction. This is associated with the composition of the positive gold standard set, which contains a high proportion of examples associated with interactions of ribosomal proteins. We demonstrate that this situation indeed biases the classification task, resulting in an over-optimistic performance result. The prediction of non-ribosomal PPI is a much more difficult task. We investigate some strategies to improve the performance of this subtask, integrating new kinds of data as well as combining diverse classification models generated from different sets of data. In this thesis, we undertook a preliminary validation study of the new PPI predicted by OCC methods. To achieve this, we focus on three main aspects: the search for biological evidence in the literature that supports the new predictions; the analysis of the properties of predicted PPI networks; and the identification of highly interconnected groups of proteins which can be associated with new protein complexes. Finally, this thesis explores a slightly different area, related to the prediction of PPI types. This is associated with the classification of PPI structures (complexes) contained in the Protein Data Bank (PDB) database according to their function and binding affinity. Considering the relatively small number of crystallized protein complexes available, it is not possible at the moment to link these results with the ones obtained previously for the prediction of PPI complexes. However, this could be possible in the near future, when more PPI structures become available.
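
The idea of one-class classification described in this abstract, fitting a model on positive examples only, can be sketched as follows. This is a deliberately simple centroid-and-radius model chosen for illustration. The thesis does not state which OCC methods it evaluated, so this should not be read as its method; the feature vectors, the function names, and the 0.95 quantile are hypothetical.

```python
import math

def fit_one_class(positives, quantile=0.95):
    """Fit a centroid and an acceptance radius from positive examples only."""
    dims = len(positives[0])
    centroid = [sum(x[d] for x in positives) / len(positives) for d in range(dims)]
    distances = sorted(math.dist(x, centroid) for x in positives)
    # Radius covers roughly the given fraction of the training positives.
    index = min(len(distances) - 1, int(quantile * len(distances)))
    return centroid, distances[index]

def predict_one_class(model, x):
    """Return True if x lies within the learned radius of the centroid."""
    centroid, radius = model
    return math.dist(x, centroid) <= radius

# Hypothetical 2-D feature vectors for known interacting protein pairs.
positives = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2), (0.9, 0.8)]
model = fit_one_class(positives)

print(predict_one_class(model, (1.05, 0.95)))  # near the positives
print(predict_one_class(model, (3.0, 3.0)))    # far from the positives
```

Because no negative examples enter the fit, the decision boundary does not depend on how negatives would have been sampled, which is the property the abstract highlights.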

Glasgow Theses Service

CiteSeerX

OpenGrey Repository

Survey of Data Mining and Applications (Review from 1996 to Now)

Author: Karahoca Adem
Karahoca Dilek
Şanver Mert
Publication venue: 'IntechOpen'
Publication date: 29/08/2012

IntechOpen

A Microscopic Simulation Laboratory for Evaluation of Off-street Parking Systems

Author: Yuan Yun
Publication venue: UWM Digital Commons
Publication date: 01/12/2018

The parking industry produces an enormous amount of data every day that, properly analyzed, will change the way the industry operates. The collected data form patterns that, in most cases, would allow parking operators and property owners to better understand how to maximize revenue and decrease operating expenses, and would support decisions such as how to set specific parking policies (e.g. parking spaces reserved for electric charging) to achieve sustainable and eco-friendly parking. However, an intelligent tool for assessing the layout design and operational performance of parking lots, in order to reduce externalities and increase revenue, is lacking. To address this issue, this research presents a comprehensive agent-based framework for microscopic off-street parking system simulation. A rule-based parking simulation logic programming model is formulated. The proposed simulation model can effectively capture the behaviors of drivers and pedestrians as well as the spatial and temporal interactions of traffic dynamics in the parking system. A methodology for data collection, processing, and extraction of user behaviors in the parking system is also developed. A Long Short-Term Memory (LSTM) neural network is used to predict the arrival and departure of vehicles. The proposed simulator is implemented in Java, and a Software as a Service (SaaS) graphical user interface is designed to analyze and visualize the simulation results. This study finds the active capacity of the parking system, which is defined as the largest number of actively moving vehicles in the parking system under the facility layout.
In the application of the system to a real-world testbed, the numerical tests show that (a) the smart check-in device has marginal benefits for vehicle waiting time; (b) the flexible pricing policy may increase the average daily revenue if the elasticity of the price is not involved; (c) the number of electric-charging-only spots has a negative impact on the performance of the parking facility; and (d) the rear-in only policy may increase the duration of parking maneuvers and reduce efficiency during the arrival rush hour. Application of the developed simulation system to a real-world case demonstrates its capability to provide informative quantitative measures that support decisions in designing, maintaining, and operating smart parking facilities.

University of Wisconsin-Milwaukee

ProtFus: A Comprehensive Method Characterizing Protein-Protein Interactions of Fusion Proteins

Author: Frenkel-Morgenstern Milana
Gorohovski Alessandro
Jensen Lars Juhl
Tagore Somnath
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2019

Copenhagen University Research Information System

Data analytics 2016: proceedings of the fifth international conference on data analytics

Author: Bhulai Sandjai
Semanjski Ivana
Publication venue: The International Academy, Research and Industry Association
Publication date: 01/01/2016

VU Research Portal

Ghent University Academic Bibliography