252 research outputs found
Cis-regulatory module detection using constraint programming
We propose a method for finding cis-regulatory modules (CRMs) in a set of co-regulated genes. Each CRM consists of a set of binding sites of transcription factors. We wish to find CRMs involving the same transcription factors in multiple sequences. Finding such a combination of transcription factors is inherently a combinatorial problem. We solve this problem by combining the principles of itemset mining and constraint programming. The constraints involve the putative binding sites of transcription factors, the number of sequences in which they co-occur, and the proximity of the binding sites. Genomic background sequences are used to assess the significance of the modules. We experimentally validate our approach and compare it with state-of-the-art techniques.
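The interplay of the support and proximity constraints described in this abstract can be illustrated with a small sketch. It is not the authors' method: plain brute-force enumeration stands in for the constraint programming solver, the significance test against background sequences is omitted, and the function names, the `max_span` parameter and the toy data are all hypothetical.

```python
from itertools import combinations

def hits_within_window(sites, tfs, max_span):
    """True if one site per TF in `tfs` fits inside a window of `max_span` bases.

    `sites` maps a TF name to its putative binding positions in one sequence.
    """
    positions = sorted((p, tf) for tf in tfs for p in sites.get(tf, []))
    needed = set(tfs)
    # Open a window at every site and collect the TFs seen inside it.
    for i, (start, _) in enumerate(positions):
        seen = set()
        for p, tf in positions[i:]:
            if p - start > max_span:
                break
            seen.add(tf)
        if seen == needed:
            return True
    return False

def frequent_modules(sequences, max_size, min_support, max_span):
    """Enumerate TF sets that co-occur closely in at least `min_support` sequences."""
    all_tfs = sorted({tf for s in sequences for tf in s})
    result = {}
    for k in range(1, max_size + 1):
        for tfs in combinations(all_tfs, k):
            support = sum(hits_within_window(s, tfs, max_span) for s in sequences)
            if support >= min_support:
                result[tfs] = support
    return result

# Toy data (invented): three sequences with putative binding positions per TF.
sequences = [
    {"A": [10], "B": [30], "C": [500]},
    {"A": [100, 400], "B": [420]},
    {"A": [5], "C": [20]},
]
modules = frequent_modules(sequences, max_size=2, min_support=2, max_span=50)
print(modules)
```

On this toy input, the pair (A, B) is kept because its sites lie close together in two sequences, while (A, C) is dropped because it satisfies the proximity constraint in only one.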
A Revised Publication Model for ECML PKDD
ECML PKDD is the main European conference on machine learning and data
mining. Since its foundation it implemented the publication model common in
computer science: there was one conference deadline; conference submissions
were reviewed by a program committee; papers were accepted with a low
acceptance rate. Proceedings were published in several Springer Lecture Notes
in Artificial Intelligence (LNAI) volumes, while selected papers were invited to special
issues of the Machine Learning and Data Mining and Knowledge Discovery
journals. In recent years, this model has however come under stress. Problems
include: reviews are of highly variable quality; the purpose of bringing the
community together is lost; reviewing workloads are high; the information
content of conferences and journals decreases; there is confusion among
scientists in interdisciplinary contexts. In this paper, we present a new
publication model, which will be adopted for the ECML PKDD 2013 conference, and
aims to solve some of the problems of the traditional model. The key feature of
this model is the creation of a journal track, which is open to submissions all
year long and allows for revision cycles. Comment: 13 pages
Mining Patterns in Networks using Homomorphism
In recent years many algorithms have been developed for finding patterns in
graphs and networks. A disadvantage of these algorithms is that they use
subgraph isomorphism to determine the support of a graph pattern; subgraph
isomorphism is a well-known NP-complete problem. In this paper, we propose an
alternative approach which mines tree patterns in networks by using subgraph
homomorphism. The advantage of homomorphism is that it can be computed in
polynomial time, which allows us to develop an algorithm that mines tree
patterns in arbitrary graphs in incremental polynomial time. Homomorphism
however entails two problems not found when using isomorphism: (1) two patterns
of different size can be equivalent; (2) patterns of unbounded size can be
frequent. In this paper we formalize these problems and study solutions that
easily fit within our algorithm.
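Why homomorphism is cheap for tree patterns, and why it leads to the two problems named in this abstract, can be seen in a minimal sketch. It assumes node-labelled, undirected graphs and a rooted tree pattern; the function name and the data layout are hypothetical and not taken from the paper. A tree node may map to a graph node if the labels match and every child can map to some neighbour, and memoising the (tree node, graph node) pairs keeps the check polynomial.

```python
from functools import lru_cache

def tree_homomorphism_exists(tree_children, tree_labels, root, graph_adj, graph_labels):
    """Check whether a labelled rooted tree maps homomorphically into a graph.

    Distinct tree nodes may share one image, so no injectivity is required.
    """
    @lru_cache(maxsize=None)
    def can_map(t, g):
        if tree_labels[t] != graph_labels[g]:
            return False
        # Every child needs at least one suitable neighbour of g as its image.
        return all(
            any(can_map(c, w) for w in graph_adj[g])
            for c in tree_children.get(t, ())
        )

    return any(can_map(root, g) for g in graph_adj)

# Graph: a single edge between an 'a' node and a 'b' node.
graph_adj = {0: [1], 1: [0]}
graph_labels = {0: "a", 1: "b"}

# Pattern 1: edge a-b. Pattern 2: star with centre 'a' and two 'b' leaves.
edge = ({0: [1]}, {0: "a", 1: "b"})
star = ({0: [1, 2]}, {0: "a", 1: "b", 2: "b"})
# Pattern 3: alternating path a-b-a-b-a-b, longer than the graph itself.
path = ({i: [i + 1] for i in range(5)}, {i: "ab"[i % 2] for i in range(6)})

for children, labels in (edge, star, path):
    print(tree_homomorphism_exists(children, labels, 0, graph_adj, graph_labels))
```

All three patterns map into the single edge. The edge and the star are therefore equivalent despite their different sizes, and the alternating path can be extended indefinitely while remaining a match, which corresponds to problems (1) and (2) above.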
Unveiling combinatorial regulation through the combination of ChIP information and in silico cis-regulatory module detection
Computationally retrieving biologically relevant cis-regulatory modules (CRMs) is not straightforward. Because of the large number of candidates and the imperfection of the screening methods, many spurious CRMs are detected that score as high as the biologically true ones. Using ChIP information not only narrows the regions in which the binding sites of the assayed transcription factor (TF) should be located, but also restricts the valid CRMs to those that contain the assayed TF (here referred to as applying CRM detection in a query-based mode). In this study, we show that exploiting ChIP information in a query-based way makes in silico CRM detection a much more feasible endeavor. To handle the large datasets, the query-based setting, and other specifics of CRM detection on ChIP-Seq based data, we developed a novel, powerful CRM detection method, 'CPModule'. By applying it to a well-studied ChIP-Seq data set involved in self-renewal of mouse embryonic stem cells, we demonstrate how our tool can recover combinatorial regulation of five known TFs that are key in the self-renewal of mouse embryonic stem cells. Additionally, we make a number of new predictions on combinatorial regulation of these five key TFs with other TFs documented in TRANSFAC.
Using an interpretable Machine Learning approach to study the drivers of International Migration
Globally increasing migration pressures call for new modelling approaches in
order to design effective policies. It is important to have not only efficient
models to predict migration flows but also to understand how specific
parameters influence these flows. In this paper, we propose an artificial
neural network (ANN) to model international migration. Moreover, we use a
technique for interpreting machine learning models, namely Partial Dependence
Plots (PDP), to show that the effects of the drivers behind
international migration can be studied effectively. We train and evaluate the model on a dataset
containing annual international bilateral migration from to from
origin countries to mainly OECD destinations, along with the main
determinants as identified in the migration literature. The experiments carried
out confirm that: 1) the ANN model is more efficient than a traditional
model, and 2) using PDP we are able to gain additional insights on the specific
effects of the migration drivers. This approach provides much more information
than the feature importance information alone, as used in previous work.
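The idea behind a partial dependence curve can be shown in a few lines. This is a sketch under stated assumptions, not the paper's setup: a hand-written linear function stands in for the trained ANN, the two features and their values are invented, and `partial_dependence` is a hypothetical helper. For each grid value, the chosen feature is overwritten in every row and the model's predictions are averaged.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when column `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)       # copy so the input data stays untouched
            modified[feature] = value  # override the feature of interest
            total += predict(modified)
        curve.append(total / len(rows))
    return curve

# Stand-in model (assumption): prediction = 2 * feature_0 + feature_1.
def toy_model(row):
    return 2 * row[0] + row[1]

rows = [[0, 1], [0, 3]]
curve = partial_dependence(toy_model, rows, feature=0, grid=[0, 1, 2])
print(curve)  # [2.0, 4.0, 6.0]
```

The curve rises by 2 per unit of the first feature, recovering that feature's effect in the stand-in model. A single feature importance score would indicate only that the feature matters, not the direction or shape of its effect.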
Mining local staircase patterns in noisy data
Most traditional biclustering algorithms identify biclusters with little or no overlap. In this paper, we introduce the problem of identifying staircases of biclusters. Such staircases may be indicative of causal relationships between columns and cannot easily be identified by existing biclustering algorithms. Our formalization relies on a scoring function based on the Minimum Description Length principle. Furthermore, we propose a first algorithm for identifying staircase biclusters, based on a combination of local search and constraint programming. Experiments show that the approach is promising.