252 research outputs found

    Cis-regulatory module detection using constraint programming

    Get PDF
    We propose a method for finding CRMs in a set of co-regulated genes. Each CRM consists of a set of binding sites of transcription factors. We wish to find CRMs involving the same transcription factors in multiple sequences. Finding such a combination of transcription factors is inherently a combinatorial problem. We solve this problem by combining the principles of itemset mining and constraint programming. The constraints involve the putative binding sites of transcription factors, the number of sequences in which they co-occur and the proximity of the binding sites. Genomic background sequences are used to assess the significance of the modules. We experimentally validate our approach and compare it with state-of-the-art techniques

    A Revised Publication Model for ECML PKDD

    Full text link
    ECML PKDD is the main European conference on machine learning and data mining. Since its foundation it implemented the publication model common in computer science: there was one conference deadline; conference submissions were reviewed by a program committee; papers were accepted with a low acceptance rate. Proceedings were published in several Springer Lecture Notes in Artificial (LNAI) volumes, while selected papers were invited to special issues of the Machine Learning and Data Mining and Knowledge Discovery journals. In recent years, this model has however come under stress. Problems include: reviews are of highly variable quality; the purpose of bringing the community together is lost; reviewing workloads are high; the information content of conferences and journals decreases; there is confusion among scientists in interdisciplinary contexts. In this paper, we present a new publication model, which will be adopted for the ECML PKDD 2013 conference, and aims to solve some of the problems of the traditional model. The key feature of this model is the creation of a journal track, which is open to submissions all year long and allows for revision cycles.Comment: 13 page

    Mining Patterns in Networks using Homomorphism

    Full text link
    In recent years many algorithms have been developed for finding patterns in graphs and networks. A disadvantage of these algorithms is that they use subgraph isomorphism to determine the support of a graph pattern; subgraph isomorphism is a well-known NP complete problem. In this paper, we propose an alternative approach which mines tree patterns in networks by using subgraph homomorphism. The advantage of homomorphism is that it can be computed in polynomial time, which allows us to develop an algorithm that mines tree patterns in arbitrary graphs in incremental polynomial time. Homomorphism however entails two problems not found when using isomorphism: (1) two patterns of different size can be equivalent; (2) patterns of unbounded size can be frequent. In this paper we formalize these problems and study solutions that easily fit within our algorithm

    Unveiling combinatorial regulation through the combination of ChIP information and in silico cis-regulatory module detection

    Get PDF
    Computationally retrieving biologically relevant cis-regulatory modules (CRMs) is not straightforward. Because of the large number of candidates and the imperfection of the screening methods, many spurious CRMs are detected that are as high scoring as the biologically true ones. Using ChIP-information allows not only to reduce the regions in which the binding sites of the assayed transcription factor (TF) should be located, but also allows restricting the valid CRMs to those that contain the assayed TF (here referred to as applying CRM detection in a query-based mode). In this study, we show that exploiting ChIP-information in a query-based way makes in silico CRM detection a much more feasible endeavor. To be able to handle the large datasets, the query-based setting and other specificities proper to CRM detection on ChIP-Seq based data, we developed a novel powerful CRM detection method 'CPModule'. By applying it on a well-studied ChIP-Seq data set involved in self-renewal of mouse embryonic stem cells, we demonstrate how our tool can recover combinatorial regulation of five known TFs that are key in the self-renewal of mouse embryonic stem cells. Additionally, we make a number of new predictions on combinatorial regulation of these five key TFs with other TFs documented in TRANSFAC

    Using an interpretable Machine Learning approach to study the drivers of International Migration

    Full text link
    Globally increasing migration pressures call for new modelling approaches in order to design effective policies. It is important to have not only efficient models to predict migration flows but also to understand how specific parameters influence these flows. In this paper, we propose an artificial neural network (ANN) to model international migration. Moreover, we use a technique for interpreting machine learning models, namely Partial Dependence Plots (PDP), to show that one can well study the effects of drivers behind international migration. We train and evaluate the model on a dataset containing annual international bilateral migration from 19601960 to 20102010 from 175175 origin countries to 3333 mainly OECD destinations, along with the main determinants as identified in the migration literature. The experiments carried out confirm that: 1) the ANN model is more efficient w.r.t. a traditional model, and 2) using PDP we are able to gain additional insights on the specific effects of the migration drivers. This approach provides much more information than only using the feature importance information used in previous works

    Mining local staircase patterns in noisy data

    Get PDF
    Most traditional biclustering algorithms identify biclusters with no or little overlap. In this paper, we introduce the problem of identifying staircases of biclusters. Such staircases may be indicative for causal relationships between columns and can not easily be identified by existing biclustering algorithms. Our formalization relies on a scoring function based on the Minimum Description Length principle. Furthermore, we propose a first algorithm for identifying staircase biclusters, based on a combination of local search and constraint programming. Experiments show that the approach is promising
    • ā€¦
    corecore