
    Hierarchical meta-rules for scalable meta-learning

    The Pairwise Meta-Rules (PMR) method proposed in [18] has been shown to improve the predictive performance of several meta-learning algorithms for the algorithm ranking problem. Given m target objects (e.g., algorithms), the training complexity of the PMR method is quadratic in m: $O(m^2)$. This is usually not a problem when m is moderate, such as when ranking 20 different learning algorithms. However, for problems with a much larger m, such as the meta-learning-based parameter ranking problem, where m can be 100 or more, the PMR method is less efficient. In this paper, we propose a novel method named Hierarchical Meta-Rules (HMR), which is based on the theory of orthogonal contrasts. The proposed HMR method has a linear training complexity with respect to m, providing a way of dealing with a large number of objects that the PMR method cannot handle efficiently. Our experimental results demonstrate the benefit of the new method in the context of meta-learning.
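    To make the quadratic-versus-linear contrast concrete, the sketch below counts the m(m-1)/2 object pairs a pairwise scheme must cover and builds one standard family of m-1 mutually orthogonal contrasts (Helmert contrasts) over the same m objects. This is a minimal illustration under stated assumptions: that HMR trains one sub-model per contrast, and that it uses Helmert contrasts in particular, are both assumptions, not details taken from the paper. The function names are hypothetical.

```python
from itertools import combinations


def pairwise_pairs(m):
    """All unordered pairs of m objects; a pairwise scheme needs one model per pair."""
    return list(combinations(range(m), 2))


def helmert_contrasts(m):
    """Return m-1 Helmert contrast vectors over m objects (hypothetical helper).

    Contrast j (for j = 1..m-1) compares object j with the group of objects 0..j-1:
    weight 1 on each of the first j objects, -j on object j, 0 on the rest.
    """
    return [[1] * j + [-j] + [0] * (m - j - 1) for j in range(1, m)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def is_orthogonal_set(rows):
    """True if every row sums to zero and every pair of rows has zero dot product."""
    if any(sum(r) != 0 for r in rows):
        return False
    return all(dot(u, v) == 0 for u, v in combinations(rows, 2))
```

    For m = 100, `pairwise_pairs` yields 4950 pairs while `helmert_contrasts` yields only 99 contrasts, which is the gap between quadratic and linear growth in m that the abstract describes.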

    Motif Discovery through Predictive Modeling of Gene Regulation

    We present MEDUSA, an integrative method for learning motif models of transcription factor binding sites by incorporating promoter sequence and gene expression data. We use a modern large-margin machine learning approach, based on boosting, to enable feature selection from the high-dimensional search space of candidate binding sequences while avoiding overfitting. At each iteration of the algorithm, MEDUSA builds a motif model whose presence in the promoter region of a gene, coupled with activity of a regulator in an experiment, is predictive of differential expression. In this way, we learn motifs that are functional and predictive of regulatory response rather than motifs that are simply overrepresented in promoter sequences. Moreover, MEDUSA produces a model of the transcriptional control logic that can predict the expression of any gene in the organism, given the sequence of the promoter region of the target gene and the expression state of a set of known or putative transcription factors and signaling molecules. Each motif model is either a k-length sequence, a dimer, or a PSSM that is built by agglomerative probabilistic clustering of sequences with similar boosting loss. By applying MEDUSA to a set of environmental stress response expression data in yeast, we learn motifs whose ability to predict differential expression of target genes outperforms that of motifs from the TRANSFAC dataset and from a previously published candidate set of PSSMs. We also show that MEDUSA retrieves many experimentally confirmed binding sites associated with environmental stress response from the literature. Comment: RECOMB 200
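    The idea of a boosting round that selects a "motif present in the promoter, coupled with regulator activity" feature can be sketched as follows. This is not MEDUSA's implementation. It substitutes confidence-rated boosting with abstaining weak learners (in the style of Schapire and Singer) and uses a simplified, assumed feature definition: if the k-mer occurs in the promoter, the feature outputs the regulator's state (+1 up, -1 down); otherwise it abstains (0). The data layout `(promoter, regulator_states, label)` and all function names are hypothetical.

```python
import math
from itertools import product


def weak_output(example, kmer, regulator):
    """Hypothetical motif-regulator feature: regulator sign if k-mer present, else abstain (0)."""
    promoter, regulator_states, _ = example
    if kmer not in promoter:
        return 0
    return regulator_states[regulator]  # +1 = regulator up, -1 = regulator down


def boost(examples, kmers, regulators, rounds=3, eps=1e-6):
    """Greedily pick one (k-mer, regulator) feature per round; return [(kmer, regulator, alpha)]."""
    n = len(examples)
    weights = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        best = None
        for kmer, reg in product(kmers, regulators):
            w_plus = w_minus = w_zero = 0.0
            for w, ex in zip(weights, examples):
                h = weak_output(ex, kmer, reg)
                if h == 0:
                    w_zero += w      # feature abstains on this gene/experiment
                elif h == ex[2]:
                    w_plus += w      # feature agrees with the up/down label
                else:
                    w_minus += w     # feature disagrees with the label
            # Normalizer Z for abstaining weak learners; smaller is better.
            z = w_zero + 2.0 * math.sqrt(w_plus * w_minus)
            if best is None or z < best[0]:
                best = (z, kmer, reg, w_plus, w_minus)
        _, kmer, reg, w_plus, w_minus = best
        alpha = 0.5 * math.log((w_plus + eps) / (w_minus + eps))
        model.append((kmer, reg, alpha))
        # Reweight: examples the chosen feature gets right lose weight; abstentions are unchanged.
        weights = [w * math.exp(-alpha * ex[2] * weak_output(ex, kmer, reg))
                   for w, ex in zip(weights, examples)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model, example):
    """Sign of the weighted vote: +1 predicts up-regulation, -1 down-regulation."""
    score = sum(alpha * weak_output(example, kmer, reg) for kmer, reg, alpha in model)
    return 1 if score >= 0 else -1
```

    Each round scans every candidate (k-mer, regulator) pair, which is where the high-dimensional search space the abstract mentions comes from; reweighting then focuses later rounds on genes and experiments that earlier features failed to explain.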

    A survey on utilization of data mining approaches for dermatological (skin) diseases prediction

    Due to recent technological advances, large volumes of medical data are being generated. These data contain valuable information, so data mining techniques can be used to extract useful patterns from them. This paper introduces data mining and its various techniques and surveys the available literature on medical data mining. We focus mainly on the application of data mining to skin diseases. A categorization is provided based on the different data mining techniques, and the utility of the various data mining methodologies is highlighted. Association mining is generally suitable for extracting rules and has been used especially in cancer diagnosis. Classification is a robust method in medical data mining. In this paper, we summarize the different uses of classification in dermatology, where it is one of the most important methods for diagnosing erythemato-squamous diseases. Methods used in this area include neural networks, genetic algorithms, and fuzzy classification. Clustering is a useful method in medical image mining. The purpose of clustering techniques is to find structure in the given data by grouping data points according to the similarity of their characteristics. Clustering has some applications in dermatology. Besides introducing different mining methods, we investigate some challenges that exist in mining skin data.