3,811 research outputs found

    Computational annotation of UTR cis-regulatory modules through Frequent Pattern Mining

    Background: Many studies report the detection and functional characterization of cis-regulatory motifs in untranslated regions (UTRs) of mRNAs, but little is known about the nature and functional role of their distribution. To address this issue we have developed a computational approach based on data mining techniques. The idea is to mine frequent combinations of translation regulatory motifs, since their significant co-occurrences could reveal functional relationships important for the post-transcriptional control of gene expression. The experimentation has focused on targeted mitochondrial transcripts to elucidate the role of translational control in mitochondrial biogenesis and function.
    Results: The analysis is based on a two-step procedure using a sequential pattern mining algorithm. The first step searches for frequent patterns (FPs) of motifs without taking into account their spatial displacement. In the second step, frequent sequential patterns (FSPs) of spaced motifs are generated by taking into account the conservation of spacers between each ordered pair of co-occurring motifs. The algorithm makes no assumption about the relation among motifs or the number of motifs involved in a pattern. Different FSPs can be found depending on the combination of two parameters: the minimum percentage of sequences that must support a pattern, and the granularity of spacer discretization. Results can be retrieved at the UTRminer web site: http://utrminer.ba.itb.cnr.it/. The discovered FPs of motifs amount to 216 in the overall dataset and to 140 in the human subset. For each FP, the system provides information on the discovered FSPs, if any. A variety of search options help users browse the web resource. The list of sequence IDs supporting each pattern can be used to retrieve information from the UTRminer database.
    Conclusion: Computational prediction of structural properties of regulatory sequences is not trivial. The presented data mining approach is able to overcome some limits observed in other competing tools. Preliminary results on UTR sequences from nuclear transcripts targeting mitochondria are promising and make us confident in the effectiveness of the approach for future developments.
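
The first step described above (frequent patterns of motifs, ignoring their positions) can be illustrated with a minimal Apriori-style sketch. This is not the UTRminer implementation: the function name `frequent_motif_sets`, the motif labels, and the sequence IDs are hypothetical, and the second step (spacer discretization) is not covered.

```python
from itertools import combinations


def frequent_motif_sets(sequences, min_support):
    """Return motif combinations found in at least `min_support` (a fraction)
    of the sequences. `sequences` maps a sequence ID to the set of motifs it
    contains. Motif positions are ignored, as in the first mining step."""
    n = len(sequences)
    frequent = {}
    # Level 1 candidates: every single motif seen in the data.
    motifs = sorted({m for found in sequences.values() for m in found})
    candidates = [frozenset([m]) for m in motifs]
    while candidates:
        # Keep the candidates supported by enough sequences.
        kept = {}
        for cand in candidates:
            support = sum(1 for found in sequences.values() if cand <= found) / n
            if support >= min_support:
                kept[cand] = support
        frequent.update(kept)
        # Join kept sets pairwise to build candidates one motif larger.
        kept_sets = list(kept)
        size = len(kept_sets[0]) + 1 if kept_sets else 0
        candidates = list(
            {a | b for a, b in combinations(kept_sets, 2) if len(a | b) == size}
        )
    return frequent


# Hypothetical toy input: four UTR sequences and the motifs detected in each.
utrs = {
    "seq1": {"A", "B", "C"},
    "seq2": {"A", "B"},
    "seq3": {"A", "C"},
    "seq4": {"B"},
}
frequent = frequent_motif_sets(utrs, min_support=0.5)
for pattern, support in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(pattern), support)
```

Raising `min_support` shrinks the result set, which mirrors the role of the support threshold mentioned in the abstract.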

    Clustering Customer Shopping Trips With Network Structure

    Moving objects can be tracked with sensors such as RFID tags or GPS devices. Their movement can be represented as sequences of time-stamped locations. Studying such spatio-temporal movement sequences to discover spatial sequential patterns holds promise in many real-world settings. A few interesting applications are customer shopping traverse pattern discovery, vehicle traveling pattern discovery, and route prediction. Traditional spatial data mining algorithms suitable for Euclidean space are not directly applicable in these settings. We propose a new algorithm to cluster movement paths such as shopping trips for pattern discovery. In our work, we represent the spatio-temporal series as sequences of discrete locations following a pre-defined network. We incorporate a modified version of the Longest Common Subsequence (LCS) algorithm with the network structure to measure the similarity of movement paths. With such spatial networks we implicitly address the existence of spatial obstructions as well. Experiments were performed on both hand-collected real-life trips and simulated trips in grocery shopping. The initial evaluation results show that our proposed approach, called Net-LCSS, can be used to support effective and efficient clustering for shopping trip pattern discovery.
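
As a rough illustration of LCS-based path similarity, the sketch below computes the plain Longest Common Subsequence between two trips expressed as sequences of discrete locations. It is the textbook LCS, not Net-LCSS: the abstract does not specify how the network structure modifies the measure, and the normalization by the shorter trip, the function names, and the store zones are assumptions.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b,
    computed with the standard dynamic program using two rows."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def path_similarity(a, b):
    """LCS length normalized by the shorter path (an assumed normalization)."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / min(len(a), len(b))


# Hypothetical shopping trips as ordered lists of store zones.
trip1 = ["entrance", "produce", "dairy", "bakery", "checkout"]
trip2 = ["entrance", "dairy", "frozen", "checkout"]
print(path_similarity(trip1, trip2))  # 3 shared zones in order / 4 = 0.75
```

A pairwise similarity like this can feed any clustering method that accepts a precomputed similarity or distance matrix.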

    Mining patterns in complex data

    Ph.D. (Doctor of Philosophy)

    An ensemble learning approach to reverse-engineering transcriptional regulatory networks from time-series gene expression data

    Background: One of the most challenging tasks in the post-genomic era is to reconstruct transcriptional regulatory networks. The goal is to reveal, for each gene that responds to a certain biological event, which transcription factors affect its expression, and how a set of transcription factors coordinate to accomplish temporally and spatially specific regulation.
    Results: Here we propose a supervised machine learning approach to address these questions. We focus our study on the transcriptional regulation of the cell cycle in budding yeast, thanks to the large amount of data available and its relatively well-understood biology, although the main ideas of our method can be applied to other data as well. Our method starts by building an ensemble of decision trees for each microarray data set to capture the association between the expression levels of yeast genes and the binding of transcription factors to gene promoter regions, as determined by chromatin immunoprecipitation microarray (ChIP-chip) experiments. Cross-validation experiments show that the method is more accurate and reliable than the naive decision tree algorithm and several other ensemble learning methods. From the decision tree ensembles, we extract logical rules that explain how a set of transcription factors act in concert to regulate the expression of their targets. We further compute a profile for each rule to show its regulation strength at different time points. We also propose a spline interpolation method to integrate the rule profiles learned from several time-series expression data sets that measure the same biological process. We then combine these rule profiles to build a transcriptional regulatory network for the yeast cell cycle. Compared to the results in the literature, our method correctly identifies all major known yeast cell cycle transcription factors and assigns them to appropriate cell cycle phases. Our method also identifies many interesting synergistic relationships among these transcription factors, most of which are well known, while many of the rest are supported by other evidence.
    Conclusion: The high accuracy of our method indicates that it is valid and robust. As more gene expression and transcription factor binding data become available, we believe that our method will be useful for reconstructing large-scale transcriptional regulatory networks in other species as well.

    Machine learning for microbial ecology: predicting interactions and identifying their putative mechanisms

    Microbial communities are key components of Earth’s ecosystems and they play important roles in human health and industrial processes. These communities and their functions can strongly depend on the diverse interactions between constituent species, posing the question of how such interactions can be predicted, measured and controlled. This challenge is particularly relevant for the many practical applications enabled by the rising field of synthetic microbial ecology, which includes the design of microbiome therapies for human diseases. Advances in sequencing technologies and genomic databases provide valuable datasets and tools for studying inter-microbial interactions, but the capacity to characterize the strength and mechanisms of interactions between species in large consortia is still an unsolved challenge. In this thesis, I show how machine learning methods can be used to help address these questions. The first portion of my thesis work was focused on predicting the outcome of pairwise interactions between microbial species. By integrating genomic information and observed experimental data, I used machine learning algorithms to explore the predictive relationship between single-species traits and inter-species interaction phenotypes. I found that organismal traits (e.g. annotated functions of genomic elements) are sufficient to predict the qualitative outcome of interactions between microbes. I also found that the relative fraction of possible experiments needed to build acceptable models drastically shrinks as the combinatorial space grows. In the second part of my thesis work, I developed an algorithmic method for identifying putative interaction mechanisms by scoring combinations of variables that random forest uses in order to predict interaction outcomes. I applied this method to a study of the human microbiome and identified a previously unreported combination of microbes that are strongly associated with Crohn’s disease. 
In the last part of my thesis, I utilized a regression approach to first identify and then quantify interactions between microbial species relevant to community function. The work I present in this dissertation provides a general framework for understanding the myriad interactions that occur in natural and synthetic microbial consortia.

    The Partial Evaluation Approach to Information Personalization

    Information personalization refers to the automatic adjustment of information content, structure, and presentation tailored to an individual user. By reducing information overload and customizing information access, personalization systems have emerged as an important segment of the Internet economy. This paper presents a systematic modeling methodology for personalization: PIPE ('Personalization is Partial Evaluation'). Personalization systems are designed and implemented in PIPE by modeling an information-seeking interaction in a programmatic representation. The representation supports the description of information-seeking activities as partial information and their subsequent realization by partial evaluation, a technique for specializing programs. We describe the modeling methodology at a conceptual level and outline representational choices. We present two application case studies that use PIPE for personalizing web sites and describe how PIPE suggests a novel evaluation criterion for information system designs. Finally, we mention several fundamental implications of adopting the PIPE model for personalization and when it is (and is not) applicable. Comment: Comprehensive overview of the PIPE model for personalization.
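
The idea of treating personalization as partial evaluation can be sketched with a toy site model. This is only an illustration under stated assumptions, not the PIPE system: the tuple-and-dict representation, the function `specialize`, and the page names are all hypothetical. A site is modeled as nested choices, and supplying some answers in advance specializes the model by removing the choices already decided.

```python
def specialize(node, known):
    """Partially evaluate a site model with respect to the answers in `known`.

    A node is either a page name (str) or a pair (variable, branches), where
    `branches` maps each possible answer to a sub-node. Decided variables are
    resolved away; undecided ones stay in the residual model."""
    if isinstance(node, str):
        return node  # a leaf page needs no further specialization
    variable, branches = node
    if variable in known:
        chosen = branches.get(known[variable])
        return None if chosen is None else specialize(chosen, known)
    # Variable still unknown: keep the choice, but specialize every branch.
    residual = {}
    for answer, sub in branches.items():
        result = specialize(sub, known)
        if result is not None:
            residual[answer] = result
    return (variable, residual) if residual else None


# Hypothetical site: the user normally picks a topic first, then a region.
site = (
    "topic",
    {
        "sports": ("region", {"us": "us-sports.html", "eu": "eu-sports.html"}),
        "news": ("region", {"us": "us-news.html", "eu": "eu-news.html"}),
    },
)

# Supplying the region up front leaves only the topic choice.
print(specialize(site, {"region": "eu"}))
# Supplying both answers reduces the model to a single page.
print(specialize(site, {"topic": "news", "region": "us"}))
```

Note that the region can be supplied before the topic even though the site asks for it second, which is the kind of out-of-order interaction that specialization makes possible in this sketch.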

    A Framework for Discovery and Diagnosis of Behavioral Transitions in Event-streams

    Data stream mining techniques can be used to track user behaviors as users attempt to achieve their goals. Quality metrics over stream-mined models identify potential changes in user goal attainment. When the quality of a data-mined model varies significantly from that of nearby models, as defined by the quality metrics, the user's behavior is automatically flagged as a potentially significant behavioral change. Decision tree, sequence pattern, and Hidden Markov modeling are used in this study. These three types of modeling can expose different aspects of a user's behavior. With decision tree modeling, the specific changes in user behavior can be automatically characterized by differencing the data-mined decision-tree models. Sequence pattern modeling can shed light on how the user changes his or her sequence of actions, and Hidden Markov modeling can identify the learning transition points. This research describes how model-quality monitoring and these three types of modeling, as a generic framework, can aid recognition and diagnosis of behavioral changes in a case study of cognitive rehabilitation via emailing. The data stream mining techniques mentioned are used to monitor patient goals as part of a clinical plan to aid cognitive rehabilitation. In this context, real-time data mining aids clinicians in tracking user behaviors as patients attempt to achieve their goals. This generic framework can be widely applicable to other real-time data-intensive analysis problems. To illustrate this, similar Hidden Markov modeling is being used to analyze transactional behavior at a telecommunication company for fraud detection. Fraud can similarly be considered a potentially significant change in transaction behavior.
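
The model-quality monitoring step can be sketched as a simple sliding-window check. The abstract does not state the actual quality metric or decision rule, so everything here is an assumption: a model is flagged when its quality differs from the mean of the preceding `window` models by more than `k` standard deviations, with `min_delta` as a floor on the threshold. The function name and the sample values are hypothetical.

```python
from statistics import mean, stdev


def flag_changes(qualities, window=5, k=3.0, min_delta=0.05):
    """Return the indices of models whose quality deviates sharply from the
    preceding `window` models (assumed rule: more than k standard deviations,
    and at least `min_delta`, away from the window mean)."""
    flagged = []
    for i in range(window, len(qualities)):
        recent = qualities[i - window:i]
        threshold = max(k * stdev(recent), min_delta)
        if abs(qualities[i] - mean(recent)) > threshold:
            flagged.append(i)
    return flagged


# Hypothetical quality scores for models mined over consecutive time slices.
scores = [0.90, 0.91, 0.89, 0.90, 0.92, 0.60, 0.61, 0.59]
print(flag_changes(scores))  # the drop at index 5 is flagged
```

Only the first model after the drop is flagged, because once the low score enters the window it widens the spread of the recent models; a flagged index would then be the point at which to compare the neighboring models in more detail.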