
    Graph Pattern Mining Techniques to Identify Potential Model Organisms

    Recent advances in high throughput technologies have led to an increasing amount of rich and diverse biological data and related literature. Model organisms are classically selected as subjects for studying human disease based on their genotypic and phenotypic features. A significant problem with model organism identification is the determination of characteristic features related to biological processes that can provide insights into the mechanisms underlying diseases. These insights could have a positive impact on the diagnosis and management of diseases and the development of therapeutic drugs. The increased availability of biological data presents an opportunity to develop data mining methods that can address these challenges and help scientists formulate and test data-driven hypotheses. In this dissertation, data mining methods were developed to provide a quantitative approach for the identification of potential model organisms based on underlying features that may be correlated with disease manifestation in humans. The work encompassed three major types of contributions that aimed to address challenges related to inferring information from biological data available from a range of sources. First, new statistical models and algorithms for graph pattern mining were developed and tested on diverse genres of data (biological networks, drug chemical compounds, and text documents). Second, data mining techniques were developed and shown to identify characteristic disease patterns (disease fingerprints), predict potentially new genetic pathways, and facilitate the assessment of organisms as potential disease models. Third, a methodology was developed that combined the application of graph-based models with information derived from natural language processing methods to identify statistically significant patterns in biomedical text. 
Together, the approaches developed for this dissertation show promise for summarizing information about biological processes and phenomena associated with organisms broadly, and for assessing their potential suitability for studying human diseases.
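The abstract does not spell out the dissertation's statistical models. As a generic illustration of the first step shared by many frequent-subgraph miners, the sketch below counts how many graphs contain each single labeled edge and keeps the edges that meet a minimum support. It is not the dissertation's method. The graph encoding, the node and edge labels, and the function names are hypothetical, and edges are assumed to be undirected.

```python
from collections import Counter


def edge_pattern(labels, u, v, edge_label):
    """Canonical key for an undirected labeled edge (hypothetical encoding).

    Node labels are sorted so that A-B and B-A give the same key.
    """
    a, b = sorted((labels[u], labels[v]))
    return (a, edge_label, b)


def frequent_edge_patterns(graphs, min_support):
    """Return single-edge patterns that occur in at least min_support graphs.

    graphs: list of (labels, edges) pairs, where labels maps node id -> label
    and edges is a list of (u, v, edge_label) triples.
    Support counts graphs, not occurrences: a pattern seen twice in one
    graph still contributes 1 for that graph.
    """
    support = Counter()
    for labels, edges in graphs:
        # a set, so each pattern is counted at most once per graph
        support.update({edge_pattern(labels, u, v, e) for u, v, e in edges})
    return {p: s for p, s in support.items() if s >= min_support}


graphs = [
    ({0: "gene", 1: "protein", 2: "gene"}, [(0, 1, "interacts"), (2, 1, "interacts")]),
    ({0: "protein", 1: "gene"}, [(0, 1, "interacts")]),
    ({0: "gene", 1: "gene"}, [(0, 1, "coexpressed")]),
]
print(frequent_edge_patterns(graphs, min_support=2))
# {('gene', 'interacts', 'protein'): 2}
```

Apriori-style miners grow larger candidate subgraphs only from patterns that survive this kind of support filter, because a subgraph cannot be more frequent than any of its parts.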

    The Diagnosticity of Argument Diagrams

    Can argument diagrams be used to diagnose and predict argument performance? Argumentation is a complex domain with robust and often contradictory theories about the structure and scope of valid arguments. Argumentation is central to advanced problem solving in many domains and is a core feature of day-to-day discourse. Argumentation is, quite literally, all around us, and yet it is rarely taught explicitly. Novices often have difficulty parsing and constructing arguments, particularly in written and verbal form. Such formats obscure key argumentative moves and often mask the strengths and weaknesses of the argument structure with complicated phrasing or simple sophistry. Argument diagrams have a long history in the philosophy of argument and have seen increased application as instructional tools. Argument diagrams reify important argument structures, avoid the serial limitations of text, and are amenable to automatic processing. This thesis addresses the question posed above. In it I show that diagrammatic models of argument can be used to predict students' essay grades and that automatically induced models can be competitive with human grades. In the course of this analysis I survey analytical tools such as Augmented Graph Grammars that can be applied to formalize argument analysis, and detail a novel Augmented Graph Grammar formalism and implementation used in the study. I also introduce novel machine learning algorithms for regression and tolerance reduction. This work makes contributions to research on Education, Intelligent Tutoring Systems, Machine Learning, Educational Data Mining, Graph Analysis, and online grading.
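The Augmented Graph Grammar formalism from the thesis is not reproduced here. As a much simpler stand-in, the sketch below treats an argument diagram as a typed directed graph and counts a few hand-written structural patterns of the kind that could serve as features for a grade-prediction model. The node types, relation names, features, and the function `diagram_features` are all hypothetical.

```python
from collections import defaultdict


def diagram_features(nodes, edges):
    """Compute simple structural features of an argument diagram.

    nodes: dict mapping node id -> node type (e.g. "hypothesis", "citation")
    edges: list of (source, target, relation), relation in {"supports", "opposes"}
    Node types and relation names are illustrative assumptions.
    """
    incoming = defaultdict(set)  # node id -> relations pointing at it
    touched = set()              # nodes that take part in any relation
    for src, dst, rel in edges:
        incoming[dst].add(rel)
        touched.update((src, dst))
    hypotheses = [n for n, t in nodes.items() if t == "hypothesis"]
    return {
        # hypotheses weighed against both supporting and opposing evidence
        "contested_hypotheses": sum(
            1 for h in hypotheses if {"supports", "opposes"} <= incoming[h]
        ),
        # nodes that are not connected to anything
        "isolated_nodes": sum(1 for n in nodes if n not in touched),
        "edge_count": len(edges),
    }


nodes = {1: "hypothesis", 2: "citation", 3: "citation", 4: "hypothesis", 5: "claim"}
edges = [(2, 1, "supports"), (3, 1, "opposes"), (2, 4, "supports")]
print(diagram_features(nodes, edges))
# {'contested_hypotheses': 1, 'isolated_nodes': 1, 'edge_count': 3}
```

A grammar-based approach generalizes this by letting such patterns be declared as rules and matched automatically, rather than hard-coding each check.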

    A Bayesian learning approach to inconsistency identification in model-based systems engineering

    Designing and developing complex engineering systems is a collaborative effort. In Model-Based Systems Engineering (MBSE), this collaboration is supported through the use of formal, computer-interpretable models, allowing stakeholders to address concerns using well-defined modeling languages. However, because concerns cannot be separated completely, implicit relationships and dependencies among the various models describing a system are unavoidable. Given that models are typically co-evolved and only weakly integrated, inconsistencies in the agglomeration of the information and knowledge encoded in the various models are frequently observed. The challenge is to identify such inconsistencies in an automated fashion. In this research, a probabilistic (Bayesian) approach is presented for abductive reasoning about the existence of specific types of inconsistencies and, in the process, of semantic overlaps (relationships and dependencies) in sets of heterogeneous models. A prior belief about the manifestation of a particular type of inconsistency is updated with evidence, which is collected by extracting specific features from the models by means of pattern matching. Inference results are then used to improve future predictions by means of automated learning. The effectiveness and efficiency of the approach are evaluated through a theoretical complexity analysis of the underlying algorithms and through application to a case study. Insights gained from the experiments conducted, as well as the results of a comparison with the state of the art, demonstrate that the proposed method is a significant improvement over the status quo of inconsistency identification in MBSE.
    Ph.D.
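The prior-to-posterior update described above can be illustrated with a naive-Bayes sketch. Here each pattern-matching check is treated as a binary feature, and the features are assumed to be conditionally independent given the hypothesis. That independence assumption, the feature names, and the likelihood values are hypothetical. The sketch is not the dissertation's model.

```python
import math


def posterior(prior, likelihoods, observed):
    """Naive-Bayes update of the belief that an inconsistency exists.

    prior: P(inconsistency) before looking at the models
    likelihoods: feature -> (P(feature | inconsistency), P(feature | no inconsistency))
    observed: feature -> bool, e.g. the outcome of a pattern-matching check
    Assumes features are conditionally independent given the hypothesis.
    """
    log_odds = math.log(prior / (1 - prior))
    for feature, present in observed.items():
        p_yes, p_no = likelihoods[feature]
        if not present:
            # absence of a feature is evidence too: use the complements
            p_yes, p_no = 1 - p_yes, 1 - p_no
        log_odds += math.log(p_yes / p_no)  # add the log-likelihood ratio
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical features: two model elements share a name; their units differ.
likelihoods = {"same_name": (0.9, 0.3), "unit_mismatch": (0.6, 0.05)}
print(round(posterior(0.2, likelihoods, {"same_name": True, "unit_mismatch": True}), 3))
# 0.9
```

With prior odds of 0.25 and likelihood ratios of 3 and 12, the posterior odds are 9, that is, a probability of 0.9. In a learning loop, the likelihood tables could be re-estimated from confirmed inference results.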

    The Minimum Description Length Principle for Pattern Mining: A Survey

    This survey concerns the Minimum Description Length (MDL) principle applied to pattern mining. Mining patterns is a core task in data analysis and, beyond issues of efficient enumeration, the selection of patterns constitutes a major challenge. The MDL principle, a model selection method grounded in information theory, has been applied to pattern mining with the aim of obtaining compact, high-quality sets of patterns. After outlining relevant concepts from information theory and coding, as well as work on the theory behind MDL and similar principles, we review MDL-based methods for mining various types of data and patterns. Finally, we open a discussion on some issues regarding these methods and highlight currently active related data analysis problems.
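For reference, the crude two-part form of MDL that such methods typically build on is shown below. It is stated generically, not quoted from the survey. The best model $M^{*}$ in a candidate class $\mathcal{M}$ minimizes the bits needed to describe the model, $L(M)$, plus the bits needed to describe the data $D$ encoded with it, $L(D \mid M)$. In code-table approaches such as Krimp, a pattern $X$ that is used $\mathrm{usage}(X)$ times when covering the data is assigned a Shannon-optimal code. Here $CT$ denotes the code table.

```latex
M^{*} = \operatorname*{arg\,min}_{M \in \mathcal{M}} \bigl( L(M) + L(D \mid M) \bigr),
\qquad
L(\mathrm{code}(X)) = -\log_2 \frac{\mathrm{usage}(X)}{\sum_{Y \in CT} \mathrm{usage}(Y)}
```

For example, a pattern used 8 times out of 32 total usages costs $-\log_2(8/32) = 2$ bits per occurrence. Frequently used patterns therefore get short codes, which rewards pattern sets that compress the data well.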

    Graph-based Pattern Matching and Discovery for Process-centric Service Architecture Design and Integration

    Process automation and application integration initiatives are often complex and involve significant resources in large organisations. The increasing adoption of service-based architectures to solve integration problems, together with the widely accepted practice of using patterns as a medium for reusing design knowledge, motivated this work. In this work, a pattern-based framework and techniques that provide automation and structure for the process and application integration problem are proposed. The framework is a layered architecture providing modelling and traceability support across the different abstraction layers of the integration problem. To define new services, the building blocks of the integration solution, the framework includes techniques to identify process patterns in concrete process models. Graphs and graph morphisms provide a formal basis to represent patterns and their relation to models. A family of graph-based algorithms supports automation during the matching and discovery of patterns in layered process service models. The framework and techniques are demonstrated in a case study. The algorithms implementing the pattern matching and discovery techniques are investigated through a set of experiments in an empirical evaluation. Observations from interviews conducted with practitioners suggest enhancements to the proposed techniques and direct future work on analysis tasks in process integration initiatives.
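Matching a pattern against a process model via graph morphisms can be sketched as a backtracking search for label-preserving, injective mappings that carry every pattern edge onto a model edge (a subgraph monomorphism). This is a generic textbook technique, not the thesis's family of algorithms. The graph encoding, the activity labels, and `find_matches` are hypothetical, and graphs are assumed to have no self-loops.

```python
def find_matches(pattern_nodes, pattern_edges, model_nodes, model_edges):
    """Enumerate injective, label-preserving mappings of a pattern graph into
    a model graph such that every pattern edge maps to a model edge
    (subgraph monomorphism by simple backtracking; assumes no self-loops).

    *_nodes: dict node id -> label (e.g. activity type)
    *_edges: set of directed (source, target) pairs
    """
    order = list(pattern_nodes)
    matches = []

    def consistent(mapping, p, m):
        # every already-mapped neighbour of p must be joined to m the same way
        for q, n in mapping.items():
            if (p, q) in pattern_edges and (m, n) not in model_edges:
                return False
            if (q, p) in pattern_edges and (n, m) not in model_edges:
                return False
        return True

    def extend(mapping, used):
        if len(mapping) == len(order):
            matches.append(dict(mapping))
            return
        p = order[len(mapping)]
        for m, label in model_nodes.items():
            if m in used or label != pattern_nodes[p]:
                continue
            if consistent(mapping, p, m):
                mapping[p] = m
                used.add(m)
                extend(mapping, used)
                del mapping[p]  # undo and try the next candidate
                used.discard(m)

    extend({}, set())
    return matches


model_nodes = {1: "receive", 2: "validate", 3: "store", 4: "validate"}
model_edges = {(1, 2), (2, 3), (1, 4)}
print(find_matches({"a": "receive", "b": "validate"}, {("a", "b")}, model_nodes, model_edges))
# [{'a': 1, 'b': 2}, {'a': 1, 'b': 4}]
```

Practical matchers add candidate pruning and variable-ordering heuristics, since plain backtracking is exponential in the worst case.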

    Content-Based Inference of the Structural Grammar of a Recurrent TV Program (Inférence de la grammaire structurelle d’une émission TV récurrente à partir du contenu)

    TV program structuring has emerged over the last decade as a major theme for high-quality indexing. In this thesis, we address the problem of unsupervised TV program structuring from the point of view of grammatical inference, i.e., discovering a common structural model shared by a collection of episodes of a recurrent program. Using grammatical inference makes it possible to rely on only minimal domain knowledge. In particular, we assume no prior knowledge of the structural elements that might be present in a recurrent program and only very limited knowledge of the program type (e.g., to name structural elements), apart from its recurrence. Under this assumption, we propose an unsupervised framework operating in two stages. The first stage aims at determining the structural elements that are relevant to the structure of a program. We address this issue by exploiting the repetitiveness of elements in recurrent programs, using temporal density analysis to filter out irrelevant events and determine valid elements. Having discovered the structural elements, the second stage infers a grammar of the program. We explore two inference techniques, based either on multiple sequence alignment or on uniform resampling. A model of the structure is derived from the grammars and used to predict the structure of new episodes. Evaluations are performed on a selection of four different types of recurrent programs. Focusing on structural element determination, we analyze how the threshold applied to the density function and the size of the episode collection affect the number of structural elements determined. For structural grammar inference, we discuss the quality of the grammars obtained and show that they accurately reflect the structure of the program.
We also demonstrate that the models obtained by grammatical inference can accurately predict the structure of unseen episodes, through a quantitative and comparative evaluation of the two methods that segments the new episodes into their structural components. Finally, considering the limitations of our work, we discuss a number of open issues in structure discovery and propose three new research directions for future work.

In this thesis, we address the problem of structuring TV programs in an unsupervised manner from the point of view of grammatical inference, focusing on discovering the structure of recurrent programs from a homogeneous collection. We aim to discover the structural elements that are relevant to the structure of the program, and to infer a grammar of the program's structure. Experiments show that grammatical inference makes it possible to use a minimum of a priori domain knowledge to discover the structure of programs.
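The abstract does not give the thesis's density function or threshold values. The sketch below is a simplified, histogram-based stand-in for the first stage. It maps each occurrence of a candidate element to a bin on the episode's relative timeline, defines the density of a bin as the fraction of episodes with at least one occurrence there, and keeps the elements whose peak density reaches a threshold. The episode encoding, the element ids (assumed to be already clustered across episodes, e.g. near-duplicate shots), `n_bins`, `threshold`, and the function name are hypothetical.

```python
from collections import defaultdict


def select_structural_elements(episodes, n_bins=10, threshold=0.6):
    """Keep candidate elements that recur at a consistent relative position.

    episodes: list of (duration, occurrences), where occurrences is a list of
              (element_id, time) pairs; element ids are assumed to be shared
              across episodes (e.g. clusters of near-duplicate shots).
    Returns {element_id: peak_density} for elements whose peak density
    (fraction of episodes hitting the same relative-time bin) >= threshold.
    """
    # element -> one set of episode indices per relative-time bin
    hits = defaultdict(lambda: [set() for _ in range(n_bins)])
    for ep_index, (duration, occurrences) in enumerate(episodes):
        for element, time in occurrences:
            # normalize to the episode length so episodes are comparable
            b = min(int(time * n_bins / duration), n_bins - 1)
            hits[element][b].add(ep_index)

    kept = {}
    for element, bins in hits.items():
        peak = max(len(eps) for eps in bins) / len(episodes)
        if peak >= threshold:
            kept[element] = peak
    return kept


episodes = [
    (100, [("intro_jingle", 1), ("anchor_shot", 32), ("ad", 55)]),
    (100, [("intro_jingle", 2), ("anchor_shot", 34), ("ad", 12)]),
    (120, [("intro_jingle", 0), ("anchor_shot", 40), ("ad", 90)]),
]
print(select_structural_elements(episodes))
# {'intro_jingle': 1.0, 'anchor_shot': 1.0}
```

Raising the threshold or adding episodes changes how many elements survive, which mirrors the two parameters whose effect the thesis analyzes.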