6 research outputs found

    Output-Sensitive Pattern Extraction in Sequences

    Genomic analysis, plagiarism detection, data mining, intrusion detection, spam fighting and time series analysis are just some of the applications in which extracting recurring patterns from sequences of objects is a central computational challenge. Several notions of pattern exist, and many share the common idea of strictly specifying some positions of the pattern while treating the remaining positions as don't cares. Since the number of patterns can be exponential in the length of the sequences, pattern extraction focuses on statistically relevant patterns: those for which any attempt at further refinement or extension loses significant information (i.e. changes the number of occurrences). Output-sensitive algorithms have been proposed to enumerate and list these patterns, taking polynomial time O(n^c) per pattern for a constant c > 1, which is impractical for massive sequences of very large length n. We address the problem of extracting maximal patterns with at most k don't care symbols and at least q occurrences. Our contribution is the first algorithm that attains a stronger notion of output-sensitivity, borrowed from the analysis of data structures: the cost is proportional to the actual number of occurrences of each pattern, which is at most n and in practice much smaller, thus avoiding the aforementioned O(n^c) cost per pattern.
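    The problem definition above (patterns with at most k don't cares and at least q occurrences) can be illustrated with a brute-force sketch. This is not the output-sensitive algorithm the abstract contributes, and it ignores maximality; it simply enumerates fixed-length windows with up to k positions masked by a hypothetical '?' don't care symbol and keeps those occurring at least q times.

    ```python
    from itertools import combinations

    def occurrences(pattern, text):
        """Count positions where pattern matches text; '?' matches any symbol."""
        m = len(pattern)
        return sum(
            all(p == '?' or p == text[i + j] for j, p in enumerate(pattern))
            for i in range(len(text) - m + 1)
        )

    def extract_patterns(text, length, k, q):
        """Brute force: every length-`length` window with up to k positions
        replaced by '?', kept if it occurs at least q times in text."""
        found = {}
        for i in range(len(text) - length + 1):
            window = text[i:i + length]
            for d in range(k + 1):
                for masked in combinations(range(length), d):
                    pat = ''.join('?' if j in masked else c
                                  for j, c in enumerate(window))
                    if pat not in found:
                        found[pat] = occurrences(pat, text)
        return {p: c for p, c in found.items() if c >= q}

    # 'abc' occurs twice; 'ab?' covers both plus 'abd', so it occurs 3 times.
    print(extract_patterns("abcabdabc", 3, 1, 2))
    ```

    The exponential blow-up the abstract mentions is visible here: the inner loops touch every masked variant of every window, which is exactly the cost an output-sensitive algorithm avoids.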

    Electronic medical records in paediatric ophthalmology: a study of potential users and uses to inform design

    Electronic medical records are at the core of an advancing movement toward information-driven healthcare. By enhancing the ability to capture, store, and analyse vast amounts of health data, the routine use of electronic medical records is advocated as a means to improve the efficiency and quality of care provision, advance population health, empower patients, and reduce healthcare costs. However, the delivery of any benefits is threatened by a failure to understand the unique care environments of different clinical specialties, and to customise system design appropriately. This has prompted a move to the user-centred design process for health information technology. Paediatric ophthalmology is a unique field that faces particular challenges in electronic medical record adoption. As with other ophthalmic specialties, its heavy use of imaging and diagrammatic documentation is difficult to replicate electronically. The same is true of the flexibility required to meet the demands of the varying ages, developmental stages, and visual needs of each patient, reflecting a unique interface between ophthalmic and paediatric requirements. The consideration of such requirements is essential throughout the user-centred design of effective health information technology systems. However, the paucity of evidence surrounding electronic medical record design methodologies and system usage hinders technological development and application within paediatric ophthalmology. This research was centred on a user-centred design process, to provide an understanding of the users of electronic medical records in paediatric ophthalmology and their requirements. Taking a mixed methods approach, this research initially explored the landscape of medical record use, gathering user-centred requirements, and concluded with the development and testing of three prototype data collection forms for specific use cases within paediatric ophthalmology. Overall, this work articulates the specific challenges and requirements in this area, and provides the foundation for future design and adoption strategies for electronic medical record systems within paediatric ophthalmology.

    Extraction et partitionnement pour la recherche de régularités : application à l’analyse de dialogues

    In the context of dialogue analysis, a corpus of dialogues can be represented as a set of arrays of annotations encoding the dialogue utterances. In order to identify frequently used dialogue schemes, we design a two-step methodology in which recurrent patterns are first extracted and then partitioned into homogeneous classes constituting the regularities. Two methods are developed to extract recurrent patterns: LPCA-DC and SABRE. The former is an adaptation of a dynamic programming algorithm, whereas the latter derives from a formal model of the local alignment extraction problem in pairs of annotation arrays. The partitioning of recurrent patterns is realised using various heuristics from the literature, as well as two original formulations of the K-partitioning problem as mixed integer linear programs. Through a polyhedral study of a polyhedron associated with these formulations, facets are characterised (in particular: 2-chorded cycle inequalities, 2-partition inequalities and general clique inequalities). These theoretical results allow the establishment of an efficient cutting plane algorithm. We developed a decision support software called VIESA, which implements these different methods and allows their evaluation during two experiments conducted by an expert psychologist. Regularities corresponding to dialogical strategies that previous manual extractions had failed to identify are thus obtained.
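    The K-partitioning objective behind the integer programs mentioned above can be illustrated on a toy instance: partition items into at most K classes so that the total within-class dissimilarity is minimised. The sketch below solves tiny instances by exhaustive search over label assignments; it is only an illustration of the objective, not the thesis's heuristics or cutting-plane algorithm, and the Hamming dissimilarity between pattern strings is an assumption for the example.

    ```python
    from itertools import product

    def hamming(a, b):
        """Positions where two equal-length pattern strings differ."""
        return sum(x != y for x, y in zip(a, b))

    def k_partition(items, k, dissim=hamming):
        """Exhaustive search for the K-partition minimising the sum of
        within-class dissimilarities -- the objective a K-partitioning
        integer program encodes, tractable only for tiny instances."""
        n = len(items)
        best, best_cost = None, float('inf')
        for labels in product(range(k), repeat=n):
            cost = sum(dissim(items[i], items[j])
                       for i in range(n) for j in range(i + 1, n)
                       if labels[i] == labels[j])
            if cost < best_cost:
                best_cost, best = cost, labels
        classes = [[items[i] for i in range(n) if best[i] == c]
                   for c in range(k)]
        return [c for c in classes if c], best_cost

    # Two natural classes of recurrent patterns, one mismatch inside each.
    print(k_partition(["ab?", "abc", "xy?", "xyz"], 2))
    ```

    A cutting-plane method attacks the same objective by relaxing the integer program and iteratively adding violated valid inequalities (such as the 2-partition and clique inequalities characterised above) instead of enumerating assignments.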

    From Data to Knowledge in Secondary Health Care Databases

    The advent of big data in health care is a topic receiving increasing attention worldwide. In the UK, over the last decade, the National Health Service (NHS) programme for Information Technology has boosted big data by introducing electronic infrastructures in hospitals and GP practices across the country. This ever-growing amount of data promises to expand our understanding of services, processes and research. Potential benefits include reduced costs, optimisation of services, knowledge discovery, and patient-centred predictive modelling. This thesis explores the above by studying over ten years' worth of electronic data and systems in a hospital treating over 750 thousand patients a year. The hospital's information systems store routinely collected data, used primarily by health practitioners to support and improve patient care. This raw data is recorded on several different systems but rarely linked or analysed. This thesis explores the secondary uses of such data through two case studies, one on prostate cancer and another on stroke. In each study, the journey from data to knowledge traverses the critical steps of data retrieval, linkage, integration, preparation, mining and analysis. Throughout, novel methods and computational techniques are introduced and the value of routinely collected data is assessed. In particular, this thesis discusses in detail the methodological aspects of developing clinical data warehouses from routine heterogeneous data, and it introduces methods to model, visualise and analyse the journeys that patients take through care. This work has provided lessons in hospital IT provision, integration, visualisation and analytics of complex electronic patient records and databases, and has enabled the use of raw routine data for management decision making and clinical research in both case studies.

    String Mining in Bioinformatics

    No full text