169 research outputs found

    A research roadmap towards achieving scalability in model driven engineering

    As Model-Driven Engineering (MDE) is increasingly applied to larger and more complex systems, the current generation of modelling and model management technologies is being pushed to its limits in terms of capacity and efficiency. Additional research and development is imperative if MDE is to remain relevant to industrial practice and continue delivering its widely recognised productivity, quality, and maintainability benefits. Achieving scalability in modelling and MDE involves being able to construct large models and domain-specific languages in a systematic manner, enabling teams of modellers to construct and refine large models collaboratively, advancing the state of the art in model querying and transformation tools so that they can cope with large models (on the scale of millions of model elements), and providing an infrastructure for efficient storage, indexing, and retrieval of large models. This paper provides a research roadmap for these aspects of scalability in MDE and outlines directions for work in this emerging research area.

    One health in the U.S. military: a review of existing systems and recommendations for the future

    2014 Summer. Includes bibliographical references. Background: The merging of the former U.S. Army Veterinary Command (VETCOM) and the former U.S. Army Center for Health Promotion and Preventive Medicine (USACHPPM) into the U.S. Army Public Health Command (USAPHC) in 2011 created an opportunity for the military to fully embrace the One Health concept. That same year, the USAPHC began work on a Zoonotic Disease Report (ZDR) aimed at supporting critical zoonotic disease risk assessments by combining zoonotic disease data from human, entomological, laboratory, and animal sources. The purpose of this dissertation is to facilitate the creation of a military zoonotic disease surveillance program that combines disease data from both military human and animal sources. Methods: Five of the most commonly used human military medical data systems were systematically reviewed using a standardized template based on Centers for Disease Control and Prevention (CDC) guidelines. The systems were then compared to each other in order to recommend the one(s) best suited for use in the USAPHC ZDR. The first stage of the comparison focused on each system's ability to meet the specific goals and objectives of the ZDR, whereas the second stage applied capture-recapture methodology to data system queries in order to evaluate each system's data quality (completeness). A pilot study was conducted using Lyme borreliosis to investigate the utility of military pet dogs as sentinels for zoonotic disease surveillance in military populations. Canine data came from 3996 surveys collected at 15 military veterinary facilities from 1 November 2012 through 31 October 2013. The surveys simultaneously collected Borrelia burgdorferi (Bb) seroprevalence and canine risk factor data for each participating pet dog. Human data were obtained by querying the Defense Medical Surveillance System for the same 15 military locations over the same time period.
The correlation between military pet dog Bb seroprevalence and military human Lyme disease (borreliosis) data was estimated using the Spearman rank correlation. The difference between military pet dog data and civilian pet dog data was examined with the chi-squared test for proportions. Multivariable logistic regression was then used to investigate the potential for identified risk factors to affect the observed association. Results: The comparison of human military medical data systems found that the Military Health System Management Analysis and Reporting Tool (M2) data system most completely met the specific goals and objectives of the ZDR. In addition, the completeness calculation showed the M2 data source to be the most complete source of human data, with 55% of total captured cases coming from the M2 system alone. The pilot study found a strong positive correlation between military human borreliosis data and military pet dog Bb seroprevalence data by location (rs = 0.821). The study showed reassuring similarities in pet dog seroprevalence by location for the majority of sites, but also meaningful differences between two locations, potentially indicating that military pet dogs are more appropriate indicators of Lyme disease risk for military populations than civilian pet dog data. Whether canine Bb seroprevalence is influenced by the distribution of identified risk factors could not be determined owing to limited study power. Conclusions: Based on this study, M2 was recommended as the primary source of military human medical data for use in the Public Health Command Zoonotic Disease Report. In addition, it was recommended that Service member pet dog data be incorporated as a sensitive and convenient measure of zoonotic disease risk in human military populations. The validity of the data, however, should be evaluated further with larger sample sizes and/or a zoonotic disease of higher prevalence.
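The by-location Spearman rank correlation described above can be sketched in a few lines. The site values below are invented for illustration (six hypothetical sites, not the study's data), and the no-ties shortcut formula is used:

```python
# Toy Spearman rank correlation between canine Bb seroprevalence and
# human Lyme case rates by site. All numbers are hypothetical.
def ranks(xs):
    # Rank values 1..n; assumes no ties, as in this toy example.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    # rs = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), valid when there are no ties.
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

dog_seroprevalence = [0.01, 0.03, 0.08, 0.02, 0.12, 0.05]  # per site
human_case_rate = [0.2, 0.9, 2.5, 0.6, 1.4, 3.1]           # per site
print(round(spearman(dog_seroprevalence, human_case_rate), 3))
```

A strongly monotone relationship between the two site rankings yields rs near 1, which is the pattern the dissertation reports (rs = 0.821).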

    Machine Learning in clinical biology and medicine: from prediction of multidrug resistant infections in humans to pre-mRNA splicing control in Ciliates

    Machine Learning methods have broadly begun to infiltrate the clinical literature, such that the correct use of algorithms and tools can facilitate both diagnosis and therapy. The availability of large quantities of high-quality data could lead to an improved understanding of risk factors in community- and healthcare-acquired infections. In the first part of my PhD program, I refined my Machine Learning skills by developing and evaluating, on a real antibiotic stewardship dataset, a model for predicting multidrug-resistant urinary tract infections after patient hospitalization. For this purpose, I created an online platform called DSaaS, specifically designed for healthcare operators to train ML models (supervised learning algorithms). These results are reported in Chapter 2. In the second part of the PhD thesis (Chapter 3), I used my new skills to study genomic variants, in particular the phenomenon of intron splicing. One important mode of pre-mRNA post-transcriptional modification is alternative intron splicing, which includes intron retention (unsplicing) and allows the creation of many distinct mature mRNA transcripts from a single gene. Accurate interpretation of genomic variants is the backbone of genomic medicine. Determining, for example, the causative variant in patients with Mendelian disorders facilitates both management and potential downstream treatment of the patient's condition, as well as providing peace of mind and allowing more effective counselling for the wider family. Recent years have seen a surge in bioinformatics tools designed to predict variant impact on splicing, and these offer an opportunity to circumvent many limitations of RNA-seq based approaches. An increasing number of these tools rely on machine learning approaches that can identify patterns in data and use this knowledge to make predictions on new data.
I optimized a pipeline to extract introns from genomes and transcriptomes and classified them into retained introns (RIs) and constitutively spliced introns (CSIs). I used data from ciliates because of the peculiar organization of their genomes (enriched in coding sequences) and because they are unicellular organisms without cells differentiated into tissues, which made the identification and manipulation of introns easier. In collaboration with my PhD colleague Dr. Leonardo Vito, I analyzed these intronic sequences to identify features with which Machine Learning algorithms could predict and classify them. We also developed a platform for manipulating the FASTA, GTF, BED, and other files produced by the pipeline tools. I named the platform Biounicam (intron extraction tools), available at http://46.23.201.244:1880/ui. The major objective of this study was to develop an accurate machine-learning model that can predict whether an intron will be retained, to understand the key features involved in the intron retention mechanism, and to provide insight into the factors that drive IR. Once the model has been developed, the final step of my PhD work will be to expand the platform with different machine learning algorithms to better predict retention and to test new features that drive this phenomenon. These features will hopefully contribute to finding new mechanisms that control intron splicing. The additional papers and patents I published during my PhD program are in Appendices B and C. These works have enriched me with many techniques useful for future work, ranging from microbiology to classical statistics.
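The feature-extraction step that feeds such an RI/CSI classifier can be sketched as below. The feature names (length, GC content, canonical GT...AG boundaries) are illustrative choices, not the thesis's actual feature set, and the example sequence is invented:

```python
# Toy per-intron feature extraction of the kind that could feed a
# machine-learning classifier separating retained (RI) from
# constitutively spliced (CSI) introns. Features are illustrative only.
def intron_features(seq):
    seq = seq.upper()
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    return {
        "length": len(seq),
        "gc_content": round(gc, 3),
        # Most spliceosomal introns begin with GT and end with AG.
        "canonical_gt_ag": seq.startswith("GT") and seq.endswith("AG"),
    }

print(intron_features("GTAAGTCCGGCGCGCCTTTCAG"))
```

In practice each intron in the extracted set would be turned into such a feature vector and paired with its RI/CSI label for supervised training.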

    Molecular dynamics simulations of HIV-1 protease complexed with saquinavir

    Inhibition of the Human Immunodeficiency Virus type 1 (HIV-1) protease enzyme blocks HIV-1 replication. Protease inhibitor drugs have been used successfully as a therapy for HIV-infected individuals to reduce their viral loads and slow the progression to Acquired Immune Deficiency Syndrome (AIDS). However, mutations readily and rapidly accrue in the protease gene, resulting in reduced sensitivity of the protein to the inhibitor. In this thesis, molecular dynamics simulations (MDS) were run on HIV proteases complexed with the protease inhibitor saquinavir, and the strength of binding affinity was calculated through MMPBSA and normal mode analysis. We show in this thesis that at least 13 residues can be computationally mutated in the protease's sequence without adversely affecting its structure or dynamics, while still replicating the change in binding affinity to saquinavir caused by those mutations. Using six protease genotypes with an ordered decrease in saquinavir sensitivity, we use MDS to calculate drug binding affinity. Our results show that single 10 ns simulations of the systems resulted in good concurrence for the wild-type (WT) system, but an overall strong anti-correlation with biochemically derived results. Extension of the WT and multi-drug-resistant (MDR) simulations to 50 ns yielded no improvement in the correlation with experiment. However, expansion of these systems to a 10-repetition ensemble MDS considerably improved the MDR binding affinity compared to the biochemical result. Principal component analysis of the simulations revealed that much greater configurational sampling was achieved through ensemble MD than through simulation extension. These data suggest a possible mechanism for saquinavir resistance in the MDR system, in which a transition occurs to a configuration with lower binding affinity than the WT. Furthermore, we show that ensembles of 1 ns in length sample a significant proportion of the configurations adopted over 10 ns, and generate sufficiently similar binding affinities.
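The ensemble-MDS idea above, where one binding free energy per replica is averaged rather than relying on a single long trajectory, can be illustrated with invented MMPBSA-style numbers (the ten replica values below are hypothetical, in kcal/mol):

```python
# Toy ensemble averaging of per-replica binding free energies, as in a
# 10-repetition ensemble MDS. Replica values are hypothetical.
import statistics

replica_dg = [-9.8, -11.2, -10.5, -9.1, -12.0,
              -10.9, -9.6, -11.5, -10.2, -10.8]  # kcal/mol

mean_dg = statistics.mean(replica_dg)
# Standard error of the mean across replicas.
sem = statistics.stdev(replica_dg) / len(replica_dg) ** 0.5
print(f"dG_bind = {mean_dg:.2f} +/- {sem:.2f} kcal/mol")
```

The spread across replicas also gives an uncertainty estimate that a single extended trajectory cannot provide, which is one motivation for the ensemble approach.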

    Resource integration in translational research: a unifying approach in support of "learning" health systems

    Learning health systems (LHS) are gradually emerging and propose a complementary approach to translational research challenges by closely coupling health care delivery, research, and knowledge translation. To support coherent knowledge sharing, the system needs to rely on an integrated and efficient data integration platform. The framework and its theoretical foundations presented here aim at addressing this challenge. Data integration approaches are analysed in light of the requirements derived from LHS activities, and data mediation emerges as the approach most adapted to an LHS. The semantics of clinical data found in biomedical sources can only be fully derived by taking into account not only information from the structural models (field X of table Y) but also the terminological information (e.g. International Classification of Diseases, 10th revision) used to encode facts. The unified framework proposed here takes this into account. The platform has been implemented and tested in the context of TRANSFoRm, a European project funded by the European Commission that aims at developing an LHS including clinical activities in primary care. The mediation model developed for the TRANSFoRm project, the Clinical Data Integration Model (CDIM), is presented and discussed. Results from TRANSFoRm use cases are presented; they illustrate how a unified data-sharing platform can support and enhance prospective research activities in the context of an LHS. In the end, the unified mediation framework presented here provides sufficient expressiveness for TRANSFoRm's needs. It is flexible and modular, and the CDIM mediation model supports the requirements of a primary care LHS.

    Data Enrichment for Data Mining Applied to Bioinformatics and Cheminformatics Domains

    Increasingly complex problems are being addressed in the life sciences. Acquiring all the data that may be related to the problem in question is paramount. Equally important is knowing how the data are related to each other and to the problem itself. At the same time, there are large amounts of data and information available on the Web. Researchers are already using Data Mining and Machine Learning as valuable tools in their research, although the usual procedure is to look for information based on induction models. So far, despite the great successes already achieved using Data Mining and Machine Learning, it is not easy to integrate this vast amount of available information into the inductive process with propositional algorithms. Our main motivation is to address the problem of integrating domain information into the inductive process of propositional Data Mining and Machine Learning techniques by enriching the training data to be used in inductive logic programming (ILP) systems. Propositional machine learning algorithms are very dependent on data attributes. It is still hard to identify which attributes are most suitable for a particular research task, and it is also hard to extract relevant information from the enormous quantity of data available. We concentrate the available data and derive features that ILP algorithms can use to induce descriptions that solve the problems. We are creating a web platform to obtain information relevant to Bioinformatics (particularly Genomics) and Cheminformatics problems. It fetches the data from public repositories of genomic, protein, and chemical data. After the data enrichment, Prolog systems use inductive logic programming to induce rules and solve specific Bioinformatics and Cheminformatics case studies. To assess the impact of the data enrichment with ILP, we compare against the results obtained by solving the same cases using propositional algorithms.

    Model morphisms (MoMo) to enable language independent information models and interoperable business networks

    MSc dissertation presented at Faculdade de Ciências e Tecnologia of Universidade Nova de Lisboa to obtain the Master's degree in Electrical and Computer Engineering. With the advent of globalisation, the opportunities for collaboration became more evident, with the effect of enlarging business networks. In such conditions, a key to enterprise success is reliable communication with all partners. Therefore, organisations have been searching for flexible integrated environments to better manage their services and product life cycles, where their software applications can be easily integrated independently of the platform in use. However, with so many different information models and implementation standards in use, interoperability problems arise. Moreover, organisations are themselves at different technological maturity levels, and the solution that might be good for one can be too advanced for another, or vice versa. This dissertation responds to the above needs by proposing a high-level meta-model to be used across the entire business network, making it possible to abstract individual models from their specificities and increasing language independence and interoperability, while keeping the integrity of all enterprise legacy software intact. The strategy presented allows incremental mapping construction to achieve gradual integration. To accomplish this, the author proposes Model Driven Architecture (MDA) based technologies for the development of traceable transformations and the execution of automatic Model Morphisms.