
    UPI: A Primary Index for Uncertain Databases

    Uncertain data management has received growing attention from industry and academia. Many efforts have been made to optimize uncertain databases, including the development of special index data structures. However, none of these efforts have explored primary (clustered) indexes for uncertain databases, despite the fact that clustering has the potential to offer substantial speedups for non-selective analytic queries on large uncertain databases. In this paper, we propose a new index called a UPI (Uncertain Primary Index) that clusters heap files according to uncertain attributes with both discrete and continuous uncertainty distributions. Because uncertain attributes may have several possible values, a UPI on an uncertain attribute duplicates tuple data once for each possible value. To keep the size of the UPI manageable, low-probability tuples are placed in a special Cutoff Index that is consulted only when queries for low-probability values are run. We also propose several other optimizations, including techniques to improve secondary index performance and to reduce maintenance costs and fragmentation by buffering changes to the table and writing updates in sequential batches. Finally, we develop cost models for UPIs that estimate query performance in various settings and help automatically select a UPI's tuning parameters. We have implemented a prototype UPI and experimented on two real datasets. Our results show that UPIs can significantly (up to two orders of magnitude) improve the performance of uncertain queries over both clustered and unclustered attributes. We also show that our buffering techniques mitigate table fragmentation and keep maintenance costs as low as, or even lower than, using an unclustered heap file. (Funding: National Science Foundation (U.S.) Grants IIS-0448124, IIS-0905553, and IIS-0916691.)
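    To make the clustering idea concrete, here is a minimal Python sketch (not the paper's implementation; class and parameter names are hypothetical) of how a UPI-style structure might duplicate a tuple once per possible value of an uncertain attribute and divert low-probability alternatives to a cutoff index that is consulted only when a query asks for low-probability values.

        from collections import defaultdict

        class UPI:
            """Sketch of an Uncertain Primary Index: tuple copies are
            clustered by each possible value of an uncertain attribute;
            alternatives below a probability cutoff go to a side index."""

            def __init__(self, cutoff=0.1):
                self.cutoff = cutoff
                self.main = defaultdict(list)          # value -> [(tuple_id, prob, payload)]
                self.cutoff_index = defaultdict(list)  # low-probability alternatives only

            def insert(self, tuple_id, alternatives, payload):
                # alternatives: {possible_value: probability} of the uncertain attribute
                for value, prob in alternatives.items():
                    target = self.main if prob >= self.cutoff else self.cutoff_index
                    target[value].append((tuple_id, prob, payload))

            def query(self, value, min_prob=0.0):
                # Scan the clustered entries; consult the cutoff index only
                # when the query explicitly asks for low-probability matches.
                hits = [h for h in self.main[value] if h[1] >= min_prob]
                if min_prob < self.cutoff:
                    hits += [h for h in self.cutoff_index[value] if h[1] >= min_prob]
                return hits

        # A tuple whose uncertain 'color' attribute has three alternatives:
        upi = UPI(cutoff=0.1)
        upi.insert(42, {"red": 0.85, "green": 0.12, "blue": 0.03}, "row-42")
        print(upi.query("green"))  # answered from the main (clustered) index
        print(upi.query("blue"))   # requires the cutoff index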

    PthA4AT, a 7.5-repeats transcription activator-like (TAL) effector from Xanthomonas citri ssp. citri, triggers citrus canker resistance

    Transcription activator-like effectors (TALEs) are important effectors of Xanthomonas spp. that manipulate the transcriptome of the host plant, conferring susceptibility or resistance to bacterial infection. Xanthomonas citri ssp. citri variant AT (X. citri AT) triggers a host-specific hypersensitive response (HR) that suppresses citrus canker development. However, the bacterial effector that elicits this process is unknown. In this study, we show that a 7.5-repeat TALE is responsible for triggering the HR. PthA4AT was identified within the pthA repertoire of X. citri AT, and its effects were assayed on different hosts. The mode of action of PthA4AT was characterized using protein-binding microarrays and by testing the effects of deleting the nuclear localization signals and activation domain on plant responses. PthA4AT is able to bind DNA and activate transcription in an effector-binding-element-dependent manner. Moreover, the HR requires PthA4AT nuclear localization, suggesting the activation of executor resistance (R) genes in host and non-host plants. This is the first case in which a TALE of unusually short length performs a biological function by means of its repeat domain, indicating that the capacity of these effectors to reprogramme the host transcriptome following nuclear localization is not limited to ‘classical’ TALEs.
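    The mechanistic backdrop for a repeat domain reading DNA is the well-established TALE code, in which each repeat's repeat-variable diresidue (RVD) specifies one nucleotide of the effector binding element (EBE). The Python sketch below illustrates that mapping; the RVD string is invented for illustration and is not the actual PthA4AT repeat array.

        # Standard RVD-nucleotide associations reported for TALEs
        RVD_CODE = {"NI": "A", "HD": "C", "NG": "T", "NN": "G", "NS": "N"}

        def predict_ebe(rvds):
            """Map each repeat's RVD to its preferred base; TALE targets
            are conventionally preceded by a T at position 0."""
            return "T" + "".join(RVD_CODE.get(rvd, "N") for rvd in rvds)

        # A 7.5-repeat TALE contributes 8 RVDs (the half repeat also reads a base)
        example_rvds = ["HD", "NI", "NG", "NN", "HD", "NI", "NG", "HD"]
        print(predict_ebe(example_rvds))  # -> TCATGCATC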

    Automatically Generating Test Cases for Safety-Critical Software via Symbolic Execution

    Automated test generation based on symbolic execution can be beneficial for systematically testing safety-critical software, helping test engineers meet the strict testing requirements mandated by certification standards while controlling the costs of the testing process. At the same time, the development of safety-critical software is often constrained by programming languages or coding conventions that ban linguistic features believed to downgrade the safety of programs: for example, they disallow dynamic memory allocation and variable-length arrays, limit the way in which loops are used, forbid recursion, and bound the complexity of control conditions. In fact, these same linguistic features are the main efficiency blockers for state-of-the-art test generation approaches based on symbolic execution. This paper contributes new evidence of the effectiveness of generating test cases with symbolic execution for a significant class of industrial safety-critical systems. We specifically focus on Scade, a widely adopted model-based development language for safety-critical embedded software, and we report on a case study in which we exploited symbolic execution to automatically generate test cases for a set of safety-critical programs developed in Scade. To this end, we introduce a novel test generator that we developed in a recent industrial project on testing safety-critical railway software written in Scade, and we report on our experience of using this test generator on a set of Scade programs belonging to the development of an on-board signaling unit for high-speed rail. The results provide empirical evidence that symbolic execution is indeed a viable approach for generating high-quality test suites for the safety-critical programs considered in our case study.
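    For readers unfamiliar with the technique, the following self-contained Python illustration (not the paper's test generator) shows how symbolic execution turns path conditions into concrete test inputs, here using the Z3 solver (assumed installed via pip install z3-solver) on a toy two-branch program.

        from z3 import And, Int, Not, Solver, sat

        x = Int("x")

        # Program under test (conceptually):
        #   if x > 10:
        #       if x < 20: ...   # three feasible paths in total
        paths = {
            "x>10 && x<20": And(x > 10, x < 20),
            "x>10 && !(x<20)": And(x > 10, Not(x < 20)),
            "!(x>10)": Not(x > 10),
        }

        # One solver call per path condition yields one covering input
        for name, condition in paths.items():
            solver = Solver()
            solver.add(condition)
            if solver.check() == sat:
                print(f"path [{name}] covered by x = {solver.model()[x]}")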

    Re-use of tests and arguments for assessing dependable mixed-criticality systems

    The safety assessment of mixed-criticality systems (MCSs) is a challenging activity due to system heterogeneity, design constraints, and increasing complexity. The foundation for MCSs is the integrated-architecture paradigm, where compact hardware comprises multiple execution platforms and communication interfaces to implement concurrent functions with different safety requirements. Besides a computing platform providing adequate isolation and fault-tolerance mechanisms, the development of an MCS application must also comply with the guidelines defined by the safety standards. One way to lower the overall MCS certification cost is to adopt a platform-based design (PBD) development approach. PBD is a model-based development (MBD) approach in which separate models of logic, hardware, and deployment support the analysis of the resulting system properties and behaviour. The PBD development of MCSs benefits from a composition of modular safety properties (e.g. modular safety cases), which supports the derivation of mixed-criticality product lines. Validation and verification (V&V) activities demand substantial effort during the development of programmable electronics for safety-critical applications. For MCS dependability assessment, the purpose of V&V is to provide evidence supporting the safety claims. Model-based development of MCSs adds further V&V tasks, because additional analyses (e.g. simulations) need to be carried out during the design phase. During the MCS integration phase, hardware-in-the-loop (HiL) plant simulators typically support the V&V campaigns, where test automation and fault injection are the keys to test repeatability and to thoroughly exercising the safety mechanisms. This dissertation proposes several strategies for re-using V&V artefacts to perform early system-level verification of a distributed MCS, artefacts that are later reused up to the final stages of the development process: test-code re-use to verify the fault-tolerance mechanisms on a functional model of the system combined with non-intrusive software fault injection (sketched below), model-to-X-in-the-loop (XiL) and code-to-XiL re-use to provide models of the plant and distributed embedded nodes suited to the HiL simulator, and finally an argumentation framework to support the automated composition and staged completion of modular safety cases for dependability assessment, in the context of the platform-based development of mixed-criticality systems relying on the DREAMS harmonized platform.
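    As a rough illustration of the non-intrusive fault-injection idea, the Python sketch below wraps a signal of a functional model so that classic faults (stuck-at, bit-flip) can be activated by the test harness without modifying either the model or the test code. All names are hypothetical and the dissertation's actual framework may differ substantially.

        class Channel:
            """A signal between two simulated nodes of the functional model."""
            def __init__(self, producer):
                self.producer = producer
                self.fault = None                  # currently active fault, if any

            def inject(self, fault):
                self.fault = fault                 # activated by the test harness

            def read(self):
                value = self.producer()
                return self.fault(value) if self.fault else value

        # Two classic fault models
        def stuck_at(v):
            return lambda _: v

        def bit_flip(mask):
            return lambda value: value ^ mask

        speed = Channel(lambda: 120)   # healthy sensor reports 120
        assert speed.read() == 120
        speed.inject(stuck_at(0))      # one step of a fault-injection campaign
        assert speed.read() == 0       # the safety mechanism under test must now react
        speed.inject(bit_flip(0b1))
        print("faulty reading:", speed.read())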

    Design of the software development and verification system (SWDVS) for shuttle NASA study task 35

    An overview of the Software Development and Verification System (SWDVS) for the space shuttle is presented. The design considerations, goals, assumptions, and major features of the design are examined. A scenario is developed that shows three people using the SWDVS for flight software development in response to a program change request. The SWDVS is described from the standpoint of different groups of people with different responsibilities in the shuttle program, to show the functional requirements that influenced the SWDVS design. The software elements of the SWDVS that satisfy the requirements of the different groups are identified.

    Fourth Conference on Artificial Intelligence for Space Applications

    Proceedings of a conference held in Huntsville, Alabama, on November 15-16, 1988. The Fourth Conference on Artificial Intelligence for Space Applications brings together diverse technical and scientific work in order to help those who employ AI methods in space applications to identify common goals and to address issues of general interest in the AI community. Topics include the following: space applications of expert systems in fault diagnostics, in telemetry monitoring and data collection, in design and systems integration, and in planning and scheduling; knowledge representation, capture, verification, and management; robotics and vision; adaptive learning; and automatic programming.

    Architecture, Techniques and Models to Enable Data Science in the Gaia Mission Archive

    Unpublished doctoral thesis, Universidad Complutense de Madrid, Facultad de Informática, Departamento de Arquitectura de Computadores y Automática; defended 26/05/2017. The massive amounts of data that the world produces every day pose new challenges to modern societies in terms of how to leverage their inherent value. Social networks, instant messaging, video, smart devices, and scientific missions are just a few examples of the vast number of sources generating data every second. As the world becomes more and more digitalized, new needs arise for organizing, archiving, sharing, analyzing, visualizing, and protecting ever-increasing data sets, so that we can truly develop into a data-driven economy that reduces inefficiencies and increases sustainability, creating new business opportunities along the way. Traditional approaches for harnessing data are no longer suitable, as they lack the means to scale to larger volumes in a timely and cost-efficient manner. This has changed somewhat with the advent of Internet companies like Google and Facebook, which have devised new ways of tackling this issue. However, the variety and complexity of the value chains in the private sector, as well as the increasing demands and constraints under which the public sector operates, call for ongoing research that can yield new strategies for dealing with data, facilitate the integration of providers and consumers of information, and guarantee a smooth and prompt transition when adopting these cutting-edge technological advances. This thesis aims at providing novel architectures and techniques that will help perform this transition towards Big Data in massive scientific archives. It highlights the common pitfalls that must be faced when embracing Big Data and how to overcome them, especially when the data sets, their transformation pipelines, and the tools used for the analysis are already present in the organizations. Furthermore, a new perspective for facilitating a smoother transition is laid out. It involves the usage of higher-level, use-case-specific frameworks and models, which naturally bridge the gap between the technological and scientific domains. This alternative will effectively widen the possibilities of scientific archives and therefore contribute to reducing the time to science. The research is applied to the European Space Agency cornerstone mission Gaia, whose final data archive will represent a tremendous discovery potential. Gaia will create the largest and most precise three-dimensional chart of our galaxy (the Milky Way), providing unprecedented position, parallax, and proper-motion measurements for about one billion stars. The successful exploitation of this data archive will depend to a large degree on the ability to offer the proper architecture, i.e. infrastructure and middleware, upon which scientists will be able to explore and model this huge data set. In consequence, the approach taken needs to enable data fusion with other scientific archives, as this will produce the synergies leading to an increment in scientific outcome, both in volume and in quality. The set of novel techniques and frameworks presented in this work addresses these issues by contextualizing them with the data products that will be generated in the Gaia mission. All these considerations have led to the foundations of the architecture that will be leveraged by the Science Enabling Applications Work Package. Last but not least, the effectiveness of the proposed solution is demonstrated through the implementation of some ambitious statistical problems that require significant computational capabilities and use Gaia-like simulated data (the first Gaia data release took place on September 14th, 2016). These problems are referred to as the Grand Challenge, a somewhat grandiloquent name for the task of inferring, from a probabilistic point of view, a set of parameters of the Initial Mass Function (IMF) and Star Formation Rate (SFR) of a given set of stars (with a huge sample size), from noisy estimates of their masses and ages respectively. This is achieved using Hierarchical Bayesian Modeling (HBM). In principle, the HBM can incorporate stellar evolution models to infer the IMF and SFR directly, but as a first step this thesis starts with a somewhat less ambitious goal: inferring the present-day mass function (PDMF) and present-day age distribution (PDAD). Moreover, the performance and scalability analyses carried out also prove the suitability of the models for the large amounts of data that will be available in the Gaia data archive.
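    As a toy illustration of the hierarchical setup (a drastic simplification of the thesis's models), the Python sketch below treats each star's true log-mass as a latent draw from a Gaussian PDMF that is observed with known Gaussian noise; marginalizing the latent masses gives a closed-form likelihood that can be scored on a parameter grid. All numbers are synthetic.

        import numpy as np

        rng = np.random.default_rng(0)
        mu_true, sigma_true, noise = 0.2, 0.5, 0.3
        n_stars = 5000
        log_m_true = rng.normal(mu_true, sigma_true, n_stars)   # latent true masses
        log_m_obs = log_m_true + rng.normal(0, noise, n_stars)  # noisy estimates

        def log_likelihood(mu, sigma):
            # Marginal of the hierarchy: obs ~ Normal(mu, sqrt(sigma^2 + noise^2))
            var = sigma**2 + noise**2
            return -0.5 * np.sum((log_m_obs - mu) ** 2 / var + np.log(2 * np.pi * var))

        mus = np.linspace(-0.5, 1.0, 151)
        sigmas = np.linspace(0.1, 1.5, 141)
        ll = np.array([[log_likelihood(m, s) for s in sigmas] for m in mus])
        i, j = np.unravel_index(ll.argmax(), ll.shape)
        print(f"MAP estimate: mu={mus[i]:.2f}, sigma={sigmas[j]:.2f}")  # close to (0.20, 0.50)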

    Automated Injection of Curated Knowledge Into Real-Time Clinical Systems: CDS Architecture for the 21st Century

    Clinical Decision Support (CDS) is primarily associated with alerts, reminders, order entry, rule-based invocation, diagnostic aids, and on-demand information retrieval. While valuable, these foci have been in production use for decades and do not provide a broader, interoperable means of plugging structured clinical knowledge into live electronic health record (EHR) ecosystems for purposes of orchestrating the user experiences of patients and clinicians. To date, the gap between knowledge representation and user-facing EHR integration has been treated as an “implementation concern” requiring unscalable manual human effort and governance coordination. Drafting a questionnaire engineered to meet the HL7 CDS Knowledge Artifact specification, for example, carries no reasonable expectation that it can be imported and deployed into a live system without significant burden. Dramatically reducing this time-and-effort gap in the research and application cycle could be revolutionary. Doing so, however, requires both a floor-to-ceiling precoordination of functional boundaries in the knowledge management lifecycle and a formalization of the human processes by which this occurs. This research introduces ARTAKA: Architecture for Real-Time Application of Knowledge Artifacts, a concrete floor-to-ceiling technological blueprint by which both provider health IT (HIT) and vendor organizations can incrementally and dynamically introduce value into existing systems. This is made possible by the service-ization of curated knowledge artifacts, which are then injected into a highly scalable backend infrastructure by automated orchestration through public marketplaces. Supplementary examples of client app integration are also provided. Compilation of knowledge into platform-specific form is left flexible, insofar as implementations comply with ARTAKA’s Context Event Service (CES) communication and Health Services Platform (HSP) Marketplace service-packaging standards. Towards the goal of interoperable human processes, ARTAKA’s treatment of knowledge artifacts as a specialized form of software allows knowledge engineering to operate as a type of software engineering practice. Thus, decades of software development processes, tools, policies, and lessons offer immediate benefit, in some cases with remarkable parity. Analyses of experimentation are provided, with guidelines on how choice aspects of software development life cycles (SDLCs) apply to knowledge artifact development in an ARTAKA environment. Portions of this culminating document have also been initiated with Standards Developing Organizations (SDOs), with the intent of ultimately producing normative standards, as have active relationships with other bodies.
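    Purely as an illustration of the “knowledge artifact as deployable service” idea (the CES and HSP Marketplace schemas are not spelled out in this abstract, so every event name and field below is hypothetical), here is a Python sketch of an artifact that registers against context events and is activated without manual integration work.

        from dataclasses import dataclass
        from typing import Callable

        @dataclass
        class KnowledgeArtifact:
            artifact_id: str
            version: str
            triggers: list        # context events it reacts to
            evaluate: Callable    # the curated clinical logic

        class ContextEventBus:
            def __init__(self):
                self.subscribers = {}

            def deploy(self, artifact):
                # The "injection" step: registration is all an artifact needs
                for event in artifact.triggers:
                    self.subscribers.setdefault(event, []).append(artifact)

            def publish(self, event, context):
                for artifact in self.subscribers.get(event, []):
                    result = artifact.evaluate(context)
                    if result:
                        print(f"{artifact.artifact_id}@{artifact.version}: {result}")

        bus = ContextEventBus()
        bus.deploy(KnowledgeArtifact(
            "a1c-screening", "1.0.0", triggers=["patient-chart-opened"],
            evaluate=lambda ctx: {"alert": "HbA1c overdue"} if ctx["days_since_a1c"] > 180 else None,
        ))
        bus.publish("patient-chart-opened", {"days_since_a1c": 240})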

    The pan-NLR'ome of Arabidopsis thaliana

    Plants are the major nutritional component of the human diet and provide us with shelter, fuel, and enjoyment. Substantial yield loss is caused by plant diseases transmitted by bacterial, fungal, and oomycete pathogens. Plants have an elaborate innate immune system to fight threatening pathogens, relying to a great extent on highly variable resistance (R) genes. R genes often encode intracellular nucleotide-binding leucine-rich repeat receptors (NLRs) that directly or indirectly recognize pathogens by the presence or the activity of effector proteins in the plants’ cells. NLRs contain variable N-terminal domains, a central nucleotide-binding (NB) domain, and C-terminal leucine-rich repeats (LRRs). The N-terminal domains can be used to distinguish between the evolutionarily conserved NLR classes TNL (with a Toll/interleukin-1 receptor homology (TIR) domain), CNL (with a coiled-coil (CC) domain), and RNL (with an RPW8 domain). The architectural diversity is increased by additional integrated domains (IDs) found in different positions. Plant species have between a few dozen and several hundred NLRs. Intraspecific R gene diversity is also high, and the still few known NLRs responsible for long-term resistance are often accession-specific. Intraspecific NLR studies to date suffer from several shortcomings: pan-NLR’omes (the collection of all NLR genes and alleles occurring in a species) often cannot be comprehensively described because too few accessions are analyzed, and NLR detection is essentially always guided by reference genomes, which biases the detection of novel genes and alleles. In addition, inappropriate or immature bioinformatics analysis pipelines may miss NLRs during the assembly or annotation phase, or may produce erroneous NLR annotations. Knowing the pan-NLR’ome of a plant species is key to obtaining novel resistant plants in the future. I created an extensive and reliable database that defines the near-complete pan-NLR’ome of the model plant Arabidopsis thaliana. I focused on a panel of 65 diverse accessions and applied state-of-the-art targeted long-read sequencing (SMRT RenSeq). My analysis pipeline was designed to include optimized methods that could be applied to any SMRT RenSeq project. In the first part of my thesis I set quality-control standards for the assembly of NLR-coding genomic fragments. I further introduce a novel and thorough gene annotation pipeline, supported by careful manual curation. In the second part, I present the manuscript reporting the saturated, near-complete A. thaliana pan-NLR’ome. The species-wide NLR diversity is revealed at the domain-architecture level, and the usage of novel IDs is highlighted. The core NLR complement is defined, and presence/absence polymorphisms in non-core NLRs are described. Furthermore, haplotype saturation is shown, selective forces are quantified, and evolutionarily coupled, co-evolving NLRs are detected. The method optimization results show that final NLR assembly quality is mainly influenced by the amount and quality of the input sequencing data. The results further show that manual curation of automated NLR predictions is crucial to prevent frequently occurring misannotations. The saturation of an NLR’ome has not previously been shown in any plant species, so this study provides an unprecedented view of intraspecific NLR variation, the core NLR complement, and the evolutionary trajectories of NLRs. IDs are used more frequently than previously known, suggesting a pivotal role of non-canonical NLRs in plant-pathogen interactions. This work sets new standards for the analysis of gene families at the species level. Future NLR’ome projects applied to important crop species will profit from my results and the easy-to-adopt analysis pipeline. Ultimately, this will extend our knowledge of intraspecific NLR diversity beyond a few reference species or genomes, and will facilitate the detection of functional NLRs to be used in disease-resistance breeding programs.
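    As a small illustration of the class-assignment rule described above, the Python sketch below derives the TNL/CNL/RNL class from the N-terminal domain and flags any non-canonical domain as an integrated domain (ID). Domain labels are illustrative; real pipelines derive them from motif and HMM annotation of the assembled genes.

        # N-terminal domain determines the evolutionarily conserved NLR class
        N_TERMINAL_CLASS = {"TIR": "TNL", "CC": "CNL", "RPW8": "RNL"}
        CANONICAL = {"TIR", "CC", "RPW8", "NB-ARC", "LRR"}

        def classify_nlr(domains):
            """domains: ordered list of annotated domains, N- to C-terminus."""
            nlr_class = N_TERMINAL_CLASS.get(domains[0], "NL")  # no class-defining N-terminus
            integrated = [d for d in domains if d not in CANONICAL]  # IDs
            return nlr_class, integrated

        print(classify_nlr(["TIR", "NB-ARC", "LRR"]))         # ('TNL', [])
        print(classify_nlr(["CC", "NB-ARC", "LRR", "WRKY"]))  # ('CNL', ['WRKY'])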