
    The Life Cycle of Knowledge in Big Language Models: A Survey

    Knowledge plays a critical role in artificial intelligence. Recently, the extensive success of pre-trained language models (PLMs) has drawn significant attention to how knowledge can be acquired, maintained, updated and used by language models. Despite the enormous number of related studies, a unified view of how knowledge circulates within language models throughout the learning, tuning, and application processes is still lacking, which may prevent us from further understanding the connections between current progress or realizing existing limitations. In this survey, we revisit PLMs as knowledge-based systems by dividing the life cycle of knowledge in PLMs into five critical periods, and investigate how knowledge circulates as it is built, maintained and used. To this end, we systematically review existing studies of each period of the knowledge life cycle, summarize the main challenges and current limitations, and discuss future directions.
    Comment: paper list: https://github.com/c-box/KnowledgeLifecycl

    Chatbots for Modelling, Modelling of Chatbots

    Unpublished doctoral thesis, defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingeniería Informática. Defence date: 28-03-202

    Measuring the impact of COVID-19 on hospital care pathways

    Care pathways in hospitals around the world reported significant disruption during the recent COVID-19 pandemic, but measuring the actual impact is more problematic. Process mining can be useful for hospital management to measure the conformance of real-life care to what might be considered normal operations. In this study, we aim to demonstrate that process mining can be used to investigate process changes associated with complex disruptive events. We studied perturbations to accident and emergency (A&E) and maternity pathways in a UK public hospital during the COVID-19 pandemic. Coincidentally, the hospital had implemented a Command Centre approach for patient-flow management, affording an opportunity to study both the planned improvement and the disruption due to the pandemic. Our study proposes and demonstrates a method for measuring and investigating the impact of such planned and unplanned disruptions affecting hospital care pathways. We found that during the pandemic, both A&E and maternity pathways had measurable reductions in the mean length of stay and a measurable drop in the percentage of pathways conforming to normative models. There were no distinctive patterns in the monthly mean values of length of stay or conformance throughout the phases of the installation of the hospital's new Command Centre approach. Due to a deficit in the available A&E data, the findings for A&E pathways could not be interpreted.
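
    The measurement loop described above can be illustrated with the open-source pm4py library: discover a normative model from pre-disruption traces, then replay later traces against it to quantify conformance. This is a minimal sketch under assumed file names and default parameters, not the study's actual pipeline:

        import pm4py

        # Hypothetical event logs: one for "normal operations" and one for
        # the pandemic period (file names are illustrative).
        baseline = pm4py.read_xes("pathways_2019.xes")
        pandemic = pm4py.read_xes("pathways_2020.xes")

        # Discover a normative Petri net from pre-pandemic behaviour.
        net, im, fm = pm4py.discover_petri_net_inductive(baseline)

        # Replay each period against the normative model; fitness is a
        # standard proxy for the share of conforming behaviour.
        fit_base = pm4py.fitness_token_based_replay(baseline, net, im, fm)
        fit_pand = pm4py.fitness_token_based_replay(pandemic, net, im, fm)
        print("baseline fitness:", fit_base["average_trace_fitness"])
        print("pandemic fitness:", fit_pand["average_trace_fitness"])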

    Toward relevant answers to queries on incomplete databases

    Incomplete and uncertain information is ubiquitous in database management applications. However, the techniques specifically developed to handle incomplete data are not sufficient: even the evaluation of SQL queries on databases containing NULL values remains a challenge after 40 years. There is no consensus on what an answer to a query on an incomplete database should be, and the existing notions often have limited applicability. One of the most prevalent techniques in the literature is based on finding answers that are certainly true, independently of how missing values are interpreted. However, this notion has yielded several conflicting formal definitions of certain answers. Based on the fact that incomplete data can be enriched with additional knowledge, we designed a notion able to unify and explain the different definitions of certain answers. Moreover, the knowledge-preserving certain answers notion provides the first well-founded definition of certain answers for the relational bag data model and value-inventing queries, addressing some key limitations of previous approaches. However, it does not provide any guarantee about the relevance of the answers it captures. To understand what relevant answers to queries on incomplete databases would be, we designed and conducted a survey on the everyday usage of NULL values among database users. One of the findings of this socio-technical study is that even when users agree on the possible interpretations of NULL values, they may not agree on what a satisfactory query answer is. Therefore, to be relevant, query evaluation on incomplete databases must account for users' tasks and preferences. We model users' preferences and tasks with the notion of regret. The regret function captures the task-dependent loss a user endures when they treat one database as ground truth instead of another. With this notion, we designed the first framework able to provide a score accounting for the risk associated with query answers, which allows us to define the risk-minimizing answers to queries on incomplete databases. We show that for some regret functions, regret-minimizing answers coincide with certain answers. Moreover, as the notion is more flexible, it can capture more nuanced answers and more interpretations of incompleteness. A different approach to improving the relevance of an answer is to explain its provenance. We propose to partition the incompleteness into sources and measure their respective contributions to the risk of an answer. As a first milestone, we study several models to predict the evolution of the risk when a source of incompleteness is cleaned. We implemented the framework, and it exhibits promising results on relational databases and queries with aggregate and grouping operations. Indeed, the model allows us to infer the risk reduction obtained by cleaning an attribute. Finally, by taking a game-theoretical approach, the model can provide an explanation for answers based on the contribution of each attribute to the risk.
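
    To make the regret idea concrete, the following is a small self-contained sketch; the relation, the candidate NULL interpretations, the aggregate query and the squared-error regret function are all illustrative assumptions, not the thesis's formal framework:

        from itertools import product

        # An incomplete relation: salaries with one NULL (None) entry.
        table = [("alice", 1200), ("bob", None), ("carol", 1500)]

        # Possible interpretations of the NULL (a tiny space of worlds).
        candidate_values = [0, 1000, 2000]

        def query(rows):
            """Aggregate query: total salary."""
            return sum(v for _, v in rows)

        def complete(rows, fillers):
            """Replace NULLs with concrete values: one possible world."""
            it = iter(fillers)
            return [(k, v if v is not None else next(it)) for k, v in rows]

        def regret(answer, truth):
            """Task-dependent loss of acting on `answer` when `truth` holds."""
            return (answer - truth) ** 2

        n_nulls = sum(1 for _, v in table if v is None)
        worlds = [query(complete(table, f))
                  for f in product(candidate_values, repeat=n_nulls)]

        # Risk of an answer = worst-case regret over the possible worlds;
        # here candidate answers are restricted to world outcomes for brevity.
        def risk(answer):
            return max(regret(answer, t) for t in worlds)

        best = min(set(worlds), key=risk)
        print("possible answers:", sorted(set(worlds)), "risk-minimizing:", best)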

    Systems and Algorithms for Dynamic Graph Processing

    Data generated from human and systems interactions can be naturally represented as graph data. Several emerging applications rely on graph data, such as the semantic web, social networks, bioinformatics, finance, and trading, among others. These applications require graph querying capabilities, which are often implemented in graph database management systems (GDBMSs). Many GDBMSs can evaluate one-time versions of recursive or subgraph queries over static graphs, i.e., graphs that do not change, or a single snapshot of a changing graph. They generally do not support incrementally maintaining queries as graphs change. However, most applications that employ graphs are dynamic in nature, resulting in graphs that change over time, also known as dynamic graphs. This thesis investigates how to build a generic and scalable incremental computation solution that is oblivious to graph workloads. It focuses on two fundamental computations performed by many applications: recursive queries and subgraph queries. Specifically, for subgraph queries, this thesis presents the first approach that (i) performs joins with worst-case optimal computation and communication costs; and (ii) maintains a total memory footprint almost linear in the number of input edges. For recursive queries, this thesis studies optimizations for using differential computation (DC). DC is a general incremental computation technique that can maintain the output of a recursive dataflow computation upon changes. However, it requires a prohibitively large amount of memory because it maintains differences that track changes in a query's inputs and outputs. The thesis proposes a suite of optimizations based on reducing the number of these differences and recomputing them when necessary. The techniques and optimizations in this thesis, for subgraph and recursive computations, represent a proposal for how to build a state-of-the-art generic and scalable GDBMS for dynamic graph data management.
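
    The flavour of the worst-case optimal, one-variable-at-a-time join underlying point (i) can be sketched on the triangle query Q(a,b,c) = R(a,b) ⋈ S(b,c) ⋈ T(a,c) over a single edge relation. The index layout and variable order below are illustrative assumptions, and this static version omits the incremental maintenance that is the thesis's actual contribution:

        from collections import defaultdict

        # Edge list of a small graph; the triangle query finds (a, b, c)
        # with edges a->b, b->c and a->c.
        edges = [(1, 2), (2, 3), (1, 3), (3, 4), (1, 4)]

        # Hash index: successors of each node.
        succ = defaultdict(set)
        for u, v in edges:
            succ[u].add(v)

        def triangles():
            # Generic join, one variable at a time (a, then b, then c):
            # each extension intersects the candidate sets of all relations
            # mentioning the variable, which is what lets the running time
            # be bounded by the worst-case output size.
            for a in list(succ):
                for b in succ[a]:                 # R(a, b)
                    for c in succ[b] & succ[a]:   # S(b, c) ∩ T(a, c)
                        yield (a, b, c)

        print(sorted(triangles()))  # [(1, 2, 3), (1, 3, 4)]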

    Size Bounds and Algorithms for Conjunctive Regular Path Queries

    Conjunctive regular path queries (CRPQs) are one of the core classes of queries over graph databases. They are join-intensive, inheriting their structure from the relational setting, but they also allow arbitrary-length paths to connect points that are to be joined. However, despite their popularity, little is known about the best algorithms for processing CRPQs. We focus on worst-case optimal algorithms, i.e., algorithms that run in time bounded by the worst-case output size of queries, which have recently been deployed for simpler graph queries with very promising results. We show that the famous bound on the number of query results by Atserias, Grohe and Marx can be extended to CRPQs, but that obtaining tight bounds requires working with slightly stronger cardinality profiles. We also discuss what algorithms follow from our analysis. If one pays the cost of fully materializing graph queries, then the techniques developed for conjunctive queries can be reused. If, on the other hand, one imposes constraints on the working memory of algorithms, then worst-case optimal algorithms must be adapted with care: the order of variables in which queries are processed can have striking implications for the running time of queries.
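
    For reference, the Atserias, Grohe and Marx (AGM) bound states that for a natural join query $Q = R_1 \bowtie \cdots \bowtie R_m$ and any fractional edge cover $x_1, \dots, x_m$ of the query's hypergraph (non-negative weights with $\sum_{i \,:\, u \in \mathrm{vars}(R_i)} x_i \ge 1$ for every variable $u$), the output on any database $D$ satisfies

        $|Q(D)| \;\le\; \prod_{i=1}^{m} |R_i|^{x_i}.$

    As the abstract notes, the tight CRPQ analogue replaces the plain cardinalities $|R_i|$ with slightly stronger cardinality profiles.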

    Towards LLOD-based language contact studies: a case study in interoperability

    We describe a methodological and technical framework for conducting qualitative and quantitative studies of linguistic research questions over diverse and heterogeneous data sources such as corpora and elicitations. We demonstrate how LLOD formalisms can be employed to develop extraction pipelines for features and linguistic examples from corpora and collections of interlinear glossed text, and furthermore, how SPARQL UPDATE can be employed (1) to normalize diverse data against a reference data model (here, POWLA), (2) to harmonize annotation vocabularies by reference to terminology repositories (here, OLiA), (3) to extract examples from these normalized data structures regardless of their origin, and (4) to implement this extraction routine in a tool-independent manner for different languages with different annotation schemes. We demonstrate our approach on language contact studies for genetically unrelated but neighboring languages from the Caucasus area, Eastern Armenian and Georgian.
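
    As an illustration of step (2), the following minimal sketch harmonizes a corpus-specific tag against a reference vocabulary with SPARQL UPDATE via the rdflib Python library. The namespaces, tag names and mapping triples are invented for the example; the paper's pipeline targets POWLA and OLiA specifically:

        from rdflib import Graph

        g = Graph()
        g.parse(data="""
        @prefix ex:   <http://example.org/corpus#> .
        @prefix olia: <http://purl.org/olia/olia.owl#> .

        ex:tok1 ex:pos "NN" .                          # corpus-specific tag
        [] ex:tag "NN" ; ex:target olia:CommonNoun .   # tag-to-concept mapping
        """, format="turtle")

        # Rewrite corpus-specific tags into the reference vocabulary.
        g.update("""
        PREFIX ex: <http://example.org/corpus#>
        DELETE { ?tok ex:pos ?tag }
        INSERT { ?tok ex:oliaCategory ?concept }
        WHERE  { ?tok ex:pos ?tag .
                 ?m ex:tag ?tag ; ex:target ?concept . }
        """)

        for triple in g:
            print(triple)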

    Methodological approaches and techniques for designing ontologies in information systems requirements engineering

    Doctoral programme in Information Systems and Technology. The way we interact with the world around us is changing as new challenges arise: embracing innovative business models, rethinking organizations and processes to maximize results, and evolving change management. Currently, and considering the projects executed, the methodologies in use do not fully respond to companies' needs. On the one hand, organizations are not familiar with the languages used in Information Systems; on the other hand, they are often unable to validate requirements or business models. These are some of the difficulties that led us to formulate a new approach. The state of the art presented in this work includes a study of the models involved in the software development process, covering both traditional methods and the rival agile methods. In addition, a survey is made of ontologies and of the existing methods to design, transform, and represent them. After analyzing some of the various possibilities currently available, we began the process of evolving a method and developing an approach for designing ontologies. The method we evolved and adapted allows terminologies to be derived from a specific domain and aggregated so as to facilitate the construction of a catalog of terminologies. Next, the definition of an approach to designing ontologies allows the construction of a domain-specific ontology. This approach first integrates and stores the data from the different information systems of a given organization; second, it defines the rules for mapping and building the ontology database. Finally, a technological architecture is also proposed that maps an ontology through the construction of complex networks, allowing terminologies to be mapped and related. This doctoral work encompasses numerous Research & Development (R&D) projects belonging to different domains such as the Software Industry, Textile Industry, Robotic Industry and Smart Cities. Finally, a critical and descriptive analysis of the work done is performed, and we also point out perspectives for possible future work.