70 research outputs found

    CamemBERT-bio: a Tasty French Language Model Better for your Health

    Clinical data in hospitals are increasingly accessible for research through clinical data warehouses; however, these documents are unstructured. It is therefore necessary to extract information from medical reports to conduct clinical studies. Transfer learning with BERT-like models such as CamemBERT has allowed major advances, especially for named entity recognition. However, these models are trained for plain language and are less efficient on biomedical data. This is why we propose a new French public biomedical dataset on which we have continued the pre-training of CamemBERT. Thus, we introduce a first version of CamemBERT-bio, a specialized public model for the French biomedical domain that improves the F1 score by 2.54 points on average across different biomedical named entity recognition tasks.

    Building A Corporate Corpus For Threads Constitution

    In this paper we describe the process of building a corporate corpus that will be used as a reference for modelling and computing threads from conversations generated using communication and collaboration tools. The overall goal of reconstructing threads is to provide value to the collaborator in various use cases, such as highlighting the important parts of a running discussion or reviewing upcoming commitments and deadlines. Since, to our knowledge, no corporate corpus is available for the French language that would allow us to address this problem of thread constitution, we present here a method for building such corpora, including the different aspects and steps that allowed the creation of a pipeline to pseudo-anonymise data. This pipeline is a response to the constraints induced by the General Data Protection Regulation (GDPR) in Europe and to the need to comply with the secrecy of correspondence.

    Internship report in community pharmacy (Relatório de estágio em farmácia comunitária)

    Internship report carried out as part of the Integrated Master's in Pharmaceutical Sciences, submitted to the Faculty of Pharmacy of the University of Coimbra.

    Refining Tabular Parsers for TAGs

    No full text
    This paper investigates several refinements of a generic tabular parser for Tree Adjoining Grammars. The resulting parser is simpler and more efficient in practice, even though its worst-case complexity is not optimal.

    Modulated Call/Return Evaluation strategies for Logic Programs

    No full text
    Using the Logic Push-Down Automata [LPDA] formalism to express resolution strategies, we show how new resolution strategies can be derived from a basic Call/Return-oriented strategy by (a) modulating the information flow and (b) applying some standard simplifications. The correctness of the resulting strategies is proved. This study takes place in the context of a tabular-based evaluation of the strategies (automatically given by the LPDA), where focusing on relevant information may avoid useless computations and reduce the cost of the tabulation. Keywords: Tabulation, LPDA, Resolution Strategy. Some proofs are provided in the appendix in case the reviewers wish to check them. They are not part of the paper and may be ignored. 1 Introduction. Tabulation-based logic program evaluators have renewed the interest in the spectrum of resolution strategies, outside the well-known SLD and Bottom-Up strategies. Indeed, we can imagine all kinds of "mixed" strategies where OLD-like prediction is used to ..

    Contents

    No full text
    Foundational course in Language and Logic: designing tabular parsers for various syntactic formalisms.

    Information flow in tabular interpretations for generalized Push-Down Automata

    This paper presents a general framework for deriving tabular algorithms for a very large class of stack-based computations, not only in context-free parsing but in logic programming as well and more generally for all kinds of "information" domains (abstract domains, constraint domains). Tabular algorithms store traces of computations in a table to achieve computation sharing, which is most useful when dealing with non-deterministic computations. By considering what can be naively described as partial information on stack elements, we interpret these traces as stack fragments. Tuning the exact amount of information present in these traces allows us to improve tabular evaluation of stack-based computations, both by increasing the sharing of partial computations and by unifying different tabular algorithms within the same framework. Key words: Information Flow, Tabulation, Push-Down Automata, Logic Programming, Parsing. Introduction It is now an accepted practice to define computer fo..

    A tabular interpretation of a class of 2-Stack Automata

    No full text
    The paper presents a tabular interpretation for a kind of 2-Stack Automata. These automata may be used to describe various parsing strategies, ranging from purely top-down to purely bottom-up, for LIGs and TAGs. The tabular interpretation ensures, for all strategies, a time complexity in O(n^6) and a space complexity in O(n^5), where n is the length of the input string. Introduction. 2-Stack Automata [2SA] have been identified as possible operational devices to describe parsing strategies for Linear Indexed Grammars [LIG] or Tree Adjoining Grammars [TAG] (mirroring the traditional use of Push-Down Automata [PDA] for Context-Free Grammars [CFG]). Different variants of 2SA (or the not so distant Embedded Push-Down Automata) have been proposed, some to describe top-down strategies (Vijay-Shanker, 1988; Becker, 1994), some to describe bottom-up strategies (Rambow, 1994; Nederhof, 1998; Alonso Pardo et al., 1997), but none (that we know) that are able to describe both kinds of strategies. Th..