70 research outputs found
CamemBERT-bio: a Tasty French Language Model Better for your Health
Clinical data in hospitals are increasingly accessible for research through
clinical data warehouses; however, these documents are unstructured. It is
therefore necessary to extract information from medical reports to conduct
clinical studies. Transfer learning with BERT-like models such as CamemBERT has
allowed major advances, especially for named entity recognition. However, these
models are trained for plain language and are less efficient on biomedical
data. This is why we propose a new French public biomedical dataset on which we
have continued the pre-training of CamemBERT. Thus, we introduce a first
version of CamemBERT-bio, a specialized public model for the French biomedical
domain that shows 2.54 points of F1 score improvement on average on different
biomedical named entity recognition tasks
Building A Corporate Corpus For Threads Constitution
In this paper we describe the process of building a corporate corpus that will be used as a reference for modelling and computing threads from conversations generated using communication and collaboration tools. The overall goal of the reconstruction of threads is to be able to provide value to the collaborator in various use cases, such as highlighting the important parts of a running discussion, reviewing the upcoming commitments or deadlines, etc. Since, to our knowledge, there is no available corporate corpus for the French language which could allow us to address this problem of thread constitution, we present here a method for building such corpora, including the different aspects and steps which allowed the creation of a pipeline to pseudo-anonymise data. Such a pipeline is a response to the constraints induced by the General Data Protection Regulation (GDPR) in Europe and to the need to comply with the secrecy of correspondence
Relatório de estágio em farmácia comunitária
Internship report produced as part of the Integrated Master's in Pharmaceutical Sciences, submitted to the Faculdade de Farmácia da Universidade de Coimbr
Refining Tabular Parsers for TAGs
This paper investigates several refinements of a generic tabular parser for Tree Adjoining Grammars. The resulting parser is simpler and more efficient in practice, even though the worst case complexity is not optimal
Modulated Call/Return Evaluation strategies for Logic Programs
Using the Logic Push-Down Automata [LPDA] formalism to express resolution strategies, we show how new resolution strategies can be derived from a basic Call/Return-oriented strategy by (a) modulating the information flow and (b) applying some standard simplifications. The correctness of the resulting strategies is proved. This study takes place in the context of a tabular-based evaluation of the strategies (automatically given by the LPDA), where focusing on relevant information may avoid useless computations and reduce the cost of the tabulation. Keywords: Tabulation, LPDA, Resolution Strategy. Some proofs are provided in an appendix in case the reviewers wish to check them. They are not part of the paper and may be ignored. 1 Introduction Tabulation-based logic program evaluators have renewed the interest in the spectrum of resolution strategies, outside the well-known SLD and Bottom-Up strategies. Indeed, we can imagine all kinds of "mixed" strategies where OLD-like prediction is used to ..
Contents
Foundational course in Language and Logic: Designing tabular parsers for various syntactic formalisms
Information flow in tabular interpretations for generalized Push-Down Automata
This paper presents a general framework for deriving tabular algorithms for a very large class of stack-based computations, not only in context-free parsing but in logic programming as well and more generally for all kinds of "information" domains (abstract domains, constraint domains). Tabular algorithms store traces of computations in a table to achieve computation sharing, which is most useful when dealing with non-deterministic computations. By considering what can be naively described as partial information on stack elements, we interpret these traces as stack fragments. Tuning the exact amount of information present in these traces allows us to improve tabular evaluation of stack-based computations, both by increasing the sharing of partial computations and by unifying different tabular algorithms within the same framework. Key words: Information Flow, Tabulation, Push-Down Automata, Logic Programming, Parsing. Introduction It is now an accepted practice to define computer fo..
A tabular interpretation of a class of 2-Stack Automata
The paper presents a tabular interpretation for a kind of 2-Stack Automata. These automata may be used to describe various parsing strategies, ranging from purely top-down to purely bottom-up, for LIGs and TAGs. The tabular interpretation ensures, for all strategies, a time complexity in O(n^6) and space complexity in O(n^5), where n is the length of the input string. Introduction 2-Stack automata [2SA] have been identified as possible operational devices to describe parsing strategies for Linear Indexed Grammars [LIG] or Tree Adjoining Grammars [TAG] (mirroring the traditional use of Push-Down Automata [PDA] for Context-Free Grammars [CFG]). Different variants of 2SA (or not so distant Embedded Push-Down Automata) have been proposed, some to describe top-down strategies (Vijay-Shanker, 1988; Becker, 1994), some to describe bottom-up strategies (Rambow, 1994; Nederhof, 1998; Alonso Pardo et al., 1997), but none (that we know) that are able to describe both kinds of strategies. Th..