
    An extensible multilingual open source lemmatizer

    We present GATE DictLemmatizer, a multilingual open-source lemmatizer for the GATE NLP framework that currently supports English, German, Italian, French, Dutch, and Spanish and is easily extensible to other languages. The software is freely available under the LGPL license. Lemmatization is based on the Helsinki Finite-State Transducer Technology (HFST) and on lemma dictionaries automatically created from Wiktionary. We evaluate the lemmatizers against TreeTagger, which is freely available only for research purposes. Our evaluation shows that DictLemmatizer achieves similar or even better results than TreeTagger for languages supported by HFST. Performance drops for languages without HFST support, where lemmatization relies entirely on the lemma dictionaries. However, the results are still satisfactory given that DictLemmatizer is open source and can easily be extended to other languages. The software for extending the lemmatizer by creating word lists from Wiktionary dictionaries is also freely available as open-source software.
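    The abstract describes a two-stage lookup: a morphological analyzer (HFST) where one exists, and Wiktionary-derived lemma dictionaries otherwise. The following is a minimal Python sketch of that fallback order under stated assumptions; it is not DictLemmatizer's API (the tool is a GATE plugin). The tab-separated word-list format, the function names, and the `analyzer` callable, which is only a placeholder for an HFST-backed analyzer and does not call HFST, are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Optional


def load_lemma_dict(lines: Iterable[str]) -> Dict[str, str]:
    """Build a form -> lemma table from hypothetical "form<TAB>lemma" lines,
    e.g. word lists extracted from a Wiktionary dump."""
    table: Dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if not line or "\t" not in line:
            continue  # skip blank or malformed lines
        form, lemma = line.split("\t", 1)
        # Keep the first lemma seen for an ambiguous form.
        table.setdefault(form.lower(), lemma)
    return table


def lemmatize(word: str,
              table: Dict[str, str],
              analyzer: Optional[Callable[[str], Optional[str]]] = None) -> str:
    """Try the analyzer first (placeholder for HFST), then the dictionary,
    and finally return the word unchanged."""
    if analyzer is not None:
        lemma = analyzer(word)
        if lemma:
            return lemma
    return table.get(word.lower(), word)


# Usage: a toy "analyzer" that strips a final "-ed" (illustration only).
table = load_lemma_dict(["ran\trun", "mice\tmouse"])
toy_analyzer = lambda w: w[:-2] if w.endswith("ed") else None
print(lemmatize("Mice", table))                             # mouse
print(lemmatize("walked", table))                           # walked
print(lemmatize("walked", table, analyzer=toy_analyzer))    # walk
print(lemmatize("ran", table, analyzer=toy_analyzer))       # run
```

    Ordering the analyzer before the dictionary mirrors the abstract's observation that results are best where HFST support exists; for languages without it, only the dictionary lookup applies, and unknown words pass through unchanged.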

    A deep neural network sentence level classification method with context information

    In sentence classification, context formed from the sentences adjacent to the sentence being classified can provide important information. This context, however, is often ignored, and methods that do use it consider only small amounts, which makes them difficult to scale. We present a new method for sentence classification, Context-LSTM-CNN, that makes use of potentially large contexts. The method also captures long-range dependencies within the sentence being classified using an LSTM, and short-span features using a stacked CNN. Our experiments demonstrate that this approach consistently improves over previous methods on two different datasets.
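    The abstract does not say how the adjacent sentences are collected. As one illustration, the sketch below gathers up to `k` neighbouring sentences on each side of a target sentence, the kind of input a context-aware classifier could then encode. The function name and the window parameter `k` are hypothetical, and the LSTM and stacked-CNN encoders of Context-LSTM-CNN are not shown.

```python
from typing import List, Tuple


def context_window(sentences: List[str], i: int, k: int) -> Tuple[List[str], str, List[str]]:
    """Return (preceding, target, following), with up to k neighbours on each side.

    Windows are truncated at document boundaries rather than padded.
    """
    if not 0 <= i < len(sentences):
        raise IndexError("sentence index out of range")
    if k < 0:
        raise ValueError("k must be non-negative")
    before = sentences[max(0, i - k):i]
    after = sentences[i + 1:i + 1 + k]
    return before, sentences[i], after


doc = ["s0", "s1", "s2", "s3", "s4"]
print(context_window(doc, 2, 1))   # (['s1'], 's2', ['s3'])
print(context_window(doc, 0, 2))   # ([], 's0', ['s1', 's2'])
```

    Making `k` a parameter reflects the abstract's point that prior methods considered only small contexts; a larger `k` simply widens the window, and the cost of encoding it is left to the model.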

    From Our Libraries. Jelka Petrak: Poduka u središtu [Instruction at the Centre]

    From Our Libraries. J. Petrak: Poduka u središtu (in Croatian)

    Application of mathematical models in milk coagulation process during lactic acid fermentation. Part I. Relation between enzymatic and acidic milk coagulation

    Acidic and enzymatic coagulation of milk are complex processes that proceed in several phases and depend on many different parameters. The formation of coagulum during lactic-acid fermentation is in fact acidic coagulation of milk: it occurs because the concentration of lactic acid increases, which lowers the pH. Many authors have described enzymatic coagulation of milk analytically by means of mathematical models. Although enzymatic and acidic coagulation of milk do not follow identical physical and chemical rules, they can be compared kinetically. The aim of this paper was to combine the kinetics of enzymatic and acidic coagulation of milk and to present mathematically the changes that develop during lactic-acid fermentation of milk. The models presented in this paper enable a more comprehensive mathematical analysis of the coagulation of milk proteins during lactic-acid fermentation. Applying the models makes it possible to analyze and compare coagulation kinetics in different types of milk and in various fermented dairy products manufactured with lactic acid bacteria. Mathematically combining the coagulation kinetics of the milk protein complex with the rheological characteristics of the resulting fermented dairy products makes it easier to define parameters for lactic acid fermentation.

    Using ontologies to map between research data and policymakers’ presumptions: the experience of the KNOWMAK project

    Understanding knowledge co-creation in key emerging areas of European research is critical for policymakers who wish to analyze impact and make strategic decisions. However, purely data-driven methods for characterising policy topics have limitations, owing to the broad nature of such topics and to differences in language and topic structure between political discourse and scientific and technological outputs. In this paper, we discuss the use of ontologies and semantic technologies as a means to bridge the linguistic and conceptual gap between policy questions and the data sources used to characterise European knowledge production. Our experience suggests that integrating advanced language-processing techniques with expert assessment at critical junctures in the process is key to the success of this endeavour.

    Classification aware neural topic model and its application on a new COVID-19 disinformation corpus

    The explosion of disinformation related to the COVID-19 pandemic has overloaded fact-checkers and media worldwide. To help tackle this, we developed computational methods to support COVID-19 disinformation debunking and research into its social impacts. This paper presents: 1) the largest currently available manually annotated dataset of COVID-19 disinformation categories; and 2) a classification-aware neural topic model (CANTM) that combines classification and topic modelling under a variational autoencoder framework. We demonstrate that CANTM efficiently improves classification performance in low-resource settings and is scalable. In addition, the classification-aware topics help researchers and end users better understand the classification results.

    Classification aware neural topic model for COVID-19 disinformation categorisation

    The explosion of disinformation accompanying the COVID-19 pandemic has overloaded fact-checkers and media worldwide and posed a major new challenge to government responses. Not only is disinformation creating confusion about medical science among citizens, but it is also amplifying distrust in policymakers and governments. To help tackle this, we developed computational methods to categorise COVID-19 disinformation. These categories could be used for a) focusing fact-checking efforts on the most damaging kinds of COVID-19 disinformation; and b) guiding policymakers who are trying to deliver effective public health messages and effectively counter COVID-19 disinformation. This paper presents: 1) a corpus containing what is currently the largest available set of manually annotated COVID-19 disinformation categories; 2) a classification-aware neural topic model (CANTM) designed for COVID-19 disinformation category classification and topic discovery; and 3) an extensive analysis of COVID-19 disinformation categories with respect to time, volume, false type, media type and origin source.
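    The abstracts say only that CANTM combines classification and topic modelling "under a variational autoencoder framework". For orientation, the sketch below shows a generic way such a combination is often written: a negative evidence lower bound (reconstruction term plus the KL divergence of a diagonal Gaussian posterior from a standard normal prior) plus a weighted classification loss. This is a swapped-in textbook objective, not CANTM's actual formulation; the function names, the weight `alpha`, and the assumption that `recon_nll` is computed elsewhere (e.g. by a bag-of-words decoder) are all hypothetical.

```python
import math
from typing import Sequence


def gaussian_kl(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Negative log-probability of the true class (probs must be > 0 at label)."""
    return -math.log(probs[label])


def joint_loss(recon_nll: float,
               mu: Sequence[float],
               logvar: Sequence[float],
               class_probs: Sequence[float],
               label: int,
               alpha: float = 1.0) -> float:
    """Generic objective: negative ELBO (reconstruction + KL) plus
    alpha times the classification cross-entropy."""
    return recon_nll + gaussian_kl(mu, logvar) + alpha * cross_entropy(class_probs, label)


# A posterior equal to the prior contributes zero KL.
print(gaussian_kl([0.0, 0.0], [0.0, 0.0]))   # 0.0
print(gaussian_kl([1.0], [0.0]))             # 0.5
```

    Summing the two terms lets one set of latent variables serve both topic discovery and classification; `alpha` trades off the two, and how CANTM actually couples the classifier and the topics is specified in the paper itself.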

    Suppression of Surface-Related Loss in a Gated Semiconductor Microcavity

    We present a surface-passivation method that reduces surface-related losses by almost two orders of magnitude in a highly miniaturized GaAs open microcavity. The microcavity consists of a curved dielectric distributed Bragg reflector with a radius of approximately 10 µm paired with a GaAs-based heterostructure. The heterostructure consists of a semiconductor distributed Bragg reflector followed by an n-i-p diode with a layer of quantum dots in the intrinsic region. Free-carrier absorption in the highly n-doped and highly p-doped layers is minimized by positioning them close to a node of the vacuum electromagnetic field. The surface, however, resides at an antinode of the vacuum field and causes significant loss. These losses are greatly reduced by surface passivation. The strong wavelength dependence implies that the main effect of the surface passivation is to eliminate the surface electric field, thereby quenching below-band-gap absorption via a Franz-Keldysh-like effect. An additional benefit is that the surface passivation reduces scattering at the GaAs surface. These results are also relevant to other nanophotonic devices that rely on a GaAs-vacuum interface to confine the electromagnetic field.