10 research outputs found

    The value of numbers in clinical text classification

    Clinical text often includes numbers of various types and formats. However, most current text classification approaches do not take advantage of these numbers. This study aims to demonstrate that using numbers as features can significantly improve the performance of text classification models. This study also demonstrates the feasibility of extracting such features from clinical text. Unsupervised learning was used to identify patterns of number usage in clinical text. These patterns were analyzed manually and converted into pattern-matching rules. Information extraction was used to incorporate numbers as features into a document representation model. We evaluated text classification models trained on such a representation. Our experiments were performed with two document representation models (vector space model and word embedding model) and two classification models (support vector machines and neural networks). The results showed that even a handful of numerical features can significantly improve text classification performance. We conclude that commonly used document representations do not represent numbers in a way that lets machine learning algorithms use them effectively as features. Although we demonstrated that traditional information extraction can be effective in converting numbers into features, further community-wide research is required to systematically incorporate number representation into the word embedding process.
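As a rough illustration of the rule-based step described in this abstract, the sketch below turns numbers in a clinical sentence into named features using regular expressions. The three rules, the feature names, and the function `extract_numeric_features` are hypothetical examples chosen for illustration; the abstract does not list the study's actual patterns.

```python
import re

# Hypothetical pattern-matching rules (not the study's own).
# Each named group becomes one numeric feature.
RULES = [
    re.compile(r"\b(?P<systolic>\d{2,3})\s*/\s*(?P<diastolic>\d{2,3})\s*mmHg\b", re.IGNORECASE),
    re.compile(r"\b(?P<age_years>\d{1,3})[- ]?(?:years?|yrs?)[- ]old\b", re.IGNORECASE),
    re.compile(r"\b(?P<weight_kg>\d+(?:\.\d+)?)\s*kg\b", re.IGNORECASE),
]


def extract_numeric_features(text):
    """Return a dict of numeric features found by the first match of each rule."""
    features = {}
    for rule in RULES:
        match = rule.search(text)
        if match:
            for name, value in match.groupdict().items():
                features[name] = float(value)
    return features


note = "A 54-year-old man, weight 82.5 kg, BP 140/90 mmHg."
print(extract_numeric_features(note))
```

Features produced this way could be appended to a bag-of-words or embedding vector before training a classifier, which is the general idea the abstract describes.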

    To BAN or not to BAN

    Hate speech is an important problem in the management of user-generated content. To remove offensive content or ban misbehaving users, content moderators need reliable hate speech detectors. Recently, deep neural networks based on the transformer architecture, such as the (multilingual) BERT model, have achieved superior performance in many natural language classification tasks, including hate speech detection. So far, these methods have not been able to quantify the reliability of their output. We propose a Bayesian method using Monte Carlo dropout within the attention layers of the transformer models to provide well-calibrated reliability estimates. We evaluate and visualize the results of the proposed approach on hate speech detection problems in several languages. Additionally, we test whether affective dimensions can enhance the information extracted by the BERT model in hate speech classification. Our experiments show that Monte Carlo dropout provides a viable mechanism for reliability estimation in transformer networks. Used within the BERT model, it offers state-of-the-art classification performance and can detect less trusted predictions.
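The general idea of Monte Carlo dropout can be sketched on a toy model: dropout stays active at prediction time, several stochastic forward passes are run, and the spread of the outputs serves as a reliability signal. The example below assumes a single logistic unit with dropout on its inputs, not the BERT attention layers used in the paper; the function names, weights, and parameters are illustrative assumptions.

```python
import math
import random


def forward_with_dropout(weights, bias, features, drop_prob, rng):
    """One stochastic forward pass of a toy logistic unit with input dropout."""
    keep = 1.0 - drop_prob
    total = bias
    for w, x in zip(weights, features):
        if rng.random() < keep:
            # Inverted dropout: rescale kept inputs so the expected sum is unchanged.
            total += w * x / keep
    return 1.0 / (1.0 + math.exp(-total))


def mc_dropout_predict(weights, bias, features, drop_prob=0.3, passes=200, seed=0):
    """Average many dropout-perturbed predictions; return (mean, variance)."""
    rng = random.Random(seed)
    probs = [
        forward_with_dropout(weights, bias, features, drop_prob, rng)
        for _ in range(passes)
    ]
    mean = sum(probs) / passes
    variance = sum((p - mean) ** 2 for p in probs) / passes
    return mean, variance


mean, variance = mc_dropout_predict([1.5, -2.0, 0.7], 0.1, [1.0, 0.5, 2.0], drop_prob=0.5)
print(f"mean probability: {mean:.3f}, variance: {variance:.4f}")
```

A high variance across passes marks a prediction as less trustworthy, so a moderator could route such cases to manual review.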

    Headwater refuges: Flow protects Austropotamobius crayfish from Faxonius limosus invasion

    This study explores the geospatial relationship between the invasive crayfish species Faxonius limosus and the native Austropotamobius bihariensis and A. torrentium crayfish populations in Eastern Europe, identifying the environmental factors that influence the invasion. We used species distribution modelling based on several climatic, geophysical and water quality variables and crayfish distributional data to predict sectors suitable for each species within the river network. We thus identified the sectors potentially connecting invasive and native population clusters and quantified the degree of proximity between competing species. These sectors were then extensively surveyed with trapping and hand searching, complemented by eDNA methods, to assess whether any crayfish or the crayfish plague pathogen Aphanomyces astaci were present. The predictive models exhibited excellent performance and successfully distinguished between the analysed crayfish species. The expansion of F. limosus in streams was found to be limited by flash-flood potential, resulting in a range that is constrained to lowland rivers. Field surveys found neither crayfish nor pathogen presence in the connective sectors. Another interesting finding from the screening efforts, which are among the most extensive carried out across native, apparently healthy crayfish populations, was the existence of a latent infection with an A. astaci strain identified as A-haplogroup. Our results provide realistic insights for the long-term conservation of native Austropotamobius species, which appear to be naturally protected from F. limosus expansion. Conservation efforts can thus focus on other relevant aspects, such as establishing ark sites to prevent the spread of more dangerous invasive crayfish species and of virulent crayfish plague pathogen strains, even in locations without direct contact between crayfish hosts.

    Multi-aspect Multilingual and Cross-lingual Parliamentary Speech Analysis

    Parliamentary and legislative debate transcripts provide informative insight into elected politicians' opinions, positions, and policy preferences. They are of interest to political and social sciences as well as linguistics and natural language processing (NLP) research. While existing research has studied individual parliaments, we apply advanced NLP methods to a joint and comparative analysis of six national parliaments (Bulgarian, Czech, French, Slovene, Spanish, and United Kingdom) between 2017 and 2020. We analyze emotions and sentiment in the transcripts from the ParlaMint dataset collection and assess whether the age, gender, and political orientation of speakers can be detected from their speeches. The results show some commonalities and many surprising differences among the analyzed countries.

    MECHANICAL AND FRACTOGRAPHIC ASSESSMENT OF DIFFERENT TYPES OF ENDODONTIC POST SYSTEMS USED IN THE RESTORATION OF DEVITALIZED TEETH

    Aim of the study: The resistance to functional and parafunctional stresses of devitalized teeth is multifactorial, but the choice of a specific endodontic post system influences the restoration’s longevity. This primary ex-vivo study aims to evaluate the compressive strength of three different non-metallic post systems and to analyze the appearance of fracture surfaces (fractography) using optical microscopy, alongside statistical analysis that validates their behavior. Material and methods: For this study, three groups of non-metallic post systems were considered as follows: glass fiber-reinforced photopolymerized resin, pressed ceramic, and pressed ceramic on glass fiber. The samples had a length of 0.8 mm and an average diameter of 0.2 mm. Subsequently, the samples were embedded in self-polymerizing resin to be secured in the workspace of the Zwick Roell 5 kN testing machine, which operates at a loading speed of 2 mm/min. The fractographic analysis of the obtained surfaces was performed with the Optika SLX3 microscope and C-B16 camera, at magnifications of 20X, 50X, and 90X. Results: The data obtained from the mechanical tests were statistically processed, and the results showed significant differences in the compressive strength of non-metallic post systems made from different materials and with different technologies. Conclusions: The group containing posts made from pressed ceramic on glass fiber exhibited higher resistance to applied compressive forces.