9 research outputs found

    Bringing order into the realm of Transformer-based language models for artificial intelligence and law

    Full text link
    Transformer-based language models (TLMs) have widely been recognized to be a cutting-edge technology for the successful development of deep-learning-based solutions to problems and applications that require natural language processing and understanding. Like for other textual domains, TLMs have indeed pushed the state-of-the-art of AI approaches for many tasks of interest in the legal domain. Despite the first Transformer model being proposed about six years ago, there has been a rapid progress of this technology at an unprecedented rate, whereby BERT and related models represent a major reference, also in the legal domain. This article provides the first systematic overview of TLM-based methods for AI-driven problems and tasks in the legal sphere. A major goal is to highlight research advances in this field so as to understand, on the one hand, how the Transformers have contributed to the success of AI in supporting legal processes, and on the other hand, what are the current limitations and opportunities for further research development.Comment: Please refer to the published version: Greco, C.M., Tagarelli, A. (2023) Bringing order into the realm of Transformer-based language models for artificial intelligence and law. Artif Intell Law, Springer Nature. November 2023. https://doi.org/10.1007/s10506-023-09374-

    IberSPEECH 2020: XI Jornadas en TecnologĂ­a del Habla and VII Iberian SLTech

    Get PDF
    IberSPEECH2020 is a two-day event, bringing together the best researchers and practitioners in speech and language technologies in Iberian languages to promote interaction and discussion. The organizing committee has planned a wide variety of scientific and social activities, including technical paper presentations, keynote lectures, presentation of projects, laboratories activities, recent PhD thesis, discussion panels, a round table, and awards to the best thesis and papers. The program of IberSPEECH2020 includes a total of 32 contributions that will be presented distributed among 5 oral sessions, a PhD session, and a projects session. To ensure the quality of all the contributions, each submitted paper was reviewed by three members of the scientific review committee. All the papers in the conference will be accessible through the International Speech Communication Association (ISCA) Online Archive. Paper selection was based on the scores and comments provided by the scientific review committee, which includes 73 researchers from different institutions (mainly from Spain and Portugal, but also from France, Germany, Brazil, Iran, Greece, Hungary, Czech Republic, Ucrania, Slovenia). Furthermore, it is confirmed to publish an extension of selected papers as a special issue of the Journal of Applied Sciences, “IberSPEECH 2020: Speech and Language Technologies for Iberian Languages”, published by MDPI with fully open access. In addition to regular paper sessions, the IberSPEECH2020 scientific program features the following activities: the ALBAYZIN evaluation challenge session.Red Española de Tecnologías del Habla. Universidad de Valladoli

    Using NLP to resolve mismatches between jobseekers and positions in recruitment

    Get PDF
    Recruiting through online portals has seen a dramatic increase in recent decades and it is challenging for job seekers to evaluate the overwhelming amount of data to efficiently identify positions that align with their skills and qualifications. This research addresses this issue by investigating automatic approaches that leverage recent developments in Natural Language Processing (NLP) that search, parse, and evaluate the often unstructured data in order to find appropriate matches. We present the development of a benchmark suite consisting of an annotation schema, training corpus and baseline model for Entity Recognition (ER) in job descriptions, published under a Creative Commons licence. The dataset contains 18.6k entities comprising five types: Skill; Qualification; Experience; Occupation; and Domain. We develop a benchmark Conditional Random Fields (CRF) ER model which achieves an F1 score of 0.59, and our best performing model utilises Bidirectional Encoder Representations from Transformers (BERT) and achieves an F1 score of 0.73. We consider different ways of framing the matching problem and develop Machine Learning (ML) models to address each. We propose that the Natural Language Inference (NLI) paradigm most closely aligns with the matching problem. Our best performing model utilises decomposable attention and achieves an F1 score of 0.73 on a job application success prediction task. Finally, we integrate the ER and success prediction models into a cohesive pipeline that predicts whether a given job application made by a user will be successful, which can be extended into a system that recommends suitable jobs to a user. Although we observe poorer results on this pipeline relative to a more simple input truncation approach, we suggest this may be limited by the ER component for feature selection and the entity encoding process

    Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020

    Get PDF
    On behalf of the Program Committee, a very warm welcome to the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020). This edition of the conference is held in Bologna and organised by the University of Bologna. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after six years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges

    Aspect of Code Cloning Towards Software Bug and Imminent Maintenance: A Perspective on Open-source and Industrial Mobile Applications

    Get PDF
    As a part of the digital era of microtechnology, mobile application (app) development is evolving with lightning speed to enrich our lives and bring new challenges and risks. In particular, software bugs and failures cost trillions of dollars every year, including fatalities such as a software bug in a self-driving car that resulted in a pedestrian fatality in March 2018 and the recent Boeing-737 Max tragedies that resulted in hundreds of deaths. Software clones (duplicated fragments of code) are also found to be one of the crucial factors for having bugs or failures in software systems. There have been many significant studies on software clones and their relationships to software bugs for desktop-based applications. Unfortunately, while mobile apps have become an integral part of today’s era, there is a marked lack of such studies for mobile apps. In order to explore this important aspect, in this thesis, first, we studied the characteristics of software bugs in the context of mobile apps, which might not be prevalent for desktop-based apps such as energy-related (battery drain while using apps) and compatibility-related (different behaviors of same app in different devices) bugs/issues. Using Support Vector Machine (SVM), we classified about 3K mobile app bug reports of different open-source development sites into four categories: crash, energy, functionality and security bug. We then manually examined a subset of those bugs and found that over 50% of the bug-fixing code-changes occurred in clone code. There have been a number of studies with desktop-based software systems that clearly show the harmful impacts of code clones and their relationships to software bugs. Given that there is a marked lack of such studies for mobile apps, in our second study, we examined 11 open-source and industrial mobile apps written in two different languages (Java and Swift) and noticed that clone code is more bug-prone than non-clone code and that industrial mobile apps have a higher code clone ratio than open-source mobile apps. Furthermore, we correlated our study outcomes with those of existing desktop based studies and surveyed 23 mobile app developers to validate our findings. Along with validating our findings from the survey, we noticed that around 95% of the developers usually copy/paste (code cloning) code fragments from the popular Crowd-sourcing platform, Stack Overflow (SO) to their projects and that over 75% of such developers experience bugs after such activities (the code cloning from SO). Existing studies with desktop-based systems also showed that while SO is one of the most popular online platforms for code reuse (and code cloning), SO code fragments are usually toxic in terms of software maintenance perspective. Thus, in the third study of this thesis, we studied the consequences of code cloning from SO in different open source and industrial mobile apps. We observed that closed-source industrial apps even reused more SO code fragments than open-source mobile apps and that SO code fragments were more change-prone (such as bug) than non-SO code fragments. We also experienced that SO code fragments were related to more bugs in industrial projects than open-source ones. Our studies show how we could efficiently and effectively manage clone related software bugs for mobile apps by utilizing the positive sides of code cloning while overcoming (or at least minimizing) the negative consequences of clone fragments

    Front-Line Physicians' Satisfaction with Information Systems in Hospitals

    Get PDF
    Day-to-day operations management in hospital units is difficult due to continuously varying situations, several actors involved and a vast number of information systems in use. The aim of this study was to describe front-line physicians' satisfaction with existing information systems needed to support the day-to-day operations management in hospitals. A cross-sectional survey was used and data chosen with stratified random sampling were collected in nine hospitals. Data were analyzed with descriptive and inferential statistical methods. The response rate was 65 % (n = 111). The physicians reported that information systems support their decision making to some extent, but they do not improve access to information nor are they tailored for physicians. The respondents also reported that they need to use several information systems to support decision making and that they would prefer one information system to access important information. Improved information access would better support physicians' decision making and has the potential to improve the quality of decisions and speed up the decision making process.Peer reviewe

    Pacific Symposium on Biocomputing 2023

    Get PDF
    The Pacific Symposium on Biocomputing (PSB) 2023 is an international, multidisciplinary conference for the presentation and discussion of current research in the theory and application of computational methods in problems of biological significance. Presentations are rigorously peer reviewed and are published in an archival proceedings volume. PSB 2023 will be held on January 3-7, 2023 in Kohala Coast, Hawaii. Tutorials and workshops will be offered prior to the start of the conference.PSB 2023 will bring together top researchers from the US, the Asian Pacific nations, and around the world to exchange research results and address open issues in all aspects of computational biology. It is a forum for the presentation of work in databases, algorithms, interfaces, visualization, modeling, and other computational methods, as applied to biological problems, with emphasis on applications in data-rich areas of molecular biology.The PSB has been designed to be responsive to the need for critical mass in sub-disciplines within biocomputing. For that reason, it is the only meeting whose sessions are defined dynamically each year in response to specific proposals. PSB sessions are organized by leaders of research in biocomputing's 'hot topics.' In this way, the meeting provides an early forum for serious examination of emerging methods and approaches in this rapidly changing field
    corecore