9 research outputs found
Bringing order into the realm of Transformer-based language models for artificial intelligence and law
Transformer-based language models (TLMs) have widely been recognized to be a
cutting-edge technology for the successful development of deep-learning-based
solutions to problems and applications that require natural language processing
and understanding. Like for other textual domains, TLMs have indeed pushed the
state-of-the-art of AI approaches for many tasks of interest in the legal
domain. Despite the first Transformer model being proposed about six years ago,
there has been a rapid progress of this technology at an unprecedented rate,
whereby BERT and related models represent a major reference, also in the legal
domain. This article provides the first systematic overview of TLM-based
methods for AI-driven problems and tasks in the legal sphere. A major goal is
to highlight research advances in this field so as to understand, on the one
hand, how the Transformers have contributed to the success of AI in supporting
legal processes, and on the other hand, what are the current limitations and
opportunities for further research development.Comment: Please refer to the published version: Greco, C.M., Tagarelli, A.
(2023) Bringing order into the realm of Transformer-based language models for
artificial intelligence and law. Artif Intell Law, Springer Nature. November
2023. https://doi.org/10.1007/s10506-023-09374-
IberSPEECH 2020: XI Jornadas en TecnologĂa del Habla and VII Iberian SLTech
IberSPEECH2020 is a two-day event, bringing together the best researchers and practitioners in speech and language technologies in Iberian languages to promote interaction and discussion. The organizing committee has planned a wide variety of scientific and social activities, including technical paper presentations, keynote lectures, presentation of projects, laboratories activities, recent PhD thesis, discussion panels, a round table, and awards to the best thesis and papers. The program of IberSPEECH2020 includes a total of 32 contributions that will be presented distributed among 5 oral sessions, a PhD session, and a projects session. To ensure the quality of all the contributions, each submitted paper was reviewed by three members of the scientific review committee. All the papers in the conference will be accessible through the International Speech Communication Association (ISCA) Online Archive. Paper selection was based on the scores and comments provided by the scientific review committee, which includes 73 researchers from different institutions (mainly from Spain and Portugal, but also from France, Germany, Brazil, Iran, Greece, Hungary, Czech Republic, Ucrania, Slovenia). Furthermore, it is confirmed to publish an extension of selected papers as a special issue of the Journal of Applied Sciences, “IberSPEECH 2020: Speech and Language Technologies for Iberian Languages”, published by MDPI with fully open access. In addition to regular paper sessions, the IberSPEECH2020 scientific program features the following activities: the ALBAYZIN evaluation challenge session.Red Española de TecnologĂas del Habla. Universidad de Valladoli
Using NLP to resolve mismatches between jobseekers and positions in recruitment
Recruiting through online portals has seen a dramatic increase in recent decades and it is challenging for job seekers to evaluate the overwhelming amount of data to efficiently identify positions that align with their skills and qualifications. This research addresses this issue by investigating automatic approaches that leverage recent developments in Natural Language Processing (NLP) that search, parse, and evaluate the often unstructured data in order to find appropriate matches. We present the development of a benchmark suite consisting of an annotation schema, training corpus and baseline model for Entity Recognition (ER) in job descriptions, published under a Creative Commons licence. The dataset contains 18.6k entities comprising five types: Skill; Qualification; Experience; Occupation; and Domain. We develop a benchmark Conditional Random Fields (CRF) ER model which achieves an F1 score of 0.59, and our best performing model utilises Bidirectional Encoder Representations from Transformers (BERT) and achieves an F1 score of 0.73. We consider different ways of framing the matching problem and develop Machine Learning (ML) models to address each. We propose that the Natural Language Inference (NLI) paradigm most closely aligns with the matching problem. Our best performing model utilises decomposable attention and achieves an F1 score of 0.73 on a job application success prediction task. Finally, we integrate the ER and success prediction models into a cohesive pipeline that predicts whether a given job application made by a user will be successful, which can be extended into a system that recommends suitable jobs to a user. Although we observe poorer results on this pipeline relative to a more simple input truncation approach, we suggest this may be limited by the ER component for feature selection and the entity encoding process
Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020
On behalf of the Program Committee, a very warm welcome to the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020). This edition of the conference is held in Bologna and organised by the University of Bologna. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after six years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges
Aspect of Code Cloning Towards Software Bug and Imminent Maintenance: A Perspective on Open-source and Industrial Mobile Applications
As a part of the digital era of microtechnology, mobile application (app) development is evolving with
lightning speed to enrich our lives and bring new challenges and risks. In particular, software bugs and
failures cost trillions of dollars every year, including fatalities such as a software bug in a self-driving car
that resulted in a pedestrian fatality in March 2018 and the recent Boeing-737 Max tragedies that resulted
in hundreds of deaths. Software clones (duplicated fragments of code) are also found to be one of the
crucial factors for having bugs or failures in software systems. There have been many significant studies
on software clones and their relationships to software bugs for desktop-based applications. Unfortunately,
while mobile apps have become an integral part of today’s era, there is a marked lack of such studies for
mobile apps. In order to explore this important aspect, in this thesis, first, we studied the characteristics of
software bugs in the context of mobile apps, which might not be prevalent for desktop-based apps such as
energy-related (battery drain while using apps) and compatibility-related (different behaviors of same app
in different devices) bugs/issues. Using Support Vector Machine (SVM), we classified about 3K mobile app
bug reports of different open-source development sites into four categories: crash, energy, functionality and
security bug. We then manually examined a subset of those bugs and found that over 50% of the bug-fixing
code-changes occurred in clone code. There have been a number of studies with desktop-based software
systems that clearly show the harmful impacts of code clones and their relationships to software bugs. Given
that there is a marked lack of such studies for mobile apps, in our second study, we examined 11 open-source
and industrial mobile apps written in two different languages (Java and Swift) and noticed that clone code
is more bug-prone than non-clone code and that industrial mobile apps have a higher code clone ratio than
open-source mobile apps. Furthermore, we correlated our study outcomes with those of existing desktop based studies and surveyed 23 mobile app developers to validate our findings. Along with validating our
findings from the survey, we noticed that around 95% of the developers usually copy/paste (code cloning)
code fragments from the popular Crowd-sourcing platform, Stack Overflow (SO) to their projects and that
over 75% of such developers experience bugs after such activities (the code cloning from SO). Existing studies
with desktop-based systems also showed that while SO is one of the most popular online platforms for code
reuse (and code cloning), SO code fragments are usually toxic in terms of software maintenance perspective.
Thus, in the third study of this thesis, we studied the consequences of code cloning from SO in different open source and industrial mobile apps. We observed that closed-source industrial apps even reused more SO code
fragments than open-source mobile apps and that SO code fragments were more change-prone (such as bug)
than non-SO code fragments. We also experienced that SO code fragments were related to more bugs in
industrial projects than open-source ones. Our studies show how we could efficiently and effectively manage
clone related software bugs for mobile apps by utilizing the positive sides of code cloning while overcoming
(or at least minimizing) the negative consequences of clone fragments
Front-Line Physicians' Satisfaction with Information Systems in Hospitals
Day-to-day operations management in hospital units is difficult due to continuously varying situations, several actors involved and a vast number of information systems in use. The aim of this study was to describe front-line physicians' satisfaction with existing information systems needed to support the day-to-day operations management in hospitals. A cross-sectional survey was used and data chosen with stratified random sampling were collected in nine hospitals. Data were analyzed with descriptive and inferential statistical methods. The response rate was 65 % (n = 111). The physicians reported that information systems support their decision making to some extent, but they do not improve access to information nor are they tailored for physicians. The respondents also reported that they need to use several information systems to support decision making and that they would prefer one information system to access important information. Improved information access would better support physicians' decision making and has the potential to improve the quality of decisions and speed up the decision making process.Peer reviewe
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition. Volume 2 : Traitement Automatique des Langues Naturelles
@ 6ème conférence conjointe: JEP-TALN-RECITAL 2020no abstrac
Pacific Symposium on Biocomputing 2023
The Pacific Symposium on Biocomputing (PSB) 2023 is an international, multidisciplinary conference for the presentation and discussion of current research in the theory and application of computational methods in problems of biological significance. Presentations are rigorously peer reviewed and are published in an archival proceedings volume. PSB 2023 will be held on January 3-7, 2023 in Kohala Coast, Hawaii. Tutorials and workshops will be offered prior to the start of the conference.PSB 2023 will bring together top researchers from the US, the Asian Pacific nations, and around the world to exchange research results and address open issues in all aspects of computational biology. It is a forum for the presentation of work in databases, algorithms, interfaces, visualization, modeling, and other computational methods, as applied to biological problems, with emphasis on applications in data-rich areas of molecular biology.The PSB has been designed to be responsive to the need for critical mass in sub-disciplines within biocomputing. For that reason, it is the only meeting whose sessions are defined dynamically each year in response to specific proposals. PSB sessions are organized by leaders of research in biocomputing's 'hot topics.' In this way, the meeting provides an early forum for serious examination of emerging methods and approaches in this rapidly changing field