2,542 research outputs found
Lexical simplification for the systematic support of cognitive accessibility guidelines
The Internet has come a long way in recent years, contributing to the proliferation of
large volumes of digitally available information. Through user interfaces we can access
this content; however, it is not accessible to everyone. The main users affected are
people with disabilities, who already make up a considerable number, but accessibility barriers
affect a wide range of user groups and contexts of use in accessing digital information.
Some of these barriers are caused by language inaccessibility when texts contain long
sentences, unusual words and complex linguistic structures. These accessibility barriers
directly affect people with cognitive disabilities.
For the purpose of making textual content more accessible, there are initiatives such
as the Easy Reading guidelines, the Plain Language guidelines and some of the language-specific
Web Content Accessibility Guidelines (WCAG). These guidelines provide documentation
but do not specify methods for meeting their implicit requirements in a systematic way.
As a solution, methods from Natural Language Processing (NLP) can support compliance
with the language-related cognitive accessibility guidelines.
Text simplification aims to reduce the linguistic complexity of a text from
a syntactic and lexical perspective, the latter being the main focus of this Thesis. In this
sense, one solution is to identify which words in a text are complex or uncommon
and, where such words exist, to provide a more common and simpler synonym together
with a simple definition, all oriented to people with cognitive disabilities.
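The identify-then-substitute idea described above can be illustrated with a minimal sketch. It is not the architecture developed in the Thesis: the frequency table, synonym list, definitions and threshold below are all hypothetical toy values, and frequency thresholding is only one simple way to flag complex words.

```python
# Toy resources (hypothetical values, for illustration only). A real system
# would rely on corpus frequencies and curated Spanish lexical resources.
WORD_FREQ = {
    "utilizar": 120, "usar": 950, "vivienda": 200, "casa": 1500,
    "la": 9000, "para": 8000, "vivir": 700,
}
SYNONYMS = {
    "utilizar": ["usar", "emplear"],
    "vivienda": ["casa", "domicilio"],
}
DEFINITIONS = {
    "utilizar": "hacer que algo sirva para un fin",
    "vivienda": "lugar donde vive una persona",
}


def is_complex(word, threshold=300):
    """Flag a word as complex when its frequency is below the threshold.

    Words missing from the table get frequency 0 and are therefore flagged.
    """
    return WORD_FREQ.get(word.lower(), 0) < threshold


def simplest_synonym(word):
    """Return the most frequent synonym that is not itself complex, or None."""
    candidates = [s for s in SYNONYMS.get(word.lower(), []) if not is_complex(s)]
    if not candidates:
        return None
    return max(candidates, key=lambda s: WORD_FREQ.get(s, 0))


def simplify(text):
    """List each complex word with a simpler synonym and a simple definition."""
    results = []
    for token in text.split():
        word = token.strip(".,;:!?").lower()
        if word and is_complex(word):
            results.append({
                "word": word,
                "synonym": simplest_synonym(word),
                "definition": DEFINITIONS.get(word),
            })
    return results


suggestions = simplify("Utilizar la vivienda para vivir.")
```

Here `suggestions` contains one entry per flagged word. Word sense disambiguation, which the abstract also mentions, is deliberately left out of this sketch.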
With this goal in mind, this Thesis presents the study, analysis, design and development
of an architecture, NLP methods, resources and tools for the lexical simplification of
texts for the Spanish language in a generic domain in the field of cognitive accessibility.
To achieve this, each of the steps present in the lexical simplification processes is studied,
together with methods for word sense disambiguation. As a contribution, different
types of word embedding are explored and created, supported by traditional and dynamic
embedding methods, such as transfer learning methods. In addition, since most of the
NLP methods require data for their operation, a resource in the framework of cognitive
accessibility is presented as a contribution.
Doctoral Program in Computer Science and Technology, Universidad Carlos III de Madrid. Committee president: José Antonio Macías Iglesias. Secretary: Israel González Carrasco. Member: Raquel Hervás Ballestero
Core professionalism education in surgery: A systematic review
Background: Professionalism education is one of the major elements of surgical residency education.
Aims: To evaluate the studies on core professionalism education programs in surgical professionalism education.
Study Design: Systematic review.
Methods: This systematic literature review was performed to analyze core professionalism programs for surgical residency education published in English with at least three of the following features: program developmental model/instructional design method, aims and competencies, methods of teaching, methods of assessment, and program evaluation model or method. A total of 27,083 articles were retrieved using EBSCOHOST, PubMed, Science Direct, Web of Science, and manual search.
Results: Eight articles met the selection criteria. The instructional design method was presented in only one article, which described the Analysis, Design, Development, Implementation, and Evaluation model. Six articles were based on the Accreditation Council for Graduate Medical Education criterion, although there was significant variability in content. The most common teaching method was role modeling with scenario- and case-based learning. A wide range of assessment methods for evaluating professionalism education were reported. The Kirkpatrick model was reported in one article as a method for program evaluation.
Conclusion: It is suggested that for a core surgical professionalism education program, developmental/instructional design model, aims and competencies, content, teaching methods, assessment methods, and program evaluation methods/models should be well defined, and the content should be comparable.
© 2018 by Trakya University Faculty of Medicine / The Balkan Medical Journal published by Galenos Publishing House
Deep Neural Mel-Subband Beamformer for In-car Speech Separation
While current deep learning (DL)-based beamforming techniques have
proven effective in speech separation, they are often designed to process
narrow-band (NB) frequencies independently, which results in higher
computational costs and inference times, making them unsuitable for real-world
use. In this paper, we propose a DL-based mel-subband spatio-temporal beamformer
to perform speech separation in a car environment with reduced computation cost
and inference time. As opposed to conventional subband (SB) approaches, our
framework uses a mel-scale based subband selection strategy which ensures a
fine-grained processing for lower frequencies where most speech formant
structure is present, and coarse-grained processing for higher frequencies. In
a recursive way, robust frame-level beamforming weights are determined for each
speaker location/zone in a car from the estimated subband speech and noise
covariance matrices. Furthermore, the proposed framework also estimates and
suppresses any echoes from the loudspeaker(s) by using the echo reference
signals. We compare the performance of our proposed framework to several NB,
SB, and full-band (FB) processing techniques in terms of speech quality and
recognition metrics. Based on experimental evaluations on simulated and
real-world recordings, we find that our proposed framework achieves better
separation performance than all SB and FB approaches and achieves performance
closer to NB processing techniques while requiring lower computing cost.
Comment: Submitted to ICASSP 202
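The mel-scale subband selection mentioned in this abstract can be illustrated with a short sketch that groups FFT bins into subbands of equal width on the mel scale. This is an assumption-laden illustration, not the authors' implementation: the mel formula used is the common 2595 * log10(1 + f/700) convention, and the function names, FFT size, sample rate and subband count are hypothetical.

```python
import math


def hz_to_mel(freq_hz):
    """Convert Hz to mel using the common 2595*log10(1 + f/700) convention."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_subband_edges(n_subbands, sample_rate):
    """Return n_subbands + 1 band edges in Hz, equally spaced on the mel scale."""
    max_mel = hz_to_mel(sample_rate / 2.0)
    return [mel_to_hz(max_mel * i / n_subbands) for i in range(n_subbands + 1)]


def assign_bins(n_fft, sample_rate, n_subbands):
    """Group the one-sided FFT bin indices by the mel subband they fall into."""
    edges = mel_subband_edges(n_subbands, sample_rate)
    groups = [[] for _ in range(n_subbands)]
    for k in range(n_fft // 2 + 1):
        freq = k * sample_rate / n_fft
        band = 0
        # Advance until the bin frequency lies below the band's upper edge;
        # the Nyquist bin is clamped into the last band.
        while band < n_subbands - 1 and freq >= edges[band + 1]:
            band += 1
        groups[band].append(k)
    return groups


# Hypothetical settings: 512-point FFT at 16 kHz, split into 8 subbands.
edges = mel_subband_edges(8, 16000)
groups = assign_bins(512, 16000, 8)
```

With these settings the lowest subband holds only a few bins and the highest holds many, which mirrors the fine-grained low-frequency and coarse-grained high-frequency processing the abstract describes.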
Beneath Surface Similarity: Large Language Models Make Reasonable Scientific Analogies after Structure Abduction
The vital role of analogical reasoning in human cognition allows us to grasp
novel concepts by linking them with familiar ones through shared relational
structures. Despite the attention previous research has given to word
analogies, this work suggests that Large Language Models (LLMs) often overlook
the structures that underpin these analogies, raising questions about the
efficacy of word analogies as a measure of analogical reasoning skills akin to
human cognition. In response to this, our paper introduces a task of analogical
structure abduction, grounded in cognitive psychology, designed to abduce
structures that form an analogy between two systems. In support of this task,
we establish a benchmark called SCAR, containing 400 scientific analogies from
13 distinct fields, tailored for evaluating analogical reasoning with structure
abduction. The empirical evidence underlines the continued challenges faced by
LLMs, including ChatGPT and GPT-4, in mastering this task, signifying the need
for future exploration to enhance their abilities.
Comment: Accepted to EMNLP 2023 (Findings)
Lexical complexity prediction: an overview
The occurrence of unknown words in texts significantly hinders reading comprehension. To improve accessibility for specific target populations, computational modeling has been applied to identify complex words in texts and replace them with simpler alternatives. In this article, we present an overview of computational approaches to lexical complexity prediction, focusing on the work carried out on English data. We survey relevant approaches to this problem, which include traditional machine learning classifiers (e.g., SVMs, logistic regression) and deep neural networks, as well as a variety of features, such as those inspired by literature in psycholinguistics, word frequency, word length, and many others. Furthermore, we introduce readers to past competitions and available datasets created on this topic. Finally, we include brief sections on applications of lexical complexity prediction, such as readability and text simplification, together with related studies on languages other than English.
Identifying self-admitted technical debt in issue tracking systems using machine learning
Technical debt is a metaphor indicating sub-optimal solutions implemented for
short-term benefits by sacrificing the long-term maintainability and
evolvability of software. A special type of technical debt is explicitly
admitted by software engineers (e.g. using a TODO comment); this is called
Self-Admitted Technical Debt or SATD. Most work on automatically identifying
SATD focuses on source code comments. In addition to source code comments,
issue tracking systems have been shown to be another rich source of SATD, but there
are no approaches specifically for automatically identifying SATD in issues. In
this paper, we first create a training dataset by collecting and manually
analyzing 4,200 issues (that break down to 23,180 sections of issues) from
seven open-source projects (i.e., Camel, Chromium, Gerrit, Hadoop, HBase,
Impala, and Thrift) using two popular issue tracking systems (i.e., Jira and
Google Monorail). We then propose and optimize an approach for automatically
identifying SATD in issue tracking systems using machine learning. Our findings
indicate that: 1) our approach outperforms baseline approaches by a wide margin
with regard to the F1-score; 2) transferring knowledge from suitable datasets
can improve the predictive performance of our approach; 3) extracted SATD
keywords are intuitive and potentially indicate types and indicators of SATD;
4) projects using different issue tracking systems have less common SATD
keywords compared to projects using the same issue tracking system; 5) a small
amount of training data is needed to achieve good accuracy.
Comment: Accepted for publication in the EMSE journal
Defining and identifying the optimal embedding dimension of networks
Network embedding is a general-purpose machine learning technique that
encodes network structure in vector spaces with tunable dimension. Choosing an
appropriate embedding dimension -- small enough to be efficient and large
enough to be effective -- is challenging but necessary to generate embeddings
applicable to a multitude of tasks. Unlike most existing strategies that rely
on performance maximization in downstream tasks, here we propose a principled
method for the identification of an optimal dimension such that all structural
information of a network is parsimoniously encoded. The method is validated on
various embedding algorithms and a large corpus of real-world networks.
Estimated values of the optimal dimension in real-world networks suggest that
efficient encoding in low-dimensional spaces is usually possible.
Comment: 9 pages, 5 figures + Suppl. Ma