TCBR-HMM: An HMM-based text classifier with a CBR system
This paper presents an innovative solution for modelling distributed adaptive systems in biomedical environments. We present an original TCBR-HMM (Text Case Based Reasoning-Hidden Markov Model) for biomedical text classification based on document content. The main goal is to propose a classifier that is more effective than current methods in this environment, where the model needs to be adapted to new documents in an iterative learning framework. To demonstrate this, we include a set of experiments performed on the OHSUMED corpus. Our classifier is compared with Naive Bayes and SVM techniques, which are commonly used in text classification tasks. The results suggest that the TCBR-HMM model is indeed more suitable for document classification. The model is empirically and statistically comparable to the SVM classifier and outperforms it in terms of time efficiency. Ministerio de Ciencia e Innovación | Ref. TIN2009-14057-C03-0
Discontinuous grammar as a foreign language
In order to achieve deep natural language understanding, syntactic constituent parsing is a vital step, highly demanded by many artificial intelligence systems to process both text and speech. One of the most recent proposals is the use of standard sequence-to-sequence models to perform constituent parsing as a machine translation task, instead of applying task-specific parsers. While they show a competitive performance, these text-to-parse transducers are still lagging behind classic techniques in terms of accuracy, coverage and speed. To close the gap, we here extend the framework of sequence-to-sequence models for constituent parsing, not only by providing a more powerful neural architecture for improving their performance, but also by enlarging their coverage to handle the most complex syntactic phenomena: discontinuous structures. To that end, we design several novel linearizations that can fully produce discontinuities and, for the first time, we test a sequence-to-sequence model on the main discontinuous benchmarks, obtaining competitive results on par with task-specific discontinuous constituent parsers and achieving state-of-the-art scores on the (discontinuous) English Penn Treebank. Xunta de Galicia; ED431G 2019/01. Xunta de Galicia; ED431C 2020/11. We acknowledge the European Research Council (ERC), which has funded this research under the European Union’s Horizon 2020 research and innovation programme (FASTPARSE, grant agreement No 714150) and the Horizon Europe research and innovation programme (SALSA, grant agreement No 101100615), ERDF/MICINN-AEI (SCANNER-UDC, PID2020-113230RB-C21), Xunta de Galicia (ED431C 2020/11), and Centro de Investigación de Galicia "CITIC", funded by Xunta de Galicia and the European Union (ERDF - Galicia 2014–2020 Program), by grant ED431G 2019/01. Funding for open access charge: Universidade da Coruña/CISUG
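To make the idea of parsing as sequence transduction concrete, here is a minimal sketch of a top-down bracketed linearization for continuous constituent trees, together with its inverse. It is an illustration only: the tuple-based tree encoding and the `linearize` / `delinearize` names are assumptions, and this is not one of the discontinuous linearizations the paper proposes, which the abstract does not specify.

```python
from typing import List, Tuple, Union

# Assumed encoding: a leaf is a word (str); an inner node is (label, child, ...).
Tree = Union[str, Tuple]


def linearize(tree: Tree) -> List[str]:
    """Flatten a constituent tree into a top-down bracketed token sequence."""
    if isinstance(tree, str):
        return [tree]
    label, *children = tree
    tokens = [f"({label}"]
    for child in children:
        tokens.extend(linearize(child))
    tokens.append(f"){label}")
    return tokens


def delinearize(tokens: List[str]) -> Tree:
    """Rebuild the tree from a sequence produced by linearize.

    Assumes words never start with a bracket followed by other characters.
    """
    stack: List[list] = [[]]
    for tok in tokens:
        if tok.startswith("(") and len(tok) > 1:
            stack.append([tok[1:]])           # open a new constituent
        elif tok.startswith(")") and len(tok) > 1:
            node = tuple(stack.pop())         # close the current constituent
            stack[-1].append(node)
        else:
            stack[-1].append(tok)             # plain word
    return stack[0][0]


example = ("S", ("NP", "She"), ("VP", "reads", ("NP", "books")))
sequence = linearize(example)
print(" ".join(sequence))
```

A sequence-to-sequence model would be trained to emit such a token sequence from the input sentence; handling discontinuous constituents needs a richer scheme than this one, since plain bracketing cannot represent crossing spans.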
Explicating peer feedback quality and its impact on feedback implementation in EFL writing
Introduction: Although it is commonly acknowledged that peer feedback quality is crucial to the success of peer review, there is a lack of consensus on how it could be determined. More importantly, how feedback quality interacts with other factors like feedback features and focus, and ultimately influences peer feedback implementation remains insufficiently investigated. Methods: The present study examined peer feedback quality and its impact on Chinese students’ feedback implementation in two argumentative writing tasks. Peer feedback quality was measured according to a self-designed two-dimensional measurement scale: accuracy and revision potential. Results: Quantitative analyses of 5,606 implementable idea units of feedback and 440 writing drafts by 110 students revealed that feedback accuracy was at a medium level and revision potential was at a low level, with accuracy demonstrating stronger predictive power on implementation; the predictive strengths of feedback accuracy and revision potential were strongest when feedback features and focus were considered; the overall peer feedback quality was low and medium-quality feedback was implemented most frequently; feedback quality significantly and most strongly predicted implementation in combination with feedback features and focus. Discussion: The study highlights the importance of future instructions in training students to provide and implement high-quality feedback with good accuracy and high revision potential.
Designing a Direct Feedback Loop between Humans and Convolutional Neural Networks through Local Explanations
The local explanation provides heatmaps on images to explain how
Convolutional Neural Networks (CNNs) derive their output. Due to its visual
straightforwardness, the method has been one of the most popular explainable AI
(XAI) methods for diagnosing CNNs. Through our formative study (S1), however,
we found that ML engineers are ambivalent about the local explanation: they
regard it as valuable and indispensable in building CNNs, yet find the process
exhausting because detecting vulnerability is heuristic in nature. Moreover,
steering the CNNs based on the vulnerability learned from the diagnosis seemed
highly challenging. To bridge this gap, we designed DeepFuse, the first
interactive design that realizes the direct feedback loop between a user and
CNNs in diagnosing and revising CNNs' vulnerabilities using local explanations.
DeepFuse helps CNN engineers to systematically search for "unreasonable" local
explanations and annotate the new boundaries for those identified as
unreasonable in a labor-efficient manner. Next, it steers the model based on
the given annotation such that the model doesn't introduce similar mistakes. We
conducted a two-day study (S2) with 12 experienced CNN engineers. Using
DeepFuse, participants made a more accurate and "reasonable" model than the
current state-of-the-art. Also, participants found the way DeepFuse guides
case-based reasoning can practically improve their current practice. We provide
implications for design that explain how future HCI-driven design can move our
practice forward to make XAI-driven insights more actionable.
Comment: 32 pages, 6 figures, 5 tables. Accepted for publication in the
Proceedings of the ACM on Human-Computer Interaction (PACM HCI), CSCW 202
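As a rough illustration of what a local explanation heatmap is, the sketch below computes an occlusion-based map: each patch of the image is blanked out in turn, and the drop in the model's score is recorded for that patch. This is one common way to produce such heatmaps and is swapped in purely for illustration; the abstract does not say which explanation method DeepFuse uses. `toy_model` is a hypothetical stand-in for a CNN class score, and all names here are assumptions.

```python
from typing import Callable, List

Image = List[List[float]]


def toy_model(image: Image) -> float:
    """Hypothetical stand-in for a CNN class score: mean of the top-left quadrant."""
    h, w = len(image), len(image[0])
    values = [image[r][c] for r in range(h // 2) for c in range(w // 2)]
    return sum(values) / len(values)


def occlusion_heatmap(model: Callable[[Image], float], image: Image, patch: int = 2) -> Image:
    """Return the score drop caused by zeroing each patch; larger means more influential."""
    h, w = len(image), len(image[0])
    baseline = model(image)
    heatmap = [[0.0] * w for _ in range(h)]
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            rows = range(top, min(top + patch, h))
            cols = range(left, min(left + patch, w))
            occluded = [row[:] for row in image]  # copy so the input is untouched
            for r in rows:
                for c in cols:
                    occluded[r][c] = 0.0
            drop = baseline - model(occluded)
            for r in rows:
                for c in cols:
                    heatmap[r][c] = drop
    return heatmap


image = [[1.0] * 4 for _ in range(4)]
heatmap = occlusion_heatmap(toy_model, image)
for row in heatmap:
    print(row)
```

Because the toy model only looks at the top-left quadrant, only that region receives a non-zero value. An "unreasonable" explanation in the abstract's sense would be a map that highlights a region an engineer considers irrelevant to the predicted class.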
Automatic Caption Generation for Aerial Images: A Survey
Aerial images have long attracted attention from the research community. Generating a caption that comprehensively describes the content of an aerial image is a less studied but important task, as it has applications in agriculture, defence, disaster management and many other areas. Although different approaches have been followed for natural image caption generation, generating a caption for an aerial image remains a challenging task due to its special nature. The use of emerging techniques from the Artificial Intelligence (AI) and Natural Language Processing (NLP) domains has resulted in captions of acceptable quality for aerial images. However, much remains to be done to fully utilize the potential of aerial image caption generation. This paper presents a detailed survey of the various approaches followed by researchers for aerial image caption generation. The datasets available for experimentation, the criteria used for performance evaluation and future directions are also discussed.
TeamSTEPPS and Organizational Culture
Patient safety issues persist despite the many strategies developed to prevent them. While many safety initiatives bring about improvement, they are often unsustainable and short-lived. The index hospital’s goal was to build an organizational culture on a foundation that improves teamwork and sustains healthcare team engagement. Teamwork influences the efficiency of patient care, patient safety, and clinical outcomes, and it has been identified as an approach for enhancing collaboration, decreasing medical errors, and building a culture of safety in healthcare. The facility implemented Team Strategies and Tools to Enhance Performance and Patient Safety (TeamSTEPPS), an evidence-based team-training framework used to produce valuable and needed changes, facilitate modification of organizational culture, increase patient safety compliance, or solve particular issues. This study aimed to identify the correlation between TeamSTEPPS implementation and improved organizational culture in the ambulatory care nursing department of a New York City public hospital.
Automated Mapping of Adaptive App GUIs from Phones to TVs
With the increasing interconnection of smart devices, users often desire to
adopt the same app on quite different devices for identical tasks, such as
watching the same movies on both their smartphones and TV.
However, the significant differences in screen size, aspect ratio, and
interaction styles make it challenging to adapt Graphical User Interfaces
(GUIs) across these devices.
Although there are millions of apps available on Google Play, only a few
thousand are designed to support smart TV displays.
Existing techniques to map a mobile app GUI to a TV either adopt a responsive
design, which struggles to bridge the substantial gap between phone and TV, or
use mirror apps for improved video display, which requires hardware support and
extra engineering effort.
Instead of developing another app for supporting TVs, we propose a
semi-automated approach to generate corresponding adaptive TV GUIs, given the
phone GUIs as the input.
Based on our empirical study of GUI pairs for TV and phone in existing apps,
we synthesize a list of rules for grouping and classifying phone GUIs,
converting them to TV GUIs, and generating dynamic TV layouts and source code
for the TV display.
Our tool is not only beneficial to developers but also to GUI designers, who
can further customize the generated GUIs for their TV app development.
An evaluation and user study demonstrate the accuracy of our generated GUIs
and the usefulness of our tool.
Comment: 30 pages, 15 figure
RSGPT: A Remote Sensing Vision Language Model and Benchmark
The emergence of large-scale large language models, with GPT-4 as a prominent
example, has significantly propelled the rapid advancement of artificial
general intelligence and sparked the revolution of Artificial Intelligence 2.0.
In the realm of remote sensing (RS), there is a growing interest in developing
large vision language models (VLMs) specifically tailored for data analysis in
this domain. However, current research predominantly revolves around visual
recognition tasks, lacking comprehensive, large-scale image-text datasets that
are aligned and suitable for training large VLMs, which poses significant
challenges to effectively training such models for RS applications. In computer
vision, recent research has demonstrated that fine-tuning large vision language
models on small-scale, high-quality datasets can yield impressive performance
in visual and language understanding. These results are comparable to
state-of-the-art VLMs trained from scratch on massive amounts of data, such as
GPT-4. Inspired by this captivating idea, in this work, we build a high-quality
Remote Sensing Image Captioning dataset (RSICap) that facilitates the
development of large VLMs in the RS field. Unlike previous RS datasets that
either employ model-generated captions or short descriptions, RSICap comprises
2,585 human-annotated captions with rich and high-quality information. This
dataset offers detailed descriptions for each image, encompassing scene
descriptions (e.g., residential area, airport, or farmland) as well as object
information (e.g., color, shape, quantity, absolute position, etc). To
facilitate the evaluation of VLMs in the field of RS, we also provide a
benchmark evaluation dataset called RSIEval. This dataset consists of
human-annotated captions and visual question-answer pairs, allowing for a
comprehensive assessment of VLMs in the context of RS.
Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition
This paper presents a paradigm that adapts general large-scale pretrained
models (PTMs) to the speech emotion recognition task. Although PTMs shed new light
on artificial general intelligence, they are constructed with general tasks in
mind, and thus, their efficacy for specific tasks can be further improved.
Additionally, employing PTMs in practical applications can be challenging due
to their considerable size. These limitations have spawned another research direction,
namely, optimizing large-scale PTMs for specific tasks to generate
task-specific PTMs that are both compact and effective. In this paper, we focus
on the speech emotion recognition task and propose an improved emotion-specific
pretrained encoder called Vesper. Vesper is pretrained on a speech dataset
based on WavLM and takes into account emotional characteristics. To enhance
sensitivity to emotional information, Vesper employs an emotion-guided masking
strategy to identify the regions that need masking. Subsequently, Vesper
employs hierarchical and cross-layer self-supervision to improve its ability to
capture acoustic and semantic representations, both of which are crucial for
emotion recognition. Experimental results on the IEMOCAP, MELD, and CREMA-D
datasets demonstrate that Vesper with 4 layers outperforms WavLM Base with 12
layers, and the performance of Vesper with 12 layers surpasses that of WavLM
Large with 24 layers.
Comment: 13 pages, 5 figures, 8 table
A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis
Pre-trained large language models (LLMs) have recently achieved better
generalization and sample efficiency in autonomous web navigation. However, the
performance on real-world websites has still suffered from (1) open domainness,
(2) limited context length, and (3) lack of inductive bias on HTML. We
introduce WebAgent, an LLM-driven agent that can complete the tasks on real
websites following natural language instructions. WebAgent plans ahead by
decomposing instructions into canonical sub-instructions, summarizes long HTML
documents into task-relevant snippets, and acts on websites via generated
Python programs from those. We design WebAgent with Flan-U-PaLM, for grounded
code generation, and HTML-T5, new pre-trained LLMs for long HTML documents
using local and global attention mechanisms and a mixture of long-span
denoising objectives, for planning and summarization. We empirically
demonstrate that our recipe improves the success on a real website by over 50%,
and that HTML-T5 is the best model to solve HTML-based tasks, achieving a 14.9%
higher success rate than prior SoTA on the MiniWoB web navigation benchmark and
better accuracy on offline task planning evaluation.
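The plan, summarize, act loop described in this abstract can be sketched as plain control flow. Everything below is a hypothetical stand-in: in the paper the planning and summarization steps are performed by HTML-T5 and program generation by Flan-U-PaLM, whereas here each step is a trivial rule-based function, and the `perform(...)` call in the generated string is an invented placeholder, not a real browser API.

```python
from typing import List


def plan(instruction: str) -> List[str]:
    """Hypothetical planner: split an instruction into sub-instructions on ' then '."""
    return [step.strip() for step in instruction.split(" then ") if step.strip()]


def summarize(html_lines: List[str], sub_instruction: str) -> List[str]:
    """Hypothetical summarizer: keep HTML lines that mention a word of the sub-instruction."""
    keywords = [word.lower() for word in sub_instruction.split() if len(word) > 3]
    return [line for line in html_lines if any(k in line.lower() for k in keywords)]


def synthesize(sub_instruction: str, snippets: List[str]) -> str:
    """Hypothetical program synthesizer: emit a one-line program as a string."""
    return f"perform({sub_instruction!r}, context={snippets!r})"


def run_agent(instruction: str, html_lines: List[str]) -> List[str]:
    """Chain the three steps: decompose, extract relevant snippets, generate a program."""
    programs = []
    for sub_instruction in plan(instruction):
        snippets = summarize(html_lines, sub_instruction)
        programs.append(synthesize(sub_instruction, snippets))
    return programs


html = [
    '<button id="search">Search</button>',
    '<input id="query" type="text">',
    "<footer>About</footer>",
]
programs = run_agent("click search then type laptop", html)
for program in programs:
    print(program)
```

The point of the sketch is the division of labour: each sub-instruction sees only the snippets relevant to it, which is how the described design works around limited context length on long HTML documents.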