KnowLab at RadSum23: comparing pre-trained language models in radiology report summarization
This paper presents our contribution to the RadSum23 shared task organized as part of BioNLP 2023. We compared state-of-the-art generative language models in generating high-quality summaries from radiology reports. A two-stage fine-tuning approach was introduced for utilizing knowledge learnt from different datasets. We evaluated the performance of our method using a variety of metrics, including BLEU, ROUGE, BERTScore, CheXbert, and RadGraph. Our results revealed the potential of different models in summarizing radiology reports and demonstrated the effectiveness of the two-stage fine-tuning approach. We also discussed the limitations and future directions of our work, highlighting the need to better understand the effect of architecture design and the optimal way of fine-tuning accordingly in automatic clinical summarization.
HumanTOMATO: Text-aligned Whole-body Motion Generation
This work targets a novel text-driven whole-body motion generation task,
which takes a given textual description as input and aims at generating
high-quality, diverse, and coherent facial expressions, hand gestures, and body
motions simultaneously. Previous works on text-driven motion generation tasks
mainly have two limitations: they ignore the key role of fine-grained hand and
face controlling in vivid whole-body motion generation, and lack a good
alignment between text and motion. To address such limitations, we propose a
Text-aligned whOle-body Motion generATiOn framework, named HumanTOMATO, which
is, to our knowledge, the first attempt towards applicable holistic motion
generation in this research area. To tackle this challenging task, our solution
includes two key designs: (1) a Holistic Hierarchical VQ-VAE (aka HVQ) and
a Hierarchical-GPT for fine-grained body and hand motion reconstruction and
generation with two structured codebooks; and (2) a pre-trained
text-motion-alignment model to help generated motion align with the input
textual description explicitly. Comprehensive experiments verify that our model
has significant advantages in both the quality of generated motions and their
alignment with text. Comment: 31 pages, 15 figures, 16 tables. Project page:
https://lhchen.top/HumanTOMAT
Response: Rights as Trumps of What?
Background: Smartphone technology presents a novel and promising opportunity to extend the reach of psychotherapeutic interventions by moving selected parts of the therapy into the real-life situations causing distress. This randomised controlled trial will investigate the effects of a transdiagnostic, Internet-administered cognitive behavioural (iCBT) self-help program for anxiety, supplemented with a smartphone application. The effect of added therapist support will also be studied. Methods/Design: One hundred and fifty participants meeting diagnostic criteria for social anxiety disorder and/or panic disorder will be evenly randomised to one of three study groups: 1, smartphone-supplemented iCBT with therapist support; 2, smartphone-supplemented iCBT without therapist support; or 3, an active waiting list control group with delayed treatment. The primary outcome measure will be the Generalised Anxiety Disorder 7-item self-rating scale. Secondary measures include other anxiety, depression and quality of life measures. In addition to pre- and post-treatment measurements, the study includes two mid-treatment (days 24 and 48) and two follow-up assessments (12 and 36 months) to assess rapid and long-term effects. Discussion: To our knowledge, this is the first study to investigate the effectiveness of smartphone-supplemented iCBT for anxiety disorders. Hence, the findings from this trial will constitute great advancements in the burgeoning and promising field of smartphone-administered psychological interventions. Limitations are discussed.
InitialGAN: A Language GAN with Completely Random Initialization
Text generative models trained via Maximum Likelihood Estimation (MLE) suffer
from the notorious exposure bias problem, and Generative Adversarial Networks
(GANs) are shown to have potential to tackle this problem. Existing language
GANs adopt estimators like REINFORCE or continuous relaxations to model word
distributions. The inherent limitations of such estimators lead current models
to rely on pre-training techniques (MLE pre-training or pre-trained
embeddings). Representation modeling methods which are free from those
limitations, however, are seldom explored because of their poor performance
in previous attempts. Our analyses reveal that invalid sampling methods and
unhealthy gradients are the main contributors to such unsatisfactory
performance. In this work, we present two techniques to tackle these problems:
dropout sampling and fully normalized LSTM. Based on these two techniques, we
propose InitialGAN whose parameters are randomly initialized in full. Besides,
we introduce a new evaluation metric, Least Coverage Rate, to better evaluate
the quality of generated samples. The experimental results demonstrate that
InitialGAN outperforms both MLE and other compared models. To the best of our
knowledge, it is the first time a language GAN can outperform MLE without using
any pre-training techniques.
TransFusion: Generating Long, High Fidelity Time Series using Diffusion Models with Transformers
The generation of high-quality, long-sequenced time-series data is essential
due to its wide range of applications. In the past, standalone Recurrent and
Convolutional Neural Network-based Generative Adversarial Networks (GAN) were
used to synthesize time-series data. However, they are inadequate for
generating long sequences of time-series data due to limitations in the
architecture. Furthermore, GANs are well known for their training instability
and mode collapse problem. To address this, we propose TransFusion, a
diffusion- and transformer-based generative model to generate high-quality
long-sequence time-series data. We have stretched the sequence length to 384,
and generated high-quality synthetic data. To the best of our knowledge, this
is the first study conducted with such a long sequence length. Also, we
introduce two evaluation metrics to evaluate the quality of the synthetic data
as well as its predictive characteristics. We evaluate TransFusion with a wide
variety of visual and empirical metrics, and TransFusion outperforms the
previous state-of-the-art by a significant margin.
Enabling Viewpoint Learning through Dynamic Label Generation
Optimal viewpoint prediction is an essential task in many computer graphics
applications. Unfortunately, common viewpoint qualities suffer from two major
drawbacks: dependency on clean surface meshes, which are not always available,
and the lack of closed-form expressions, which requires a costly search
involving rendering. To overcome these limitations we propose to separate
viewpoint selection from rendering through an end-to-end learning approach,
whereby we reduce the influence of the mesh quality by predicting viewpoints
from unstructured point clouds instead of polygonal meshes. While this makes
our approach insensitive to the mesh discretization during evaluation, it only
becomes possible when resolving label ambiguities that arise in this context.
Therefore, we additionally propose to incorporate the label generation into the
training procedure, making the label decision adaptive to the current network
predictions. We show how our proposed approach allows for learning viewpoint
predictions for models from different object categories and for different
viewpoint qualities. Additionally, we show that prediction times are reduced
from several minutes to a fraction of a second, as compared to state-of-the-art
(SOTA) viewpoint quality evaluation. We will further release the code and
training data, which will, to our knowledge, be the largest viewpoint quality
dataset available.
A knowledge-intensive approach to process similarity calculation
Process model comparison and similar processes retrieval are key issues to be addressed in many real-world situations, and particularly relevant ones in some applications (e.g., in medicine), where similarity quantification can be exploited in a quality assessment perspective. Most of the process comparison techniques described in the literature suffer from two main limitations: (1) they adopt a purely syntactic (vs. semantic) approach in process activity comparison, and/or (2) they ignore complex control flow information (i.e., other than sequence). These limitations oversimplify the problem, and make the results of similarity-based process retrieval less reliable, especially when domain knowledge is available, and can be adopted to quantify activity or control flow construct differences. In this paper, we aim at overcoming both limitations, by introducing a framework which allows the actual process model to be extracted from the available process execution traces, through process mining techniques, and then (mined) process models to be compared, by relying on a novel distance measure. The novel distance measure, which represents the main contribution of this paper, is able to address issues (1) and (2) above, since: (1) it provides a semantic, knowledge-intensive approach to process activity comparison, by making use of domain knowledge; (2) it explicitly takes into account complex control flow constructs (such as AND and XOR splits/joins), thus fully considering the different semantic meaning of control flow connections in a reliable way. The positive impact of the framework in practice has been tested in stroke management, where our approach has outperformed a state-of-the-art literature metric on a real-world event log, providing results that were closer to those of a human expert. Experiments in other domains are foreseen in the future.