91 research outputs found
Effects of Hot Water Immersion on Storage Quality of Fresh Broccoli Heads
Freshly harvested broccoli heads were immersed in hot water at 45 °C for 0, 1, 4, or 8 min and then rapidly hydrocooled for 10 min at 10 °C. Following these treatments, the broccoli were air-dried for 30 min, packed in commercial polymeric film bags, and stored for 16 days at –1, 1, or 12 °C. The hot-water-treated samples maintained high chlorophyll concentrations, yellowed more slowly, and showed markedly less fungal infection and chilling or freezing injury. Compared to non-heat-treated broccoli, hot-water-treated heads showed lower peroxidase activity together with higher chlorophyll concentrations. Among these heat treatments, immersion in hot water at 45 °C for 4 min was the most effective for maintaining the quality of harvested broccoli heads.
Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models
Generative models have recently exhibited exceptional capabilities in
text-to-image generation, but still struggle to generate image sequences
coherently. In this work, we focus on a novel yet challenging task: generating a coherent image sequence based on a given storyline, denoted as open-ended visual storytelling. We make the following three contributions: (i) to fulfill the task of visual storytelling, we propose a learning-based auto-regressive image generation model, termed StoryGen, with a novel vision-language context module that generates the current frame by conditioning on the corresponding text prompt and preceding image-caption pairs; (ii) to address the data shortage of visual storytelling, we collect paired image-text sequences from online videos and open-source e-books, establishing a processing pipeline for constructing a large-scale dataset with diverse characters, storylines, and artistic styles, named StorySalon; (iii) quantitative experiments and human evaluations validate the superiority of StoryGen, which generalizes to unseen characters without any optimization and generates image sequences with coherent content and consistent characters. Code, dataset, and models are available at https://haoningwu3639.github.io/StoryGen_Webpage/
Comment: Accepted by CVPR 2024. Project Page: https://haoningwu3639.github.io/StoryGen_Webpage
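To make the conditioning scheme concrete, here is a minimal sketch of the auto-regressive generation loop described above. The model interface (`model.sample`) and the `Frame` container are hypothetical stand-ins, not the released API; only the pattern of conditioning each frame on its prompt plus the preceding image-caption pairs follows the abstract.

```python
from dataclasses import dataclass
from typing import Any, List

@dataclass
class Frame:
    caption: str
    image: Any  # e.g. a PIL.Image or tensor in a real implementation

def generate_story(model, storyline: List[str]) -> List[Frame]:
    """Generate one frame per caption, conditioning each sampling step on
    the current prompt and all previously generated image-caption pairs."""
    history: List[Frame] = []
    for caption in storyline:
        # Hypothetical call: a vision-language context module fuses the
        # prompt with the preceding frames before diffusion sampling.
        image = model.sample(
            prompt=caption,
            context=[(f.caption, f.image) for f in history],
        )
        history.append(Frame(caption=caption, image=image))
    return history
```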
Enhancing Diffusion Models with Text-Encoder Reinforcement Learning
Text-to-image diffusion models are typically trained to optimize the
log-likelihood objective, which presents challenges in meeting specific
requirements for downstream tasks, such as image aesthetics and image-text
alignment. Recent research addresses this issue by refining the diffusion U-Net
using human rewards through reinforcement learning or direct backpropagation.
However, many of them overlook the importance of the text encoder, which is
typically pretrained and fixed during training. In this paper, we demonstrate
that by finetuning the text encoder through reinforcement learning, we can
enhance the text-image alignment of the results, thereby improving the visual
quality. Our primary motivation comes from the observation that the current
text encoder is suboptimal, often requiring careful prompt adjustment. While
fine-tuning the U-Net can partially improve performance, the model still suffers from the suboptimal text encoder. Therefore, we propose using reinforcement learning with low-rank adaptation to finetune the text encoder based on task-specific rewards, referred to as TexForce. We first show that finetuning the text encoder can improve the performance of diffusion models. Then, we illustrate that TexForce can be simply combined with existing finetuned U-Net models to obtain much better results without additional training. Finally, we showcase the adaptability of our method in diverse applications, including the generation of high-quality face and hand images.
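As a rough illustration of the idea (not the authors' code), the sketch below attaches LoRA adapters to a CLIP text encoder and applies a REINFORCE-style update from a task-specific reward. The `generate_images` sampler and `reward_fn` are hypothetical placeholders, and the exact policy-gradient surrogate is an assumption; library calls follow the Hugging Face `transformers` / `peft` APIs.

```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer
from peft import LoraConfig, get_peft_model

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

# Low-rank adapters on the attention projections; base weights stay frozen.
lora_cfg = LoraConfig(r=8, lora_alpha=16,
                      target_modules=["q_proj", "k_proj", "v_proj", "out_proj"])
text_encoder = get_peft_model(text_encoder, lora_cfg)
optimizer = torch.optim.AdamW(text_encoder.parameters(), lr=1e-5)

def reinforce_step(prompts, generate_images, reward_fn):
    """One REINFORCE-style update: prompts whose generations score higher
    under the reward push the adapter weights in their direction."""
    tokens = tokenizer(prompts, padding=True, return_tensors="pt")
    embeddings = text_encoder(**tokens).last_hidden_state
    images, log_probs = generate_images(embeddings)   # hypothetical sampler
    rewards = reward_fn(images, prompts)              # e.g. aesthetics or image-text alignment
    loss = -(rewards.detach() * log_probs).mean()     # policy-gradient surrogate
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return rewards.mean().item()
```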
Enhancing Pre-trained ASR System Fine-tuning for Dysarthric Speech Recognition using Adversarial Data Augmentation
Automatic recognition of dysarthric speech remains a highly challenging task
to date. Neuro-motor conditions and co-occurring physical disabilities create
difficulty in large-scale data collection for ASR system development. Adapting
SSL pre-trained ASR models to limited dysarthric speech via data-intensive
parameter fine-tuning leads to poor generalization. To this end, this paper
presents an extensive comparative study of various data augmentation approaches
to improve the robustness of pre-trained ASR model fine-tuning to dysarthric
speech. These include: a) conventional speaker-independent perturbation of
impaired speech; b) speaker-dependent speed perturbation, or GAN-based
adversarial perturbation of normal, control speech based on their time
alignment against parallel dysarthric speech; c) novel spectral-basis GAN-based
adversarial data augmentation operating on non-parallel data. Experiments
conducted on the UASpeech corpus suggest that GAN-based data augmentation consistently outperforms fine-tuned Wav2vec2.0 and HuBERT models trained with no data augmentation or with speed perturbation alone, across different data expansion operating points, by statistically significant word error rate (WER) reductions of up to 2.01% and 0.96% absolute (9.03% and 4.63% relative), respectively, on the UASpeech test set of 16 dysarthric speakers. After cross-system output rescoring, the best system produced the lowest published WER of 16.53% (46.47% on very low intelligibility speech) on UASpeech.
Comment: To appear at IEEE ICASSP 202
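As one concrete example, speaker-dependent speed perturbation (approach (b)) can be sketched as follows. This is an illustrative outline, not the paper's implementation: the per-speaker rate factor is assumed to come from the duration ratio of time-aligned parallel dysarthric and control utterances, and control speech is then slowed toward that speaker's rate.

```python
import librosa

def speaker_rate_factor(dysarthric_durations, control_durations):
    """Average duration ratio over time-aligned parallel utterance pairs;
    a value > 1 means the dysarthric speaker talks more slowly."""
    ratios = [d / c for d, c in zip(dysarthric_durations, control_durations)]
    return sum(ratios) / len(ratios)

def perturb_control_utterance(wav_path, rate_factor, sr=16000):
    """Time-stretch a control utterance toward the target speaking rate."""
    y, _ = librosa.load(wav_path, sr=sr)
    # librosa's rate > 1 speeds audio up, so dividing by a factor > 1
    # slows the control speech toward the dysarthric speaking rate.
    return librosa.effects.time_stretch(y, rate=1.0 / rate_factor)
```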
The early high-energy afterglow emission from Short GRBs
We calculate the high energy afterglow emission from short Gamma-Ray Bursts
(SGRBs) in the external shock model. There are two possible components
contributing to the high energy afterglow: the electron synchrotron emission
and the synchrotron self-Compton (SSC) emission. We find that for typical
parameter values of SGRBs, the early high-energy afterglow emission in 10
MeV-10 GeV is dominated by the synchrotron emission. For a burst occurring at redshift z = 0.1, the high-energy emission is detectable by Fermi LAT if the blast wave has an energy E ≥ 10^51 erg and the fraction of energy in electrons is ε_e ≥ 0.1. This provides a possible explanation for the high-energy tail of the short GRB 081024B.
Comment: 5 pages, 5 figures. This is a slightly expanded version of the paper that will appear in Science in China Series
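For orientation, the dependence on E and ε_e follows from the standard external-shock synchrotron scalings (the textbook adiabatic, constant-density results, not reproduced from the paper itself): above both the injection and cooling frequencies the flux is

```latex
F_{\nu} \;\propto\; \epsilon_e^{\,p-1}\,\epsilon_B^{(p-2)/4}\,
  E^{(p+2)/4}\, t^{(2-3p)/4}\, \nu^{-p/2},
  \qquad \nu > \max(\nu_m, \nu_c),
```

where p is the electron power-law index. This regime is independent of the ambient density and grows steeply with E and ε_e, which is why detectability thresholds of this kind are quoted in terms of those two parameters.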
Beyond the Obvious: Evaluating the Reasoning Ability in Real-life Scenarios of Language Models on the Life Scapes Reasoning Benchmark (LSR-Benchmark)
This paper introduces the Life Scapes Reasoning Benchmark (LSR-Benchmark), a
novel dataset targeting real-life scenario reasoning, aiming to close the gap
in artificial neural networks' ability to reason in everyday contexts. In
contrast to domain knowledge reasoning datasets, LSR-Benchmark comprises
free-text formatted questions with rich information on real-life scenarios,
human behaviors, and character roles. The dataset consists of 2,162 questions
collected from open-source online sources and is manually annotated to improve
its quality. Experiments are conducted using state-of-the-art language models, such as GPT-3.5-turbo and instruction fine-tuned LLaMA models, to test performance on LSR-Benchmark. The results reveal that humans significantly outperform these models, indicating a persistent challenge for machine learning models in comprehending daily human life.
Q-Bench: A Benchmark for General-Purpose Foundation Models on Low-level Vision
The rapid evolution of Multi-modality Large Language Models (MLLMs) has
catalyzed a shift in computer vision from specialized models to general-purpose
foundation models. Nevertheless, there is still an inadequacy in assessing the
abilities of MLLMs on low-level visual perception and understanding. To address
this gap, we present Q-Bench, a holistic benchmark crafted to systematically
evaluate potential abilities of MLLMs on three realms: low-level visual
perception, low-level visual description, and overall visual quality
assessment. a) To evaluate the low-level perception ability, we construct the
LLVisionQA dataset, consisting of 2,990 diverse-sourced images, each equipped
with a human-asked question focusing on its low-level attributes. We then
measure the correctness of MLLMs on answering these questions. b) To examine
the description ability of MLLMs on low-level information, we propose the
LLDescribe dataset consisting of long expert-labelled golden low-level text
descriptions on 499 images, and a GPT-involved comparison pipeline between
outputs of MLLMs and the golden descriptions. c) Besides these two tasks, we
further measure their visual quality assessment ability to align with human
opinion scores. Specifically, we design a softmax-based strategy that enables
MLLMs to predict quantifiable quality scores, and evaluate them on various
existing image quality assessment (IQA) datasets. Our evaluation across the
three abilities confirms that MLLMs possess preliminary low-level visual
skills. However, these skills are still unstable and relatively imprecise,
indicating the need for specific enhancements on MLLMs towards these abilities.
We hope that our benchmark can encourage the research community to delve deeper
to discover and enhance these untapped potentials of MLLMs. Project Page: https://vqassessment.github.io/Q-Bench
Comment: 25 pages, 14 figures, 9 tables, preprint version
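The softmax-based scoring strategy can be illustrated with a minimal sketch: rather than decoding text, one compares the next-token logits of two opposing anchor words and takes the softmax probability of the positive one as a continuous score. The rating prompt and the anchor words "good"/"poor" here are illustrative assumptions, not necessarily the exact choices in Q-Bench.

```python
import torch

def quality_score(next_token_logits: torch.Tensor,
                  good_id: int, poor_id: int) -> float:
    """next_token_logits: logits over the vocabulary produced by an MLLM
    given an image and a rating prompt such as 'Rate the quality of this
    image.' Returns a score in [0, 1]."""
    pair = torch.stack([next_token_logits[good_id],
                        next_token_logits[poor_id]])
    # Softmax over just the two anchor tokens yields a quantifiable,
    # continuous score that can be correlated with human opinion scores
    # on existing IQA datasets.
    return torch.softmax(pair, dim=0)[0].item()
```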
Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge Evaluation
New Natural Language Processing (NLP) benchmarks are urgently needed to keep pace with the rapid development of large language models (LLMs). We present Xiezhi, the most comprehensive evaluation suite designed to assess holistic domain knowledge. Xiezhi comprises 220,000 multiple-choice questions spanning 516 diverse disciplines across 13 subjects, accompanied by Xiezhi-Specialty and Xiezhi-Interdiscipline, each with 15k questions. We evaluate 47 cutting-edge LLMs on Xiezhi. Results indicate that LLMs exceed the average performance of humans in science, engineering, agronomy, medicine, and art, but fall short in economics, jurisprudence, pedagogy, literature, history, and management. We anticipate Xiezhi will help analyze important strengths and shortcomings of LLMs. The benchmark is released at https://github.com/MikeGu721/XiezhiBenchmark
Comment: Under review at NeurIPS 202
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
Multi-modality foundation models, as represented by GPT-4V, have brought a
new paradigm for low-level visual perception and understanding tasks: they can respond to a broad range of natural human instructions within a single model. While
existing foundation models have shown exciting potentials on low-level visual
tasks, their related abilities are still preliminary and need to be improved.
In order to enhance these models, we conduct a large-scale subjective experiment collecting a vast amount of real human feedback on low-level vision. Each feedback item follows a pathway that starts with a detailed description of the low-level visual appearance (*e.g., clarity, color, brightness*) of an image and ends with an overall conclusion, with an average length of 45 words. The constructed **Q-Pathway** dataset includes 58K such detailed human feedback items on 18,973 images with diverse low-level appearance. Moreover, to enable foundation models to robustly respond to diverse types of questions, we design a GPT-participated conversion that processes this feedback into 200K instruction-response pairs of diverse formats. Experimental results indicate that **Q-Instruct** consistently elevates low-level perception and understanding abilities across several foundation models. We anticipate that our datasets can pave the way for a future in which general intelligence can perceive and understand low-level visual appearance and evaluate visual quality like a human. Our dataset, model zoo, and demo are published at https://q-future.github.io/Q-Instruct
Comment: 16 pages, 11 figures, pages 12-16 as appendix
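The GPT-participated conversion step might look roughly like the sketch below. The prompt wording and the `chat` helper are hypothetical; only the overall pattern (one feedback pathway in, several diverse-format instruction-response pairs out) follows the abstract.

```python
import json

CONVERSION_PROMPT = (
    "Rewrite the following image-quality feedback into {n} diverse "
    "instruction-response pairs (open-ended and multiple-choice formats). "
    'Return a JSON list of {{"instruction": ..., "response": ...}} objects.\n\n'
    "Feedback: {feedback}"
)

def convert_feedback(chat, feedback: str, n: int = 4):
    """`chat` is any text-in/text-out LLM call, e.g. a GPT API wrapper."""
    raw = chat(CONVERSION_PROMPT.format(n=n, feedback=feedback))
    return json.loads(raw)  # -> [{"instruction": ..., "response": ...}, ...]
```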
- …