91 research outputs found
Effects of Hot Water Immersion on Storage Quality of Fresh Broccoli Heads
Freshly harvested broccoli heads were immersed in hot water at 45 °C for 0, 1, 4, or 8 min and then rapidly hydrocooled for 10 min at 10 °C. Following these treatments, the broccoli were air-dried for 30 min, packed in commercial polymeric film bags, and stored for 16 days at –1, 1, or 12 °C. The hot-water-treated samples maintained high chlorophyll concentrations, yellowed more slowly, and showed markedly less fungal infection and chilling or freezing injury. Compared to non-heat-treated broccoli, hot-water-treated heads showed lower peroxidase activity together with higher chlorophyll concentrations. Among these heat treatments, immersion in hot water at 45 °C for 4 min was the most effective for maintaining the quality of harvested broccoli heads.
Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models
Generative models have recently exhibited exceptional capabilities in
text-to-image generation, but still struggle to generate image sequences
coherently. In this work, we focus on a novel yet challenging task: generating a coherent image sequence based on a given storyline, denoted as open-ended visual storytelling. We make the following three contributions: (i) to fulfill the task of visual storytelling, we propose a learning-based auto-regressive image generation model, termed StoryGen, with a novel vision-language context module that generates the current frame by conditioning on the corresponding text prompt and preceding image-caption pairs; (ii) to address the data shortage of visual storytelling, we collect paired image-text sequences from online videos and open-source e-books, establishing a processing pipeline for constructing a large-scale dataset with diverse characters, storylines, and artistic styles, named StorySalon; (iii) quantitative experiments and human evaluations validate the superiority of StoryGen, which generalizes to unseen characters without any optimization and generates image sequences with coherent content and consistent characters. Code, dataset, and models are available at https://haoningwu3639.github.io/StoryGen_Webpage/
Comment: Accepted by CVPR 2024. Project Page: https://haoningwu3639.github.io/StoryGen_Webpage
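To make the conditioning scheme concrete, here is a minimal sketch of the auto-regressive generation loop described above. The model interface (`model.sample`) and the `Frame` container are hypothetical stand-ins, not the released API; only the pattern of conditioning each frame on its prompt plus the preceding image-caption pairs follows the abstract.

```python
from dataclasses import dataclass
from typing import Any, List

@dataclass
class Frame:
    caption: str
    image: Any  # e.g. a PIL.Image or tensor in a real implementation

def generate_story(model, storyline: List[str]) -> List[Frame]:
    """Generate one frame per caption, conditioning each sampling step on
    the current prompt and all previously generated image-caption pairs."""
    history: List[Frame] = []
    for caption in storyline:
        # Hypothetical call: a vision-language context module fuses the
        # prompt with the preceding frames before diffusion sampling.
        image = model.sample(
            prompt=caption,
            context=[(f.caption, f.image) for f in history],
        )
        history.append(Frame(caption=caption, image=image))
    return history
```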
Enhancing Diffusion Models with Text-Encoder Reinforcement Learning
Text-to-image diffusion models are typically trained to optimize the
log-likelihood objective, which presents challenges in meeting specific
requirements for downstream tasks, such as image aesthetics and image-text
alignment. Recent research addresses this issue by refining the diffusion U-Net
using human rewards through reinforcement learning or direct backpropagation.
However, many of them overlook the importance of the text encoder, which is
typically pretrained and fixed during training. In this paper, we demonstrate
that by finetuning the text encoder through reinforcement learning, we can
enhance the text-image alignment of the results, thereby improving the visual
quality. Our primary motivation comes from the observation that the current
text encoder is suboptimal, often requiring careful prompt adjustment. While
fine-tuning the U-Net can partially improve performance, the model still suffers from the suboptimal text encoder. Therefore, we propose using reinforcement learning with low-rank adaptation to finetune the text encoder based on task-specific rewards, referred to as TexForce. We first show that finetuning the text encoder can improve the performance of diffusion models. Then, we illustrate that TexForce can be simply combined with existing finetuned U-Net models to obtain much better results without additional training. Finally, we showcase the adaptability of our method in diverse applications, including the generation of high-quality face and hand images.
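As a rough illustration of the idea (not the authors' code), the sketch below attaches LoRA adapters to a CLIP text encoder and applies a REINFORCE-style update from a task-specific reward. The `generate_images` sampler and `reward_fn` are hypothetical placeholders, and the exact policy-gradient surrogate is an assumption; library calls follow the Hugging Face `transformers` / `peft` APIs.

```python
import torch
from transformers import CLIPTextModel, CLIPTokenizer
from peft import LoraConfig, get_peft_model

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

# Low-rank adapters on the attention projections; base weights stay frozen.
lora_cfg = LoraConfig(r=8, lora_alpha=16,
                      target_modules=["q_proj", "k_proj", "v_proj", "out_proj"])
text_encoder = get_peft_model(text_encoder, lora_cfg)
optimizer = torch.optim.AdamW(text_encoder.parameters(), lr=1e-5)

def reinforce_step(prompts, generate_images, reward_fn):
    """One REINFORCE-style update: prompts whose generations score higher
    under the reward push the adapter weights in their direction."""
    tokens = tokenizer(prompts, padding=True, return_tensors="pt")
    embeddings = text_encoder(**tokens).last_hidden_state
    images, log_probs = generate_images(embeddings)   # hypothetical sampler
    rewards = reward_fn(images, prompts)              # e.g. aesthetics or image-text alignment
    loss = -(rewards.detach() * log_probs).mean()     # policy-gradient surrogate
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return rewards.mean().item()
```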
Enhancing Pre-trained ASR System Fine-tuning for Dysarthric Speech Recognition using Adversarial Data Augmentation
Automatic recognition of dysarthric speech remains a highly challenging task
to date. Neuro-motor conditions and co-occurring physical disabilities create
difficulty in large-scale data collection for ASR system development. Adapting
SSL pre-trained ASR models to limited dysarthric speech via data-intensive
parameter fine-tuning leads to poor generalization. To this end, this paper
presents an extensive comparative study of various data augmentation approaches
to improve the robustness of pre-trained ASR model fine-tuning to dysarthric
speech. These include: a) conventional speaker-independent perturbation of
impaired speech; b) speaker-dependent speed perturbation, or GAN-based
adversarial perturbation of normal, control speech based on their time
alignment against parallel dysarthric speech; c) novel spectral-basis GAN-based
adversarial data augmentation operating on non-parallel data. Experiments
conducted on the UASpeech corpus suggest that GAN-based data augmentation consistently outperforms fine-tuned Wav2vec2.0 and HuBERT models trained with no data augmentation or with speed perturbation alone, across different data expansion operating points, by statistically significant word error rate (WER) reductions of up to 2.01% and 0.96% absolute (9.03% and 4.63% relative), respectively, on the UASpeech test set of 16 dysarthric speakers. After cross-system output rescoring, the best system produced the lowest published WER of 16.53% (46.47% on very low intelligibility speech) on UASpeech.
Comment: To appear at IEEE ICASSP 202
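As one concrete example, speaker-dependent speed perturbation (approach (b)) can be sketched as follows. This is an illustrative outline, not the paper's implementation: the per-speaker rate factor is assumed to come from the duration ratio of time-aligned parallel dysarthric and control utterances, and control speech is then slowed toward that speaker's rate.

```python
import librosa

def speaker_rate_factor(dysarthric_durations, control_durations):
    """Average duration ratio over time-aligned parallel utterance pairs;
    a value > 1 means the dysarthric speaker talks more slowly."""
    ratios = [d / c for d, c in zip(dysarthric_durations, control_durations)]
    return sum(ratios) / len(ratios)

def perturb_control_utterance(wav_path, rate_factor, sr=16000):
    """Time-stretch a control utterance toward the target speaking rate."""
    y, _ = librosa.load(wav_path, sr=sr)
    # librosa's rate > 1 speeds audio up, so dividing by a factor > 1
    # slows the control speech toward the dysarthric speaking rate.
    return librosa.effects.time_stretch(y, rate=1.0 / rate_factor)
```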
The early high-energy afterglow emission from Short GRBs
We calculate the high energy afterglow emission from short Gamma-Ray Bursts
(SGRBs) in the external shock model. There are two possible components
contributing to the high energy afterglow: the electron synchrotron emission
and the synchrotron self-Compton (SSC) emission. We find that for typical
parameter values of SGRBs, the early high-energy afterglow emission in 10
MeV-10 GeV is dominated by the synchrotron emission. For a burst occurring at redshift z = 0.1, the high-energy emission is detectable by Fermi LAT if the blast wave has an energy E ≥ 10^51 erg and the fraction of energy in electrons is ε_e ≥ 0.1. This provides a possible explanation for the high-energy tail of the short GRB 081024B.
Comment: 5 pages, 5 figures. This is a slightly expanded version of the paper that will appear in Science in China Series
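For orientation, the dependence on E and ε_e follows from the standard external-shock synchrotron scalings (the textbook adiabatic, constant-density results, not reproduced from the paper itself): above both the injection and cooling frequencies the flux is

```latex
F_{\nu} \;\propto\; \epsilon_e^{\,p-1}\,\epsilon_B^{(p-2)/4}\,
  E^{(p+2)/4}\, t^{(2-3p)/4}\, \nu^{-p/2},
  \qquad \nu > \max(\nu_m, \nu_c),
```

where p is the electron power-law index. This regime is independent of the ambient density and grows steeply with E and ε_e, which is why detectability thresholds of this kind are quoted in terms of those two parameters.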
Beyond the Obvious: Evaluating the Reasoning Ability in Real-life Scenarios of Language Models on the Life Scapes Reasoning Benchmark (LSR-Benchmark)
This paper introduces the Life Scapes Reasoning Benchmark (LSR-Benchmark), a
novel dataset targeting real-life scenario reasoning, aiming to close the gap
in artificial neural networks' ability to reason in everyday contexts. In
contrast to domain knowledge reasoning datasets, LSR-Benchmark comprises
free-text formatted questions with rich information on real-life scenarios,
human behaviors, and character roles. The dataset consists of 2,162 questions
collected from open-source online sources and is manually annotated to improve
its quality. Experiments are conducted using state-of-the-art language models, such as GPT-3.5-turbo and instruction fine-tuned LLaMA models, to test performance on LSR-Benchmark. The results reveal that humans significantly outperform these models, indicating a persistent challenge for machine learning models in comprehending daily human life.
Q-Bench: A Benchmark for General-Purpose Foundation Models on Low-level Vision
The rapid evolution of Multi-modality Large Language Models (MLLMs) has
catalyzed a shift in computer vision from specialized models to general-purpose
foundation models. Nevertheless, there is still an inadequacy in assessing the
abilities of MLLMs on low-level visual perception and understanding. To address
this gap, we present Q-Bench, a holistic benchmark crafted to systematically
evaluate potential abilities of MLLMs on three realms: low-level visual
perception, low-level visual description, and overall visual quality
assessment. a) To evaluate the low-level perception ability, we construct the
LLVisionQA dataset, consisting of 2,990 diverse-sourced images, each equipped
with a human-asked question focusing on its low-level attributes. We then
measure the correctness of MLLMs on answering these questions. b) To examine
the description ability of MLLMs on low-level information, we propose the
LLDescribe dataset consisting of long expert-labelled golden low-level text
descriptions on 499 images, and a GPT-involved comparison pipeline between
outputs of MLLMs and the golden descriptions. c) Besides these two tasks, we
further measure their visual quality assessment ability to align with human
opinion scores. Specifically, we design a softmax-based strategy that enables
MLLMs to predict quantifiable quality scores, and evaluate them on various
existing image quality assessment (IQA) datasets. Our evaluation across the
three abilities confirms that MLLMs possess preliminary low-level visual
skills. However, these skills are still unstable and relatively imprecise,
indicating the need for specific enhancements on MLLMs towards these abilities.
We hope that our benchmark can encourage the research community to delve deeper
to discover and enhance these untapped potentials of MLLMs. Project Page: https://vqassessment.github.io/Q-Bench
Comment: 25 pages, 14 figures, 9 tables, preprint version
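The softmax-based scoring strategy can be illustrated with a minimal sketch: rather than decoding text, one compares the next-token logits of two opposing anchor words and takes the softmax probability of the positive one as a continuous score. The rating prompt and the anchor words "good"/"poor" here are illustrative assumptions, not necessarily the exact choices in Q-Bench.

```python
import torch

def quality_score(next_token_logits: torch.Tensor,
                  good_id: int, poor_id: int) -> float:
    """next_token_logits: logits over the vocabulary produced by an MLLM
    given an image and a rating prompt such as 'Rate the quality of this
    image.' Returns a score in [0, 1]."""
    pair = torch.stack([next_token_logits[good_id],
                        next_token_logits[poor_id]])
    # Softmax over just the two anchor tokens yields a quantifiable,
    # continuous score that can be correlated with human opinion scores
    # on existing IQA datasets.
    return torch.softmax(pair, dim=0)[0].item()
```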
Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge Evaluation
New Natural Language Processing (NLP) benchmarks are urgently needed to keep pace with the rapid development of large language models (LLMs). We present Xiezhi, the most comprehensive evaluation suite designed to assess holistic domain knowledge. Xiezhi comprises 220,000 multiple-choice questions spanning 516 diverse disciplines across 13 subjects, accompanied by Xiezhi-Specialty and Xiezhi-Interdiscipline, each with 15k questions. We evaluate 47 cutting-edge LLMs on Xiezhi. Results indicate that LLMs exceed the average performance of humans in science, engineering, agronomy, medicine, and art, but fall short in economics, jurisprudence, pedagogy, literature, history, and management. We anticipate Xiezhi will help analyze important strengths and shortcomings of LLMs. The benchmark is released at https://github.com/MikeGu721/XiezhiBenchmark
Comment: Under review at NeurIPS 202
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
Multi-modality foundation models, as represented by GPT-4V, have brought a
new paradigm for low-level visual perception and understanding tasks: they can respond to a broad range of natural human instructions within a single model. While
existing foundation models have shown exciting potentials on low-level visual
tasks, their related abilities are still preliminary and need to be improved.
In order to enhance these models, we conduct a large-scale subjective experiment collecting a vast amount of real human feedback on low-level vision. Each feedback item follows a pathway that starts with a detailed description of the low-level visual appearance (*e.g., clarity, color, brightness*) of an image and ends with an overall conclusion, with an average length of 45 words. The constructed **Q-Pathway** dataset includes 58K such detailed human feedback items on 18,973 images with diverse low-level appearance. Moreover, to enable foundation models to robustly respond to diverse types of questions, we design a GPT-participated conversion that processes this feedback into 200K instruction-response pairs of diverse formats. Experimental results indicate that **Q-Instruct** consistently elevates low-level perception and understanding abilities across several foundation models. We anticipate that our datasets can pave the way for a future in which general intelligence can perceive and understand low-level visual appearance and evaluate visual quality like a human. Our dataset, model zoo, and demo are published at https://q-future.github.io/Q-Instruct
Comment: 16 pages, 11 figures, pages 12-16 as appendix
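The GPT-participated conversion step might look roughly like the sketch below. The prompt wording and the `chat` helper are hypothetical; only the overall pattern (one feedback pathway in, several diverse-format instruction-response pairs out) follows the abstract.

```python
import json

CONVERSION_PROMPT = (
    "Rewrite the following image-quality feedback into {n} diverse "
    "instruction-response pairs (open-ended and multiple-choice formats). "
    'Return a JSON list of {{"instruction": ..., "response": ...}} objects.\n\n'
    "Feedback: {feedback}"
)

def convert_feedback(chat, feedback: str, n: int = 4):
    """`chat` is any text-in/text-out LLM call, e.g. a GPT API wrapper."""
    raw = chat(CONVERSION_PROMPT.format(n=n, feedback=feedback))
    return json.loads(raw)  # -> [{"instruction": ..., "response": ...}, ...]
```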
- …