266 research outputs found
Joint appearance and motion model for multi-class multi-object tracking
Model-free tracking is a widely accepted approach to tracking an arbitrary object in a video from a single-frame annotation, with no further prior knowledge about the object of interest. Extending this problem to multiple objects is particularly challenging because: a) the tracker is not aware of the objects' type while trying to distinguish them from the background (detection task), and b) the tracker needs to distinguish each object from other, potentially similar objects (data association task) to generate stable trajectories. To track multiple arbitrary objects, most existing model-free approaches track each target individually and update its appearance model independently; in this scenario they therefore often fail due to confusion between the appearances of similar objects, sudden appearance changes, and occlusion. To tackle this problem, we propose to use both appearance and motion models, and to learn them jointly using graphical models and deep neural network features. We introduce an indicator variable that predicts sudden appearance change and/or occlusion. When these occur, our method does not update the appearance model, thereby avoiding mistakenly updating the appearance of the object of interest with the background and/or an incorrect object, and relies on the motion model to track. Moreover, we consider the correlation among all targets and seek the jointly optimal locations for all targets simultaneously as a graphical-model inference problem. We learn the joint parameters of both the appearance and motion models in an online fashion under the LaRank framework. Experimental results show that our method outperforms the state-of-the-art.
Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 201
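The indicator-gated update described in this abstract can be sketched in a few lines. This is an illustrative reconstruction, not the thesis's actual implementation: the function names, the exponential-moving-average template update, and the constant-velocity motion model are all assumptions.

```python
def update_appearance(template, observation, occluded, lr=0.2):
    """Update the appearance template only when the indicator variable says
    the target is visible (no occlusion / sudden appearance change).
    The EMA update and learning rate are illustrative assumptions."""
    if occluded:
        return template  # freeze the template; rely on the motion model
    return [(1 - lr) * t + lr * o for t, o in zip(template, observation)]

def motion_predict(position, velocity):
    """Constant-velocity prediction used while the appearance model is frozen."""
    return [p + v for p, v in zip(position, velocity)]
```

The key design point is that a corrupted observation (background or a similar object) never contaminates the template: when the indicator fires, localisation falls back entirely on the motion prior.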
Datasets for Large Language Models: A Comprehensive Survey
This paper explores Large Language Model (LLM) datasets, which play a
crucial role in the remarkable advancements of LLMs. The
datasets serve as the foundational infrastructure analogous to a root system
that sustains and nurtures the development of LLMs. Consequently, examination
of these datasets emerges as a critical topic in research. In order to address
the current lack of a comprehensive overview and thorough analysis of LLM
datasets, and to gain insights into their current status and future trends,
this survey consolidates and categorizes the fundamental aspects of LLM
datasets from five perspectives: (1) Pre-training Corpora; (2) Instruction
Fine-tuning Datasets; (3) Preference Datasets; (4) Evaluation Datasets; (5)
Traditional Natural Language Processing (NLP) Datasets. The survey sheds light
on the prevailing challenges and points out potential avenues for future
investigation. Additionally, a comprehensive review of the existing available
dataset resources is also provided, including statistics from 444 datasets,
covering 8 language categories and spanning 32 domains. Information from 20
dimensions is incorporated into the dataset statistics. The total data size
surveyed surpasses 774.5 TB for pre-training corpora and 700M instances for
other datasets. We aim to present the entire landscape of LLM text datasets,
serving as a comprehensive reference for researchers in this field and
contributing to future studies. Related resources are available at:
https://github.com/lmmlzn/Awesome-LLMs-Datasets.
Comment: 181 pages, 21 figures
DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks
Document image restoration is a crucial aspect of Document AI systems, as the
quality of document images significantly influences the overall performance.
Prevailing methods address distinct restoration tasks independently, leading to
intricate systems and the incapability to harness the potential synergies of
multi-task learning. To overcome this challenge, we propose DocRes, a
generalist model that unifies five document image restoration tasks including
dewarping, deshadowing, appearance enhancement, deblurring, and binarization.
To instruct DocRes to perform various restoration tasks, we propose a novel
visual prompt approach called Dynamic Task-Specific Prompt (DTSPrompt). The
DTSPrompt for different tasks comprises distinct prior features, which are
additional characteristics extracted from the input image. Beyond its role as a
cue for task-specific execution, DTSPrompt can also serve as supplementary
information to enhance the model's performance. Moreover, DTSPrompt is more
flexible than prior visual prompt approaches as it can be seamlessly applied
and adapted to inputs with high and variable resolutions. Experimental results
demonstrate that DocRes achieves competitive or superior performance compared
to existing state-of-the-art task-specific models. This underscores the
potential of DocRes across a broader spectrum of document image restoration
tasks. The source code is publicly available at
https://github.com/ZZZHANG-jx/DocRes
Comment: Accepted by CVPR 202
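The DTSPrompt mechanism described above — prior features extracted from the input and supplied alongside it — can be sketched as channel concatenation. This is a minimal illustration under assumptions: the paper's actual priors are task-specific maps for dewarping, deshadowing, etc., while `gradient_prior` here is only a toy 1-D stand-in.

```python
def gradient_prior(channel):
    """Toy 1-D gradient map standing in for a real prior feature
    (the paper's priors, e.g. for dewarping or deshadowing, are richer)."""
    return [b - a for a, b in zip(channel, channel[1:])] + [0]

def build_dtsprompt(channels, prior_fns):
    """Concatenate task-specific prior maps onto the input channels.

    channels: list of image channels (each a flat list of pixel values);
    prior_fns: the current task's prior extractors, one extra channel each."""
    priors = [fn(channels[0]) for fn in prior_fns]
    return channels + priors
```

Because the prompt is computed from the input itself rather than being a fixed learned map, this construction adapts naturally to inputs of variable resolution, which matches the flexibility claim in the abstract.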
UPOCR: Towards Unified Pixel-Level OCR Interface
In recent years, the optical character recognition (OCR) field has been
proliferating with plentiful cutting-edge approaches for a wide spectrum of
tasks. However, these approaches are task-specifically designed with divergent
paradigms, architectures, and training strategies, which significantly
increases the complexity of research and maintenance and hinders the fast
deployment in applications. To this end, we propose UPOCR, a
simple-yet-effective generalist model for a unified pixel-level OCR interface.
Specifically, UPOCR unifies the paradigm of diverse OCR tasks as
image-to-image transformation and the architecture as a vision Transformer
(ViT)-based encoder-decoder. Learnable task prompts are introduced to push the
general feature representations extracted by the encoder toward task-specific
spaces, endowing the decoder with task awareness. Moreover, the model training
is uniformly aimed at minimizing the discrepancy between the generated and
ground-truth images regardless of the inhomogeneity among tasks. Experiments
are conducted on three pixel-level OCR tasks including text removal, text
segmentation, and tampered text detection. Without bells and whistles, the
experimental results showcase that the proposed method can simultaneously
achieve state-of-the-art performance on three tasks with a unified single
model, which provides valuable strategies and insights for future research on
generalist OCR models. Code will be publicly available
Exploring OCR Capabilities of GPT-4V(ision) : A Quantitative and In-depth Evaluation
This paper presents a comprehensive evaluation of the Optical Character
Recognition (OCR) capabilities of the recently released GPT-4V(ision), a Large
Multimodal Model (LMM). We assess the model's performance across a range of OCR
tasks, including scene text recognition, handwritten text recognition,
handwritten mathematical expression recognition, table structure recognition,
and information extraction from visually-rich documents. The evaluation reveals
that GPT-4V performs well in recognizing and understanding Latin contents, but
struggles with multilingual scenarios and complex tasks. Specifically, it
shows limitations when dealing with non-Latin languages and with complex tasks
such as handwritten mathematical expression recognition, table structure
recognition, and end-to-end semantic entity recognition and pair extraction
from document images. Based on these observations, we affirm the necessity and
continued research value of specialized OCR models. In general, despite its
versatility in handling diverse OCR tasks, GPT-4V does not outperform existing
state-of-the-art OCR models. How to fully utilize pre-trained general-purpose
LMMs such as GPT-4V for OCR downstream tasks remains an open problem. The study
offers a critical reference for future research in OCR with LMMs. Evaluation
pipeline and results are available at
https://github.com/SCUT-DLVCLab/GPT-4V_OCR
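The abstract does not spell out its metrics, but OCR evaluations of this kind conventionally rely on edit-distance-based measures. A minimal character error rate (CER) sketch, offered here as a generic illustration rather than the paper's exact pipeline:

```python
def levenshtein(ref, hyp):
    """Edit distance (insertions, deletions, substitutions) between strings."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution / match
        prev = cur
    return prev[-1]

def char_error_rate(ref, hyp):
    """CER: edit distance normalised by the reference length."""
    return levenshtein(ref, hyp) / max(len(ref), 1)
```

A lower CER is better; a score above 1.0 is possible when the hypothesis is much longer than the reference.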
Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution
Visual information extraction (VIE) has attracted considerable attention
recently owing to its various advanced applications such as document
understanding, automatic marking and intelligent education. Most existing works
decouple this problem into several independent sub-tasks of text spotting
(text detection and recognition) and information extraction, completely
ignoring the high correlation among them during optimization. In this paper, we
propose a robust visual information extraction system (VIES) towards real-world
scenarios, which is a unified end-to-end trainable framework for simultaneous
text detection, recognition and information extraction by taking a single
document image as input and outputting the structured information.
Specifically, the information extraction branch collects abundant visual and
semantic representations from text spotting for multimodal feature fusion and
conversely, provides higher-level semantic clues to contribute to the
optimization of text spotting. Moreover, regarding the shortage of public
benchmarks, we construct a fully-annotated dataset called EPHOIE
(https://github.com/HCIILAB/EPHOIE), which is the first Chinese benchmark for
both text spotting and visual information extraction. EPHOIE consists of 1,494
images of examination paper heads with complex layouts and backgrounds, including
a total of 15,771 Chinese handwritten or printed text instances. Compared with
the state-of-the-art methods, our VIES shows significant superior performance
on the EPHOIE dataset and achieves a 9.01% F-score gain on the widely used
SROIE dataset under the end-to-end scenario.
Comment: 8 pages, 5 figures, to be published in AAAI 202
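The F-score gain reported above is, in SROIE-style end-to-end evaluation, typically a micro-averaged F1 over extracted (field, value) pairs. A minimal sketch — the exact matching protocol (exact string match on pairs) is an assumption:

```python
def entity_f1(gold, pred):
    """Micro-averaged F1 over extracted (field, value) pairs.
    gold, pred: iterables of (field_name, field_value) tuples."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)                       # exactly matched pairs
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because the measure is end-to-end, a detection or recognition error upstream shows up directly as a missed or spurious pair here, which is why joint optimization of spotting and extraction pays off.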
Prognostic Significance of Serum Cysteine-Rich Protein 61 in Patients with Acute Heart Failure
Background/Aims: Cysteine-rich protein 61 (CCN1/CYR61) is a multifunctional matricellular protein involved in the regulation of fibrogenesis. Animal experiments have demonstrated that CCN1 can inhibit cardiac fibrosis in cardiac hypertrophy. However, no study has assessed the relationship between serum CCN1 and the prognosis of acute heart failure (AHF). Methods: We measured serum CCN1 levels in 183 patients with AHF, who were followed up for 6 months. The associations between CCN1 levels and clinical covariates, especially left ventricular ejection fraction (LVEF), estimated glomerular filtration rate (eGFR), atrial fibrillation and age, were estimated. The endpoint was all-cause mortality. Kaplan-Meier curve analysis and multivariable Cox proportional hazards analysis were employed to evaluate the prognostic ability of CCN1. We used calibration, discrimination and reclassification to assess the added value of CCN1 in mortality risk prediction. Results: Serum CCN1 concentrations in AHF patients were significantly higher than in individuals without AHF (237 pg/ml vs. 124.8 pg/ml, p < 0.001). CCN1 level was associated with the level of NT-proBNP (r = 0.349, p < 0.001) and was not affected by LVEF, eGFR, age or atrial fibrillation in AHF patients. Importantly, Kaplan-Meier curve analysis illustrated that AHF patients with serum CCN1 levels > 260 pg/ml had a lower survival rate (p < 0.001). Multivariable Cox hazards analysis suggests that CCN1 functions as an independent predictor of mortality in AHF patients (LgCCN1, hazard ratio 5.825, 95% confidence interval: 1.828-18.566, p = 0.003). In addition, the inclusion of CCN1 in the model with NT-proBNP significantly improved the C-statistic for predicting death (0.758, p < 0.001).
The integrated discrimination index was 0.019 (p < 0.001), and the net reclassification index increased significantly after the addition of CCN1 (23.9%, p = 0.0179). Conclusions: CCN1 is strongly predictive of 6-month mortality in patients with AHF, suggesting that serum CCN1 is a promising candidate prognostic biomarker for AHF patients.
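The C-statistic used above to compare risk models is Harrell's concordance index. A minimal sketch follows; this simplified version skips pairs with tied event times and is a generic illustration, not the study's statistical software.

```python
from itertools import combinations

def concordance_index(risk, time, event):
    """Harrell's C-statistic: among usable patient pairs, the fraction where
    the model assigns the higher risk score to the patient who dies earlier.
    risk: model scores; time: follow-up times; event: 1 = death, 0 = censored.
    Simplified sketch: pairs with tied event times are skipped."""
    concordant = tied = usable = 0
    for i, j in combinations(range(len(risk)), 2):
        if time[j] < time[i]:
            i, j = j, i                # ensure i has the earlier time
        if not event[i]:
            continue                   # earlier subject censored: pair unusable
        if time[i] == time[j]:
            continue                   # tied times: skipped in this sketch
        usable += 1
        if risk[i] > risk[j]:
            concordant += 1
        elif risk[i] == risk[j]:
            tied += 1                  # tied scores count half
    return (concordant + 0.5 * tied) / usable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect discrimination, so the reported 0.758 sits between the two, in the range usually considered clinically useful.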
Identification of the U-box gene family in peach (Prunus persica) and functional analysis of PpPUB20 in response to salt stress
Background: With the rising proportion of saline soils in the global irrigated area, improving salt stress tolerance in peach is of great significance and value for the development of the peach industry. Plant U-box proteins (PUBs) are widely involved in various stress response processes. In this study, genome-wide identification and analysis of PUB genes in cultivated peach were carried out, and the expression profiles of peach PUB genes in different tissues, as well as their responses under salt stress, were investigated. Methods: The genome-wide identification of PUB genes in cultivated peach was conducted through gene localisation, gene structure and evolutionary analysis. Subsequently, the expression profiles of PpPUB genes in different tissues of peach and the changes in their relative expression under ABA, GA3, IAA and 6-BA treatments, low-temperature stress and salt stress were investigated. Results and discussion: In this study, 51 U-box protein genes (PUBs) were identified in the cultivated peach “SJZX” and divided into six groups. Most of the PpPUBs were predicted to be located in the nucleus and chloroplasts. Promoter analyses indicated that most members may be associated with light-responsive processes. Expression analysis based on RT-qPCR showed that most PUB members in peach were highly expressed in certain tissues or organs. Based on the RT-qPCR expression analysis of 18 representative PpPUB genes after abiotic stress and hormone induction, all detected genes except PpPUB19 were induced by salt stress, and PpPUB3/20/23/49 were induced by low temperature. Multiple genes were induced or repressed by exogenous hormone treatments. Furthermore, Arabidopsis seedlings heterologously overexpressing PpPUB20 exhibited greater salt tolerance than wild-type seedlings under the same salt stress conditions.
These findings provide comprehensive information on the PpPUB family and identify PpPUB members that may be involved in hormone and salt stress responses. Therefore, this study enhances the understanding of the potential role of PpPUBs in stress adaptation in peach, thereby establishing a foundation for subsequent functional investigations and applications in stress-resistant crop breeding.
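The RT-qPCR relative-expression analysis referred to above is conventionally computed with the 2^-ΔΔCt method. The abstract does not spell out its computation, so the following is a generic sketch of that standard formula, not the study's actual pipeline.

```python
def ddct_fold_change(ct_target_treated, ct_ref_treated,
                     ct_target_control, ct_ref_control):
    """Relative expression by the standard 2^-DDCt method.

    Ct inputs: threshold cycles for the target gene vs. a reference gene,
    in a treated sample (e.g. salt-stressed) vs. a control sample."""
    d_treated = ct_target_treated - ct_ref_treated    # delta-Ct, treated
    d_control = ct_target_control - ct_ref_control    # delta-Ct, control
    return 2 ** -(d_treated - d_control)              # fold change vs. control
```

A fold change above 1 indicates induction by the treatment (as reported for most PpPUB genes under salt stress) and below 1 indicates repression.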
Construction and Evaluation of the Brucella Double Gene Knock-out Vaccine Strain MB6 Δbp26ΔwboA (RM6)
Brucellosis is a serious zoonotic infection worldwide. To date, vaccination is the most effective measure against brucellosis. This study aimed to obtain a vaccine strain that has high protective efficacy and low toxicity, and that allows vaccination to be differentiated from infection. Using homologous recombination, we constructed the double gene-deletion Brucella strain MB6 Δbp26ΔwboA (RM6) and evaluated its characteristics, safety and efficacy. The RM6 strain had good proliferative ability and stable biological characteristics in vivo and in vitro. Moreover, it had a favorable safety profile and elicited specific immune responses in mice and sheep. The RM6 strain may have substantial practical application value.
