
    Joint appearance and motion model for multi-class multi-object tracking

    Model-free tracking is a widely accepted approach to tracking an arbitrary object in a video from a single-frame annotation, with no further prior knowledge about the object of interest. Extending this problem to multiple objects is challenging because: a) the tracker is not aware of the objects' type while trying to distinguish them from the background (detection task), and b) the tracker needs to distinguish each object from other, potentially similar objects (data association task) to generate stable trajectories. To track multiple arbitrary objects, most existing model-free tracking approaches track each target individually and update its appearance model independently; they therefore often fail due to confusion between the appearance of similar objects, sudden appearance changes, and occlusion. To tackle this problem, we propose to use both appearance and motion models, and to learn them jointly using graphical models and deep neural network features. We introduce an indicator variable that predicts sudden appearance change and/or occlusion. When these occur, our model does not update the appearance model, thereby avoiding mistakenly updating it with the background or an incorrect object, and instead relies on the motion model to track. Moreover, we consider the correlation among all targets and seek the jointly optimal locations for all targets simultaneously, posed as a graphical model inference problem. We learn the joint parameters of both the appearance and motion models online under the LaRank framework. Experimental results show that our method outperforms the state-of-the-art. Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 201
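
    The gating idea described above can be sketched in a few lines. The following minimal Python example is not the thesis's code: the class name, feature dimensions and the 0.5 similarity threshold are hypothetical. It shows an indicator variable that freezes the appearance template and falls back to a constant-velocity motion model when a sudden appearance change or occlusion is suspected:

    import numpy as np

    class GatedTracker:
        def __init__(self, init_box, init_feature, score_threshold=0.5):
            self.box = np.asarray(init_box, dtype=float)            # [x, y, w, h]
            self.velocity = np.zeros(2)                             # constant-velocity motion model
            self.template = np.asarray(init_feature, dtype=float)   # appearance template (e.g. CNN feature)
            self.score_threshold = score_threshold

        def appearance_score(self, feature):
            # cosine similarity between a candidate feature and the template
            num = float(np.dot(self.template, feature))
            den = np.linalg.norm(self.template) * np.linalg.norm(feature) + 1e-8
            return num / den

        def step(self, candidate_box, candidate_feature):
            cand = np.asarray(candidate_box, dtype=float)
            score = self.appearance_score(candidate_feature)
            occluded = score < self.score_threshold                 # the indicator variable
            if occluded:
                # rely on motion only; do NOT update the appearance template,
                # so background or incorrect objects never contaminate it
                self.box[:2] += self.velocity
            else:
                self.velocity = cand[:2] - self.box[:2]
                self.box = cand
                # conservative template update by linear interpolation
                self.template = 0.9 * self.template + 0.1 * candidate_feature
            return self.box, occluded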

    Datasets for Large Language Models: A Comprehensive Survey

    This paper explores Large Language Model (LLM) datasets, which play a crucial role in the remarkable advancements of LLMs. The datasets serve as foundational infrastructure, analogous to a root system that sustains and nurtures the development of LLMs; consequently, their examination emerges as a critical research topic. To address the current lack of a comprehensive overview and thorough analysis of LLM datasets, and to gain insights into their current status and future trends, this survey consolidates and categorizes the fundamental aspects of LLM datasets from five perspectives: (1) Pre-training Corpora; (2) Instruction Fine-tuning Datasets; (3) Preference Datasets; (4) Evaluation Datasets; (5) Traditional Natural Language Processing (NLP) Datasets. The survey sheds light on prevailing challenges and points out potential avenues for future investigation. Additionally, a comprehensive review of existing available dataset resources is provided, including statistics from 444 datasets, covering 8 language categories and spanning 32 domains. Information from 20 dimensions is incorporated into the dataset statistics. The total data size surveyed surpasses 774.5 TB for pre-training corpora and 700M instances for other datasets. We aim to present the entire landscape of LLM text datasets, serving as a comprehensive reference for researchers in this field and contributing to future studies. Related resources are available at: https://github.com/lmmlzn/Awesome-LLMs-Datasets. Comment: 181 pages, 21 figures

    DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks

    Document image restoration is a crucial aspect of Document AI systems, as the quality of document images significantly influences overall performance. Prevailing methods address distinct restoration tasks independently, leading to intricate systems and an inability to harness the potential synergies of multi-task learning. To overcome this challenge, we propose DocRes, a generalist model that unifies five document image restoration tasks: dewarping, deshadowing, appearance enhancement, deblurring, and binarization. To instruct DocRes to perform the various restoration tasks, we propose a novel visual prompt approach called Dynamic Task-Specific Prompt (DTSPrompt). The DTSPrompt for each task comprises distinct prior features, which are additional characteristics extracted from the input image. Beyond its role as a cue for task-specific execution, DTSPrompt can also serve as supplementary information to enhance the model's performance. Moreover, DTSPrompt is more flexible than prior visual prompt approaches, as it can be seamlessly applied and adapted to inputs with high and variable resolutions. Experimental results demonstrate that DocRes achieves competitive or superior performance compared to existing state-of-the-art task-specific models. This underscores the potential of DocRes across a broader spectrum of document image restoration tasks. The source code is publicly available at https://github.com/ZZZHANG-jx/DocRes. Comment: Accepted by CVPR 2024
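
    The DTSPrompt mechanism can be illustrated with a toy sketch. Below, prior features are computed from the input image and concatenated channel-wise as a task-conditioned prompt; the two priors shown (Sobel gradient magnitude and a threshold map) are stand-ins chosen for illustration and may differ from the priors DocRes actually uses:

    import torch
    import torch.nn.functional as F

    def gradient_prior(img):
        # img: (B, 1, H, W) grayscale in [0, 1]; Sobel gradient magnitude
        kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
        ky = kx.transpose(2, 3)
        gx = F.conv2d(img, kx, padding=1)
        gy = F.conv2d(img, ky, padding=1)
        return torch.sqrt(gx ** 2 + gy ** 2)

    def binarization_prior(img, thresh=0.5):
        # crude threshold map as a second example prior
        return (img > thresh).float()

    def build_dts_prompt(img, task):
        # choose priors per task; concatenate with the input as extra channels
        priors = {"deblurring": [gradient_prior],
                  "binarization": [gradient_prior, binarization_prior]}
        maps = [p(img) for p in priors[task]]
        return torch.cat([img] + maps, dim=1)       # (B, 1 + n_priors, H, W)

    x = torch.rand(2, 1, 128, 128)
    prompt_input = build_dts_prompt(x, "binarization")
    print(prompt_input.shape)                        # torch.Size([2, 3, 128, 128])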

    UPOCR: Towards Unified Pixel-Level OCR Interface

    In recent years, the optical character recognition (OCR) field has proliferated with cutting-edge approaches for a wide spectrum of tasks. However, these approaches are task-specifically designed with divergent paradigms, architectures, and training strategies, which significantly increases the complexity of research and maintenance and hinders fast deployment in applications. To this end, we propose UPOCR, a simple-yet-effective generalist model for a Unified Pixel-level OCR interface. Specifically, UPOCR unifies the paradigm of diverse OCR tasks as image-to-image transformation and the architecture as a vision Transformer (ViT)-based encoder-decoder. Learnable task prompts are introduced to push the general feature representations extracted by the encoder toward task-specific spaces, endowing the decoder with task awareness. Moreover, model training uniformly aims at minimizing the discrepancy between generated and ground-truth images, regardless of the inhomogeneity among tasks. Experiments are conducted on three pixel-level OCR tasks: text removal, text segmentation, and tampered text detection. Without bells and whistles, the experimental results show that the proposed method simultaneously achieves state-of-the-art performance on all three tasks with a single unified model, which provides valuable strategies and insights for future research on generalist OCR models. Code will be publicly available.
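
    A minimal sketch of the learnable-task-prompt idea follows; the toy Transformer dimensions, the per-task prompt vectors and the patch-projection head are placeholders rather than UPOCR's actual configuration:

    import torch
    import torch.nn as nn

    class PromptedEncoderDecoder(nn.Module):
        def __init__(self, num_tasks=3, dim=256):
            super().__init__()
            self.encoder = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True), num_layers=4)
            self.decoder = nn.TransformerEncoder(   # token-based "decoder" for simplicity
                nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True), num_layers=4)
            self.task_prompts = nn.Parameter(torch.zeros(num_tasks, dim))  # one vector per task
            self.head = nn.Linear(dim, 3 * 16 * 16)  # map each token back to an RGB patch

        def forward(self, tokens, task_id):
            feats = self.encoder(tokens)                    # general feature representations
            feats = feats + self.task_prompts[task_id]      # shift toward task-specific space
            out = self.decoder(feats)                       # task-aware decoding
            return self.head(out)                           # image-to-image output (per patch)

    model = PromptedEncoderDecoder()
    tokens = torch.randn(2, 196, 256)    # e.g. 14x14 patch tokens
    out = model(tokens, task_id=0)
    print(out.shape)                      # torch.Size([2, 196, 768])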

    Exploring OCR Capabilities of GPT-4V(ision) : A Quantitative and In-depth Evaluation

    This paper presents a comprehensive evaluation of the Optical Character Recognition (OCR) capabilities of the recently released GPT-4V(ision), a Large Multimodal Model (LMM). We assess the model's performance across a range of OCR tasks, including scene text recognition, handwritten text recognition, handwritten mathematical expression recognition, table structure recognition, and information extraction from visually-rich documents. The evaluation reveals that GPT-4V performs well in recognizing and understanding Latin content, but struggles with multilingual scenarios and complex tasks. Specifically, it shows limitations when dealing with non-Latin languages and complex tasks such as handwritten mathematical expression recognition, table structure recognition, and end-to-end semantic entity recognition and pair extraction from document images. Based on these observations, we affirm the necessity and continued research value of specialized OCR models. In general, despite its versatility in handling diverse OCR tasks, GPT-4V does not outperform existing state-of-the-art OCR models. How to fully utilize pre-trained general-purpose LMMs such as GPT-4V for OCR downstream tasks remains an open problem. The study offers a critical reference for future research in OCR with LMMs. The evaluation pipeline and results are available at https://github.com/SCUT-DLVCLab/GPT-4V_OCR
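
    The actual evaluation pipeline lives in the linked repository; as a generic illustration only, the snippet below computes two metrics commonly used to score OCR predictions against ground truth, exact-match accuracy and normalized edit distance:

    def edit_distance(a, b):
        # classic dynamic-programming Levenshtein distance with a rolling row
        dp = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            prev, dp[0] = dp[0], i
            for j, cb in enumerate(b, 1):
                prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
        return dp[len(b)]

    def evaluate(pairs):
        # pairs: list of (prediction, ground_truth) strings
        exact = sum(p == g for p, g in pairs) / len(pairs)
        ned = sum(edit_distance(p, g) / max(len(p), len(g), 1) for p, g in pairs) / len(pairs)
        return {"exact_match": exact, "normalized_edit_distance": ned}

    pairs = [("hello world", "hello world"), ("he1lo", "hello")]
    print(evaluate(pairs))   # exact_match 0.5, normalized_edit_distance 0.1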

    Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution

    Visual information extraction (VIE) has attracted considerable attention recently owing to its various advanced applications such as document understanding, automatic marking and intelligent education. Most existing works decouple this problem into several independent sub-tasks of text spotting (text detection and recognition) and information extraction, which completely ignores the high correlation among them during optimization. In this paper, we propose a robust visual information extraction system (VIES) for real-world scenarios: a unified, end-to-end trainable framework for simultaneous text detection, recognition and information extraction that takes a single document image as input and outputs the structured information. Specifically, the information extraction branch collects abundant visual and semantic representations from text spotting for multimodal feature fusion and, conversely, provides higher-level semantic clues that contribute to the optimization of text spotting. Moreover, to address the shortage of public benchmarks, we construct a fully-annotated dataset called EPHOIE (https://github.com/HCIILAB/EPHOIE), the first Chinese benchmark for both text spotting and visual information extraction. EPHOIE consists of 1,494 images of examination paper heads with complex layouts and backgrounds, including a total of 15,771 Chinese handwritten or printed text instances. Compared with state-of-the-art methods, our VIES shows significantly superior performance on the EPHOIE dataset and achieves a 9.01% F-score gain on the widely used SROIE dataset under the end-to-end scenario. Comment: 8 pages, 5 figures, to be published in AAAI 2021
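
    The coupling described above can be caricatured as a fusion module that projects per-instance visual features (from the spotting branch) and semantic features (from the recognized text) into a shared space before classifying entity fields. This toy module is illustrative only; VIES's actual architecture is described in the paper:

    import torch
    import torch.nn as nn

    class MultimodalFusion(nn.Module):
        def __init__(self, vis_dim=256, sem_dim=128, out_dim=256, num_fields=5):
            super().__init__()
            self.proj_v = nn.Linear(vis_dim, out_dim)
            self.proj_s = nn.Linear(sem_dim, out_dim)
            self.attn = nn.MultiheadAttention(out_dim, num_heads=4, batch_first=True)
            self.classifier = nn.Linear(out_dim, num_fields)   # entity field labels

        def forward(self, visual_feats, semantic_feats):
            # visual_feats:   (B, N, vis_dim) per text instance from text spotting
            # semantic_feats: (B, N, sem_dim) encoded recognition results
            fused_in = self.proj_v(visual_feats) + self.proj_s(semantic_feats)
            fused, _ = self.attn(fused_in, fused_in, fused_in)  # self-attention over instances
            return self.classifier(fused)                       # (B, N, num_fields)

    m = MultimodalFusion()
    logits = m(torch.randn(2, 30, 256), torch.randn(2, 30, 128))
    print(logits.shape)   # torch.Size([2, 30, 5])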

    Prognostic Significance of Serum Cysteine-Rich Protein 61 in Patients with Acute Heart Failure

    Background/Aims: Cysteine-rich protein 61 (CCN1/CYR61) is a multifunctional matricellular protein involved in the regulation of fibrogenesis. Animal experiments have demonstrated that CCN1 can inhibit cardiac fibrosis in cardiac hypertrophy. However, no study has assessed the relation between serum CCN1 and the prognosis of acute heart failure (AHF). Methods: We measured serum CCN1 levels in 183 patients with AHF, who were followed up for 6 months; the endpoint was all-cause mortality. The associations between CCN1 levels and clinical covariates, especially left ventricular ejection fraction (LVEF), estimated glomerular filtration rate (eGFR), atrial fibrillation and age, were estimated. Kaplan-Meier curve analysis and multivariable Cox proportional hazards analysis were employed to evaluate the prognostic ability of CCN1. We used calibration, discrimination and reclassification to assess the gain in mortality risk prediction from adding CCN1. Results: Serum CCN1 concentrations in AHF patients were significantly increased compared with those in individuals without AHF (237 pg/ml vs. 124.8 pg/ml, p < 0.001). CCN1 level was associated with the level of NT-proBNP (r = 0.349, p < 0.001) and was not affected by LVEF, eGFR, age or atrial fibrillation in AHF patients. Importantly, Kaplan-Meier curve analysis illustrated that AHF patients with serum CCN1 levels > 260 pg/ml had a lower survival rate (p < 0.001). Multivariable Cox proportional hazards analysis suggested that CCN1 functions as an independent predictor of mortality in AHF patients (LgCCN1, hazard ratio 5.825, 95% confidence interval: 1.828-18.566, p = 0.003). In addition, the inclusion of CCN1 in the model with NT-proBNP significantly improved the C-statistic for predicting death (0.758, p < 0.001). The integrated discrimination index was 0.019 (p < 0.001), and the net reclassification index increased significantly after the addition of CCN1 (23.9%, p = 0.0179). Conclusions: CCN1 is strongly predictive of 6-month mortality in patients with AHF, suggesting serum CCN1 as a promising candidate prognostic biomarker for AHF patients.
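
    As an illustration of the survival analysis workflow described above (using synthetic data, not the study's, with the lifelines Python package), a Cox proportional hazards model with log-transformed CCN1 as a covariate could be fitted as follows:

    import numpy as np
    import pandas as pd
    from lifelines import CoxPHFitter

    rng = np.random.default_rng(0)
    n = 183                                                  # cohort size from the abstract
    df = pd.DataFrame({
        "lg_ccn1": np.log10(rng.lognormal(mean=5.5, sigma=0.4, size=n)),  # synthetic LgCCN1
        "nt_probnp": rng.lognormal(mean=7.5, sigma=1.0, size=n),
        "time_months": rng.uniform(0.5, 6.0, size=n),        # 6-month follow-up
        "death": rng.integers(0, 2, size=n),                 # all-cause mortality indicator
    })

    cph = CoxPHFitter()
    cph.fit(df, duration_col="time_months", event_col="death")
    cph.print_summary()   # hazard ratios with 95% confidence intervals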

    Identification of the U-box gene family in peach (Prunus persica) and functional analysis of PpPUB20 in response to salt stress

    Background: With the rising proportion of saline soils in the global irrigated area, improving salt stress tolerance in peach is of great significance for the development of the peach industry. Plant U-box proteins (PUBs) are widely involved in stress response processes. In this study, genome-wide identification and analysis of PUB genes in cultivated peach were carried out, and the expression profiles of peach PUB genes in different tissues, as well as their responses to salt stress, were investigated. Methods: PUB genes identified genome-wide in cultivated peach were analysed by gene localisation, gene structure and evolutionary analysis. Subsequently, the expression profiles of PpPUB genes in different tissues of peach, and the changes in their relative expression under ABA, GA3, IAA and 6-BA treatments, low-temperature stress and salt stress, were investigated. Results and discussion: In this study, 51 U-box protein genes (PUBs) were identified in the cultivated peach “SJZX” and divided into six groups. Most of the PpPUBs were predicted to be located in the nucleus and chloroplasts. Promoter analyses indicated that most members may be associated with light-responsive processes. Expression analysis based on RT-qPCR showed that most PUB members in peach were highly expressed in certain tissues or organs. Based on the RT-qPCR expression analysis of 18 representative PpPUBs after abiotic stress and hormone induction, all detected genes except PpPUB19 were induced by salt stress, and PpPUB3/20/23/49 were induced by low temperature. Multiple genes were induced or repressed by exogenous hormone treatments. Furthermore, Arabidopsis seedlings heterologously overexpressing PpPUB20 exhibited greater salt tolerance than wild-type seedlings under the same salt stress conditions. These findings provide comprehensive information on the PpPUB family and identify members that may be involved in the regulation of hormone and salt stress responses. This study thus enhances the understanding of the potential roles of PpPUBs in stress adaptation in peach, establishing a foundation for subsequent functional investigations and applications in stress-resistant crop breeding.
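
    The abstract reports relative expression from RT-qPCR. A common way to compute such values is the 2^-ddCt method, sketched below with made-up Ct values; whether this exact method was used in the study is an assumption:

    def relative_expression(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
        # normalize the target gene's Ct to a reference gene, then to the
        # control sample: fold change = 2 ** -(dCt_treated - dCt_control)
        d_ct = ct_target - ct_ref                     # treated sample
        d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl      # control (untreated) sample
        return 2 ** -(d_ct - d_ct_ctrl)

    # e.g. a gene like PpPUB20 under salt stress vs. untreated, normalized to a
    # housekeeping gene (all Ct values here are hypothetical)
    fold_change = relative_expression(22.1, 18.0, 24.5, 18.2)
    print(f"fold change: {fold_change:.2f}")   # ~4.6-fold induction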

    Construction and Evaluation of the Brucella Double Gene Knock-out Vaccine Strain MB6 Δbp26ΔwboA (RM6)

    Brucellosis is a serious zoonotic infection worldwide. To date, vaccination is the most effective measure against brucellosis. This study aimed to obtain a vaccine strain with high protective efficacy and low toxicity that allows vaccination to be differentiated from infection. Using homologous recombination, we constructed a double gene-deletion Brucella strain, MB6 Δbp26ΔwboA (RM6), and evaluated its characteristics, safety and efficacy. The RM6 strain had good proliferative ability and stable biological characteristics in vivo and in vitro. Moreover, it had a favorable safety profile and elicited specific immune responses in mice and sheep. The RM6 strain may therefore have substantial practical application value.