50 research outputs found

    AnomalyGPT: Detecting Industrial Anomalies using Large Vision-Language Models

    Full text link
    Large Vision-Language Models (LVLMs) such as MiniGPT-4 and LLaVA have demonstrated the capability of understanding images and achieved remarkable performance on various visual tasks. Despite their strong ability to recognize common objects thanks to extensive training data, they lack specific domain knowledge and have a weaker understanding of localized details within objects, which hinders their effectiveness in the Industrial Anomaly Detection (IAD) task. On the other hand, most existing IAD methods only provide anomaly scores and require manually set thresholds to distinguish normal from abnormal samples, which restricts their practical deployment. In this paper, we explore the utilization of LVLMs to address the IAD problem and propose AnomalyGPT, a novel IAD approach based on LVLMs. We generate training data by simulating anomalous images and producing corresponding textual descriptions for each image. We also employ an image decoder to provide fine-grained semantics and design a prompt learner to fine-tune the LVLM using prompt embeddings. Our AnomalyGPT eliminates the need for manual threshold adjustment and directly assesses the presence and locations of anomalies. Additionally, AnomalyGPT supports multi-turn dialogue and exhibits impressive few-shot in-context learning capabilities. With only one normal shot, AnomalyGPT achieves state-of-the-art performance with an accuracy of 86.1%, an image-level AUC of 94.1%, and a pixel-level AUC of 95.3% on the MVTec-AD dataset. Code is available at https://github.com/CASIA-IVA-Lab/AnomalyGPT. Comment: Project page: https://anomalygpt.github.io
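
    The data-generation step above pairs simulated anomalies with textual descriptions. A minimal sketch of that idea, assuming a simple cut-paste-style corruption (the noise patch, the size ranges, and the description template are illustrative assumptions, not the paper's exact procedure):

        # Sketch: simulate an anomalous image plus a pixel mask and description.
        import numpy as np

        def simulate_anomaly(normal_img, rng):
            """Paste a random noise patch into a normal image; return the
            corrupted image, a pixel-level mask, and a textual description."""
            h, w = normal_img.shape[:2]
            ph = int(rng.integers(h // 8, h // 4))
            pw = int(rng.integers(w // 8, w // 4))
            y = int(rng.integers(0, h - ph))
            x = int(rng.integers(0, w - pw))
            img = normal_img.copy()
            img[y:y + ph, x:x + pw] = rng.integers(0, 256, (ph, pw, 3), dtype=np.uint8)
            mask = np.zeros((h, w), dtype=np.uint8)
            mask[y:y + ph, x:x + pw] = 1
            text = f"There is an anomaly in the region around ({x}, {y})."
            return img, mask, text

        rng = np.random.default_rng(0)
        normal = np.full((256, 256, 3), 200, dtype=np.uint8)  # stand-in for a real image
        anomalous, mask, description = simulate_anomaly(normal, rng)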

    Feature Distilled Tracking

    Get PDF
    Feature extraction and representation are among the most important components of fast, accurate, and robust visual tracking. Very deep convolutional neural networks (CNNs) provide effective tools for feature extraction with good generalization ability. However, extracting features with very deep CNN models requires high-performance hardware due to their large computational complexity, which prohibits their use in real-time applications. To alleviate this problem, we aim at obtaining small, fast-to-execute shallow models based on model compression for visual tracking. Specifically, we propose a small feature distilled network (FDN) for tracking that imitates the intermediate representations of a much deeper network. The FDN extracts rich visual features at higher speed than the original deeper network. To speed up tracking further, we introduce a shift-and-stitch method that reduces the arithmetic operations while keeping the spatial resolution of the distilled feature maps unchanged. Finally, a scale-adaptive discriminative correlation filter is learned on the distilled features to handle scale variation of the target. Comprehensive experimental results on object tracking benchmark datasets show that the proposed approach achieves a 5x speed-up with performance competitive with state-of-the-art deep trackers.
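
    The core of the distillation idea, a shallow student imitating a deep teacher's intermediate feature maps, can be sketched as follows. This is a hedged illustration: the MSE objective and the 1x1 channel adapter are common choices for feature distillation, not necessarily the paper's exact design.

        # Sketch: intermediate-feature distillation loss (PyTorch).
        import torch
        import torch.nn as nn

        class DistillLoss(nn.Module):
            def __init__(self, student_ch, teacher_ch):
                super().__init__()
                # A 1x1 conv maps student channels to the teacher's width so
                # the shallow network can imitate a richer representation.
                self.adapter = nn.Conv2d(student_ch, teacher_ch, kernel_size=1)
                self.mse = nn.MSELoss()

            def forward(self, student_feat, teacher_feat):
                # The teacher is treated as frozen; gradients reach only the student.
                return self.mse(self.adapter(student_feat), teacher_feat.detach())

        loss_fn = DistillLoss(student_ch=64, teacher_ch=256)
        s = torch.randn(1, 64, 32, 32)   # shallow student feature map
        t = torch.randn(1, 256, 32, 32)  # deep teacher feature map
        loss = loss_fn(s, t)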

    ChineseWebText: Large-scale High-quality Chinese Web Text Extracted with Effective Evaluation Model

    Full text link
    During the development of large language models (LLMs), the scale and quality of the pre-training data play a crucial role in shaping LLMs' capabilities. To accelerate LLM research, several large-scale datasets, such as C4 [1], Pile [2], RefinedWeb [3] and WanJuan [4], have been released to the public. However, most of the released corpora focus mainly on English, and there is still no complete tool chain for extracting clean texts from web data. Furthermore, fine-grained information about the corpus, e.g., the quality of each text, is missing. To address these challenges, we propose in this paper a new complete tool chain, EvalWeb, to extract clean Chinese texts from noisy web data. First, as in previous work, manually crafted rules are employed to discard explicitly noisy texts from the raw crawled web content. Second, a well-designed evaluation model assesses the remaining, relatively clean data and assigns each text a quality score. Finally, an appropriate threshold can easily be applied to select high-quality pre-training data for Chinese. Using our proposed approach, we release ChineseWebText, the largest and latest large-scale high-quality Chinese web text corpus: it comprises 1.42 TB of data, and each text is associated with a quality score, allowing LLM researchers to choose data according to their desired quality thresholds. We also release a much cleaner 600 GB subset of Chinese data with quality exceeding 90%.
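
    The final selection stage, filtering by per-text quality score, is straightforward to illustrate. A minimal sketch assuming a JSONL release with "text" and "score" fields (the field names and a [0, 1] score scale are assumptions for illustration, not the dataset's documented schema):

        # Sketch: keep only texts whose quality score clears a threshold.
        import json

        def select_high_quality(jsonl_path, threshold=0.9):
            """Yield texts whose quality score exceeds the chosen threshold."""
            with open(jsonl_path, encoding="utf-8") as f:
                for line in f:
                    record = json.loads(line)
                    if record["score"] > threshold:
                        yield record["text"]

        # e.g., a cleaner subset analogous to the released 600 GB split:
        # clean = list(select_high_quality("chinesewebtext.jsonl", 0.9))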

    The 3rd Anti-UAV Workshop & Challenge: Methods and Results

    Full text link
    The 3rd Anti-UAV Workshop & Challenge aims to encourage research into novel and accurate methods for multi-scale object tracking. The Anti-UAV dataset used for the Anti-UAV Challenge has been publicly released. There are two main differences between this year's competition and the previous two. First, we have expanded the existing dataset and, for the first time, released a training set so that participants can focus on improving their models. Second, we set up two tracks for the first time, i.e., Anti-UAV Tracking and Anti-UAV Detection & Tracking. Around 76 teams from around the globe participated in the 3rd Anti-UAV Challenge. In this paper, we provide a brief summary of the 3rd Anti-UAV Workshop & Challenge, including brief introductions to the top three methods in each track. The submission leaderboard will be reopened for researchers who are interested in the Anti-UAV challenge. The benchmark dataset and other information can be found at https://anti-uav.github.io/. Comment: Technical report for the 3rd Anti-UAV Workshop and Challenge. arXiv admin note: text overlap with arXiv:2108.0990

    Prospects for the Development of Fundamental Sciences: Proceedings of the XII International Conference of Students and Young Scientists, Tomsk, April 21-24, 2015

    Get PDF
    The volume contains the proceedings of the XII International Conference of Students and Young Scientists "Prospects for the Development of Fundamental Sciences". It includes papers by students and young scientists presented in the sections "Physics", "Chemistry", "Mathematics", "Biology and Medicine", "Nanomaterials and Nanotechnology", "Technology", "Architectural Works Competition", and "IT Technologies and Electronics". Intended for students, postgraduates, young scientists, and instructors in the natural sciences and higher mathematics.

    The Thermal Infrared Visual Object Tracking VOT-TIR2015 challenge results

    Get PDF
    The Thermal Infrared Visual Object Tracking challenge 2015, VOT-TIR2015, aims at comparing short-term single-object visual trackers that work on thermal infrared (TIR) sequences and do not apply pre-learned models of object appearance. VOT-TIR2015 is the first benchmark on short-term tracking in TIR sequences. Results of 24 trackers are presented. For each participating tracker, a short description is provided in the appendix. The VOT-TIR2015 challenge is based on the VOT2013 challenge, but introduces the following novelties: (i) the newly collected LTIR (Linköping TIR) dataset is used, (ii) the VOT2013 attributes are adapted to TIR data, (iii) the evaluation is performed using insights gained during VOT2013 and VOT2014 and is similar to VOT2015.

    Foreground Removal Approach for Hole Filling in 3D Video and FVV Synthesis

    No full text

    Part Context Learning for Visual Tracking

    No full text
    Context information is widely used in computer vision for tracking arbitrary objects. Most existing works focus on distinguishing the tracked object from the background, or rely on inter-frame object similarity or key-point supporters as auxiliary information to assist tracking. However, how to discover and represent both the intrinsic properties inside the object and its surrounding information remains an open problem. In this paper, we propose a unified context learning framework that captures the stable structural relations among in-object parts, context parts, and the object itself to enhance the tracker's performance. The proposed Part Context Tracker (PCT) consists of an appearance model, an internal relation model, and a context relation model. The appearance model represents the appearances of the object and its parts. The internal relation model uses the parts inside the object to describe its spatio-temporal structure directly, while the context relation model exploits the latent interaction between the object and background parts. The appearance, internal relation, and context relation models are then embedded in a max-margin structured learning framework. Furthermore, a simple, robust update strategy based on a median filter is employed, which handles appearance change effectively and alleviates the drift problem. Extensive experiments are conducted on various benchmark datasets, and comparisons with state-of-the-art trackers demonstrate the effectiveness of our work.
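
    The median-filter update mentioned above can be sketched as a per-pixel median over a sliding window of recent object appearances; the window length and the per-pixel formulation are assumptions for illustration, not necessarily the authors' exact scheme.

        # Sketch: median-filter template update for drift resistance.
        from collections import deque
        import numpy as np

        class MedianTemplate:
            def __init__(self, window=5):
                self.history = deque(maxlen=window)

            def update(self, patch):
                """Push the latest object patch and return the per-pixel
                median template, which suppresses outlier frames
                (occlusion, sudden appearance change)."""
                self.history.append(patch.astype(np.float32))
                return np.median(np.stack(self.history), axis=0)

        tmpl = MedianTemplate(window=5)
        for t in range(5):
            template = tmpl.update(np.random.rand(32, 32))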