40 research outputs found
Exploring Bottom-up and Top-down Cues with Attentive Learning for Webly Supervised Object Detection
Fully supervised object detection has achieved great success in recent years.
However, abundant bounding box annotations are needed to train a detector
for novel classes. To reduce human labeling effort, we propose a novel
webly supervised object detection (WebSOD) method for novel classes that
requires only web images, without further annotations. Our proposed method
combines bottom-up and top-down cues for novel class detection. Within our
approach, we introduce a bottom-up mechanism based on the well-trained fully
supervised object detector (i.e., Faster R-CNN) as an object region estimator
for web images by recognizing the common objectness shared by base and novel
classes. With the estimated regions on the web images, we then utilize the
top-down attention cues as the guidance for region classification. Furthermore,
we propose a residual feature refinement (RFR) block to tackle the domain
mismatch between the web domain and the target domain. We demonstrate our
proposed method on the PASCAL VOC dataset with three different novel/base
splits. Without
any target-domain novel-class images and annotations, our proposed webly
supervised object detection model is able to achieve promising performance for
novel classes. Moreover, we conduct transfer learning experiments on the
large-scale ILSVRC 2013 detection dataset and achieve state-of-the-art
performance.
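The abstract does not describe an implementation, but the core idea of fusing a bottom-up objectness cue with a top-down attention cue can be sketched as follows. Shapes, the box-cropping rule, and the multiplicative fusion are illustrative assumptions, not the authors' method:

```python
import numpy as np

# Hypothetical sketch: combine bottom-up region proposals with
# top-down attention cues for novel-class scoring. The fusion rule
# (objectness x in-box attention) is an illustrative assumption.

def classify_regions(objectness, attention_map, boxes):
    """Score each proposed region for a novel class.

    objectness:    (R,) bottom-up objectness from the base detector
    attention_map: (H, W) top-down class attention over the image
    boxes:         (R, 4) integer boxes as (x1, y1, x2, y2)
    """
    scores = np.empty(len(boxes))
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        # Average attention inside the box acts as the class cue.
        cue = attention_map[y1:y2, x1:x2].mean()
        scores[i] = objectness[i] * cue  # fuse bottom-up and top-down
    return scores

attention = np.zeros((8, 8))
attention[2:6, 2:6] = 1.0  # attention concentrated on the object
boxes = np.array([[2, 2, 6, 6], [0, 0, 3, 3]])
print(classify_regions(np.array([0.9, 0.9]), attention, boxes))
```

A box that overlaps the attended region keeps a high score, while an equally "objecty" box off the attended region is suppressed.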
PGDiff: Guiding Diffusion Models for Versatile Face Restoration via Partial Guidance
Exploiting pre-trained diffusion models for restoration has recently become a
favored alternative to the traditional task-specific training approach.
Previous works have achieved noteworthy success by limiting the solution space
using explicit degradation models. However, these methods often fall short when
faced with complex degradations as they generally cannot be precisely modeled.
In this paper, we propose PGDiff by introducing partial guidance, a fresh
perspective that is more adaptable to real-world degradations compared to
existing works. Rather than specifically defining the degradation process, our
approach models the desired properties, such as image structure and color
statistics of high-quality images, and applies this guidance during the reverse
diffusion process. These properties are readily available and make no
assumptions about the degradation process. When combined with a diffusion
prior, this partial guidance can deliver appealing results across a range of
restoration tasks. Additionally, PGDiff can be extended to handle composite
tasks by consolidating multiple high-quality image properties, achieved by
integrating the guidance from respective tasks. Experimental results
demonstrate that our method not only outperforms existing diffusion-prior-based
approaches but also competes favorably with task-specific models.
Comment: GitHub: https://github.com/pq-yang/PGDif
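The property-guided sampling idea can be illustrated with a toy example: instead of modeling the degradation, each reverse step is nudged by the gradient of a loss on a desired property. The denoiser, schedule, and guidance scale below are stand-ins, not PGDiff's actual components:

```python
import numpy as np

# Minimal sketch of "partial guidance": each reverse-diffusion step is
# followed by a gradient step on a property loss (here, a target mean
# intensity as a stand-in for color statistics).

def guided_step(x_t, denoise, target_mean, scale=0.5):
    x_prev = denoise(x_t)                 # plain reverse step
    # Gradient of L(x) = (mean(x) - target)^2 w.r.t. the mean; the
    # per-pixel 1/N factor is absorbed into the guidance scale.
    grad = 2.0 * (x_prev.mean() - target_mean)
    return x_prev - scale * grad          # guidance step

denoise = lambda x: 0.9 * x  # toy denoiser: shrink toward zero

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 4))
for _ in range(50):
    x = guided_step(x, denoise, target_mean=0.3)
print(round(x.mean(), 3))  # → 0.3
```

The guidance makes no assumption about how the input was degraded; it only pulls samples toward the stated property, which is the paper's central point.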
3D UNet-based Kidney and Kidney Tumor Segmentation with Attentive Feature Learning
To study kidney diseases and kidney tumors from Computed Tomography (CT) imaging data, it is helpful to segment the region of interest with a computer-aided auto-segmentation tool. In the KiTS 2019 challenge [1], we are provided with 3D volumetric CT data to train a model for kidney and kidney tumor segmentation. We introduce an improved deep 3D UNet that enriches the feature representation of CT images using an attention module. We achieve a 1.5% improvement in segmentation accuracy when evaluated on the validation set.
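The abstract does not specify the attention module; one common way to "enrich the feature representation" is squeeze-and-excitation-style channel attention, sketched here for 3D feature volumes. The bottleneck sizes and random weights are illustrative assumptions:

```python
import numpy as np

# Illustrative channel-attention sketch for (C, D, H, W) 3D features,
# loosely in the spirit of an attention module on a 3D UNet; the exact
# module used in the paper is not specified.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(feat, w1, w2):
    """Reweight channels of a (C, D, H, W) feature volume."""
    squeezed = feat.mean(axis=(1, 2, 3))                  # (C,) context
    weights = sigmoid(w2 @ np.maximum(w1 @ squeezed, 0))  # (C,) gates
    return feat * weights[:, None, None, None]            # scale channels

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 4, 16, 16))
w1 = rng.normal(size=(2, 8)) * 0.1   # bottleneck down-projection
w2 = rng.normal(size=(8, 2)) * 0.1   # up-projection back to C gates
out = channel_attention(feat, w1, w2)
print(out.shape)  # → (8, 4, 16, 16)
```

Each channel is scaled by a learned gate in (0, 1), letting the network emphasize informative channels while keeping the output shape unchanged.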
Sterically Induced Binding Selectivity of Single m-Terphenyl Isocyanide Ligands
Sterically encumbering m-terphenyl isocyanides are a class of metal-binding
groups that foster low-coordinate metal-center environments in coordination
chemistry by exerting considerable intermolecular steric pressures between
neighboring ligands. In the context of metal surfaces, the encumbering steric
properties of the m-terphenyl isocyanides are shown to weaken the interaction
between the metal-binding group and a planar substrate, leading to a preference
for molecular adsorption at sites with convex curvature, such as the step edges
and herringbone elbow sites on Au(111). Here, we investigate the site-selective
binding of individual m-terphenyl isocyanide ligands on a Au(111) surface
through scanning tunneling microscopy (STM) and inelastic electron tunneling
spectroscopy (IETS). The site-dependent steric pressure alters the vibrational
fingerprint of the m-terphenyl isocyanides, which is characterized with
single-molecule precision through joint experimental and theoretical
approaches. This study provides the first molecular-level insights into
steric-pressure-enabled surface binding selectivity, as well as its effect on
the chemical properties of individual m-terphenyl isocyanide ligands, thereby
highlighting the potential to control the physical and chemical properties of
metal surfaces through tailored ligand design.
Dual Semantic Fusion Network for Video Object Detection
Video object detection is a challenging task due to the deteriorated quality
of video sequences captured in complex environments. Currently, this area is
dominated by a series of feature enhancement based methods, which distill
beneficial semantic information from multiple frames and generate enhanced
features through fusing the distilled information. However, the distillation
and fusion operations are usually performed at either frame level or instance
level with external guidance using additional information, such as optical flow
and feature memory. In this work, we propose a dual semantic fusion network
(abbreviated as DSFNet) to fully exploit both frame-level and instance-level
semantics in a unified fusion framework without external guidance. Moreover, we
introduce a geometric similarity measure into the fusion process to alleviate
the influence of information distortion caused by noise. As a result, the
proposed DSFNet can generate more robust features through the multi-granularity
fusion and avoid being affected by the instability of external guidance. To
evaluate the proposed DSFNet, we conduct extensive experiments on the ImageNet
VID dataset. Notably, the proposed dual semantic fusion network achieves, to
the best of our knowledge, the best performance of 84.1% mAP among current
state-of-the-art video object detectors with ResNet-101, and 85.4% mAP with
ResNeXt-101, without using any post-processing steps.
Comment: 9 pages, 6 figures
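The similarity-weighted fusion idea can be sketched compactly: each target feature aggregates support features from other frames in proportion to a similarity measure, so distorted (dissimilar) features contribute less. The cosine measure and softmax weighting below are illustrative choices, not DSFNet's exact formulation:

```python
import numpy as np

# Hedged sketch of similarity-weighted feature fusion across frames.

def fuse(target, support, temperature=1.0):
    """target: (D,) feature; support: (N, D) features from other frames."""
    sims = support @ target / (
        np.linalg.norm(support, axis=1) * np.linalg.norm(target) + 1e-8
    )                                    # cosine similarity, (N,)
    weights = np.exp(sims / temperature)
    weights /= weights.sum()             # softmax over support features
    return weights @ support             # similarity-weighted average

target = np.array([1.0, 0.0])
support = np.array([[1.0, 0.1],    # similar frame -> large weight
                    [-1.0, 0.0]])  # dissimilar (noisy) -> small weight
print(fuse(target, support))
```

Because the weights are normalized similarities rather than external cues such as optical flow, the fusion degrades gracefully when some frames are noisy.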
TableGPT: Towards Unifying Tables, Natural Language and Commands into One GPT
Tables are prevalent in real-world databases, requiring significant time and
effort for humans to analyze and manipulate. The advancements in large language
models (LLMs) have made it possible to interact with tables using natural
language input, bringing this capability closer to reality. In this paper, we
present TableGPT, a unified fine-tuned framework that enables LLMs to
understand and operate on tables using external functional commands. It
introduces the capability to seamlessly interact with tables, enabling a wide
range of functionalities such as question answering, data manipulation (e.g.,
insert, delete, query, and modify operations), data visualization, analysis
report generation, and automated prediction. TableGPT aims to provide
convenience and accessibility to users by empowering them to effortlessly
leverage tabular data. At the core of TableGPT lies the novel concept of global
tabular representations, which empowers LLMs to gain a comprehensive
understanding of the entire table beyond meta-information. By jointly training
LLMs on both table and text modalities, TableGPT achieves a deep understanding
of tabular data and the ability to perform complex operations on tables through
chain-of-command instructions. Importantly, TableGPT offers the advantage of
being a self-contained system rather than relying on external API interfaces.
Moreover, it supports an efficient data processing flow, query rejection (when
appropriate), and private deployment, enabling faster domain-data fine-tuning
and ensuring data privacy, which enhances the framework's adaptability to
specific use cases.
Comment: Technical Report
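The "external functional commands" pattern can be sketched as a small executor that applies model-emitted structured commands to a table. The command names and schema below are invented for illustration; TableGPT's actual command set is not described in the abstract:

```python
# Hypothetical sketch of command-based table manipulation: the model
# emits a structured command, and an executor applies it to the table.

TABLE = [{"city": "Oslo", "pop": 709}, {"city": "Bergen", "pop": 286}]

def execute(table, command):
    op = command["op"]
    if op == "query":                      # keep rows meeting a threshold
        col, val = command["column"], command["min"]
        return [row for row in table if row[col] >= val]
    if op == "insert":                     # append a new row
        return table + [command["row"]]
    if op == "delete":                     # drop rows matching a value
        col, val = command["column"], command["value"]
        return [row for row in table if row[col] != val]
    raise ValueError(f"unknown command: {op}")

# A chain of commands, as a chain-of-command instruction might produce.
t = execute(TABLE, {"op": "insert", "row": {"city": "Tromsø", "pop": 77}})
t = execute(t, {"op": "query", "column": "pop", "min": 100})
print([row["city"] for row in t])  # → ['Oslo', 'Bergen']
```

Keeping execution in a self-contained dispatcher, rather than behind an external API, mirrors the paper's stated design goal of a self-contained system.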
Learning to recognize objects by adaptive knowledge transfer
When humans learn new knowledge and skills, we can naturally transfer them to other domains. Along the way, we learn knowledge and skills for certain tasks and transfer them to similar tasks; we can also use old knowledge to facilitate the learning of new knowledge. While effective knowledge transfer is an innate and important learning ability of humans, it is not easy for machine learning systems to acquire. In recent years there have been plenty of works studying transfer learning in deep learning, yet some practical challenges remain unexplored, especially under the different problem settings encountered in real situations.
In this thesis, we explore how to adopt knowledge transfer mechanisms in deep learning approaches across several practical scenarios. Four works are proposed to study knowledge transfer across domains and tasks via domain adaptation and model transfer. In particular, we study web knowledge transfer for the object detection task by adapting web data to the real target dataset, aiming to reduce the human annotation effort for training an object detector. In the incremental learning scenario, we study the cross-utilization of old and new knowledge to overcome catastrophic forgetting during incremental and progressive learning. Lastly, we explore transfer learning in the medical imaging domain by transferring a model pre-trained on natural images. Overall, the major contributions are summarized as follows:
- A web knowledge transfer method to enhance the learning of weakly supervised object detection. The proposed method includes an effective web data collection pipeline and a curriculum learning scheme to achieve more effective model optimization during multi-instance learning.
- An annotation-effective object detection method by adapting web data to the target data for object detection. This work attempts to learn an object detector from web supervision by adversarial domain adaptation.
- An incremental learning scheme that adapts an old model to a new model without forgetting the old knowledge. A systematic study is performed to explore different class-incremental methods. Furthermore, we propose a graph-based method to mine the forgettability of old samples during the training of new tasks, and to dynamically select the more forgettable samples to overcome catastrophic forgetting.
- A lesion detection method for 3D CT images that utilizes model weights pre-trained on natural 2D RGB images. Furthermore, an attention-based feature aggregation method is proposed to adaptively transfer information from neighboring slices to the key slices for a more discriminative representation.
Through this thesis, we demonstrate three different paradigms of knowledge transfer: (1) cross-domain knowledge transfer for adapting web data to an application with real unconstrained data, (2) continual knowledge transfer from old tasks to new tasks without forgetting the old knowledge, and (3) model transfer from the natural image domain to the medical domain. Across several practical tasks, experiments are conducted to demonstrate the effectiveness of the proposed knowledge transfer approaches.
Doctor of Philosophy