Visual analysis for drum sequence transcription
A system is presented for analysing drum performance video sequences. A novel ellipse detection algorithm is introduced that automatically locates drum tops. This algorithm fits ellipses to edge clusters, and ranks them according to various fitness criteria. A background/foreground segmentation method is then used to extract the silhouette of the drummer and drum sticks. Coupled with a motion
intensity feature, this allows for the detection of "hits" in each of the extracted regions. In order to obtain a transcription of the performance, each of these regions is automatically labeled with the corresponding instrument class. A partial audio transcription and color cues are used to measure the compatibility between a region and its label; the Kuhn-Munkres algorithm is then employed to find the optimal labeling. Experimental results demonstrate the ability of visual analysis to enhance the performance of an audio drum transcription system.
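The labeling step described above is an assignment problem: each region receives exactly one instrument class so that the summed compatibility is maximal. The following is a minimal sketch of that problem, not the authors' implementation. It swaps in an exhaustive search over permutations for the Kuhn-Munkres algorithm, which returns the same optimum on small inputs but does not scale. The function name `best_labeling` and the compatibility scores are hypothetical, chosen only for illustration.

```python
from itertools import permutations


def best_labeling(compat):
    """Return (labels, total) maximizing the summed compatibility.

    compat[r][c] is an assumed score for giving region r the class c
    (higher is better). labels[r] is the class assigned to region r.
    Exhaustive search stands in for Kuhn-Munkres here.
    """
    n = len(compat)
    best_total, best_perm = None, None
    for perm in permutations(range(n)):
        total = sum(compat[r][perm[r]] for r in range(n))
        if best_total is None or total > best_total:
            best_total, best_perm = total, perm
    return list(best_perm), best_total


# Hypothetical scores for 3 regions x 3 instrument classes.
compat = [
    [9, 8, 1],
    [9, 1, 1],
    [2, 3, 6],
]
labels, total = best_labeling(compat)
print(labels, total)
```

With these made-up scores, a greedy choice that gives region 0 its top-scoring class (class 0) forces region 1 into a poor match, whereas the joint optimum assigns region 0 to class 1 and region 1 to class 0. That is the reason a global assignment method is used rather than per-region maxima.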
TFDet: Target-aware Fusion for RGB-T Pedestrian Detection
Pedestrian detection plays a critical role in computer vision as it
contributes to ensuring traffic safety. Existing methods that rely solely on
RGB images suffer from performance degradation under low-light conditions due
to the lack of useful information. To address this issue, recent multispectral
detection approaches have combined thermal images to provide complementary
information and have obtained enhanced performances. Nevertheless, few
approaches focus on the negative effects of false positives caused by noisy
fused feature maps. Different from them, we comprehensively analyze the impacts
of false positives on the detection performance and find that enhancing feature
contrast can significantly reduce these false positives. In this paper, we
propose a novel target-aware fusion strategy for multispectral pedestrian
detection, named TFDet. Our fusion strategy highlights the pedestrian-related
features while suppressing unrelated ones, resulting in more discriminative
fused features. TFDet achieves state-of-the-art performance on both KAIST and
LLVIP benchmarks, with an efficiency comparable to the previous
state-of-the-art counterpart. Importantly, TFDet performs remarkably well even
under low-light conditions, which is a significant advancement for ensuring
road safety. The code will be made publicly available at
https://github.com/XueZ-phd/TFDet.git
Recognition of Human Actions in Video
Recognition and analysis of human actions is an important task in computer vision. Applications of this research include surveillance systems, patient monitoring systems, human performance analysis, content-based image/video retrieval/storage, virtual reality, and a variety of systems that involve interactions between persons or between persons and devices. The need for such systems is increasing day by day, with the increase in the number of surveillance cameras deployed in public spaces. Automated systems are required that can detect, categorize and recognize human activities and request human attention only when necessary. In this paper, the important steps of such a system are described; it can robustly track humans in various environments and recognize their actions through image sequences acquired from a single fixed camera. The overall system consists of three major steps: blob extraction, feature extraction, and human action recognition. Given the sequence of images, a statistical method is demonstrated to extract the blobs and to remove the shadows and highlights in order to obtain a more accurate object silhouette. Shape context is used to extract features in the next step, and finally the human action is recognized using neural networ
WATUNet: A Deep Neural Network for Segmentation of Volumetric Sweep Imaging Ultrasound
Objective. Limited access to breast cancer diagnosis globally leads to
delayed treatment. Ultrasound, an effective yet underutilized method, requires
specialized training for sonographers, which hinders its widespread use.
Approach. Volume sweep imaging (VSI) is an innovative approach that enables
untrained operators to capture high-quality ultrasound images. Combined with
deep learning, like convolutional neural networks (CNNs), it can potentially
transform breast cancer diagnosis, enhancing accuracy, saving time and costs,
and improving patient outcomes. The widely used UNet architecture, known for
medical image segmentation, has limitations, such as vanishing gradients and a
lack of multi-scale feature extraction and selective region attention. In this
study, we present a novel segmentation model known as Wavelet_Attention_UNet
(WATUNet). In this model, we incorporate wavelet gates (WGs) and attention
gates (AGs) between the encoder and decoder instead of a simple connection to
overcome the limitations mentioned, thereby improving model performance. Main
results. Two datasets are utilized for the analysis: the public "Breast
Ultrasound Images" (BUSI) dataset of 780 images and a VSI dataset of 3818
images. Both datasets contained segmented lesions categorized into three types:
no mass, benign mass, and malignant mass. Our segmentation results show
superior performance compared to other deep networks. The proposed algorithm
attained a Dice coefficient of 0.94 and an F1 score of 0.94 on the VSI dataset
and scored 0.93 and 0.94 on the public dataset, respectively.
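For reference, the two reported metrics can be sketched for a single pair of binary masks as below. This is an illustrative sketch, not the authors' evaluation code: the function names and the toy masks are hypothetical, and masks are assumed to be flattened lists of 0/1 values. On one pair of binary masks the Dice coefficient and the F1 score are algebraically identical; the abstract does not state how the scores were aggregated over images, so this sketch does not explain why the two reported values can differ.

```python
def dice_coefficient(pred, truth):
    """Dice = 2|P ∩ T| / (|P| + |T|) for flattened binary masks."""
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    size = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if size == 0 else 2.0 * overlap / size


def f1_score(pred, truth):
    """F1 = 2TP / (2TP + FP + FN) for flattened binary masks."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2.0 * tp / denom


# Toy masks (hypothetical): 2 overlapping pixels, 3 predicted, 3 true.
pred = [1, 1, 1, 0, 0, 0]
truth = [1, 1, 0, 1, 0, 0]
dice = dice_coefficient(pred, truth)
f1 = f1_score(pred, truth)
print(dice, f1)
```

Both values come out to 4/6 for the toy masks, since |P| + |T| equals 2TP + FP + FN.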
Salient Object Detection via Integrity Learning
Although current salient object detection (SOD) works have achieved fantastic
progress, they are cast into the shade when it comes to the integrity of the
predicted salient regions. We define the concept of integrity at both the micro
and macro level. Specifically, at the micro level, the model should highlight
all parts that belong to a certain salient object, while at the macro level,
the model needs to discover all salient objects from the given image scene. To
facilitate integrity learning for salient object detection, we design a novel
Integrity Cognition Network (ICON), which explores three important components
to learn strong integrity features. 1) Unlike the existing models that focus
more on feature discriminability, we introduce a diverse feature aggregation
(DFA) component to aggregate features with various receptive fields (i.e.,
kernel shape and context) and increase the feature diversity. Such diversity is
the foundation for mining the integral salient objects. 2) Based on the DFA
features, we introduce the integrity channel enhancement (ICE) component with
the goal of enhancing feature channels that highlight the integral salient
objects at the macro level, while suppressing the other distracting ones. 3)
After extracting the enhanced features, the part-whole verification (PWV)
method is employed to determine whether the part and whole object features have
strong agreement. Such part-whole agreements can further improve the
micro-level integrity for each salient object. To demonstrate the effectiveness
of ICON, comprehensive experiments are conducted on seven challenging
benchmarks, where promising results are achieved
Salient Object Detection Techniques in Computer Vision-A Survey.
Detection and localization of regions of images that attract immediate human visual attention is currently an intensive area of research in computer vision. The capability of automatic identification and segmentation of such salient image regions has immediate consequences for applications in the fields of computer vision, computer graphics, and multimedia. A large number of salient object detection (SOD) methods have been devised to effectively mimic the capability of the human visual system to detect the salient regions in images. These methods can be broadly divided into two categories based on their feature engineering mechanism: conventional or deep learning-based. In this survey, most of the influential advances in image-based SOD from both the conventional and the deep learning-based categories have been reviewed in detail. Relevant saliency modeling trends with key issues, core techniques, and the scope for future research work have been discussed in the context of difficulties often faced in salient object detection. Results are presented for various challenging cases for some large-scale public datasets. Different metrics considered for assessment of the performance of state-of-the-art salient object detection models are also covered. Some future directions for SOD are presented towards the end.
Direction Selective Contour Detection for Salient Objects
The active contour model is a widely used technique
for automatic object contour extraction. Existing methods based
on this model can perform with high accuracy even in case of
complex contours, but challenging issues remain, like the need
for precise contour initialization for high curvature boundary
segments or the handling of cluttered backgrounds. To deal
with such issues, this paper presents a salient object extraction
method, the first step of which is the introduction of an improved
edge map that incorporates edge direction as a feature. The
direction information in the small neighborhoods of image feature
points is extracted, and the images' prominent orientations
are defined for direction-selective edge extraction. Using such
improved edge information, we provide a highly accurate shape
contour representation, which we also combine with texture
features. The principle of the paper is to interpret an object as
the fusion of its components: its extracted contour and its inner
texture. Our goal in fusing textural and structural information is
twofold: it is applied for automatic contour initialization, and it is
also used to establish an improved external force field. This fusion
then produces highly accurate salient object extractions. We
performed extensive evaluations which confirm that the presented
object extraction method outperforms parametric active contour
models and achieves higher efficiency than the majority of the
evaluated automatic saliency methods