Context-driven Object Detection and Segmentation with Auxiliary Information
One fundamental problem in computer vision and robotics is to
localize objects of interest in an image. The task can be formulated
either as an object detection problem, if the objects are described
by a set of pose parameters, or as an object segmentation problem,
if the object boundary is recovered precisely. A key issue in
object detection and segmentation concerns exploiting the spatial
context, as local evidence is often insufficient to determine
object pose in the presence of heavy occlusions or large object
appearance variations. This thesis addresses the object detection
and segmentation problem in such adverse conditions with
auxiliary depth data provided by RGBD cameras. We focus on four
main issues in context-aware object detection and segmentation:
1) what are effective context representations? 2) how can we
work with limited and imperfect depth data? 3) how can we design
depth-aware features and integrate depth cues into conventional
visual inference tasks? 4) how can we use unlabeled data to
relax the labeling requirements for training data?
We discuss three object detection and segmentation scenarios
based on varying amounts of available auxiliary information. In
the first case, depth data are available for model training but
not available for testing. We propose a structured Hough voting
method for detecting objects with heavy occlusion in indoor
environments, in which we extend the Hough hypothesis space to
include both the object's location and its visibility pattern.
We design a new score function that accumulates votes for object
detection and occlusion prediction. In addition, we explore the
correlation between objects and their environment, building a
depth-encoded object-context model based on RGBD data. In the
second case, we address the problem of localizing glass objects
with noisy and incomplete depth data. Our method integrates the
intensity and depth information from a single viewpoint, and
builds a Markov Random Field that predicts glass boundary and
region jointly. In addition, we propose a nonparametric,
data-driven label transfer scheme for local glass boundary
estimation. A weighted voting scheme based on a joint feature
manifold is adopted to integrate depth and appearance cues, and
we learn a distance metric on the depth-encoded feature manifold.
In the third case, we make use of unlabeled data to relax the
annotation requirements for object detection and segmentation,
and propose a novel data-dependent margin distribution learning
criterion for boosting, which utilizes the intrinsic geometric
structure of datasets. One key aspect of this method is that it
can seamlessly incorporate unlabeled data by including a graph
Laplacian regularizer. We demonstrate the performance of our
models and compare with baseline methods on several real-world
object detection and segmentation tasks, including indoor object
detection, glass object segmentation and foreground segmentation
in video.
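The graph Laplacian regularizer mentioned above can be sketched as follows. This is an illustrative example, not the thesis implementation: it builds a Gaussian-weighted k-nearest-neighbour graph over labeled and unlabeled samples and penalizes classifier scores that disagree across strongly connected samples; all names and parameters are assumptions.

```python
import numpy as np

def laplacian_regularizer(X, f, k=3, gamma=1.0):
    """Compute f^T L f, where L is the unnormalized graph Laplacian
    of a Gaussian-weighted kNN graph over the rows of X.

    X : (n, d) feature matrix (labeled + unlabeled samples)
    f : (n,) current classifier scores on all samples
    """
    n = X.shape[0]
    # Pairwise squared distances between all samples.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):
        # k nearest neighbours of sample i (excluding itself).
        nbrs = np.argsort(d2[i])[1:k + 1]
        W[i, nbrs] = np.exp(-gamma * d2[i, nbrs])
    W = np.maximum(W, W.T)          # symmetrize the graph
    L = np.diag(W.sum(1)) - W       # unnormalized Laplacian
    return f @ L @ f                # = 1/2 * sum_ij W_ij (f_i - f_j)^2

# Scores that are smooth over a tight cluster incur a much smaller
# penalty than scores that flip sign between neighbouring samples.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1]])
smooth = laplacian_regularizer(X, np.array([1., 1., 1., -1., -1.]))
rough = laplacian_regularizer(X, np.array([1., -1., 1., -1., 1.]))
```

Adding such a term to a boosting objective pulls predictions on unlabeled samples toward the labels of their graph neighbours, which is how unlabeled data enters the training criterion.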
Machine Learning in Automated Text Categorization
The automated categorization (or classification) of texts into predefined
categories has witnessed a booming interest in the last ten years, due to the
increased availability of documents in digital form and the ensuing need to
organize them. In the research community the dominant approach to this problem
is based on machine learning techniques: a general inductive process
automatically builds a classifier by learning, from a set of preclassified
documents, the characteristics of the categories. The advantages of this
approach over the knowledge engineering approach (consisting of the
manual definition of a classifier by domain experts) are very good
effectiveness, considerable savings in terms of expert manpower,
and straightforward
portability to different domains. This survey discusses the main approaches to
text categorization that fall within the machine learning paradigm. We will
discuss in detail issues pertaining to three different problems, namely
document representation, classifier construction, and classifier evaluation.Comment: Accepted for publication on ACM Computing Survey
Investigating Class-level Difficulty Factors in Multi-label Classification Problems
This work investigates the use of class-level difficulty factors in
multi-label classification problems for the first time. Four class-level
difficulty factors are proposed: frequency, visual variation, semantic
abstraction, and class co-occurrence. Once computed for a given multi-label
classification dataset, these difficulty factors are shown to have several
potential applications including the prediction of class-level performance
across datasets and the improvement of predictive performance through
difficulty weighted optimisation. Significant improvements to mAP and AUC
performance are observed for two challenging multi-label datasets (WWW Crowd
and Visual Genome) with the inclusion of difficulty weighted optimisation. The
proposed technique does not require any additional computational complexity
during training or inference and can be extended over time with inclusion of
other class-level difficulty factors.
Comment: Published in ICME 202
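Difficulty-weighted optimisation can be sketched using just the frequency factor: rarer classes receive larger loss weights. The inverse log-frequency weighting below is an illustrative assumption; the paper defines four factors, which are not reproduced here.

```python
import numpy as np

def frequency_weights(labels):
    """labels: (n_samples, n_classes) binary multi-label matrix.
    Returns per-class weights that grow as class frequency shrinks."""
    freq = labels.mean(axis=0)                  # empirical class frequency
    w = 1.0 / np.log1p(freq * labels.shape[0])  # rarer -> larger weight
    return w / w.mean()                         # normalize around 1

def weighted_bce(y_true, y_pred, w, eps=1e-7):
    """Binary cross-entropy with per-class difficulty weights."""
    p = np.clip(y_pred, eps, 1 - eps)
    bce = -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    return (bce * w).mean()                     # weights broadcast over classes

labels = np.array([[1, 0], [1, 0], [1, 0], [1, 1]])  # class 1 is rare
w = frequency_weights(labels)
```

Because the weights are fixed per class, the scheme adds no computational cost at training or inference time, consistent with the abstract's claim.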
Classification Arabic Twitter User's Insights Using Rough Set Theory
Nowadays, people around the world use social media to share their daily affairs. Arabic Twitter, for example, is a platform where users read, reply, and post messages known as "tweets". Users trade their opinions on different trends that are not equally important and that differ based on the users' power and interest. Tweets can provide rich information for decision making. The main objective of this paper is to present a framework for making valuable decisions by analyzing social users' insights based on their proximity to a particular trend, while highlighting their power within that trend. Tweets are highly unstructured, which makes them difficult to analyze. Our proposed model differs from previous research in this field in that it combines supervised and unsupervised machine learning algorithms. The work proceeds as follows: users are classified based on the degree of their closeness/interest using Mendelow's power/interest matrix, and rough set theory is applied to eliminate redundant features found in user profiles and obtain minimal sets of data. The proposed model applies two attribute reduction algorithms to our dataset to determine the optimal number of reducts for improving decision making from user replies. In addition, unsupervised machine learning groups the replies into subcategories such as positive, negative, or neutral. The experimental evaluation shows that the Johnson algorithm reduced the user attributes by 71%, outperforming the genetic algorithm used in the classification model.
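The Johnson attribute-reduction algorithm from rough set theory can be sketched greedily: build the discernibility matrix over object pairs with different decisions, then repeatedly pick the attribute that discerns the most remaining pairs. The decision table below is a hypothetical toy, not the paper's Twitter dataset.

```python
from collections import Counter

def johnson_reduct(objects, decisions):
    """objects: list of attribute tuples; decisions: list of class labels."""
    n, n_attr = len(objects), len(objects[0])
    # Discernibility entries: for each pair of objects with different
    # decisions, the set of attributes on which the pair differs.
    entries = []
    for i in range(n):
        for j in range(i + 1, n):
            if decisions[i] != decisions[j]:
                diff = {a for a in range(n_attr)
                        if objects[i][a] != objects[j][a]}
                if diff:
                    entries.append(diff)
    reduct = set()
    while entries:
        # Greedily choose the attribute covering the most entries.
        counts = Counter(a for e in entries for a in e)
        best = counts.most_common(1)[0][0]
        reduct.add(best)
        entries = [e for e in entries if best not in e]
    return reduct

# Toy decision table: attribute 0 alone separates the two decisions.
objs = [(0, 1, 0), (0, 0, 1), (1, 1, 1), (1, 0, 0)]
dec = ["neg", "neg", "pos", "pos"]
print(johnson_reduct(objs, dec))   # → {0}
```

A reduct keeps only the attributes needed to preserve the decision classes, which is how the paper shrinks the user-profile feature set before classification.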
It is all about where you start: Text-to-image generation with seed selection
Text-to-image diffusion models can synthesize a large variety of concepts in
new compositions and scenarios. However, they still struggle with generating
uncommon concepts, rare unusual combinations, or structured concepts like hand
palms. Their limitation is partly due to the long-tail nature of their training
data: web-crawled data sets are strongly unbalanced, causing models to
under-represent concepts from the tail of the distribution. Here we
characterize the effect of unbalanced training data on text-to-image models and
offer a remedy. We show that rare concepts can be correctly generated by
carefully selecting suitable generation seeds in the noise space, a technique
that we call SeedSelect. SeedSelect is efficient and does not require
retraining the diffusion model. We evaluate the benefit of SeedSelect on a
series of problems. First, in few-shot semantic data augmentation, where we
generate semantically correct images for few-shot and long-tail benchmarks. We
show classification improvement on all classes, both from the head and tail of
the training data of diffusion models. We further evaluate SeedSelect on
correcting images of hands, a well-known pitfall of current diffusion models,
and show that it improves hand generation substantially.
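The seed-selection idea can be sketched schematically: generate from several candidate noise seeds and keep the one a scoring function prefers. The generator and scorer below are stand-in stubs; SeedSelect itself searches the noise space of a pretrained diffusion model, which is not reproduced here.

```python
import random

def select_seed(generate, score, candidate_seeds):
    """Return (best_seed, best_output) under the given score function."""
    best_seed, best_out, best_s = None, None, float("-inf")
    for seed in candidate_seeds:
        out = generate(seed)   # no retraining: only the seed varies
        s = score(out)
        if s > best_s:
            best_seed, best_out, best_s = seed, out, s
    return best_seed, best_out

# Stub "generator": a seeded pseudo-random draw. Stub "scorer":
# closeness to a target value, standing in for a measure of how
# faithfully the generated image depicts the rare concept.
def generate(seed):
    return random.Random(seed).random()

target = 0.5
seed, out = select_seed(generate, lambda x: -abs(x - target), range(20))
```

The key property matches the abstract: the model itself is untouched, and only the starting point of generation is optimised.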
Word sense disambiguation for event trigger word detection in biomedicine
This paper describes a method for detecting event trigger words in biomedical text based on a word sense disambiguation (WSD) approach. We first investigate the applicability of existing WSD techniques to trigger word disambiguation in the BioNLP 2009 shared task data, and find that we are able to outperform a traditional CRF-based approach for certain word types. On the basis of this finding, we combine the WSD approach with the CRF, and obtain significant improvements over the standalone CRF, with particular gains in recall.
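One simple way to realise the combination described above is a per-type backoff: trust the WSD classifier on word types where it is known to beat the CRF, and fall back to the CRF elsewhere. This is a hedged sketch of that strategy, not the paper's exact system; the preference set, tokens, and labels are hypothetical.

```python
def combine(tokens, crf_labels, wsd_predict, wsd_preferred):
    """crf_labels: CRF output per token; wsd_predict(token) -> label;
    wsd_preferred: word types where WSD outperformed the CRF."""
    out = []
    for tok, crf_lab in zip(tokens, crf_labels):
        if tok.lower() in wsd_preferred:
            out.append(wsd_predict(tok))   # WSD wins on these word types
        else:
            out.append(crf_lab)            # default to the CRF label
    return out

tokens = ["IL-2", "expression", "was", "induced"]
crf = ["O", "O", "O", "Trigger"]
wsd = lambda t: "Trigger" if t.lower() == "expression" else "O"
print(combine(tokens, crf, wsd, {"expression"}))
# → ['O', 'Trigger', 'O', 'Trigger']
```

Because the WSD component fires only where it is reliable, the combination can recover triggers the CRF misses, which is consistent with the recall gains the abstract reports.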