Total Jensen divergences: Definition, Properties and k-Means++ Clustering
We present a novel class of divergences induced by a smooth convex function
called total Jensen divergences. Those total Jensen divergences are invariant
by construction to rotations, a feature yielding regularization of ordinary
Jensen divergences by a conformal factor. We analyze the relationships between
this novel class of total Jensen divergences and the recently introduced total
Bregman divergences. We then proceed by defining the total Jensen centroids as
average distortion minimizers, and study their robustness performance to
outliers. Finally, we prove that the k-means++ initialization that bypasses
explicit centroid computations is good enough in practice to guarantee
probabilistically a constant approximation factor to the optimal k-means
clustering. Comment: 27 pages
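As a rough illustration of seeding without centroid computations, here is a minimal sketch of k-means++ initialization with a pluggable divergence. It is not the authors' implementation: it uses the ordinary Jensen divergence J_F(p, q) = (F(p) + F(q))/2 - F((p + q)/2) rather than the total Jensen divergence, and the names `jensen_divergence`, `kmeanspp_seeds`, and `squared_norm`, as well as the toy data, are assumptions made for this example.

```python
import numpy as np


def jensen_divergence(F, p, q):
    """Ordinary Jensen divergence: (F(p) + F(q)) / 2 - F((p + q) / 2)."""
    return 0.5 * (F(p) + F(q)) - F(0.5 * (p + q))


def kmeanspp_seeds(points, k, divergence, rng):
    """Pick k seeds among `points` (hypothetical helper).

    The first seed is uniform; each later seed is drawn with probability
    proportional to its divergence to the closest seed chosen so far.
    No centroid is ever computed.
    """
    n = len(points)
    seeds = [points[rng.integers(n)]]
    for _ in range(k - 1):
        cost = np.array([min(divergence(x, s) for s in seeds) for x in points])
        cost = np.maximum(cost, 0.0)  # guard against tiny negative rounding errors
        seeds.append(points[rng.choice(n, p=cost / cost.sum())])
    return np.array(seeds)


def squared_norm(x):
    """Convex generator F(x) = ||x||^2, for which J_F(p, q) = ||p - q||^2 / 4."""
    return float(np.dot(x, x))


# Toy data (assumed for illustration): three well-separated 2D blobs.
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(c, 0.1, size=(20, 2)) for c in (0.0, 5.0, 10.0)])
seeds = kmeanspp_seeds(
    points, 3, lambda p, q: jensen_divergence(squared_norm, p, q), rng
)
```

Swapping in another smooth convex `F` changes the divergence without touching the seeding loop, which is the point of keeping the divergence as a parameter.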
Distribution-Based Categorization of Classifier Transfer Learning
Transfer Learning (TL) aims to transfer knowledge acquired in one problem,
the source problem, onto another problem, the target problem, dispensing with
the bottom-up construction of the target model. Due to its relevance, TL has
gained significant interest in the Machine Learning community since it paves
the way to devise intelligent learning models that can easily be tailored to
many different applications. As is natural in a fast-evolving area, a wide
variety of TL methods, settings and nomenclature have been proposed so far.
However, many works report different names for the same concepts. This mixture
of concepts and terminology obscures the TL field, hindering its proper
consideration. In this paper we present a
review of the literature on the majority of classification TL methods, and also
a distribution-based categorization of TL with a common nomenclature suitable
to classification problems. Under this perspective three main TL categories are
presented, discussed and illustrated with examples
Structure Preserving Large Imagery Reconstruction
With the explosive growth of web-based cameras and mobile devices, billions
of photographs are uploaded to the internet. We can trivially collect a huge
number of photo streams for various goals, such as image clustering, 3D scene
reconstruction, and other big data applications. However, such tasks are not
easy because the retrieved photos can have large variations in their
view perspectives, resolutions, lighting, noise, and distortions.
Furthermore, with occlusion by unexpected objects such as people and vehicles,
it is even more challenging to find feature correspondences and reconstruct
realistic scenes. In this paper, we propose a structure-based image completion
algorithm for object removal that produces visually plausible content with
consistent structure and scene texture. We use an edge matching technique to
infer the potential structure of the unknown region. Driven by the estimated
structure, texture synthesis is performed automatically along the estimated
curves. We evaluate the proposed method on different types of images, from
highly structured indoor environments to natural scenes. Our experimental
results demonstrate satisfactory performance that can be potentially used for
subsequent big data processing, such as image localization, object retrieval,
and scene reconstruction. Our experiments show that this approach achieves
favorable results that outperform existing state-of-the-art techniques
Environmental Sensing by Wearable Device for Indoor Activity and Location Estimation
We present results from a set of experiments in this pilot study to
investigate the causal influence of user activity on various environmental
parameters monitored by occupant-carried multi-purpose sensors. Hypotheses with
respect to each type of measurement are verified, including temperature,
humidity, and light level collected during eight typical activities: sitting in
lab / cubicle, indoor walking / running, resting after physical activity,
climbing stairs, taking elevators, and outdoor walking. Our main contribution
is the development of features for activity and location recognition based on
environmental measurements, which exploit location- and activity-specific
characteristics and capture the trends resulting from the underlying
physiological process. The features are statistically shown to have good
separability and are also information-rich. Fusing environmental sensing
together with acceleration is shown to achieve classification accuracy as high
as 99.13%. For building applications, this study motivates a sensor fusion
paradigm for learning individualized activity, location, and environmental
preferences for energy management and user comfort. Comment: submitted to the 40th Annual Conference of the IEEE Industrial
Electronics Society (IECON
Bootstrapping Named Entity Annotation by Means of Active Machine Learning: A Method for Creating Corpora
This thesis describes the development and in-depth empirical investigation of a
method, called BootMark, for bootstrapping the marking up of named entities
in textual documents. The reason for working with documents, as opposed to,
for instance, sentences or phrases, is that the BootMark method is concerned
with the creation of corpora. The claim made in the thesis is that BootMark
requires a human annotator to manually annotate fewer documents in order to
produce a named entity recognizer with a given performance, than would be
needed if the documents forming the basis for the recognizer were randomly
drawn from the same corpus. The intention is then to use the created named
entity recognizer as a pre-tagger and thus eventually turn the manual annotation
process into one in which the annotator reviews system-suggested annotations
rather than creating new ones from scratch. The BootMark method consists of
three phases: (1) Manual annotation of a set of documents; (2) Bootstrapping
– active machine learning for the purpose of selecting which document to
annotate next; (3) The remaining unannotated documents of the original corpus
are marked up using pre-tagging with revision.
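The document-selection step in phase two can be illustrated with a small query-by-committee sketch. The abstract does not say which disagreement measure BootMark uses, so vote entropy is an assumption here, as are the function names and the toy label data; this shows the general technique, not the thesis's method.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy of the committee's label votes for a single token."""
    total = len(votes)
    return -sum((c / total) * math.log(c / total) for c in Counter(votes).values())


def document_disagreement(committee_labels):
    """Average per-token vote entropy for one document.

    committee_labels[m][t] is the label committee member m assigns to token t.
    """
    tokens = list(zip(*committee_labels))
    return sum(vote_entropy(v) for v in tokens) / len(tokens)


def select_next(pool):
    """Return the id of the unannotated document with the highest disagreement."""
    return max(pool, key=lambda doc_id: document_disagreement(pool[doc_id]))


# Hypothetical pool: two documents, each labeled by a committee of three taggers.
pool = {
    "doc_a": [["O", "PER", "O"], ["O", "PER", "O"], ["O", "PER", "O"]],
    "doc_b": [["O", "PER", "O"], ["O", "ORG", "O"], ["LOC", "PER", "O"]],
}
next_doc = select_next(pool)
```

Here the committee is unanimous on `doc_a` and split on `doc_b`, so `doc_b` would be handed to the annotator next.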
Five emerging issues are identified, described and empirically investigated
in the thesis. Their common denominator is that they all depend on the
realization of the named entity recognition task, and as such, require the context
of a practical setting in order to be properly addressed. The emerging issues
are related to: (1) the characteristics of the named entity recognition task and
the base learners used in conjunction with it; (2) the constitution of the set of
documents annotated by the human annotator in phase one in order to start the
bootstrapping process; (3) the active selection of the documents to annotate in
phase two; (4) the monitoring and termination of the active learning carried out
in phase two, including a new intrinsic stopping criterion for committee-based
active learning; and (5) the applicability of the named entity recognizer created
during phase two as a pre-tagger in phase three.
The outcomes of the empirical investigations concerning the emerging
issues support the claim made in the thesis. The results also suggest that while
the recognizer produced in phases one and two is as useful for pre-tagging as
a recognizer created from randomly selected documents, the applicability of
the recognizer as a pre-tagger is best investigated by conducting a user study
involving real annotators working on a real named entity recognition task
- …