Blockout: Dynamic Model Selection for Hierarchical Deep Networks
Most deep architectures for image classification--even those that are trained
to classify a large number of diverse categories--learn shared image
representations with a single model. Intuitively, however, categories that are
more similar should share more information than those that are very different.
While hierarchical deep networks address this problem by learning separate
features for subsets of related categories, current implementations require
simplified models using fixed architectures specified via heuristic clustering
methods. Instead, we propose Blockout, a method for regularization and model
selection that simultaneously learns both the model architecture and
parameters. A generalization of Dropout, our approach gives a novel
parametrization of hierarchical architectures that allows for structure
learning via back-propagation. To demonstrate its utility, we evaluate Blockout
on the CIFAR and ImageNet datasets, demonstrating improved classification
accuracy, better regularization performance, faster training, and the clear
emergence of hierarchical network structures.
Steganographer Identification
Conventional steganalysis detects the presence of steganography within single
objects. In the real world, we may face a more complex scenario in which one or
more of multiple users, called actors, are guilty of using steganography; this
is typically defined as the Steganographer Identification Problem (SIP). One might
use conventional steganalysis algorithms to separate stego objects from
cover objects and then identify the guilty actors. However, the guilty actors
may be lost among the false alarms. To deal with the SIP, most
state-of-the-art methods use unsupervised learning-based approaches. In their
solutions, each actor holds multiple digital objects, from which a set of
feature vectors can be extracted. Well-defined distances between these
feature sets are computed to measure the similarity between the corresponding
actors. By applying clustering or outlier detection, the most suspicious
actor(s) are judged to be the steganographer(s). Though the SIP needs further
study, existing methods identify the steganographer(s) well
when non-adaptive steganographic embedding is applied. In this chapter, we
present foundational concepts and review advanced methodologies in SIP.
This chapter is self-contained and intended as a tutorial introducing the SIP
in the context of media steganography.
Comment: A tutorial with 30 pages
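The pipeline described in this abstract (feature sets per actor, a distance between feature sets, then outlier detection) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the distance is a plain Euclidean distance between feature centroids, standing in for whatever distance a real method would use, and the actor names and feature values are invented toy data.

```python
import math


def centroid(vectors):
    """Mean feature vector of one actor's objects."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def actor_distance(features_a, features_b):
    """Assumed stand-in distance: Euclidean distance between centroids."""
    return math.dist(centroid(features_a), centroid(features_b))


def rank_actors(actors):
    """Rank actors by mean distance to all other actors, most suspicious first.

    `actors` maps an actor name to a list of feature vectors.
    """
    scores = {}
    for name, feats in actors.items():
        dists = [
            actor_distance(feats, other_feats)
            for other, other_feats in actors.items()
            if other != name
        ]
        scores[name] = sum(dists) / len(dists)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: three similar actors and one whose features differ.
actors = {
    "alice": [[0.0, 0.1], [0.1, 0.0]],
    "bob": [[0.1, 0.1], [0.0, 0.0]],
    "carol": [[0.05, 0.0], [0.0, 0.05]],
    "dave": [[2.0, 2.1], [2.1, 2.0]],
}
ranking = rank_actors(actors)
print(ranking[0])  # the actor farthest from everyone else
```

Ranking by mean distance is only one simple outlier score; a clustering step or a dedicated outlier detector could take its place without changing the overall shape of the pipeline.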
Ensemble clustering for result diversification
This paper describes the participation of the University of Twente in the Web track of TREC 2012. Our baseline approach uses the Mirex toolkit, an open-source tool that sequentially scans all the documents. For result diversification, we experimented with improving the quality of clusters through ensemble clustering. We combined clusters obtained by different clustering methods (such as LDA and K-means) and clusters obtained by using different types of data (such as document text and anchor text). Our two-layer ensemble run performed better than the LDA-based diversification and also better than a non-diversification run.
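One common way to combine several clusterings of the same documents is a co-association (evidence accumulation) matrix. The sketch below illustrates that general idea only; it is not the two-layer ensemble used in the run described above, whose details the abstract does not give. The function names, the 0.5 threshold, and the toy labelings are all assumptions.

```python
def co_association(labelings):
    """Fraction of base clusterings in which each pair of items shares a cluster."""
    n = len(labelings[0])
    m = len(labelings)
    co = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += 1.0 / m
    return co


def consensus_clusters(labelings, threshold=0.5):
    """Merge items whose co-association exceeds `threshold` (connected components)."""
    co = co_association(labelings)
    n = len(co)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if co[i][j] > threshold:
                parent[find(i)] = find(j)

    # Relabel component roots as consecutive ids in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Hypothetical base clusterings of five documents, e.g. from different
# methods (LDA, K-means) or different data (document text, anchor text).
labelings = [
    [0, 0, 1, 1, 2],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
]
consensus = consensus_clusters(labelings)
print(consensus)
```

Cluster ids need not agree across the base clusterings, because only "same cluster or not" is counted for each pair of documents.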
EC3: Combining Clustering and Classification for Ensemble Learning
Classification and clustering algorithms have been proved to be successful
individually in different contexts. Both of them have their own advantages and
limitations. For instance, although classification algorithms are more powerful
than clustering methods in predicting class labels of objects, they do not
perform well when there is a lack of sufficient manually labeled reliable data.
On the other hand, although clustering algorithms do not produce label
information for objects, they provide supplementary constraints (e.g., if two
objects are clustered together, it is more likely that the same label is
assigned to both of them) that one can leverage for label prediction of a set
of unknown objects. Therefore, systematic utilization of both these types of
algorithms together can lead to better prediction performance. In this paper,
we propose a novel algorithm, called EC3, that merges classification and
clustering together in order to support both binary and multi-class
classification. EC3 is based on a principled combination of multiple
classification and multiple clustering methods using an optimization function.
We theoretically show the convexity and optimality of the problem and solve it
using the block coordinate descent method. We additionally propose iEC3, a variant of
EC3 that handles imbalanced training data. We perform an extensive experimental
analysis by comparing EC3 and iEC3 with 14 baseline methods (7 well-known
standalone classifiers, 5 ensemble classifiers, and 2 existing methods that
merge classification and clustering) on 13 standard benchmark datasets. We show
that our methods outperform the baselines on every dataset, achieving
up to 10% higher AUC. Moreover, our methods are faster (1.21 times faster than
the best baseline) and more resilient to noise and class imbalance than the best
baseline method.
Comment: 14 pages, 7 figures, 11 tables
The structure of borders in a small world
Geographic borders are not only essential for the effective functioning of
government, the distribution of administrative responsibilities and the
allocation of public resources, they also influence the interregional flow of
information, cross-border trade operations, the diffusion of innovation and
technology, and the spatial spread of infectious diseases. However, as growing
interactions and mobility across long distances, cultural, and political
borders continue to amplify the small world effect and effectively decrease the
relative importance of local interactions, it is difficult to assess the
location and structure of effective borders that may play the most significant
role in mobility-driven processes. The paradigm of spatially coherent
communities may no longer be a plausible one, and it is unclear what structures
emerge from the interplay of interactions and activities across spatial scales.
Here we analyse a multi-scale proxy network for human mobility that
incorporates travel across a few to a few thousand kilometres. We determine an
effective system of geographically continuous borders implicitly encoded in
multi-scale mobility patterns. We find that effective large scale boundaries
define spatially coherent subdivisions and only partially coincide with
administrative borders. We find that spatial coherence is partially lost if
only long range traffic is taken into account and show that prevalent models
for multi-scale mobility networks cannot account for the observed patterns.
These results will allow for new types of quantitative, comparative analyses of
multi-scale interaction networks in general and may provide insight into a
multitude of spatiotemporal phenomena generated by human activity.
Comment: 9 pages
Microbial community pattern detection in human body habitats via ensemble clustering framework
The human habitat is a host where microbial species evolve, function, and
continue to evolve. Elucidating how microbial communities respond to human
habitats is a fundamental and critical task, as establishing baselines of the human
microbiome is essential in understanding its role in human disease and health.
However, current studies usually overlook the complex and interconnected
landscape of the human microbiome and are limited to particular body habitats,
using learning models with a specific criterion. As a result, these methods cannot
capture the real-world underlying microbial patterns effectively. To obtain a
comprehensive view, we propose a novel ensemble clustering framework to mine
the structure of microbial community patterns in large-scale metagenomic data.
Specifically, we first build a microbial similarity network by integrating
1920 metagenomic samples from three body habitats of healthy adults. Then a
novel symmetric Nonnegative Matrix Factorization (NMF) based ensemble model is
proposed and applied to the network to detect clustering patterns. Extensive
experiments are conducted to evaluate the effectiveness of our model on
deriving microbial communities with respect to body habitat and host gender. From
the clustering results, we observed that body habitats exhibit strongly bound but
non-unique microbial structural patterns. Meanwhile, the human microbiome shows
different degrees of structural variation across body habitat and host gender. In
summary, our ensemble clustering framework can efficiently explore integrated
clustering results to accurately identify microbial communities and provide a
comprehensive view of a set of microbial communities. Such trends depict an
integrated biography of microbial communities, which offers new insight
toward uncovering pathogenic models of the human microbiome.
Comment: BMC Systems Biology 201