316 research outputs found
Lifelong Spectral Clustering
In the past decades, spectral clustering (SC) has become one of the most
effective clustering algorithms. However, most previous studies focus on
spectral clustering tasks with a fixed task set, which cannot incorporate
a new spectral clustering task without access to previously learned tasks.
In this paper, we aim to explore the problem of spectral clustering in a
lifelong machine learning framework, i.e., Lifelong Spectral Clustering (L2SC).
Its goal is to efficiently learn a model for a new spectral clustering task by
selectively transferring previously accumulated experience from a knowledge
library. Specifically, the knowledge library of L2SC contains two components:
1) orthogonal basis library: capturing latent cluster centers among the
clusters in each pair of tasks; 2) feature embedding library: embedding the
feature manifold information shared among multiple related tasks. As a new
spectral clustering task arrives, L2SC first transfers knowledge from both the
basis library and the feature library to obtain an encoding matrix, and then
redefines the library base over time to maximize performance across all the
clustering tasks. Meanwhile, a general online update formulation is derived to
alternately update the basis library and feature library. Finally, the
empirical experiments on several real-world benchmark datasets demonstrate that
our L2SC model can effectively improve clustering performance compared with
other state-of-the-art spectral clustering algorithms.
Comment: 9 pages, 7 figures
Sparsity Based Poisson Denoising with Dictionary Learning
The problem of Poisson denoising appears in various imaging applications,
such as low-light photography, medical imaging and microscopy. In cases of high
SNR, several transformations exist so as to convert the Poisson noise into an
additive i.i.d. Gaussian noise, for which many effective algorithms are
available. However, in a low SNR regime, these transformations are
significantly less accurate, and a strategy that relies directly on the true
noise statistics is required. A recent work by Salmon et al. took this route,
proposing a patch-based exponential image representation model based on GMM
(Gaussian mixture model), leading to state-of-the-art results. In this paper,
we propose to apply sparse-representation modeling to the image patches,
adopting the same exponential idea. Our scheme uses a greedy pursuit with a
bootstrapping-based stopping condition and dictionary learning within the
denoising process. The reconstruction performance of the proposed scheme is
competitive with leading methods at high SNR and achieves state-of-the-art
results at low SNR.
Comment: 13 pages, 9 figures
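The abstract above mentions transformations that convert Poisson noise into approximately Gaussian noise without naming one. A common choice is the Anscombe transform, used here as an assumed example rather than as the transformation the authors had in mind. The sketch below uses only the standard library; the helper names (`sample_poisson`, `stabilized_variance`) are hypothetical and introduced only for illustration.

```python
import math
import random


def anscombe(x):
    """Forward Anscombe transform: maps Poisson counts to roughly unit-variance values."""
    return 2.0 * math.sqrt(x + 3.0 / 8.0)


def inverse_anscombe(y):
    """Plain algebraic inverse of the transform (known to be biased at low counts)."""
    return (y / 2.0) ** 2 - 3.0 / 8.0


def sample_poisson(lam, rng):
    """Knuth's multiplication method; adequate for the moderate rates used here."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def variance(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def stabilized_variance(lam, n=20000, seed=0):
    """Sample variance of Anscombe-transformed Poisson draws with rate lam."""
    rng = random.Random(seed)
    return variance([anscombe(sample_poisson(lam, rng)) for _ in range(n)])
```

Under these assumptions, `stabilized_variance(20.0)` should come out close to 1, while `stabilized_variance(0.5)` falls well below 1. That gap is one way to see why such transformations become less accurate in the low-SNR regime the abstract targets.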
Multitask Online Mirror Descent
We introduce and analyze MT-OMD, a multitask generalization of Online Mirror
Descent (OMD) which operates by sharing updates between tasks. We prove that
the regret of MT-OMD is of order $\sqrt{1 + \sigma^2 (N-1)} \sqrt{T}$, where
$\sigma^2$ is the task variance according to the geometry induced by the
regularizer, $N$ is the number of tasks, and $T$ is the time horizon. Whenever
tasks are similar, that is $\sigma^2 \le 1$, our method improves upon the
$\sqrt{NT}$ bound obtained by running independent OMDs on each task. We further
provide a matching lower bound, and show that our multitask extensions of
Online Gradient Descent and Exponentiated Gradient, two major instances of OMD,
enjoy closed-form updates, making them easy to use in practice. Finally, we
present experiments on both synthetic and real-world datasets supporting our
findings.
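For orientation, the two OMD instances named in the abstract have well-known single-task closed-form updates. The sketch below shows those standard updates only; it is not the multitask MT-OMD update, which the abstract does not spell out. The function names `ogd_step` and `eg_step` and the learning-rate parameter `lr` are assumptions made for illustration.

```python
import math


def ogd_step(w, grad, lr):
    """Online Gradient Descent: mirror descent with the squared Euclidean regularizer.

    Unconstrained version; a projection step would follow for a constrained domain.
    """
    return [wi - lr * gi for wi, gi in zip(w, grad)]


def eg_step(w, grad, lr):
    """Exponentiated Gradient: mirror descent with the entropic regularizer.

    Assumes w lies on the probability simplex; the result is renormalized onto it.
    """
    unnormalized = [wi * math.exp(-lr * gi) for wi, gi in zip(w, grad)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]
```

Both steps cost time linear in the dimension. The abstract claims that the multitask extensions also have closed-form updates, which would keep them similarly cheap to run.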
Unsupervised learning of relation detection patterns
Information extraction is the natural language processing area whose goal is to obtain structured data from the relevant
information contained in textual fragments.
Information extraction requires a significant amount of linguistic knowledge. The specificity of such knowledge is a
drawback for the portability of the systems, as a change of language, domain or style demands a costly human effort.
Machine learning techniques have been applied for decades to overcome this portability bottleneck, progressively
reducing the amount of human supervision involved. However, as the availability of large document collections increases,
completely unsupervised approaches become necessary in order to mine the knowledge contained in them.
The proposal of this thesis is to incorporate clustering techniques into pattern learning for information extraction, in order to
further reduce the elements of supervision involved in the process. In particular, the work focuses on the problem of relation
detection. The achievement of this ultimate goal has required, first, considering the different strategies in which this
combination could be carried out; second, developing or adapting clustering algorithms suitable to our needs; and third,
devising pattern learning procedures which incorporated clustering information.
By the end of this thesis, we had been able to develop and implement an approach for learning relation detection patterns
which, using clustering techniques and minimal human supervision, is competitive with and even outperforms other comparable
state-of-the-art approaches.
Postprint (published version)
Learning With An Insufficient Supply Of Data Via Knowledge Transfer And Sharing
As machine learning methods extend to a more complex and diverse set of problems, situations arise where the complexity and limited availability of data mean that the information source is not adequate to generate a representative hypothesis. Learning from multiple sources of data is a promising research direction as researchers leverage ever more diverse sources of information. Since data is not readily available, knowledge has to be transferred from other sources, and new methods (both supervised and unsupervised) have to be developed to selectively share and transfer knowledge. In this dissertation, we present both supervised and unsupervised techniques to tackle problems where learning algorithms cannot generalize and require an extension to leverage knowledge from different sources of data. Knowledge transfer is a difficult problem, as diverse sources of data can overwhelm each individual dataset's distribution, and a careful set of transformations has to be applied to increase the relevant knowledge at the risk of biasing a dataset's distribution and inducing negative transfer that can degrade a learner's performance.
We give an overview of the issues encountered when the learning dataset does not have a sufficient supply of training examples. We categorize the structure of small datasets and highlight the need for further research. We present an instance-transfer supervised classification algorithm to improve classification performance in a target dataset via knowledge transfer from an auxiliary dataset. The improved classification performance of our algorithm is demonstrated with several real-world experiments. We extend the instance-transfer paradigm to supervised classification with 'Absolute Rarity', where a dataset has an insufficient supply of training examples and a skewed class distribution. We demonstrate one solution with a transfer learning approach and another with an imbalanced learning approach, and show the effectiveness of our algorithms on several real-world text and demographics classification problems (among others). We present an unsupervised multi-task clustering algorithm in which several small datasets are simultaneously clustered and knowledge is transferred between the datasets to improve clustering performance on each individual dataset, and we demonstrate the improved clustering performance with an extensive set of experiments.
SVS-JOIN : efficient spatial visual similarity join for geo-multimedia
In the big data era, a massive amount of geo-tagged multimedia data has been generated and collected by smart devices equipped with mobile communication and position sensor modules. This trend places greater demands on large-scale geo-multimedia retrieval. Spatial similarity join is one of the significant problems in the area of spatial databases. Previous works focused on the spatial textual document search problem rather than on geo-multimedia retrieval. In this paper, we investigate a novel geo-multimedia retrieval paradigm named spatial visual similarity join (SVS-JOIN for short), which aims to find geo-image pairs that are similar in both geo-location and visual content. First, we propose the definition of SVS-JOIN and present the geographical similarity and visual similarity measurements. Inspired by the approach for textual similarity join, we develop an algorithm named SVS-JOIN B by combining the PPJOIN algorithm and visual similarity. In addition, an extension named SVS-JOIN G is developed, which uses a spatial grid strategy to improve search efficiency. To further speed up the search, a novel approach called SVS-JOIN Q is designed, in which a quadtree and a global inverted index are employed. Comprehensive experiments are conducted on two geo-image datasets, and the results demonstrate that our solution can address the SVS-JOIN problem effectively and efficiently.