803 research outputs found

Generalized residual vector quantization for large scale data

Author: Liu Shicong
Lu Hongtao
Shao Junru
Publication date: 17/09/2016

Vector quantization is an essential tool for tasks involving large scale data, for example, large scale similarity search, which is crucial for content-based information retrieval and analysis. In this paper, we propose a novel vector quantization framework that iteratively minimizes quantization error. First, we provide a detailed review of a relevant vector quantization method named residual vector quantization (RVQ). Next, we propose generalized residual vector quantization (GRVQ) to further improve over RVQ. Many vector quantization methods can be viewed as special cases of our proposed framework. We evaluate GRVQ on several large scale benchmark datasets for large scale search, classification and object retrieval, and compare it with existing methods in detail. Extensive experiments demonstrate that our GRVQ framework substantially outperforms existing methods in terms of quantization accuracy and computation efficiency. Comment: published on International Conference on Multimedia and Expo 201
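To make the residual idea in this abstract concrete, here is a minimal sketch of plain residual vector quantization, not the GRVQ method the paper proposes. It assumes squared Euclidean distortion and a simple Lloyd's k-means per stage; the function names (`nearest_codeword`, `kmeans`, `rvq_train`, `rvq_decode`) are hypothetical and do not come from the paper.

```python
import numpy as np


def nearest_codeword(x, codebook):
    """Index of the closest codeword (squared Euclidean) for each row of x."""
    d2 = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) codebook."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        assign = nearest_codeword(x, centers)
        for j in range(k):
            members = x[assign == j]
            if len(members):  # an empty cluster keeps its old center
                centers[j] = members.mean(axis=0)
    return centers


def rvq_train(x, num_stages, k):
    """Fit one codebook per stage on the residual left by earlier stages."""
    residual = np.asarray(x, dtype=float).copy()
    codebooks, codes, errors = [], [], []
    for stage in range(num_stages):
        codebook = kmeans(residual, k, seed=stage)
        idx = nearest_codeword(residual, codebook)
        residual = residual - codebook[idx]  # what this stage failed to capture
        codebooks.append(codebook)
        codes.append(idx)
        errors.append(float((residual ** 2).sum(axis=1).mean()))
    return codebooks, codes, errors


def rvq_decode(codebooks, codes):
    """Reconstruct vectors as the sum of the selected codeword per stage."""
    return sum(cb[idx] for cb, idx in zip(codebooks, codes))
```

Each stage quantizes only what the previous stages left over, so a vector is stored as one small index per stage and reconstructed by summing the chosen codewords. In this sketch the mean squared error cannot increase from one stage to the next, because every codeword is the mean of the residuals assigned to it.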

arXiv.org e-Print Archive

Crossref

Discovery of Protein Phosphorylation Motifs through Exploratory Data Analysis

Author: Aguan Kripamoy
Chen Yi-Cheng
Chung I-Fang
Pal Nikhil R.
Wang Yao-Tsung
Yang Chu-Wen
Publication venue: Public Library of Science
Publication date: 25/05/2011

BACKGROUND: The need for efficient algorithms to uncover biologically relevant phosphorylation motifs has become very important with the rapid expansion of the proteomic sequence database along with a plethora of new information on phosphorylation sites. Here we present a novel unsupervised method, called Motif Finder (in short, F-Motif), for identification of phosphorylation motifs. F-Motif uses clustering of sequence information represented by numerical features that exploit the statistical information hidden in some foreground data. Furthermore, these identified motifs are then filtered to find "actual" motifs with statistically significant motif scores. RESULTS AND DISCUSSION: We have applied F-Motif to several new and existing data sets and compared its performance with two well known state-of-the-art methods. In almost all cases F-Motif could identify all statistically significant motifs extracted by the state-of-the-art methods. More importantly, in addition to this, F-Motif uncovers several novel motifs. We have demonstrated using clues from the literature that most of these new motifs discovered by F-Motif are indeed novel. We have also found some interesting phenomena. For example, for CK2 kinase, the conserved sites appear only on the right side of S. However, for CDK kinase, the adjacent site on the right of S is conserved with residue P. In addition, three different encoding methods, including a novel position contrast matrix (PCM) and the simplest binary coding, are used, and the ability of F-Motif to discover motifs remains quite robust with respect to encoding schemes. CONCLUSIONS: An iterative algorithm proposed here uses exploratory data analysis to discover motifs from phosphorylated data. The effectiveness of F-Motif has been demonstrated using several real data sets as well as using a synthetic data set. The method is quite general in nature and can be used to find other types of motifs as well.
We have also provided a server for F-Motif at http://f-motif.classcloud.org/, http://bio.classcloud.org/f-motif/ or http://ymu.classcloud.org/f-motif/
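The abstract mentions "the simplest binary coding" as one of the encodings fed to the clustering step but does not define it. As an illustration only, the sketch below assumes it means a one-hot vector of 20 bits per sequence position over the standard amino acids. The function name `binary_encode` and the constant `AMINO_ACIDS` are hypothetical and not taken from F-Motif.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed alphabet)


def binary_encode(peptide):
    """One-hot encode a peptide: 20 bits per position, flattened to 1-D."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    encoded = np.zeros((len(peptide), len(AMINO_ACIDS)), dtype=np.uint8)
    for pos, aa in enumerate(peptide):
        encoded[pos, index[aa]] = 1  # exactly one bit set per position
    return encoded.ravel()
```

Under this assumption, a fixed-length window around a phosphorylated S becomes a fixed-length numeric vector, which is the kind of input a clustering method can work with directly.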

Public Library of Science (PLOS)

Directory of Open Access Journals

PubMed Central

Adaptive Evolutionary Clustering

Author: Kevin S. Xu
Mark Kliger
Alfred O. Hero III
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

In many practical applications of clustering, the objects to be clustered evolve over time, and a clustering result is desired at each time step. In such applications, evolutionary clustering typically outperforms traditional static clustering by producing clustering results that reflect long-term trends while being robust to short-term variations. Several evolutionary clustering algorithms have recently been proposed, often by adding a temporal smoothness penalty to the cost function of a static clustering method. In this paper, we introduce a different approach to evolutionary clustering by accurately tracking the time-varying proximities between objects followed by static clustering. We present an evolutionary clustering framework that adaptively estimates the optimal smoothing parameter using shrinkage estimation, a statistical approach that improves a naive estimate using additional information. The proposed framework can be used to extend a variety of static clustering algorithms, including hierarchical, k-means, and spectral clustering, into evolutionary clustering algorithms. Experiments on synthetic and real data sets indicate that the proposed framework outperforms static clustering and existing evolutionary clustering algorithms in many scenarios. Comment: To appear in Data Mining and Knowledge Discovery, MATLAB toolbox available at http://tbayes.eecs.umich.edu/xukevin/affec
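The tracking step described in this abstract can be pictured as a convex combination of the previous smoothed proximity matrix and the newly observed one. The sketch below assumes that form and uses a fixed smoothing parameter `alpha`. It does not implement the paper's shrinkage-based estimate of that parameter, and the function name `smooth_proximities` is hypothetical.

```python
import numpy as np


def smooth_proximities(proximities, alpha):
    """Smooth a sequence of proximity matrices over time.

    Each output is alpha * (previous smoothed matrix) + (1 - alpha) * (current
    observation); the first matrix is passed through unchanged.
    """
    smoothed = []
    prev = None
    for w in proximities:
        w = np.asarray(w, dtype=float)
        cur = w if prev is None else alpha * prev + (1.0 - alpha) * w
        smoothed.append(cur)
        prev = cur
    return smoothed
```

Any static method that accepts a proximity matrix (hierarchical or spectral clustering, for example) could then be run on each smoothed matrix. A larger `alpha` weights history more heavily, and `alpha = 0` reduces to static clustering at every time step.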

arXiv.org e-Print Archive

CiteSeerX

Crossref

Discrete-Continuous ADMM for Transductive Inference in Higher-Order MRFs

Author: Andres Bjoern
Cremers Daniel
Domokos Csaba
Lange Jan-Hendrik
Laude Emanuel
Leal-Taixé Laura
Schmidt Frank R.
Schüpfer Jonas
Publication date: 01/01/2018

This paper introduces a novel algorithm for transductive inference in higher-order MRFs, where the unary energies are parameterized by a variable classifier. The considered task is posed as a joint optimization problem in the continuous classifier parameters and the discrete label variables. In contrast to prior approaches such as convex relaxations, we propose an advantageous decoupling of the objective function into discrete and continuous subproblems and a novel, efficient optimization method related to ADMM. This approach preserves integrality of the discrete label variables and guarantees global convergence to a critical point. We demonstrate the advantages of our approach in several experiments including video object segmentation on the DAVIS data set and interactive image segmentation.

arXiv.org e-Print Archive

Crossref

MPG.PuRe

Machine learning methods for discriminating natural targets in seabed imagery

Author: Harrison Richard, John Patrick
Publication date: 01/12/2012

The research in this thesis concerns feature-based machine learning processes and methods for discriminating qualitative natural targets in seabed imagery. The applications considered typically involve time-consuming manual processing stages in an industrial setting. An aim of the research is to facilitate a means of assisting human analysts by expediting the tedious interpretative tasks, using machine methods. Some novel approaches are devised and investigated for solving the application problems. These investigations are compartmentalised in four coherent case studies linked by common underlying technical themes and methods. The first study addresses pockmark discrimination in a digital bathymetry model. Manual identification and mapping of even a relatively small number of these landform objects is an expensive process. A novel, supervised machine learning approach to automating the task is presented. The process maps the boundaries of ≈ 2000 pockmarks in seconds, a task that would take days for a human analyst to complete. The second case study investigates different feature creation methods for automatically discriminating sidescan sonar image textures characteristic of Sabellaria spinulosa colonisation. Results from a comparison of several textural feature creation methods on sonar waterfall imagery show that Gabor filter banks yield some of the best results. A further empirical investigation into the filter bank features created on sonar mosaic imagery leads to the identification of a useful configuration and filter parameter ranges for discriminating the target textures in the imagery. Feature saliency estimation is a vital stage in the machine process. Case study three concerns distance measures for the evaluation and ranking of features on sonar imagery. Two novel consensus methods for creating a more robust ranking are proposed.
Experimental results show that the consensus methods can improve robustness over a range of feature parameterisations and various seabed texture classification tasks. The final case study is more qualitative in nature and brings together a number of ideas, applied to the classification of target regions in real-world sonar mosaic imagery. A number of technical challenges arose and these were surmounted by devising a novel, hybrid unsupervised method. This fully automated machine approach was compared with a supervised approach in an application to the problem of image-based sediment type discrimination. The hybrid unsupervised method produces a plausible class map in a few minutes of processing time. It is concluded that the versatile, novel process should be generalisable to the discrimination of other subjective natural targets in real-world seabed imagery, such as Sabellaria textures and pockmarks (with appropriate features and feature tuning). Further, the full automation of pockmark and Sabellaria discrimination is feasible within this framework.

University of East Anglia digital repository

Convex optimization over intersection of simple sets: improved convergence rate guarantees via an exact penalty approach

Author: Bach Francis
Bhattacharyya Chiranjib
Kundu Achintya
Publication date: 17/10/2017

We consider the problem of minimizing a convex function over the intersection of finitely many simple sets which are easy to project onto. This is an important problem arising in various domains such as machine learning. The main difficulty lies in finding the projection of a point onto the intersection of many sets. Existing approaches yield an infeasible point with an iteration complexity of $O(1/\varepsilon^2)$ for nonsmooth problems with no guarantees on the infeasibility. By reformulating the problem through exact penalty functions, we derive first-order algorithms which not only guarantee that the distance to the intersection is small but also improve the complexity to $O(1/\varepsilon)$ and $O(1/\sqrt{\varepsilon})$ for smooth functions. For composite and smooth problems, this is achieved through a saddle-point reformulation where the proximal operators required by the primal-dual algorithms can be computed in closed form. We illustrate the benefits of our approach on a graph transduction problem and on graph matching.
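The abstract does not spell out its penalty function. One common exact-penalty form replaces the hard constraint with a weighted sum of distances to the individual sets, each computed through the cheap per-set projection. The sketch below assumes that form; the helper names (`project_box`, `project_ball`, `penalized_objective`) and the weight `lam` are illustrative and do not come from the paper.

```python
import numpy as np


def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi]^d."""
    return np.clip(x, lo, hi)


def project_ball(x, radius):
    """Euclidean projection onto the origin-centered ball of given radius."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)


def penalized_objective(f, x, projections, lam):
    """f(x) + lam * sum_i dist(x, C_i), with dist computed via projection."""
    penalty = sum(np.linalg.norm(x - proj(x)) for proj in projections)
    return f(x) + lam * penalty
```

For a point inside every set the penalty vanishes and the value equals `f(x)`. Outside, the penalty grows with the distance to each violated set, so projecting onto the intersection itself is never needed.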

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Mobile Interface for Content-Based Image Management

Author: LA CASCIA M.
Morana M.
Sorce S.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2010

People make more and more use of digital image acquisition devices to capture snapshots of their everyday life. The growing number of personal pictures raises the problem of their classification. Some of the authors proposed an automatic technique for personal photo album management dealing with multiple aspects (i.e., people, time and background) in a homogeneous way. In this paper we discuss a solution that allows mobile users to remotely access this technique by means of their mobile phones, from almost anywhere, in a pervasive fashion. This allows users to classify pictures they store on their devices. The whole solution is presented, with particular regard to the user interface implemented on the mobile phone, along with some experimental results.

Crossref

Archivio istituzionale della ricerca - Università di Palermo

Memristors for the Curious Outsiders

Author: Caravelli Francesco
Carbajal Juan Pablo
Publication date: 01/12/2018

We present both an overview and a perspective of recent experimental advances and proposed new approaches to performing computation using memristors. A memristor is a 2-terminal passive component with a dynamic resistance depending on an internal parameter. We provide a brief historical introduction, as well as an overview of the physical mechanisms that lead to memristive behavior. This review is meant to guide nonpractitioners in the field of memristive circuits and their connection to machine learning and neural computation. Comment: Perspective paper for MDPI Technologies; 43 page

arXiv.org e-Print Archive

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

Elucidating structure–property relationships in aluminum alloy corrosion inhibitors by machine learning

Author: Galvão Tiago L. P.
Gomes José R. B.
Kuznetsova Alena
Novell-Leruth Gerard
Tedim João
Publication venue: 'American Chemical Society (ACS)'
Publication date: 31/03/2021

Organic corrosion inhibitors are playing a crucial role in replacing traditional protective technologies, which have associated acute toxicity problems. However, why some organic compounds inhibit corrosion and others do not is still not well understood. Therefore, we tested different machine learning (ML) methods to distinguish efficient corrosion inhibitors for aluminum alloys commonly used in aeronautical applications. In this work, we have obtained information that can greatly contribute to automating the search for new and more efficient protective solutions in the future: i) an ML algorithm was selected that is able to correctly classify efficient inhibitors (i.e., with more than 50 % efficiency) and non-inhibitors (i.e., with 50 % efficiency or lower), even when information about different alloys at different pHs is included in the same dataset, which can significantly increase the information available to train the model; ii) new descriptors related to the self-association of the molecules were evaluated, but improvements to the predictive power of the models are limited; iii) average differences concerning the descriptors in this work were identified for inhibitors and non-inhibitors, having the potential to serve as guidelines to select potentially inhibitive molecular systems. This work demonstrates that ML can significantly accelerate research in the field by serving as a tool to perform an initial virtual screen of the molecules.

Repositório Institucional da Universidade de Aveiro