6 research outputs found
INTRODUCING A NEW WAY OF VISUAL SEARCH SYSTEM BASED ON THE CLASSIFIERS
Attribute-based image representation has also shown great promise for discriminative and descriptive ability, owing to its intuitive interpretation and its cross-category generalization property. Attributes describe image regions that are common within an object category but rare outside it, and attribute-based visual descriptors have therefore achieved strong performance in image classification. We propose to exploit the rich semantic relationships encoded in a graph for image search re-ranking: given classifiers for all predefined attributes, each image is represented by an attribute feature consisting of the responses of these classifiers. A hypergraph is then used to model the relationships among images by integrating low-level visual features with attribute features, and hypergraph ranking is carried out to order the images. Its underlying principle is that visually similar images should have similar ranking scores.
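As a rough illustration of the ranking idea (not the paper's exact hypergraph formulation), the sketch below builds a pairwise affinity graph from concatenated visual and attribute features and propagates the initial search scores over it so that similar images end up with similar scores; the function name, the Gaussian kernel, and the parameters `alpha` and `iters` are illustrative assumptions.

```python
import numpy as np

def rerank(visual, attributes, query_scores, alpha=0.9, iters=50):
    """Graph-based re-ranking sketch: images with similar visual and
    attribute (classifier-response) features receive similar scores."""
    # Concatenate low-level visual features with attribute features.
    feats = np.hstack([visual, attributes])
    # Pairwise affinity via a Gaussian kernel; the bandwidth is an
    # illustrative choice, not the paper's setting.
    d2 = np.square(feats[:, None, :] - feats[None, :, :]).sum(-1)
    W = np.exp(-d2 / (2.0 * d2.mean() + 1e-12))
    np.fill_diagonal(W, 0.0)
    # Symmetric normalization, as in manifold-ranking style propagation.
    Dinv = np.diag(1.0 / np.sqrt(W.sum(1) + 1e-12))
    S = Dinv @ W @ Dinv
    # Propagate the initial (e.g. text-based) search scores over the graph.
    f = query_scores.astype(float)
    for _ in range(iters):
        f = alpha * S @ f + (1 - alpha) * query_scores
    return np.argsort(-f)  # image indices, best first
```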
Clustering with Multi-Layer Graphs: A Spectral Perspective
Observational data usually comes with a multimodal nature, which means that
it can be naturally represented by a multi-layer graph whose layers share the
same set of vertices (users) with different edges (pairwise relationships). In
this paper, we address the problem of combining different layers of the
multi-layer graph for improved clustering of the vertices compared to using
layers independently. We propose two novel methods, based on joint
matrix factorization and a graph regularization framework respectively, to
efficiently combine the spectrum of the multiple graph layers, namely the
eigenvectors of the graph Laplacian matrices. In each case, the resulting
combination, which we call a "joint spectrum" of multiple graphs, is used for
clustering the vertices. We evaluate our approaches by simulations with several
real world social network datasets. Results demonstrate the superior or
competitive performance of the proposed methods over state-of-the-art techniques
and common baseline methods, such as co-regularization and summation of
information from individual graphs.
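A minimal sketch of the simplest way to combine layers, essentially the summation-style baseline mentioned above: average the layers' normalized Laplacians, take the bottom eigenvectors as a joint spectral embedding, and cluster with k-means. The paper's joint matrix factorization and graph-regularization methods merge the layers' eigenvectors in a more principled way; the function name and parameters here are illustrative.

```python
import numpy as np
from scipy.sparse.csgraph import laplacian
from sklearn.cluster import KMeans

def joint_spectral_clustering(adjacencies, k):
    """Cluster the shared vertices of a multi-layer graph.

    `adjacencies` is a list of (n x n) adjacency matrices, one per layer.
    This sketch averages the normalized Laplacians (a common baseline);
    the paper's joint-spectrum methods are more involved."""
    L_avg = sum(laplacian(A, normed=True) for A in adjacencies) / len(adjacencies)
    # The eigenvectors of the smallest eigenvalues act as a spectral
    # embedding of the vertices ("joint spectrum" in the simplest sense).
    vals, vecs = np.linalg.eigh(L_avg)
    embedding = vecs[:, :k]
    embedding /= np.linalg.norm(embedding, axis=1, keepdims=True) + 1e-12
    return KMeans(n_clusters=k, n_init=10).fit_predict(embedding)
```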
Supervised and unsupervised methods for learning representations of linguistic units
Word representations, also called word embeddings, are generic representations, often high-dimensional vectors. They map the discrete space of words into a continuous vector space, which allows us to handle rare or even unseen events, e.g. by considering the nearest neighbors. Many Natural Language Processing tasks can be improved by word representations if we extend the task-specific training data with the general knowledge incorporated in the word representations.
The first publication investigates a supervised, graph-based method to create word representations. This method leads to a graph-theoretic similarity measure, CoSimRank, with equivalent formalizations that show CoSimRank's close relationship to Personalized PageRank and SimRank. The new formalization is efficient because it can use the graph-based word representation to compute a single node similarity without having to compute the similarities of the entire graph. We also show how we can take advantage of fast matrix multiplication algorithms.
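A compact sketch of that single-pair computation, following the published CoSimRank definition (a damped sum of inner products of the two nodes' k-step random-walk distributions); the damping factor `c` and the iteration count are illustrative choices.

```python
import numpy as np

def cosimrank(A, i, j, c=0.8, iters=10):
    """CoSimRank-style similarity of nodes i and j.

    A is a row-normalized adjacency (random-walk) matrix of the word
    graph.  A single pair can be scored from the two nodes' Personalized
    PageRank-like vectors without building the full similarity matrix."""
    p, q = np.zeros(A.shape[0]), np.zeros(A.shape[0])
    p[i], q[j] = 1.0, 1.0
    sim, damp = 0.0, 1.0
    for _ in range(iters):
        sim += damp * float(p @ q)   # <p^(k)(i), p^(k)(j)>
        p, q = A.T @ p, A.T @ q      # one random-walk step for each node
        damp *= c
    return sim
```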
In the second publication, we use existing unsupervised methods for word representation learning and combine these with semantic resources by learning representations for non-word objects like synsets and entities. We also investigate improved word representations which incorporate the semantic information from the resource. The method is flexible in that it can take any word representations as input and does not need an additional training corpus. A sparse tensor formalization guarantees efficiency and parallelizability.
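As a simplified illustration of representing non-word objects (not the thesis's sparse-tensor formalization), a synset vector could be approximated by combining the pretrained vectors of its member lemmas:

```python
import numpy as np

def synset_vector(word_vecs, lemmas):
    """Naive synset embedding: average the vectors of member lemmas.

    `word_vecs` maps words to pretrained vectors.  This is only an
    illustration; the thesis learns synset/entity representations and
    improved word representations jointly and efficiently."""
    vecs = [word_vecs[w] for w in lemmas if w in word_vecs]
    if not vecs:
        raise KeyError("no member lemma has a pretrained vector")
    return np.mean(vecs, axis=0)

# Hypothetical usage with a WordNet-style synset:
# suit_vec = synset_vector(word_vecs, ["suit", "lawsuit", "case"])
```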
In the third publication, we introduce a method that learns an orthogonal transformation of the word representation space that focuses the information relevant for a task in an ultradense subspace of a dimensionality that is smaller by a factor of 100 than the original space. We use ultradense representations for a Lexicon Creation task in which words are annotated with three types of lexical information: sentiment, concreteness, and frequency.
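Conceptually, an orthogonal matrix Q is learned so that, after rotation, the task-relevant information lies in the first few coordinates; the sketch below only shows how such a transform would be applied, with a random orthogonal Q standing in as a placeholder for the learned one.

```python
import numpy as np

def to_ultradense(embeddings, Q, d_ultra):
    """Project word embeddings into an ultradense subspace.

    Q is an orthogonal (d x d) matrix; after rotation we keep only the
    first `d_ultra` coordinates (e.g. d // 100).  Learning Q so that
    these coordinates carry sentiment, concreteness or frequency is the
    publication's contribution and is not shown here."""
    assert np.allclose(Q @ Q.T, np.eye(Q.shape[0]), atol=1e-6)
    return (embeddings @ Q.T)[:, :d_ultra]

# Placeholder: a random orthogonal Q obtained via QR decomposition.
d = 300
Q, _ = np.linalg.qr(np.random.randn(d, d))
ultradense = to_ultradense(np.random.randn(10000, d), Q, d // 100)
```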
The final publication introduces a new calculus for the interpretable ultradense subspaces, including polarity, concreteness, frequency and part-of-speech (POS). The calculus supports operations like "-1 × hate = love" and "give me a neutral word for greasy" (i.e., oleaginous) and extends existing analogy computations like "king - man + woman = queen".
Word representations, also called word embeddings, are generic representations, usually high-dimensional vectors. They map the discrete space of words into a continuous vector space and allow us to handle rare or unseen events, for example by considering the nearest neighbors. Many problems in computational linguistics can be addressed with word representations by extending task-specific training data with the general information contained in the word representations.
The first publication investigates supervised, graph-based methods for creating word representations. These methods lead to a graph-based similarity measure, CoSimRank, for which two equivalent formulations exist that show its close relationship to both Personalized PageRank and SimRank. The new formulation can compute individual node similarities efficiently because graph-based word representations can be used.
In the second publication, we use existing word representations and combine them with semantic resources by learning representations for objects that are not words, such as synsets and entities. The flexibility of our method lies in the fact that arbitrary word representations can be used as input and no additional training corpus is needed.
In the third publication, we present a method that learns an orthogonal transformation of the vector space of word representations. This transformation focuses the relevant information in an ultradense subspace. We use the ultradense representations to create lexicons with three different annotations: sentiment, concreteness, and frequency.
The final publication presents a new calculus for the interpretable ultradense subspaces: sentiment, concreteness, frequency, and part of speech. This calculus includes operations such as "-1 × hate = love" and "a neutral word for shyster" (i.e., lawyer), and extends existing calculi such as "uncle - man + woman = aunt".
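The operations in such a calculus come down to vector arithmetic followed by a nearest-neighbor lookup; the sketch below shows the generic analogy computation (it is not the thesis's exact ultradense calculus, and the helper name `nearest` is illustrative).

```python
import numpy as np

def nearest(word_vecs, query, exclude=()):
    """Return the vocabulary word whose vector has highest cosine
    similarity to `query`, skipping words in `exclude`."""
    best, best_sim = None, -np.inf
    qn = query / (np.linalg.norm(query) + 1e-12)
    for w, v in word_vecs.items():
        if w in exclude:
            continue
        sim = float(v @ qn) / (np.linalg.norm(v) + 1e-12)
        if sim > best_sim:
            best, best_sim = w, sim
    return best

# Analogy in the style of "king - man + woman = queen":
# answer = nearest(word_vecs,
#                  word_vecs["king"] - word_vecs["man"] + word_vecs["woman"],
#                  exclude={"king", "man", "woman"})
```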
Unsupervised Graph-Based Similarity Learning Using Heterogeneous Features.
Relational data refers to data that contains explicit relations among objects. Nowadays, relational
data are universal and have a broad appeal in many different application domains. The
problem of estimating similarity between objects is a core requirement for many standard
Machine Learning (ML), Natural Language Processing (NLP) and Information Retrieval
(IR) problems such as clustering, classification, word sense disambiguation, etc. Traditional
machine learning approaches represent the data using simple, concise representations such
as feature vectors. While this works very well for homogeneous data, i.e., data with a single
feature type such as text, it does not fully exploit the availability of different feature types.
For example, scientific publications have text, citations, authorship information, and venue information.
Each of the features can be used for estimating similarity. Representing such
objects has been a key issue in efficient mining (Getoor and Taskar, 2007). In this thesis,
we propose natural representations for relational data using multiple, connected layers of
graphs, one for each feature type. We also propose novel algorithms for estimating similarity
using multiple heterogeneous features, and novel algorithms for tasks such as topic detection and music recommendation that use the estimated similarity measure. We
demonstrate superior performance of the proposed algorithms (root mean squared error of
24.81 on the Yahoo! KDD Music recommendation data set and classification accuracy of
88% on the ACL Anthology Network data set) over many state-of-the-art algorithms,
such as Latent Semantic Analysis (LSA), Multiple Kernel Learning (MKL) and spectral
clustering, and over baselines, on large, standard data sets.
Ph.D. thesis, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/89824/1/mpradeep_1.pd
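As a minimal sketch of the idea behind this thesis, one similarity layer can be built per feature type and the layers combined into a single similarity estimate; the uniform-weight combination below is an illustrative baseline, not the learned combination proposed in the thesis.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def combined_similarity(feature_matrices, weights=None):
    """Estimate object similarity from multiple heterogeneous features.

    `feature_matrices` is a list of (n_objects x d_k) matrices, one per
    feature type (e.g. text tf-idf, citation indicators, authorship).
    Each induces one graph layer; layers are combined with `weights`
    (uniform by default, whereas the thesis learns the combination)."""
    n = feature_matrices[0].shape[0]
    if weights is None:
        weights = [1.0 / len(feature_matrices)] * len(feature_matrices)
    S = np.zeros((n, n))
    for w, X in zip(weights, feature_matrices):
        S += w * cosine_similarity(X)   # one similarity layer per feature type
    return S
```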