Search CORE

10 research outputs found

Pyramidal Stochastic Graphlet Embedding for Document Pattern Classification

Author: Dutta A
Fornes A
Llados J
Riba P
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 15/10/2019
Field of study

This is the author accepted manuscript. The final version is available from IEEE via the DOI in this recordDocument pattern classification methods using graphs have received a lot of attention because of its robust representation paradigm and rich theoretical background. However, the way of preserving and the process for delineating documents with graphs introduce noise in the rendition of underlying data, which creates instability in the graph representation. To deal with such unreliability in representation, in this paper, we propose Pyramidal Stochastic Graphlet Embedding (PSGE). Given a graph representing a document pattern, our method first computes a graph pyramid by successively reducing the base graph. Once the graph pyramid is computed, we apply Stochastic Graphlet Embedding (SGE) for each level of the pyramid and combine their embedded representation to obtain a global delineation of the original graph. The consideration of pyramid of graphs rather than just a base graph extends the representational power of the graph embedding, which reduces the instability caused due to noise and distortion. When plugged with support vector machine, our proposed PSGE has outperformed the state-of-The-art results in recognition of handwritten words as well as graphical symbols.European Union Horizon 2020Ministerio de Educación, Cultura y Deporte, SpainRamon y Cajal FellowshipCERCA Program/Generalitat de Cataluny

Open Research Exeter

Designing labeled graph classifiers by exploiting the R\'enyi entropy of the dissimilarity representation

Author: Livi Lorenzo
Publication venue: 'MDPI AG'
Publication date: 20/04/2017
Field of study

Representing patterns as labeled graphs is becoming increasingly common in the broad field of computational intelligence. Accordingly, a wide repertoire of pattern recognition tools, such as classifiers and knowledge discovery procedures, are nowadays available and tested for various datasets of labeled graphs. However, the design of effective learning procedures operating in the space of labeled graphs is still a challenging problem, especially from the computational complexity viewpoint. In this paper, we present a major improvement of a general-purpose classifier for graphs, which is conceived on an interplay between dissimilarity representation, clustering, information-theoretic techniques, and evolutionary optimization algorithms. The improvement focuses on a specific key subroutine devised to compress the input data. We prove different theorems which are fundamental to the setting of the parameters controlling such a compression operation. We demonstrate the effectiveness of the resulting classifier by benchmarking the developed variants on well-known datasets of labeled graphs, considering as distinct performance indicators the classification accuracy, computing time, and parsimony in terms of structural complexity of the synthesized classification models. The results show state-of-the-art standards in terms of test set accuracy and a considerable speed-up for what concerns the computing time.Comment: Revised versio

arXiv.org e-Print Archive

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

Entropic graph embedding via multivariate degree distributions

Author: C. Ye
D. Berwanger
D. Perrault-Joncas
F. Chung
H. Bunke
K. Riesen
L. Han
P. Ren
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014
Field of study

Although there are many existing alternative methods for using structural characterizations of undirected graphs for embedding, clustering and classification problems, there is relatively little literature aimed at dealing with such problems for directed graphs. In this paper we present a novel method for characterizing graph structure that can be used to embed directed graphs into a feature space. The method commences from a characterization based on the distribution of the von Neumann entropy of a directed graph with the in and out-degree configurations associated with directed edges. We start from a recently developed expression for the von Neumann entropy of a directed graph, which depends on vertex in-degree and out-degree statistics, and thus obtain a multivariate edge-based distribution of entropy. We show how this distribution can be encoded as a multi-dimensional histogram, which captures the structure of a directed graph and reflects its complexity. By performing principal components analysis on a sample of histograms, we embed populations of directed graphs into a low dimensional space. Finally, we undertake experiments on both artificial and real-world data to demonstrate that our directed graph embedding method is effective in distinguishing different types of directed graphs

Crossref

White Rose Research Online

Input variable selection in time-critical knowledge integration applications: A review, analysis, and recommendation paper

Author: A. Mousavi
Ambrosetti
Askin
Banks
Banks
Beylkin
Blum
Borgonovo
Braddock
Brodersen
Buchenneder
Bunke
Buonomo
Charaniya
Chen
Chi
Cloke
Cukier
De Pauw
Duffy
Durkee
Faghihi
Gaweda
Guyon
Hand
He
Hung
Jain
James
Joliffe
Kang
Kim
Kohavi
Krugera
Krzykacz-Hausmann
Kwak
Lallemand
Lavrač
Lemaire
Li
Li
Liu
McRae
Mirkin
Mladenić
Norvig
Park
Quevedo
Ragg
Robert
S. Poslad
S. Tavakoli
Saltelli
Saltelli
Shonkwiler
Sobol
Takagi
Talavera
Tavakoli
Unler
Uysal
Xing
Xu
Yang
Øksendal
Publication venue: 'Elsevier BV'
Publication date: 01/10/2013
Field of study

This is the post-print version of the final paper published in Advanced Engineering Informatics. The published article is available from the link below. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Copyright @ 2013 Elsevier B.V.The purpose of this research is twofold: first, to undertake a thorough appraisal of existing Input Variable Selection (IVS) methods within the context of time-critical and computation resource-limited dimensionality reduction problems; second, to demonstrate improvements to, and the application of, a recently proposed time-critical sensitivity analysis method called EventTracker to an environment science industrial use-case, i.e., sub-surface drilling. Producing time-critical accurate knowledge about the state of a system (effect) under computational and data acquisition (cause) constraints is a major challenge, especially if the knowledge required is critical to the system operation where the safety of operators or integrity of costly equipment is at stake. Understanding and interpreting, a chain of interrelated events, predicted or unpredicted, that may or may not result in a specific state of the system, is the core challenge of this research. The main objective is then to identify which set of input data signals has a significant impact on the set of system state information (i.e. output). Through a cause-effect analysis technique, the proposed technique supports the filtering of unsolicited data that can otherwise clog up the communication and computational capabilities of a standard supervisory control and data acquisition system. The paper analyzes the performance of input variable selection techniques from a series of perspectives. It then expands the categorization and assessment of sensitivity analysis methods in a structured framework that takes into account the relationship between inputs and outputs, the nature of their time series, and the computational effort required. The outcome of this analysis is that established methods have a limited suitability for use by time-critical variable selection applications. By way of a geological drilling monitoring scenario, the suitability of the proposed EventTracker Sensitivity Analysis method for use in high volume and time critical input variable selection problems is demonstrated.E

Crossref

Brunel University Research Archive

Sacola de grafos textuais : um modelo de representação de textos baseado em grafos, preciso, eficiente e de propósito geral

Author: Dourado Ícaro Cavalcante, 1985-
Publication venue: [s.n.]
Publication date: 01/09/2018
Field of study

Orientador: Ricardo da Silva TorresDissertação (mestrado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: Modelos de representação de textos são o alicerce fundamental para as tarefas de Recuperação de Informação e Mineração de Textos. Apesar de diferentes modelos de representação de textos terem sido propostos, eles não são ao mesmo tempo eficientes, precisos e flexíveis para serem usados em aplicações variadas. Neste projeto, apresentamos a Sacola de Grafos Textuais (do inglês \textit{Bag of Textual Graphs}), um modelo de representação de textos que satisfaz esses três requisitos, ao propor uma combinação de um modelo de representação baseado em grafos com um arcabouço genérico de síntese de grafos em representações vetoriais. Avaliamos nosso método em experimentos considerando quatro coleções textuais bem conhecidas: Reuters-21578, 20-newsgroups, 4-universidades e K-series. Os resultados experimentais demonstram que o nosso modelo é genérico o bastante para lidar com diferentes coleções, e é mais eficiente do que métodos atuais e largamente utilizados em tarefas de classificação e recuperação de textos, sem perda de precisãoAbstract: Text representation models are the fundamental basis for Information Retrieval and Text Mining tasks. Despite different text models have been proposed, they are not at the same time efficient, accurate, and flexible to be used in several applications. Here we present Bag of Textual Graphs, a text representation model that addresses these three requirements, by combining a graph-representation model with an generic framework for graph-to-vector synthesis. We evaluate our method on experiments considering four well-known text collections: Reuters-21578, 20-newsgroups, 4-universities, and K-series. Experimental results demonstrate that our model is generic enough to handle different collections, and is more efficient than widely-used state-of-the-art methods in textual classification and retrieval tasks, without losing accuracy performanceMestradoCiência da ComputaçãoMestre em Ciência da Computaçã

Repositorio da Producao Cientifica e Intelectual da Unicamp

Agregação de ranks baseada em grafos

Author: Dourado Ícaro Cavalcante, 1985-
Publication venue: [s.n.]
Publication date: 24/06/2020
Field of study

Orientador: Ricardo da Silva TorresTese (doutorado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: Neste trabalho, apresentamos uma abordagem robusta de agregação de listas baseada em grafos, capaz de combinar resultados de modelos de recuperação isolados. O método segue um esquema não supervisionado, que é independente de como as listas isoladas são geradas. Nossa abordagem é capaz de incorporar modelos heterogêneos, de diferentes critérios de recuperação, tal como baseados em conteúdo textual, de imagem ou híbridos. Reformulamos o problema de recuperação ad-hoc como uma recuperação baseada em fusion graphs, que propomos como um novo modelo de representação unificada capaz de mesclar várias listas e expressar automaticamente inter-relações de resultados de recuperação. Assim, mostramos que o sistema de recuperação se beneficia do aprendizado da estrutura intrínseca das coleções, levando a melhores resultados de busca. Nossa formulação de agregação baseada em grafos, diferentemente das abordagens existentes, permite encapsular informação contextual oriunda de múltiplas listas, que podem ser usadas diretamente para ranqueamento. Experimentos realizados demonstram que o método apresenta alto desempenho, produzindo melhores eficácias que métodos recentes da literatura e promovendo ganhos expressivos sobre os métodos de recuperação fundidos. Outra contribuição é a extensão da proposta de grafo de fusão visando consulta eficiente. Trabalhos anteriores são promissores quanto à eficácia, mas geralmente ignoram questões de eficiência. Propomos uma função inovadora de agregação de consulta, não supervisionada, intrinsecamente multimodal almejando recuperação eficiente e eficaz. Introduzimos os conceitos de projeção e indexação de modelos de representação de agregação de consulta com base em grafos, e a sua aplicação em tarefas de busca. Formulações de projeção são propostas para representações de consulta baseadas em grafos. Introduzimos os fusion vectors, uma representação de fusão tardia de objetos com base em listas, a partir da qual é definido um modelo de recuperação baseado intrinsecamente em agregação. A seguir, apresentamos uma abordagem para consulta rápida baseada nos vetores de fusão, promovendo agregação de consultas eficiente. O método apresentou alta eficácia quanto ao estado da arte, além de trazer uma perspectiva de eficiência pouco abordada. Ganhos consistentes de eficiência são alcançadas em relação aos trabalhos recentes. Também propomos modelos de representação baseados em consulta para problemas gerais de predição. Os conceitos de grafos de fusão e vetores de fusão são estendidos para cenários de predição, nos quais podem ser usados para construir um modelo de estimador para determinar se um objeto de avaliação (ainda que multimodal) se refere a uma classe ou não. Experimentos em tarefas de classificação multimodal, tal como detecção de inundação, mostraram que a solução é altamente eficaz para diferentes cenários de predição que envolvam dados textuais, visuais e multimodais, produzindo resultados melhores que vários métodos recentes. Por fim, investigamos a adoção de abordagens de aprendizagem para ajudar a otimizar a criação de modelos de representação baseados em consultas, a fim de maximizar seus aspectos de capacidade discriminativa e eficiência em tarefas de predição e de buscaAbstract: In this work, we introduce a robust graph-based rank aggregation approach, capable of combining results of isolated ranker models in retrieval tasks. The method follows an unsupervised scheme, which is independent of how the isolated ranks are formulated. Our approach is able to incorporate heterogeneous models, defined in terms of different ranking criteria, such as those based on textual, image, or hybrid content representations. We reformulate the ad-hoc retrieval problem as a graph-based retrieval based on {\em fusion graphs}, which we propose as a new unified representation model capable of merging multiple ranks and expressing inter-relationships of retrieval results automatically. By doing so, we show that the retrieval system can benefit from learning the manifold structure of datasets, thus leading to more effective results. Our graph-based aggregation formulation, unlike existing approaches, allows for encapsulating contextual information encoded from multiple ranks, which can be directly used for ranking. Performed experiments demonstrate that our method reaches top performance, yielding better effectiveness scores than state-of-the-art baseline methods and promoting large gains over the rankers being fused. Another contribution refers to the extension of the fusion graph solution for efficient rank aggregation. Although previous works are promising with respect to effectiveness, they usually overlook efficiency aspects. We propose an innovative rank aggregation function that it is unsupervised, intrinsically multimodal, and targeted for fast retrieval and top effectiveness performance. We introduce the concepts of embedding and indexing graph-based rank-aggregation representation models, and their application for search tasks. Embedding formulations are also proposed for graph-based rank representations. We introduce the concept of {\em fusion vectors}, a late-fusion representation of objects based on ranks, from which an intrinsically rank-aggregation retrieval model is defined. Next, we present an approach for fast retrieval based on fusion vectors, thus promoting an efficient rank aggregation system. Our method presents top effectiveness performance among state-of-the-art related work, while promoting an efficiency perspective not yet covered. Consistent speedups are achieved against the recent baselines in all datasets considered. Derived from the fusion graphs and fusion vectors, we propose rank-based representation models for general prediction problems. The concepts of fusion graphs and fusion vectors are extended to prediction scenarios, where they can be used to build an estimator model to determine whether an input (even multimodal) object refers to a class or not. Performed experiments in the context of multimodal classification tasks, such as flood detection, show that the proposed solution is highly effective for different detection scenarios involving textual, visual, and multimodal features, yielding better detection results than several state-of-the-art methods. Finally, we investigate the adoption of learning approaches to help optimize the creation of rank-based representation models, in order to maximize their discriminative power and efficiency aspects in prediction and search tasksDoutoradoCiência da ComputaçãoDoutor em Ciência da Computaçã

Repositorio da Producao Cientifica e Intelectual da Unicamp

Recommended from our members

Ageneric predictive information system for resource planning and optimisation

Author: Tavakoli Siamak
Publication venue: Brunel University School of Engineering and Design PhD Theses
Publication date: 01/01/2010
Field of study

This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel UniversityThe purpose of this research work is to demonstrate the feasibility of creating a quick response decision platform for middle management in industry. It utilises the strengths of current, but more importantly creates a leap forward in the theory and practice of Supervisory and Data Acquisition (SCADA) systems and Discrete Event Simulation and Modelling (DESM). The proposed research platform uses real-time data and creates an automatic platform for real-time and predictive system analysis, giving current and ahead of time information on the performance of the system in an efficient manner. Data acquisition as the backend connection of data integration system to the shop floor faces both hardware and software challenges for coping with large scale real-time data collection. Limited scope of SCADA systems does not make them suitable candidates for this. Cost effectiveness, complexity, and efficiency-orientation of proprietary solutions leave space for more challenge. A Flexible Data Input Layer Architecture (FDILA) is proposed to address generic data integration platform so a multitude of data sources can be connected to the data processing unit. The efficiency of the proposed integration architecture lies in decentralising and distributing services between different layers. A novel Sensitivity Analysis (SA) method called EvenTracker is proposed as an effective tool to measure the importance and priority of inputs to the system. The EvenTracker method is introduced to deal with the complexity systems in real-time. The approach takes advantage of event-based definition of data involved in process flow. The underpinning logic behind EvenTracker SA method is capturing the cause-effect relationships between triggers (input variables) and events (output variables) at a specified period of time determined by an expert. The approach does not require estimating data distribution of any kind. Neither the performance model requires execution beyond the real-time. The proposed EvenTracker sensitivity analysis method has the lowest computational complexity compared with other popular sensitivity analysis methods. For proof of concept, a three tier data integration system was designed and developed by using National Instruments’ LabVIEW programming language, Rockwell Automation’s Arena simulation and modelling software, and OPC data communication software. A laboratory-based conveyor system with 29 sensors was installed to simulate a typical shop floor production line. In addition, EvenTracker SA method has been implemented on the data extracted from 28 sensors of one manufacturing line in a real factory. The experiment has resulted 14% of the input variables to be unimportant for evaluation of model outputs. The method proved a time efficiency gain of 52% on the analysis of filtered system when unimportant input variables were not sampled anymore. The EvenTracker SA method compared to Entropy-based SA technique, as the only other method that can be used for real-time purposes, is quicker, more accurate and less computationally burdensome. Additionally, theoretic estimation of computational complexity of SA methods based on both structural complexity and energy-time analysis resulted in favour of the efficiency of the proposed EvenTracker SA method. Both laboratory and factory-based experiments demonstrated flexibility and efficiency of the proposed solution.The Engineering and Physical Sciences Research Council

Brunel University Research Archive