5 research outputs found

    SHRuB: searching through heuristics for the better query-execution plan

    An important consideration for systems that aim to integrate similarity queries into an RDBMS is how to represent and optimize query plans that involve both traditional and complex predicates. Toward such integration, we developed a technique to extract a canonical query-plan command tree from a similarity-extended SQL expression. The SHRuB tool, presented in this paper, can interactively represent a query parse tree. We developed a catalog model that allows the execution cost to be estimated and that provides hints for optimizing the query plan by adopting a three-stage heuristic. Through a case study and initial experiments, we demonstrate that the tool is able to find a local-minimum query-execution plan. Moreover, SHRuB can be plugged into existing frameworks that support similarity queries or employed as courseware for database teaching.
    FAPESP, CNPq, CAPE

    Multi-Variate Time Series Similarity Measures and Their Robustness Against Temporal Asynchrony

    The amount of time series data generated is increasing due to the integration of sensor technologies with everyday applications such as gesture recognition, energy optimization, health care, and video surveillance. The simultaneous use of multiple sensors to capture different aspects of real-world attributes has also led to an increase in dimensionality, from uni-variate to multi-variate time series. This has enabled richer data representation but has also created a need for algorithms that determine the similarity between two multi-variate time series for search and analysis. Various algorithms have been extended from the uni-variate to the multi-variate case, such as multi-variate versions of Euclidean distance, edit distance, and dynamic time warping. However, it has not been studied how these algorithms account for asynchrony in time series. Human gestures, for example, exhibit asynchrony in their patterns, as different subjects perform the same gesture with varying movements and at different speeds. In this thesis, we propose several algorithms (some of which also leverage metadata describing the relationships among the variates). In particular, we present several techniques that leverage the contextual relationships among the variates when measuring multi-variate time series similarities. Based on the way correlation is leveraged, various weighting mechanisms have been proposed that determine the importance of a dimension for discriminating between time series, since giving the same weight to each dimension can lead to misclassification. We next study the robustness of the considered techniques against different temporal asynchronies, including shifts and stretching. Exhaustive experiments were carried out on datasets with multiple types and amounts of temporal asynchrony. We observed that the accuracy of algorithms that rely on data to discover variate relationships can be low in the presence of temporal asynchrony, whereas algorithms that rely on external metadata tend to be more robust against asynchronous distortions. Specifically, algorithms using external metadata achieve better classification accuracy and cluster separation than existing state-of-the-art work, such as EROS, PCA, and naive dynamic time warping.
    Dissertation/Thesis. Masters Thesis, Computer Science, 201
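As a rough illustration of why warping-based measures tolerate temporal shifts better than lock-step comparison, the sketch below implements a plain "dependent" multi-variate dynamic time warping, in which one warping path is shared by all variates. It is not one of the thesis's proposed algorithms; the helper names (`point_dist`, `lockstep`, `dtw`, `make_series`) and the synthetic two-variate sinusoid are assumptions made for the example.

```python
import math

def point_dist(p, q):
    """Euclidean distance between two observation vectors."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def lockstep(a, b):
    """Sum of distances between observations at the same time index."""
    return sum(point_dist(p, q) for p, q in zip(a, b))

def dtw(a, b):
    """Dependent multi-variate DTW: a single warping path for all variates."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = point_dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def make_series(shift, length=60):
    """Synthetic two-variate series (hypothetical data): a shifted sine/cosine pair."""
    step = 2 * math.pi / (length - 1)
    return [(math.sin(k * step - shift), math.cos(k * step - shift))
            for k in range(length)]

base = make_series(0.0)
shifted = make_series(0.5)  # same shape, delayed in time

print("lock-step cost:", lockstep(base, shifted))
print("DTW cost:      ", dtw(base, shifted))
```

For equal-length series the diagonal is itself a valid warping path, so the DTW cost can never exceed the lock-step cost; on shifted data it is typically much smaller.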

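    Effectiveness of similarity measures for the classification of time series associated with the phenological behavior of plants (original title: Eficácia de medidas de similaridade para a classificação de séries temporais associadas ao comportamento fenológico de plantas)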

    Advisors: Luiz Camolesi Júnior, Ricardo da Silva Torres. Dissertation (master's) - Universidade Estadual de Campinas, Faculdade de Tecnologia.
    Abstract: Phenology is the study of periodic natural phenomena and their relationship to climate. In recent years, it has gained importance as the simplest and most reliable indicator of the effects of climate change on plants and animals. In this context, we highlight e-phenology, a multidisciplinary research project involving computer science and phenology. Its main characteristics are the use of new technologies for environmental monitoring; the provision of models, methods, and algorithms to support the management, integration, and remote analysis of phenology data; and the creation of a protocol for a phenology monitoring program. From the computer science point of view, the research seeks models, tools, and techniques based on image processing, extracting and indexing image features associated with different types of vegetation, and also focuses on data management, data mining, and time series processing. This work specifically aims to investigate the effectiveness of similarity measures for the classification of time series associated with phenological phenomena characterized by feature vectors extracted from vegetation images. The experiments considered regions of vegetation images and different evaluation criteria: plant species, time of day, and color channels. The results allow several kinds of analysis; overall, the Edit Distance with Real Penalty (ERP) distance measure yielded the highest accuracy, at 29.90%. Additionally, the results show that the early morning and late afternoon, probably due to light conditions, yield the highest accuracy rates for all analysis views.
    Master's (Mestrado). Technology and Innovation (Tecnologia e Inovação). Mestre em Tecnologi
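For readers unfamiliar with the Edit Distance with Real Penalty (ERP) measure named in this abstract, the following is a minimal sketch of its usual dynamic-programming formulation for uni-variate series: aligning two values costs their absolute difference, and leaving a value unmatched costs its distance to a constant gap value (commonly 0). The dissertation works with feature vectors extracted from images, so this scalar version, the function name `erp`, and the toy inputs are simplifying assumptions rather than the author's implementation.

```python
def erp(r, s, gap=0.0):
    """Edit Distance with Real Penalty between two numeric sequences."""
    n, m = len(r), len(s)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    # Aligning a prefix against the empty sequence: every value meets the gap.
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + abs(r[i - 1] - gap)
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + abs(s[j - 1] - gap)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + abs(r[i - 1] - s[j - 1]),  # match r[i-1] with s[j-1]
                d[i - 1][j] + abs(r[i - 1] - gap),           # r[i-1] left unmatched
                d[i][j - 1] + abs(s[j - 1] - gap),           # s[j-1] left unmatched
            )
    return d[n][m]

# Toy sequences (made up for illustration, not data from the dissertation).
print(erp([1, 2, 3], [1, 2, 3]))
print(erp([1, 2, 3], [1, 3]))
```

Because unmatched values are penalized against a fixed reference rather than against a neighbouring sample, ERP can compare series of different lengths while still behaving as a metric, which is one reason it is often chosen for nearest-neighbour classification.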

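    A proposal for executing complex queries on a large, horizontally fragmented image database (original title: Uma proposta para execução de consultas complexas em uma grande base de dados de imagens horizontalmente fragmentada)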

    Dissertation (master's) - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Ciência da Computação, Florianópolis, 2014.
    Abstract: Information retrieval systems are growing in popularity and efficiency. However, the retrieval of complex data (e.g., images, videos, time series) still presents huge challenges, particularly when it involves content similarity. The problem becomes even more intricate if the search condition includes conventional predicates logically connected to similarity-based predicates. The optimization of such queries remains an open problem. This work validates a proposal for improving the performance of queries that can be expressed as conjunctions of conventional predicates and similarity-based predicates. The proposal employs data fragmentation according to diverse predicates that are compatible with the predicates used in queries. The proposal is validated on a large image database named CoPhIR, which includes conventional data associated with the images. This database is handled in a relational database system with extensions for similarity-based predicates; it is characterized according to the distribution of its contents, fragmented, and indexed with conventional and metric access methods. The experiments show that some queries with conjunctive filtering clauses were executed more efficiently on the proposed fragments than on the complete database.
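The idea of answering a conjunctive query on a fragment instead of the complete database can be sketched as follows. The example is only a toy model under stated assumptions: the rows, the `license` attribute, the one-dimensional `feature`, and the helpers `fragment_by` and `range_query` are all hypothetical, and a similarity range predicate stands in for whatever predicates the dissertation actually evaluates on CoPhIR.

```python
from collections import defaultdict

def fragment_by(rows, key):
    """Horizontally fragment rows by the value of a conventional attribute."""
    fragments = defaultdict(list)
    for row in rows:
        fragments[row[key]].append(row)
    return fragments

def range_query(rows, query, radius, counter):
    """Similarity range predicate; counts how many distances are computed."""
    hits = []
    for row in rows:
        counter["distance_calls"] += 1
        if abs(row["feature"] - query) <= radius:
            hits.append(row)
    return hits

# Hypothetical image metadata with a 1-D stand-in for a visual descriptor.
images = [
    {"id": 1, "license": "cc", "feature": 0.10},
    {"id": 2, "license": "cc", "feature": 0.45},
    {"id": 3, "license": "all-rights", "feature": 0.12},
    {"id": 4, "license": "all-rights", "feature": 0.80},
    {"id": 5, "license": "cc", "feature": 0.15},
]

# Query: license = 'cc' AND distance(feature, 0.1) <= 0.1
full = {"distance_calls": 0}
on_full = [r["id"] for r in range_query(images, 0.1, 0.1, full)
           if r["license"] == "cc"]

frag = {"distance_calls": 0}
cc_fragment = fragment_by(images, "license")["cc"]
on_fragment = [r["id"] for r in range_query(cc_fragment, 0.1, 0.1, frag)]

print(on_full, full["distance_calls"])
print(on_fragment, frag["distance_calls"])
```

When the fragmentation predicate matches the conventional predicate of the query, the fragment already satisfies that conjunct, so the expensive distance computations are spent only on candidate rows.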

    Algebraic Properties to Optimize kNN Queries

    International audience. New applications that require Database Management Systems (DBMSs), such as storing and retrieving complex data (images, sound, time series, genetic data, etc.) and analytical data processing (data mining, social network analysis, etc.), increasingly impose the need for new ways of expressing predicates. Among the most studied new predicates are the similarity-based ones, of which the two most common are the similarity range and the k-nearest neighbor predicates. The k-nearest neighbor predicate is the most interesting for several applications, including Content-Based Image Retrieval (CBIR) and Data Mining (DM) tasks, yet it is also the most expensive to evaluate. A strong motivation for including operators that execute the k-nearest neighbor predicate inside a DBMS is to exploit query rewriting based on algebraic properties to optimize query execution. Unfortunately, too few properties of the k-nearest neighbor operator have been identified so far that allow rewriting rules leading to effectively more efficient query execution. In fact, a k-nearest neighbor operator does not even commute with another k-nearest neighbor operator or with any other attribute comparison operator (similarity range or any of the traditional attribute comparison operators). In this paper, we investigate a new class of properties for the k-nearest neighbor operator based not on expression equivalence but on result set inclusion. We develop a complete set of properties based on set inclusion, which can be successfully employed to rewrite query expressions involving k-nearest neighbor operators combined with any of the traditional attribute comparison operators or with other k-nearest neighbor and similarity range operators. We also give examples of how applying those properties to rewrite queries improves retrieval efficiency.
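The non-commutativity mentioned in this abstract can be seen on a small example. The sketch below applies a k-nearest neighbor operator and a traditional selection in both orders and compares the result sets. The table, the `knn` and `select` helpers, and the one-dimensional distance are assumptions made for illustration; the inclusion checked at the end is a simple observation that holds here (and in general when all distances are distinct), and it is not claimed to be one of the paper's own properties.

```python
def knn(rows, query, k):
    """k rows whose 'feature' is closest to the query (1-D absolute distance)."""
    return sorted(rows, key=lambda r: abs(r["feature"] - query))[:k]

def select(rows, predicate):
    """Traditional selection on a conventional attribute."""
    return [r for r in rows if predicate(r)]

# Hypothetical table; distances to the query are all distinct.
rows = [
    {"id": 1, "feature": 1.0, "year": 2010},
    {"id": 2, "feature": 2.0, "year": 2014},
    {"id": 3, "feature": 3.5, "year": 2014},
    {"id": 4, "feature": 5.0, "year": 2010},
    {"id": 5, "feature": 9.0, "year": 2014},
]

def is_2014(row):
    return row["year"] == 2014

query, k = 0.0, 2

knn_then_filter = {r["id"] for r in select(knn(rows, query, k), is_2014)}
filter_then_knn = {r["id"] for r in knn(select(rows, is_2014), query, k)}

print("kNN, then selection:", sorted(knn_then_filter))
print("selection, then kNN:", sorted(filter_then_knn))
print("results equal:      ", knn_then_filter == filter_then_knn)
print("first is a subset:  ", knn_then_filter <= filter_then_knn)
```

The two orders return different sets, so the operators cannot be swapped as an equivalence-preserving rewrite, yet one result is contained in the other, which is the kind of relationship that inclusion-based rewriting rules build on.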