
    Component-based Attention for Large-scale Trademark Retrieval

    The demand for large-scale trademark retrieval (TR) systems has increased significantly to combat the rise in international trademark infringement. Unfortunately, the ranking accuracy of current approaches using either hand-crafted or pre-trained deep convolutional neural network (DCNN) features is inadequate for large-scale deployments. We show in this paper that the ranking accuracy of TR systems can be significantly improved by incorporating hard and soft attention mechanisms, which direct attention to critical information such as figurative elements and reduce the attention given to distracting and uninformative elements such as text and background. Our proposed approach achieves state-of-the-art results on a challenging large-scale trademark dataset. Comment: Fix typos related to authors' information.
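    The soft-attention idea above can be sketched as a weighted pooling of per-region features, where a softmax over attention scores emphasizes informative regions. This is a minimal illustration, not the paper's actual architecture; the region features and scores are toy values.

```python
import math

def soft_attention_pool(features, scores):
    """Pool per-region feature vectors into one descriptor, weighting each
    region by a softmax over its attention score so that high-scoring
    (e.g. figurative) regions dominate and distractors are down-weighted."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]   # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features))
              for d in range(dim)]
    return pooled, weights

# Three regions: a figurative element (high score) and two distractors.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
scores = [3.0, 0.0, 0.0]
pooled, weights = soft_attention_pool(regions, scores)
```

The pooled descriptor is dominated by the high-scoring region, which is the effect the paper exploits to suppress text and background.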

    Open Set Logo Detection and Retrieval

    Current logo retrieval research focuses on closed set scenarios. We argue that the logo domain is too large for this strategy and requires an open set approach. To foster research in this direction, a large-scale logo dataset, called Logos in the Wild, is collected and released to the public. A typical open set logo retrieval application is, for example, assessing the effectiveness of advertisement in sports event broadcasts. Given a query sample in the shape of a logo image, the task is to find all further occurrences of this logo in a set of images or videos. Common logo retrieval approaches are currently unsuitable for this task because of their closed world assumption. Thus, an open set logo retrieval method is proposed in this work which allows searching for previously unseen logos by a single query sample. A two-stage concept with separate logo detection and comparison is proposed, where both modules are based on task-specific CNNs. When trained with the Logos in the Wild data, significant performance improvements are observed, especially compared with state-of-the-art closed set approaches. Comment: accepted at VISAPP 201
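    The comparison stage of such a two-stage pipeline can be sketched as embedding similarity: detected logo regions are matched against the query embedding, so previously unseen logos can still be retrieved. A minimal sketch under assumed toy embeddings; the CNN embedding networks themselves are omitted.

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def open_set_retrieve(query_emb, detection_embs, threshold=0.8):
    """Stage 2 of the two-stage concept: compare the query logo embedding
    against embeddings of detected logo regions. Matching by similarity
    rather than a fixed class list allows unseen logos to be found from
    a single query sample."""
    return [i for i, d in enumerate(detection_embs)
            if cosine(query_emb, d) >= threshold]

query = [1.0, 0.0]
detections = [[0.9, 0.1], [0.0, 1.0], [1.0, 0.05]]
hits = open_set_retrieve(query, detections)
```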

    A Binary Neural Shape Matcher using Johnson Counters and Chain Codes

    In this paper, we introduce a neural network-based shape matching algorithm that uses Johnson Counter codes coupled with chain codes. Shape matching is a fundamental requirement in content-based image retrieval systems. Chain codes describe shapes using sequences of numbers; they are simple and flexible. We couple this power with the efficiency and flexibility of a binary associative-memory neural network. We focus on the implementation details of the algorithm when it is constructed using the neural network, and demonstrate how the binary associative-memory neural network can index and match chain codes where the chain code elements are represented by Johnson codes.
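    The two encodings coupled above are both simple to illustrate. A Freeman chain code records each step along a shape boundary as a direction index 0..7, and a Johnson (twisted-ring) counter code gives each such index a binary pattern. This sketch uses the standard conventions, which may differ in detail from the paper's implementation.

```python
# 8-connected Freeman chain code: each step between consecutive boundary
# points maps to a direction index 0..7 (0 = +x, counter-clockwise).
DIRS = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
        (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}

def chain_code(boundary):
    """Chain code of a closed boundary given as a list of lattice points."""
    return [DIRS[(x1 - x0, y1 - y0)]
            for (x0, y0), (x1, y1) in zip(boundary, boundary[1:])]

def johnson_code(value, n_bits):
    """Johnson (twisted-ring) counter state for `value` in 0..2*n_bits-1:
    ones fill in from the left, then drain from the left
    (e.g. 3 bits: 000, 100, 110, 111, 011, 001)."""
    bits = [0] * n_bits
    if value <= n_bits:
        bits[:value] = [1] * value
    else:
        ones = 2 * n_bits - value
        bits[n_bits - ones:] = [1] * ones
    return bits

# Unit square traversed counter-clockwise from the origin.
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
code = chain_code(square)                      # direction per boundary step
encoded = [johnson_code(c, 4) for c in code]   # 4-bit Johnson code per element
```

A 4-bit Johnson counter has 8 states, so it covers all eight chain-code directions, which is presumably why the two pair naturally.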

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Based on the information provided by European projects and national initiatives related to multimedia search, as well as by domain experts who participated in the CHORUS think-tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and socio-economic perspective. The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives that measure the performance of multimedia search engines. From a socio-economic perspective, we inventory the impact and legal consequences of these technical advances and point out future directions of research.

    Learning Test-time Data Augmentation for Image Retrieval with Reinforcement Learning

    Off-the-shelf convolutional neural network features achieve outstanding results in many image retrieval tasks. However, their invariance is pre-defined by the network architecture and training data. Existing image retrieval approaches require fine-tuning or modification of the pre-trained networks to adapt to the variations in the target data. In contrast, our method enhances the invariance of off-the-shelf features by aggregating features extracted from images augmented with learned test-time augmentations. The optimal ensemble of test-time augmentations is learned automatically through reinforcement learning. Our training is time- and resource-efficient, and learns a diverse set of test-time augmentations. Experimental results on trademark retrieval (METU trademark dataset) and landmark retrieval (Oxford5k and Paris6k scene datasets) tasks show that the learned ensemble of transformations is effective and transferable. We also achieve state-of-the-art mAP@100 results on the METU trademark dataset.
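    The aggregation step described above can be sketched simply: descriptors extracted from augmented copies of an image are L2-normalized, averaged, and renormalized. This is a common aggregation scheme assumed here for illustration; the learned augmentation policy itself is not shown.

```python
import math

def l2_normalize(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def aggregate_tta(descriptors):
    """Average L2-normalized descriptors extracted from augmented copies
    of an image, then renormalize — one way an ensemble of test-time
    augmentations can add invariance to off-the-shelf features."""
    normed = [l2_normalize(d) for d in descriptors]
    dim = len(normed[0])
    mean = [sum(d[i] for d in normed) / len(normed) for i in range(dim)]
    return l2_normalize(mean)

# Descriptors from, e.g., the original image and a flipped copy.
desc = aggregate_tta([[2.0, 0.0], [0.0, 1.0]])
```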

    Named Entity Recognition in Twitter using Images and Text

    Named Entity Recognition (NER) is an important subtask of information extraction that seeks to locate and recognise named entities. Despite recent achievements, we still face limitations in correctly detecting and classifying entities, particularly in short and noisy text such as tweets. An important drawback of most NER approaches is their high dependency on hand-crafted features and domain-specific knowledge, necessary to achieve state-of-the-art results. Thus, devising models to deal with such linguistically complex contexts remains challenging. In this paper, we propose a novel multi-level architecture that does not rely on any specific linguistic resource or encoded rule. Unlike traditional approaches, we use features extracted from images and text to classify named entities. Experimental tests against state-of-the-art NER for Twitter on the Ritter dataset present competitive results (0.59 F-measure), indicating that this approach may lead towards better NER models. Comment: The 3rd International Workshop on Natural Language Processing for Informal Text (NLPIT 2017), 8 pages.
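    The core multimodal idea — classifying an entity from both textual and visual evidence — can be sketched as feature fusion by concatenation. This is a generic fusion sketch with hypothetical toy embeddings, not the paper's exact multi-level architecture.

```python
def fuse_token_features(text_feat, image_feat):
    """Concatenate a token's textual features with features of images
    attached to the tweet, so a downstream classifier sees both
    modalities. (A sketch of multimodal fusion, not the paper's
    exact architecture.)"""
    return text_feat + image_feat

# Hypothetical 3-dim text embedding and 2-dim image embedding.
fused = fuse_token_features([0.1, 0.2, 0.3], [0.9, 0.4])
```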

    A high performance k-NN approach using binary neural networks

    This paper evaluates a novel k-nearest neighbour (k-NN) classifier built from binary neural networks. The binary neural approach uses robust encoding to map standard ordinal, categorical and numeric data sets onto a binary neural network. The binary neural network uses high-speed pattern matching to recall a candidate set of matching records, which are then processed by a conventional k-NN approach to determine the k best matches. We compare various configurations of the binary approach to a conventional approach for memory overheads, training speed, retrieval speed and retrieval accuracy. We demonstrate the superior speed and memory performance of the binary approach compared to the standard approach, and we pinpoint the optimal configurations. (C) 2003 Elsevier Ltd. All rights reserved.
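    The two-stage structure described above — a fast binary recall stage followed by an exact k-NN ranking of the recalled candidates — can be sketched as follows. Hamming distance stands in here for the binary network's pattern matching; the record contents are illustrative.

```python
def hamming(a, b):
    """Number of differing bit positions between two binary patterns."""
    return sum(x != y for x, y in zip(a, b))

def binary_knn(query_bits, query_vec, records, k=2, max_hamming=1):
    """Two-stage k-NN: a fast binary (Hamming) match recalls a candidate
    set, then a conventional exact distance ranks the candidates."""
    candidates = [r for r in records
                  if hamming(query_bits, r["bits"]) <= max_hamming]
    candidates.sort(key=lambda r: sum((q - x) ** 2
                                      for q, x in zip(query_vec, r["vec"])))
    return [r["name"] for r in candidates[:k]]

records = [
    {"name": "a", "bits": [1, 0, 1, 0], "vec": [1.0, 1.0]},
    {"name": "b", "bits": [1, 0, 1, 1], "vec": [0.0, 0.0]},
    {"name": "c", "bits": [0, 1, 0, 1], "vec": [1.0, 0.9]},  # rejected by stage 1
]
best = binary_knn([1, 0, 1, 0], [0.1, 0.1], records)
```

The speed advantage comes from stage 1 discarding most records cheaply, so the exact distance in stage 2 runs only over a small candidate set.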

    Using CNNs in the domain of Visual Trademark Retrieval

    Nowadays we are immersed in the age of Artificial Intelligence, and it seems that every application must be developed following this trend. Despite this boom, however, the technique is still uncommon in the domain of visual trademark retrieval. This thesis therefore proposes a tool to help the people in charge of indexing and classifying incoming visual trademarks, supporting them with suggestions that follow a standard classification. The project covers the end-to-end process of a deep learning classification task: downloading and extracting a dataset for training and testing, analysing the state of the art, and evaluating the results on our own database. As this field is constantly developing, we cannot predict what the future will bring. However, using deep learning in this scenario may lead to more consistent labelling and classification of visual trademarks compared with the results of the manual process.
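    The assistive setting described above — suggesting classes to a human examiner rather than deciding for them — can be sketched as returning the top-k classes by model score. The class names below are hypothetical placeholders, not the standard classification's actual categories.

```python
def suggest_labels(class_scores, k=3):
    """Return the top-k classes as suggestions for the human examiner,
    rather than a single hard decision. `class_scores` maps class name
    to the classifier's confidence."""
    ranked = sorted(class_scores.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]

# Hypothetical classifier output for one trademark image.
scores = {"circles": 0.61, "stars": 0.22, "animals": 0.11, "plants": 0.06}
suggestions = suggest_labels(scores)
```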

    Detecting Pronunciation Similarity of Word Trademarks Using a CNN-Siamese Network

    Master's thesis -- Department of Industrial Engineering, College of Engineering, Seoul National University Graduate School, August 2022. Advisor: 조성준 (Sungzoon Cho). Recently, as the number of registered trademarks has rapidly increased, research on judging trademark similarity with machine learning has been actively conducted. The similarity of trademarks is judged on the basis of shape, meaning, and pronunciation. In the case of pronunciation, judging similarity is difficult because the standards for similarity are ambiguous and spelling often does not correspond to pronunciation. On the other hand, the performance of converting text into speech has improved remarkably with the recent development of speech synthesis technology. In this paper, we propose a deep learning framework that automatically determines the pronunciation similarity of trademarks using speech data converted with speech synthesis technology. First, the trademark text is synthesized into speech and converted into a log-Mel spectrogram, and feature learning is performed with a convolutional neural network trained with a triplet loss. To compare the proposed method with previous studies, the trademark text dataset provided by AIhub was used, and our proposed method showed superior performance to the previous studies. Contents: Chapter 1 Introduction; Chapter 2 Related Work; Chapter 3 Proposed Method (3.1 Model Architecture, 3.2 Evaluation Metric); Chapter 4 Datasets (4.1 Train dataset, 4.2 Test dataset, 4.3 Speech dataset, 4.4 Preprocessing); Chapter 5 Experimental Results (5.1 Compare different input types, 5.2 Compare signal processing methods, 5.3 Compare backbone networks, 5.4 Compare baseline models); Chapter 6 Conclusion; Bibliography; Abstract in Korean; Acknowledgements.
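    The triplet loss used for feature learning above has a compact standard form: it pulls the anchor embedding toward a positive (similar-sounding trademark) and pushes it away from a negative by at least a margin. The toy 2-d vectors below stand in for CNN features of log-Mel spectrograms; this sketches the standard loss, not the thesis's exact hyperparameters.

```python
def squared_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss on embedding vectors: zero once the
    anchor-negative distance exceeds the anchor-positive distance
    by at least `margin`."""
    return max(squared_dist(anchor, positive)
               - squared_dist(anchor, negative) + margin, 0.0)

# Easy triplet: positive already close, negative already far -> zero loss.
loss_easy = triplet_loss([0.0, 0.0], [0.1, 0.0], [1.0, 1.0])
# Hard triplet: negative closer than positive -> large loss.
loss_hard = triplet_loss([0.0, 0.0], [1.0, 0.0], [0.1, 0.0])
```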