371 research outputs found

    Deep Learning for Logo Detection: A Survey

    Full text link
    When logos are increasingly created, logo detection has gradually become a research hotspot across many domains and tasks. Recent advances in this area are dominated by deep learning-based solutions, where many datasets, learning strategies, network architectures, etc. have been employed. This paper reviews the advance in applying deep learning techniques to logo detection. Firstly, we discuss a comprehensive account of public datasets designed to facilitate performance evaluation of logo detection algorithms, which tend to be more diverse, more challenging, and more reflective of real life. Next, we perform an in-depth analysis of the existing logo detection strategies and the strengths and weaknesses of each learning strategy. Subsequently, we summarize the applications of logo detection in various fields, from intelligent transportation and brand monitoring to copyright and trademark compliance. Finally, we analyze the potential challenges and present the future directions for the development of logo detection to complete this survey

    Confluence of Vision and Natural Language Processing for Cross-media Semantic Relations Extraction

    Get PDF
    In this dissertation, we focus on extracting and understanding semantically meaningful relationships between data items of various modalities; especially relations between images and natural language. We explore the ideas and techniques to integrate such cross-media semantic relations for machine understanding of large heterogeneous datasets, made available through the expansion of the World Wide Web. The datasets collected from social media websites, news media outlets and blogging platforms usually contain multiple modalities of data. Intelligent systems are needed to automatically make sense out of these datasets and present them in such a way that humans can find the relevant pieces of information or get a summary of the available material. Such systems have to process multiple modalities of data such as images, text, linguistic features, and structured data in reference to each other. For example, image and video search and retrieval engines are required to understand the relations between visual and textual data so that they can provide relevant answers in the form of images and videos to the users\u27 queries presented in the form of text. We emphasize the automatic extraction of semantic topics or concepts from the data available in any form such as images, free-flowing text or metadata. These semantic concepts/topics become the basis of semantic relations across heterogeneous data types, e.g., visual and textual data. A classic problem involving image-text relations is the automatic generation of textual descriptions of images. This problem is the main focus of our work. In many cases, large amount of text is associated with images. Deep exploration of linguistic features of such text is required to fully utilize the semantic information encoded in it. A news dataset involving images and news articles is an example of this scenario. We devise frameworks for automatic news image description generation based on the semantic relations of images, as well as semantic understanding of linguistic features of the news articles

    Awjedni: A Reverse-Image-Search Application

    Get PDF
    The abundance of photos on the internet, along with smartphones that could implement computer vision technologies allow for a unique way to browse the web. These technologies have potential used in many widely accessible and globally available reverse-image search applications. One of these applications is the use of reverse-image search to help people finding items which they're interested in, but they can't name it. This is where Awjedni was born. Awjedni is a reverse-image search application compatible with iOS and Android smartphones built to provide an efficient way to search millions of products on the internet using images only. Awjedni utilizes a computer vision technology through implementing multiple libraries and frameworks to process images, recognize objects, and crawl the web. Users simply upload/take a photo of a desired item and the application returns visually similar items and a direct link to the websites that sell them
    • …
    corecore