7,360 research outputs found

Using Incomplete Information for Complete Weight Annotation of Road Networks -- Extended Version

Authors: Jensen Christian S.; Kaul Manohar; Yang Bin
Publication date: 01/01/2013

We are witnessing increasing interest in the effective use of road networks. For example, to enable effective vehicle routing, weighted-graph models of transportation networks are used, where the weight of an edge captures some cost associated with traversing the edge, e.g., greenhouse gas (GHG) emissions or travel time. It is a precondition to using a graph model for routing that all edges have weights. Weights that capture travel times and GHG emissions can be extracted from GPS trajectory data collected from the network. However, GPS trajectory data typically lack the coverage needed to assign weights to all edges. This paper formulates and addresses the problem of annotating all edges in a road network with travel-cost-based weights from a set of trips in the network that cover only a small fraction of the edges, each with an associated ground-truth travel cost. A general framework is proposed to solve the problem. Specifically, the problem is modeled as a regression problem and solved by minimizing a judiciously designed objective function that takes into account the topology of the road network. In particular, the use of weighted PageRank values of edges is explored for assigning appropriate weights to all edges, and the property of directional adjacency of edges is also taken into account to assign weights. Empirical studies with weights capturing travel time and GHG emissions on two road networks (Skagen, Denmark, and North Jutland, Denmark) offer insight into the design properties of the proposed techniques and offer evidence that the techniques are effective.

Comment: This is an extended version of "Using Incomplete Information for Complete Weight Annotation of Road Networks," which is accepted for publication in IEEE TKD

arXiv.org e-Print Archive

CiteSeerX

VBN

Multi modal multi-semantic image retrieval

Author: Kesorn Kraisak
Publication date: 01/01/2010

PhD

The rapid growth in the volume of visual information, e.g. images and video, can overwhelm users’ ability to find and access the specific visual information of interest to them. In recent years, ontology knowledge-based (KB) image information retrieval techniques have been adopted in an attempt to extract knowledge from these images and enhance retrieval performance. A KB framework is presented to promote semi-automatic annotation and semantic image retrieval using multimodal cues (visual features and text captions). In addition, a hierarchical structure for the KB allows metadata to be shared that supports multi-semantics (polysemy) for concepts. The framework builds up an effective knowledge base pertaining to a domain-specific image collection, e.g. sports, and is able to disambiguate and assign high-level semantics to ‘unannotated’ images. Local feature analysis of visual content, namely using Scale Invariant Feature Transform (SIFT) descriptors, has been deployed in the ‘Bag of Visual Words’ (BVW) model as an effective method to represent visual content information and to enhance its classification and retrieval. Local features are more useful than global features, e.g. colour, shape or texture, as they are invariant to image scale, orientation and camera angle. An innovative approach is proposed for the representation, annotation and retrieval of visual content using a hybrid technique based upon the use of an unstructured visual word model and upon a (structured) hierarchical ontology KB model. The structural model facilitates the disambiguation of unstructured visual words and a more effective classification of visual content, compared to a vector space model, through exploiting local conceptual structures and their relationships.
The key contributions of this framework in using local features for image representation are as follows. First, a method to generate visual words using the semantic local adaptive clustering (SLAC) algorithm, which takes term weight and spatial locations of keypoints into account, so that semantic information is preserved. Second, a technique to detect the domain-specific ‘non-informative visual words’, which are ineffective at representing the content of visual data and degrade its categorisation ability. Third, a method to combine an ontology model with a visual word model to resolve synonym (visual heterogeneity) and polysemy problems. The experimental results show that this approach can discover semantically meaningful visual content descriptions and efficiently recognise specific events, e.g., sports events, depicted in images. Since discovering the semantics of an image is an extremely challenging problem, one promising approach to enhance visual content interpretation is to use any textual information that accompanies an image as a cue to predict its meaning, by transforming this textual information into a structured annotation for the image, e.g. using XML, RDF, OWL or MPEG-7. Although text and image are distinct types of information representation and modality, there are some strong, invariant, implicit connections between images and any accompanying text information. Semantic analysis of image captions can be used by image retrieval systems to retrieve selected images more precisely. To do this, Natural Language Processing (NLP) is first used to extract concepts from image captions. Next, an ontology-based knowledge model is deployed to resolve natural language ambiguities. To deal with the accompanying text information, two methods to extract knowledge from textual information have been proposed.
First, metadata can be extracted automatically from text captions and restructured with respect to a semantic model. Second, the use of LSI in relation to a domain-specific ontology-based knowledge model enables the combined framework to tolerate ambiguities and variations (incompleteness) of metadata. The use of the ontology-based knowledge model allows the system to find indirectly relevant concepts in image captions and thus leverage these to represent the semantics of images at a higher level. Experimental results show that the proposed framework significantly enhances image retrieval and narrows the semantic gap between lower-level machine-derived and higher-level human-understandable conceptualisation.

Queen Mary Research Online

DEEP FULLY RESIDUAL CONVOLUTIONAL NEURAL NETWORK FOR SEMANTIC IMAGE SEGMENTATION

Author: Tousi Ali
Publication venue: Graduate School of UNIST
Publication date: 01/08/2018

Department of Computer Science and Engineering

The goal of semantic image segmentation is to partition the pixels of an image into semantically meaningful parts and classify those parts according to a predefined label set. Although object recognition models have recently achieved remarkable performance, even surpassing humans' ability to recognize objects, semantic segmentation models still lag behind. One of the reasons semantic segmentation is a relatively hard problem is that it requires understanding the image at the pixel level while considering global context, as opposed to object recognition. Another challenge is transferring the knowledge of an object recognition model to the task of semantic segmentation. In this thesis, we delineate some of the main challenges we faced in approaching semantic image segmentation with machine learning algorithms. Our main focus was how to use deep learning algorithms for this task, since they require the least feature engineering and have been shown to scale to large datasets with remarkable performance. More precisely, we worked on a variation of convolutional neural networks (CNN) suitable for the semantic segmentation task. We proposed a model called deep fully residual convolutional networks (DFRCN) to tackle this problem. Utilizing residual learning makes training of deep models feasible, which ultimately leads to a rich, powerful visual representation. Our model also benefits from skip-connections, which ease the propagation of information from the encoder module to the decoder module. This enables our model to have fewer parameters in the decoder module while also achieving better performance. We also benchmarked the effective variation of the proposed model on a semantic segmentation benchmark.
We first make a thorough review of current high-performance models and the problems one might face when trying to replicate such models, which mainly arise from a lack of sufficient published information. Then, we describe our own novel method, which we call the deep fully residual convolutional network (DFRCN). We show that our method exhibits state-of-the-art performance on a challenging benchmark for aerial image segmentation.

ScholarWorks@UNIST

Deteção de veículos e edifícios em imagens aéreas obtidas por drone [Detection of vehicles and buildings in aerial images obtained by drone]

Author: Amante Rita Filipa dos Santos
Publication date: 21/07/2022

The need to develop software for analyzing aerial images captured by Unmanned Aerial Vehicles has increased over the years as their use has become more prevalent in different day-to-day scenarios. Object detection, a Computer Vision technique, is one of the most explored problems in this area and consists of identifying and locating objects in images or videos with the help of Artificial Intelligence technologies. The aim of this dissertation is to analyze the performance of Deep Learning algorithms for detecting vehicles and buildings in aerial images. Two of the main algorithms described in the literature, Faster R-CNN and YOLO, the latter in its third and fifth versions, were chosen to verify which one performs better. A dataset provided by the Portuguese Military Academy, which was annotated and pre-processed, was used for training and testing each algorithm. The results obtained on this dataset demonstrate that there is a considerable discrepancy between the two algorithms, both in terms of performance and detection time. Faster R-CNN proved superior to the two versions of YOLO only in terms of training time, as it was the algorithm that required the least time for training. Among the versions of YOLO, the fifth version showed the best results.

Master's degree in Computer and Telematics Engineering

Repositório Institucional da Universidade de Aveiro

Learning Contextualized Semantics from Co-occurring Terms via a Siamese Architecture

Authors: Chen Ke; Sandouk Ubai
Publication date: 17/06/2015

One of the biggest challenges in multimedia information retrieval and understanding is to bridge the semantic gap by properly modeling concept semantics in context. The presence of out-of-vocabulary (OOV) concepts exacerbates this difficulty. To address the semantic gap, we formulate the problem of learning contextualized semantics from descriptive terms and propose a novel Siamese architecture to model them. By means of pattern aggregation and probabilistic topic models, our Siamese architecture captures contextualized semantics from the co-occurring descriptive terms via unsupervised learning, which leads to a concept embedding space of the terms in context. Furthermore, the co-occurring OOV concepts can be easily represented in the learnt concept embedding space. The main properties of the concept embedding space are demonstrated via visualization. Using various settings in semantic priming, we have carried out a thorough evaluation by comparing our approach to a number of state-of-the-art methods on six annotation corpora in different domains, i.e., MagTag5K, CAL500 and Million Song Dataset in the music domain as well as Corel5K, LabelMe and SUNDatabase in the image domain. Experimental results on semantic priming suggest that our approach outperforms those state-of-the-art methods considerably in various aspects.

arXiv.org e-Print Archive

Crossref

The University of Manchester - Institutional Repository

'Part'ly first among equals: Semantic part-based benchmarking for state-of-the-art object recognition systems

Authors: A Borji; A Khosla; A Torralba; CW Tyler; D Hoiem; GA Miller; L Xu; M Everingham; MD Zeiler; PF Felzenszwalb; SE Palmer; T Tommasi; T-Y Lin
Publication date: 24/11/2016

An examination of object recognition challenge leaderboards (ILSVRC, PASCAL-VOC) reveals that the top-performing classifiers typically exhibit small differences amongst themselves in terms of error rate/mAP. To better differentiate the top performers, additional criteria are required. Moreover, the (test) images on which the performance scores are based predominantly contain fully visible objects. Therefore, 'harder' test images, mimicking the challenging conditions (e.g. occlusion) in which humans routinely recognize objects, need to be utilized for benchmarking. To address these concerns, we make two contributions. First, we systematically vary the level of local object-part content, global detail and spatial context in images from PASCAL VOC 2010 to create a new benchmarking dataset dubbed PPSS-12. Second, we propose an object-part based benchmarking procedure which quantifies classifiers' robustness to a range of visibility and contextual settings. The benchmarking procedure relies on a semantic similarity measure that naturally addresses potential semantic granularity differences between the category labels in training and test datasets, thus eliminating manual mapping. We use our procedure on the PPSS-12 dataset to benchmark top-performing classifiers trained on the ILSVRC-2012 dataset. Our results show that the proposed benchmarking procedure enables additional differentiation among state-of-the-art object classifiers in terms of their ability to handle missing content and insufficient object detail. Given this capability for additional differentiation, our approach can potentially supplement existing benchmarking procedures used in object recognition challenge leaderboards.

Comment: Extended version of our ACCV-2016 paper. Author formatting modified.

arXiv.org e-Print Archive

Crossref

Learning Contextualized Semantics from Co-occurring Terms via a Siamese Architecture

Authors: Chen Ke; Sandouk Ubai
Publication venue: 'Elsevier BV'
Publication date: 01/04/2016

The University of Manchester - Institutional Repository

Features for Killer Apps from a Semantic Web Perspective

Authors: Alani Harith; Kalfoglou Yannis; O'Hara Kieron; Shadbolt Nigel
Publication venue: Information Science Reference
Publication date: 01/01/2008

There are certain features that distinguish killer apps from other, ordinary applications. This chapter examines those features in the context of the semantic web, in the hope that a better understanding of the characteristics of killer apps might encourage their consideration when developing semantic web applications. Killer apps are highly transformative technologies that create new e-commerce venues and widespread patterns of behaviour. Information technology in general, and the Web in particular, have benefited from killer apps to create new networks of users and increase their value. The semantic web community, on the other hand, is still awaiting a killer app that proves the superiority of its technologies. The authors hope that this chapter will help to highlight some of the common ingredients of killer apps in e-commerce, and discuss how such applications might emerge in the semantic web.

Southampton (e-Prints Soton)

Open Research Online (The Open University)