Multi-dimensional modelling for the national mapping agency: a discussion of initial ideas, considerations, and challenges
The Ordnance Survey, the National Mapping Agency (NMA) for Great Britain, has recently begun to research the possible extension of its two-dimensional geographic information into a multi-dimensional environment. Such a move creates a number of data creation and storage issues which the NMA must consider. Many of these issues are highly relevant to all NMAs and their customers alike, and they are presented and explored here.
This paper offers a discussion of the initial considerations which NMAs face in the creation of multi-dimensional datasets. These include assessing which objects a national mapping agency should map in three dimensions, what can sensibly be represented dynamically, and whether the resolution of multi-dimensional models should vary over space. The paper also offers some preliminary suggestions for the optimal creation method for any future enhanced national height model for the Ordnance Survey. This discussion includes examples of problem areas and issues in both the extraction of 3D data and its topological reconstruction. 3D feature extraction is not a new problem; however, the degree of automation that may be achieved, and the suitability of current techniques for NMAs, remain a largely uncharted research area, which this research aims to tackle. The issues presented in this paper require immediate research and, if solved adequately, would mark a cartographic paradigm shift in the communication of geographic information. They could signify the beginning of a new way in which NMAs both present geographic information to their customers and interact with them in the future.
A Survey on Deep Learning-based Architectures for Semantic Segmentation on 2D images
Semantic segmentation is the pixel-wise labelling of an image. Since the problem is defined at the pixel level, determining image-level class labels alone is not sufficient; the labels must also be localised at the original image resolution. Boosted by the extraordinary ability of convolutional neural networks (CNNs) to create semantic, high-level and hierarchical image features, a great number of deep learning-based 2D semantic segmentation approaches have been proposed within the last decade. In this survey, we focus on the recent scientific developments in semantic segmentation, specifically on deep learning-based methods using 2D images. We start with an analysis of the public image sets and leaderboards for 2D semantic segmentation, together with an overview of the techniques employed in performance evaluation. In examining the evolution of the field, we chronologically categorise the approaches into three main periods: the pre- and early deep learning era, the fully convolutional era, and the post-FCN era. We technically analyse the solutions put forward for the fundamental problems of the field, such as fine-grained localisation and scale invariance. Before drawing our conclusions, we present a table of methods from all of the mentioned eras, with a brief summary of each approach explaining its contribution to the field. We conclude the survey by discussing the current challenges of the field and the extent to which they have been solved.
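To make the per-pixel formulation concrete, the following is a minimal sketch of an FCN-style network in PyTorch; the architecture and the class count (21, as in PASCAL VOC) are illustrative assumptions, not a specific method from the survey:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TinyFCN(nn.Module):
        """Toy fully convolutional network producing per-pixel class scores.

        Illustrative only: real FCN variants use pretrained backbones and
        skip connections to recover fine-grained localisation.
        """
        def __init__(self, num_classes: int = 21):  # 21 classes is a hypothetical choice
            super().__init__()
            self.encoder = nn.Sequential(  # downsamples the input by a factor of 4
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            )
            self.classifier = nn.Conv2d(64, num_classes, 1)  # 1x1 conv = pixel-wise classifier

        def forward(self, x):
            scores = self.classifier(self.encoder(x))
            # Upsample back to the input resolution so every pixel receives a label.
            return F.interpolate(scores, size=x.shape[-2:], mode="bilinear",
                                 align_corners=False)

    scores = TinyFCN()(torch.randn(1, 3, 64, 64))  # shape: (1, 21, 64, 64)
    labels = scores.argmax(dim=1)                  # per-pixel class labels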
Extracting structured information from 2D images
Convolutional neural networks can handle an impressive array of supervised learning tasks while relying on a single backbone architecture, suggesting that one solution fits all vision problems. For many tasks, however, we can directly make use of the problem structure within neural networks to deliver more accurate predictions. In this thesis, we propose novel deep learning components that exploit the structured output space of an increasingly complex set of problems.

We start from Optical Character Recognition (OCR) in natural scenes and leverage the constraints imposed by the spatial outline of letters and the requirements of language. Conventional OCR systems do not work well in natural scenes due to distortion, blur, and letter variability. We introduce a new attention-based model, equipped with extra information about the neuron positions, that guides its focus across characters sequentially. It beats the previous state of the art by a significant margin.

We then turn to dense labeling tasks employing encoder-decoder architectures. We begin with an experimental study that documents the drastic impact decoder design can have on task performance. Rather than optimizing one decoder per task separately, we propose new robust layers for the upsampling of high-dimensional encodings, and we show that these better suit the structured per-pixel output across all tasks.

Finally, we turn to the problem of urban scene understanding. There is elaborate structure in both the input space (multi-view recordings, aerial and street-view scenes) and the output space (multiple fine-grained attributes for holistic building understanding). We design new models that benefit from the relatively simple, cuboid-like geometry of buildings to create a single unified representation from multiple views. To benchmark our model, we build a new large-scale multi-view dataset of building images and fine-grained attributes, and we show systematic improvements over a broad range of strong CNN-based baselines.
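As a concrete illustration of what a decoder upsampling layer can look like, here is a generic bilinear-upsample-plus-convolution block in PyTorch; this is a common baseline design and an assumption on our part, not the specific layers proposed in the thesis:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class UpsampleBlock(nn.Module):
        """Generic decoder block: bilinear upsampling followed by a 3x3 convolution.

        A common alternative to transposed convolutions; the layers actually
        proposed in the thesis are not reproduced here.
        """
        def __init__(self, in_channels: int, out_channels: int):
            super().__init__()
            self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)

        def forward(self, x):
            # Double the spatial resolution, then refine with a learned convolution.
            x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
            return F.relu(self.conv(x))

    # Upsampling a high-dimensional encoding from 16x16 to 32x32:
    decoded = UpsampleBlock(256, 128)(torch.randn(1, 256, 16, 16))  # -> (1, 128, 32, 32)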
Automatic supervised information extraction of structured web data
The overall purpose of this project is, in short, to create a system able to extract vital information from product web pages just as a human would: information like the name of the product, its description, its price tag, the company that produces it, and so on. At first glance this may not seem extraordinary or technically difficult, since web scraping techniques have existed for a long time (like the Python library Beautiful Soup, an HTML parser released in 2004). But let us think for a second about what it actually means to be able to extract the desired information from any given web source: the way information is displayed can be extremely varied, not only visually but also semantically. For instance, some hotel booking web pages display all prices for the different room types at once, while for medium-sized consumer products, websites like Amazon present the main product in detail and then smaller product recommendations further down the page, the latter being the preferred way of displaying assets for most retail companies. And each site comes with its own styling and search engine. With the above said, the task of mining valuable data from the web no longer sounds as easy as it first seemed. Hence, the purpose of this project is to shine some light on the problem of Automatic Supervised Information Extraction of Structured Web Data.
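To make this brittleness concrete, here is a minimal sketch of a classical rule-based scraper using Beautiful Soup; the CSS selectors are hypothetical and would only match one particular site's markup, which is precisely why a more general, learned approach is attractive:

    # Requires: pip install requests beautifulsoup4
    import requests
    from bs4 import BeautifulSoup

    def scrape_product(url: str) -> dict:
        """Naive rule-based extraction: it only works for pages whose markup
        happens to match these hand-written (hypothetical) selectors."""
        html = requests.get(url, timeout=10).text
        soup = BeautifulSoup(html, "html.parser")
        name = soup.select_one("h1.product-title")  # hypothetical selector
        price = soup.select_one("span.price")       # breaks on any other layout
        return {
            "name": name.get_text(strip=True) if name else None,
            "price": price.get_text(strip=True) if price else None,
        }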
It is important to consider whether developing such a solution is really valuable at all. Such an endeavour, in both time and computing resources, should lead to a useful end result, at least on paper, to justify it. The opinion of this author is that it does lead to a potentially valuable result. The targeted extraction of publicly available, consumer-oriented content at large scale, in an accurate, reliable and future-proof manner, could provide an incredibly useful and large amount of data. This data, if kept updated, could create endless opportunities for Business Intelligence, although exactly which ones is beyond the scope of this work. A simple metaphor explains the potential value of this work: if an oil company were told where all the oil reserves on the planet are, it would still need to invest in machinery, workers and time to successfully exploit them, but half of the job would already have been done.
As the reader will see in this work, the issue is tackled by building a somewhat complex architecture that ends in an Artificial Neural Network (NN). A quick overview of this architecture is as follows: first, find the URLs that lead to the product pages containing the desired data within a given site (like URLs that lead to "action figure" products on the site ebay.com); second, for each URL, extract its HTML and take a screenshot of the page, and store this data in a suitable and scalable fashion; third, label the data that will be fed to the NN; fourth, prepare the aforementioned data to be input into the NN; fifth, train the NN; and sixth, deploy the NN to make [hopefully accurate] predictions.
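A schematic Python sketch of this six-step pipeline is given below; every function name and body is a hypothetical placeholder for a component described above, not the project's actual implementation:

    # Hypothetical pipeline skeleton; each step stands in for a component
    # described in the text, not for the project's real code.
    import requests

    def discover_product_urls(site: str, query: str) -> list:
        """Step 1: crawl the site and collect URLs of matching product pages."""
        return []  # placeholder: a real crawler would follow search-result links

    def snapshot_page(url: str) -> dict:
        """Step 2: fetch the raw HTML; a screenshot would additionally require
        a headless browser (e.g. Selenium), omitted here for brevity."""
        return {"url": url, "html": requests.get(url, timeout=10).text}

    def label_examples(snapshots: list) -> list:
        """Step 3: attach ground-truth labels (name, price, ...) for training."""
        raise NotImplementedError  # typically a manual or semi-automatic process

    def prepare_inputs(labelled: list):
        """Step 4: turn the labelled HTML/screenshot pairs into NN-ready tensors."""
        raise NotImplementedError

    def train_model(dataset):
        """Step 5: fit the neural network on the prepared dataset."""
        raise NotImplementedError

    def predict(model, url: str) -> dict:
        """Step 6: run the deployed model on a previously unseen page."""
        raise NotImplementedError

    if __name__ == "__main__":
        urls = discover_product_urls("ebay.com", "action figure")
        dataset = prepare_inputs(label_examples([snapshot_page(u) for u in urls]))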