Hybrid Information Retrieval Model For Web Images
The Big Bang of the Internet in the early 90's dramatically increased the
number of images being distributed and shared over the web. As a result, image
information retrieval systems were developed to index and retrieve image files
spread over the Internet. Most of these systems are keyword-based: they search
for images based on their textual metadata, and are thus imprecise because
human language describes an image only vaguely. There are also
content-based image retrieval systems, which search for images based on their
visual information. However, content-based systems are still immature and
not very effective, as they suffer from low retrieval recall and precision.
This paper proposes a new hybrid image information retrieval model for indexing
and retrieving web images published in HTML documents. The distinguishing mark
of the proposed model is that it is based on both graphical content and textual
metadata. The graphical content is denoted by color features and color
histogram of the image; while textual metadata are denoted by the terms that
surround the image in the HTML document, more particularly, the terms that
appear in the tags p, h1, and h2, in addition to the terms that appear in the
image's alt attribute, filename, and class-label. Moreover, this paper presents
a new term weighting scheme called VTF-IDF, short for Variable Term
Frequency-Inverse Document Frequency, which, unlike traditional schemes,
exploits the HTML tag structure and assigns an extra bonus weight to terms
that appear within particular HTML tags that are correlated with the
semantics of the image. Experiments conducted to evaluate the proposed IR model
showed a high retrieval precision rate that outpaced other current models.
Comment: LACSC - Lebanese Association for Computational Sciences,
http://www.lacsc.org/; International Journal of Computer Science & Emerging
Technologies (IJCSET), Vol. 3, No. 1, February 201
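The tag-aware weighting idea described in the abstract above can be illustrated with a minimal sketch. The abstract does not give the VTF-IDF formula, so everything concrete here is an assumption: the bonus values in `TAG_BONUS`, the function names, and the choice of a plain logarithmic IDF are hypothetical and are not the authors' definition.

```python
import math
from collections import Counter

# Hypothetical bonus weights per HTML source; the abstract gives no values.
TAG_BONUS = {"alt": 3.0, "h1": 2.0, "h2": 1.5, "p": 1.0}


def variable_tf(tagged_terms):
    """Sum a tag-dependent weight for every occurrence of a term.

    tagged_terms is a list of (term, tag) pairs collected around one image.
    """
    tf = Counter()
    for term, tag in tagged_terms:
        tf[term] += TAG_BONUS.get(tag, 1.0)
    return tf


def vtf_idf(docs):
    """Return one {term: weight} dict per document (one document per image)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        # Document frequency counts each term once per document.
        df.update({term for term, _ in doc})
    weights = []
    for doc in docs:
        tf = variable_tf(doc)
        weights.append({t: f * math.log(n / df[t]) for t, f in tf.items()})
    return weights


docs = [
    [("cat", "alt"), ("cat", "p"), ("photo", "p")],
    [("dog", "h1"), ("photo", "p")],
]
weights = vtf_idf(docs)
```

In this sketch a term in the `alt` attribute contributes more to the term frequency than the same term in a paragraph, while a term present in every document still receives a weight of zero from the IDF factor.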
Efficient Reorganisation of Hybrid Index Structures Supporting Multimedia Search Criteria
This thesis describes the development and setup of hybrid index structures. They are access methods for retrieval techniques in hybrid data spaces which are formed by one or more relational or normalised columns in conjunction with one non-relational or non-normalised column. Examples for these hybrid data spaces are, among others, textual data combined with geographical ones or data from enterprise content management systems. However, all non-relational data types may be stored as well as image feature vectors or comparable types.
Hybrid index structures are known to function efficiently regarding retrieval operations. Unfortunately, little information is available about reorganisation operations which insert or update the row tuples. The fundamental research is mainly executed in simulation based environments. This work is written ensuing from a previous thesis that implements hybrid access structures in realistic database surroundings. During this implementation it has become obvious that retrieval works efficiently. Yet, the restructuring approaches require too much effort to be set up, e.g., in web search engine environments where several thousands of documents are inserted or modified every day. These search engines rely on relational database systems as storage backends. Hence, the setup of these access methods for hybrid data spaces is required in real world database management systems.
This thesis tries to apply a systematic approach for the optimisation of the rearrangement algorithms inside realistic scenarios. Thus, a measurement and evaluation scheme is created which is repeatedly deployed to an evolving state and a model of hybrid index structures in order to optimise the regrouping algorithms to make a setup of hybrid index structures in real world information systems possible. Thus, a set of input corpora is selected which is applied to the test suite as well as an evaluation scheme.
In summary, this thesis describes input sets, a test suite including an evaluation scheme, and optimisation iterations on reorganisation algorithms that reflect a theoretical model framework, in order to provide efficient reorganisation of hybrid index structures supporting multimedia search criteria.
Applying semantic web technologies to knowledge sharing in aerospace engineering
This paper details an integrated methodology to optimise Knowledge reuse and sharing, illustrated with a use case in the aeronautics domain. It uses Ontologies as a central modelling strategy for the Capture of Knowledge from legacy documents via automated means, or directly in systems interfacing with Knowledge workers, via user-defined, web-based forms. The domain ontologies used for Knowledge Capture also guide the retrieval of the Knowledge extracted from the data using a Semantic Search System that provides support for multiple modalities during search. This approach has been applied and evaluated successfully within the aerospace domain, and is currently being extended for use in other domains on an increasingly large scale.
On-Demand Big Data Integration: A Hybrid ETL Approach for Reproducible Scientific Research
Scientific research requires access, analysis, and sharing of data that is
distributed across various heterogeneous data sources at the scale of the
Internet. An eager ETL process constructs an integrated data repository as its
first step, integrating and loading data in its entirety from the data sources.
The bootstrapping of this process is not efficient for scientific research that
requires access to data from very large and typically numerous distributed data
sources. A lazy ETL process loads only the metadata, but still eagerly. Lazy
ETL is faster in bootstrapping. However, queries on the integrated data
repository of eager ETL perform faster, due to the availability of the entire
data beforehand.
In this paper, we propose a novel ETL approach for scientific data
integration, as a hybrid of eager and lazy ETL approaches, applied to both
data and metadata. This way, Hybrid ETL supports incremental integration
and loading of metadata and data from the data sources. We incorporate a
human-in-the-loop approach, to enhance the hybrid ETL, with selective data
integration driven by the user queries and sharing of integrated data between
users. We implement our hybrid ETL approach in a prototype platform, Obidos,
and evaluate it in the context of data sharing for medical research. Obidos
outperforms both the eager ETL and lazy ETL approaches, for scientific research
data integration and sharing, through its selective loading of data and
metadata, while storing the integrated data in a scalable integrated data
repository.
Comment: Pre-print Submitted to the DMAH Special Issue of the Springer DAPD
Journa
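The contrast between eager and lazy loading in the abstract above can be sketched as follows. This is not the Obidos implementation or its API: the class `HybridLoader`, its methods, and the in-memory cache are hypothetical, and serve only to show a chosen subset being loaded up front while the rest is fetched on first query and then shared.

```python
class HybridLoader:
    """Toy loader: eager for a chosen subset, lazy (query-driven) for the rest."""

    def __init__(self, sources):
        # sources maps a source name to a zero-argument callable that
        # fetches its records (a stand-in for a remote data source).
        self._sources = sources
        self._cache = {}
        self.fetch_count = 0

    def preload(self, names):
        """Eager step: load the named sources before any query arrives."""
        for name in names:
            self.get(name)

    def get(self, name):
        """Lazy step: fetch a source on first use, then reuse the cached copy."""
        if name not in self._cache:
            self._cache[name] = self._sources[name]()
            self.fetch_count += 1
        return self._cache[name]


sources = {"clinical": lambda: [1, 2], "imaging": lambda: [3]}
loader = HybridLoader(sources)
loader.preload(["clinical"])          # eager for one source only
imaging = loader.get("imaging")       # loaded on demand by a query
```

Because the cache is shared, a second user asking for the same source would reuse the integrated copy rather than trigger another fetch, which is the sharing behaviour the abstract mentions in general terms.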
Reverse spatial visual top-k query
With the wide application of mobile Internet techniques and location-based services (LBS), massive multimedia data with geo-tags has been generated and collected. In this paper, we investigate a novel type of spatial query problem, named reverse spatial visual top-k query (RSVQk), that aims to retrieve a set of geo-images that have the query as one of the most relevant geo-images in both geographical proximity and visual similarity. Existing approaches for reverse top-k queries are not suitable to address this problem because they cannot effectively process unstructured data, such as images. To this end, we first propose the definition of the RSVQk problem and introduce the similarity measurement. A novel hybrid index, named VR²-Tree, is designed, which is a combination of a visual representation of geo-images and the R-Tree. Besides, an extension of VR²-Tree, called CVR²-Tree, is introduced; we then discuss the calculation of lower/upper bounds and propose an optimization technique via CVR²-Tree for further pruning. In addition, a search algorithm named the RSVQk algorithm is developed to support efficient RSVQk queries. Comprehensive experiments are conducted on four geo-image datasets, and the results illustrate that our approach can address the RSVQk problem effectively and efficiently.
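The query semantics described above can be made concrete with a brute-force sketch. It is not the VR²-Tree or the authors' RSVQk algorithm: it scans every image, and the combined score (a weighted sum of Euclidean distances with a hypothetical weight `alpha`), the dictionary layout, and the function names are all assumptions for illustration.

```python
import math


def combined_distance(a, b, alpha=0.5):
    """Weighted sum of geographic and visual distance (assumed scoring rule)."""
    geo = math.dist(a["loc"], b["loc"])
    vis = math.dist(a["feat"], b["feat"])
    return alpha * geo + (1 - alpha) * vis


def reverse_top_k(query, images, k, alpha=0.5):
    """Return indices of images that would rank the query among their top k."""
    result = []
    for i, img in enumerate(images):
        d_query = combined_distance(img, query, alpha)
        # Count dataset images strictly closer to img than the query is.
        closer = sum(
            1
            for j, other in enumerate(images)
            if j != i and combined_distance(img, other, alpha) < d_query
        )
        if closer < k:
            result.append(i)
    return result


images = [
    {"loc": (0.0, 0.0), "feat": (0.0,)},
    {"loc": (1.0, 0.0), "feat": (0.0,)},
    {"loc": (10.0, 0.0), "feat": (0.0,)},
]
query = {"loc": (0.5, 0.0), "feat": (0.0,)}
nearest_fans = reverse_top_k(query, images, k=1)
```

The scan costs quadratic time in the number of images, which is the kind of cost that an index with lower and upper bounds for pruning is meant to avoid.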
Collaborative Mapping of London Using Google Maps: The LondonProfiler
This paper begins by reviewing the ways in which the innovation of Google Maps has transformed our ability to reference and view geographically referenced data. We describe the ways in which the GMap Creator tool developed under the ESRC National Centre for e-Social Science programme enables users to ‘mashup’ thematic choropleth maps using the Google API. We illustrate the application of GMap Creator using the example of www.londonprofiler.org, which makes it possible to view a range of health, education and other socioeconomic datasets against a backcloth of Google Maps data. Our conclusions address the ways in which Google Map mashups developed using GMap Creator facilitate online exploratory cartographic visualisation in a range of areas of policy concern.
An Universal Image Attractiveness Ranking Framework
We propose a new framework to rank image attractiveness using a novel
pairwise deep network trained with a large set of side-by-side multi-labeled
image pairs from a web image index. The judges only provide relative ranking
between two images without the need to directly assign an absolute score, or
rate any predefined image attribute, thus making the rating more intuitive and
accurate. We investigate a deep attractiveness rank net (DARN), a combination
of deep convolutional neural network and rank net, to directly learn an
attractiveness score mean and variance for each image and the underlying
criteria the judges use to label each pair. The extension of this model
(DARN-V2) is able to adapt to an individual judge's personal preference. We also
show the attractiveness of search results is significantly improved by using
this attractiveness information in a real commercial search engine. We evaluate
our model against other state-of-the-art models on our side-by-side web test
data and another public aesthetic data set. With far fewer judgments (1M vs
50M), our model outperforms on side-by-side labeled data, and is comparable on
data labeled by absolute score.
Comment: Accepted by 2019 Winter Conference on Application of Computer Vision
(WACV
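One way to see how a per-image score mean and variance can explain side-by-side judgments is the sketch below. It is an assumption and not the DARN model itself: it treats the two scores as independent Gaussians and computes the probability that one image is preferred, using a hypothetical function name and parameters.

```python
import math


def preference_probability(mean_a, var_a, mean_b, var_b):
    """P(score_a > score_b) for independent Gaussian scores.

    The difference of the two scores is Gaussian with mean (mean_a - mean_b)
    and variance (var_a + var_b); the variances must not both be zero.
    """
    z = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


p_ab = preference_probability(0.8, 0.04, 0.5, 0.05)
p_ba = preference_probability(0.5, 0.05, 0.8, 0.04)
```

Under this assumption a larger variance pulls the probability toward 0.5, so uncertain images produce less confident pairwise predictions.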