15,250 research outputs found
Utilising semantic technologies for intelligent indexing and retrieval of digital images
The proliferation of digital media has led to a huge interest in classifying and indexing media objects for generic search and usage. In particular, we are witnessing colossal growth in digital image repositories that are difficult to navigate using free-text search mechanisms, which often return inaccurate matches as they in principle rely on statistical analysis of query keyword recurrence in the image annotation or surrounding text. In this paper we present a semantically-enabled image annotation and retrieval engine that is designed to satisfy the requirements of the commercial image collections market in terms of both accuracy and efficiency of the retrieval process. Our search engine relies on methodically structured ontologies for image annotation, thus allowing for more intelligent reasoning about the image content and subsequently obtaining a more accurate set of results and a richer set of alternatives matchmaking the original query. We also show how our well-analysed and designed domain ontology contributes to the implicit expansion of user queries as well as the exploitation of lexical databases for explicit semantic-based query expansion
Recommended from our members
The quest for information retrieval on the semantic web
Semantic search has been one of the motivations of the Semantic Web since it was envisioned. We propose a model for the exploitation of ontology-based KBs to improve search over large document repositories. The retrieval model is based on an adaptation of the classic vector-space model, including an annotation weighting algorithm, and a ranking algorithm. Semantic search is combined with keyword-based search to achieve tolerance to KB incompleteness. Our proposal has been tested on corpora of significant size, showing promising results with respect to keyword-based search, and providing ground for further analysis and research
Automatic tagging and geotagging in video collections and communities
Automatically generated tags and geotags hold great promise
to improve access to video collections and online communi-
ties. We overview three tasks offered in the MediaEval 2010
benchmarking initiative, for each, describing its use scenario, definition and the data set released. For each task, a reference algorithm is presented that was used within MediaEval 2010 and comments are included on lessons learned. The Tagging Task, Professional involves automatically matching episodes in a collection of Dutch television with subject labels drawn from the keyword thesaurus used by the archive staff. The Tagging Task, Wild Wild Web involves automatically predicting the tags that are assigned by users to their online videos. Finally, the Placing Task requires automatically assigning geo-coordinates to videos. The specification of each task admits the use of the full range of available information including user-generated metadata, speech recognition transcripts, audio, and visual features
REST: A Thread Embedding Approach for Identifying and Classifying User-specified Information in Security Forums
How can we extract useful information from a security forum? We focus on
identifying threads of interest to a security professional: (a) alerts of
worrisome events, such as attacks, (b) offering of malicious services and
products, (c) hacking information to perform malicious acts, and (d) useful
security-related experiences. The analysis of security forums is in its infancy
despite several promising recent works. Novel approaches are needed to address
the challenges in this domain: (a) the difficulty in specifying the "topics" of
interest efficiently, and (b) the unstructured and informal nature of the text.
We propose, REST, a systematic methodology to: (a) identify threads of interest
based on a, possibly incomplete, bag of words, and (b) classify them into one
of the four classes above. The key novelty of the work is a multi-step weighted
embedding approach: we project words, threads and classes in appropriate
embedding spaces and establish relevance and similarity there. We evaluate our
method with real data from three security forums with a total of 164k posts and
21K threads. First, REST robustness to initial keyword selection can extend the
user-provided keyword set and thus, it can recover from missing keywords.
Second, REST categorizes the threads into the classes of interest with superior
accuracy compared to five other methods: REST exhibits an accuracy between
63.3-76.9%. We see our approach as a first step for harnessing the wealth of
information of online forums in a user-friendly way, since the user can loosely
specify her keywords of interest
Recommended from our members
REST: A thread embedding approach for identifying and classifying user-specified information in security forums
Genie: A Generator of Natural Language Semantic Parsers for Virtual Assistant Commands
To understand diverse natural language commands, virtual assistants today are
trained with numerous labor-intensive, manually annotated sentences. This paper
presents a methodology and the Genie toolkit that can handle new compound
commands with significantly less manual effort. We advocate formalizing the
capability of virtual assistants with a Virtual Assistant Programming Language
(VAPL) and using a neural semantic parser to translate natural language into
VAPL code. Genie needs only a small realistic set of input sentences for
validating the neural model. Developers write templates to synthesize data;
Genie uses crowdsourced paraphrases and data augmentation, along with the
synthesized data, to train a semantic parser. We also propose design principles
that make VAPL languages amenable to natural language translation. We apply
these principles to revise ThingTalk, the language used by the Almond virtual
assistant. We use Genie to build the first semantic parser that can support
compound virtual assistants commands with unquoted free-form parameters. Genie
achieves a 62% accuracy on realistic user inputs. We demonstrate Genie's
generality by showing a 19% and 31% improvement over the previous state of the
art on a music skill, aggregate functions, and access control.Comment: To appear in PLDI 201
Science Models as Value-Added Services for Scholarly Information Systems
The paper introduces scholarly Information Retrieval (IR) as a further
dimension that should be considered in the science modeling debate. The IR use
case is seen as a validation model of the adequacy of science models in
representing and predicting structure and dynamics in science. Particular
conceptualizations of scholarly activity and structures in science are used as
value-added search services to improve retrieval quality: a co-word model
depicting the cognitive structure of a field (used for query expansion), the
Bradford law of information concentration, and a model of co-authorship
networks (both used for re-ranking search results). An evaluation of the
retrieval quality when science model driven services are used turned out that
the models proposed actually provide beneficial effects to retrieval quality.
From an IR perspective, the models studied are therefore verified as expressive
conceptualizations of central phenomena in science. Thus, it could be shown
that the IR perspective can significantly contribute to a better understanding
of scholarly structures and activities.Comment: 26 pages, to appear in Scientometric
- …