Soft Seeded SSL Graphs for Unsupervised Semantic Similarity-based Retrieval
Semantic similarity based retrieval is playing an increasingly important role
in many IR systems such as modern web search, question-answering, similar
document retrieval etc. Improvements in retrieval of semantically similar
content are very significant to applications like Quora, Stack Overflow, Siri
etc. We propose a novel unsupervised model for semantic similarity based
content retrieval, where we construct semantic flow graphs for each query, and
introduce the concept of "soft seeding" in graph based semi-supervised learning
(SSL) to convert this into an unsupervised model.
We demonstrate the effectiveness of our model on an equivalent question
retrieval problem on the Stack Exchange QA dataset, where our unsupervised
approach significantly outperforms the state-of-the-art unsupervised models,
and produces comparable results to the best supervised models. Our research
provides a method to tackle semantic similarity based retrieval without any
training data, and allows seamless extension to different domain QA
communities, as well as to other semantic equivalence tasks. Comment: Published in Proceedings of the 2017 ACM Conference on Information
and Knowledge Management (CIKM '17)
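To make the idea of seeding a graph with real-valued scores more concrete, here is a minimal Python sketch of generic score propagation over a weighted graph. It is not the paper's algorithm: the update rule, the `alpha` parameter, the toy graph, and all names are assumptions chosen for illustration only.

```python
def propagate(adjacency, soft_seeds, alpha=0.8, iterations=50):
    """Spread real-valued seed scores over a weighted graph (illustrative only).

    adjacency:  dict mapping node -> {neighbour: non-negative edge weight}
    soft_seeds: dict mapping node -> prior score in [0, 1]; missing nodes get 0.0
    alpha:      weight on the neighbours' average; (1 - alpha) pulls each node
                back toward its own prior score
    """
    scores = {node: soft_seeds.get(node, 0.0) for node in adjacency}
    for _ in range(iterations):
        updated = {}
        for node, neighbours in adjacency.items():
            total = sum(neighbours.values())
            if total == 0:
                neighbour_avg = 0.0
            else:
                # Weighted average of the neighbours' current scores.
                neighbour_avg = sum(
                    weight * scores[n] for n, weight in neighbours.items()
                ) / total
            prior = soft_seeds.get(node, 0.0)
            updated[node] = alpha * neighbour_avg + (1 - alpha) * prior
        scores = updated
    return scores


# Hypothetical toy graph: a query node linked to candidate questions q1..q3.
graph = {
    "query": {"q1": 0.9, "q2": 0.1},
    "q1": {"query": 0.9},
    "q2": {"query": 0.1, "q3": 0.9},
    "q3": {"q2": 0.9},
}
seeds = {"query": 1.0}  # the only prior: the query is relevant to itself

scores = propagate(graph, seeds)
ranking = sorted((n for n in graph if n != "query"),
                 key=lambda n: scores[n], reverse=True)
```

In this toy setup no candidate carries a hard label; the ranking comes only from how strongly each candidate is connected to the query node.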
Smart Search Engine For Information Retrieval
This project addresses a central research problem in information retrieval and semantic search. It proposes the smart search theory, a new theory based on the hypothesis that the semantic meaning of a document can be described by a set of keywords. Two experiments designed and carried out in this project provide positive evidence in support of the smart search theory.
In the theory proposed in this project, smart search aims to determine, for any web document, a set of keywords by which the semantic meaning of the document can be uniquely identified. At the same time, the set of keywords is assumed to be small enough to be managed easily. This is the fundamental assumption for creating the smart semantic search engine. The project discusses the rationale for the assumption and the theory based on it, as well as how the theory can be applied to keyword allocation and to generating the data model. The design of the smart search engine is then proposed as a solution to the efficiency problem of searching the huge and growing amount of information published on the web.
To achieve high efficiency in web searching, statistical methods have proved effective, and they can be interpreted at the semantic level. Based on the frequency of joint keywords, a keyword list can be generated and its keywords linked to each other to form a meaning structure. A data model is built once a suitable keyword list is obtained, and the model is applied to the design of the smart search engine.
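The idea of linking keywords by how often they occur together can be sketched as follows. This is a minimal illustration, not the project's implementation: the function names, the `min_count` threshold, and the sample documents are all assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count how often each unordered keyword pair shares a document."""
    pair_counts = Counter()
    for keywords in documents:
        # Deduplicate and sort so each pair has one canonical ordering.
        unique = sorted(set(keywords))
        pair_counts.update(combinations(unique, 2))
    return pair_counts


def link_keywords(pair_counts, min_count=2):
    """Keep pairs seen at least min_count times, as an adjacency map."""
    links = {}
    for (first, second), count in pair_counts.items():
        if count >= min_count:
            links.setdefault(first, set()).add(second)
            links.setdefault(second, set()).add(first)
    return links


# Hypothetical keyword lists, one per document.
docs = [
    ["semantic", "search", "keyword"],
    ["semantic", "search", "engine"],
    ["keyword", "engine"],
]
counts = cooccurrence_counts(docs)
links = link_keywords(counts)
```

With these sample documents, only "search" and "semantic" co-occur often enough to be linked; raising or lowering `min_count` controls how dense the resulting structure becomes.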
Integrating Semantic Web with Knowledge Management Agent
This paper covers the fundamentals of a Semantic Web Service system, a kind of system
that is closely related to search engines. Integrating the Semantic Web with a knowledge
representation agent helps users reduce search time, because the agent classifies the
output of their searches using this web service. Such a system also makes it easier for
users to interact with the data and to appreciate data management and knowledge
management. The Semantic Web is important because it serves as the platform on which
knowledge management (KM) is applied to classify the data. The Semantic Web needs to be
constructed in a user-friendly environment, so that users can use it as a channel for
transferring knowledge from the user to the system. Constructing the Semantic Web
requires a framework and a language as tools; these add value to the database, which can
then query related web links in a particular manner. The Semantic Web also relies on
URIs together with RDF (Resource Description Framework): once information is in RDF
form, it becomes easy to process, since RDF is a generic format that already has many
parsers. This also satisfies the last objective of building the Semantic Web. The need
for a Semantic Web service arises from a problem users face: when they use a search
engine to obtain information, the results are returned in a general form. Moreover, most
data on the Web in this form is difficult to use on a large scale, because there is no
global system for publishing data in a way that anyone can easily process. The vision of
the Semantic Web is a web enriched with domain ontologies that specify the formal
semantics of data and that can be used by different intelligent services, such as
information search, retrieval, and transformation. The waterfall model is the chosen
methodology, as it provides flexibility in developing this Semantic Web.
User-centered semantic dataset retrieval
Finding relevant research data is an increasingly important but time-consuming task in daily research practice. Several studies report difficulties in dataset search, e.g., scholars retrieve only part of the pertinent data, and important information cannot be displayed in the user interface. Overcoming these problems has motivated a number of research efforts in computer science, such as text mining and semantic search. In particular, the emergence of the Semantic Web opens a variety of novel research perspectives. Motivated by these challenges, the overall aim of this work is to analyze the current obstacles in dataset search and to propose and develop a novel semantic dataset search. The studied domain is biodiversity research, a domain that explores the diversity of life, habitats and ecosystems. This thesis has three main contributions: (1) We evaluate the current situation in dataset search in a user study, and we compare a semantic search with a classical keyword search to explore the suitability of semantic web technologies for dataset search. (2) We generate a question corpus and develop an information model to determine which scientific topics scholars in biodiversity research are interested in. Moreover, we analyze the gap between current metadata and scholarly search interests, and we explore whether metadata and user interests match. (3) We propose and develop an improved dataset search based on three components: (A) a text mining pipeline, enriching metadata and queries with semantic categories and URIs, (B) a retrieval component with a semantic index over categories and URIs and (C) a user interface that enables a search within categories and a search including further hierarchical relations. Following user-centered design principles, we ensure user involvement in various user studies during the development process.
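A retrieval component with an index over URIs, plus an optional search that follows hierarchical relations, might look roughly like the sketch below. It is only an illustration under stated assumptions: the `SemanticIndex` class, the `ex:` URIs, the child-to-parent `broader` mapping, and the dataset identifiers are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


class SemanticIndex:
    """Toy inverted index from concept URIs to dataset identifiers."""

    def __init__(self, broader):
        # broader: child URI -> parent URI (a hypothetical concept hierarchy).
        self.broader = broader
        self.postings = defaultdict(set)

    def add(self, dataset_id, uris):
        """Register a dataset under every URI it was annotated with."""
        for uri in uris:
            self.postings[uri].add(dataset_id)

    def narrower_closure(self, uri):
        """Return uri plus every URI that has it as a direct or indirect parent."""
        result = {uri}
        changed = True
        while changed:
            changed = False
            for child, parent in self.broader.items():
                if parent in result and child not in result:
                    result.add(child)
                    changed = True
        return result

    def search(self, uri, expand=False):
        """Exact URI lookup, or lookup including narrower concepts if expand is set."""
        targets = self.narrower_closure(uri) if expand else {uri}
        hits = set()
        for target in targets:
            hits |= self.postings.get(target, set())
        return hits


# Hypothetical hierarchy and annotations.
index = SemanticIndex({"ex:Bee": "ex:Insect", "ex:Insect": "ex:Animal"})
index.add("dataset-1", ["ex:Bee"])
index.add("dataset-2", ["ex:Insect"])

exact_hits = index.search("ex:Insect")
expanded_hits = index.search("ex:Insect", expand=True)
```

Here the exact search finds only the dataset annotated with `ex:Insect`, while the expanded search also returns the dataset annotated with the narrower concept `ex:Bee`.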
A Combined Approach of Structured and Non-structured IR in Multimodal Domain
We present a generic model for multimodal information retrieval, leveraging different information sources to improve the effectiveness of a retrieval system. The proposed method is able to take into account both explicit and latent semantics present in the data and can be used to answer complex queries that currently can be answered neither by document retrieval systems nor by semantic web systems. By providing a hybrid approach combining IR and structured search techniques, we prepare a framework applicable to multimodal data collections. To test its effectiveness, we instantiate the model for an image retrieval task.
Enhanced web log based recommendation by personalized retrieval
University of Technology, Sydney. Faculty of Engineering and Information Technology. With the rapid development of the Internet and WWW, it is more and more important for people to access quality web information. Thus the problem of enabling users to quickly and accurately find information has become an urgent issue. As one of the basic ways to solve this problem, personalized information services have been focusing on fulfilling the personalized information requirements of different users based on their actual demands, preference characteristics, behaviour patterns, etc. This thesis focuses on enhancing web log based recommendation by personalized retrieval, and its main works and innovations include:
• For personalized retrieval, the thesis proposes two models to improve user experience and optimize search performance. The first is a query suggestion model based on query semantics and click-through data. This model calculates the subject relevance between queries, and then combines the semantic information and the relevance of the query-click matrix model as this can effectively eliminate the ambiguity and input errors of reminder queries. The second is a collaborative filtering retrieval model based on local and global features. By the integration of the local and global characteristics of the accessed information, this model overcomes the limitations of a single feature, and increases the degree of application of the retrieval model.
• For recommendation by personalized retrieval, we propose two recommendation models based on the web log. The first is based on the user's atomic retrieval transaction sequence and the browse characteristics. This model decomposes search transactions, and calculates the user's degree of interest on the search term, which allows users to query information more clearly. Further, it incorporates the user feedback on the search results evaluation value, which overcomes the shortcomings of the model based on content filtering. The second model is based on user interests association findings, which can be used to: find the relationship between resources accessed by users, extract the associations of user interests, and address the problem of user interests isolation
A Semantics-based User Interface Model for Content Annotation, Authoring and Exploration
The Semantic Web and Linked Data movements, which aim to create, publish and interconnect machine-readable information, have gained traction in recent years.
However, the majority of information still is contained in and exchanged using unstructured documents, such as Web pages, text documents, images and videos.
This can also not be expected to change, since text, images and videos are the natural way in which humans interact with information.
Semantic structuring of content on the other hand provides a wide range of advantages compared to unstructured information.
Semantically-enriched documents facilitate information search and retrieval, presentation, integration, reusability, interoperability and personalization.
Looking at the life-cycle of semantic content on the Web of Data, we see quite some progress on the backend side in storing structured content or for linking data and schemata.
Nevertheless, the currently least developed aspect of the semantic content life-cycle is from our point of view the user-friendly manual and semi-automatic creation of rich semantic content.
In this thesis, we propose a semantics-based user interface model, which aims to reduce the complexity of underlying technologies for semantic enrichment of content by Web users.
By surveying existing tools and approaches for semantic content authoring, we extracted a set of guidelines for designing efficient and effective semantic authoring user interfaces.
We applied these guidelines to devise a semantics-based user interface model called WYSIWYM (What You See Is What You Mean) which enables integrated authoring, visualization and exploration of unstructured and (semi-)structured content.
To assess the applicability of our proposed WYSIWYM model, we incorporated the model into four real-world use cases comprising two general and two domain-specific applications.
These use cases address four aspects of the WYSIWYM implementation:
1) Its integration into existing user interfaces,
2) Utilizing it for lightweight text analytics to incentivize users,
3) Dealing with crowdsourcing of semi-structured e-learning content,
4) Incorporating it for authoring of semantic medical prescriptions
Digital Image Representation Model Enriched with Semantic Web Technologies: Visual and Non-Visual Information
The types of content of digital images, visual (syntactic and semantic) and non-visual, make their representation complex. Considering these contents separately hinders digital image retrieval because it creates a gap between the contents of the image and its representation. Therefore, this work aims to present a representation model of the visual and non-visual information of digital images, with semantic enrichment through Semantic Web technologies. To that end, a qualitative methodology with a bibliographical approach was used. Theoretical support for the topics addressed was sought, and the work has an applied focus since it proposes a model and exemplifies it. The developed model depicts the image representation process and allows the semantic enrichment of the data. This enrichment facilitates retrieval in multiple contexts with technologies that favor the use of the data through inferences. A use case with digital medical images is also presented, demonstrating the feasibility of the proposal. It is concluded that the representation of visual and non-visual content aims to improve the way images are retrieved in digital information environments. The junction of the content and the context of images should be considered, even though search mechanisms usually treat these separately due to the disaggregation of image representation itself.
Multi-Perspective Relevance Matching with Hierarchical ConvNets for Social Media Search
Despite substantial interest in applications of neural networks to
information retrieval, neural ranking models have only been applied to standard
ad hoc retrieval tasks over web pages and newswire documents. This paper
proposes MP-HCNN (Multi-Perspective Hierarchical Convolutional Neural Network),
a novel neural ranking model specifically designed for ranking short social
media posts. We identify document length, informal language, and heterogeneous
relevance signals as features that distinguish documents in our domain, and
present a model specifically designed with these characteristics in mind. Our
model uses hierarchical convolutional layers to learn latent semantic
soft-match relevance signals at the character, word, and phrase levels. A
pooling-based similarity measurement layer integrates evidence from multiple
types of matches between the query, the social media post, as well as URLs
contained in the post. Extensive experiments using Twitter data from the TREC
Microblog Tracks 2011--2014 show that our model significantly outperforms prior
feature-based as well as existing neural ranking models. To the best of our
knowledge, this paper presents the first substantial work tackling search over
social media posts using neural ranking models. Comment: AAAI 2019, 10 pages