13,065 research outputs found
Toward Entity-Aware Search
As the Web has evolved into a data-rich repository, current search engines, with their standard "page view," are becoming increasingly inadequate for a wide range of query tasks. While we often search for various data "entities" (e.g., phone number, paper PDF, date), today's engines only take us indirectly to pages. In my Ph.D. study, we focus on a novel type of Web search that is aware of data entities inside pages, a significant departure from traditional document retrieval. We study the essential aspects of supporting entity-aware Web search. To begin with, we tackle the core challenge of ranking entities by distilling its underlying conceptual model, the Impression Model, and developing a probabilistic ranking framework, EntityRank, that seamlessly integrates both local and global information in ranking. We also report a prototype system built to show the initial promise of the proposal. Then, we aim at distilling and abstracting the essential computation requirements of entity search. From the dual views of reasoning (entity as input and entity as output), we propose a dual-inversion framework, with two indexing and partition schemes, towards efficient and scalable query processing. Further, to recognize more entity instances, we study the problem of entity synonym discovery through mining query log data. The results we have obtained so far show the clear promise of entity-aware search in its usefulness, effectiveness, efficiency, and scalability.
The NASA Astrophysics Data System: Architecture
The powerful discovery capabilities available in the ADS bibliographic
services are possible thanks to the design of a flexible search and retrieval
system based on a relational database model. Bibliographic records are stored
as a corpus of structured documents containing fielded data and metadata, while
discipline-specific knowledge is segregated in a set of files independent of
the bibliographic data itself.
The creation and management of links to both internal and external resources
associated with each bibliography in the database is made possible by
representing them as a set of document properties and their attributes.
To improve global access to the ADS data holdings, a number of mirror sites
have been created by cloning the database contents and software on a variety of
hardware and software platforms.
The procedures used to create and manage the database and its mirrors have
been written as a set of scripts that can be run in either an interactive or
unsupervised fashion.
The ADS can be accessed at http://adswww.harvard.edu
Comment: 25 pages, 8 figures, 3 tables
Personalized Fuzzy Text Search Using Interest Prediction and Word Vectorization
In this paper we study the personalized text search problem. The keyword-based
search method in conventional algorithms has low efficiency in understanding
users' intentions, since the semantic meaning, user profile, and user interests
are not always considered. Firstly, we propose a novel text search algorithm
using an inverse filtering mechanism that is very efficient for label-based
item search. Secondly, we adopt a Bayesian network to implement user interest
prediction for improved personalized search. Given the user input, the system
searches for related items using keyword information and the predicted user
interest. Thirdly, word vectorization is used to discover potential targets
according to their semantic meaning. Experimental results show that the
proposed search engine has improved efficiency and accuracy and can operate on
embedded devices with very limited computational resources.
The NASA Astrophysics Data System: The Search Engine and its User Interface
The ADS Abstract and Article Services provide access to the astronomical
literature through the World Wide Web (WWW). The forms-based user interface
provides access to sophisticated searching capabilities that allow our users to
find references in the fields of Astronomy, Physics/Geophysics, and
astronomical Instrumentation and Engineering. The returned information includes
links to other on-line information sources, creating an extensive astronomical
digital library. Other interfaces to the ADS databases provide direct access to
the ADS data to allow developers of other data systems to integrate our data
into their system.
The search engine is a custom-built software system that is specifically
tailored to search astronomical references. It includes an extensive synonym
list that contains discipline-specific knowledge about search term
equivalences.
Search request logs show the usage pattern of the various search system
capabilities. Access logs show the world-wide distribution of ADS users.
The ADS can be accessed at http://adswww.harvard.edu
Comment: 23 pages, 18 figures, 11 tables
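As a rough illustration of the synonym list mentioned in the abstract above, the following Python sketch expands each query word into its set of equivalent terms before matching. The synonym groups, function names, and matching rule are all hypothetical; the abstract does not describe how the ADS search engine actually implements this.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical synonym groups; the real ADS synonym list is not shown in the abstract.
SYNONYM_GROUPS: List[FrozenSet[str]] = [
    frozenset({"galaxy", "galaxies", "extragalactic"}),
    frozenset({"star", "stars", "stellar"}),
]


def build_lookup(groups: List[FrozenSet[str]]) -> Dict[str, FrozenSet[str]]:
    """Map every term to the group of terms considered equivalent to it."""
    lookup: Dict[str, FrozenSet[str]] = {}
    for group in groups:
        for term in group:
            lookup[term] = group
    return lookup


def expand_query(query: str, lookup: Dict[str, FrozenSet[str]]) -> List[Set[str]]:
    """Return one set of acceptable terms per query word."""
    return [set(lookup.get(word, {word})) for word in query.lower().split()]


def matches(document: str, expanded: List[Set[str]]) -> bool:
    """A document matches if it contains at least one alternative for every query word."""
    words = set(document.lower().split())
    return all(words & alternatives for alternatives in expanded)
```

With this sketch, a query for "stellar galaxy" would also match a record containing "stars" and "galaxies", which is the kind of term equivalence the abstract refers to.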
Enriching ontological user profiles with tagging history for multi-domain recommendations
Many advanced recommendation frameworks employ ontologies of various complexities to model individuals and items, providing a mechanism for the expression of user interests and the representation of item attributes. As a result, complex matching techniques can be applied to support individuals in the discovery of items according to explicit and implicit user preferences. Recently, the rapid adoption of Web2.0 and the proliferation of social networking sites have resulted in more and more users providing an increasing amount of information about themselves that could be exploited for recommendation purposes. However, the unification of personal information with ontologies using the contemporary knowledge representation methods often associated with Web2.0 applications, such as community tagging, is a non-trivial task. In this paper, we propose a method for the unification of tags with ontologies by grounding tags to a shared representation in the form of Wordnet and Wikipedia. We incorporate individuals' tagging history into their ontological profiles by matching tags with ontology concepts. This approach is preliminarily evaluated by extending an existing news recommendation system with user tagging histories harvested from popular social networking sites.
A Configurable Matchmaking Framework for Electronic Marketplaces
E-marketplaces constitute a major enabler of B2B and B2C e-commerce activities. This paper proposes a framework for one of the central activities of e-marketplaces: matchmaking of trading intentions lodged by market participants. The framework identifies a core set of concepts and functions that are common to all types of marketplaces and can serve as the basis for describing the distinct styles of matchmaking employed within various market mechanisms. A prototype implementation of the framework based on Web services technology is presented, illustrating its ability to be dynamically configured to meet specific market needs and its potential to serve as a foundation for more fully fledged e-marketplace frameworks
Facets and Typed Relations as Tools for Reasoning Processes in Information Retrieval
Faceted arrangement of entities and typed relations for representing
different associations between the entities are established tools in knowledge
representation. In this paper, we discuss a proposal that combines both
tools to draw inferences along relational paths. This approach may yield new
benefits for information retrieval processes, especially when modeled for
heterogeneous environments in the Semantic Web. Faceted arrangement can be used
as a selection tool for the semantic knowledge modeled within the knowledge
representation. Typed relations between the entities of different facets can
be used as restrictions for selecting them across the facets.
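The idea of selecting entities by facet and restricting the selection through typed relations can be sketched in Python as follows. This is only a toy illustration under stated assumptions: the facet names, entities, relation types, and both helper functions are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical facets: each facet groups entities of one kind.
FACETS: Dict[str, Set[str]] = {
    "Person": {"Ada", "Grace"},
    "Document": {"Notes", "Manual"},
    "Topic": {"Computation", "Compilers"},
}

# Hypothetical typed relations as (subject, relation type, object) triples.
RELATIONS: List[Tuple[str, str, str]] = [
    ("Ada", "authorOf", "Notes"),
    ("Grace", "authorOf", "Manual"),
    ("Notes", "hasSubject", "Computation"),
    ("Manual", "hasSubject", "Compilers"),
]


def follow(start: Set[str], relation_type: str, target_facet: str) -> Set[str]:
    """Select entities of target_facet reachable from start via one typed relation."""
    targets = FACETS[target_facet]
    return {
        obj
        for subj, rel, obj in RELATIONS
        if subj in start and rel == relation_type and obj in targets
    }


def infer_along_path(start: Set[str], path: List[Tuple[str, str]]) -> Set[str]:
    """Chain typed relations across facets; each step restricts the next selection."""
    current = set(start)
    for relation_type, target_facet in path:
        current = follow(current, relation_type, target_facet)
    return current
```

For example, starting from the entity "Ada" in the Person facet and following `authorOf` into the Document facet and then `hasSubject` into the Topic facet selects the topics of the documents that Ada authored.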
Oblivion: Mitigating Privacy Leaks by Controlling the Discoverability of Online Information
Search engines are the prevalently used tools to collect information about
individuals on the Internet. Search results typically comprise a variety of
sources that contain personal information -- either intentionally released by
the person herself, or unintentionally leaked or published by third parties,
often with detrimental effects on the individual's privacy. To grant
individuals the ability to regain control over their disseminated personal
information, the European Court of Justice recently ruled that EU citizens have
a right to be forgotten, in the sense that indexing systems must offer them
technical means to request removal of links from search results that point to
sources violating their data protection rights. As of now, these technical
means consist of a web form that requires a user to manually identify all
relevant links upfront and to insert them into the web form, followed by a
manual evaluation by employees of the indexing system to assess if the request
is eligible and lawful.
We propose a universal framework, Oblivion, to support the automation of the
right to be forgotten in a scalable, provable and privacy-preserving manner.
First, Oblivion enables a user to automatically find and tag her disseminated
personal information using natural language processing and image recognition
techniques and file a request in a privacy-preserving manner. Second, Oblivion
provides indexing systems with an automated and provable eligibility mechanism,
asserting that the author of a request is indeed affected by an online
resource. The automated eligibility proof ensures censorship-resistance so that
only legitimately affected individuals can request the removal of corresponding
links from search results. We have conducted comprehensive evaluations, showing
that Oblivion is capable of handling 278 removal requests per second, and is
hence suitable for large-scale deployment.
PowerAqua: fishing the semantic web
The Semantic Web (SW) offers an opportunity to develop novel, sophisticated forms of question answering (QA). Specifically, the availability of distributed semantic markup on a large scale opens the way to QA systems which can make use of such semantic information to provide precise, formally derived answers to questions. At the same time the distributed, heterogeneous, large-scale nature of the semantic information introduces significant challenges. In this paper we describe the design of a QA system, PowerAqua, designed to exploit semantic markup on the web to provide answers to questions posed in natural language. PowerAqua does not assume that the user has any prior information about the semantic resources. The system takes as input a natural language query, translates it into a set of logical queries, which are then answered by consulting and aggregating information derived from multiple heterogeneous semantic sources