13,065 research outputs found

    Toward Entity-Aware Search

    As the Web has evolved into a data-rich repository, with the standard "page view," current search engines are becoming increasingly inadequate for a wide range of query tasks. While we often search for various data "entities" (e.g., phone number, paper PDF, date), today's engines only take us indirectly to pages. In my Ph.D. study, we focus on a novel type of Web search that is aware of data entities inside pages, a significant departure from traditional document retrieval. We study the various essential aspects of supporting entity-aware Web search. To begin with, we tackle the core challenge of ranking entities by distilling its underlying conceptual model, the Impression Model, and developing a probabilistic ranking framework, EntityRank, that is able to seamlessly integrate both local and global information in ranking. We also report a prototype system built to show the initial promise of the proposal. Then, we aim at distilling and abstracting the essential computation requirements of entity search. From the dual views of reasoning (entity as input and entity as output), we propose a dual-inversion framework, with two indexing and partition schemes, towards efficient and scalable query processing. Further, to recognize more entity instances, we study the problem of entity synonym discovery through mining query log data. The results obtained so far have shown the clear promise of entity-aware search in its usefulness, effectiveness, efficiency and scalability.

    The NASA Astrophysics Data System: Architecture

    The powerful discovery capabilities available in the ADS bibliographic services are possible thanks to the design of a flexible search and retrieval system based on a relational database model. Bibliographic records are stored as a corpus of structured documents containing fielded data and metadata, while discipline-specific knowledge is segregated in a set of files independent of the bibliographic data itself. The creation and management of links to both internal and external resources associated with each bibliography in the database is made possible by representing them as a set of document properties and their attributes. To improve global access to the ADS data holdings, a number of mirror sites have been created by cloning the database contents and software on a variety of hardware and software platforms. The procedures used to create and manage the database and its mirrors have been written as a set of scripts that can be run in either an interactive or unsupervised fashion. The ADS can be accessed at http://adswww.harvard.edu. Comment: 25 pages, 8 figures, 3 tables

    Personalized Fuzzy Text Search Using Interest Prediction and Word Vectorization

    In this paper we study the personalized text search problem. The keyword-based search method in conventional algorithms has low efficiency in understanding users' intentions, since the semantic meaning, user profile, and user interests are not always considered. Firstly, we propose a novel text search algorithm using an inverse filtering mechanism that is very efficient for label-based item search. Secondly, we adopt the Bayesian network to implement user interest prediction for an improved personalized search. According to user input, it searches the related items using keyword information and predicted user interest. Thirdly, word vectorization is used to discover potential targets according to the semantic meaning. Experimental results show that the proposed search engine has improved efficiency and accuracy and that it can operate on embedded devices with very limited computational resources.
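
    The abstract does not spell out how the inverse filtering mechanism works. As a loosely related illustration only, the sketch below shows a standard inverted index for label-based item search, which is a swapped-in textbook technique and not necessarily the authors' method. The function names `build_index` and `search` and the sample item data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_index(items: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each lowercased label to the set of item ids carrying it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for item_id, labels in items.items():
        for label in labels:
            index[label.lower()].add(item_id)
    return index


def search(index: Dict[str, Set[str]], labels: Iterable[str]) -> List[str]:
    """Return the sorted ids of items that carry every requested label."""
    postings = [index.get(label.lower(), set()) for label in labels]
    if not postings:
        return []
    # Intersect the smallest posting set first to keep intermediate sets small.
    postings.sort(key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
    return sorted(result)


# Hypothetical sample data, for illustration only.
ITEMS = {
    "a": ["news", "sports"],
    "b": ["news", "music"],
    "c": ["sports"],
}
INDEX = build_index(ITEMS)
```

    With this index, a lookup touches only the posting sets of the requested labels rather than scanning every item, which is the usual reason such structures suit devices with limited resources.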

    The NASA Astrophysics Data System: The Search Engine and its User Interface

    The ADS Abstract and Article Services provide access to the astronomical literature through the World Wide Web (WWW). The forms-based user interface provides access to sophisticated searching capabilities that allow our users to find references in the fields of Astronomy, Physics/Geophysics, and astronomical Instrumentation and Engineering. The returned information includes links to other on-line information sources, creating an extensive astronomical digital library. Other interfaces to the ADS databases provide direct access to the ADS data to allow developers of other data systems to integrate our data into their system. The search engine is a custom-built software system that is specifically tailored to search astronomical references. It includes an extensive synonym list that contains discipline-specific knowledge about search term equivalences. Search request logs show the usage pattern of the various search system capabilities. Access logs show the world-wide distribution of ADS users. The ADS can be accessed at http://adswww.harvard.edu. Comment: 23 pages, 18 figures, 11 tables
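
    The idea of a synonym list encoding search term equivalences can be sketched as simple query expansion. This is a minimal illustration under stated assumptions, not the ADS implementation: the synonym groups, the whitespace tokenization, and the names `build_lookup`, `expand_query`, and `matches` are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical synonym groups; the actual ADS synonym list is not reproduced here.
SYNONYM_GROUPS: List[Set[str]] = [
    {"galaxy", "galaxies", "extragalactic"},
    {"star", "stars", "stellar"},
]


def build_lookup(groups: List[Set[str]]) -> Dict[str, Set[str]]:
    """Map every term to the full group of terms treated as equivalent."""
    lookup: Dict[str, Set[str]] = {}
    for group in groups:
        for term in group:
            lookup[term] = group
    return lookup


def expand_query(query: str, lookup: Dict[str, Set[str]]) -> List[Set[str]]:
    """Replace each query term by the set of its equivalents (or itself)."""
    return [set(lookup.get(term, {term})) for term in query.lower().split()]


def matches(document: str, expanded: List[Set[str]]) -> bool:
    """A document matches if it contains at least one equivalent of every term."""
    words = set(document.lower().split())
    return all(words & alternatives for alternatives in expanded)


LOOKUP = build_lookup(SYNONYM_GROUPS)
QUERY = expand_query("stellar galaxy", LOOKUP)
```

    Here a query for "stellar galaxy" would also match a title such as "Stars in nearby galaxies", because each query term is widened to its equivalence group before matching.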

    Enriching ontological user profiles with tagging history for multi-domain recommendations

    Many advanced recommendation frameworks employ ontologies of various complexities to model individuals and items, providing a mechanism for the expression of user interests and the representation of item attributes. As a result, complex matching techniques can be applied to support individuals in the discovery of items according to explicit and implicit user preferences. Recently, the rapid adoption of Web2.0 and the proliferation of social networking sites have resulted in more and more users providing an increasing amount of information about themselves that could be exploited for recommendation purposes. However, the unification of personal information with ontologies using the contemporary knowledge representation methods often associated with Web2.0 applications, such as community tagging, is a non-trivial task. In this paper, we propose a method for the unification of tags with ontologies by grounding tags to a shared representation in the form of Wordnet and Wikipedia. We incorporate individuals' tagging history into their ontological profiles by matching tags with ontology concepts. This approach is preliminarily evaluated by extending an existing news recommendation system with user tagging histories harvested from popular social networking sites.

    A Configurable Matchmaking Framework for Electronic Marketplaces

    E-marketplaces constitute a major enabler of B2B and B2C e-commerce activities. This paper proposes a framework for one of the central activities of e-marketplaces: matchmaking of trading intentions lodged by market participants. The framework identifies a core set of concepts and functions that are common to all types of marketplaces and can serve as the basis for describing the distinct styles of matchmaking employed within various market mechanisms. A prototype implementation of the framework based on Web services technology is presented, illustrating its ability to be dynamically configured to meet specific market needs and its potential to serve as a foundation for more fully fledged e-marketplace frameworks.

    Facets and Typed Relations as Tools for Reasoning Processes in Information Retrieval

    Faceted arrangement of entities and typed relations for representing different associations between the entities are established tools in knowledge representation. In this paper, we discuss a proposal that combines both tools to draw inferences along relational paths. This approach may yield new benefits for information retrieval processes, especially when modeled for heterogeneous environments in the Semantic Web. Faceted arrangement can be used as a selection tool for the semantic knowledge modeled within the knowledge representation. Typed relations between the entities of different facets can be used as restrictions for selecting them across the facets.

    Oblivion: Mitigating Privacy Leaks by Controlling the Discoverability of Online Information

    Search engines are the prevalent tools used to collect information about individuals on the Internet. Search results typically comprise a variety of sources that contain personal information -- either intentionally released by the person herself, or unintentionally leaked or published by third parties, often with detrimental effects on the individual's privacy. To grant individuals the ability to regain control over their disseminated personal information, the European Court of Justice recently ruled that EU citizens have a right to be forgotten in the sense that indexing systems must offer them technical means to request removal of links from search results that point to sources violating their data protection rights. As of now, these technical means consist of a web form that requires a user to manually identify all relevant links upfront and to insert them into the web form, followed by a manual evaluation by employees of the indexing system to assess if the request is eligible and lawful. We propose a universal framework, Oblivion, to support the automation of the right to be forgotten in a scalable, provable and privacy-preserving manner. First, Oblivion enables a user to automatically find and tag her disseminated personal information using natural language processing and image recognition techniques and file a request in a privacy-preserving manner. Second, Oblivion provides indexing systems with an automated and provable eligibility mechanism, asserting that the author of a request is indeed affected by an online resource. The automated eligibility proof ensures censorship-resistance so that only legitimately affected individuals can request the removal of corresponding links from search results. We have conducted comprehensive evaluations, showing that Oblivion is capable of handling 278 removal requests per second, and is hence suitable for large-scale deployment.

    PowerAqua: fishing the semantic web

    The Semantic Web (SW) offers an opportunity to develop novel, sophisticated forms of question answering (QA). Specifically, the availability of distributed semantic markup on a large scale opens the way to QA systems which can make use of such semantic information to provide precise, formally derived answers to questions. At the same time, the distributed, heterogeneous, large-scale nature of the semantic information introduces significant challenges. In this paper we describe the design of a QA system, PowerAqua, designed to exploit semantic markup on the web to provide answers to questions posed in natural language. PowerAqua does not assume that the user has any prior information about the semantic resources. The system takes as input a natural language query and translates it into a set of logical queries, which are then answered by consulting and aggregating information derived from multiple heterogeneous semantic sources.