376 research outputs found

    Entity-Oriented Search

    Get PDF
    This open access book covers all facets of entity-oriented search—where “search” can be interpreted in the broadest sense of information access—from a unified point of view, and provides a coherent and comprehensive overview of the state of the art. It represents the first synthesis of research in this broad and rapidly developing area. Selected topics are discussed in-depth, the goal being to establish fundamental techniques and methods as a basis for future research and development. Additional topics are treated at a survey level only, containing numerous pointers to the relevant literature. A roadmap for future research, based on open issues and challenges identified along the way, rounds out the book. The book is divided into three main parts, sandwiched between introductory and concluding chapters. The first two chapters introduce readers to the basic concepts, provide an overview of entity-oriented search tasks, and present the various types and sources of data that will be used throughout the book. Part I deals with the core task of entity ranking: given a textual query, possibly enriched with additional elements or structural hints, return a ranked list of entities. This core task is examined in a number of different variants, using both structured and unstructured data collections, and numerous query formulations. In turn, Part II is devoted to the role of entities in bridging unstructured and structured data. Part III explores how entities can enable search engines to understand the concepts, meaning, and intent behind the query that the user enters into the search box, and how they can provide rich and focused responses (as opposed to merely a list of documents)—a process known as semantic search. The final chapter concludes the book by discussing the limitations of current approaches, and suggesting directions for future research. Researchers and graduate students are the primary target audience of this book. A general background in information retrieval is sufficient to follow the material, including an understanding of basic probability and statistics concepts as well as a basic knowledge of machine learning concepts and supervised learning algorithms

    Proceedings of the Third Dutch-Belgian Information Retrieval Workshop (DIR 2002)

    Get PDF

    From Frequency to Meaning: Vector Space Models of Semantics

    Full text link
    Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term-document, word-context, and pair-pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field

    Image Annotation and Topic Extraction Using Super-Word Latent Dirichlet

    Get PDF
    This research presents a multi-domain solution that uses text and images to iteratively improve automated information extraction. Stage I uses local text surrounding an embedded image to provide clues that help rank-order possible image annotations. These annotations are forwarded to Stage II, where the image annotations from Stage I are used as highly-relevant super-words to improve extraction of topics. The model probabilities from the super-words in Stage II are forwarded to Stage III where they are used to refine the automated image annotation developed in Stage I. All stages demonstrate improvement over existing equivalent algorithms in the literature

    Multimedia Retrieval

    Get PDF

    Human Machine Interaction

    Get PDF
    In this book, the reader will find a set of papers divided into two sections. The first section presents different proposals focused on the human-machine interaction development process. The second section is devoted to different aspects of interaction, with a special emphasis on the physical interaction

    Graph-Based Weakly-Supervised Methods for Information Extraction & Integration

    Get PDF
    The variety and complexity of potentially-related data resources available for querying --- webpages, databases, data warehouses --- has been growing ever more rapidly. There is a growing need to pose integrative queries across multiple such sources, exploiting foreign keys and other means of interlinking data to merge information from diverse sources. This has traditionally been the focus of research within Information Extraction (IE) and Information Integration (II) communities, with IE focusing on converting unstructured sources into structured sources, and II focusing on providing a unified view of diverse structured data sources. However, most of the current IE and II methods, which can potentially be applied to the pro blem of integration across sources, require large amounts of human supervision, often in the form of annotated data. This need for extensive supervision makes existing methods expensive to deploy and difficult to maintain. In this thesis, we develop techniques that generalize from limited human input, via weakly-supervised methods for IE and II. In particular, we argue that graph-based representation of data and learning over such graphs can result in effective and scalable methods for large-scale Information Extraction and Integration. Within IE, we focus on the problem of assigning semantic classes to entities. First we develop a context pattern induction method to extend small initial entity lists of various semantic classes. We also demonstrate that features derived from such extended entity lists can significantly improve performance of state-of-the-art discriminative taggers. The output of pattern-based class-instance extractors is often high-precision and low-recall in nature, which is inadequate for many real world applications. We use Adsorption, a graph based label propagation algorithm, to significantly increase recall of an initial high-precision, low-recall pattern-based extractor by combining evidences from unstructured and structured text corpora. Building on Adsorption, we propose a new label propagation algorithm, Modified Adsorption (MAD), and demonstrate its effectiveness on various real-world datasets. Additionally, we also show how class-instance acquisition performance in the graph-based SSL setting can be improved by incorporating additional semantic constraints available in independently developed knowledge bases. Within Information Integration, we develop a novel system, Q, which draws ideas from machine learning and databases to help a non-expert user construct data-integrating queries based on keywords (across databases) and interactive feedback on answers. We also present an information need-driven strategy for automatically incorporating new sources and their information in Q. We also demonstrate that Q\u27s learning strategy is highly effective in combining the outputs of ``black box\u27\u27 schema matchers and in re-weighting bad alignments. This removes the need to develop an expensive mediated schema which has been necessary for most previous systems

    Searching and ranking in entity-relationship graphs

    Get PDF
    The Web bears the potential to become the world';s most comprehensive knowledge base. Organizing information from the Web into entity-relationship graph structures could be a first step towards unleashing this potential. In a second step, the inherent semantics of such structures would have to be exploited by expressive search techniques that go beyond today';s keyword search paradigm. In this realm, as a first contribution of this thesis, we present NAGA (Not Another Google Answer), a new semantic search engine. NAGA provides an expressive, graph-based query language that enables queries with entities and relationships. The results are retrieved based on subgraph matching techniques and ranked by means of a statistical ranking model. As a second contribution, we present STAR (Steiner Tree Approximation in Relationship Graphs), an efficient technique for finding "close'; relations (i.e., compact connections) between k(> 2) entities of interest in large entity-relationship graphs. Our third contribution is MING (Mining Informative Graphs). MING is an efficient method for retrieving "informative'; subgraphs for k(> 2) entities of interest from an entity-relationship graph. Intuitively, these would be subgraphs that can explain the relations between the k entities of interest. The knowledge discovery tasks supported by MING have a stronger semantic flavor than the ones supported by STAR. STAR and MING are integrated into the query answering component of the NAGA engine. NAGA itself is a fully implemented prototype system and is part of the YAGONAGA project.Das Web birgt in sich das Potential zur umfangreichsten Wissensbasis der Welt zu werden. Das Organisieren der Information aus dem Web in Entity-Relationship-Graphstrukturen könnte ein erster Schritt sein, um dieses Potential zu entfalten. In einem zweiten Schritt müssten ausdrucksstarke Suchtechniken entwickelt werden, die über das heutige Keyword-basierte Suchparadigma hinausgehen und die inhärente Semantik solcher Strukturen ausnutzen. In diesem Rahmen stellen wir als ersten Beitrag dieser Arbeit NAGA (Not Another Google Answer) vor, eine neue semantische Suchmaschine. NAGA bietet eine ausdrucksstarke, graphbasierte Anfragesprache, die Anfragen mit Entitäten und Relationen ermöglicht. Die Ergebnisse werden durch Subgraph-Matching-Techniken gefunden und mithilfe eines statistischen Modells in eine Rangliste gebracht. Als zweiten Beitrag stellen wir STAR (Steiner Tree Approximation in Relationship Graphs) vor, eine effiziente Technik, um "nahe'; Relationen (d.h. kompakte Verbindungen) zwischen k(> 2) Entitäten in großen Entity-Relationship-Graphen zu finden. Unser dritter Beitrag ist MING (Mining Informative Graphs). MING ist eine effiziente Methode, die das Finden von "informativen'; Subgraphen für k(> 2) Entitäten aus einem Entity-Relationship-Graphen ermöglicht. Dies sind Subgraphen, die die Beziehungen zwischen den k Entitäten erklären können. Im Vergleich zu STAR unterstützt MING Aufgaben der Wissensexploration, die einen stärkeren semantischen Charakter haben. Sowohl STAR als auch MING sind in die Query-Answering-Komponente der NAGA-Suchmaschine integriert. NAGA selbst ist ein vollständig implementiertes Prototypsystem und Teil des YAGO-NAGA-Projekts

    Connected Information Management

    Get PDF
    Society is currently inundated with more information than ever, making efficient management a necessity. Alas, most of current information management suffers from several levels of disconnectedness: Applications partition data into segregated islands, small notes don’t fit into traditional application categories, navigating the data is different for each kind of data; data is either available at a certain computer or only online, but rarely both. Connected information management (CoIM) is an approach to information management that avoids these ways of disconnectedness. The core idea of CoIM is to keep all information in a central repository, with generic means for organization such as tagging. The heterogeneity of data is taken into account by offering specialized editors. The central repository eliminates the islands of application-specific data and is formally grounded by a CoIM model. The foundation for structured data is an RDF repository. The RDF editing meta-model (REMM) enables form-based editing of this data, similar to database applications such as MS access. Further kinds of data are supported by extending RDF, as follows. Wiki text is stored as RDF and can both contain structured text and be combined with structured data. Files are also supported by the CoIM model and are kept externally. Notes can be quickly captured and annotated with meta-data. Generic means for organization and navigation apply to all kinds of data. Ubiquitous availability of data is ensured via two CoIM implementations, the web application HYENA/Web and the desktop application HYENA/Eclipse. All data can be synchronized between these applications. The applications were used to validate the CoIM ideas

    Acta Cybernetica : Volume 18. Number 2.

    Get PDF
    • …
    corecore