48 research outputs found

    Entity-Oriented Search

    This open access book covers all facets of entity-oriented search—where “search” can be interpreted in the broadest sense of information access—from a unified point of view, and provides a coherent and comprehensive overview of the state of the art. It represents the first synthesis of research in this broad and rapidly developing area. Selected topics are discussed in-depth, the goal being to establish fundamental techniques and methods as a basis for future research and development. Additional topics are treated at a survey level only, containing numerous pointers to the relevant literature. A roadmap for future research, based on open issues and challenges identified along the way, rounds out the book. The book is divided into three main parts, sandwiched between introductory and concluding chapters. The first two chapters introduce readers to the basic concepts, provide an overview of entity-oriented search tasks, and present the various types and sources of data that will be used throughout the book. Part I deals with the core task of entity ranking: given a textual query, possibly enriched with additional elements or structural hints, return a ranked list of entities. This core task is examined in a number of different variants, using both structured and unstructured data collections, and numerous query formulations. In turn, Part II is devoted to the role of entities in bridging unstructured and structured data. Part III explores how entities can enable search engines to understand the concepts, meaning, and intent behind the query that the user enters into the search box, and how they can provide rich and focused responses (as opposed to merely a list of documents)—a process known as semantic search. The final chapter concludes the book by discussing the limitations of current approaches, and suggesting directions for future research. Researchers and graduate students are the primary target audience of this book. 
A general background in information retrieval is sufficient to follow the material, including an understanding of basic probability and statistics concepts as well as a basic knowledge of machine learning concepts and supervised learning algorithms.

    Using Context Awareness to Improve Domain-Specific Named Entity Disambiguation

    In this project we designed and implemented a system based on the Learning to Rank framework to perform Named Entity Disambiguation (NED) of ancient author names and work titles that form part of canonical bibliographic citations. The data consists of abstracts extracted from modern publications in Classical Studies. We had to deal with domain-specific challenges such as the small amount of available annotated data, the high level of ambiguity of the citations, and a specialized knowledge base that lacks the properties common to the knowledge bases, such as Wikipedia, typically used in state-of-the-art NED systems. Our system improved on the existing baseline system, reaching an F1 score of 77.62% (+7.1%) and an accuracy of 71.88% (+10.2%). We also demonstrated that disambiguation can be further improved by exploiting the co-occurrence probability of entities extracted from the corpus. With this method we improved the accuracy of our system by 6.8% on a subset of 59 documents.

    Distributional Semantic Models of Attribute Meaning in Adjectives and Nouns

    Attributes such as SIZE, WEIGHT or COLOR are at the core of conceptualization, i.e., the formal representation of entities or events in the real world. In natural language, formal attributes find their counterpart in attribute nouns which can be used in order to generalize over individual properties (e.g., 'big' or 'small' in case of SIZE, 'blue' or 'red' in case of COLOR). In order to ascribe such properties to entities or events, adjective-noun phrases are a very frequent linguistic pattern (e.g., 'a blue shirt', 'a big lion'). In these constructions, attribute meaning is conveyed only implicitly, i.e., without being overtly realized at the phrasal surface. This thesis is about modeling attribute meaning in adjectives and nouns in a distributional semantics framework. This implies the acquisition of meaning representations for adjectives, nouns and their phrasal combination from corpora of natural language text in an unsupervised manner, without tedious handcrafting or manual annotation efforts. These phrase representations can be used to predict implicit attribute meaning from adjective-noun phrases -- a problem which will be referred to as attribute selection throughout this thesis. The approach to attribute selection proposed in this thesis is framed in structured distributional models. We model adjective and noun meanings as distinct semantic vectors in the same semantic space spanned by attributes as dimensions of meaning. Based on these word representations, we make use of vector composition operations in order to construct a phrase representation from which the most prominent attribute(s) being expressed in the compositional semantics of the adjective-noun phrase can be selected by means of an unsupervised selection function. 
This approach not only accounts for the linguistic principle of compositionality that underlies adjective-noun phrases, but also avoids inherent sparsity issues that result from the fact that the relationship between an adjective, a noun and a particular attribute is rarely explicitly observed in corpora. The attribute models developed in this thesis aim at a reconciliation of the conflict between specificity and sparsity in distributional semantic models. For this purpose, we compare various instantiations of attribute models capitalizing on pattern-based and dependency-based distributional information as well as attribute-specific latent topics induced from a weakly supervised adaptation of Latent Dirichlet Allocation. Moreover, we propose a novel framework of distributional enrichment in order to enhance structured vector representations by incorporating additional lexical information from complementary distributional sources. In applying distributional enrichment to distributional attribute models, we follow the idea to augment structured representations of adjectives and nouns to centroids of their nearest neighbours in semantic space, while keeping the principle of meaning representation along structured, interpretable dimensions intact. We evaluate our attribute models in several experiments on the attribute selection task framed for various attribute inventories, ranging from a thoroughly confined set of ten core attributes up to a large-scale set of 260 attributes. Our results show that large-scale attribute selection from distributional vector representations that have been acquired in an unsupervised setting is a challenging endeavor that can be rendered more feasible by restricting the semantic space to confined subsets of attributes. Beyond quantitative evaluation, we also provide a thorough analysis of performance factors (based on linear regression) that influence the effectiveness of a distributional attribute model for attribute selection. 
This investigation reflects strengths and weaknesses of the model and sheds light on the impact of a variety of linguistic factors involved in attribute selection, e.g., the relative contribution of adjective and noun meaning. In conclusion, we consider our work on attribute selection as an instructive showcase for applying methods from distributional semantics in the broader context of knowledge acquisition from text in order to alleviate issues that are related to implicitness and sparsity
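
    As a minimal illustration of the attribute selection idea described in this abstract, the sketch below represents an adjective and a noun as vectors over a shared set of attribute dimensions, composes them, and picks the highest-scoring attribute. The attribute inventory, the toy values, the choice of component-wise multiplication as the composition operation, and the argmax selection function are all assumptions made for illustration; the thesis compares several model instantiations that are not reproduced here.

```python
# Toy sketch of attribute selection; every name and number here is hypothetical.
ATTRIBUTES = ["SIZE", "WEIGHT", "COLOR"]

# Adjective and noun vectors live in the same space, one dimension per attribute.
adjective_vectors = {"big": [9.0, 3.0, 1.0], "blue": [1.0, 1.0, 8.0]}
noun_vectors = {"lion": [6.0, 5.0, 2.0], "shirt": [4.0, 1.0, 7.0]}


def compose(adj_vec, noun_vec):
    """Compose two vectors by component-wise multiplication (an assumed choice)."""
    return [a * n for a, n in zip(adj_vec, noun_vec)]


def select_attribute(adjective, noun):
    """Return the attribute with the highest score in the composed phrase vector."""
    phrase_vec = compose(adjective_vectors[adjective], noun_vectors[noun])
    best_index = max(range(len(ATTRIBUTES)), key=lambda i: phrase_vec[i])
    return ATTRIBUTES[best_index]


print(select_attribute("big", "lion"))
print(select_attribute("blue", "shirt"))
```

    Multiplication rewards attributes that are prominent for both the adjective and the noun, which is one simple way to surface an attribute that neither word states explicitly.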

    Distributional Semantic Models of Attribute Meaning in Adjectives and Nouns

    Hartung M. Distributional Semantic Models of Attribute Meaning in Adjectives and Nouns. Heidelberg: Universität Heidelberg; 2015

    The impact of sources of inspiration on the genesis of creative ideas

    Innovation fundamentally begins with a good idea. But where do good ideas come from? Much research suggests that innovative breakthroughs are often inspired by past experience: things and ideas that one has interacted with in the world. However, the same experiences that can inspire innovation can sometimes constrain or harm innovation through focus on previously unsuccessful solutions. In this dissertation, I explore principles for guiding interactions with sources of inspiration (previous/other ideas) to maximize their benefits and minimize their pitfalls, focusing on the role of conceptual distance and diversity of sources. I analyze thousands of ideas for complex innovation challenges (e.g., increasing accessibility in elections, revitalizing struggling urban areas) posted to an online crowd-sourced innovation platform that required contributors to cite sources of ideas, tracing the impact of the distance and diversity of sources in ideas’ conceptual genealogies on their creative success (as judged by an expert panel). In this dissertation, I make three primary contributions to the literature. First, leveraging techniques from natural language processing and machine learning, I develop a validated computational methodology for studying conceptual distance and diversity with complex design concepts, which addresses significant issues of efficiency and scalability faced in prior work. Second, I challenge the widespread but unevenly supported notion that far sources provide the best insights for creative ideation; addressing key methodological issues in prior work (time scale, statistical power, and problem variation), I show that overreliance on far sources can harm ideation success, and that good ideas can often come from very near sources. 
Finally, I demonstrate the potential value of incorporating a temporal dimension into analyses of the impact of sources of inspiration: I find evidence of differential impacts of source distance and diversity (viz., increased problem variation for the effect of source distance, and a more robust positive effect of source diversity) when considering sources farther back in ideas’ conceptual genealogies

    Sociolinguistic Priming and the Perception of Agreement Variation: Testing Predictions of Exemplar-Theoretic Grammar.

    This dissertation investigates the sociolinguistic perception of morphosyntactic variation and is motivated by exemplar-based approaches to grammar. The study uses syntactic priming experiments to test the effects of participants' exposure to subject-verb agreement variants. Experiments also manipulate the gender, social status, and individual identity of the talkers to whom participants are exposed, testing the influence of social information on the perception of agreement variation. Access to social information about a speaker has been found to influence the perception of the linguistic forms they produce. Exemplar-theoretic models of speech perception accommodate these findings by positing that linguistic knowledge consists of episodic memory traces of experiences with language, and that linguistic exemplars represent rich social details. Exemplar-theoretic models of syntax likewise posit that syntactic knowledge is a function of direct experiences with language. However, syntactic exemplar theorists have not explored patterns of sociolinguistic variation, and sociolinguistically-informed exemplar-theoretic work has focused on patterns of phonological variation. This study hypothesizes that for grammatical variation that is sociolinguistically patterned, grammatical processing will show sensitivity to both social and linguistic influences in the processing context. The dissertation experiments use structural priming, a paradigm common in psycholinguistic research for exploring cognitive representations of syntactic structure. The experiments manipulate participants' exposure to variants of two subject-verb constructions that alternate commonly across English dialects: NPsg/pl+don't (The dog/dogs don't bark) and there's+NPsg/pl (There's a dog/dogs in the yard). 
The experiments find effects of recency, social similarity, and constructional frequency on participants' interpretation of agreement forms, supporting central features of a socially rich exemplar-based grammar. The study shows that grammatical perception is sensitive to priming, such that exposure to a nonstandard variant in the prime sentence increases the likelihood that participants will perceive a nonstandard variant in the target sentence. Priming is also differentially affected by the social dimensions of gender, social status, and talker specificity. The dissertation argues that the notions of indirect, direct, and potential indexicality capture these differences, and that they can be accommodated by a model of grammatical knowledge that includes multiple levels of abstracted linguistic and sociolinguistic categories.
    Ph.D., Linguistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/86461/1/lsquires_1.pd

    Word Associations as a Language Model for Generative and Creative Tasks

    In order to analyse natural language and gain a better understanding of documents, a common approach is to produce a language model which creates a structured representation of language which could then be used further for analysis or generation. This thesis will focus on a fairly simple language model which looks at word associations which appear together in the same sentence. We will revisit a classic idea of analysing word co-occurrences statistically and propose a simple parameter-free method for extracting common word associations, i.e. associations between words that are often used in the same context (e.g., Batman and Robin). Additionally we propose a method for extracting associations which are specific to a document or a set of documents. The idea behind the method is to take into account the common word associations and highlight such word associations which co-occur in the document unexpectedly often. We will empirically show that these models can be used in practice at least for three tasks: generation of creative combinations of related words, document summarization, and creating poetry. First the common word association language model is used for solving tests of creativity -- the Remote Associates test. Then observations of the properties of the model are used further to generate creative combinations of words -- sets of words which are mutually not related, but do share a common related concept. Document summarization is a task where a system has to produce a short summary of the text with a limited number of words. In this thesis, we will propose a method which will utilise the document-specific associations and basic graph algorithms to produce summaries which give competitive performance on various languages. Also, the document-specific associations are used in order to produce poetry which is related to a certain document or a set of documents. 
The idea is to use documents as inspiration for generating poems which could potentially be used as commentary to news stories. Empirical results indicate that both the common and the document-specific associations can be used effectively for different applications. This provides us with a simple language model which could be used for different languages.
    Language models are often used to understand natural languages and documents. A language model is a structured representation of language that can be used to analyse or generate language. This work presents a simple language model based on associations between words that occur in the same sentence. We first review the classic method of analysing word co-occurrences statistically, on the basis of which we introduce a parameter-free method for producing common word associations. These word associations are links between words that occur in the same contexts, such as Batman and Robin. In addition, we introduce a method that produces such associations for a particular document or set of documents. The method is based on taking into account those word associations that are especially common in the source documents. We show empirically that our language models can be used for at least three purposes: producing creative word combinations, summarizing documents, and generating poems. We first use our model based on common word associations to solve Remote Associates tests, which test creativity. Then, based on observations made about the model, we produce creative word combinations. These combinations contain words that are not necessarily related to one another but that share some connecting concepts. Document summarization refers to the task of producing a length-limited abridgement of a longer document. We present a method that produces summaries of competitive quality in different languages, using document-specific word associations and simple graph algorithms. The associations can also be used to generate document-specific poems. Poems inspired by documents could be used, for example, to comment on news articles. Our results for models based on both common and document-specific associations show that these models can be used effectively for different purposes. The result is a simple language model that can be used with several different languages.
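
    To make the notion of sentence-level word associations concrete, here is a minimal sketch that scores word pairs by how much more often they share a sentence than chance would predict. It uses pointwise mutual information (PMI) as a stand-in; this is an assumption for illustration and not the parameter-free method proposed in the thesis. The function name and the toy sentences are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def sentence_cooccurrence_pmi(sentences):
    """Score word pairs by PMI over sentence-level co-occurrence.

    PMI is a swapped-in, illustrative measure, not the thesis's own method.
    Each sentence counts a word (and a word pair) at most once.
    """
    n = len(sentences)
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        words = sorted(set(sentence.lower().split()))
        word_counts.update(words)
        # Pairs are stored in alphabetical order, e.g. ("batman", "robin").
        pair_counts.update(combinations(words, 2))
    scores = {}
    for (w1, w2), joint in pair_counts.items():
        # log( P(w1, w2) / (P(w1) * P(w2)) ), with probabilities as sentence fractions
        scores[(w1, w2)] = math.log((joint * n) / (word_counts[w1] * word_counts[w2]))
    return scores


toy_sentences = [
    "batman and robin",
    "batman and robin fight",
    "the cat sleeps",
    "the dog barks",
]
scores = sentence_cooccurrence_pmi(toy_sentences)
print(scores[("batman", "robin")])
```

    A positive score means the pair shares sentences more often than independent occurrence would suggest; pairs that never co-occur receive no score at all.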

    Evolutionary Computation

    This book presents several recent advances in Evolutionary Computation, especially evolution-based optimization methods and hybrid algorithms for several applications, from optimization and learning to pattern recognition and bioinformatics. It also presents new algorithms based on several analogies and metaphors, one of which draws on philosophy, specifically the philosophy of praxis and dialectics. The book further presents interesting applications in bioinformatics, especially the use of particle swarms to discover gene expression patterns in DNA microarrays. This book therefore features representative work in the field of evolutionary computation and applied sciences. The intended audience is graduate and undergraduate students, researchers, and anyone who wishes to become familiar with the latest research in this field.

    Engineering a Better Future

    This open access book examines how the social sciences can be integrated into the praxis of engineering and science, presenting unique perspectives on the interplay between engineering and social science. Motivated by the report by the Commission on Humanities and Social Sciences of the American Association of Arts and Sciences, which emphasizes the importance of social sciences and Humanities in technical fields, the essays and papers collected in this book were presented at the NSF-funded workshop ‘Engineering a Better Future: Interplay between Engineering, Social Sciences and Innovation’, which brought together a singular collection of people, topics and disciplines. The book is split into three parts: A. Meeting at the Middle: Challenges to educating at the boundaries covers experiments in combining engineering education and the social sciences; B. Engineers Shaping Human Affairs: Investigating the interaction between social sciences and engineering, including the cult of innovation, politics of engineering, engineering design and future of societies; and C. Engineering the Engineers: Investigates thinking about design with papers on the art and science of science and engineering practice