
    Transforming Wikipedia into an Ontology-based Information Retrieval Search Engine for Local Experts using a Third-Party Taxonomy

    Wikipedia is widely used for finding general information about a wide variety of topics. Its vocation is not to provide local information: it provides plot, cast, and production information about a given movie, for example, but not showing times in your local movie theatre. Here we describe how we can connect local information to Wikipedia without altering its content. The case study we present involves finding local scientific experts. Using a third-party taxonomy, independent of Wikipedia's category hierarchy, we index information about our local experts found in their activity reports, and we re-index Wikipedia content using the same taxonomy. The connections between Wikipedia pages and local expert reports are stored in a relational database, accessible through a public SPARQL endpoint. A Wikipedia gadget (or plugin), activated by the interested user, accesses the endpoint as each Wikipedia page is accessed. An additional tab on the Wikipedia page lets the user open a list of teams of local experts associated with the subject matter of the page. The technique, though presented here as a way to identify local experts, is generic: any third-party taxonomy can be used in this way to connect Wikipedia to any non-Wikipedia data source.
    Comment: Joint Second Workshop on Language and Ontology & Terminology and Knowledge Structures (LangOnto2 + TermiKS), May 2016, Portorož, Slovenia.
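    As a rough sketch of the lookup such a gadget could perform, the Python snippet below queries a hypothetical public SPARQL endpoint for expert teams indexed under the same taxonomy node as a Wikipedia page; the endpoint URL, prefixes, and property names are illustrative assumptions, not the schema used in the paper.

```python
# Minimal sketch of the lookup a Wikipedia gadget might perform against a
# public SPARQL endpoint linking pages to local expert teams. The endpoint
# URL, predicate names, and graph layout are illustrative assumptions.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "https://example.org/sparql"  # hypothetical public endpoint

def experts_for_page(page_title: str) -> list[str]:
    """Return names of expert teams indexed under the same taxonomy node as the page."""
    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(f"""
        PREFIX ex: <https://example.org/ontology#>
        SELECT DISTINCT ?teamName WHERE {{
            ?page ex:wikipediaTitle "{page_title}" ;
                  ex:taxonomyNode   ?node .
            ?team ex:taxonomyNode   ?node ;
                  ex:teamName       ?teamName .
        }}
    """)
    rows = sparql.query().convert()["results"]["bindings"]
    return [row["teamName"]["value"] for row in rows]

if __name__ == "__main__":
    print(experts_for_page("Machine learning"))
```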

    Problem of transitivity of wikipedia category system

    This paper analyses violations of the transitivity principle in the Wikipedia category system. The causes of these violations are analysed on the basis of ontological modelling methodologies such as OntoClean, and a new approach for eliminating them is proposed.
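    To make the transitivity problem concrete, here is a toy sketch (not from the paper) that propagates membership through subcategory links and lists everything a category inherits transitively; the implausible inherited memberships are exactly the cases an OntoClean-style analysis would flag. The example categories are invented.

```python
# Toy illustration of why transitive closure over Wikipedia-style category
# links is problematic: membership inherited through subcategory chains can
# be semantically implausible. The example graph is invented.
from collections import defaultdict

# parent category -> direct subcategories (or member pages)
subcats = {
    "Rivers": ["Rivers of Europe"],
    "Rivers of Europe": ["Danube"],
    "Danube": ["Bridges over the Danube"],  # topic category, not a kind of river
}

def transitive_members(graph: dict[str, list[str]]) -> dict[str, set[str]]:
    """Compute all members reachable from each category via subcategory links."""
    closure: dict[str, set[str]] = defaultdict(set)

    def collect(node: str) -> set[str]:
        members = set(graph.get(node, []))
        for child in graph.get(node, []):
            members |= collect(child)
        return members

    for cat in graph:
        closure[cat] = collect(cat)
    return closure

for cat, members in transitive_members(subcats).items():
    inherited = members - set(subcats.get(cat, []))
    if inherited:
        # Inherited memberships like "Rivers" containing "Bridges over the
        # Danube" are the transitivity violations under discussion.
        print(f"{cat!r} transitively contains {sorted(inherited)}")
```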

    Avtomatska razširitev in čiščenje sloWNeta [Automatic extension and cleaning of sloWNet]

    In this paper we present a language-independent and automatic approach to extending a wordnet by recycling different types of already existing language resources, such as machine-readable dictionaries, parallel corpora and Wikipedia. The approach, applied to Slovene, takes into account monosemous and polysemous words, general and specialized vocabulary, as well as simple and multi-word lexemes. The extracted words are assigned one or several synset ids by a classifier that relies on several features, including distributional similarity. In the next step we also identify and remove highly dubious (literal, synset) pairs, based on simple distributional information extracted from a large corpus in an unsupervised way. Automatic and manual evaluation shows that the proposed approach yields very promising results.
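    A minimal sketch of the pruning step, assuming precomputed distributional word vectors: a candidate (literal, synset) pair is kept only if the literal is distributionally similar to at least one literal already in the synset. The vector source, threshold, and scoring rule are illustrative assumptions, not the authors' exact procedure.

```python
# Sketch of removing dubious (literal, synset) pairs using distributional
# similarity. Word vectors, the threshold, and the keep/drop rule are
# illustrative assumptions, not the procedure from the paper.
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def filter_candidates(candidates, synsets, vectors, threshold=0.2):
    """Keep a (literal, synset_id) pair only if the literal is distributionally
    similar to at least one literal already assigned to that synset."""
    kept = []
    for literal, synset_id in candidates:
        if literal not in vectors:
            continue
        neighbours = [w for w in synsets.get(synset_id, []) if w in vectors]
        if any(cosine(vectors[literal], vectors[w]) >= threshold for w in neighbours):
            kept.append((literal, synset_id))
    return kept
```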

    Automatic Extension of WOLF

    In this paper we present the extension of WOLF, a freely available, automatically created wordnet for French, whose biggest drawback has until now been the lack of general concepts. These are typically expressed with highly polysemous vocabulary, which is on the one hand the most valuable for applications in human language technologies, but on the other the most difficult to add to a wordnet accurately with automatic methods. Using a set of features, we train a Maximum Entropy classifier on the existing core wordnet so that it can assign appropriate synset ids to new words extracted from multiple, multilingual sources of lexical knowledge, such as Wiktionaries, Wikipedias and corpora. Automatic and manual evaluation shows high coverage as well as high quality of the resulting lexico-semantic repository. Another important advantage of the approach is that it is fully automatic and language-independent and could therefore be applied to any other language still lacking a wordnet.
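    The synset-assignment step can be sketched as a Maximum Entropy (multinomial logistic regression) classifier over (candidate word, synset) pairs; the features and the toy training data below are placeholders, not the authors' actual feature set.

```python
# Sketch of synset assignment as a Maximum Entropy (logistic regression)
# classifier, in the spirit of the paper. Feature names and training
# examples are placeholders, not the authors' setup.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each example: features of a (candidate word, synset) pair,
# labelled 1 if the word belongs to the synset, 0 otherwise.
train_features = [
    {"distributional_sim": 0.71, "shared_translations": 3, "in_wiktionary": 1},
    {"distributional_sim": 0.12, "shared_translations": 0, "in_wiktionary": 0},
    {"distributional_sim": 0.55, "shared_translations": 1, "in_wiktionary": 1},
    {"distributional_sim": 0.05, "shared_translations": 0, "in_wiktionary": 1},
]
train_labels = [1, 0, 1, 0]

model = make_pipeline(DictVectorizer(sparse=False), LogisticRegression())
model.fit(train_features, train_labels)

# Score a new candidate pair; a high probability suggests assigning the synset id.
candidate = {"distributional_sim": 0.64, "shared_translations": 2, "in_wiktionary": 1}
print(model.predict_proba([candidate])[0][1])
```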

    Automatically acquiring a semantic network of related concepts

    We describe the automatic construction of a semantic network in which over 3000 of the most frequently occurring monosemous nouns in Wikipedia (each appearing between 1,500 and 100,000 times) are linked to their semantically related concepts in the WordNet noun ontology. Relatedness between nouns is discovered automatically from co-occurrence in Wikipedia texts using a measure inspired by information theory. Our algorithm then capitalizes on salient sense clustering among related nouns to automatically disambiguate them to their appropriate senses (i.e., concepts). Through the act of disambiguation, we begin to accumulate relatedness data for concepts denoted by polysemous nouns as well. The resultant concept-to-concept associations, covering 17,543 nouns and 27,312 distinct senses among them, constitute a large-scale semantic network of related concepts that can be conceived of as augmenting the WordNet noun ontology with related-to links.
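    A minimal sketch of deriving noun-noun relatedness from co-occurrence counts with a PMI-style score, in the spirit of the measure described above; the sentence-level counting and the PMI variant are illustrative assumptions.

```python
# Sketch of scoring noun-noun relatedness from co-occurrence with pointwise
# mutual information (PMI). The sentence-level counting and the PMI variant
# are illustrative assumptions, not the paper's exact measure.
import math
from collections import Counter
from itertools import combinations

def pmi_relatedness(sentences, min_count=2):
    """Return a dict mapping noun pairs to their PMI score."""
    word_counts = Counter()
    pair_counts = Counter()
    total = 0
    for tokens in sentences:  # each sentence: a list of nouns
        total += 1
        for w in set(tokens):
            word_counts[w] += 1
        for a, b in combinations(sorted(set(tokens)), 2):
            pair_counts[(a, b)] += 1

    scores = {}
    for (a, b), n_ab in pair_counts.items():
        if n_ab < min_count:
            continue
        p_ab = n_ab / total
        p_a, p_b = word_counts[a] / total, word_counts[b] / total
        scores[(a, b)] = math.log(p_ab / (p_a * p_b))
    return scores

corpus = [["film", "director", "camera"], ["film", "director"],
          ["camera", "lens"], ["film", "camera"]]
print(pmi_relatedness(corpus, min_count=2))
```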

    Disciplines and the Categorization of Scientific Truth: The Case of Social Sciences in the Hebrew Wikipedia

    For the general audience, Wikipedia is considered the source of “truth,” especially for scientific knowledge. While studies of Wikipedia usually focus on the accuracy of the knowledge within it, few studies have explored its hierarchy and categorization. This study aims to describe how scientific information is organized into disciplines in Wikipedia. I take the Hebrew Wikipedia as a case study and examine the representation and interrelations of five social sciences: sociology, anthropology, economics, political science, and psychology. I gather data from Wikipedia entries categorized under each of these disciplines and create a network that contains the categories and subcategories derived from these entries. Using network analysis techniques, I estimate the strength of the relations between the disciplines. Results indicate that while sociology, anthropology, and political science are strongly linked to each other, psychology and economics are relatively isolated. As there is a hierarchical difference between these disciplines, the result is a hierarchical valuation of the scientific knowledge presented in these Wikipedia entries. An interesting case is the distance between economics and sociology: under the subcategory “Inequality,” entries are categorized under either sociology or economics but rarely under both. I claim this is an example of a fractal distinction between the two disciplines.
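    As a rough illustration of this kind of category-network analysis, the sketch below builds a small discipline-category graph and scores how strongly two disciplines are linked by the categories they share; the toy data and the Jaccard-overlap measure are assumptions, not the study's dataset or exact method.

```python
# Rough sketch of a category-network analysis: build a graph of disciplines
# and the (sub)categories collected from their entries, then estimate the
# strength of the link between two disciplines from shared categories.
# Toy data and the Jaccard measure are illustrative assumptions.
import networkx as nx

discipline_categories = {
    "sociology":         {"Inequality", "Social theory", "Institutions"},
    "economics":         {"Inequality", "Markets", "Institutions"},
    "political science": {"Institutions", "Social theory", "Elections"},
}

G = nx.Graph()
for discipline, categories in discipline_categories.items():
    G.add_node(discipline, kind="discipline")
    for cat in categories:
        G.add_node(cat, kind="category")
        G.add_edge(discipline, cat)

def link_strength(a: str, b: str) -> float:
    """Jaccard overlap of the category neighbourhoods of two disciplines."""
    na, nb = set(G[a]), set(G[b])
    return len(na & nb) / len(na | nb)

for a, b in [("sociology", "economics"), ("sociology", "political science")]:
    print(a, b, round(link_strength(a, b), 2))
```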