26 research outputs found

    Classifying the Wikipedia articles into the OpenCyc taxonomy

    This article presents a method for classifying Wikipedia articles into the OpenCyc taxonomy. The method draws on several sources of classification information: the Wikipedia category system, the infoboxes attached to the articles, the first sentences of the articles (treated as their definitions), and the direct mapping between the articles and Cyc symbols. The classification decisions made by these methods are reconciled using Cyc's built-in inconsistency detection mechanism. The combination of the best classification methods yields 1.47 million classified articles with a manually verified precision above 97%, while the combination of all of them yields 2.2 million articles with an estimated precision of 93%.
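    The reconciliation step described above can be sketched as follows. This is a hypothetical illustration, not the paper's code: several sources vote for Cyc types, and a better-supported type suppresses any conflicting type, standing in for Cyc's disjointness-based inconsistency detection. The disjointness pairs and type names are invented.

    ```python
    # Disjoint type pairs standing in for Cyc's disjointWith assertions.
    DISJOINT = {frozenset({"Person", "Organization"}),
                frozenset({"Person", "Place"})}

    def combine_votes(votes):
        """votes: list of (source, cyc_type) pairs. Accept types in order
        of support, skipping any type disjoint with an accepted one."""
        support = {}
        for _, t in votes:
            support[t] = support.get(t, 0) + 1
        accepted = []
        for t in sorted(support, key=support.get, reverse=True):
            if all(frozenset({t, a}) not in DISJOINT for a in accepted):
                accepted.append(t)
        return accepted

    votes = [("category", "Person"), ("infobox", "Person"),
             ("definition", "Organization")]
    print(combine_votes(votes))  # ['Person'] — the conflicting minority vote is dropped
    ```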

    Automatic mapping of Wikipedia categories into OpenCyc types

    The aim of the research presented in this article is a mapping between English Wikipedia categories and OpenCyc types. The mapping algorithm is heuristic and takes into account structural similarities between the categories and the corresponding types. The achieved mapping precision ranges from 82% to 92% (depending on the evaluation scheme), and recall from 67% to 76%. The results of the algorithm and its code are available at http://cycloped.i
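    The idea of exploiting structural similarity can be illustrated with a toy scoring function. This sketch is an assumption about the general approach, not the paper's algorithm: a category is scored against a candidate type by name match, with a bonus when a parent category also maps to the type or one of its supertypes. All taxonomy data below is invented.

    ```python
    # Toy fragments of the two hierarchies (illustrative only).
    CATEGORY_PARENTS = {"Polish writers": ["Writers"], "Writers": []}
    TYPE_PARENTS = {"Writer": ["Person"], "Person": []}
    NAME_MATCH = {"Polish writers": ["Writer"], "Writers": ["Writer"]}

    def structural_score(category, cyc_type):
        # Base score from a direct name-based match.
        score = 1.0 if cyc_type in NAME_MATCH.get(category, []) else 0.0
        # Bonus when a parent category maps to the type or a supertype,
        # i.e. the two hierarchies agree structurally.
        for parent in CATEGORY_PARENTS.get(category, []):
            parent_types = set(NAME_MATCH.get(parent, []))
            if parent_types & ({cyc_type} | set(TYPE_PARENTS.get(cyc_type, []))):
                score += 0.5
        return score

    print(structural_score("Polish writers", "Writer"))  # 1.5
    ```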

    The importance of cross-lingual information for matching Wikipedia with the Cyc ontology

    In this paper we try to answer the question of how cross-lingual evidence may improve matching between different classification schemas. We concentrate specifically on the task of mapping between Wikipedia categories and Cyc terms, as well as the classification of Wikipedia articles into the Cyc taxonomy, and show how this process may be improved by consuming the evidence available in different editions of Wikipedia. The results show that the performance of the mapping procedure may be improved by 0.6 to 4.9 percentage points, depending on the number of external Wikipedia editions and the given task.
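    One simple way to consume such cross-lingual evidence, sketched here as an assumption about the general scheme rather than the paper's procedure, is majority voting: each interlinked language edition proposes a Cyc term for a category, and the most frequent proposal wins.

    ```python
    from collections import Counter

    def cross_lingual_vote(votes_by_edition):
        """votes_by_edition: dict mapping a language edition code to the
        Cyc term it proposes for the interlinked category (or None)."""
        counts = Counter(t for t in votes_by_edition.values() if t)
        if not counts:
            return None
        # Return the term proposed by the most editions.
        return counts.most_common(1)[0][0]

    votes = {"en": "City", "de": "City", "pl": "Settlement", "fr": None}
    print(cross_lingual_vote(votes))  # City
    ```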

    Meta-User2Vec model for addressing the user and item cold-start problem in recommender systems

    The cold-start scenario is a critical problem for recommender systems, especially in dynamically changing domains such as online news services. In this research, we aim to address the cold-start situation by adapting an unsupervised neural User2Vec method to represent new users and articles in a multidimensional space. Toward this goal, we propose an extension of the Doc2Vec model that is capable of representing users with unknown history by building embeddings of their metadata labels along with item representations. We evaluate our proposed approach with respect to different parameter configurations on three real-world recommendation datasets with different characteristics. Our results show that this approach may be applied as an efficient alternative to the factorization machine-based method when user and item metadata are used, and hence can be applied in the cold-start scenario for both new users and new items. Additionally, as our solution represents the user and item labels in the same vector space, we can analyze the spatial relations among these labels to reveal latent interest features of the audience groups, as well as possible data biases and disparities.
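    The cold-start representation idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Meta-User2Vec implementation: a user with no reading history is represented by averaging the (jointly trained, Doc2Vec-style) embeddings of its metadata labels, and items are ranked by cosine similarity. The label names and toy vectors below are invented.

    ```python
    import math

    # Toy embeddings standing in for jointly trained label/item vectors.
    LABEL_VECS = {"age:25-34": [1.0, 0.0], "city:Krakow": [0.0, 1.0]}
    ITEM_VECS = {"article_a": [0.9, 0.1], "article_b": [0.1, 0.9]}

    def user_vector(labels):
        """Represent a history-less user as the mean of its label vectors."""
        dims = len(next(iter(LABEL_VECS.values())))
        v = [0.0] * dims
        for lab in labels:
            for i, x in enumerate(LABEL_VECS[lab]):
                v[i] += x
        return [x / len(labels) for x in v]

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.hypot(*a) * math.hypot(*b))

    def rank_items(labels):
        u = user_vector(labels)
        return sorted(ITEM_VECS, key=lambda it: cosine(u, ITEM_VECS[it]),
                      reverse=True)

    print(rank_items(["age:25-34"]))  # article_a ranked first
    ```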

    Improving the Wikipedia Miner word sense disambiguation algorithm

    This document describes improvements to the Wikipedia Miner word sense disambiguation algorithm. The original algorithm performs very well in detecting key terms in documents and disambiguating them against Wikipedia articles. By replacing the original measure, inspired by the Normalized Google Distance, with one based on the Jaccard coefficient, and by taking additional features into account, the disambiguation algorithm was improved by 8 percentage points (F1-measure), without degrading its runtime performance or introducing any additional preprocessing overhead. This document also presents some statistical data extracted from the Polish Wikipedia by Wikipedia Miner. An automatic evaluation of the performance of the disambiguation algorithm for Polish shows that it is almost as good as for English, even though the Polish Wikipedia has only a quarter of the number of articles of the English Wikipedia.
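    The relatedness swap described above can be illustrated concretely. The sketch below, with invented link sets, computes the Jaccard coefficient over the sets of pages linking to two Wikipedia articles and picks the candidate sense most related to an unambiguous context page; it is a simplification of how such measures are typically used, not the improved algorithm itself.

    ```python
    # Toy inlink sets (in reality, drawn from the Wikipedia link graph).
    INLINKS = {
        "Java (programming language)": {"Python", "Compiler", "JVM"},
        "Java (island)": {"Indonesia", "Volcano"},
        "Compiler": {"Python", "JVM", "Parser"},
    }

    def jaccard(a, b):
        """Jaccard coefficient of the inlink sets of two articles."""
        return len(INLINKS[a] & INLINKS[b]) / len(INLINKS[a] | INLINKS[b])

    def disambiguate(candidates, context_page):
        # Choose the sense whose inlinks overlap most with the context.
        return max(candidates, key=lambda c: jaccard(c, context_page))

    senses = ["Java (programming language)", "Java (island)"]
    print(disambiguate(senses, "Compiler"))  # the programming-language sense
    ```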

    Knowledge-based named entity recognition in Polish

    This document describes an algorithm for recognizing named entities in Polish text, powered by two knowledge sources: the Polish Wikipedia and the Cyc ontology. Besides providing coarse types for the recognized entities, the algorithm links them to their Wikipedia pages and assigns precise semantic types taken from Cyc. The algorithm is verified against manually identified named entities in the one-million-word subcorpus of the National Corpus of Polish.
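    A minimal sketch of the knowledge-based lookup pattern, assuming a Wikipedia-derived lexicon with a semantic type per entry (the entries and types below are invented, standing in for Wikipedia pages and Cyc types); the actual algorithm is more involved:

    ```python
    # Lexicon entries: surface form -> (linked Wikipedia page, semantic type).
    LEXICON = {
        "Adam Mickiewicz": ("Adam Mickiewicz", "Poet"),
        "Kraków": ("Kraków", "City"),
    }

    def recognize(tokens, max_len=3):
        """Greedy longest-match lookup of token spans against the lexicon."""
        entities, i = [], 0
        while i < len(tokens):
            for n in range(min(max_len, len(tokens) - i), 0, -1):
                span = " ".join(tokens[i:i + n])
                if span in LEXICON:
                    page, sem_type = LEXICON[span]
                    entities.append((span, page, sem_type))
                    i += n
                    break
            else:
                i += 1  # no entity starts here
        return entities

    toks = "Adam Mickiewicz lived in Kraków".split()
    print(recognize(toks))
    ```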

    ROD: Ruby Object Database

    ROD (Ruby Object Database) is an open-source object database designed for storing and accessing data that rarely changes. The primary reason for creating it was the need for a storage facility for the dictionaries and corpora used in natural language processing. The database is optimized for read speed and ease of use.
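    The write-once, read-many access pattern ROD targets can be sketched as follows (in Python rather than Ruby, and purely as an illustrative assumption about the pattern, not ROD's design): data is serialized once at build time and then served from an immutable in-memory structure.

    ```python
    import json
    import os
    import tempfile

    class ReadOnlyStore:
        """Loads a serialized record set once; offers read-only access."""
        def __init__(self, path):
            with open(path, encoding="utf-8") as f:
                self._records = json.load(f)  # loaded once, never mutated

        def get(self, key):
            return self._records[key]

    # Build phase: write rarely-changing data (e.g. a dictionary) once.
    path = os.path.join(tempfile.mkdtemp(), "store.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"kot": "cat", "pies": "dog"}, f)

    # Read phase: many cheap lookups against the immutable store.
    store = ReadOnlyStore(path)
    print(store.get("kot"))  # cat
    ```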

    Extraction of "part-whole" relations from Polish texts based on Wikipedia and Cyc


    An ontology-based method for an efficient acquisition of relation extraction training and testing examples

    In this paper, we describe an ontology-based method for selecting test examples for relation extraction, as well as a method for validating them that can be carried out by ordinary speakers of the language. The results will be used to validate the performance of various relation extraction algorithms. In the performed tests, we utilize the ResearchCyc ontology and demonstrate the method's performance in gathering examples from Polish texts.
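    A hedged sketch of ontology-guided example selection, under the assumption that candidate sentences are kept only when their entity pair matches the type signature of the target relation; the relation, types, and sentences below are invented, standing in for ResearchCyc types:

    ```python
    # Invented ontology types and a relation type signature.
    ENTITY_TYPES = {"Warsaw": "City", "Poland": "Country", "Vistula": "River"}
    RELATION_SIG = {"capital-of": ("City", "Country")}

    def select_examples(sentences, relation):
        """sentences: list of (sentence, entity1, entity2). Keep those
        whose entity types match the relation's argument signature."""
        arg1_t, arg2_t = RELATION_SIG[relation]
        selected = []
        for sent, e1, e2 in sentences:
            if ENTITY_TYPES.get(e1) == arg1_t and ENTITY_TYPES.get(e2) == arg2_t:
                selected.append(sent)
        return selected

    sents = [("Warsaw is the capital of Poland.", "Warsaw", "Poland"),
             ("The Vistula flows through Poland.", "Vistula", "Poland")]
    print(select_examples(sents, "capital-of"))
    ```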