3 research outputs found

    Algorithms and methods for automated construction of knowledge graphs based on text sources

    In this article, we describe our path towards building knowledge graphs automatically from Russian texts. We explore several methods and libraries for extracting triples, the fundamental building blocks of knowledge graphs. Our approach uses libraries that analyze the morphological characteristics of words, such as PyMorphy and Yandex Mystem, to construct triples. We also use the NLP library spaCy to analyze text and build triples from the semantic relationships it recognizes. In some cases, however, no relationship could be extracted from the text, so we tried using word2vec to define relationships. The word2vec results were unsatisfactory and could not be used as relationships. Pronouns posed a further problem for building triples from text. To address it, we explored coreference resolution libraries, but found no working library for the Russian language at the time of writing. Our results cover both the positive and the negative outcomes of applying these methods and libraries, and give insight into the challenges and opportunities of building knowledge graphs automatically from Russian texts.

    Automatic construction of semantic networks

    The presented work explores the possibilities of automatically constructing and expanding semantic networks using machine learning methods. The main focus is on the feature extraction procedure for the data set. The work presents a robust method of semantic relation retrieval, based on the distributional hypothesis and trained on data from the Czech WordNet. We also show the first results for the Czech language in this area of research. The thesis also includes a set of software tools for processing and evaluating input data, together with an overview and discussion of their results on real-world data. The resulting tools can process data sets on the order of hundreds of millions of words. The research part of the thesis used morphologically and syntactically annotated Czech data, but the methods are not language dependent.

    Automatic construction of semantic networks
