16 research outputs found

    Method for Paraphrase Extraction from the News Text Corpus

    The paper discusses the process of automatic extraction of paraphrases used in rewriting. The researchers propose a method for extracting paraphrases from English news text corpora. The method is based both on developed syntactic rules that define phrases and on synsets that identify synonymous words in the designed text corpus of BBC news. To implement the method, the Natural Language Toolkit, a Universal Dependencies parser, and WordNet are used.
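
    As an illustration of the synset-based synonymy check that such a method relies on, the following minimal sketch (our own, not the authors' code) uses NLTK's WordNet interface; the words and the are_synonyms helper are illustrative.

```python
# A minimal sketch of a synset-based synonymy test.
# Requires: pip install nltk; then nltk.download('wordnet')
from nltk.corpus import wordnet as wn

def are_synonyms(word_a: str, word_b: str, pos=wn.NOUN) -> bool:
    """Treat two words as synonymous if any of their WordNet synsets overlap."""
    synsets_a = set(wn.synsets(word_a, pos=pos))
    synsets_b = set(wn.synsets(word_b, pos=pos))
    return bool(synsets_a & synsets_b)

# Pass pos=wn.VERB to compare verbs instead of nouns.
print(are_synonyms("car", "automobile"))  # True: both share the synset car.n.01
print(are_synonyms("car", "banana"))      # False: no shared synset
```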

    Method for Automatic Collocation Extraction from Ukrainian Corpora

    The article deals with methods for automatic collocation extraction from Ukrainian corpora. The task of collocation extraction is considered in terms of a corpus-oriented approach [1] based on statistical measures. The term "collocation" is defined as a non-random combination of two words that go together regularly.
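
    A hedged sketch of statistical collocation extraction with NLTK is shown below; the PMI measure and frequency filter stand in for the corpus-oriented measures of [1], and the token list is illustrative rather than the authors' Ukrainian corpus.

```python
# Sketch: find non-random, recurring word pairs (collocation candidates)
# by scoring bigrams with pointwise mutual information (PMI).
from nltk.collocations import BigramCollocationFinder
from nltk.metrics import BigramAssocMeasures

# Toy token stream; a real run would use a tokenized Ukrainian corpus.
tokens = ["новий", "рік", "новий", "рік", "щасливий", "новий", "рік"]

finder = BigramCollocationFinder.from_words(tokens)
finder.apply_freq_filter(2)            # keep only pairs seen at least twice
bigram_measures = BigramAssocMeasures()
print(finder.nbest(bigram_measures.pmi, 5))  # top bigrams ranked by PMI
```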

    Extraction of Semantic Relations from Wikipedia Text Corpus

    This paper proposes an algorithm for the automatic extraction of semantic relations using a rule-based approach. The authors suggest identifying certain verbs (predicates) between the subject and the object of an expression to obtain a sequence of semantic relations in the designed text corpus of Wikipedia articles. Synsets from WordNet are applied to extract semantic relations between concepts and their synonyms in the text corpus.
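
    The core rule can be sketched as follows: take the verb (predicate) that links a subject and an object in a dependency parse as the semantic relation. The sketch below uses spaCy purely for illustration; the authors' own pipeline and rules may differ.

```python
# Sketch: extract (subject, predicate, object) triples from dependency parses.
# Requires: pip install spacy; python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_relations(text: str):
    """Yield (subject, predicate, object) triples for each verb in the parse."""
    doc = nlp(text)
    for token in doc:
        if token.pos_ == "VERB":
            subjects = [c for c in token.children if c.dep_ == "nsubj"]
            objects = [c for c in token.children if c.dep_ == "dobj"]
            for s in subjects:
                for o in objects:
                    yield (s.text, token.lemma_, o.text)

print(list(extract_relations("Python supports multiple programming paradigms.")))
# e.g. [('Python', 'support', 'paradigms')]
```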

    Evaluating effectiveness of linguistic technologies of knowledge identification in text collections

    The paper analyzes the possibility of using integral coefficients of recall and precision to evaluate the effectiveness of linguistic technologies for knowledge identification in texts. The approach is based on the method of test collections, which is used for experimental validation of the obtained effectiveness coefficients, and on methods of mathematical statistics. The problem of maximizing the reliability of sample results when extrapolating them to the general population of the tested text collection is studied. The paper considers a method for determining the confidence interval for the attribute proportion based on Wilson's formula, and a method for determining the required size of the relevant sample for a specified relative error and confidence probability.
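
    The two statistical tools named above can be made concrete in a short sketch; the inputs are illustrative, and the required-sample-size formula is the classical n = z²p(1−p)/e² with the error e derived from a relative error, which may differ in detail from the authors' derivation.

```python
# Sketch: Wilson score interval for a proportion, plus a sample-size estimate.
import math

def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score confidence interval for the attribute proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    margin = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - margin, center + margin

def required_sample_size(p: float, rel_error: float, z: float = 1.96) -> int:
    """Sample size for a given relative error at the given confidence level."""
    abs_error = rel_error * p
    return math.ceil(z**2 * p * (1 - p) / abs_error**2)

print(wilson_interval(85, 100))          # e.g. precision 0.85 on 100 samples
print(required_sample_size(0.85, 0.05))  # n for 5% relative error, 95% conf.
```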

    The logic and linguistic model for automatic extraction of collocation similarity

    The article discusses the process of automatic identification of collocation similarity. Semantic analysis is one of the most advanced as well as one of the most difficult NLP tasks. The main problem of semantic processing is the determination of polysemy and synonymy of linguistic units, and the task becomes even more complicated for word collocations. The paper suggests a logical-linguistic model for automatically determining semantic similarity between collocations in the Ukrainian and English languages. The proposed model formalizes the semantic equivalence of collocations by means of the semantic and grammatical characteristics of collocates. The basic idea of this approach is that the morphological, syntactic, and semantic characteristics of lexical units must be taken into account to identify collocation similarity. The basic mathematical means of the model are logical-algebraic equations of the algebra of finite predicates. The model examines verb-noun and noun-adjective collocations in Ukrainian and English, which consist of words belonging to the main parts of speech. The model allows semantically equivalent collocations to be extracted from semi-structured and unstructured texts, and its implementation makes it possible to recognize such collocations automatically. Using the model increases the effectiveness of natural language processing tasks such as information extraction, ontology generation, sentiment analysis, and others.
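
    A much-simplified sketch of the equivalence idea, approximating the predicate-algebra model in plain Python: two collocations are treated as equivalent when collocates at the same position are WordNet synonyms and the grammatical pattern matches. This is our illustration, not the authors' formalism.

```python
# Sketch: position-by-position synonymy check for two collocations
# under a shared grammatical pattern (here: verb-noun).
from nltk.corpus import wordnet as wn

def collocates_synonymous(a: str, b: str, pos) -> bool:
    """Two collocates match if identical or sharing a WordNet synset."""
    return a == b or bool(set(wn.synsets(a, pos)) & set(wn.synsets(b, pos)))

def equivalent(colloc_a, colloc_b, pattern=(wn.VERB, wn.NOUN)) -> bool:
    """Compare collocations position by position under the given pattern."""
    return all(
        collocates_synonymous(x, y, pos)
        for x, y, pos in zip(colloc_a, colloc_b, pattern)
    )

print(equivalent(("buy", "car"), ("purchase", "automobile")))  # True
print(equivalent(("buy", "car"), ("eat", "apple")))            # False
```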

    Practicum in Language Communication (English)

    This study guide for the course "Practicum in Language Communication (English)" is intended to build and improve the knowledge and skills needed for a high level of practical command of English, for accurate and comprehensive understanding of original English texts, and for further expansion of students' vocabulary. The guide contains exercises on vocabulary, grammar, and speaking, and offers job-interview texts with corresponding tasks for developing written and oral communication skills.

    Similar Text Fragments Extraction for Identifying Common Wikipedia Communities

    The extraction of similar text fragments from weakly formalized data is a task of natural language processing and intelligent data analysis, used for the automatic identification of connected knowledge fields. To search for such common communities in Wikipedia, we propose to use a logical-algebraic model for similar collocation extraction as an additional stage. With the Stanford Part-Of-Speech tagger and the Stanford Universal Dependencies parser, we identify the grammatical characteristics of collocation words; with WordNet synsets, we choose their synonyms. Our dataset includes Wikipedia articles from different portals and projects. The experimental results show the frequencies of synonymous text fragments in Wikipedia articles that form common information spaces. The number of high-frequency synonymous collocations can serve as an indication of key common up-to-date Wikipedia communities.
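
    To illustrate the final counting stage, the following sketch buckets collocations into synonym classes via a crude WordNet-based key (the first synset), a stand-in for the logical-algebraic model; the portal data are toy examples, not the authors' Wikipedia dataset.

```python
# Sketch: group collocations into synonym classes and count class frequencies;
# high-frequency classes would hint at shared information spaces.
from collections import Counter
from nltk.corpus import wordnet as wn

def synonym_key(word: str) -> str:
    """Map a word to the name of its first WordNet synset as a class label."""
    synsets = wn.synsets(word)
    return synsets[0].name() if synsets else word

portal_a = [("big", "city"), ("large", "town")]     # toy collocations
portal_b = [("large", "city"), ("small", "village")]

classes = Counter(
    tuple(synonym_key(w) for w in colloc) for colloc in portal_a + portal_b
)
print(classes.most_common())  # most frequent synonym classes first
```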

    The future is distributed: a vision of sustainable economies

    "The Future is Distributed: a Vision of Sustainable Economies" is a collection of case studies on distributed economies, a concept describing sustainable alternatives to existing business models. The authors of this publication are international Master's students of the Environmental Sciences, Policy and Management programme at the International Institute for Industrial Environmental Economics at Lund University in Sweden. The aim of their work is to demonstrate that local, small-scale, community-based economies are not just theory, but have already been implemented in various sectors and geographical settings.

    Analyzing rasters, vectors and time series using new Python interfaces in GRASS GIS 7

    GRASS GIS 7 is a free and open source GIS software package developed and used by many scientists (Neteler et al., 2012). While some users of GRASS GIS prefer its graphical user interface, a significant part of the scientific community takes advantage of the various scripting and programming interfaces offered by GRASS GIS to develop new models and algorithms. Here we present the different interfaces added to GRASS GIS 7 and available in Python, a popular programming language and environment in the geosciences. These Python interfaces are designed to satisfy the needs of scientists and programmers under various circumstances.

    PyGRASS (Zambelli et al., 2013) is a new object-oriented interface to GRASS GIS modules and libraries. The GRASS GIS libraries are implemented in C to ensure maximum performance, and the PyGRASS interface provides intuitive, Pythonic access to their functionality. The GRASS GIS Python scripting library is another way of accessing GRASS GIS modules; it combines the simplicity of Bash with the efficiency of Python syntax. When full access to all low-level and advanced functions and structures of the GRASS GIS library is required, Python programmers can use an interface based on the Python ctypes package. The ctypes interface provides complete, direct access to all functionality, as it would be available to C programmers.

    GRASS GIS also provides a specialized Python library for managing and analyzing spatio-temporal data (Gebbert and Pebesma, 2014). The temporal library introduces space-time datasets representing time series of raster, 3D raster, or vector maps, and allows users to combine various spatio-temporal operations, including queries, aggregation, sampling, and the analysis of spatio-temporal topology.

    We also discuss the advantages of implementing a scientific algorithm as a GRASS GIS module and show how to write such a module in Python. To facilitate module development, GRASS GIS provides a Python library for testing (Petras and Gebbert, 2014) that helps researchers ensure the robustness of an algorithm, the correctness of its results in edge cases, and the detection of changes in results caused by new development. For all modules, GRASS GIS automatically creates standardized command-line and graphical user interfaces and documentation. Finally, we show how GRASS GIS can be used together with powerful Python tools such as the NumPy package and the IPython Notebook.
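
    A brief sketch of two of the interfaces mentioned above, assuming an active GRASS GIS 7 session; the map names are illustrative.

```python
# GRASS GIS Python scripting library: simple, Bash-like module calls.
import grass.script as gs

# Compute slope from an elevation raster (map names are placeholders).
gs.run_command("r.slope.aspect", elevation="elevation", slope="slope")

# Parse the key=value output of r.univar -g into a dictionary.
stats = gs.parse_command("r.univar", map="slope", flags="g")
print(stats["mean"])

# PyGRASS: the same module wrapped as a Python object with keyword arguments.
from grass.pygrass.modules import Module

Module("r.slope.aspect", elevation="elevation", slope="slope2", overwrite=True)
```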