    NERosetta for the Named Entity Multi-lingual Space

    Named Entity Recognition has been a hot topic in Natural Language Processing for more than fifteen years. A number of systems for various languages have been developed using different approaches, based on different named entity schemes and tagging strategies. We present NERosetta, a web application for comparing these approaches when they are applied to aligned texts (bitexts). To illustrate its functionality, we used one literary text, its 7 bitexts involving 5 languages, and 5 different NER systems. We present some preliminary results and give guidelines for further development.
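    The abstract does not describe how NERosetta compares systems. As a minimal, hypothetical sketch of the general idea only, assume each aligned segment of a bitext carries the entity tags that some NER system produced on each side, and that system-specific tags are first mapped onto a shared coarse scheme, since the systems use different named entity schemes. The `COMMON_TYPE` table, `AlignedPair` and `compare` are illustrative names, not part of NERosetta.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical mapping from system-specific tags onto a shared coarse scheme.
# The tag names are illustrative; real systems each define their own scheme.
COMMON_TYPE = {
    "PER": "PERSON", "PERSON": "PERSON",
    "LOC": "LOCATION", "GPE": "LOCATION",
    "ORG": "ORGANIZATION", "ORGANIZATION": "ORGANIZATION",
}


@dataclass
class AlignedPair:
    """One aligned segment of a bitext with the entity tags found on each side."""
    source_tags: list[str]
    target_tags: list[str]


def normalize(tags: list[str]) -> Counter:
    """Map tags to the shared scheme; unmapped tags are counted as 'OTHER'."""
    return Counter(COMMON_TYPE.get(tag, "OTHER") for tag in tags)


def compare(pairs: list[AlignedPair]) -> dict[str, tuple[int, int, int]]:
    """Per entity type: (matched, source-only, target-only) summed over all pairs."""
    totals: dict[str, list[int]] = {}
    for pair in pairs:
        src, tgt = normalize(pair.source_tags), normalize(pair.target_tags)
        for etype in src.keys() | tgt.keys():
            row = totals.setdefault(etype, [0, 0, 0])
            matched = min(src[etype], tgt[etype])  # Counter returns 0 for missing keys
            row[0] += matched
            row[1] += src[etype] - matched
            row[2] += tgt[etype] - matched
    return {etype: tuple(row) for etype, row in totals.items()}
```

    For example, `compare([AlignedPair(["PER", "LOC"], ["PERSON", "GPE"]), AlignedPair(["ORG"], [])])` reports one matched PERSON, one matched LOCATION, and one ORGANIZATION found only on the source side. Counting by type per aligned segment avoids needing word-level alignment, at the cost of ignoring which exact spans correspond.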

    The development of library and language resources for organizing and finding information on spatial planning

    Bearing in mind that timely access to relevant information, as well as the definition and development of adequate terminology, is a prerequisite for new research and further development in any scientific field, this dissertation presents possibilities for information retrieval and term extraction on a sample corpus of spatial planning texts, using a number of modern technologies.
The study points out many benefits, but also certain limitations, of searching for information using Library Information Systems, Geographic Information Systems and the RAUmPLAN repository. It then describes the content, the preparation process and the confirmed representativeness of the sample corpus of spatial planning. Text processing, which includes tokenization, lemmatization, part-of-speech tagging and term extraction, was carried out with the Unitex tool. The corpus was then loaded onto the NoSketch platform, where a set of queries confirmed the importance of this preprocessing: it makes searching possible with significantly higher recall and precision. By separating the texts of spatial plans from the sample corpus, the PPTXM sub-corpus was formed, on which the remaining research was conducted. Using advanced methods and technologies, the SrpNER tool tagged and extracted various groups of named entities. A significant contribution of this dissertation is the linking of named entities in the INCEpTION environment with items from the Wikidata knowledge base. This knowledge base made it possible to group items according to given criteria by writing SPARQL queries. The output sets were visualized as maps, graphs, tables and frames with photographs. Hierarchical analysis in the TXM environment revealed the structural features of the corpus: the number of texts, paragraphs, sentences and corpus words. Using morphological tags, the frequency of occurrence of different parts of speech and punctuation marks in the entire corpus was determined within the TXM system. Since TXM can also display specific linguistic phenomena, it was possible to track the progression, i.e., the cumulative frequency, of different parts of speech both throughout the whole corpus and across its constituent parts.
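The abstract does not say how TXM computes frequencies or progressions internally. As a rough illustration of the two measures only, the sketch below assumes the tagged corpus is available as an ordered list of `(text_id, pos_tag)` pairs; the function names, text identifiers and tag names are hypothetical placeholders, not TXM's API or the dissertation's tagset.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical token representation: (text_id, pos_tag), in corpus order.
Token = tuple[str, str]


def pos_frequencies(tokens: list[Token]) -> Counter:
    """Frequency of each POS tag (punctuation included) over the whole corpus."""
    return Counter(pos for _, pos in tokens)


def progression(tokens: list[Token], target: str) -> list[int]:
    """Cumulative count of `target` after each token, in corpus order."""
    return list(accumulate(1 if pos == target else 0 for _, pos in tokens))


def progression_by_part(tokens: list[Token], target: str) -> dict[str, int]:
    """Cumulative count of `target` at the end of each constituent text.

    Assumes each text's tokens are contiguous; dict insertion order keeps
    the texts in the order they first appear.
    """
    result: dict[str, int] = {}
    running = 0
    for text_id, pos in tokens:
        if pos == target:
            running += 1
        result[text_id] = running
    return result
```

For instance, with `tokens = [("plan1", "N"), ("plan1", "V"), ("plan1", "PUNCT"), ("plan2", "N"), ("plan2", "N"), ("plan2", "A")]`, `progression(tokens, "N")` gives `[1, 1, 1, 2, 3, 3]` and `progression_by_part(tokens, "N")` gives `{"plan1": 1, "plan2": 3}`: the curve stays flat where a part of speech is absent and rises where it clusters, which is what a progression plot makes visible.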