    Application of Named Entity Recognition via Twitter on SpaCy in Indonesian (Case Study: Power Failure in the Special Region of Yogyakarta)

    SpaCy is a tool that can efficiently handle Natural Language Processing (NLP) tasks, one of which is Named Entity Recognition (NER). NER is used to identify and extract named entities from text. However, SpaCy has not yet officially released a pretrained NER model for Indonesian. Meanwhile, according to the 2019 PLN statistical report, the Province of D.I. Yogyakarta frequently experiences power failures, and many public complaints about these outages can be found on Twitter. Yet no research has so far extracted information about electrical disturbances, and research on NER with SpaCy for Indonesian remains rare. This study therefore extracts information about power failures in the Province of D.I. Yogyakarta from Twitter using SpaCy for Indonesian. The resulting model performs well, with 95.52% precision, 93.27% recall, and a 94.38% F1-score. The location entities found in tweets about electrical disturbances are then mapped. This mapping shows that Sleman Regency was the most frequently mentioned location in tweets about power failures, while Gunung Kidul Regency was the least mentioned. The month with the most power failures was March 2020, and the month with the fewest was July 2020.
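    The reported F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R), and the three figures above are consistent with each other. The sketch below shows how entity-level scores of this kind are commonly computed, assuming exact matching of both span and label; the abstract does not say which matching scheme the study used, and the `prf` and `f1_from` helpers are hypothetical names.

```python
from typing import Set, Tuple

# Hypothetical representation: an entity is (start_char, end_char, label).
Entity = Tuple[int, int, str]


def prf(gold: Set[Entity], pred: Set[Entity]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall and F1 (span and label must both match)."""
    tp = len(gold & pred)                          # correctly predicted entities
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy annotations for a single tweet; offsets and labels are illustrative only.
    gold = {(0, 6, "LOC"), (20, 30, "LOC"), (35, 45, "DATE")}
    pred = {(0, 6, "LOC"), (20, 30, "LOC"), (50, 55, "LOC")}
    print(prf(gold, pred))  # two of three predictions correct, two of three gold entities found
    # Cross-check the reported scores: P = 95.52 %, R = 93.27 % gives F1 of about 94.38 %
    print(round(100 * f1_from(0.9552, 0.9327), 2))
```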

    Fast and Accurate Recognition of Chinese Clinical Named Entities with Residual Dilated Convolutions

    Clinical Named Entity Recognition (CNER) aims to identify and classify clinical terms such as diseases, symptoms, treatments, exams, and body parts in electronic health records, a fundamental and crucial task for clinical and translational research. In recent years, deep learning methods have achieved significant success in CNER. However, these methods depend heavily on Recurrent Neural Networks (RNNs), which maintain a vector of hidden activations that is propagated through time, making model training slow. In this paper, we propose a Residual Dilated Convolutional Neural Network with a Conditional Random Field (RD-CNN-CRF) to address this problem. Specifically, Chinese characters and dictionary features are first projected into dense vector representations and then fed into the residual dilated convolutional neural network to capture contextual features. Finally, a conditional random field is employed to capture dependencies between neighboring tags. Computational results on the CCKS-2017 Task 2 benchmark dataset show that the proposed RD-CNN-CRF method competes favorably with state-of-the-art RNN-based methods in both computational performance and training time.
    Comment: 8 pages, 3 figures. Accepted as a regular paper by the 2018 IEEE International Conference on Bioinformatics and Biomedicine. arXiv admin note: text overlap with arXiv:1804.0501
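    At prediction time, a CRF layer picks the highest-scoring tag sequence from per-character scores produced by the encoder plus tag-to-tag transition scores; this is done with Viterbi decoding. The following is a minimal, framework-free sketch of that standard decoding step, not the authors' implementation; the BIO tag set, the scores, and the function name are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """Highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][j]   : score of tag j at position t (e.g. from the convolutional encoder)
    transitions[i][j] : score of tag i being followed by tag j
    Returns (best tag indices, best total score).
    """
    n_tags = len(transitions)
    score = list(emissions[0])            # best path score ending in each tag so far
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag i to precede tag j at position t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag.
    best_last = max(range(n_tags), key=lambda j: score[j])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[best_last]


if __name__ == "__main__":
    # Hypothetical tag set O=0, B=1, I=2; the O -> I transition is heavily penalised.
    emissions = [[1, 0.5, 0], [0, 0, 2], [1, 0, 0]]
    transitions = [[0, 0, -10], [0, 0, 0], [0, 0, 0]]
    print(viterbi(emissions, transitions))  # ([1, 2, 0], 3.5): B I O instead of O I O
```

    Taking the per-position argmax alone would give the invalid sequence O I O; the transition scores are what let the CRF enforce consistency between neighbouring tags.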

    Semantic Annotation of Text Documents Based on a Hierarchical Radial Basis Function Neural Network

    A hierarchical radial basis function (RBF) neural network with a multilayer architecture is proposed. The network extracts knowledge from textual sources, taking into account the maximum number of relevant attributes of each object, and assigns each object to the selected ontology class.
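    The abstract gives no architectural details, so the following is only a simplified sketch of the general idea, assuming Gaussian RBF units with hand-set prototype centres arranged along an ontology tree: at each level, the object's feature vector is routed to the child class whose unit responds most strongly. The class names, centres, widths, and helpers (`rbf`, `OntologyNode`, `classify`) are hypothetical; in a trained network the centres would be learned, for example by clustering training data.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence


def rbf(x: Sequence[float], center: Sequence[float], width: float) -> float:
    """Gaussian radial basis activation: exp(-||x - c||^2 / (2 * width^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-sq_dist / (2 * width ** 2))


@dataclass
class OntologyNode:
    name: str
    center: List[float]            # hypothetical prototype vector for this class
    width: float = 1.0
    children: List["OntologyNode"] = field(default_factory=list)


def classify(x: Sequence[float], root: OntologyNode) -> List[str]:
    """Descend the class tree, following the child whose RBF unit responds most strongly."""
    node, path = root, [root.name]
    while node.children:
        node = max(node.children, key=lambda child: rbf(x, child.center, child.width))
        path.append(node.name)
    return path


if __name__ == "__main__":
    root = OntologyNode("Thing", [0.0, 0.0], children=[
        OntologyNode("Person", [1.0, 0.0], children=[
            OntologyNode("Scientist", [1.0, 1.0]),
            OntologyNode("Politician", [1.0, -1.0]),
        ]),
        OntologyNode("Place", [-1.0, 0.0]),
    ])
    print(classify([0.9, 0.8], root))  # ['Thing', 'Person', 'Scientist']
```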

    Named Entity Recognition for Nepali Text Using Support Vector Machines

    Named Entity Recognition aims to identify rigid designators in text, such as proper names, biological species, and temporal expressions, and to classify them into predefined categories. Interest in this research area has grown since the early 1990s. Named Entity Recognition plays a vital role in many areas of natural language processing, such as Machine Translation, Information Extraction, and Question Answering. This paper presents Named Entity Recognition for Nepali text based on the Support Vector Machine (SVM), a machine learning approach to classification. A set of features is extracted from the training dataset. The accuracy and efficiency of the SVM classifier are analyzed with three different training-set sizes. The recognition systems are tested on ten Nepali text datasets. The strengths of this work are its efficient feature extraction and comprehensive recognition techniques. The SVM-based recognizer is limited to a fixed set of features and uses a small dictionary, which affects its performance. The learning performance of the recognition system is also examined: the system learns well from a small training set, and its learning rate increases as the training set grows.
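    The abstract does not list the features used. As an illustration only, the sketch below builds the kind of per-token feature dictionary that SVM-based NER systems typically use (the word itself, affixes, neighbouring words, a digit flag, and a small dictionary lookup); the feature names, the gazetteer, and the helper functions are assumptions. Python's `str.isdigit` also accepts Devanagari digits such as `२०७८`.

```python
from typing import Dict, List, Set


def token_features(tokens: List[str], i: int, gazetteer: Set[str]) -> Dict[str, object]:
    """Hypothetical feature set for token i, in the style of classic SVM-based NER."""
    word = tokens[i]
    return {
        "word": word,
        "prefix2": word[:2],
        "suffix2": word[-2:],
        "suffix3": word[-3:],
        "length": len(word),
        "is_digit": word.isdigit(),              # True for ASCII and Devanagari digits
        "in_gazetteer": word in gazetteer,       # small dictionary lookup
        "prev_word": tokens[i - 1] if i > 0 else "<BOS>",
        "next_word": tokens[i + 1] if i + 1 < len(tokens) else "<EOS>",
    }


def sentence_features(tokens: List[str], gazetteer: Set[str]) -> List[Dict[str, object]]:
    """One feature dictionary per token, ready for vectorisation."""
    return [token_features(tokens, i, gazetteer) for i in range(len(tokens))]


if __name__ == "__main__":
    # "Ram went to Kathmandu", with a one-entry hypothetical location dictionary.
    for feats in sentence_features(["राम", "काठमाडौं", "गए"], {"काठमाडौं"}):
        print(feats)
```

    The resulting dictionaries could then be vectorised and passed to a multi-class linear SVM, for instance scikit-learn's `DictVectorizer` followed by `LinearSVC`.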

    Using microtasks to crowdsource DBpedia entity classification: A study in workflow design

    DBpedia is at the core of the Linked Open Data Cloud and is widely used in research and applications. However, it is far from perfect. Its content suffers from many flaws resulting from factual errors inherited from Wikipedia or from incomplete mappings from Wikipedia infoboxes to the DBpedia ontology. In this work we focus on one class of such problems: untyped entities. We propose a hierarchical, tree-based approach to categorize DBpedia entities according to the DBpedia ontology using human computation and paid microtasks. We analyse the main dimensions of the crowdsourcing exercise in depth to derive suggestions for workflow design, and we study three different workflows with automatic and hybrid prediction mechanisms that select possible candidates for the most specific category in the DBpedia ontology. To test our approach, we ran experiments on CrowdFlower using a gold-standard dataset of 120 previously unclassified entities. In our studies, human-computation-driven approaches generally achieved higher precision at lower cost than workflows with automatic predictors. However, each of the tested workflows has its merits, and none of them seems to perform exceptionally well on the entities that the DBpedia Extraction Framework fails to classify. We discuss these findings and their potential implications for the design of effective crowdsourced entity classification in DBpedia and beyond.
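    The hierarchical, tree-based idea can be illustrated by walking down the class hierarchy one level at a time, asking several workers per level which subclass fits best, and stopping when the crowd picks "none of these" or fails to agree. The sketch below simulates this with scripted workers; the ontology fragment, the majority threshold, and all function names are assumptions, and the automatic and hybrid candidate-prediction mechanisms studied in the paper are not modelled.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

# A simulated worker answers one microtask: given an entity and candidate
# subclasses, return the best-fitting subclass, or None for "none of these".
Worker = Callable[[str, List[str]], Optional[str]]


def classify_entity(entity: str,
                    ontology: Dict[str, List[str]],
                    workers: List[Worker],
                    root: str = "Thing",
                    min_agreement: float = 0.5) -> List[str]:
    """Walk down the class tree one level per microtask round, using majority vote."""
    node, path = root, [root]
    while ontology.get(node):                     # stop at a leaf class
        options = ontology[node]
        votes = Counter(worker(entity, options) for worker in workers)
        choice, count = votes.most_common(1)[0]
        # Keep the current (most specific agreed) class if the crowd picks
        # "none of these" or no answer wins more than `min_agreement` of the votes.
        if choice is None or count / len(workers) <= min_agreement:
            break
        node = choice
        path.append(node)
    return path


def worker_believing(types: set) -> Worker:
    """Hypothetical scripted worker who picks the first option it believes applies."""
    def answer(entity: str, options: List[str]) -> Optional[str]:
        for option in options:
            if option in types:
                return option
        return None
    return answer


if __name__ == "__main__":
    # Toy fragment of a class hierarchy (not the real DBpedia ontology).
    ontology = {
        "Thing": ["Agent", "Place"],
        "Agent": ["Person", "Organisation"],
        "Person": ["Athlete", "Scientist"],
    }
    crowd = [
        worker_believing({"Agent", "Person", "Scientist"}),
        worker_believing({"Agent", "Person", "Scientist"}),
        worker_believing({"Agent", "Organisation"}),
    ]
    print(classify_entity("Albert_Einstein", ontology, crowd))
```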

    Extracting Named Entities Using Support Vector Machines

    Classification and Information Extraction from E-Commerce Tweets Using a Naïve Bayes Classifier and a Rule-Based Approach

    Isma Alghosani (11451204728). Thesis defense date: 26 January 2021. Graduation period: November 2021. Department of Informatics Engineering, Faculty of Science and Technology, Universitas Islam Negeri Sultan Syarif Kasim Riau. Abstract: Indonesia is among the countries with the most Twitter users, and many public tweets concern e-commerce. This makes Twitter a medium for communication and for attracting buyers' attention. However, Twitter data is unstructured, so a method is needed to obtain information from tweets. There are four main chains in a transaction, namely pre-transaction, transaction, and post-transaction. The information extracted from tweets comprises price, product, time, order number, shipping receipt (tracking) number, transaction number, cashback amount, and mobile phone number. This study uses Naïve Bayes as the classification method on a dataset of 1,200 tweets from e-commerce accounts, and a rule-based method to obtain the tweet entities. Evaluation uses a confusion matrix: the highest classification accuracy is 84.16% with a 90:10 data split, and the information extraction accuracy on tweets is 100%. Based on the classification and information extraction results, Naïve Bayes successfully classifies e-commerce tweet data, and the rule-based method recognizes tweet entities well. Keywords: E-Commerce, Information Extraction, Classification, Naïve Bayes, Rule Based, Twitter
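    The two stages described above can be sketched as a multinomial Naïve Bayes classifier with add-one smoothing, followed by regular-expression rules for entities. The thesis's actual features, classes, and rules are not given in the abstract, so the class labels, toy training tweets, patterns (phone numbers, Rupiah prices, shipping receipt numbers), and names used here are hypothetical, and only three of the eight entity types are covered.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List


class NaiveBayes:
    """Multinomial Naive Bayes over lower-cased whitespace tokens, with add-one smoothing."""

    def fit(self, docs: List[str], labels: List[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc: str) -> str:
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            log_prob = math.log(n_docs / total_docs)              # log prior
            for word in doc.lower().split():
                log_prob += math.log((counts[word] + 1) / denom)  # smoothed likelihood
            scores[label] = log_prob
        return max(scores, key=scores.get)


# Hypothetical extraction rules for Indonesian e-commerce tweets.
RULES: Dict[str, re.Pattern] = {
    # Mobile numbers such as 0812..., 62812... or +62812...
    "phone": re.compile(r"(?<!\d)(?:\+62|62|0)8\d{8,11}(?!\d)"),
    # Rupiah amounts such as "Rp 150.000" or "Rp25000"
    "price": re.compile(r"\bRp\.?\s?\d+(?:\.\d{3})*", re.IGNORECASE),
    # Shipping receipt ("resi") numbers: up to 4 letters followed by 8 or more digits
    "receipt_no": re.compile(r"\bresi\W*([A-Z]{0,4}\d{8,})", re.IGNORECASE),
}


def extract(tweet: str) -> Dict[str, List[str]]:
    """Apply every rule; findall returns the capture group when a rule has one."""
    return {name: pattern.findall(tweet) for name, pattern in RULES.items()}


if __name__ == "__main__":
    # Toy training tweets with hypothetical transaction-stage labels.
    docs = ["promo diskon cashback hari ini", "sudah bayar pesanan belum diproses",
            "paket belum sampai resi tidak update"]
    nb = NaiveBayes().fit(docs, ["pre", "trans", "post"])
    print(nb.predict("promo cashback"))
    print(extract("Pesanan belum sampai, resi JP1234567890, total Rp 150.000, hubungi 081234567890"))
```

    Keeping the classifier and the extraction rules separate mirrors the two-stage design in the abstract: classification decides what a tweet is about, while the rules only pull out well-formatted values such as numbers and amounts.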