11 research outputs found
Application of Named Entity Recognition via Twitter on SpaCy in Indonesian (Case Study : Power Failure in the Special Region of Yogyakarta)
SpaCy is a tool that efficiently handles Natural Language Processing (NLP) problems, one of which is Named Entity Recognition (NER). NER is used to extract and identify named entities in a text. However, SpaCy has not yet officially released a pre-trained NER model for Indonesian. Meanwhile, according to the 2019 PLN statistical report, the Province of D.I. Yogyakarta often experiences power failures, and many public complaints about power failures in the province are found on Twitter. There is no prior research on extracting information related to electrical disturbances, and research on NER using SpaCy in Indonesian is still rare. This study therefore extracts information related to power failures in the Province of D.I. Yogyakarta from Twitter using SpaCy for Indonesian. The study achieves good performance, with 95.52% precision, 93.27% recall, and a 94.38% F1-score. Mapping is then carried out based on the location entities contained in tweets related to electrical disturbances. The highest number of locations mentioned in tweets related to power failures came from Sleman Regency, and the lowest from Gunung Kidul Regency. The month with the most power failures was March 2020, while the month with the fewest was July 2020.
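As a quick consistency check on the metrics reported in this abstract, the F1-score is the harmonic mean of precision and recall. The sketch below recomputes it from the two reported figures. The function name `f1_score` is an illustrative choice and is not taken from the study.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in the same unit)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures as reported in the abstract, in percent.
reported_precision = 95.52
reported_recall = 93.27

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.2f}%")
```

The recomputed value agrees with the reported 94.38% to two decimal places.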
Fast and Accurate Recognition of Chinese Clinical Named Entities with Residual Dilated Convolutions
Clinical Named Entity Recognition (CNER) aims to identify and classify
clinical terms such as diseases, symptoms, treatments, exams, and body parts in
electronic health records, which is a fundamental and crucial task for clinical
and translation research. In recent years, deep learning methods have achieved
significant success in CNER tasks. However, these methods depend greatly on
Recurrent Neural Networks (RNNs), which maintain a vector of hidden activations
that are propagated through time, which makes the models slow to train.
In this paper, we propose a Residual Dilated Convolutional Neural Network with
Conditional Random Field (RD-CNN-CRF) to address this. Specifically, Chinese
characters and dictionary features are first projected into dense vector
representations, then they are fed into the residual dilated convolutional
neural network to capture contextual features. Finally, a conditional random
field is employed to capture dependencies between neighboring tags.
Computational results on the CCKS-2017 Task 2 benchmark dataset show that our
proposed RD-CNN-CRF method competes favorably with state-of-the-art RNN-based
methods both in terms of computational performance and training time.
Comment: 8 pages, 3 figures. Accepted as regular paper by 2018 IEEE
International Conference on Bioinformatics and Biomedicine. arXiv admin note:
text overlap with arXiv:1804.0501
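The core idea of a residual dilated convolution can be sketched in a few lines. This is a minimal illustration only: the abstract does not state kernel sizes, dilation rates, channel counts, or the activation function, so every value below is an assumption. Each position also carries a single number rather than a dense character vector, and the CRF layer is omitted.

```python
from typing import List


def dilated_conv1d(x: List[float], kernel: List[float], dilation: int) -> List[float]:
    """Zero-padded 1D dilated convolution that keeps the sequence length."""
    center = len(kernel) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + (j - center) * dilation  # taps are spaced `dilation` apart
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def residual_dilated_block(x: List[float], kernel: List[float], dilation: int) -> List[float]:
    """Residual connection: y = x + ReLU(dilated_conv(x))."""
    h = [max(0.0, v) for v in dilated_conv1d(x, kernel, dilation)]
    return [a + b for a, b in zip(x, h)]


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Number of input positions seen by one output after stacking the layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical toy input and kernel, chosen only to make the arithmetic easy to follow.
signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
kernel = [1.0, 1.0, 1.0]

block_output = residual_dilated_block(signal, kernel, dilation=2)
stacked_field = receptive_field(kernel_size=3, dilations=[1, 2, 4])
print(block_output)
print(stacked_field)
```

Unlike a recurrent layer, every output position here depends only on the input, so all positions can be computed in parallel. Stacking layers with growing dilation widens the context quickly without adding parameters per layer.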
Semantic annotation of text documents based on a hierarchical radial basis function neural network
A hierarchical radial basis function neural network with a multi-layered architecture is proposed. The network extracts knowledge from textual sources, taking into account the maximum number of relevant attributes of each object, and assigns the object to the selected ontology class.
Named Entity Recognition for Nepali Text Using Support Vector Machines
Named Entity Recognition (NER) aims to identify rigid designators in text, such as proper names, biological species, and temporal expressions, and to classify them into predefined categories. Interest in this field has grown since the early 1990s. NER plays a vital role in many areas of natural language processing, such as machine translation, information extraction, and question answering. This paper presents NER for Nepali text based on the Support Vector Machine (SVM), a machine learning approach to classification. A set of features is extracted from the training data set. The accuracy and efficiency of the SVM classifier are analyzed for three different training set sizes, and the recognition systems are tested on ten datasets of Nepali text. The strengths of this work are its efficient feature extraction and comprehensive recognition techniques. The SVM-based NER is limited to a fixed set of features and uses a small dictionary, which affects its performance. The learning performance of the recognition system is also observed: the system learns well from a small training set, and its rate of learning increases as the training set grows.
Using microtasks to crowdsource DBpedia entity classification: A study in workflow design
DBpedia is at the core of the Linked Open Data Cloud and widely used in research and applications. However, it is far from being perfect. Its content suffers from many flaws, as a result of factual errors inherited from Wikipedia or incomplete mappings from Wikipedia infobox to DBpedia ontology. In this work we focus on one class of such problems, un-typed entities. We propose a hierarchical tree-based approach to categorize DBpedia entities according to the DBpedia ontology using human computation and paid microtasks. We analyse the main dimensions of the crowdsourcing exercise in depth in order to come up with suggestions for workflow design and study three different workflows with automatic and hybrid prediction mechanisms to select possible candidates for the most specific category from the DBpedia ontology. To test our approach, we run experiments on CrowdFlower using a gold standard dataset of 120 previously unclassified entities. In our studies human-computation driven approaches generally achieved higher precision at lower cost when compared to workflows with automatic predictors. However, each of the tested workflows has its merit and none of them seems to perform exceptionally well on the entities that the DBpedia Extraction Framework fails to classify. We discuss these findings and their potential implications for the design of effective crowdsourced entity classification in DBpedia and beyond
Classification and Information Extraction from E-Commerce Tweets Using a Naïve Bayes Classifier and a Rule-Based Approach
ISMA ALGHOSANI
11451204728
Defense date: 26 January 2021
Graduation period: November 2021
Department of Informatics Engineering
Faculty of Science and Technology
Universitas Islam Negeri Sultan Syarif Kasim Riau
ABSTRACT
Indonesia is among the countries with the largest Twitter user bases, and this includes public tweets about e-commerce. This makes Twitter one of the media for communicating with buyers and attracting their attention. However, the data on Twitter is unstructured, so a method is needed to obtain information from tweets. There are four main chains in carrying out a transaction, namely pre-transaction, transaction, and post-transaction. The information extracted from tweets comprises price, product, time, order number, receipt (tracking) number, transaction number, cashback amount, and mobile phone number. This study uses Naïve Bayes as the classification method on a dataset of 1,200 tweets from e-commerce accounts, and a rule-based method to obtain tweet entities. Testing uses a confusion matrix; the highest accuracy was 84.16% at a 90:10 ratio, and information extraction from tweets reached an accuracy of 100%. Based on the classification and information extraction test results, Naïve Bayes succeeded in classifying e-commerce tweet data, and the rule-based method recognized tweet entities well.
Keywords: E-Commerce, Information Extraction, Classification, Naïve Bayes, Rule Based, Twitter
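To illustrate the classification step named in this abstract, the following is a minimal sketch of a multinomial Naïve Bayes text classifier with Laplace smoothing. It is not the thesis implementation: the class name `NaiveBayesTextClassifier`, the whitespace tokenizer, the smoothing value, the labels, and the English toy tweets are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes over whitespace tokens with Laplace smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # smoothing constant (assumed value)

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = doc.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, doc):
        tokens = doc.lower().split()
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy tweets, one label per transaction stage.
train_docs = [
    "is this phone in stock",
    "how much is the discount",
    "payment failed for my order",
    "cannot complete payment checkout",
    "package has not arrived yet",
    "refund for damaged package",
]
train_labels = ["pre", "pre", "transaction", "transaction", "post", "post"]

model = NaiveBayesTextClassifier().fit(train_docs, train_labels)
prediction = model.predict("my payment failed")
print(prediction)
```

Working in log space avoids numerical underflow when many token probabilities are multiplied, and the smoothing term keeps a token unseen in one class from forcing that class's probability to zero.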