Search CORE

11 research outputs found

Application of Named Entity Recognition via Twitter on SpaCy in Indonesian (Case Study : Power Failure in the Special Region of Yogyakarta)

Author: Santoso Ibnu
Suadaa Lya Hulliyyatus
Yanti Rizka Maulida
Publication venue: 'Universitas Atma Jaya Yogyakarta'
Publication date: 19/08/2021

SpaCy is a tool that can efficiently handle Natural Language Processing (NLP) tasks, one of which is Named Entity Recognition (NER). NER is used to identify and extract named entities from text. However, SpaCy has not yet officially released a pre-trained NER model for Indonesian. Meanwhile, according to PLN's 2019 statistical report, the Province of D.I. Yogyakarta frequently experiences power failures, and many public complaints about these failures are posted on Twitter. Because no prior research has extracted information about electrical disturbances, and research on NER with SpaCy in Indonesian is still rare, this study extracts information about power failures in the Province of D.I. Yogyakarta from Twitter using Indonesian SpaCy. The model performs well, achieving 95.52% precision, 93.27% recall, and a 94.38% F1-score. The location entities found in tweets about electrical disturbances are then mapped. This mapping shows that Sleman Regency was the location mentioned most often in power-failure tweets, while Gunung Kidul Regency was mentioned least often. The month with the most power failures was March 2020, and the month with the fewest was July 2020.
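The precision, recall, and F1 figures above follow the usual definitions, with F1 as the harmonic mean of precision and recall. The sketch below assumes exact-span matching (the abstract does not state the matching criterion) and uses hypothetical entity labels `LOC` and `DATE`. It also checks that the reported 95.52% precision and 93.27% recall are consistent with the reported 94.38% F1.

```python
from typing import Set, Tuple

# An entity span: (start token index, end token index exclusive, label).
# The labels used below ("LOC", "DATE") are hypothetical examples.
Span = Tuple[int, int, str]


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def entity_prf1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall and F1."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f1_score(precision, recall)


if __name__ == "__main__":
    # Toy annotation: one location matched exactly, one date span truncated.
    gold = {(0, 1, "LOC"), (3, 5, "DATE")}
    predicted = {(0, 1, "LOC"), (3, 4, "DATE")}
    print(entity_prf1(gold, predicted))  # (0.5, 0.5, 0.5)
    # Consistency check against the figures reported in the abstract.
    print(round(f1_score(0.9552, 0.9327) * 100, 2))  # 94.38
```

Under exact matching, a truncated span counts as both a false positive and a false negative, which is why the toy example scores 0.5 on all three metrics.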

Universitas Atma Jaya Yogyakarta (UAJY): Open Journal Systems

Fast and Accurate Recognition of Chinese Clinical Named Entities with Residual Dilated Convolutions

Author: Gao Ju
Qiu Jiahui
Ruan Tong
Wang Qi
Zhou Yangming
Publication date: 27/11/2018

Clinical Named Entity Recognition (CNER) aims to identify and classify clinical terms such as diseases, symptoms, treatments, exams, and body parts in electronic health records; it is a fundamental and crucial task for clinical and translational research. In recent years, deep learning methods have achieved significant success in CNER tasks. However, these methods depend heavily on Recurrent Neural Networks (RNNs), which maintain a vector of hidden activations that is propagated through time, making model training slow. In this paper, we propose a Residual Dilated Convolutional Neural Network with a Conditional Random Field (RD-CNN-CRF) to address this problem. Specifically, Chinese characters and dictionary features are first projected into dense vector representations, which are then fed into the residual dilated convolutional neural network to capture contextual features. Finally, a conditional random field is employed to capture dependencies between neighboring tags. Computational results on the CCKS-2017 Task 2 benchmark dataset show that the proposed RD-CNN-CRF method competes favorably with state-of-the-art RNN-based methods in both computational performance and training time.

Comment: 8 pages, 3 figures. Accepted as a regular paper by the 2018 IEEE International Conference on Bioinformatics and Biomedicine. arXiv admin note: text overlap with arXiv:1804.0501
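The speed argument rests on how dilated convolutions widen the context window: stacking layers with dilations 1, 2, 4, ... grows the receptive field exponentially with depth, while every position is computed independently rather than step by step as in an RNN. The pure-Python sketch below shows a single-channel residual dilated block and the receptive-field arithmetic. The kernel size, dilation schedule, ReLU placement, and scalar (one-channel) inputs are illustrative assumptions, not the paper's configuration, and the CRF layer is omitted.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], weights: Sequence[float], dilation: int) -> List[float]:
    """Same-length 1D dilated convolution with zero padding (odd kernel assumed)."""
    if len(weights) % 2 == 0:
        raise ValueError("kernel size must be odd to keep outputs centred")
    half = len(weights) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t + (j - half) * dilation  # taps are `dilation` positions apart
            if 0 <= idx < len(x):  # positions outside the sequence count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def residual_dilated_block(x: Sequence[float], weights: Sequence[float], dilation: int) -> List[float]:
    """ReLU(dilated convolution) added back to the input (residual connection)."""
    conv = dilated_conv1d(x, weights, dilation)
    return [xi + max(0.0, ci) for xi, ci in zip(x, conv)]


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of input positions that can influence one output position."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    dilations = [1, 2, 4]
    # An impulse in the middle of a 17-step sequence shows how far information spreads.
    x = [0.0] * 17
    x[8] = 1.0
    for d in dilations:
        x = residual_dilated_block(x, [1.0, 1.0, 1.0], d)
    print(sum(1 for v in x if v != 0.0))  # 15 positions now depend on the impulse
    print(receptive_field(3, dilations))  # 15
```

Three layers with kernel size 3 already see 15 positions; an RNN would need 15 sequential steps to propagate the same context.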

arXiv.org e-Print Archive

Crossref

Semantic Annotation of Text Documents Based on a Hierarchical Radial Basis Function Neural Network

Author: Бодянский Евгений Владимирович
Шубкина Ольга Васильевна
Publication venue: PC TECHNOLOGY CENTER
Publication date: 10/11/2010

A hierarchical radial basis function neural network with a multilayer architecture is proposed. The network is used to extract knowledge from text sources, taking into account the maximum number of relevant attributes of each object, and to assign each object to the selected ontology class.
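As a rough illustration of the idea, the sketch below scores an object's feature vector against Gaussian radial basis function units and descends a class hierarchy by picking, at each level, the child whose unit responds most strongly. The two-level toy ontology, the 2-D feature vectors, and the centres and widths are hypothetical, and this greedy top-down descent is a simplification, not the authors' multilayer network or its training procedure.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical structure: parent class -> list of (child class, RBF centre, RBF width)
Hierarchy = Dict[str, List[Tuple[str, Sequence[float], float]]]


def gaussian_rbf(x: Sequence[float], centre: Sequence[float], width: float) -> float:
    """Gaussian radial basis activation: 1.0 at the centre, decaying with distance."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, centre))
    return math.exp(-sq_dist / (2.0 * width ** 2))


def classify(x: Sequence[float], hierarchy: Hierarchy, root: str) -> List[str]:
    """Follow the most strongly activated RBF unit at each level of the hierarchy."""
    path = [root]
    node = root
    while node in hierarchy:  # stop at a class with no subclasses
        children = hierarchy[node]
        node = max(children, key=lambda child: gaussian_rbf(x, child[1], child[2]))[0]
        path.append(node)
    return path


if __name__ == "__main__":
    toy_ontology: Hierarchy = {
        "Thing": [("Person", [1.0, 0.0], 0.5), ("Location", [0.0, 1.0], 0.5)],
        "Location": [("City", [0.0, 1.2], 0.3), ("Region", [0.0, 0.6], 0.3)],
    }
    print(classify([0.1, 1.1], toy_ontology, "Thing"))  # ['Thing', 'Location', 'City']
```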

Наукова періодика України (Scientific Periodicals of Ukraine)

Semantic Annotation of Text Documents Based on a Hierarchical Radial Basis Function Neural Network

Author: Бодянский Евгений Владимирович
Шубкина Ольга Васильевна
Publication venue: 'Private Company Technology Center'
Publication date: 01/01/2010

Neliti

Наукова періодика України (Scientific Periodicals of Ukraine)

Eastern-European Journal of Enterprise Technologies

Named Entity Recognition for Nepali Text Using Support Vector Machines

Author: Surya Bahadur Bam
Tej Bahadur Shahi
Publication date: 01/01/2014

Named Entity Recognition aims to identify rigid designators in text, such as proper names, biological species, and temporal expressions, and to classify them into predefined categories. Interest in this field of research has grown since the early 1990s. Named Entity Recognition plays a vital role in many areas of natural language processing, including Machine Translation, Information Extraction, and Question Answering Systems. This paper presents Named Entity Recognition for Nepali text based on the Support Vector Machine (SVM), a machine learning approach to classification. A set of features is extracted from the training dataset. The accuracy and efficiency of the SVM classifier are analyzed with training sets of three different sizes. The recognition systems are tested on ten Nepali text datasets. The strengths of this work are its efficient feature extraction and comprehensive recognition techniques. The SVM-based Named Entity Recognition system is limited to a fixed set of features and uses a small dictionary, which affects its performance. The learning performance of the recognition system is also examined: the system learns well from a small training set, and its learning rate increases as the training set grows.
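A minimal sketch of SVM-based token classification, assuming a binary entity-versus-other setup: each token is mapped to sparse features (the word, its last two characters, the previous word, and membership in a small dictionary, echoing the dictionary mentioned above), and a linear SVM is trained with the Pegasos stochastic sub-gradient method on the hinge loss. The feature set, the toy English sentence with Nepali names, and the gazetteer are hypothetical; the paper's actual features, kernel, and category set are not given in the abstract.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Features = Dict[str, float]


def token_features(tokens: List[str], i: int, gazetteer: Set[str]) -> Features:
    """Sparse binary features for token i (hypothetical feature set)."""
    word = tokens[i]
    feats = {
        "bias": 1.0,
        "word=" + word: 1.0,
        "suffix2=" + word[-2:]: 1.0,
        "prev=" + (tokens[i - 1] if i > 0 else "<s>"): 1.0,
    }
    if word in gazetteer:  # small dictionary lookup
        feats["in_gazetteer"] = 1.0
    return feats


def score(weights: Dict[str, float], feats: Features) -> float:
    """Linear decision value w . x over sparse features."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())


def train_pegasos(data: List[Tuple[Features, int]], epochs: int = 20, lam: float = 0.01) -> Dict[str, float]:
    """Linear SVM via Pegasos; labels are +1 (named entity) or -1 (other)."""
    weights: Dict[str, float] = defaultdict(float)
    t = 0
    for _ in range(epochs):
        for feats, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * score(weights, feats)
            for f in weights:  # L2 regularisation shrinks every weight
                weights[f] *= 1.0 - eta * lam
            if margin < 1.0:  # hinge loss is active: step towards the example
                for f, v in feats.items():
                    weights[f] += eta * label * v
    return dict(weights)


def predict(weights: Dict[str, float], tokens: List[str], gazetteer: Set[str]) -> List[bool]:
    """True where the SVM decision value is positive (token tagged as an entity)."""
    return [score(weights, token_features(tokens, i, gazetteer)) > 0 for i in range(len(tokens))]


if __name__ == "__main__":
    sentence = ["Ram", "went", "to", "Kathmandu"]
    gazetteer = {"Ram", "Kathmandu"}
    labels = [1, -1, -1, 1]
    data = [(token_features(sentence, i, gazetteer), y) for i, y in enumerate(labels)]
    w = train_pegasos(data)
    print(predict(w, sentence, gazetteer))  # [True, False, False, True]
```

The dictionary feature is what lets the model tag an unseen name such as "Pokhara" when it appears in the gazetteer, which also illustrates why a small dictionary limits recall.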

CiteSeerX

Using microtasks to crowdsource DBpedia entity classification: A study in workflow design

Author: Bu Qiong
Li Yunjia
Simperl Elena
Zerr Sergej
Publication venue: 'IOS Press'

DBpedia is at the core of the Linked Open Data Cloud and is widely used in research and applications. However, it is far from perfect. Its content suffers from many flaws caused by factual errors inherited from Wikipedia or by incomplete mappings from Wikipedia infoboxes to the DBpedia ontology. In this work we focus on one class of such problems: untyped entities. We propose a hierarchical, tree-based approach that categorizes DBpedia entities according to the DBpedia ontology using human computation and paid microtasks. We analyse the main dimensions of the crowdsourcing exercise in depth to derive suggestions for workflow design, and we study three workflows with automatic and hybrid prediction mechanisms that select candidates for the most specific category in the DBpedia ontology. To test our approach, we ran experiments on CrowdFlower using a gold-standard dataset of 120 previously unclassified entities. In our studies, human-computation-driven approaches generally achieved higher precision at lower cost than workflows with automatic predictors. However, each tested workflow has its merits, and none performs exceptionally well on the entities that the DBpedia Extraction Framework fails to classify. We discuss these findings and their potential implications for the design of effective crowdsourced entity classification in DBpedia and beyond.
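The hierarchical, tree-based idea can be sketched as a top-down walk over the ontology: at each level, workers choose among the current class's direct subclasses (or answer that none fits), and the walk descends only when a strict majority agrees. The toy ontology fragment, the "none" answer option, the majority threshold, and the simulated worker function are assumptions for illustration; they do not reproduce the paper's three workflows or its automatic predictors.

```python
from collections import Counter
from typing import Callable, Dict, List

Ontology = Dict[str, List[str]]  # class -> direct subclasses
AskWorkers = Callable[[str, List[str]], List[str]]  # (entity, options) -> worker answers


def classify_top_down(entity: str, ontology: Ontology, root: str, ask_workers: AskWorkers) -> List[str]:
    """Descend the class tree while a strict majority of workers agrees on a subclass."""
    path = [root]
    node = root
    while ontology.get(node):  # stop at a leaf class
        options = ontology[node]
        answers = ask_workers(entity, options)  # each answer is an option or "none"
        best, votes = Counter(answers).most_common(1)[0]
        if best not in options or votes * 2 <= len(answers):
            break  # no consensus on a more specific type: keep the current class
        node = best
        path.append(node)
    return path


if __name__ == "__main__":
    toy_ontology = {
        "Thing": ["Agent", "Place"],
        "Agent": ["Person", "Organisation"],
        "Place": ["City", "Country"],
    }
    # Hypothetical pre-recorded answers from three workers per microtask.
    recorded = {
        ("Agent", "Place"): ["Agent", "Agent", "Place"],
        ("Person", "Organisation"): ["Organisation", "Person", "none"],
    }

    def fake_workers(entity: str, options: List[str]) -> List[str]:
        return recorded[tuple(options)]

    print(classify_top_down("Some_Entity", toy_ontology, "Thing", fake_workers))  # ['Thing', 'Agent']
```

Stopping at the last agreed class, rather than forcing a leaf, keeps the assignment correct but less specific when workers disagree.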

Southampton (e-Prints Soton)

Malay Named Entity Recognition Based on Rule-Based Approach

Author: Ashraef
Ferreira
Liddy
Mansouri
Micheal
Ralph
Rau
Rohini
Yong
Yu
Publication venue: 'IACSIT Press'

Crossref

Extracting Named Entities Using Support Vector Machines

Author: E. Brill
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2006

Crossref

CLASSIFICATION AND INFORMATION EXTRACTION FROM E-COMMERCE TWEETS USING A NAÏVE BAYES CLASSIFIER AND A RULE-BASED APPROACH

Author: ISMA ALGHOSANI
Publication date: 26/01/2021

ISMA ALGHOSANI, 11451204728. Thesis defense date: 26 January 2021. Graduation period: November 2021. Department of Informatics Engineering, Faculty of Science and Technology, Universitas Islam Negeri Sultan Syarif Kasim Riau.

ABSTRACT: Indonesia is among the countries with the largest Twitter user bases, and many public tweets concern e-commerce. This makes Twitter one of the channels for communicating with and attracting buyers. However, Twitter data is unstructured, so a method is needed to obtain information from tweets. Transactions involve four main stages, namely pre-transaction, transaction, and post-transaction. The information extracted from tweets is price, product, time, order number, shipment tracking number, transaction number, cashback amount, and mobile phone number. This study uses Naïve Bayes as the classification method on a dataset of 1,200 tweets from e-commerce accounts, and uses a rule-based method to extract tweet entities. Evaluation uses a confusion matrix: the highest classification accuracy is 84.16% with a 90:10 data split, and information extraction from tweets achieves 100% accuracy. Based on these classification and information extraction results, Naïve Bayes successfully classifies e-commerce tweets and the rule-based method recognizes tweet entities well.

Keywords: E-Commerce, Information Extraction, Classification, Naïve Bayes, Rule Based, Twitter
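The rule-based extraction step can be approximated with regular expressions, one per entity type. The sketch below covers only three of the eight listed types (price, phone number, order number). Its patterns assume prices written like "Rp 150.000", Indonesian mobile numbers starting with 08 or +628, and order codes introduced by "order" or "pesanan" (Indonesian for "order"); they are hypothetical and not the rules used in the thesis.

```python
import re
from typing import Dict, List

# Hypothetical extraction rules: one compiled pattern per entity type.
RULES = {
    # "Rp 150.000", "Rp150000", "Rp.25.000" (dot as thousands separator)
    "price": re.compile(r"\bRp\.?\s?\d+(?:\.\d{3})*"),
    # Indonesian mobile numbers: 08... or +628..., not embedded in longer digit runs
    "phone": re.compile(r"(?<!\d)(?:\+62|0)8\d{7,11}(?!\d)"),
    # Code after "order"/"pesanan" (optionally "no"/"nomor"), e.g. "INV20210126"
    "order_number": re.compile(r"(?i:order|pesanan)\s*(?i:no\.?|nomor)?\s*[:#]?\s*([A-Z]{0,5}\d{6,})"),
}


def extract_entities(tweet: str) -> Dict[str, List[str]]:
    """Apply every rule to the tweet and group the matches by entity type."""
    found: Dict[str, List[str]] = {}
    for label, pattern in RULES.items():
        # Use the capture group when the rule defines one, else the whole match.
        matches = [m.group(1) if pattern.groups else m.group(0) for m in pattern.finditer(tweet)]
        if matches:
            found[label] = matches
    return found


if __name__ == "__main__":
    tweet = "Halo kak, pesanan nomor INV20210126 sudah dibayar Rp 150.000, hubungi 081234567890"
    print(extract_entities(tweet))
    # {'price': ['Rp 150.000'], 'phone': ['081234567890'], 'order_number': ['INV20210126']}
```

Rules like these are precise on the formats they anticipate but silently miss variants (for example "150rb" for a price), so high extraction accuracy depends on how uniform the tweets are.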

Analysis of the Cost of Production of Houses at