    Indonesian Named-entity Recognition for 15 Classes Using Ensemble Supervised Learning

    We describe our effort to build an Indonesian named-entity recognition (NER) system for newspaper articles with 15 classes, a larger set of class types than existing Indonesian NER systems use. We applied supervised machine learning and ran experiments to find the attribute combination and the algorithm that give the highest accuracy. We compared word-level, sentence-level, and document-level attributes. For the algorithm, we compared several single machine learning algorithms and an ensemble. Using 457 news articles, the best accuracy was achieved by the ensemble technique, in which the outputs of several machine learning algorithms were used as features for one machine learning algorithm.
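
    The ensemble described above, where the outputs of several learners become the features of one further learner, is commonly called stacking. The following is a minimal sketch of that idea only. The three base taggers, the toy tokens, and the lookup-table meta-learner are all hypothetical stand-ins; the abstract does not name the actual algorithms or attributes used.

```python
from collections import Counter, defaultdict

# Hypothetical base taggers (assumptions, not the paper's models).
# Each one maps a token to a coarse guess: "ENT" (entity) or "O" (other).
def capital_tagger(token):
    return "ENT" if token[:1].isupper() else "O"

def suffix_tagger(token):
    return "ENT" if token.lower().endswith(("ta", "ng")) else "O"

def length_tagger(token):
    return "ENT" if len(token) > 5 else "O"

BASE_TAGGERS = [capital_tagger, suffix_tagger, length_tagger]

def meta_features(token):
    """Stacking step: the base predictions become the feature vector."""
    return tuple(tagger(token) for tagger in BASE_TAGGERS)

def train_meta(tokens, labels):
    """Meta-learner: majority label for each combination of base predictions."""
    counts = defaultdict(Counter)
    for token, label in zip(tokens, labels):
        counts[meta_features(token)][label] += 1
    table = {feats: c.most_common(1)[0][0] for feats, c in counts.items()}
    default = Counter(labels).most_common(1)[0][0]  # fallback for unseen combinations
    return table, default

def predict(model, token):
    table, default = model
    return table.get(meta_features(token), default)

# Toy training data (invented for illustration).
TRAIN_TOKENS = ["Jakarta", "Bandung", "Surabaya", "makan", "pergi", "berjalan", "Dia", "Kami"]
TRAIN_LABELS = ["LOC", "LOC", "LOC", "O", "O", "O", "O", "O"]

model = train_meta(TRAIN_TOKENS, TRAIN_LABELS)
print(predict(model, "Semarang"))
print(predict(model, "Saya"))
```

    In a realistic setting the base predictions would come from trained classifiers evaluated on held-out folds, so that the meta-learner does not simply memorize outputs the base models produced on their own training data.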

    High Accuracy Location Information Extraction from Social Network Texts Using Natural Language Processing

    Terrorism has become a worldwide plague with severe consequences for the development of nations. Besides killing innocent people daily and preventing educational activities from taking place, terrorism also hinders economic growth. Machine Learning (ML) and Natural Language Processing (NLP) can contribute to fighting terrorism by predicting future terrorist attacks in real time, provided accurate data is available. This paper is part of a research project that uses text from social networks to extract the information needed to build an adequate dataset for terrorist attack prediction. We collected a set of 3000 social network texts about terrorism in Burkina Faso and used a subset to experiment with existing NLP solutions. The experiment reveals that existing solutions have poor accuracy for location recognition, which our solution resolves. We will extend the solution to extract dates and action information to achieve the project's goal.

    Feature extraction using regular expression in detecting proper noun for Malay news articles based on KNN algorithm

    No abstract. Keywords: data mining; named entity recognition; regular expression; natural language processing

    Monitoring Indonesian online news for COVID-19 event detection using deep learning

    Even though coronavirus disease 2019 (COVID-19) vaccination has been carried out, preparedness for a possible next outbreak wave is still needed as new mutations and virus variants emerge. A near real-time surveillance system is required so that stakeholders, especially the public, can respond in a timely manner. Because of its hierarchical structure, epidemic reporting is usually slow, particularly when it crosses jurisdictional borders. This can lead to time gaps in public awareness of new and emerging infectious disease events. Online news is a potential source for COVID-19 monitoring because it reports almost every infectious disease incident globally. However, the news reports not only COVID-19 events but also various information on related topics, such as the economic impact, health tips, and others. We developed a framework for online news monitoring and applied sentence classification to news titles using deep learning to distinguish COVID-19 event news from non-event news. The classification results showed that the fine-tuned bidirectional encoder representations from transformers (BERT) model trained with Bahasa Indonesia achieved the highest performance (accuracy: 95.16%, precision: 94.71%, recall: 94.32%, F1-score: 94.51%). Interestingly, our framework identified news reporting the new COVID strain from the United Kingdom (UK) as event news 13 days before Indonesian officials closed the border to foreigners.
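
    As a quick consistency check on the figures above, the F1-score can be recomputed as the harmonic mean of the reported precision and recall. This is a sketch that assumes the reported values are a single precision/recall pair rather than per-class averages; the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (output is in the same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

reported_precision = 94.71  # percent, as stated in the abstract
reported_recall = 94.32     # percent, as stated in the abstract

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 2))  # prints 94.51, matching the reported F1-score
```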

    Virtual Assistant Design for Water Systems Operation

    Water management systems such as wastewater treatment plants and water distribution systems are large systems with a multitude of variables and performance indicators that drive the decision-making process for controlling the plant. To help water operators make the right decisions, we provide a platform that gives them quick answers, in natural language, about the different components of the system they control. In our research, we explore the architecture for building a virtual assistant in the domain of water systems. Our design focused on developing better semantic inference across the different stages of the process. We developed a named entity recognizer that infers semantics in the water field by leveraging state-of-the-art methods for word embeddings. Our model achieved significant improvements over the baseline Term Frequency - Inverse Document Frequency (TF-IDF) cosine similarity model. Additionally, we explore the design of intent classifiers, which poses more challenges than a traditional classifier because the texts are short relative to the number of classes. In our design, we incorporate the results of entity recognition, produced by earlier layers of the chatbot pipeline, to boost intent classification performance. Our baseline bidirectional Long Short-Term Memory (LSTM) network showed significant improvements, amounting to a 7-10% accuracy boost on augmented input data, and we contrasted its performance with a modified bidirectional LSTM architecture that embeds information about recognized entities. In each stage of our architecture, we explored state-of-the-art solutions and how to customize them to our problem domain in order to build a production-level application. We additionally leveraged chatbot framework architecture to provide a context-aware virtual assistance experience that can infer implicit references from the conversation flow.
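
    The TF-IDF cosine similarity baseline mentioned above can be sketched as follows. This is not the authors' implementation: the entity phrases, the query, the whitespace tokenizer, and the smoothed IDF formula are assumptions chosen for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def build_idf(docs):
    """Smoothed inverse document frequency over a small phrase collection."""
    n = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + df)) + 1.0 for term, df in doc_freq.items()}

def tfidf_vector(text, idf):
    """Sparse TF-IDF vector; terms outside the known vocabulary are ignored."""
    term_freq = Counter(tokenize(text))
    return {term: count * idf[term] for term, count in term_freq.items() if term in idf}

def cosine(u, v):
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def best_match(query, phrases, idf):
    """Return the known phrase whose TF-IDF vector is closest to the query."""
    q = tfidf_vector(query, idf)
    return max(phrases, key=lambda p: cosine(q, tfidf_vector(p, idf)))

# Hypothetical entity phrases from a water-systems vocabulary.
PHRASES = ["dissolved oxygen level", "pump flow rate", "chlorine residual concentration"]
IDF = build_idf(PHRASES)

print(best_match("what is the flow rate of pump two", PHRASES, IDF))
```

    A baseline of this kind matches only on shared surface terms, so a query that uses a synonym of an entity name scores zero against it. That limitation is what embedding-based recognizers are meant to address.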


    The Annual Linguistics Seminar (Seminar Tahunan Linguistik), commonly known as SETALI, is an annual seminar organized by the Linguistics Study Program of the School of Postgraduate Studies, Universitas Pendidikan Indonesia (SPs UPI), in cooperation with the UPI chapter of the professional organization Masyarakat Linguistik Indonesia (MLI). In 2018, the seminar is held again on 5-6 May under the theme “Language in the Digital Era: Opportunity or Threat?” (“Bahasa di Era Digital: Peluang atau Ancaman?”). The theme stems from the distinctive phenomena surrounding language in the digital era, in which language plays an important role in how digital technology is applied. About 200 selected papers are included for presentation at Setali 2018. The papers collected in these proceedings were selected through a lengthy process and careful consideration. Language and digitalization are interrelated and inseparable. Language use in digital spaces, across various media, gives rise to many variants. Language used in digital-era communication sometimes follows well-formed patterns, but it often appears in deviant, ill-formed ones. The many deviations in language use in digital spaces can have negative effects on the language attitudes of Indonesian speakers in general. Accordingly, the public is expected to respond carefully to phenomena of language use that are hard to contain. Although the existence of language faces many threats in this era, there are undeniably also many opportunities that language users can take up as positive and beneficial. To date, various polemics have arisen in linguistics over language issues spreading in the digital world. Language practitioners are expected to study closely the practice and role of language in this digital era.
The theme “Language in the Digital Era: Opportunity or Threat?” is expected to accommodate all elements of society in taking part in assessing and examining the position of language from diverse viewpoints, so that it gives rise to a diversity of perspectives in Indonesian linguistics. In closing, asking for the guidance and blessing of Allah Swt., I hope that Setali 2018 proceeds in an orderly and smooth manner. I also hope that academic documentation such as this makes a real contribution to the development of linguistics in Indonesia. On this occasion, I would like to thank everyone who helped Setali 2018 run well. Enjoy the seminar.