Search CORE

3 research outputs found

Algoritma Jaro-Winkler Distance: Fitur Autocorrect dan Spelling Suggestion pada Penulisan Naskah Bahasa Indonesia di BMS TV

Author: Agung Prasetyo
Iqbaluddin Syam Had
Wiga Maulana Baihaqi
Publication venue: 'Fakultas Ilmu Komputer Universitas Brawijaya'
Publication date: 01/10/2018

Autocorrect is a system that automatically checks and corrects misspelled words. The feature is now common across devices and applications, for example smartphone keyboards and Microsoft Word. Such systems replace a word they consider wrong immediately and without notifying the user, so users are often unaware that their text has changed, and the replacement is not always the word they intended. Microsoft Word's autocorrect knowledge is English-based, so it cannot be applied to news script writing at BMS TV. Every day the News Director of BMS TV checks the scripts to be broadcast, which includes checking spelling. An Indonesian-language autocorrect and spelling suggestion feature is expected to help the News Director check and correct misspelled words automatically and to suggest correct Indonesian spellings. The software was developed with the Extreme Programming method and uses the Jaro-Winkler Distance algorithm, which computes a closeness (distance) score between two texts. The result of this study is a system that helps the News Director of BMS TV check spelling errors in Indonesian-language scripts and makes it easier for the central News Director to collect scripts from the various BMS TV contributors. It can be concluded that the autocorrect and spelling suggestion features can handle misspelled words: in a test of 60 words covering various misspelling scenarios, the feature corrected ten words automatically and showed a correct spelling suggestion for 39 words.
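The abstract names Jaro-Winkler but does not give an implementation. A minimal sketch of the textbook Jaro-Winkler similarity is shown below, under stated assumptions: the prefix scale of 0.1, the prefix cap of 4 characters, the `suggest` helper, its `threshold` parameter, and the tiny word list are all illustrative and are not taken from the paper.

```python
def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and within this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    out_of_order = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, scale: float = 0.1) -> float:
    """Jaro similarity boosted by the length of the common prefix (max 4)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * scale * (1.0 - j)


def suggest(word: str, dictionary: list[str], threshold: float = 0.8) -> list[str]:
    """Hypothetical helper: dictionary words ranked by similarity to `word`."""
    scored = [(jaro_winkler(word, entry), entry) for entry in dictionary]
    return [entry for score, entry in sorted(scored, reverse=True) if score >= threshold]


if __name__ == "__main__":
    words = ["berita", "naskah", "kata"]  # illustrative word list only
    print(suggest("bertia", words))
```

A system like the one described could, for instance, replace a word automatically when the best score is very high and only offer suggestions otherwise; where those cut-offs lie is a design choice the abstract does not specify.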

Directory of Open Access Journals

Jurnal Teknologi Informasi dan Ilmu Komputer

A comparison of cloud-based speech recognition engines

Author: G. Jasinski Marcio
L. Herchonvicz Andrey
R. Franco Cristiano
Publication venue: 'Editora UNIVALI'
Publication date: 29/05/2019

Human-machine interaction is present in our routines and has become increasingly natural. Devices can record a person's speech, transcribe it into text and execute tasks accordingly. This kind of interaction makes several operations more productive, since it leaves the user's hands free through a more natural interface. Speech recognition engines must therefore ensure reliability and speed. However, the maturity of speech recognition systems varies across providers and, most importantly, across languages. For instance, Brazilian Portuguese makes use of many foreign terms, especially in corporate environments. In this paper, an experiment was conducted to evaluate three speech recognition engines regarding accuracy and performance: Bing Speech API, Google Cloud Speech and IBM Watson Speech to Text. To obtain the accuracy value, we used a well-known string similarity algorithm. The results showed a high level of accuracy for Google Cloud Speech and Bing Speech API. However, the best accuracy, provided by Google services, came at a cost in performance, requiring additional time to provide the speech-to-text transcription.
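The abstract does not say which string similarity algorithm was used. As a rough illustration only, the sketch below scores a transcription against a reference text with the matching-blocks ratio from Python's standard `difflib` module; that choice, the normalisation step, and the function name `transcription_accuracy` are assumptions, not the authors' method.

```python
from difflib import SequenceMatcher


def transcription_accuracy(reference: str, hypothesis: str) -> float:
    """Similarity in [0, 1] between a reference text and an engine's output.

    Assumption: case and repeated whitespace are not meaningful, so both
    strings are lower-cased and whitespace-normalised before comparison.
    """
    ref = " ".join(reference.lower().split())
    hyp = " ".join(hypothesis.lower().split())
    return SequenceMatcher(None, ref, hyp).ratio()


if __name__ == "__main__":
    reference = "agendar reunião com o time de marketing"
    hypothesis = "agendar reunião com o time de market"
    print(round(transcription_accuracy(reference, hypothesis), 3))
```

Averaging such a score over a set of recorded phrases would give one accuracy figure per engine, which can then be set against the measured response time.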

Portal de Periódicos da Univali (Universidade do Vale do Itajaí)

Tune your brown clustering, please

Author: Bøgh K.S.
Chester S.
Derczynski L.
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2015

Brown clustering, an unsupervised hierarchical clustering technique based on n-gram mutual information, has proven useful in many NLP applications. However, most uses of Brown clustering employ the same default configuration; the appropriateness of this configuration has gone predominantly unexplored. Accordingly, we present information for practitioners on the behaviour of Brown clustering in order to assist hyper-parameter tuning, in the form of a theoretical model of Brown clustering utility. This model is then evaluated empirically in two sequence labelling tasks over two text types. We explore the dynamic between the input corpus size, chosen number of classes, and quality of the resulting clusters, which has an impact on any approach using Brown clustering. In every scenario that we examine, our results reveal that the values most commonly used for the clustering are sub-optimal.
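To make "based on n-gram mutual information" concrete, the sketch below computes the average mutual information between adjacent word classes for a given word-to-class assignment, which is the quantity that Brown clustering's merges try to keep as high as possible in its usual bigram formulation. It is not the clustering algorithm itself, and the function name, the toy corpus and the class assignments are illustrative assumptions.

```python
from collections import Counter
from math import log2


def average_mutual_information(tokens: list[str], word_to_class: dict[str, int]) -> float:
    """Mutual information (in bits) between the classes of adjacent tokens."""
    classes = [word_to_class[token] for token in tokens]
    bigrams = Counter(zip(classes, classes[1:]))
    total = sum(bigrams.values())
    if total == 0:
        return 0.0
    left = Counter()   # how often each class starts a bigram
    right = Counter()  # how often each class ends a bigram
    for (c1, c2), n in bigrams.items():
        left[c1] += n
        right[c2] += n
    ami = 0.0
    for (c1, c2), n in bigrams.items():
        # p(c1, c2) * log2( p(c1, c2) / (p(c1) * p(c2)) )
        ami += (n / total) * log2(n * total / (left[c1] * right[c2]))
    return ami


if __name__ == "__main__":
    corpus = "the cat sat the dog sat the cat ran the dog ran".split()
    one_class = {word: 0 for word in corpus}
    by_role = {"the": 0, "cat": 1, "dog": 1, "sat": 2, "ran": 2}
    print(average_mutual_information(corpus, one_class))
    print(average_mutual_information(corpus, by_role))
```

Collapsing every word into a single class drives this score to zero, which illustrates why the chosen number of classes interacts with corpus size and cluster quality.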

White Rose Research Online