28 research outputs found

Automatic Identification of Close Languages – Case Study: Malay and Indonesian.

Author: Bali Ranaivo-Malançon
Publication date: 01/11/2006

Identifying the language of an unknown text is not a new problem, but what is new is the task of identifying close languages. Malay and Indonesian, like many other languages, are very similar, and it is therefore a real difficulty to search, retrieve, classify, and above all translate texts written in one of the two languages

Repository@USM

The Notion Of Instrument In Malay Language.

Author: Bali Ranaivo-Malançon
Publication date: 01/08/2007

In Malay, the official language of Malaysia, the notion of instrument is expressed in five ways. In the first two expressions, the instrument noun is introduced by either the preposition dengan ‘with’ or the preposition melalui ‘through, via’: <X Z=Action {dengan, melalui} Y=Instrument> (e.g. Remaja pukul ibunya dengan batang paip ‘An adolescent hit his mother with a pipe’, menghantar bantahan melalui e-mel kepada Dr. X ‘to send a protest through email to Dr. X’)

Repository@USM

Which Extractive Summarization Method For Malay Texts?

Author: Hazimah Iboi
Ranaivo-Malançon Bali
Publication venue: 'UUM Press, Universiti Utara Malaysia'
Publication date: 01/01/2017

The number of texts written in Malay increases every day. When these texts are lengthy, interested readers tend to skim through them. Automatic text summarization may help these readers reach the important parts of a text without scanning it from beginning to end. As of today, only a few Malay text summarizers have been presented in the literature. Therefore, a comparative study of three extractive summarization methods (Luhn’s method, Edmundson’s method, and the LexRank method) was undertaken, and the results are reported in this paper. The aim of the study is to determine an adequate extractive method. Several experiments were conducted by comparing the results of the three extractive methods with human extracts as well as human abstracts. It appears that Luhn’s method, one of the oldest automatic extractive summarization methods, performs well when tested on 14 Malay abstract summaries and 20 Malay extractive summaries

Unimas Institutional Repository
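
For readers unfamiliar with Luhn's method named in the record above, a minimal Python sketch of the classic idea follows. It is not the authors' implementation, which the abstract does not describe. The sketch treats the most frequent non-stopword terms as "significant" and scores each sentence by its densest cluster of significant words (squared count of significant words divided by cluster length). The function names, the `top_k` and `max_gap` parameters, and the toy Malay sentences are all assumptions made for illustration.

```python
import re
from collections import Counter


def luhn_scores(sentences, stopwords, top_k=5, max_gap=4):
    """Score sentences Luhn-style (illustrative sketch, not the paper's code).

    top_k   -- assumed cutoff: how many of the most frequent terms count as significant
    max_gap -- assumed limit on non-significant words allowed inside one cluster
    """
    tokenized = [re.findall(r"\w+", s.lower()) for s in sentences]
    freq = Counter(w for toks in tokenized for w in toks if w not in stopwords)
    significant = {w for w, _ in freq.most_common(top_k)}

    scores = []
    for toks in tokenized:
        positions = [i for i, w in enumerate(toks) if w in significant]
        best = 0.0
        if positions:
            start = prev = positions[0]
            count = 1
            for p in positions[1:]:
                if p - prev - 1 <= max_gap:
                    # Still inside the current cluster.
                    count += 1
                else:
                    # Close the current cluster and start a new one.
                    best = max(best, count ** 2 / (prev - start + 1))
                    start, count = p, 1
                prev = p
            best = max(best, count ** 2 / (prev - start + 1))
        scores.append(best)
    return scores


def summarize(sentences, stopwords, n=1, **kwargs):
    """Return the n highest-scoring sentences in their original order."""
    scores = luhn_scores(sentences, stopwords, **kwargs)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(ranked)]


if __name__ == "__main__":
    # Toy input, invented for this example.
    toy = ["kucing makan ikan", "ikan kucing ikan kucing", "hari ini panas"]
    print(summarize(toy, stopwords=set(), n=1, top_k=2))
```

A real comparison such as the one reported in the record would also need a Malay stopword list and sentence splitter, neither of which is specified in the abstract.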

Design and Implementation of PIAK: A Personalized Internet Access System for Kids

Author: Kamarudin Abrar Noor Akramin
Musa Nadianatra
Ranaivo-Malançon Bali
Publication venue: Journal of Telecommunication, Electronic and Computer Engineering (JTEC)
Publication date: 15/09/2017

The Internet plays an important role in delivering information worldwide, but not all of the huge amount of information available online is appropriate for children. This paper presents the design and implementation of PIAK, a Personalized Internet Access system for Kids. It aims to assist and teach children about using the Internet in one single and safe environment. PIAK features four personalized components: cross-platform user interface, multilingual support, educative and assistive mediums, and web content filtering. Its design is based on the children’s needs inferred from survey findings. This makes Internet access more appealing to children, as they can explore the Internet in a controlled environment

Universiti Teknikal Malaysia Melaka: UTeM Open Journal System

Identifying And Classifying Unknown Words In Malay Texts.

Author: Bali Ranaivo-Malançon
Chong Chai Chua
Pek Kuan Ng
Publication date: 01/12/2007

In this paper, we propose a method based on a chain of filters to handle the problem of identifying and classifying unknown words in Malay texts. A word is identified as unknown when it is not listed in the lexicon

Repository@USM
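
The record above mentions a chain of filters applied to words missing from the lexicon, but does not say which filters the authors use. The Python sketch below only illustrates the general shape of such a pipeline: a lexicon lookup flags unknown tokens, and each filter in turn may assign a class. The two filters, the class labels, and the tiny lexicon are hypothetical and are not taken from the paper.

```python
import re


def is_unknown(token, lexicon):
    """A token is unknown when its lowercased form is not listed in the lexicon."""
    return token.lower() not in lexicon


# Hypothetical filters: each returns a class label, or None to pass the token on.
def numeric_filter(token):
    return "NUMBER" if re.fullmatch(r"\d+([.,]\d+)*", token) else None


def capitalized_filter(token):
    return "PROPER_NOUN_CANDIDATE" if token[:1].isupper() else None


DEFAULT_FILTERS = [numeric_filter, capitalized_filter]


def classify_unknown(tokens, lexicon, filters=DEFAULT_FILTERS):
    """Map each unknown token to the label of the first filter that accepts it."""
    results = {}
    for token in tokens:
        if not is_unknown(token, lexicon):
            continue
        label = "UNCLASSIFIED"
        for apply_filter in filters:
            candidate = apply_filter(token)
            if candidate is not None:
                label = candidate
                break
        results[token] = label
    return results


if __name__ == "__main__":
    # Toy lexicon and sentence, invented for this example.
    lexicon = {"saya", "makan", "nasi"}
    print(classify_unknown(["Saya", "makan", "nasi", "2007", "Kuching", "xyz"], lexicon))
```

Ordering matters in a chain like this: a token is labelled by the first filter that matches, so more specific filters would normally be placed first.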

Using TEI XML Schema to Encode the Structures of Sarawak Gazette

Author: Bali Ranaivo-Malançon
Tze-Min Fong
Publication venue: 'IACSIT Press'
Publication date: 01/01/2015

Automatic extraction of information from old printed documents that have been digitised injudiciously ends up requiring a lot of human corrections. To overcome the problem, one possible solution is to annotate the documents with markup. This paper presents the encoding of a digitised sample of the Sarawak Gazette published from 1903 until 1939 using the standard TEI XML schema. The output of the work is a set of six TEI XML templates that are considered to represent the different layout structures found in the studied samples

Unimas Institutional Repository

Minimizing Human Labelling Effort for Annotating Named Entities in Historical Newspaper

Author: Chua S.
Ranaivo-Malançon Bali
Wan Tamlikha W.M.F.
Publication venue: Universiti Teknikal Malaysia (UTEM)
Publication date: 01/01/2017

To accelerate the annotation of named entities (NEs) in historical newspapers like the Sarawak Gazette, only two choices are possible: an automatic approach or a semi-automatic approach. This paper presents a fully automatic annotation of NEs occurring in the Sarawak Gazette. At the initial stage, a subset of the historical newspapers is fed to an established rule-based named entity recognizer (NER), namely ANNIE. Then, the pre-annotated corpus is used as training and testing data for three supervised learning NERs, which are based on the Naïve Bayes, J48 decision tree, and SVM-SMO methods. These methods are not always accurate, and it appears that SVM-SMO and J48 perform better than Naïve Bayes. Thus, a thorough study of the errors made by SVM-SMO and J48 leads to the creation of ad hoc rules to correct the errors automatically. The proposed approach is promising even though it still needs more experiments to refine the rules

Unimas Institutional Repository

Wiki SaGa: an Interactive Timeline to Visualize Historical Documents

Author: Bali Ranaivo-Malançon
Narayanan Kulathuramaiyer
Tan Daniel Yong Wen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Searching for information inside a repository of digitised historical documents is a very common task. A timeline interface that represents the historical content and can perform the same search function will reveal better results to researchers. This paper presents the integration of SIMILE Timeline within a wiki, named Wiki SaGa, containing a digitised version of the Sarawak Gazette. Compared to a traditional list of documents, the proposed approach allows events to be displayed and relevant information to be searched

Unimas Institutional Repository

Comparative Studies of Ontologies on Sarawak Gazette

Author: Chua Stephanie
Mohammad Mira Shumiza
Ramli Fatihah
Ranaivo-Malançon Bali
Publication venue: Journal of Telecommunication, Electronic and Computer Engineering (JTEC)
Publication date: 07/12/2017

This paper presents a discussion of the experience and process during the initial stage of ontology building in history. The objective of this paper is to create a manual semantic annotation process to determine the concepts that will be used in the historical news ontology. It describes the tasks that facilitate the analysis of missing concepts in Sarawak Gazette (SAGA) documents. Semantically annotating SAGA documents makes it possible to enrich the concepts and relations taken from existing ontologies. Furthermore, an initial result is provided to observe the performance gain due to domain-specific annotations. Finally, we conclude on the importance of the semantic annotation process in the construction of an ontology

Universiti Teknikal Malaysia Melaka: UTeM Open Journal System

Inducing a Semantically Rich Nested Event Model

Author: Bali Ranaivo-Malançon
Jane Labadin
Narayanan Kulathuramaiyer
Nyuk Hiong Siaw
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Research has revealed that obtaining data with named entity (NE) labels is labour-intensive and costly. This paper proposes two approaches to enable NE classes to be added to the semantic role label (SRL) predicate-argument structure of the Nested Event Model. The first approach, named SRL-NER, associates SRL with Named Entity Recognition (NER) to tag the simple arguments of the model with the appropriate entity class. The second approach, called SRL-ACE-NER, associates SRL with NER by fine-tuning entities in complex argument structures with the Automatic Content Extraction (ACE) structure. The Stanford NER tool is used as the benchmark for evaluation. The results show that the proposed approaches are able to recognize more PERSON entities. However, the approaches are not able to recognize LOCATION/PLACE as efficiently as the benchmark. It is also observed that the benchmark tool is sometimes not able to tag as comprehensively as the proposed approaches. This paper has successfully demonstrated the potential of using a semantically enriched Nested Event Model as an alternative NER technique. SRL-ACE-NER has achieved an average precision of 92% in recognising PERSON, LOCATION/PLACE, TIME, and ORGANIZATION

Unimas Institutional Repository