
    Towards the Semantic Text Retrieval for Indonesian

    Indonesia is the fourth most populous country in the world, and the Asosiasi Penyelenggara Jasa Internet Indonesia (Indonesian Internet Service Providers Association) has recorded rapid year-on-year growth in Indonesian Internet subscribers and users. These facts should encourage research in fields such as computational linguistics and information retrieval for the Indonesian language, which in fact has not been extensively investigated. This research investigates the tolerance rough sets model (TRSM) in order to propose a framework for a semantic text retrieval system. The proposed framework is intended specifically for Indonesian; hence we work with Indonesian corpora and apply tools for Indonesian, e.g. an Indonesian stemmer, in all of the studies. A cognitive approach is employed, particularly during data preparation and analysis, and extensive collaboration with human experts was essential in creating a new Indonesian corpus suitable for our research. The performance of an ad hoc retrieval system is the starting point for further analysis, whose aim is to learn and understand the process and characteristics of TRSM rather than to compare TRSM with other methods and determine the best solution. The results of this analysis guide the computational modeling of some of TRSM's tasks and, finally, the framework of a semantic information retrieval system with TRSM at its heart. In addition to the proposed framework, this thesis proposes three methods based on TRSM: an automatic tolerance value generator, thesaurus optimization, and lexicon-based document representation. All methods were developed using our own corpus, namely the ICL-corpus, and evaluated on an available Indonesian corpus, the Kompas-corpus. The evaluation achieved satisfactory results, except for the compact document representation method, which seems to work only in limited domains.
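    In the standard formulation of TRSM, two index terms belong to the same tolerance class when they co-occur in at least θ documents, and a document's representation is enriched with every term whose tolerance class overlaps it (its upper approximation). The minimal Python sketch below illustrates that mechanism; the toy corpus, the tolerance value THETA, and the unweighted set representation are illustrative assumptions, not the thesis's actual implementation.

```python
# Minimal TRSM sketch: tolerance classes from term co-occurrence,
# then the upper approximation of each document.
from collections import defaultdict
from itertools import combinations

docs = [  # toy corpus of tokenized (and, ideally, stemmed) documents
    ["sepeda", "motor", "jalan"],
    ["motor", "jalan", "macet"],
    ["sepeda", "olahraga"],
]
THETA = 2  # assumed tolerance value: minimum document co-occurrence count

# 1. Count, for each pair of terms, how many documents contain both.
cooc = defaultdict(int)
for doc in docs:
    for a, b in combinations(sorted(set(doc)), 2):
        cooc[(a, b)] += 1

vocab = {t for doc in docs for t in doc}

# 2. Tolerance class of a term: itself plus every term co-occurring >= THETA.
def tolerance_class(term):
    cls = {term}
    for other in vocab - {term}:
        if cooc[tuple(sorted((term, other)))] >= THETA:
            cls.add(other)
    return cls

# 3. Upper approximation of a document: every term whose tolerance class
#    overlaps the document; this enriched set is the semantic representation.
def upper_approximation(doc):
    d = set(doc)
    return {t for t in vocab if tolerance_class(t) & d}

for doc in docs:
    print(doc, "->", sorted(upper_approximation(doc)))
```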

    Role of images on World Wide Web readability

    As the Internet and World Wide Web have grown, they have brought many benefits. Anyone with access to a computer can find a great deal of information quickly and easily; electronic devices can store and retrieve vast amounts of data in seconds; products and services that once required a trip in person can be obtained without leaving the house; and documents can be translated from English to Urdu, or converted from text to speech, almost instantly, making it easier for people from different cultures and with different abilities to communicate. As technology improves, web developers and website visitors want more animation, colour, and technology, and as computers render images and other graphics ever faster, web developers use them more and more. Colour, pictures, animation, and images can help users understand and read the Web and improve the Web experience; people who have trouble reading, or whose first language is not used on the website, can also benefit from pictures. But not all images help people understand and read the text they accompany: images that are purely decorative, or chosen arbitrarily by the website's creators, do not. Various factors can also affect how easy graphical content is to read, such as low image resolution, a poor aspect ratio, a bad colour combination within the image itself, or a small font size, and the WCAG gives different rules for each of these problems. The rules recommend alternative text, an appropriate combination of colours, sufficient contrast, and a higher resolution. One of the biggest remaining problems is that images unrelated to the text on a web page can make that text hard to read, whereas relevant pictures can make the page easier to read. A method is proposed to assess how relevant the images on a website are from the point of view of web readability. It combines several ways of extracting information from images, using the Cloud Vision API and Optical Character Recognition (OCR), and extracts the text of the website itself in order to measure the relevance between them. Data preprocessing techniques are applied to the extracted information, and a Natural Language Processing (NLP) technique is used to determine how the images and text on a web page relate to each other. The tool was applied to the images of fifty educational websites to assess their relevance. Results show that images unrelated to the page's content, and low-quality images, yield lower relevancy scores. A user study evaluated the hypothesis that relevant images enhance web readability, based on two evaluations: an evaluation by 1024 end users of the page, and a heuristic evaluation by 32 accessibility experts. The study asked questions about what users know, how they feel, and what they can do, and its results support the idea that images relevant to the page make it easier to read. This method will help web designers make pages easier to read by examining only the essential parts of a page rather than relying on their own judgment.
    PhD Program in Computer Science and Technology, Universidad Carlos III de Madrid. Chair: José Luis López Cuadrado. Secretary: Divakar Yadav. Examiner: Arti Jai
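    As a rough illustration of the pipeline, the sketch below scores one image against the text of its page. It is a minimal sketch under stated assumptions: pytesseract stands in for the OCR / Cloud Vision step, and TF-IDF cosine similarity stands in for the NLP technique used in the thesis.

```python
# Illustrative image-text relevance scoring: OCR the image, then compare
# the recovered text with the page text via TF-IDF cosine similarity.
from PIL import Image
import pytesseract
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def image_text_relevance(image_path: str, page_text: str) -> float:
    """Score how relevant an image is to the page text, in [0, 1]."""
    # 1. Extract whatever text the image carries (labels, captions, embedded text).
    image_text = pytesseract.image_to_string(Image.open(image_path))
    if not image_text.strip():
        return 0.0  # nothing recoverable: treat as irrelevant
    # 2. Vectorize both texts and compare.
    vectorizer = TfidfVectorizer(stop_words="english")
    vectors = vectorizer.fit_transform([image_text, page_text])
    return float(cosine_similarity(vectors[0], vectors[1])[0, 0])

# Usage: score each image on a page; low scores flag decorative or
# off-topic images that may hurt readability.
# score = image_text_relevance("figure1.png", open("page.txt").read())
```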

    A New Approach for Text Documents Classification with Invasive Weed Optimization and Naive Bayes Classifier

    With the rapid increase in the number of documents, Text Document Classification (TDC) methods have become crucial. This paper presents a hybrid model of Invasive Weed Optimization (IWO) and a Naive Bayes (NB) classifier (IWO-NB) for Feature Selection (FS), aimed at reducing the very large feature space in TDC. TDC involves several steps, such as text processing, feature extraction, forming feature vectors, and final classification. In the presented model, a feature vector is formed for each document by weighting features with IWO; the NB classifier is then trained on these documents, and test documents are classified accordingly. FS increases accuracy and decreases computation time. IWO-NB was evaluated on the Reuters-21578, WebKB, and Cade12 datasets. To demonstrate the superiority of the proposed model for FS, a Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) were used as comparison models. Results show that for FS the proposed model achieves higher accuracy than NB alone and than the other models; in addition, comparing the proposed model with and without FS shows that FS decreases the error rate.
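    The sketch below illustrates the general shape of such a wrapper: feature masks play the role of weeds, NB cross-validation accuracy is the fitness, fitter weeds spawn more mutated seeds, and the population is truncated each generation. The parameters, the bit-flip "dispersion", and the seeding rule are simplifying assumptions rather than the paper's exact algorithm.

```python
# Simplified IWO-style wrapper feature selection with a Naive Bayes fitness.
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import cross_val_score

def fitness(mask, X, y):
    """NB accuracy on the selected feature subset (the FS objective)."""
    if not mask.any():
        return 0.0
    return cross_val_score(MultinomialNB(), X[:, mask], y, cv=3).mean()

def iwo_select(X, y, pop_size=10, max_pop=20, generations=15,
               min_seeds=1, max_seeds=4, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    pop = [rng.random(n) < 0.5 for _ in range(pop_size)]  # random feature masks
    for gen in range(generations):
        scores = sorted([fitness(m, X, y) for m in pop], reverse=True)
        pop = sorted(pop, key=lambda m: fitness(m, X, y), reverse=True)
        best, worst = scores[0], scores[-1]
        flip_rate = 0.2 * (1 - gen / generations)  # "dispersion" shrinks over time
        seeds = []
        for m, s in zip(pop, scores):
            # Fitter weeds spawn more seeds (IWO's reproduction rule).
            ratio = (s - worst) / (best - worst + 1e-9)
            for _ in range(int(min_seeds + ratio * (max_seeds - min_seeds))):
                child = m.copy()
                flips = rng.random(n) < flip_rate
                child[flips] = ~child[flips]
                seeds.append(child)
        # Competitive exclusion: keep only the fittest individuals.
        pop = sorted(pop + seeds, key=lambda m: fitness(m, X, y),
                     reverse=True)[:max_pop]
    return pop[0]

# Usage with a document-term count matrix X (n_docs x n_terms) and labels y:
# mask = iwo_select(X, y); X_reduced = X[:, mask]
```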

    Geoinformatics in Citizen Science

    The book features contributions that report original research in the theoretical, technological, and social aspects of geoinformation methods, as applied to supporting citizen science. Specifically, the book focuses on the technological aspects of the field and their application toward the recruitment of volunteers and the collection, management, and analysis of geotagged information to support volunteer involvement in scientific projects. Internationally renowned research groups share research in three areas: First, the key methods of geoinformatics within citizen science initiatives to support scientists in discovering new knowledge in specific application domains or in performing relevant activities, such as reliable geodata filtering, management, analysis, synthesis, sharing, and visualization; second, the critical aspects of citizen science initiatives that call for emerging or novel approaches of geoinformatics to acquire and handle geoinformation; and third, novel geoinformatics research that could serve in support of citizen science

    The experiences of the International Institute of Islamic Civilization & Malay World (ISTAC) in empowering the Malay World and Islamic Civilization

    Although Indonesia and Malaysia are now two independent nations, the two countries have much in common, tied by a shared religion, language, and cultural heritage dating back centuries. In building a better tomorrow, both nations can benefit from each other's rich experiences. This paper focuses on an educational institute established at the International Islamic University Malaysia (henceforth, IIUM) in 2016, initially named the Centre for Malay World and Islamic Civilization and since transformed into the International Institute of Islamic Civilization & Malay World (henceforth, ISTAC). The institute was established to support the university's efforts to create an academic and intellectual field of research and studies focusing on the Malay World and Islamic civilization. This study describes the scope, major types of activities, and concerns of the institute, and in so doing identifies potential areas for collaboration (such as academic exchange, research, and international conferences).

    KEER2022

    Half title: KEER2022. Diversities. Resource description: 25 July 2022

    Austronesian and other languages of the Pacific and South-east Asia: an annotated catalogue of theses and dissertations


    One Model to Rule them all: Multitask and Multilingual Modelling for Lexical Analysis

    When learning a new skill, you take advantage of your preexisting skills and knowledge. For instance, if you are a skilled violinist, you will likely have an easier time learning to play the cello. Similarly, when learning a new language you take advantage of the languages you already speak. For instance, if your native language is Norwegian and you decide to learn Dutch, the lexical overlap between these two languages will likely benefit your rate of language acquisition. This thesis deals with the intersection of learning multiple tasks and learning multiple languages in the context of Natural Language Processing (NLP), which can be defined as the study of the computational processing of human language. Although these two types of learning may seem different on the surface, we will see that they share many similarities. The traditional approach in NLP is to consider a single task for a single language at a time. However, recent advances allow for broadening this approach by considering data for multiple tasks and languages simultaneously. This approach is important to explore further, as the key to improving the reliability of NLP, especially for low-resource languages, is to take advantage of all relevant data whenever possible. In doing so, the hope is that in the long term, low-resource languages can benefit from the advances made in NLP, which are currently to a large extent reserved for high-resource languages. This, in turn, may have positive consequences for, e.g., language preservation, as speakers of minority languages will be under less pressure to use high-resource languages. In the short term, answering the specific research questions posed should be of use to NLP researchers working towards the same goal.
    Comment: PhD thesis, University of Groningen
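    A common concrete realization of the multitask idea described above is hard parameter sharing: one shared encoder feeding several task-specific heads, so that every task (and, with a shared vocabulary, every language) contributes training signal to the same representations. The sketch below is an illustrative assumption about such an architecture, not the thesis's actual model; layer sizes and the choice of tasks are made up for the example.

```python
# Minimal hard-parameter-sharing multitask model: shared encoder, per-task heads.
import torch
import torch.nn as nn

class MultitaskTagger(nn.Module):
    def __init__(self, vocab_size, n_pos_tags, n_morph_tags,
                 emb_dim=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Shared encoder: learns representations useful for all tasks
        # (and, with a shared multilingual vocabulary, all languages).
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True,
                               bidirectional=True)
        # Task-specific heads, e.g. POS tagging and morphological tagging.
        self.pos_head = nn.Linear(2 * hidden, n_pos_tags)
        self.morph_head = nn.Linear(2 * hidden, n_morph_tags)

    def forward(self, token_ids, task):
        h, _ = self.encoder(self.embed(token_ids))
        return self.pos_head(h) if task == "pos" else self.morph_head(h)

# Training alternates batches across tasks (and languages), so the shared
# encoder benefits from all available data -- the lever for helping
# low-resource languages.
model = MultitaskTagger(vocab_size=1000, n_pos_tags=17, n_morph_tags=50)
logits = model(torch.randint(0, 1000, (2, 8)), task="pos")  # shape (2, 8, 17)
```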