    Variation of word frequencies across genre classification tasks

    This paper examines automated genre classification of text documents and its role in enabling the effective management of digital documents by digital libraries and other repositories. Genre classification, which narrows down the possible structure of a document, is a valuable step towards the general automatic extraction of semantic metadata essential to the efficient management and use of digital objects. We present an analysis of word frequencies in different genre classes in an effort to understand the distinction between independent classification tasks. In particular, we examine automated experiments on thirty-one genre classes to determine the relationship between word frequency metrics and the degree of their significance in carrying out classification in varying environments.

    HITS and misses: combining BM25 with HITS for expert search

    This paper describes the participation of Dublin City University in the CriES (Cross-Lingual Expert Search) pilot challenge. To realize expert search, we combine traditional information retrieval (IR) using the BM25 model with reranking of results using the HITS algorithm. The experiments were performed on two indexes, one containing all questions and one containing all answers. Two runs were submitted. The first combines the results of IR on the questions with authority values from HITS; the second contains the results of IR on the answers, reranked with authority values. To investigate the impact of multilinguality, additional experiments were conducted on the English topic subset and on all topics translated into English with Google Translate. The overall performance is moderate and leaves much room for improvement. However, reranking results with authority values from HITS typically improved results, and in many experiments it more than doubled both the number of relevant retrieved results and precision at 10 documents.
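
    The pipeline described in this abstract (BM25 retrieval followed by reranking with HITS authority values) can be illustrated with a minimal sketch. Everything specific below is an assumption for illustration, not the authors' implementation: the function names, the toy asker-to-answerer graph, the BM25 parameters `k1` and `b`, and in particular the linear interpolation with weight `alpha`, since the abstract does not say how the two scores were combined.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenised document against the query with Okapi BM25."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def hits_authority(edges, nodes, iterations=50):
    """Run HITS by power iteration; return authority scores summing to 1."""
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # Authority update: sum of hub scores of nodes pointing at v.
        auth = {v: 0.0 for v in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        norm = sum(auth.values()) or 1.0
        auth = {v: a / norm for v, a in auth.items()}
        # Hub update: sum of authority scores of nodes v points at.
        hub = {v: 0.0 for v in nodes}
        for src, dst in edges:
            hub[src] += auth[dst]
        norm = sum(hub.values()) or 1.0
        hub = {v: h / norm for v, h in hub.items()}
    return auth


def rerank(ir_scores, authority, alpha=0.5):
    """Combine IR and authority scores (assumed: linear interpolation)."""
    max_ir = max(ir_scores.values(), default=0.0) or 1.0
    max_auth = max(authority.values(), default=0.0) or 1.0
    combined = {
        expert: alpha * score / max_ir
        + (1 - alpha) * authority.get(expert, 0.0) / max_auth
        for expert, score in ir_scores.items()
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Toy data (hypothetical): each answer is attributed to one expert.
answers = [
    ("alice", ["bm25", "ranking"]),
    ("bob", ["bm25", "bm25", "retrieval"]),
]
# Hypothetical graph: an edge points from an asker to an answerer.
edges = [("dave", "alice"), ("erin", "alice"), ("dave", "bob")]
nodes = {"alice", "bob", "dave", "erin"}

ir = dict(zip((a for a, _ in answers),
              bm25_scores(["bm25"], [d for _, d in answers])))
print(rerank(ir, hits_authority(edges, nodes)))
```

    With `alpha=1.0` the ranking is the plain BM25 order, and with `alpha=0.0` it is ordered by authority alone, which makes it easy to see how much the graph signal changes the result.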

    Building a Document Genre Corpus: a Profile of the KRYS I Corpus

    This paper describes the KRYS I corpus (http://www.krys-corpus.eu/Info.html), consisting of documents classified into 70 genre classes. It has been constructed as part of an effort to automate document genre classification as distinct from topic detection. Previously there has been very little work on building corpora of texts which have been classified using a non-topical genre palette. This is partly because genre, as a concept, is rooted in philosophy, rhetoric and literature, and is highly complex and domain dependent in its interpretation ([11]). The usefulness of genre in everyday information search is only now starting to be recognised, and no genre classification schema has yet been consolidated to the point of having applicable value in this direction. By presenting here our experiences in constructing the KRYS I corpus, we hope to shed light on information gathering and seeking behaviour and the role of genre in these activities, as well as on a way forward for creating a better corpus for testing automated genre classification tasks and for applying these tasks to other domains.

    SWA-KMDLS: An Enhanced e-Learning Management System Using Semantic Web and Knowledge Management Technology

    In this era of the knowledge economy, in which knowledge has become the most precious resource, surveys have shown that e-Learning is on an increasing trend in various organizations, including educational and corporate ones. e-Learning is used not only to acquire knowledge but also to maintain competitiveness and advantages for individuals or organizations. However, the early promise of e-Learning has yet to be fully realized, as it has been no more than handouts published online, coupled with simple multiple-choice quizzes. The emergence of e-Learning 2.0, empowered by Web 2.0 technology, has still hardly overcome common problems such as information overload and poor content aggregation amid a rapidly increasing number of learning objects in an e-Learning Management System (LMS) environment. The aim of this research study is to exploit Semantic Web (SW) and Knowledge Management (KM) technology, two emerging and promising technologies, to enhance the existing LMS. The proposed system is named the Semantic Web Aware-Knowledge Management Driven e-Learning System (SWA-KMDLS). An Ontology approach, the backbone of SW and KM, is introduced for managing knowledge, especially from learning objects, and for developing an automated question answering system (Aquas) with an expert locator in SWA-KMDLS. The METHONTOLOGY methodology is selected to develop the Ontology in this research work. The findings identify the potential of SW and KM technology, which will benefit e-Learning developers in developing e-Learning systems, especially those with a social constructivist pedagogical approach, from the point of view of the KM framework and SW environment. The (semi-)automatic ontological knowledge base construction system (SAOKBCS) has contributed to extracting knowledge from learning objects semi-automatically, whilst Aquas with the expert locator has facilitated knowledge retrieval that encourages knowledge sharing in the e-Learning environment.
    The experiment conducted has shown that the SAOKBCS can extract concepts, the main component of an Ontology, from text learning objects with a precision of 86.67%, thus saving experts the time and effort of building the Ontology manually. Additionally, the experiment on Aquas has shown that more than 80% of users are satisfied with the answers provided by the system. The expert locator framework can also improve the performance of Aquas in future usage. Keywords: semantic web aware – knowledge e-Learning Management System (SWA-KMDLS), semi-automatic ontological knowledge base construction system (SAOKBCS), automated question answering system (Aquas), Ontology, expert locator

    Evaluating SMS parsing using automated testing software

    Mobile phones are ubiquitous, with millions of users acquiring them every day for personal, business and social use or communication. Their pervasiveness makes them a valuable technological tool for overcoming the challenges of disseminating information on burning issues, advertisements and health-related matters. Short message services (SMS), an integral part of cell phones, can be turned into a major tool for accessing databases of information on HIV/AIDS, as an appreciable percentage of young people embrace the technology. The language used in SMS is characterised by ungrammatical structure, convenience spellings, homophonic words and an alphanumeric mix-up in the arrangement of words. This makes SMS text difficult to use as a query in a search engine architecture. In this work, SMS queries were used to access information in a Frequently Asked Questions (FAQ) system within a specified medical domain. When the developed system was measured in terms of proximity to the answer retrieved, remarkable results were observed.