
    Building lexical resources: towards programmable contributive platforms

    Lexical resources are very important in today's society, given globalization and the growth of worldwide communication and exchange. There are clearly identified needs, both for humans and for machines. Nevertheless, very little effort is actually invested in this domain. Consequently, there is a serious lack of freely available, good-quality resources, especially for under-resourced languages. Furthermore, the majority of existing bilingual dictionaries have English as one of their two languages. Therefore, to translate between two languages other than English, one must use English as a pivot; and even for native English speakers, this creates many misunderstandings that can be critical in some situations. In order to create and extend freely available, good-quality, rich lexical resources for under-resourced languages online with a community of voluntary contributors, Jibiki, an online generic platform for managing (lookup, editing, import, export) any kind of lexical resource encoded in XML, has been developed. This platform is successfully used in several dictionary construction projects. Concerning the data, a serious game has been launched to collect valuable lexical information, such as collocations, that will later be integrated into dictionary entries. Work is now under way to extend the platform so that the resulting resources can be reused and enriched by synchronization with other systems (language learner and translator environments, machine translation systems, etc.)
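Jibiki's own entry schemas are project-specific and not shown in the abstract; as a minimal sketch of what "lookup in an XML-encoded lexical resource" involves, the following uses a hypothetical entry format (the element names and sample words are assumptions, not Jibiki's actual schema):

```python
import xml.etree.ElementTree as ET

# Hypothetical dictionary fragment; real Jibiki schemas differ per project.
SAMPLE = """
<dictionary>
  <entry id="e1">
    <headword lang="fra">maison</headword>
    <translation lang="eng">house</translation>
  </entry>
  <entry id="e2">
    <headword lang="fra">chien</headword>
    <translation lang="eng">dog</translation>
  </entry>
</dictionary>
"""

def lookup(xml_text, headword):
    """Return all translations whose entry matches the given headword."""
    root = ET.fromstring(xml_text)
    results = []
    for entry in root.iter("entry"):
        if entry.findtext("headword") == headword:
            results.extend(t.text for t in entry.findall("translation"))
    return results

print(lookup(SAMPLE, "maison"))  # ['house']
```

Keeping the resource in XML, as the platform does, lets import/export and editing operate on the same structure the lookup traverses.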

    Thai word segmentation on social networks with time sensitivity

    Social network services like Twitter are among the important social networks that have had a huge impact on Thai culture. They have changed the behavior of many Thai people, who now regularly use computers or smartphones rather than televisions. Thai people also share their experiences and get information, such as news, on social networks. With the increasing number of micro-blog messages originated and discussed on social networks, Thai word segmentation is becoming a compelling research issue, as it is an important task in natural language processing. However, existing Thai segmentation approaches are not designed to deal with short and noisy messages like tweets. In this paper, we propose a Thai word segmentation approach for social networks that exploits both the local context (in tweets) and the global context from Thai Wikipedia. We evaluate our approach on a real-world Twitter dataset. Our experiments show that the proposed approach can effectively segment Twitter messages, outperforming the baseline
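The abstract does not spell out the segmentation model, but a common baseline for unsegmented scripts such as Thai is dictionary-based longest matching, which the local/global contexts would then refine. A minimal sketch of that baseline (the lexicon and the Latin-script stand-in text are assumptions for illustration):

```python
def segment(text, lexicon, max_len=10):
    """Greedy longest-match word segmentation: at each position, take the
    longest dictionary word starting there, falling back to a single
    character when nothing matches (how unknown words surface as noise)."""
    words, i = [], 0
    while i < len(text):
        # Try candidate spans from longest to shortest.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words

# Latin-script stand-in for an unsegmented Thai string.
lex = {"this", "is", "a", "test"}
print(segment("thisisatest", lex))  # ['this', 'is', 'a', 'test']
```

On short, noisy tweets this baseline fails exactly where the paper intervenes: out-of-vocabulary words and slang fall through to the single-character fallback unless external context supplies better candidates.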

    Effectiveness of Word Extraction and Information Retrieval on Cancer from Thai Website

    This article proposes word extraction and cancer information retrieval from Thai websites. For word extraction, TH-OnSeg is proposed: a word segmentation method based on the LexTo algorithm combined with a cancer dictionary and a cancer ontology. TH-OnSeg is used to extract cancer-related words to serve as document index terms for cancer websites. The experiments compared this word extraction against the LexTo word segmentation algorithm based on a Thai electronic dictionary. The results show that TH-OnSeg is more effective: it extracts more unknown words, known words, and ambiguous words than LexTo. In addition, we propose a semantic web-based technique combined with n-grams for cancer information retrieval. The experiments compared the proposed technique with database information retrieval methods. The results show that the semantic web technique combined with n-grams retrieves the highest number of cancer websites, with recall of at least 0.9 in all experimental cases, for both misspelled and correctly spelled queries
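The abstract credits n-grams with making retrieval robust to misspellings. One standard way this works, sketched here under the assumption that character n-gram overlap is what tolerates spelling variants (the paper's exact matching function is not given), is a Dice coefficient over padded character trigrams:

```python
def ngrams(s, n=3):
    """Set of character n-grams of s, padded so word edges count too."""
    s = f"##{s}##"
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def ngram_similarity(a, b, n=3):
    """Dice coefficient over character n-gram sets. Small misspellings
    change only a few n-grams, so similar strings still score high."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga and not gb:
        return 1.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))

# A spelling variant still scores well above unrelated strings.
print(round(ngram_similarity("leukemia", "leukaemia"), 2))
```

A retrieval front end can use such a score to map a misspelled Thai query term onto the nearest dictionary or ontology term before the semantic lookup.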

    Proceedings of the COLING 2004 Post-Conference Workshop on Multilingual Linguistic Resources (MLR2004)

    In an ever-expanding information society, most information systems now face the "multilingual challenge". Multilingual language resources play an essential role in modern information systems. Such resources need to provide information on many languages in a common framework and should be (re)usable in many applications (for automatic or human use). Many centres have been involved in national and international projects dedicated to building harmonised language resources and creating expertise in the maintenance and further development of standardised linguistic data. These resources include dictionaries, lexicons, thesauri, wordnets, and annotated corpora developed along the lines of best practices and recommendations. However, since the late 1990s, most efforts in scaling up these resources have remained the responsibility of local authorities, usually with very low funding (if any) and few opportunities for academic recognition of this work. Hence, it is not surprising that many resource holders and developers have become reluctant to give free access to the latest versions of their resources, and their actual status is therefore currently rather unclear. The goal of this workshop is to study problems involved in the development, management and reuse of lexical resources in a multilingual context. Moreover, this workshop provides a forum for reviewing the present state of language resources. The workshop is meant to bring to the international community qualitative and quantitative information about the most recent developments in the area of linguistic resources and their use in applications. The impressive number of submissions (38) to this workshop, and to other workshops and conferences dedicated to similar topics, proves that dealing with multilingual linguistic resources has become a very hot topic in the Natural Language Processing community.
To cope with the number of submissions, the workshop organising committee decided to accept 16 papers from 10 countries based on the reviewers' recommendations. Six of these papers will be presented in a poster session. The papers constitute a representative selection of current trends in research on multilingual language resources, such as multilingual aligned corpora, bilingual and multilingual lexicons, and multilingual speech resources. They also represent a characteristic set of approaches to the development of multilingual language resources, such as automatic extraction of information from corpora, combination and reuse of existing resources, online collaborative development of multilingual lexicons, and use of the Web as a multilingual language resource. The development and management of multilingual language resources is a long-term activity in which collaboration among researchers is essential. We hope that this workshop will gather many researchers involved in such developments and will give them the opportunity to discuss, exchange, and compare their approaches and to strengthen their collaborations in the field. The organisation of this workshop would have been impossible without the hard work of the programme committee, who managed to provide accurate reviews on time on a rather tight schedule. We would also like to thank the COLING 2004 organising committee that made this workshop possible. Finally, we hope that this workshop will yield fruitful results for all participants

    Using concept similarity in cross ontology for adaptive e-Learning systems

    e-Learning is one of the media of learning most preferred by learners. Learners search the web to gather knowledge about a particular topic from information in repositories. Retrieval of relevant materials from a domain can be implemented easily if the information is organized and related in some way. Ontologies are a key concept that helps relate information so as to provide more relevant lessons to the learner. This paper proposes an adaptive e-Learning system that generates user-specific e-Learning content by comparing concepts across more than one ontology using similarity measures. A cross-ontology measure is defined for the comparison process, with a fuzzy domain ontology as the primary ontology and the domain expert's ontology as the secondary ontology. A personalized document is provided to the user along with a user profile, which includes the data obtained from the proposed method together with a user score obtained through user evaluation. The results of the proposed e-Learning system under the designed cross-ontology similarity measure show a significant increase in performance and accuracy under different conditions. A comparative analysis showed the difference in performance between the proposed method and other methods, and the assessment results indicate that the proposed approach is effective
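The paper's cross-ontology measure is not specified in the abstract; as an illustration of the underlying idea of comparing fuzzy concept descriptions from two ontologies, here is a fuzzy Jaccard similarity over weighted concept features (the feature names and membership values are hypothetical):

```python
def fuzzy_concept_similarity(c1, c2):
    """Fuzzy Jaccard: sum of elementwise minimum memberships divided by
    sum of elementwise maximums, over the union of features. Equal
    descriptions score 1.0; disjoint ones score 0.0."""
    keys = set(c1) | set(c2)
    num = sum(min(c1.get(k, 0.0), c2.get(k, 0.0)) for k in keys)
    den = sum(max(c1.get(k, 0.0), c2.get(k, 0.0)) for k in keys)
    return num / den if den else 1.0

# Hypothetical memberships for one concept as described in each ontology.
primary = {"recursion": 0.9, "loops": 0.4, "functions": 0.7}   # fuzzy domain ontology
expert  = {"recursion": 0.8, "functions": 0.6, "stacks": 0.3}  # domain expert's ontology
print(round(fuzzy_concept_similarity(primary, expert), 2))
```

In an adaptive system, scores like this could rank candidate lessons by how closely their concepts match both the learner-facing ontology and the expert's view.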

    INTEGRATING COLLABORATIVE SKILLS IN 8TH GRADE ENGLISH TEACHING LESSON PLANS AT JUNIOR HIGH SCHOOL

    Collaboration is one of the 21st-century skills needed to survive in this era. The need in society to think and work together on issues of critical concern has increased, shifting the emphasis from individual effort to group work, from independence to community. Integrating collaborative skills into the educational field, particularly into assessment, has been broadly practised; however, lesson plans that incorporate collaborative skills are lacking. This study aims to explore collaborative skills in English teaching lesson plans in junior high schools in Indonesia. A content analysis approach is used. The data were analyzed using a systematic descriptive content methodology based on the ACER (Australian Council for Educational Research) framework, which served as the instrument for determining whether the lesson plans integrate collaborative skills. The expected result of the study is that collaborative skills are found in the lesson plans, indicators, and the learning process

    24th International Conference on Information Modelling and Knowledge Bases

    In the last three decades, information modelling and knowledge bases have become essential subjects, not only in academic communities related to information systems and computer science but also in business areas where information technology is applied. The series of European-Japanese Conferences on Information Modelling and Knowledge Bases (EJC) originally started as a co-operation initiative between Japan and Finland in 1982. The practical operations were then organised by Professor Ohsuga in Japan and Professors Hannu Kangassalo and Hannu Jaakkola in Finland (Nordic countries). The geographical scope has since expanded to cover Europe and other countries. The conference retains a workshop character: discussion, ample time for presentations, and a limited number of participants (50) and papers (30). Suggested topics include, but are not limited to:
1. Conceptual modelling: Modelling and specification languages; Domain-specific conceptual modelling; Concepts, concept theories and ontologies; Conceptual modelling of large and heterogeneous systems; Conceptual modelling of spatial, temporal and biological data; Methods for developing, validating and communicating conceptual models.
2. Knowledge and information modelling and discovery: Knowledge discovery, knowledge representation and knowledge management; Advanced data mining and analysis methods; Conceptions of knowledge and information; Modelling information requirements; Intelligent information systems; Information recognition and information modelling.
3. Linguistic modelling: Models of HCI; Information delivery to users; Intelligent informal querying; Linguistic foundations of information and knowledge; Fuzzy linguistic models; Philosophical and linguistic foundations of conceptual models.
4. Cross-cultural communication and social computing: Cross-cultural support systems; Integration, evolution and migration of systems; Collaborative societies; Multicultural web-based software systems; Intercultural collaboration and support systems; Social computing, behavioral modeling and prediction.
5. Environmental modelling and engineering: Environmental information systems (architecture); Spatial, temporal and observational information systems; Large-scale environmental systems; Collaborative knowledge base systems; Agent concepts and conceptualisation; Hazard prediction, prevention and steering systems.
6. Multimedia data modelling and systems: Modelling multimedia information and knowledge; Content-based multimedia data management; Content-based multimedia retrieval; Privacy and context enhancing technologies; Semantics and pragmatics of multimedia data; Metadata for multimedia information systems.
Overall we received 56 submissions. After careful evaluation, 16 papers were selected as long papers, 17 as short papers, 5 as position papers, and 3 for presentation of perspective challenges. We thank all colleagues for their support of this issue of the EJC conference, especially the programme committee, the organising committee, and the programme coordination team. The long and short papers presented at the conference are revised after the conference and published in the series "Frontiers in Artificial Intelligence and Applications" by IOS Press (Amsterdam). The books "Information Modelling and Knowledge Bases" are edited by the editing committee of the conference. We believe that the conference will be productive and fruitful in advancing research on and applications of information modelling and knowledge bases. Bernhard Thalheim, Hannu Jaakkola, Yasushi Kiyoki

    Explorations in Cyber International Relations (ECIR) – Data Dashboard Report #1: CERT Data Sources and Prototype Dashboard System

    Disclaimer: This report relies on publicly available information, especially from the CERTs' public web sites. The CERTs have not yet been contacted to confirm our understanding of their data; that will be done in subsequent phases of this effort. Growing global interconnection and interdependency of computer networks, in combination with the increased sophistication of cyber attacks over time, demonstrate the need for a better understanding of the collective and cooperative security measures needed to prevent and respond to cybersecurity emergencies. The Explorations in Cyber International Relations (ECIR) Data Dashboard project is an initial effort to gather and analyze such data within and between countries. This report describes the prototype ECIR Data Dashboard and the initial data sources used. In 1988, the United States Department of Defense and Carnegie Mellon University formed the Computer Emergency Response Team (CERT) to lead and coordinate national and international efforts to combat cybersecurity threats. Since then, the number of CERTs worldwide has grown dramatically, creating the potential for a sophisticated and coordinated global cybersecurity response network. This report focuses primarily on the current state of the worldwide CERTs, including the data publicly available, the extent of coordination, and the maturity of data management and responses. The report summarizes, analyzes, and critiques the worldwide CERT network. Additionally, the report describes the ECIR team's Data Dashboard project, designed to provide scholars, policymakers, IT professionals, and other stakeholders with a comprehensive set of national-level cybersecurity, information technology, and demographic data. The Dashboard allows these stakeholders to observe chronological trends and multivariate correlations that can give insight into the current state, potential future trends, and approximate causes of global cybersecurity issues. The report also summarizes the purpose, state, progress, and challenges of developing the Data Dashboard project