
    Collecting a corpus of Dutch SMS

    In this paper we present the first freely available corpus of Dutch text messages containing data originating from the Netherlands and Flanders. This corpus has been collected in the framework of the SoNaR project and constitutes a viable part of the 500-million-word SoNaR corpus. About 53,000 text messages were collected on a large scale, based on voluntary donations. These messages will be distributed as such. In this paper we focus on the data collection processes involved. After studying the effect of media coverage, we show that free publicity in newspapers and on social media networks in particular results in more contributions. All SMS messages are provided with metadata. Looking at the composition of the corpus, it becomes clear that a small number of people contributed a large share of the data: in total, 272 people contributed to the corpus over three months. More women than men contributed to the corpus, but male contributors submitted larger amounts of data. This corpus will be of paramount importance for sociolinguistic research and normalisation studies.

    An Algorithm for Automatic Service Composition

    Telecommunication companies are struggling to provide their users with value-added services. These services are expected to be context-aware, attentive and personalized. Since it is not economically feasible to build services by hand for each individual user, service providers are looking for ways to automate service creation. The IST-SPICE project aims to develop a platform for the development and deployment of innovative value-added services. In this paper we introduce our algorithm for the automatic composition of services. The algorithm assumes that every available service is semantically annotated. Based on a user or developer service request, a matching service is composed from component services. The composition follows a semantic graph-based approach in which atomic services are iteratively composed based on their functional and non-functional properties.
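    The abstract only names the approach, so the following is a minimal sketch of iterative, annotation-driven composition, not the SPICE algorithm itself. It assumes each service is annotated with plain string concepts for its inputs and outputs (a stand-in for an ontology), uses a hypothetical `cost` field as the only non-functional property, and composes by greedy forward chaining: any unused service whose inputs are already available is a candidate, and the cheapest one is added until the requested outputs are covered. All names (`Service`, `compose`, the example services) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    inputs: frozenset   # semantic concepts the service consumes (assumed annotation)
    outputs: frozenset  # semantic concepts the service produces (assumed annotation)
    cost: float         # hypothetical non-functional property


def compose(services, provided, requested, max_cost=float("inf")):
    """Greedy forward chaining; returns ordered service names or None."""
    known = set(provided)
    plan, used, total = [], set(), 0.0
    while not set(requested) <= known:
        # Functional matching: inputs satisfied and something new produced.
        candidates = [
            s for s in services
            if s.name not in used
            and s.inputs <= known
            and not s.outputs <= known
        ]
        if not candidates:
            return None  # no applicable service can make progress
        # Non-functional selection: cheapest candidate (name breaks ties).
        best = min(candidates, key=lambda s: (s.cost, s.name))
        if total + best.cost > max_cost:
            return None  # budget constraint violated
        plan.append(best.name)
        used.add(best.name)
        known |= best.outputs
        total += best.cost
    return plan


services = [
    Service("geocode", frozenset({"Address"}), frozenset({"Location"}), 1.0),
    Service("weather", frozenset({"Location"}), frozenset({"Forecast"}), 2.0),
    Service("sms", frozenset({"Forecast", "PhoneNumber"}), frozenset({"Notification"}), 0.5),
]
print(compose(services, {"Address", "PhoneNumber"}, {"Notification"}))
```

    With these toy services the request chains `geocode`, `weather` and `sms`. Greedy selection keeps the sketch short but is not guaranteed to find the cheapest composition, and it does not prune services that turn out to be irrelevant to the goal; a graph search over the same annotations would address both.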

    MedTxting: learning based and knowledge rich SMS-style medical text contraction

    In mobile health (M-health), the Short Message Service (SMS) has been shown to improve disease-related self-management and health service outcomes, leading to enhanced patient care. However, the hard limit on the number of characters per message restricts the value of SMS communication in health care practice. To overcome this problem and improve the efficiency of clinical workflow, we developed an innovative system, MedTxting (available at http://medtxting.askhermes.org), a learning-based but knowledge-rich system that compresses medical texts in an SMS style. Evaluations on clinical questions and discharge summary narratives show that MedTxting can effectively compress medical texts with reasonable readability and noticeable size reduction. These findings reveal the potential of MedTxting in clinical settings, allowing real-time and cost-effective communication such as patient condition reporting, medication consulting, and physicians connecting to share expertise at the point of care.
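    To make the "knowledge-rich" side of SMS-style contraction concrete, here is a minimal sketch of dictionary-driven abbreviation under a character limit. It is not MedTxting's method: the abbreviation lexicon is a small hypothetical one, the learned component of the system is not modeled, and the 160-character check assumes a single GSM-7 message while ignoring encoding details. Longer phrases are replaced first so that a multi-word entry wins over any shorter entry it contains.

```python
import re

# Hypothetical abbreviation lexicon, for illustration only; it is not
# MedTxting's knowledge base.
ABBREVIATIONS = {
    "blood pressure": "BP",
    "history of": "h/o",
    "twice daily": "BID",
    "patient": "pt",
    "without": "w/o",
    "with": "w/",
}

SMS_LIMIT = 160  # characters in a single GSM-7 encoded SMS


def contract(text, lexicon=ABBREVIATIONS):
    """Replace whole-word phrases with abbreviations, longest phrases first."""
    for phrase in sorted(lexicon, key=len, reverse=True):
        abbr = lexicon[phrase]
        pattern = r"\b" + re.escape(phrase) + r"\b"
        # A function replacement inserts abbr literally (no escape processing).
        text = re.sub(pattern, lambda _m: abbr, text, flags=re.IGNORECASE)
    return text


def fits_in_one_sms(text):
    # Simplification: counts characters only, ignoring GSM-7 extension
    # characters and UCS-2 encoding, both of which change the real limit.
    return len(text) <= SMS_LIMIT


note = "Patient with history of high blood pressure, takes lisinopril twice daily"
short = contract(note)
print(short, fits_in_one_sms(short))
```

    A fixed lexicon cannot decide when an abbreviation hurts readability or which of several expansions is intended; that is where a learned component, as the abstract describes, would come in.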

    Technology for Good: Innovative Use of Technology by Charities

    Technology for Good identifies ten technologies being used by charitable organizations in innovative ways. The report briefly introduces each technology and provides examples of how it is being used. Examples are drawn from a broad spectrum of organizations working on widely varied issues around the globe. This makes Technology for Good a unique repository of inspiration for the public and private sectors, funders, and other change makers who support the creation and use of technology for social good.

    Evaluating SMS parsing using automated testing software

    Mobile phones are ubiquitous, with millions of users acquiring them every day for personal, business and social use and communication. This pervasiveness makes them an advantageous technological tool for overcoming the challenges of disseminating information on pressing issues, advertising, and health-related matters. The Short Message Service (SMS), an integral part of cell phones, can become a major tool for accessing databases of information on HIV/AIDS, as an appreciable percentage of young people embrace the technology. The language of SMS users is commonly characterized by ungrammatical structure, spelling chosen for convenience, homophones, and words that mix letters and digits. This makes SMS text difficult to use as a query in a search engine architecture. In this work, SMS queries were used to access information in a Frequently Asked Questions (FAQ) system for a specified medical domain. When the developed system was evaluated in terms of the proximity of the retrieved answers, remarkable results were observed.
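    One common way to match a noisy SMS query against FAQ questions is to normalize known shorthand and then compare character n-grams, which tolerate misspellings better than whole words. The sketch below assumes exactly that: a tiny hypothetical shorthand table, character trigrams, and Jaccard similarity as the ranking score. It illustrates the general retrieval idea and is not the system evaluated in the paper.

```python
import re

# Hypothetical SMS shorthand table (illustrative only).
SMS_MAP = {"u": "you", "r": "are", "2": "to", "hw": "how", "wat": "what"}


def normalize(text):
    """Lowercase, tokenize, and expand known SMS shorthand."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(SMS_MAP.get(t, t) for t in tokens)


def trigrams(text):
    """Character trigrams with padding so word edges are represented."""
    padded = f"  {text} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def best_faq(query, faqs):
    """Return the FAQ question with the highest trigram Jaccard similarity."""
    q = trigrams(normalize(query))

    def score(question):
        f = trigrams(normalize(question))
        return len(q & f) / len(q | f)  # union is never empty due to padding

    return max(faqs, key=score)


faqs = [
    "How is HIV transmitted?",
    "What are the symptoms of AIDS?",
    "Where can I get tested for HIV?",
]
print(best_faq("hw is hiv transmited", faqs))
```

    Because trigrams overlap, the misspelled "transmited" still shares most of its n-grams with "transmitted", so the first question ranks highest even though the word does not match exactly.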

    Holaaa!! Writin like u talk is kewl but kinda hard 4 NLP

    We present work in progress aiming to build tools for the normalization of User-Generated Content (UGC). As we will see, the task requires revisiting the initial steps of NLP processing, since UGC (micro-blog, blog, and, generally, Web 2.0 user texts) presents a number of non-standard communicative and linguistic characteristics and is in fact much closer to oral and colloquial language than to edited text. We present and characterize a corpus of Spanish UGC text from three different sources: Twitter, consumer reviews and blogs. We motivate the need for UGC text normalization by analyzing the problems found when processing this type of text through a conventional language processing pipeline, particularly in lemmatization and morphosyntactic tagging. Finally, we propose a strategy for automatically normalizing UGC using a selector of correct forms on top of a pre-existing spell-checker.
    Postprint (published version)
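    The abstract names the architecture (a selector of correct forms sitting on top of a spell-checker) without giving details, so this is a toy sketch under explicit assumptions, not the authors' system. `difflib.get_close_matches` stands in for the pre-existing spell-checker, the lexicon and its frequencies are made up, repeated-letter squeezing and an abbreviation table are common UGC heuristics added for illustration, and the selector simply picks the most frequent candidate. All table contents and function names are hypothetical.

```python
import difflib
import re

# Hypothetical toy lexicon with made-up relative frequencies (not from the paper).
LEXICON_FREQ = {"hola": 50, "que": 100, "quiero": 30, "vengas": 5,
                "porque": 40, "también": 20, "casa": 25}

# Hypothetical table of common SMS/UGC abbreviations.
ABBREVIATIONS = {"q": "que", "xq": "porque", "tb": "también"}


def squeeze_repeats(token):
    """Collapse runs of 3+ identical characters to one ("holaaa" -> "hola")."""
    return re.sub(r"(.)\1{2,}", r"\1", token)


def spellcheck_candidates(token):
    """Stand-in for a spell-checker: lexicon words similar to the token."""
    return difflib.get_close_matches(token, list(LEXICON_FREQ), n=5, cutoff=0.6)


def select_form(candidates):
    """Selector: choose the most frequent candidate (an assumed criterion)."""
    return max(candidates, key=lambda w: LEXICON_FREQ.get(w, 0))


def normalize(text):
    out = []
    for token in re.findall(r"\w+", text.lower()):
        token = squeeze_repeats(token)
        if token in ABBREVIATIONS:
            out.append(ABBREVIATIONS[token])
        elif token in LEXICON_FREQ:
            out.append(token)
        else:
            candidates = spellcheck_candidates(token)
            # Leave the token unchanged when the spell-checker offers nothing.
            out.append(select_form(candidates) if candidates else token)
    return " ".join(out)


print(normalize("Holaaa!! kiero q vengas"))
```

    Keeping candidate generation and selection separate mirrors the layering the abstract describes: the spell-checker proposes forms, and a distinct component decides among them, so the frequency heuristic here could be swapped for a context-aware model without touching the candidate generator.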