    Normalization of Dutch user-generated content

    This paper describes a phrase-based machine translation approach to normalizing Dutch user-generated content (UGC). We compiled a corpus of three social media genres (text messages, message board posts and tweets) as a sample of this recent domain. We describe the characteristics of this noisy text material and explain how it was manually normalized using newly developed guidelines. For the automatic normalization task we focus on text messages, and find that a cascaded SMT system, in which a token-based module is followed by translation at the character level, gives the best word error rate reduction. After these initial experiments, we investigate the system's robustness on the complete domain of UGC by testing it on the other two social media genres, and find that the cascaded approach performs best on these genres as well. To our knowledge, we deliver the first proof-of-concept system for Dutch UGC normalization, which can serve as a baseline for future work.
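    The abstract above evaluates normalization by word error rate (WER) reduction. As a minimal sketch of that metric, assuming the standard definition (token-level edit distance divided by the number of reference tokens) and whitespace tokenization, WER could be computed as follows. The function names and the Dutch example strings are invented for illustration and are not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER = token edit distance / number of reference tokens."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)


# Invented example: a gold normalization, the raw message, and a system output.
gold = "ik weet het niet"
raw = "k weet t nie"
system = "ik weet het nie"

wer_before = word_error_rate(gold, raw)     # 3 errors over 4 tokens = 0.75
wer_after = word_error_rate(gold, system)   # 1 error over 4 tokens = 0.25
relative_reduction = (wer_before - wer_after) / wer_before
```

    Comparing the WER of the raw text with the WER of the system output, both measured against the manual normalization, gives the reduction attributable to the system.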

    Automatic Stress Detection in Working Environments from Smartphones' Accelerometer Data: A First Step

    Increasing workload across many organisations, and the consequent increase in occupational stress, is negatively affecting the health of the workforce. Measuring stress and other human psychological dynamics is difficult due to the subjective nature of self-reporting and the variability between and within individuals. With the advent of smartphones it is now possible to monitor diverse aspects of human behaviour, including objectively measured behaviour related to psychological state and consequently stress. We used data from the smartphone's built-in accelerometer to detect behaviour that correlates with subjects' stress levels. The accelerometer was chosen because it raises fewer privacy concerns (in comparison to location, video or audio recording, for example) and because its low power consumption makes it suitable for embedding in smaller wearable devices, such as fitness trackers. Thirty subjects from two different organizations were provided with smartphones. The study lasted 8 weeks and was conducted in real working environments, with no constraints whatsoever placed upon smartphone usage. The subjects reported their perceived stress levels three times during their working hours. Using a combination of statistical models to classify self-reported stress levels, we achieved a maximum overall accuracy of 71% for user-specific models and an accuracy of 60% for similar-users models, relying solely on data from a single accelerometer. Comment: in IEEE Journal of Biomedical and Health Informatics, 201

    The Development of Web-Based Interface to Census Interaction Data

    This project involves the development of a Web interface to origin-destination statistics from the 1991 Census (in a form that will be compatible with planned 2001 outputs). It provides the user with a set of screen-based tools for setting the parameters that govern each data extraction (data set, areas, variables) in the form of a query. Traffic light icons signal what the user has set so far and what remains to be done. There are options to extract different types of flow data and to generate output in different formats. The system can now be used to access the interaction flow data contained in the 1991 Special Migration Statistics Sets 1 and 2 and Special Workplace Statistics Set C. WICID has been demonstrated at the Origin-Destination Statistics Roadshows organised by GRO Scotland and held during May/June 2000, and the Census Offices have expressed interest in using the software in the Census Access Project.