Normalization of Dutch user-generated content
This paper describes a phrase-based machine translation approach to normalize Dutch user-generated content (UGC). We compiled a corpus of three different social media genres (text messages, message board posts and tweets) to obtain a sample of this recent domain. We describe the various characteristics of this noisy text material and explain how it has been manually normalized using newly developed guidelines. For the automatic normalization task we focus on text messages, and find that a cascaded SMT system where a token-based module is followed by a translation at the character level gives the best word error rate reduction. After these initial experiments, we investigate the system's robustness on the complete domain of UGC by testing it on the other two social media genres, and find that the cascaded approach performs best on these genres as well. To our knowledge, we deliver the first proof-of-concept system for Dutch UGC normalization, which can serve as a baseline for future work.
Automatic Stress Detection in Working Environments from Smartphones' Accelerometer Data: A First Step
An increase in workload across many organisations and the consequent increase in
occupational stress are negatively affecting the health of the workforce.
Measuring stress and other human psychological dynamics is difficult due to
the subjective nature of self-reporting and variability between and within
individuals. With the advent of smartphones it is now possible to monitor
diverse aspects of human behaviour, including objectively measured behaviour
related to psychological state and consequently stress. We have used data from
the smartphone's built-in accelerometer to detect behaviour that correlates
with subjects' stress levels. The accelerometer sensor was chosen because it raises
fewer privacy concerns (in comparison to location, video or audio recording,
for example) and because its low power consumption makes it suitable to be
embedded in smaller wearable devices, such as fitness trackers. 30 subjects
from two different organizations were provided with smartphones. The study
lasted for 8 weeks and was conducted in real working environments, with no
constraints whatsoever placed upon smartphone usage. The subjects reported
their perceived stress levels three times during their working hours. Using
a combination of statistical models to classify self-reported stress levels, we
achieved a maximum overall accuracy of 71% for user-specific models and an
accuracy of 60% for the use of similar-users models, relying solely on data
from a single accelerometer. Comment: in IEEE Journal of Biomedical and Health Informatics, 201
The Development of Web-Based Interface to Census Interaction Data
This project involves the development of a Web interface to origin-destination statistics from the 1991 Census (in a form that will be compatible with planned 2001 outputs). It provides the user with a set of screen-based tools for setting the parameters governing each data extraction (data set, areas, variables) in the form of a query. Traffic light icons are used to signal what the user has set so far and what remains to be done. There are options to extract different types of flow data and to generate output in different formats. The system can now be used to access the interaction flow data contained in the 1991 Special Migration Statistics Sets 1 and 2 and Special Workplace Statistics Set C. WICID has been demonstrated at the Origin-Destination Statistics Roadshows organised by GRO Scotland and held during May/June 2000, and the Census Offices have expressed interest in using the software in the Census Access Project.