Experiments in Clustering Homogeneous XML Documents to Validate an Existing Typology
This paper presents some experiments in clustering homogeneous XML documents
to validate an existing classification or, more generally, an organisational
structure. Our approach integrates techniques for extracting knowledge from
documents with unsupervised classification (clustering) of documents. We focus
on the feature selection used for representing documents and its impact on the
emerging classification. We mix the selection of structured features with fine
textual selection based on syntactic characteristics. We illustrate and evaluate
this approach with a collection of Inria activity reports for the year 2003.
The objective is to cluster projects into larger groups (Themes), based on the
keywords or different chapters of these activity reports. We then compare the
results of clustering using different feature selections, with the official
theme structure used by Inria.
Comment: (postprint); This version corrects a couple of errors in authors'
names in the bibliograph
Context Aware Computing for The Internet of Things: A Survey
As we are moving towards the Internet of Things (IoT), the number of sensors
deployed around the world is growing at a rapid pace. Market research has shown
a significant growth of sensor deployments over the past decade and has
predicted a significant increase in the growth rate in the future. These
sensors continuously generate enormous amounts of data. However, in order to
add value to raw sensor data we need to understand it. Collection, modelling,
reasoning, and distribution of context in relation to sensor data play a
critical role in this challenge. Context-aware computing has proven to be
successful in understanding sensor data. In this paper, we survey context
awareness from an IoT perspective. We present the necessary background by
introducing the IoT paradigm and context-aware fundamentals at the beginning.
Then we provide an in-depth analysis of the context life cycle. We evaluate a
subset of projects (50) which represent the majority of research and commercial
solutions proposed in the field of context-aware computing conducted over the
last decade (2001-2011) based on our own taxonomy. Finally, based on our
evaluation, we highlight the lessons to be learnt from the past and some
possible directions for future research. The survey addresses a broad range of
techniques, methods, models, functionalities, systems, applications, and
middleware solutions related to context awareness and IoT. Our goal is not only
to analyse, compare and consolidate past research work but also to appreciate
their findings and discuss their applicability towards the IoT.
Comment: IEEE Communications Surveys & Tutorials Journal, 201
Automatic offensive language detection from Twitter data using machine learning and feature selection of metadata
The popularity of social networks has only increased
in recent years. In theory, the use of social media was proposed
so we could share our views online, keep in contact with loved
ones or share good moments of life. However, the reality is
less ideal: some people share hate-speech messages, use social
media to bully specific individuals, or even create bots
whose only goal is to target specific
situations or people. Identifying who wrote such text is not easy
and there are several possible ways of doing it, such as using
natural language processing or machine learning algorithms
that can investigate and perform predictions using the metadata associated with it. In this work, we present an initial
investigation of which are the best machine learning techniques
to detect offensive language in tweets. After analysing
current trends in the literature on recent text classification
techniques, we selected the Linear SVM and Naive Bayes
algorithms for our initial tests. For the preprocessing of data,
we have used different techniques for attribute selection that
will be justified in the literature section. In our experiments,
we obtained 92% accuracy and 95% recall in detecting
offensive language with Naive Bayes, and 90% accuracy and
92% recall with Linear SVM. To our understanding, these
results surpass those in the related literature and are a good indication
of the importance of the data description approach we have used.
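The abstract above reports accuracy and recall for a binary "offensive / not offensive" classifier. As a reading aid, the following minimal sketch shows how these two metrics are conventionally computed from true and predicted labels. The function name `accuracy_and_recall`, the label encoding (1 = offensive, 0 = not offensive), and the toy labels are assumptions made for illustration; they are not taken from the paper, and the numbers do not reproduce its results.

```python
def accuracy_and_recall(y_true, y_pred, positive=1):
    """Return (accuracy, recall) for a binary labelling.

    Hypothetical helper for illustration only; `positive` marks the
    class of interest (here assumed to mean "offensive").
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")

    pairs = list(zip(y_true, y_pred))
    # Accuracy: share of all items whose predicted label matches the true one.
    correct = sum(1 for t, p in pairs if t == p)
    # Recall: share of truly positive items that were predicted positive.
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = correct / len(pairs)
    actual_pos = true_pos + false_neg
    recall = true_pos / actual_pos if actual_pos else 0.0
    return accuracy, recall


# Toy labels (invented for this sketch): 1 = offensive, 0 = not offensive.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
acc, rec = accuracy_and_recall(y_true, y_pred)
print(f"accuracy={acc:.2f} recall={rec:.2f}")
```

Recall being higher than accuracy, as in the reported figures, means the classifier misses few offensive tweets but makes proportionally more errors on the non-offensive ones.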
Vision systems with the human in the loop
The emerging cognitive vision paradigm deals with vision systems that apply machine learning and automatic reasoning in order to learn from what they perceive. Cognitive vision systems can rate the relevance and consistency of newly acquired knowledge; they can adapt to their environment and thus exhibit high robustness. This contribution presents vision systems that aim at flexibility and robustness. One is tailored for content-based image retrieval; the others are cognitive vision systems that constitute prototypes of visual active memories which evaluate, gather, and integrate contextual knowledge for visual analysis. All three systems are designed to interact with human users. After discussing adaptive content-based image retrieval and object and action recognition in an office environment, we raise the issue of assessing cognitive systems. Experiences from psychologically evaluated human-machine interactions are reported, and the promising potential of psychologically based usability experiments is stressed.