Content-based Video Retrieval by Integrating Spatio-Temporal and Stochastic Recognition of Events
As the amount of publicly available video data grows, the need to query this data efficiently becomes significant. Consequently, content-based retrieval of video data turns out to be a challenging and important problem. We address the specific aspect of inferring semantics automatically from raw video data. In particular, we introduce a new video data model that supports the integrated use of two different approaches for mapping low-level features to high-level concepts. First, the model is extended with a rule-based approach that supports spatio-temporal formalization of high-level concepts, and then with a stochastic approach. Furthermore, results on real tennis video data are presented, demonstrating the validity of both approaches as well as the advantages of their integrated use.
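The rule-based approach formalizes a high-level concept as a spatio-temporal condition over low-level features. The sketch below illustrates only that general idea; it is not the model described in the paper. It assumes a hypothetical per-frame feature `player_y` (the player's normalized position between baseline, 0, and net, 1) and a hypothetical event "net approach", defined as the player staying at or beyond a threshold for a minimum number of consecutive frames.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    t: int            # frame index
    player_y: float   # hypothetical feature: 0.0 = baseline, 1.0 = net


def detect_net_approach(frames, near_net=0.8, min_frames=3):
    """Hypothetical spatio-temporal rule: return (start, end) frame intervals
    in which player_y >= near_net for at least min_frames consecutive frames."""
    intervals = []
    run = []  # frame indices of the current near-net run
    for f in frames:
        if f.player_y >= near_net:
            run.append(f.t)
        else:
            # The run has ended; keep it only if it lasted long enough.
            if len(run) >= min_frames:
                intervals.append((run[0], run[-1]))
            run = []
    # Close a run that lasts until the final frame.
    if len(run) >= min_frames:
        intervals.append((run[0], run[-1]))
    return intervals


ys = [0.2, 0.5, 0.85, 0.9, 0.95, 0.4, 0.9]
frames = [Frame(i, y) for i, y in enumerate(ys)]
print(detect_net_approach(frames))
```

For these frames the rule reports a single interval, `(2, 4)`; the isolated near-net frame at index 6 is too short to count. The defaults `near_net=0.8` and `min_frames=3` are arbitrary illustrative values.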
Discovering unbounded episodes in sequential data
One basic goal in the analysis of time-series data is
to find frequent interesting episodes, i.e., collections
of events occurring frequently together in the input sequence.
Most widely known approaches decide the interestingness of an episode from a
fixed user-specified window width or interval that bounds the
subsequent sequential association rules.
In this paper, we present a more intuitive definition that,
in turn, allows interesting episodes to grow during mining without any
user-specified help. A convenient algorithm to
efficiently discover the proposed unbounded episodes is also implemented.
Experimental results confirm that our approach is useful
and advantageous.
Postprint (published version)
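The abstract contrasts its unbounded episodes with the conventional practice of bounding episodes by a fixed, user-specified window width. The sketch below illustrates only that conventional, window-bounded baseline (sliding-window frequency counting); it is not the authors' unbounded definition or algorithm, which the abstract does not detail. It assumes integer timestamps and a serial episode, i.e. event types that must occur in a given order; the function names are hypothetical.

```python
def occurs_in_order(events, episode):
    """True if the event types in `episode` appear in `events` in that order
    (not necessarily adjacent). `target in it` consumes the iterator up to
    the first match, so later targets must appear after earlier ones."""
    it = iter(events)
    return all(target in it for target in episode)


def window_frequency(sequence, episode, width):
    """Fraction of sliding windows [s, s + width) that contain the serial episode.

    sequence: list of (time, event_type) pairs sorted by integer time.
    Windows start at every integer from first - width + 1 to last, so each
    event falls into exactly `width` windows.
    """
    if not sequence:
        return 0.0
    first, last = sequence[0][0], sequence[-1][0]
    starts = range(first - width + 1, last + 1)
    hits = 0
    for s in starts:
        window = [ev for (t, ev) in sequence if s <= t < s + width]
        if occurs_in_order(window, episode):
            hits += 1
    return hits / len(starts)


seq = [(1, "A"), (2, "B"), (5, "A"), (9, "B")]
for w in (3, 5):
    print(w, window_frequency(seq, ("A", "B"), w))
```

On this toy sequence the episode `('A', 'B')` has frequency 2/11 with width 3 but 5/13 with width 5, which shows how much a window-bounded result depends on the user's choice of width.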
The Ideal Candidate. Analysis of Professional Competences through Text Mining of Job Offers
The aim of this paper is to propose analytical tools for identifying peculiar aspects of the job market for graduates. We propose a strategy for dealing with data that have different sources and natures.
Towards the quantification of the semantic information encoded in written language
Written language is a complex communication signal capable of conveying
information encoded in the form of ordered sequences of words. Beyond the local
order ruled by grammar, semantic and thematic structures affect long-range
patterns in word usage. Here, we show that a direct application of information
theory quantifies the relationship between the statistical distribution of
words and the semantic content of the text. We show that there is a
characteristic scale, roughly around a few thousand words, which establishes
the typical size of the most informative segments in written language.
Moreover, we find that the words whose contributions to the overall information
are larger are the ones most closely associated with the main subjects and
topics of the text. This scenario can be explained by a model of word usage
that assumes that words are distributed along the text in domains of a
characteristic size where their frequency is higher than elsewhere. Our
conclusions are based on the analysis of a large database of written language,
diverse in subjects and styles, and thus are likely to be applicable to general
language sequences encoding complex information.
Comment: 19 pages, 4 figures
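One way to make "a direct application of information theory" concrete is to split a text into equal-size segments and measure how unevenly each word is spread across them compared with a randomly shuffled copy of the same text. The sketch below is an illustration of that idea under stated assumptions, not necessarily the authors' exact estimator: each word's entropy over segments is compared with its average entropy after shuffling, and the differences are weighted by word frequency. The function names, the number of shuffles, and the dropping of leftover tokens are illustrative choices.

```python
import math
import random
from collections import Counter, defaultdict


def word_part_entropies(tokens, n_parts):
    """Entropy (bits) of each word's distribution over n_parts equal-size segments.
    Tokens beyond n_parts * segment_size are ignored."""
    size = len(tokens) // n_parts
    if size == 0:
        raise ValueError("n_parts must not exceed the number of tokens")
    counts = defaultdict(Counter)  # word -> Counter(segment index -> count)
    for i, w in enumerate(tokens[: size * n_parts]):
        counts[w][i // size] += 1
    entropies = {}
    for w, parts in counts.items():
        n_w = sum(parts.values())
        entropies[w] = -sum((c / n_w) * math.log2(c / n_w) for c in parts.values())
    return entropies


def segment_information(tokens, n_parts, n_shuffles=20, seed=0):
    """Sum over words of p(w) * (mean shuffled entropy - actual entropy), in bits.
    Assumption: shuffling the tokens is the 'no semantic structure' baseline."""
    rng = random.Random(seed)
    size = len(tokens) // n_parts
    used = tokens[: size * n_parts]
    freq = Counter(used)
    total = len(used)
    actual = word_part_entropies(used, n_parts)
    shuffled_mean = defaultdict(float)
    for _ in range(n_shuffles):
        shuffled = used[:]
        rng.shuffle(shuffled)  # preserves word counts, destroys word order
        for w, h in word_part_entropies(shuffled, n_parts).items():
            shuffled_mean[w] += h / n_shuffles
    return sum(freq[w] / total * (shuffled_mean[w] - actual[w]) for w in freq)


topical = ["x"] * 50 + ["y"] * 50   # each word confined to one half
even = ["x", "y"] * 50              # both words spread evenly
print(segment_information(topical, 2), segment_information(even, 2))
```

Words concentrated in a few segments have lower entropy than in the shuffled text and contribute positively, while evenly spread words contribute little or nothing. Repeating the computation for different values of `n_parts` is one way to probe for a characteristic segment size like the one the abstract reports.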