10,952 research outputs found
Of course we share! Testing Assumptions about Social Tagging Systems
Social tagging systems have established themselves as an important part in
today's web and have attracted the interest from our research community in a
variety of investigations. The overall vision of our community is that simply
through interactions with the system, i.e., through tagging and sharing of
resources, users would contribute to building useful semantic structures as
well as resource indexes using uncontrolled vocabulary not only due to the
easy-to-use mechanics. Henceforth, a variety of assumptions about social
tagging systems have emerged, yet testing them has been difficult due to the
absence of suitable data. In this work we thoroughly investigate three
available assumptions - e.g., is a tagging system really social? - by examining
live log data gathered from the real-world public social tagging system
BibSonomy. Our empirical results indicate that while some of these assumptions
hold to a certain extent, other assumptions need to be reflected and viewed in
a very critical light. Our observations have implications for the design of
future search and other algorithms to better reflect the actual user behavior
Effective Feature Representation for Clinical Text Concept Extraction
Crucial information about the practice of healthcare is recorded only in
free-form text, which creates an enormous opportunity for high-impact NLP.
However, annotated healthcare datasets tend to be small and expensive to
obtain, which raises the question of how to make maximally efficient uses of
the available data. To this end, we develop an LSTM-CRF model for combining
unsupervised word representations and hand-built feature representations
derived from publicly available healthcare ontologies. We show that this
combined model yields superior performance on five datasets of diverse kinds of
healthcare text (clinical, social, scientific, commercial). Each involves the
labeling of complex, multi-word spans that pick out different healthcare
concepts. We also introduce a new labeled dataset for identifying the treatment
relations between drugs and diseases
Semantic multimedia analysis using knowledge and context
PhDThe difficulty of semantic multimedia analysis can be attributed to the
extended diversity in form and appearance exhibited by the majority of
semantic concepts and the difficulty to express them using a finite number
of patterns. In meeting this challenge there has been a scientific debate
on whether the problem should be addressed from the perspective of using
overwhelming amounts of training data to capture all possible instantiations
of a concept, or from the perspective of using explicit knowledge about
the concepts’ relations to infer their presence. In this thesis we address
three problems of pattern recognition and propose solutions that combine
the knowledge extracted implicitly from training data with the knowledge
provided explicitly in structured form. First, we propose a BNs modeling
approach that defines a conceptual space where both domain related evi-
dence and evidence derived from content analysis can be jointly considered
to support or disprove a hypothesis. The use of this space leads to sig-
nificant gains in performance compared to analysis methods that can not
handle combined knowledge. Then, we present an unsupervised method
that exploits the collective nature of social media to automatically obtain
large amounts of annotated image regions. By proving that the quality of
the obtained samples can be almost as good as manually annotated images
when working with large datasets, we significantly contribute towards scal-
able object detection. Finally, we introduce a method that treats images,
visual features and tags as the three observable variables of an aspect model
and extracts a set of latent topics that incorporates the semantics of both
visual and tag information space. By showing that the cross-modal depen-
dencies of tagged images can be exploited to increase the semantic capacity
of the resulting space, we advocate the use of all existing information facets
in the semantic analysis of social media
- …