CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines
Based on the information provided by European projects and national initiatives related to multimedia search, as well as by domain experts who participated in the CHORUS Think-Tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and a socio-economic perspective.
The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives that measure the performance of multimedia search engines.
From a socio-economic perspective, we take stock of the impact and legal consequences of these technical advances and point out future directions of research.
Accessing spoken interaction through dialogue processing [online]
Summary (translated from German)
Our lives, our achievements and our environment are all currently
documented through written language. The rapid advance in the
technical means of recording, storing and playing back audio,
images and video can be used to support, supplement or even
replace the written documentation of human communication, for
example meetings. These new technologies can enable us to
capture information that would otherwise be lost, to lower the
cost of documentation and to enrich high-quality documents with
audiovisual material. Indexing such recordings is the core
technology for realizing this potential. This work presents
effective alternatives to keyword-based indices that restrict
the search space and can in part be computed with simple means.
Speech documents can be indexed at several levels:
stylistically, a document belongs to a particular database,
which can be determined automatically with high accuracy from
very simple features. This kind of classification can reduce the
search space by a factor on the order of 410. Applying topical
features for text classification to a news database results in
a reduction by a factor of 18. Since speech documents can be very
long, they have to be divided into topical segments. A new
probabilistic approach and new features (speaker initiative and
style) deliver results comparable to or better than traditional
keyword-based approaches. These topical segments can be
characterized by the predominant activity (storytelling,
discussing, planning, ...), which can be detected by a neural
network. Detection rates are limited, however, because humans
too determine these activities only imprecisely. A maximum
search space reduction by a factor of 6 is theoretically
possible on the data used. A topical classification of these
segments was also carried out on one database, but the
detection rates for this index are low.
At the level of individual utterances, dialogue acts such as
statements, questions, backchannels (aha, oh yes, really?, ...),
etc. can be recognized with a discriminatively trained hidden
Markov model. This procedure can be extended to recognize short
sequences such as question/answer games (dialogue games).
Dialogue acts and games can be used to build classifiers for
global speaking styles. Likewise, a user might remember a
particular dialogue act sequence and try to find it again in a
graphical representation.
In a study with very pessimistic assumptions, users were able to
identify one out of four similar and equally probable
conversations with an accuracy of ~ 43% using a graphical
representation of activity.
Dialogue acts could be equally useful in this scenario, but
because of the small amount of data the user study could not
give a conclusive answer. However, the study could not show any
effect for detailed basic features such as formality and
speaker identity.
Abstract
Written language is one of our primary means for documenting our
lives, achievements, and environment. Our capabilities to
record, store and retrieve audio, still pictures, and video are
undergoing a revolution and may support, supplement or even
replace written documentation. This technology enables us to
record information that would otherwise be lost, lower the cost
of documentation and enhance high-quality documents with
original audiovisual material.
The indexing of the audio material is the key technology to
realize those benefits. This work presents effective
alternatives to keyword based indices which restrict the search
space and may in part be calculated with very limited resources.
Indexing speech documents can be done at various levels:
Stylistically a document belongs to a certain database which can
be determined automatically with high accuracy using very simple
features. The resulting factor in search space reduction is in
the order of 410 while topic classification yielded a factor
of 18 in a news domain.
Since documents can be very long they need to be segmented into
topical regions. A new probabilistic segmentation framework as
well as new features (speaker initiative and style) prove to be
very effective compared to traditional keyword based methods. At
the topical segment level activities (storytelling, discussing,
planning, ...) can be detected using a machine learning approach
with limited accuracy; however even human annotators do not
annotate them very reliably. A maximum search space reduction
factor of 6 is theoretically possible on the databases used. A
topical classification of these regions has been attempted
on one database, the detection accuracy for that index, however,
was very low.
At the utterance level, dialogue acts such as statements,
questions, backchannels (aha, yeah, ...), etc. are
recognized using a novel discriminatively trained HMM procedure.
The procedure can be extended to recognize short sequences such
as question/answer pairs, so-called dialogue games.
Dialogue acts and games are useful for building classifiers for
speaking style. Similarly, a user may remember a certain dialogue
act sequence and may search for it in a graphical
representation.
In a study with very pessimistic assumptions, users were able to
pick one out of four similar and equiprobable meetings correctly
with an accuracy of ~ 43% using graphical activity information.
Dialogue acts may be useful in this situation as well, but the
sample size did not allow final conclusions to be drawn. However,
the user study failed to show any effect for detailed basic
features such as formality or speaker identity.
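The search space reduction factors quoted in this abstract can be illustrated with a small calculation. The sketch below is only one plausible reading, not the thesis's definition: it assumes the factor is the collection size divided by the expected number of documents left once a search is restricted to a correctly predicted class. The function name `reduction_factor` and the example class distributions are hypothetical.

```python
def reduction_factor(class_probs):
    """Expected search space reduction when a search is limited to one class.

    Assumption (not taken from the thesis): a target document falls in
    class i with probability p_i, the classifier is perfect, and the
    search then covers only the fraction p_i of the collection. The
    expected remaining fraction is sum(p_i ** 2); the reduction factor
    is its reciprocal.
    """
    total = sum(class_probs)
    probs = [p / total for p in class_probs]  # normalise to a distribution
    remaining = sum(p * p for p in probs)     # expected fraction still searched
    return 1.0 / remaining


# Six equally likely classes give the maximum factor for six classes.
uniform = reduction_factor([1, 1, 1, 1, 1, 1])

# A skewed distribution gives a much smaller factor for four classes.
skewed = reduction_factor([0.7, 0.1, 0.1, 0.1])

print(round(uniform, 3), round(skewed, 3))
```

Under this reading, a perfect index over k equally likely classes reduces the search space by a factor of k, and any imbalance between classes lowers the factor.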
Knowledge Extraction and Summarization for Textual Case-Based Reasoning: A Probabilistic Task Content Modeling Approach
Case-Based Reasoning (CBR) is an Artificial Intelligence (AI) technique that
has been successfully used for building knowledge systems for tasks/domains where different knowledge sources are easily available, particularly in the form of problem solving situations, known as cases. Cases generally display a clear
distinction between different components of problem solving, for instance, components of the problem description and of the problem solution. Thus, an existing and explicit structure of cases is presumed. However, when problem solving experiences are stored in the form of textual narratives (in natural language), there is no explicit case structure, so that CBR cannot be applied directly.
This thesis presents a novel approach for authoring cases from episodic textual
narratives and organizing these cases in a case base structure that permits
better support for user goals. The approach is based on the following fundamental ideas:
- CBR as a problem solving technique is goal-oriented and goals are realized by
means of task strategies.
- Tasks have an internal structure that can be represented in terms of
participating events and event components.
- Episodic textual narratives are not random containers of domain concept
terms. Rather, the text can be considered as generated by the underlying
task structure whose content it describes.
The presented case base authoring process combines task knowledge with Natural
Language Processing (NLP) techniques to perform the needed knowledge extraction
and summarization.
CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap
After addressing the state of the art during the first year of CHORUS and establishing the existing landscape in
multimedia search engines, we identified and analyzed gaps within the European research effort during our second year.
In this period we focused on three directions: technological issues, user-centred issues and use cases, and
socio-economic and legal aspects. These were assessed by two central studies: first, a concerted vision of the functional
breakdown of a generic multimedia search engine, and second, a representative set of use-case descriptions with a related
discussion of the requirements for technological challenges. Both studies were carried out in cooperation and consultation
with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our
Think-Tank, presentations at international conferences, and surveys addressed to EU project coordinators as well as
national initiative coordinators. Based on the feedback obtained, we identified two types of gaps: core
technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research
challenges but have an impact on innovation progress. New socio-economic trends are presented, as well as emerging legal
challenges.
A Probabilistic Framework for Information Modelling and Retrieval Based on User Annotations on Digital Objects
Annotations are a means to make critical remarks, to explain and
comment on things, to add notes and give opinions, and to relate objects.
Nowadays, they can be found in digital libraries and collaboratories,
for example as a building block for scientific discussion on the one
hand or as private notes on the other. We further find them in product
reviews, scientific databases and many "Web 2.0" applications; even
well-established concepts like emails can be regarded as annotations
in a certain sense. Digital annotations can be (textual) comments,
markings (i.e. highlighted parts) and references to other documents
or document parts. Since annotations convey information which is
potentially important to satisfy a user's information need, this
thesis tries to answer the question of how to exploit annotations for
information retrieval. It gives a first answer to the question of
whether retrieval effectiveness can be improved with annotations.
A survey of the "annotation universe" reveals some facets of
annotations; for example, they can be content level annotations
(extending the content of the annotation object) or meta level ones
(saying something about the annotated object). Besides the annotations
themselves, other objects created during the process of annotation can
be interesting for retrieval, these being the annotated fragments.
These objects are integrated into an object-oriented model comprising
digital objects such as structured documents and annotations as well
as fragments. In this model, the different relationships among the
various objects are reflected. From this model, the basic data
structure for annotation-based retrieval, the structured annotation
hypertext, is derived.
In order to thoroughly exploit the information contained in structured
annotation hypertexts, a probabilistic, object-oriented logical
framework called POLAR is introduced. In POLAR, structured annotation
hypertexts can be modelled by means of probabilistic propositions and
four-valued logics. POLAR allows for specifying several relationships
among annotations and annotated (sub)parts or fragments. Queries can
be posed to extract the knowledge contained in structured annotation
hypertexts. POLAR supports annotation-based retrieval, i.e. document
and discussion search, by applying an augmentation strategy (knowledge
augmentation, propagating propositions from subcontexts like annotations,
or relevance augmentation, where retrieval status values are propagated)
in conjunction with probabilistic inference, where P(d -> q), the probability
that a document d implies a query q, is estimated.
POLAR's semantics is based on possible worlds and accessibility
relations. It is implemented on top of four-valued probabilistic Datalog.
POLAR's core retrieval functionality, knowledge augmentation with
probabilistic inference, is evaluated for discussion and document
search. The experiments show that all relevant POLAR objects, merged
annotation targets, fragments and content annotations, are able to
increase retrieval effectiveness when used as a context for discussion
or document search. Additional experiments reveal that we can determine
the polarity of annotations with an accuracy of around 80%.
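The knowledge augmentation idea described in this abstract, propagating propositions from annotations into the annotated document before estimating P(d -> q), can be illustrated with a toy sketch. This is not POLAR's semantics or implementation, which the abstract says rest on four-valued probabilistic Datalog. The noisy-OR combination, the averaging estimate of P(d -> q), the `access_prob` parameter and all names and numbers below are assumptions made for illustration only.

```python
def augment(doc_terms, annotations, access_prob):
    """Propagate annotation term probabilities into the document context.

    Assumption: each annotation is reached with probability access_prob,
    and evidence for a term is combined with a noisy-OR.
    """
    augmented = dict(doc_terms)
    for ann_terms in annotations:
        for term, p in ann_terms.items():
            prior = augmented.get(term, 0.0)
            augmented[term] = 1.0 - (1.0 - prior) * (1.0 - access_prob * p)
    return augmented


def implication_prob(context_terms, query_terms):
    """Toy stand-in for P(d -> q): mean probability of the query terms."""
    if not query_terms:
        return 0.0
    return sum(context_terms.get(t, 0.0) for t in query_terms) / len(query_terms)


# Hypothetical document with one annotation attached to it.
doc = {"retrieval": 0.8}
notes = [{"annotation": 0.9, "retrieval": 0.5}]
query = ["retrieval", "annotation"]

before = implication_prob(doc, query)
after = implication_prob(augment(doc, notes, access_prob=0.5), query)
print(round(before, 2), round(after, 2))
```

In this toy setting the annotation contributes a query term the document itself lacks, so the augmented context scores higher than the document alone. That is the intuition behind using annotations as context for document search.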
Content And Multimedia Database Management Systems
A database management system is a general-purpose software system that facilitates the processes of defining, constructing, and manipulating databases for various applications. The main characteristic of the ‘database approach’ is that it increases the value of data by its emphasis on data independence. DBMSs, and in particular those based on the relational data model, have been very successful at the management of administrative data in the business domain. This thesis has investigated data management in multimedia digital libraries, and its implications for the design of database management systems. The main problem of multimedia data management is providing access to the stored objects. The content structure of administrative data is easily represented in alphanumeric values. Thus, database technology has primarily focused on handling the objects’ logical structure. In the case of multimedia data, however, representation of content is far from trivial and is not supported by current database management systems.
- …