1,370 research outputs found
Machine Learning in Automated Text Categorization
The automated categorization (or classification) of texts into predefined
categories has witnessed a booming interest in the last ten years, due to the
increased availability of documents in digital form and the ensuing need to
organize them. In the research community the dominant approach to this problem
is based on machine learning techniques: a general inductive process
automatically builds a classifier by learning, from a set of preclassified
documents, the characteristics of the categories. The advantages of this
approach over the knowledge engineering approach (consisting in the manual
definition of a classifier by domain experts) are a very good effectiveness,
considerable savings in terms of expert manpower, and straightforward
portability to different domains. This survey discusses the main approaches to
text categorization that fall within the machine learning paradigm. We will
discuss in detail issues pertaining to three different problems, namely
document representation, classifier construction, and classifier evaluation.Comment: Accepted for publication on ACM Computing Survey
CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap
After addressing the state-of-the-art during the first year of Chorus and establishing the existing landscape in
multimedia search engines, we have identified and analyzed gaps within European research effort during our second year.
In this period we focused on three directions, notably technological issues, user-centred issues and use-cases and socio-
economic and legal aspects. These were assessed by two central studies: firstly, a concerted vision of functional breakdown
of generic multimedia search engine, and secondly, a representative use-cases descriptions with the related discussion on
requirement for technological challenges. Both studies have been carried out in cooperation and consultation with the
community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our
Think-Tank, presentations in international conferences, and surveys addressed to EU projects coordinators as well as
National initiatives coordinators. Based on the obtained feedback we identified two types of gaps, namely core
technological gaps that involve research challenges, and βenablersβ, which are not necessarily technical research
challenges, but have impact on innovation progress. New socio-economic trends are presented as well as emerging legal
challenges
Digital Image Access & Retrieval
The 33th Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March of 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. Papers covered three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation with the bulk of the conference focusing on indexing and retrieval.published or submitted for publicatio
A pilot investigation of Information Extraction in the semantic annotation of archaeological reports
The paper discusses a prototype investigation of semantic annotation, a form of metadata assigning conceptual entities to textual instances; in the case of archaeological grey literature. The use of Information Extraction (IE), a Natural Language Processing (NLP) technique, is central to the annotation process while the use of Knowledge Organization System (KOS) is explored for the association of semantic annotation with both ontological and terminological references. The annotation process follows a rule-based information extraction approach using the GATE NLP toolkit, together with the CIDOC CRM ontology, its CRM-EH archaeological extension and English Heritage thesauri and glossaries. Results are reported from an initial evaluation, which suggest that these information extraction techniques can be applied to archaeological grey literature reports. Further work is discussed drawing on the evaluation and consideration of the characteristics of the archaeology domain. Copyright Β© 2012 Inderscience Enterprises Ltd
Collaborative recommendations with content-based filters for cultural activities via a scalable event distribution platform
Nowadays, most people have limited leisure time and the offer of (cultural) activities to spend this time is enormous. Consequently, picking the most appropriate events becomes increasingly difficult for end-users. This complexity of choice reinforces the necessity of filtering systems that assist users in finding and selecting relevant events. Whereas traditional filtering tools enable e.g. the use of keyword-based or filtered searches, innovative recommender systems draw on user ratings, preferences, and metadata describing the events. Existing collaborative recommendation techniques, developed for suggesting web-shop products or audio-visual content, have difficulties with sparse rating data and can not cope at all with event-specific restrictions like availability, time, and location. Moreover, aggregating, enriching, and distributing these events are additional requisites for an optimal communication channel. In this paper, we propose a highly-scalable event recommendation platform which considers event-specific characteristics. Personal suggestions are generated by an advanced collaborative filtering algorithm, which is more robust on sparse data by extending user profiles with presumable future consumptions. The events, which are described using an RDF/OWL representation of the EventsML-G2 standard, are categorized and enriched via smart indexing and open linked data sets. This metadata model enables additional content-based filters, which consider event-specific characteristics, on the recommendation list. The integration of these different functionalities is realized by a scalable and extendable bus architecture. Finally, focus group conversations were organized with external experts, cultural mediators, and potential end-users to evaluate the event distribution platform and investigate the possible added value of recommendations for cultural participation
Dagstuhl News January - December 1999
"Dagstuhl News" is a publication edited especially for the members of the Foundation "Informatikzentrum Schloss Dagstuhl" to thank them for their support. The News give a summary of the scientific work being done in Dagstuhl. Each Dagstuhl Seminar is presented by a small abstract describing the contents and scientific highlights of the seminar as well as the perspectives or challenges of the research topic
Automatic object classification for surveillance videos.
PhDThe recent popularity of surveillance video systems, specially located in urban
scenarios, demands the development of visual techniques for monitoring purposes.
A primary step towards intelligent surveillance video systems consists on automatic
object classification, which still remains an open research problem and the keystone
for the development of more specific applications.
Typically, object representation is based on the inherent visual features. However,
psychological studies have demonstrated that human beings can routinely categorise
objects according to their behaviour. The existing gap in the understanding
between the features automatically extracted by a computer, such as appearance-based
features, and the concepts unconsciously perceived by human beings but
unattainable for machines, or the behaviour features, is most commonly known
as semantic gap. Consequently, this thesis proposes to narrow the semantic gap
and bring together machine and human understanding towards object classification.
Thus, a Surveillance Media Management is proposed to automatically detect and
classify objects by analysing the physical properties inherent in their appearance
(machine understanding) and the behaviour patterns which require a higher level of
understanding (human understanding). Finally, a probabilistic multimodal fusion
algorithm bridges the gap performing an automatic classification considering both
machine and human understanding.
The performance of the proposed Surveillance Media Management framework
has been thoroughly evaluated on outdoor surveillance datasets. The experiments
conducted demonstrated that the combination of machine and human understanding
substantially enhanced the object classification performance. Finally, the inclusion
of human reasoning and understanding provides the essential information to bridge
the semantic gap towards smart surveillance video systems
A Web-based multimedia collaboratory. Empirical work studies in film archives
This report represents the latest study in the activity on Ecological Information Systems conducted in the Center for Human Machine Interaction situated at Ris National Laboratory and the University of Aarhus. The purpose of this activity is to give a description of the characteristics of work domains that will serve to outline the general context of concern to design of collaboratories. In addition, a set of preliminary implications for the design of a collaboratory are derived from the cognitive work analysis. To anticipate, further research on this approach to the design of collaboratories will show how the preceding analysis is likely to lead to a novel theoretical framework, called Ecological Collaborative Information Systems (ECIS), required for the design of collaboratories. The intention is to illustrate how the general principles of ECIS can be instantiated to develop a concrete design product: A crossdisciplinary and cross-cultural collaboratory to support customer service and professional research in archives. A web based Collaboratory Numerous valuable historic and cultural films and their sources are scattered in various national archives. Knowledge and usage of the multinational film material are severely impeded by access problems. To fully exploit the cultural film heritage internationally, a high degree of cross-disciplinary and international collaboration among professionals working with the film media is required. The Collaboratory for Annotation, Indexing and Retrieval of Digitized Historical Archive Material (Collate) is intended to foster and support collaboration on research, cultural mediation and preservation of films through a distributed multimedia repository. The collaboratory will provide webbased tools and interfac..
Interim research assessment 2003-2005 - Computer Science
This report primarily serves as a source of information for the 2007 Interim Research Assessment Committee for Computer Science at the three technical universities in the Netherlands. The report also provides information for others interested in our research activities
Recommended from our members
Fast embedding for image classification & retrieval and its application to the hostel industry
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University LondonContent-based image classification and retrieval are the automatic processes of taking
an unseen image input and extracting its features representing the input image. Then,
for the classification task, this mathematically measured input is categorized according
to established criteria in the server and consequently shows the output as a result. On
the other hand, for the retrieval task, the extracted features of an unseen query image
are sent to the server to search for the most visually similar images to a given image
and retrieve these images as a result. Despite image features could be represented
by classical features, artificial intelligence-based features, Convolutional Neural
Networks (CNN) to be precise, have become powerful tools in the field. Nonetheless,
the high dimensional CNN features have been a challenge in particular for applications
on mobile or Internet of Things devices. Therefore, in this thesis, several fast
embeddings are explored and proposed to overcome the constraints of low memory,
bandwidth, and power. Furthermore, the first hostel image database is created with
three datasets, hostel image dataset containing 13,908 interior and exterior images of
hostels across the world, and Hostels-900 dataset and Hostels-2K dataset containing
972 images and 2,380 images, respectively, of 20 London hostel buildings. The results
demonstrate that the proposed fast embeddings such as the application of GHM-Rand
operator, GHM-Fix operator, and binary feature vectors are able to outperform or give
competitive results to those state-of-the-art methods with a lot less computational
resource. Additionally, the findings from a ten-year literature review of CBIR study in
the tourism industry could picturize the relevant research activities in the past decade
which are not only beneficial to the hostel industry or tourism sector but also to the
computer science and engineering research communities for the potential real-life
applications of the existing and developing technologies in the field
- β¦