7,785 research outputs found
Organizing the Internet
This paper examines XML and its relationships with SGML (Standardized General Markup Language) and HTML (HyperText Markup Language). It examines the importance of metatags and the XML Document Type Definition (DTD) and proposed alternatives. It looks at the differences between the two types of XML data: âvalidâ and âwell-formedâ documents
Semantic web-based document: editing and browsing in AktiveDoc
This paper presents a tool for supporting sharing and reuse of knowledge in document creation (writing) and use (reading). Semantic Web technologies are used to support the production of ontology based annotations while the document is written. Free text annotations (comments) can be added to integrate the knowledge in the document. In addition the tool uses external services (e.g. a Semantic Web harvester) to propose relevant content to writing
user, enabling easy knowledge reuse. Similar facilities are provided for readers when their task does not coincide with the authorâs one. The tool is specifically designed for Knowledge Management in organisations. In this paper we present and discuss how Semantic Web technologies are designed and integrated in the system
Examining the contributions of automatic speech transcriptions and metadata sources for searching spontaneous conversational speech
The searching spontaneous speech can be enhanced by combining automatic speech transcriptions with semantically
related metadata. An important question is what can be expected from search of such transcriptions and different
sources of related metadata in terms of retrieval effectiveness. The Cross-Language Speech Retrieval (CL-SR) track at recent CLEF workshops provides a spontaneous speech
test collection with manual and automatically derived metadata fields. Using this collection we investigate the comparative search effectiveness of individual fields comprising automated transcriptions and the available metadata. A further important question is how transcriptions and metadata should be combined for the greatest benefit to search accuracy. We compare simple field merging of individual fields with the extended BM25 model for weighted field combination (BM25F). Results indicate that BM25F can produce improved search accuracy, but that it is currently important to set its parameters suitably using a suitable training set
SEMA4A: An ontology for emergency notification systems accessibility
This is the post-print version of the final paper published in Expert Systems with Applications. The published article is available from the link below. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Copyright @ 2009 Elsevier B.V.Providing alert communication in emergency situations is vital to reduce the number of victims. Reaching this goal is challenging due to usersâ diversity: people with disabilities, elderly and children, and other vulnerable groups. Notifications are critical when an emergency scenario is going to happen (e.g. a typhoon approaching) so the ability to transmit notifications to different kind of users is a crucial feature for such systems. In this work an ontology was developed by investigating different sources: accessibility guidelines, emergency response systems, communication devices and technologies, taking into account the different abilities of people to react to different alarms (e.g. mobile phone vibration as an alarm for deafblind people). We think that the proposed ontology addresses the information needs for sharing and integrating emergency notification messages over distinct emergency response information systems providing accessibility under different conditions and for different kind of users.Ministerio de EducaciĂłn y Cienci
Language-based multimedia information retrieval
This paper describes various methods and approaches for language-based multimedia information retrieval, which have been developed in the projects POP-EYE and OLIVE and which will be developed further in the MUMIS project. All of these project aim at supporting automated indexing of video material by use of human language technologies. Thus, in contrast to image or sound-based retrieval methods, where both the query language and the indexing methods build on non-linguistic data, these methods attempt to exploit advanced text retrieval technologies for the retrieval of non-textual material. While POP-EYE was building on subtitles or captions as the prime language key for disclosing video fragments, OLIVE is making use of speech recognition to automatically derive transcriptions of the sound tracks, generating time-coded linguistic elements which then serve as the basis for text-based retrieval functionality
Content-Aware DataGuides for Indexing Large Collections of XML Documents
XML is well-suited for modelling structured data with
textual content. However, most indexing approaches perform
structure and content matching independently, combining
the retrieved path and keyword occurrences in a third
step. This paper shows that retrieval in XML documents can
be accelerated significantly by processing text and structure
simultaneously during all retrieval phases. To this end,
the Content-Aware DataGuide (CADG) enhances the wellknown
DataGuide with (1) simultaneous keyword and path
matching and (2) a precomputed content/structure join. Extensive
experiments prove the CADG to be 50-90% faster
than the DataGuide for various sorts of query and document,
including difficult cases such as poorly structured
queries and recursive document paths. A new query classification
scheme identifies precise query characteristics with
a predominant influence on the performance of the individual
indices. The experiments show that the CADG is applicable
to many real-world applications, in particular large
collections of heterogeneously structured XML documents
Firsthand Opiates Abuse on Social Media: Monitoring Geospatial Patterns of Interest Through a Digital Cohort
In the last decade drug overdose deaths reached staggering proportions in the
US. Besides the raw yearly deaths count that is worrisome per se, an alarming
picture comes from the steep acceleration of such rate that increased by 21%
from 2015 to 2016. While traditional public health surveillance suffers from
its own biases and limitations, digital epidemiology offers a new lens to
extract signals from Web and Social Media that might be complementary to
official statistics. In this paper we present a computational approach to
identify a digital cohort that might provide an updated and complementary view
on the opioid crisis. We introduce an information retrieval algorithm suitable
to identify relevant subspaces of discussion on social media, for mining data
from users showing explicit interest in discussions about opioid consumption in
Reddit. Moreover, despite the pseudonymous nature of the user base, almost 1.5
million users were geolocated at the US state level, resembling the census
population distribution with a good agreement. A measure of prevalence of
interest in opiate consumption has been estimated at the state level, producing
a novel indicator with information that is not entirely encoded in the standard
surveillance. Finally, we further provide a domain specific vocabulary
containing informal lexicon and street nomenclature extracted by user-generated
content that can be used by researchers and practitioners to implement novel
digital public health surveillance methodologies for supporting policy makers
in fighting the opioid epidemic.Comment: Proceedings of the 2019 World Wide Web Conference (WWW '19
Use of Subimages in Fish Species Identification: A Qualitative Study
Many scholarly tasks involve working with subdocuments, or contextualized fine-grain information, i.e., with information that is part of some larger unit. A digital library (DL) facil- itates management, access, retrieval, and use of collections of data and metadata through services. However, most DLs do not provide infrastructure or services to support working with subdocuments. Superimposed information (SI) refers to new information that is created to reference subdocu- ments in existing information resources. We combine this idea of SI with traditional DL services, to define and develop a DL with SI (SI-DL). We explored the use of subimages and evaluated the use of a prototype SI-DL (SuperIDR) in fish species identification, a scholarly task that involves work- ing with subimages. The contexts and strategies of working with subimages in SuperIDR suggest new and enhanced sup- port (SI-DL services) for scholarly tasks that involve working with subimages, including new ways of querying and search- ing for subimages and associated information. The main contribution of our work are the insights gained from these findings of use of subimages and of SuperIDR (a prototype SI-DL), which lead to recommendations for the design of digital libraries with superimposed information
- âŠ