85,242 research outputs found
TopSig: Topology Preserving Document Signatures
Performance comparisons between File Signatures and Inverted Files for text
retrieval have previously shown several significant shortcomings of file
signatures relative to inverted files. The inverted file approach underpins
most state-of-the-art search engine algorithms, such as Language and
Probabilistic models. It has been widely accepted that traditional file
signatures are inferior alternatives to inverted files. This paper describes
TopSig, a new approach to the construction of file signatures. Many advances in
semantic hashing and dimensionality reduction have been made in recent times,
but these were not so far linked to general purpose, signature file based,
search engines. This paper introduces a different signature file approach that
builds upon and extends these recent advances. We are able to demonstrate
significant improvements in the performance of signature file based indexing
and retrieval, performance that is comparable to that of state of the art
inverted file based systems, including Language models and BM25. These findings
suggest that file signatures offer a viable alternative to inverted files in
suitable settings and from the theoretical perspective it positions the file
signatures model in the class of Vector Space retrieval models.Comment: 12 pages, 8 figures, CIKM 201
Automatic semantic video annotation in wide domain videos based on similarity and commonsense knowledgebases
In this paper, we introduce a novel framework for automatic Semantic Video Annotation. As this framework detects possible events occurring in video clips, it forms the annotating base of video search engine. To achieve this purpose, the system has to able to operate on uncontrolled wide-domain videos. Thus, all layers have to be based on generic features.
This framework aims to bridge the "semantic gap", which is the difference between the low-level visual features and the human's perception, by finding videos with similar visual events, then analyzing their free text annotation to find a common area then to decide the best description for this new video using commonsense knowledgebases.
Experiments were performed on wide-domain video clips from the TRECVID 2005 BBC rush standard database. Results from these experiments show promising integrity between those two layers in order to find expressing annotations for the input video. These results were evaluated based on retrieval performance
Optical Music Recognition with Convolutional Sequence-to-Sequence Models
Optical Music Recognition (OMR) is an important technology within Music
Information Retrieval. Deep learning models show promising results on OMR
tasks, but symbol-level annotated data sets of sufficient size to train such
models are not available and difficult to develop. We present a deep learning
architecture called a Convolutional Sequence-to-Sequence model to both move
towards an end-to-end trainable OMR pipeline, and apply a learning process that
trains on full sentences of sheet music instead of individually labeled
symbols. The model is trained and evaluated on a human generated data set, with
various image augmentations based on real-world scenarios. This data set is the
first publicly available set in OMR research with sufficient size to train and
evaluate deep learning models. With the introduced augmentations a pitch
recognition accuracy of 81% and a duration accuracy of 94% is achieved,
resulting in a note level accuracy of 80%. Finally, the model is compared to
commercially available methods, showing a large improvements over these
applications.Comment: ISMIR 201
Curating E-Mails; A life-cycle approach to the management and preservation of e-mail messages
E-mail forms the backbone of communications in many modern institutions and organisations and is a valuable type of organisational, cultural, and historical record. Successful management and preservation of valuable e-mail messages and collections is therefore vital if organisational accountability is to be achieved and historical or cultural memory retained for the future. This requires attention by all stakeholders across the entire life-cycle of the e-mail records.
This instalment of the Digital Curation Manual reports on the several issues involved in managing and curating e-mail messages for both current and future use. Although there is no 'one-size-fits-all' solution, this instalment outlines a generic framework for e-mail curation and preservation, provides a summary of current approaches, and addresses the technical, organisational and cultural challenges to successful e-mail management and longer-term curation.
- …