
    Robust short clip representation and fast search through large video collections

    Master's thesis (Master of Engineering)

    Data Management for Dynamic Multimedia Analytics and Retrieval

    Multimedia data in its various manifestations poses a unique challenge from a data storage and data management perspective, especially when search, analysis, and analytics over large data corpora are considered. The inherently unstructured nature of the data itself, and the curse of dimensionality that afflicts the representations we typically work with in its stead, cause a broad range of issues that require sophisticated solutions at different levels. This has given rise to a large body of research focused on techniques for effective and efficient multimedia search and exploration, many of which have led to purpose-built multimedia search systems. However, recent progress in multimedia analytics and interactive multimedia retrieval has demonstrated that several of the assumptions usually made for such multimedia search workloads do not hold once a session has a human user in the loop. Firstly, many of the required query operations cannot be expressed by mere similarity search, and since the concrete requirements cannot always be anticipated, one needs a flexible and adaptable data management and query framework. Secondly, the widespread assumption that data collections are static does not hold for analytics workloads, whose purpose is to produce and store new insights and information. And finally, it is impossible even for an expert user to specify exactly how a data management system should arrive at the desired outcomes of the potentially many different queries. Guided by these shortcomings, and motivated by the fact that similar questions have already been answered for structured data in classical database research, this thesis presents three contributions that seek to mitigate the aforementioned issues. We present a query model that generalises the notion of proximity-based query operations and formalises the connection between those queries and high-dimensional indexing. We complement this with a cost model that makes the often implicit trade-off between query execution speed and result quality transparent to the system and the user. And we describe a model for the transactional and durable maintenance of high-dimensional index structures. All contributions are implemented in the open-source multimedia database system Cottontail DB, on top of which we present an evaluation that demonstrates the effectiveness of the proposed models. We conclude by discussing avenues for future research in the quest to converge the fields of databases on the one hand and (interactive) multimedia retrieval and analytics on the other.
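    As a minimal illustration of the kind of proximity-based query and speed/quality trade-off the abstract refers to, the following Python sketch answers a k-nearest-neighbour query either exactly or over a random sample of the collection. All names and the sampling knob are illustrative assumptions, not Cottontail DB's actual API.

```python
# Hypothetical sketch: a kNN proximity query with an explicit speed/quality
# trade-off. Scanning only a sampled fraction of the collection is cheaper
# but may miss true neighbours -- the trade-off a cost model would expose.
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.normal(size=(100_000, 128)).astype(np.float32)  # feature corpus

def knn(query: np.ndarray, k: int = 10, fraction: float = 1.0) -> np.ndarray:
    """fraction=1.0 scans everything (exact); fraction<1.0 is approximate."""
    n = vectors.shape[0]
    if fraction >= 1.0:
        idx = np.arange(n)
    else:
        idx = rng.choice(n, size=int(n * fraction), replace=False)
    dists = np.linalg.norm(vectors[idx] - query, axis=1)  # Euclidean proximity
    return idx[np.argsort(dists)[:k]]                     # ids of k nearest

q = rng.normal(size=128).astype(np.float32)
exact = knn(q, k=10, fraction=1.0)   # slower, exact result
approx = knn(q, k=10, fraction=0.1)  # ~10x cheaper, possibly lower recall
```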

    Signal processing methods for beat tracking, music segmentation, and audio retrieval

    The goal of music information retrieval (MIR) is to develop novel strategies and techniques for organizing, exploring, accessing, and understanding music data in an efficient manner. The conversion of waveform-based audio data into semantically meaningful feature representations by means of digital signal processing lies at the center of MIR and constitutes a difficult field of research because of the complexity and diversity of music signals. In this thesis, we introduce novel signal processing methods for extracting musically meaningful information from audio signals. As our main strategy, we exploit musical knowledge about the signals' properties to derive feature representations that show a significant degree of robustness against musical variations while still exhibiting high musical expressiveness. We apply this general strategy to three different areas of MIR. Firstly, we introduce novel techniques for extracting tempo and beat information, where we particularly consider challenging music with changing tempo and soft note onsets. Secondly, we present novel algorithms for the automated segmentation and analysis of folk song field recordings, where one has to cope with significant fluctuations in intonation and tempo as well as recording artifacts. Thirdly, we explore a cross-version approach to content-based music retrieval based on the query-by-example paradigm. In all three areas, we focus on application scenarios where strong musical variations make the extraction of musically meaningful information a challenging task.
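    For readers unfamiliar with what tempo and beat extraction produces, here is a short illustration using the off-the-shelf librosa library; it is not the thesis's own algorithm, and "recording.wav" is a placeholder path.

```python
# Illustrative only: estimate a global tempo and beat positions with librosa.
import librosa

y, sr = librosa.load("recording.wav")                     # waveform + sample rate
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)  # tempo (BPM) + beat frames
beat_times = librosa.frames_to_time(beat_frames, sr=sr)   # frames -> seconds
print(f"estimated tempo: {float(tempo):.1f} BPM")
print("first beat times (s):", beat_times[:8])
```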

    Joint Image-Text Hashing for Fast Large-Scale Cross-Media Retrieval Using Self-Supervised Deep Learning

    Recent years have witnessed the promise of hashing in industrial applications for fast similarity retrieval. In this paper, we propose a novel supervised hashing method for large-scale cross-media search, termed Self-Supervised Deep Multimodal Hashing (SSDMH), which learns unified hash codes as well as deep hash functions for different modalities in a self-supervised manner. With the proposed regularized binary latent model, unified binary codes can be solved for directly, without a relaxation strategy, while the graph regularization term retains the neighborhood structure. Moreover, we propose a new discrete optimization solution, termed Binary Gradient Descent, which aims at improving optimization efficiency towards real-time operation. Extensive experiments on three benchmark datasets demonstrate the superiority of SSDMH over state-of-the-art cross-media hashing approaches.
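    The retrieval stage common to binary hashing methods such as SSDMH can be sketched in a few lines: items are stored as packed binary codes and ranked by Hamming distance to the query code. The code length and random data below are illustrative assumptions, not details of SSDMH itself.

```python
# Hamming-distance ranking over packed binary hash codes (illustrative).
import numpy as np

rng = np.random.default_rng(0)
n_bits = 64
db_bits = rng.integers(0, 2, size=(100_000, n_bits), dtype=np.uint8)
db_codes = np.packbits(db_bits, axis=1)              # 8 bytes per item

def hamming_search(query_bits: np.ndarray, k: int = 10) -> np.ndarray:
    q_code = np.packbits(query_bits)
    xor = np.bitwise_xor(db_codes, q_code)           # bits where codes differ
    dists = np.unpackbits(xor, axis=1).sum(axis=1)   # popcount = Hamming distance
    return np.argsort(dists)[:k]                     # indices of k nearest items

query = rng.integers(0, 2, size=n_bits, dtype=np.uint8)
top_k = hamming_search(query)
```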

    Thirty Years of Machine Learning: The Road to Pareto-Optimal Wireless Networks

    Future wireless networks have substantial potential to support a broad range of complex, compelling applications in both military and civilian fields, where users can enjoy high-rate, low-latency, low-cost, and reliable information services. Achieving this ambitious goal requires new radio techniques for adaptive learning and intelligent decision making, owing to the complex, heterogeneous nature of the network structures and wireless services. Machine learning (ML) algorithms have achieved great success in supporting big data analytics, efficient parameter estimation, and interactive decision making. Hence, in this article, we review the thirty-year history of ML by elaborating on supervised learning, unsupervised learning, reinforcement learning, and deep learning. Furthermore, we investigate their employment in compelling applications of wireless networks, including heterogeneous networks (HetNets), cognitive radio (CR), the Internet of Things (IoT), machine-to-machine (M2M) networks, and so on. This article aims to assist readers in clarifying the motivation and methodology of the various ML algorithms, so as to invoke them for hitherto unexplored services and scenarios in future wireless networks.

    Comment: 46 pages, 22 figures

    A Tale of Two Approaches: Comparing Top-Down and Bottom-Up Strategies for Analyzing and Visualizing High-Dimensional Data

    The proliferation of high-throughput and sensory technologies in various fields has led to a considerable increase in data volume, complexity, and diversity. Traditional data storage, analysis, and visualization methods are struggling to keep pace with the growth of modern data sets, necessitating innovative approaches to overcome the challenges of managing, analyzing, and visualizing data across various disciplines. One such approach is utilizing novel storage media, such as deoxyribonucleic acid (DNA), which presents an efficient, stable, compact, and energy-saving storage option. Researchers are exploring the potential use of DNA as a medium for long-term storage of significant cultural and scientific materials. In addition to novel storage media, scientists are also focusing on developing new techniques that can integrate multiple data modalities and leverage machine learning algorithms to identify complex relationships and patterns in vast data sets. These newly developed data management and analysis approaches have the potential to unlock previously unknown insights into various phenomena and to facilitate more effective translation of basic research findings to practical and clinical applications. Addressing these challenges requires different problem-solving approaches and viewpoints. Top-down and bottom-up approaches are essential techniques that offer valuable perspectives for managing, analyzing, and visualizing complex, high-dimensional, multi-modal data sets. This cumulative dissertation explores the challenges associated with handling such data and highlights top-down, bottom-up, and integrated approaches being developed to manage, analyze, and visualize it. The work is organized in two parts, each reflecting one of the two problem-solving approaches and its use in published studies. The proposed work showcases the importance of understanding both approaches, the steps of reasoning about a problem within them, and their concretization and application in various domains.

    A Machine Learning Approach to Indoor Localization Data Mining

    Indoor positioning systems are increasingly commonplace in various environments and produce large quantities of data. They are used in industrial applications, robotics, and asset and employee tracking, to name a few use cases. The growing amount of data and the accelerating progress of machine learning open up many new possibilities for analyzing this data in ways that were not conceivable or relevant before. This thesis introduces connected concepts and implementations to answer the question of how this data can be utilized. The data gathered in this thesis originates from an indoor positioning system deployed in a retail environment, but the discussed methods apply generally. We approach the issue by first introducing the concepts of machine learning and, more generally, artificial intelligence, and how they work at a general level. A deeper dive is taken into the subfields and algorithms relevant to the data mining task at hand. Indoor positioning system basics are also briefly discussed to establish a base understanding of the realistic capabilities and constraints that these kinds of systems entail. These methods and prior knowledge from the literature are then put to the test with the freshly gathered data. An algorithm based on an existing example from the literature was tested and improved upon with the new data. A novel method to cluster and classify movement patterns was introduced, utilizing deep learning to create embedded representations of the trajectories in a more complex learning pipeline; this type of learning is often referred to as deep clustering. The results are promising: both methods produce useful high-level representations of the complex dataset that can help a human operator discern the relevant patterns in raw data and that can serve as input for subsequent supervised and unsupervised learning steps. Several factors related to optimizing the learning pipeline, such as regularization, were also investigated, and the results are presented as visualizations. The research found that a pipeline consisting of a CNN autoencoder followed by a classic clustering algorithm such as DBSCAN produces useful results in the form of trajectory clusters, and that regularization such as an L1 penalty improves this performance. The work presented in this thesis provides useful algorithms for processing raw, noisy localization data from indoor environments that can be used for further implementations in both industrial applications and academia.
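    A minimal sketch of such a deep-clustering pipeline is given below: a small 1D-CNN autoencoder embeds fixed-length (x, y) trajectories, and DBSCAN clusters the latent codes. The architecture, the L1 term, and all parameters are illustrative assumptions rather than the thesis's exact pipeline.

```python
import torch
import torch.nn as nn
from sklearn.cluster import DBSCAN

class TrajectoryAE(nn.Module):
    """Tiny 1D-CNN autoencoder for (x, y) trajectories of 128 time steps."""
    def __init__(self, steps=128, latent=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(2, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv1d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * (steps // 4), latent),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent, 32 * (steps // 4)), nn.ReLU(),
            nn.Unflatten(1, (32, steps // 4)),
            nn.ConvTranspose1d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose1d(16, 2, 4, stride=2, padding=1),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

trajectories = torch.randn(512, 2, 128)    # stand-in for real trajectory data
model = TrajectoryAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(50):                        # reconstruction training
    recon, z = model(trajectories)
    loss = nn.functional.mse_loss(recon, trajectories)
    loss = loss + 1e-4 * z.abs().mean()    # L1 penalty on the embedding
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():                      # embed, then cluster the embeddings
    _, z = model(trajectories)
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(z.numpy())  # -1 = noise
```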

    Improving Data Management and Data Movement Efficiency in Hybrid Storage Systems

    University of Minnesota Ph.D. dissertation. July 2017. Major: Computer Science. Advisor: David Du. 1 computer file (PDF); ix, 116 pages.

    In the big data era, the large volumes of data being continuously generated drive the emergence of high-performance, large-capacity storage systems. To reduce the total cost of ownership, storage systems are built in a more composite way from many different types of emerging storage technologies/devices, including Storage Class Memory (SCM), Solid State Drives (SSD), Shingled Magnetic Recording (SMR), Hard Disk Drives (HDD), and even off-premise cloud storage. To make better use of each type of storage, industry provides multi-tier storage that dynamically places hot data in the faster tiers and cold data in the slower tiers. Data movement happens both within a single device and between devices connected via various networks. Toward improving data management and data movement efficiency in such hybrid storage systems, this work makes the following contributions. First, to bridge the giant semantic gap between applications and modern storage systems, passing a piece of tiny but useful information (I/O access hints) from upper layers to the block storage layer can greatly improve application performance or ease data management in heterogeneous storage systems. We present and develop a generic and flexible framework, called HintStor, to execute and evaluate various I/O access hints on heterogeneous storage systems with minor modifications to the kernel and applications. The design of HintStor contains a new application/user-level interface, a file system plugin, and a block storage data manager. With HintStor, storage systems composed of various storage devices can perform pre-devised data placement, space reallocation, and data migration policies assisted by the added access hints. Second, each storage device/technology has its own unique price-performance trade-offs and idiosyncrasies with respect to the workload characteristics it prefers to support. To explore the internal access patterns and thus efficiently place data on storage systems with fully connected differential pools (data can move from one device to any other device instead of moving tier by tier; each pool consists of storage devices of a particular type), we propose a chunk-level, storage-aware workload analyzer framework, abbreviated ChewAnalyzer. With ChewAnalyzer, the storage manager can adequately distribute and move data chunks across different storage pools. Third, to reduce the duplicate content transferred between local storage devices and devices in remote data centers, an inline Network Redundancy Elimination (NRE) process with a Content-Defined Chunking (CDC) policy can achieve a higher Redundancy Elimination (RE) ratio, but may suffer from considerably higher computational requirements than fixed-size chunking. We build an inline NRE appliance which incorporates an improved FPGA-based scheme to speed up CDC processing. To efficiently utilize the hardware resources, the whole NRE process is handled by a Virtualized NRE (VNRE) controller. The uniqueness of this VNRE lies in its ability to exploit the redundancy patterns of different TCP flows and customize the chunking process to achieve a higher RE ratio.
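    The Content-Defined Chunking step at the heart of such an NRE pipeline can be sketched in Python with a Gear-style rolling hash: a chunk boundary is declared wherever the hash matches a mask, so identical content yields identical chunks regardless of byte shifts. The mask, size bounds, and placeholder file are illustrative assumptions, not the FPGA design described above.

```python
# Gear-style content-defined chunking plus fingerprint-based deduplication.
import hashlib
import random

random.seed(0)
GEAR = [random.getrandbits(64) for _ in range(256)]   # fixed per-byte table
MASK = (1 << 13) - 1                                  # ~8 KiB average chunk
MIN_SIZE, MAX_SIZE = 2 * 1024, 64 * 1024

def chunks(data: bytes):
    start, h = 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFFFFFFFFFF  # rolling hash
        size = i - start + 1
        if (size >= MIN_SIZE and (h & MASK) == 0) or size >= MAX_SIZE:
            yield data[start:i + 1]                       # boundary found
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]                                # trailing chunk

seen = set()
for chunk in chunks(open("payload.bin", "rb").read()):    # placeholder file
    fingerprint = hashlib.sha256(chunk).digest()
    if fingerprint not in seen:
        seen.add(fingerprint)  # only previously unseen chunks would be sent
```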

    Technologies for Reusing Text from the Web

    Texts from the web can be reused individually or in large quantities. The former is called text reuse and the latter language reuse. We first present a comprehensive overview of the different ways in which text and language are reused today, and how exactly information retrieval technologies can be applied in this respect. The remainder of the thesis then deals with specific retrieval tasks. In general, our contributions consist of models and algorithms, their evaluation, and, for that purpose, large-scale corpus construction. The thesis divides into two parts. The first part introduces technologies for text reuse detection, and our contributions are as follows: (1) A unified view of projection-based and embedding-based fingerprinting for near-duplicate detection, and the first evaluation of fingerprint algorithms on Wikipedia revision histories as a new, large-scale corpus of near-duplicates. (2) A new retrieval model for the quantification of cross-language text similarity, which gets by without parallel corpora. We have evaluated the model against other models on many different pairs of languages. (3) An evaluation framework for text reuse and particularly plagiarism detectors, which consists of tailored detection performance measures and a large-scale corpus of automatically generated and manually written plagiarism cases, the latter obtained via crowdsourcing. This framework has been successfully applied to evaluate many different state-of-the-art plagiarism detection approaches within three international evaluation competitions. The second part introduces technologies that solve three retrieval tasks based on language reuse, and our contributions are as follows: (4) A new model for the comparison of textual and non-textual web items across media, which exploits web comments as a source of information about the topic of an item. In this connection, we identify web comments as a largely neglected information source and introduce the rationale of comment retrieval. (5) Two new algorithms for query segmentation, which exploit web n-grams and Wikipedia as a means of discerning the user intent behind a keyword query. Moreover, we crowdsource a new corpus for the evaluation of query segmentation which surpasses existing corpora by two orders of magnitude. (6) A new writing assistance tool called Netspeak, which is a search engine for commonly used language. Netspeak indexes the web in the form of web n-grams as a source of writing examples and implements a wildcard query processor on top of it.
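    The shingle-fingerprinting idea behind near-duplicate detection can be illustrated briefly: hash overlapping word n-grams, keep a small deterministic sample of the hashes as the document's fingerprint, and compare fingerprints by set resemblance. This is a generic sketch of the technique, not a specific algorithm from the thesis.

```python
# Hashed word-3-gram fingerprints compared via Jaccard resemblance.
import hashlib

def fingerprint(text: str, n: int = 3, keep: int = 64) -> set:
    words = text.lower().split()
    shingles = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    hashes = {int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")
              for s in shingles}
    return set(sorted(hashes)[:keep])  # smallest hashes: a MinHash-style sample

def resemblance(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

d1 = "the quick brown fox jumps over the lazy dog again and again"
d2 = "the quick brown fox jumps over a lazy dog again and again"
print(f"resemblance: {resemblance(fingerprint(d1), fingerprint(d2)):.2f}")
```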