59 research outputs found

    Algebraic Multimedia: Theory and Implementation

    Get PDF
    Storage, processing, and presentation of multimedia data, such as images, video, audio, or multimedia presentations, has become an important area of computer science. The goal of this dissertation is to formalize access methods to multimedia data by developing algebras that operate on PowerPoint presentations, video, and audio as in the case of relational algebra operating on tabular data. This dissertation also proposes ways to create summaries of multimedia data

    Towards Context-free Information Importance Estimation

    Get PDF
    The amount of information contained in heterogeneous text documents such as news articles, blogs, social media posts, scientific articles, discussion forums, and microblogging platforms is already huge and is going to increase further. It is not possible for humans to cope with this flood of information, so that important information can neither be found nor be utilized. This situation is unfortunate since information is the key driver in many areas of society in the present Information Age. Hence, developing automatic means that can assist people to handle the information overload is crucial. Developing methods for automatic estimation of information importance is an essential step towards this goal. The guiding hypothesis of this work is that prior methods for automatic information importance estimation are inherently limited because they are based on merely correlated signals that are, however, not causally linked with information importance. To resolve this issue, we lay in this work the foundations for a fundamentally new approach for importance estimation. The key idea of context-free information importance estimation is to equip machine learning models with world knowledge so that they can estimate information importance based on causal reasons. In the first part of this work, we lay the theoretical foundations for context-free information importance estimation. First, we discuss how the abstract concept of information importance can be formally defined. So far, a formal definition of this concept is missing in the research community. We close this gap by discussing two information importance definitions, which equate the importance of information with its impact on the behavior and the impact on the course of life of the information recipients, respectively. Second, we discuss how information importance estimation abilities can be assessed. Usually, this is done by performing automatic summarization of text documents. However, we find that this approach is not ideal. Instead, we propose to consider ranking, regression, and preference prediction tasks as alternatives in future work. Third, we deduce context-free information importance estimation as a logical consequence of the previously introduced importance definitions. We find that reliable importance estimation, in particular for heterogeneous text documents, is only possible with context-free methods. In the second part, we develop the first machine learning models based on the idea of context-free information importance estimation. To this end, we first tackle the lack of suited datasets that are required to train and test machine learning models. In particular, large and heterogeneous datasets to investigate automatic summarization of multiple source documents are missing, because their construction is complicated and costly. To solve this problem, we present a simple and cost-efficient corpus construction approach and demonstrate its applicability by creating new multi-document summarization datasets. Second, we develop a new machine learning approach for context-free information importance estimation, implement a concrete realization, and demonstrate its advantages over contextual importance estimators. Third, we develop a new method to evaluate automatic summarization methods. Previous works are based on expensive reference summaries and unreliable semantic comparisons of text documents. On the contrary, our approach uses cheap pairwise preference annotations and only much simpler sentence-level similarity estimation. This work lays the foundations for context-free information importance estimation. We hope that future research will explore if this fundamentally new type of information importance estimation can eventually lead to human-level information importance estimation abilities

    On representation learning for generative models of text

    Full text link
    Cette thèse fait des petits pas dans la construction et la compréhension des systèmes d'apprentissage des représentations neuronales et des modèles génératifs pour le traitement du langage naturel. Il est présenté comme une thèse par article qui contient quatre travaux. Dans le premier article, nous montrons que l'apprentissage multi-tâches peut être utilisé pour combiner les biais inductifs de plusieurs tâches d'apprentissage auto-supervisées et supervisées pour apprendre des représentations de phrases distribuées de longueur fixe à usage général qui obtiennent des résultats solides sur les tâches d'apprentissage par transfert en aval sans tout modèle de réglage fin. Le deuxième article s'appuie sur le premier et présente un modèle génératif en deux étapes pour le texte qui modélise la distribution des représentations de phrases pour produire de nouveaux plongements de phrases qui servent de "contour neuronal" de haut niveau qui est reconstruit en mots avec un récurrent neuronal autorégressif conditionnel décodeur. Le troisième article étudie la nécessité de représentations démêlées pour la génération de texte contrôlable. Une grande partie des systèmes de génération de texte contrôlables reposent sur l'idée que le contrôle d'un attribut (ou d'un style) particulier nécessite la construction de représentations dissociées qui séparent le contenu et le style. Nous démontrons que les représentations produites dans des travaux antérieurs qui utilisent la formation contradictoire du domaine ne sont pas dissociées dans la pratique. Nous présentons ensuite une approche qui ne vise pas à apprendre des représentations démêlées et montrons qu'elle permet d'obtenir des résultats nettement meilleurs que les travaux antérieurs. Dans le quatrième article, nous concevons des modèles de langage de transformateur qui apprennent les représentations à plusieurs échelles de temps et montrent que ceux-ci peuvent aider à réduire l'empreinte mémoire importante de ces modèles. Il présente trois architectures multi-échelles différentes qui présentent des compromis favorables entre la perplexité et l'empreinte mémoire.This thesis takes baby steps in building and understanding neural representation learning systems and generative models for natural language processing. It is presented as a thesis by article that contains four pieces of work. In the first article, we show that multi-task learning can be used to combine the inductive biases of several self-supervised and supervised learning tasks to learn general-purpose fixed-length distributed sentence representations that achieve strong results on downstream transfer learning tasks without any model fine-tuning. The second article builds on the first and presents a two-step generative model for text that models the distribution of sentence representations to produce novel sentence embeddings that serves as a high level ``neural outline'' that is reconstructed to words with a conditional autoregressive RNN decoder. The third article studies the necessity of disentangled representations for controllable text generation. A large fraction of controllable text generation systems rely on the idea that control over a particular attribute (or style) requires building disentangled representations that separate content and style. We demonstrate that representations produced in previous work that uses domain adversarial training are not disentangled in practice. We then present an approach that does not aim to learn disentangled representations and show that it achieves significantly better results than prior work. In the fourth article, we design transformer language models that learn representations at multiple time scales and show that these can help address the large memory footprint these models typically have. It presents three different multi-scale architectures that exhibit favorable perplexity vs memory footprint trade-offs

    Exploring Hidden Coherent Feature Groups and Temporal Semantics for Multimedia Big Data Analysis

    Get PDF
    Thanks to the advanced technologies and social networks that allow the data to be widely shared among the Internet, there is an explosion of pervasive multimedia data, generating high demands of multimedia services and applications in various areas for people to easily access and manage multimedia data. Towards such demands, multimedia big data analysis has become an emerging hot topic in both industry and academia, which ranges from basic infrastructure, management, search, and mining to security, privacy, and applications. Within the scope of this dissertation, a multimedia big data analysis framework is proposed for semantic information management and retrieval with a focus on rare event detection in videos. The proposed framework is able to explore hidden semantic feature groups in multimedia data and incorporate temporal semantics, especially for video event detection. First, a hierarchical semantic data representation is presented to alleviate the semantic gap issue, and the Hidden Coherent Feature Group (HCFG) analysis method is proposed to capture the correlation between features and separate the original feature set into semantic groups, seamlessly integrating multimedia data in multiple modalities. Next, an Importance Factor based Temporal Multiple Correspondence Analysis (i.e., IF-TMCA) approach is presented for effective event detection. Specifically, the HCFG algorithm is integrated with the Hierarchical Information Gain Analysis (HIGA) method to generate the Importance Factor (IF) for producing the initial detection results. Then, the TMCA algorithm is proposed to efficiently incorporate temporal semantics for re-ranking and improving the final performance. At last, a sampling-based ensemble learning mechanism is applied to further accommodate the imbalanced datasets. In addition to the multimedia semantic representation and class imbalance problems, lack of organization is another critical issue for multimedia big data analysis. In this framework, an affinity propagation-based summarization method is also proposed to transform the unorganized data into a better structure with clean and well-organized information. The whole framework has been thoroughly evaluated across multiple domains, such as soccer goal event detection and disaster information management

    Sequence queries on temporal graphs

    Get PDF
    Graphs that evolve over time are called temporal graphs. They can be used to describe and represent real-world networks, including transportation networks, social networks, and communication networks, with higher fidelity and accuracy. However, research is still limited on how to manage large scale temporal graphs and execute queries over these graphs efficiently and effectively. This thesis investigates the problems of temporal graph data management related to node and edge sequence queries. In temporal graphs, nodes and edges can evolve over time. Therefore, sequence queries on nodes and edges can be key components in managing temporal graphs. In this thesis, the node sequence query decomposes into two parts: graph node similarity and subsequence matching. For node similarity, this thesis proposes a modified tree edit distance that is metric and polynomially computable and has a natural, intuitive interpretation. Note that the proposed node similarity works even for inter-graph nodes and therefore can be used for graph de-anonymization, network transfer learning, and cross-network mining, among other tasks. The subsequence matching query proposed in this thesis is a framework that can be adopted to index generic sequence and time-series data, including trajectory data and even DNA sequences for subsequence retrieval. For edge sequence queries, this thesis proposes an efficient storage and optimized indexing technique that allows for efficient retrieval of temporal subgraphs that satisfy certain temporal predicates. For this problem, this thesis develops a lightweight data management engine prototype that can support time-sensitive temporal graph analytics efficiently even on a single PC

    Semantischer Schutz und Personalisierung von Videoinhalten. PIAF: MPEG-kompatibles Multimedia-Adaptierungs-Framework zur Bewahrung der vom Nutzer wahrgenommenen Qualität.

    Get PDF
    UME is the notion that a user should receive informative adapted content anytime and anywhere. Personalization of videos, which adapts their content according to user preferences, is a vital aspect of achieving the UME vision. User preferences can be translated into several types of constraints that must be considered by the adaptation process, including semantic constraints directly related to the content of the video. To deal with these semantic constraints, a fine-grained adaptation, which can go down to the level of video objects, is necessary. The overall goal of this adaptation process is to provide users with adapted content that maximizes their Quality of Experience (QoE). This QoE depends at the same time on the level of the user's satisfaction in perceiving the adapted content, the amount of knowledge assimilated by the user, and the adaptation execution time. In video adaptation frameworks, the Adaptation Decision Taking Engine (ADTE), which can be considered as the "brain" of the adaptation engine, is responsible for achieving this goal. The task of the ADTE is challenging as many adaptation operations can satisfy the same semantic constraint, and thus arising in several feasible adaptation plans. Indeed, for each entity undergoing the adaptation process, the ADTE must decide on the adequate adaptation operator that satisfies the user's preferences while maximizing his/her quality of experience. The first challenge to achieve in this is to objectively measure the quality of the adapted video, taking into consideration the multiple aspects of the QoE. The second challenge is to assess beforehand this quality in order to choose the most appropriate adaptation plan among all possible plans. The third challenge is to resolve conflicting or overlapping semantic constraints, in particular conflicts arising from constraints expressed by owner's intellectual property rights about the modification of the content. In this thesis, we tackled the aforementioned challenges by proposing a Utility Function (UF), which integrates semantic concerns with user's perceptual considerations. This UF models the relationships among adaptation operations, user preferences, and the quality of the video content. We integrated this UF into an ADTE. This ADTE performs a multi-level piecewise reasoning to choose the adaptation plan that maximizes the user-perceived quality. Furthermore, we included intellectual property rights in the adaptation process. Thereby, we modeled content owner constraints. We dealt with the problem of conflicting user and owner constraints by mapping it to a known optimization problem. Moreover, we developed the SVCAT, which produces structural and high-level semantic annotation according to an original object-based video content model. We modeled as well the user's preferences proposing extensions to MPEG-7 and MPEG-21. All the developed contributions were carried out as part of a coherent framework called PIAF. PIAF is a complete modular MPEG standard compliant framework that covers the whole process of semantic video adaptation. We validated this research with qualitative and quantitative evaluations, which assess the performance and the efficiency of the proposed adaptation decision-taking engine within PIAF. The experimental results show that the proposed UF has a high correlation with subjective video quality evaluation.Der Begriff "Universal Multimedia Experience" (UME) beschreibt die Vision, dass ein Nutzer nach seinen individuellen Vorlieben zugeschnittene Videoinhalte konsumieren kann. In dieser Dissertation werden im UME nun auch semantische Constraints berücksichtigt, welche direkt mit der Konsumierung der Videoinhalte verbunden sind. Dabei soll die Qualität der Videoerfahrung für den Nutzer maximiert werden. Diese Qualität ist in der Dissertation durch die Benutzerzufriedenheit bei der Wahrnehmung der Veränderung der Videos repräsentiert. Die Veränderung der Videos wird durch eine Videoadaptierung erzeugt, z.B. durch die Löschung oder Veränderung von Szenen, Objekten, welche einem semantischen Constraints nicht entsprechen. Kern der Videoadaptierung ist die "Adaptation Decision Taking Engine" (ADTE). Sie bestimmt die Operatoren, welche die semantischen Constraints auflösen, und berechnet dann mögliche Adaptierungspläne, die auf dem Video angewandt werden sollen. Weiterhin muss die ADTE für jeden Adaptierungsschritt anhand der Operatoren bestimmen, wie die Vorlieben des Nutzers berücksichtigt werden können. Die zweite Herausforderung ist die Beurteilung und Maximierung der Qualität eines adaptierten Videos. Die dritte Herausforderung ist die Berücksichtigung sich widersprechender semantischer Constraints. Dies betrifft insbesondere solche, die mit Urheberrechten in Verbindung stehen. In dieser Dissertation werden die oben genannten Herausforderungen mit Hilfe eines "Personalized video Adaptation Framework" (PIAF) gelöst, welche auf den "Moving Picture Expert Group" (MPEG)-Standard MPEG-7 und MPEG-21 basieren. PIAF ist ein Framework, welches den gesamten Prozess der Videoadaptierung umfasst. Es modelliert den Zusammenhang zwischen den Adaptierungsoperatoren, den Vorlieben der Nutzer und der Qualität der Videos. Weiterhin wird das Problem der optimalen Auswahl eines Adaptierungsplans für die maximale Qualität der Videos untersucht. Dafür wird eine Utility Funktion (UF) definiert und in der ADTE eingesetzt, welche die semantischen Constraints mit den vom Nutzer ausgedrückten Vorlieben vereint. Weiterhin ist das "Semantic Video Content Annotation Tool" (SVCAT) entwickelt worden, um strukturelle und semantische Annotationen durchzuführen. Ebenso sind die Vorlieben der Nutzer mit MPEG-7 und MPEG-21 Deskriptoren berücksichtigt worden. Die Entwicklung dieser Software-Werkzeuge und Algorithmen ist notwendig, um ein vollständiges und modulares Framework zu erhalten. Dadurch deckt PIAF den kompletten Bereich der semantischen Videoadaptierung ab. Das ADTE ist in qualitativen und quantitativen Evaluationen validiert worden. Die Ergebnisse der Evaluation zeigen unter anderem, dass die UF im Bereich Qualität eine hohe Korrelation mit der subjektiven Wahrnehmung von ausgewählten Nutzern aufweist

    Multimedia Retrieval

    Get PDF
    • …
    corecore