3 research outputs found

Massiv-Parallele Algorithmen zum Laden von Daten auf Moderner Hardware

Author: Stehle Elias Johannes Berthold
Publication venue: Technische Universität München
Publication date: 24/08/2020

While systems face an ever-growing amount of data that needs to be ingested, queried and analysed, processors are seeing only moderate improvements in sequential processing performance. This thesis addresses the fundamental shift towards increasingly parallel processors and contributes multiple massively parallel algorithms to accelerate different stages of the ingestion pipeline, such as data parsing and sorting.

MediaTUM

Signals on Networks: Random Asynchronous and Multirate Processing, and Uncertainty Principles

Author: Teke Oguzhan
Publication date: 01/01/2021

The processing of signals defined on graphs has been of interest for many years and finds applications in a diverse set of fields such as sensor networks, social and economic networks, and biological networks. In graph signal processing applications, signals are not defined as functions on a uniform time-domain grid; instead, they are defined as vectors indexed by the vertices of a graph, where the underlying graph is assumed to model the irregular signal domain. Although analysis of such networked models is not new (it can be traced back to the consensus problem studied more than four decades ago), such models have recently been studied from the viewpoint of signal processing, in which the analysis is based on the "graph operator" whose eigenvectors serve as a Fourier basis for the graph of interest. With the help of the graph Fourier basis, a number of topics from classical signal processing (such as sampling, reconstruction, and filtering) are extended to the case of graphs. The main contribution of this thesis is to provide new directions in the field of graph signal processing and further extensions of topics in classical signal processing. The first part of this thesis focuses on a random and asynchronous variant of "graph shift," i.e., localized communication between neighboring nodes. Since the dynamical behavior of randomized asynchronous updates is very different from that of the standard graph shift (i.e., state-space models), this part of the thesis focuses on the convergence and stability behavior of such random asynchronous recursions. Although non-random variants of asynchronous state recursions (possibly with non-linear updates) are well-studied problems with early results dating back to the late 1960s, this thesis considers convergence (and stability) in the statistical mean-squared sense and presents precise conditions for stability by drawing parallels with switching systems.
It is also shown that systems exhibit unexpected behavior under randomized asynchronicity: a system that is unstable in the synchronous world may be stabilized simply by the use of randomized asynchronicity. Moreover, randomized asynchronicity may result in a lower total computational complexity in certain parameter settings. The thesis presents applications of the random asynchronous model in the context of graph signal processing, including autonomous clustering of a network of agents and a node-asynchronous communication protocol that implements a given rational filter on the graph. The second part of the thesis focuses on extending the following topics in classical signal processing to the case of graphs: multirate processing and filter banks, discrete uncertainty principles, and energy compaction filters for optimal filter design. The thesis also considers an application to heat diffusion over networks. Multirate systems and filter banks find many applications in signal processing theory and implementations. Despite the possibility of extending 2-channel filter banks to bipartite graphs, this thesis shows that this relation cannot be generalized to M-channel systems on M-partite graphs. As a result, the extension of classical multirate theory to graphs is nontrivial, and such extensions cannot be obtained without certain mathematical restrictions on the graph. The thesis provides the necessary conditions on the graph such that the fundamental building blocks of multirate processing remain valid in the graph domain. In particular, it is shown that when the underlying graph satisfies a condition called the M-block cyclic property, classical multirate theory can be extended to graphs. The uncertainty principle is an essential mathematical concept in science and engineering; uncertainty principles generally state that a signal cannot have an arbitrarily "short" description in the original basis and in the Fourier basis simultaneously.
Based on the fact that graph signal processing proposes two different bases (i.e., the vertex and the graph Fourier domains) to represent graph signals, this thesis shows that the total number of nonzero elements of a graph signal and its representation in the graph Fourier domain is lower bounded by a quantity depending on the underlying graph. The thesis also presents the necessary and sufficient condition for the existence of 2-sparse and 3-sparse eigenvectors of a connected graph. When such eigenvectors exist, the uncertainty bound is very low, tight, and independent of the global structure of the graph. The thesis also considers the classical spectral concentration problem. In the context of polynomial graph filters, the problem reduces to the polynomial concentration problem studied more generally by Slepian in the 1970s. The thesis studies the asymptotic behavior of the optimal solution in the case of narrow bandwidth. Different examples of graphs are also compared in order to show that the maximum energy compaction and the optimal filter depend heavily on the graph spectrum. In the last part, the thesis considers estimating the starting time of a heat diffusion process from noisy measurements when there is a single point source located on a known vertex of a graph. In particular, the Cramér-Rao lower bound for the estimation problem is derived, and it is shown that for graphs with higher connectivity the bound is larger, making the estimation problem more difficult.
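To make the notion of a 2-sparse eigenvector concrete, here is a minimal sketch. Everything in it is an assumption for illustration and is not taken from the thesis: the graph operator is taken to be the combinatorial Laplacian L = D - A, the 5-vertex graph is invented, and the helper names `laplacian` and `matvec` are hypothetical. It checks one standard sufficient situation (two non-adjacent vertices with identical neighbors), not the thesis's necessary and sufficient condition.

```python
def laplacian(n, edges):
    """Build the combinatorial Laplacian L = D - A as a list of rows."""
    L = [[0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1   # degree of u
        L[v][v] += 1   # degree of v
        L[u][v] -= 1   # adjacency entry (u, v)
        L[v][u] -= 1   # adjacency entry (v, u)
    return L


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


# Hypothetical connected graph: vertices 0 and 1 are both attached only to vertex 2.
edges = [(0, 2), (1, 2), (2, 3), (3, 4)]
L = laplacian(5, edges)

# Candidate signal supported on the two "twin" vertices only.
x = [1, -1, 0, 0, 0]
Lx = matvec(L, x)

eigenvalue = 1  # common degree of vertices 0 and 1
is_eigenvector = Lx == [eigenvalue * xi for xi in x]
support_size = sum(1 for xi in x if xi != 0)
```

In this sketch `x` has two nonzero entries in the vertex domain and, being an eigenvector, needs a single coefficient in an eigenbasis that contains it, which illustrates why such eigenvectors make the uncertainty bound low regardless of the rest of the graph.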

Caltech Theses and Dissertations

Approaches to implement and evaluate aggregated search

Author: Kopliku Arlind
Publication date: 07/12/2011

Aggregated search, or aggregated retrieval, can be seen as a third paradigm for information retrieval, following the Boolean retrieval paradigm and the ranked retrieval paradigm. In the first two, the user is returned sets and ranked lists of search results, respectively.
It is up to the time-poor user to scroll through this set or list, scan the different documents and assemble the information he or she needs. Alternatively, aggregated search aims not only at identifying relevant information nuggets, but also at assembling these nuggets into a coherent answer. In this work, we first present an analysis of work related to aggregated search, organized within a general framework composed of three steps: query dispatching, nugget retrieval and result aggregation. Existing work is grouped by related domain, such as relational search, federated search, question answering and natural language generation. Among the possible research directions, we then focused on the two we believe to be the most promising: relational aggregated search and cross-vertical aggregated search. * Relational aggregated search targets relevant information, but also the relations between relevant information nuggets, which are used to assemble the final answer. In particular, three types of queries would readily benefit from this paradigm: attribute queries (e.g. president of France, GDP of Italy, mayor of Glasgow, ...), instance queries (e.g. France, Italy, Glasgow, Nokia e72, ...) and class queries (countries, French cities, Nokia mobile phones, ...). We call these relational queries, and we tackle three important problems concerning information retrieval and aggregation for these types of queries. First, we propose an attribute retrieval approach, after arguing that attribute retrieval is one of the crucial problems to be solved. Our approach relies on HTML tables on the Web. It is able to identify useful and relevant tables, which are then used to extract relevant attributes for arbitrary queries. The experimental results show that our approach is effective: it can answer many queries with high coverage, and it outperforms state-of-the-art techniques.
Second, we deal with result aggregation, where we are given relevant instances and attributes for a given query. The problem is particularly interesting for class queries, where the final answer is a table with many instances and attributes. To guarantee the quality of the aggregated result, we propose weighting instances and attributes so as to promote the most representative and important ones. The third problem concerns instances of the same class (e.g. France, Germany and Italy are all instances of the same class). Here, we propose an approach that can extract instances of the same class on a massive scale from HTML lists on the Web. All proposed approaches are applicable at Web scale, and they can play an important role in relational aggregated search. Finally, we propose 4 different prototype applications for relational aggregated search. They can answer different types of queries with relevant and relational information. More precisely, we retrieve not only attributes and their values, but also passages and images, which are assembled into a final focused answer. An example is the query "Nokia e72", which is answered with attributes (e.g. price, weight, battery life, ...), passages (e.g. description, reviews, ...) and images. The results are encouraging and illustrate the utility of relational aggregated search. * The second research direction that we pursued concerns cross-vertical aggregated search, which consists of assembling results from different vertical search engines (e.g. image search, video search, traditional Web search, ...) into a single interface. Here, different approaches exist in both research and industry. Our contribution mostly concerns the evaluation and the advantages of this paradigm. We propose 4 different studies that simulate different search situations. Each study is tested with 100 different queries and 9 vertical sources.
Here, we could clearly identify new advantages of this paradigm as well as several issues with evaluation setups. In particular, we observe that traditional information retrieval evaluation is not the fastest, but it remains the most realistic. To conclude, we propose different studies along two promising research directions. On the one hand, we deal with three important problems of relational aggregated search, followed by real prototype applications with encouraging results. On the other hand, we have investigated the interest and evaluation of cross-vertical aggregated search, identifying some of its advantages and evaluation issues. From a long-term perspective, we foresee a possible combination of these two kinds of approaches to provide relational and cross-vertical information retrieval incorporating more focus, structure and multimedia in search results.

Thèses en ligne de l'Université Toulouse III - Paul Sabatier