3,096 research outputs found

    Parallel and Flow-Based High Quality Hypergraph Partitioning

    Balanced hypergraph partitioning is a classic NP-hard optimization problem that is a fundamental tool in such diverse disciplines as VLSI circuit design, route planning, sharding distributed databases, optimizing communication volume in parallel computing, and accelerating the simulation of quantum circuits. Given a hypergraph and an integer k, the task is to divide the vertices into k disjoint blocks with bounded size, while minimizing an objective function on the hyperedges that span multiple blocks. In this dissertation we consider the most commonly used objective, the connectivity metric, where we aim to minimize the number of different blocks connected by each hyperedge. The most successful heuristic for balanced partitioning is the multilevel approach, which consists of three phases. In the coarsening phase, vertex clusters are contracted to obtain a sequence of structurally similar but successively smaller hypergraphs. Once sufficiently small, an initial partition is computed. Lastly, the contractions are successively undone in reverse order, and an iterative improvement algorithm is employed to refine the projected partition on each level. An important aspect in designing practical heuristics for optimization problems is the trade-off between solution quality and running time. The appropriate trade-off depends on the specific application, the size of the data sets, and the computational resources available to solve the problem. Existing algorithms are either slow, sequential and offer high solution quality, or are simple, fast, easy to parallelize, and offer low quality. While this trade-off cannot be avoided entirely, our goal is to close the gaps as much as possible. We achieve this by improving the state of the art in all non-trivial areas of the trade-off landscape with only a few techniques, but employed in two different ways. 
Furthermore, most research on parallelization has focused on distributed memory, which neglects the greater flexibility of shared-memory algorithms and the wide availability of commodity multi-core machines. In this thesis, we therefore design and revisit fundamental techniques for each phase of the multilevel approach, and develop highly efficient shared-memory parallel implementations thereof. We consider two iterative improvement algorithms, one based on the Fiduccia-Mattheyses (FM) heuristic, and one based on label propagation. For these, we propose a variety of techniques to improve the accuracy of gains when moving vertices in parallel, as well as low-level algorithmic improvements. For coarsening, we present a parallel variant of greedy agglomerative clustering with a novel method to resolve cluster join conflicts on-the-fly. Combined with a preprocessing phase for coarsening based on community detection, a portfolio of from-scratch partitioning algorithms, as well as recursive partitioning with work-stealing, we obtain our first parallel multilevel framework. It is the fastest partitioner known, and achieves medium-high quality, beating all parallel partitioners, and is close to the highest quality sequential partitioner. Our second contribution is a parallelization of an n-level approach, where only one vertex is contracted and uncontracted on each level. This extreme approach aims at high solution quality via very fine-grained, localized refinement, but seems inherently sequential. We devise an asynchronous n-level coarsening scheme based on a hierarchical decomposition of the contractions, as well as a batch-synchronous uncoarsening, and later fully asynchronous uncoarsening. In addition, we adapt our refinement algorithms, and also use the preprocessing and portfolio. 
This scheme is highly scalable, and achieves the same quality as the highest quality sequential partitioner (which is based on the same components), but is of course slower than our first framework due to fine-grained uncoarsening. The last ingredient for high quality is an iterative improvement algorithm based on maximum flows. In the sequential setting, we first improve an existing idea by solving incremental maximum flow problems, which leads to smaller cuts and is faster due to engineering efforts. Subsequently, we parallelize the maximum flow algorithm and schedule refinements in parallel. Beyond striving for the highest quality, we present a deterministically parallel partitioning framework. We develop deterministic versions of the preprocessing, coarsening, and label propagation refinement. Experimentally, we demonstrate that the penalties for determinism in terms of partition quality and running time are very small. All of our claims are validated through extensive experiments, comparing our algorithms with state-of-the-art solvers on large and diverse benchmark sets. To foster further research, we make our contributions available in our open-source framework Mt-KaHyPar. While it seems inevitable that, with ever-increasing problem sizes, we must transition to distributed memory algorithms, the study of shared-memory techniques is not in vain. With the multilevel approach, even the inherently slow techniques have a role to play in fast systems, as they can be employed to boost quality on coarse levels at little expense. Similarly, techniques for shared-memory parallelism are important, both as soon as a coarse graph fits into memory, and as local building blocks in the distributed algorithm.
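
The objective described in this abstract can be made concrete with a small sketch. The connectivity metric is commonly written as the sum over hyperedges e of w(e) * (lambda(e) - 1), where lambda(e) is the number of distinct blocks that e touches. The function names, the dictionary-based partition representation, and the balance bound (1 + epsilon) * ceil(n / k) are assumptions made for this illustration; none of this is taken from Mt-KaHyPar's code or interface.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Optional, Sequence


def connectivity_objective(
    hyperedges: List[Sequence[Hashable]],
    block_of: Dict[Hashable, int],
    weights: Optional[List[float]] = None,
) -> float:
    """Return sum of w(e) * (lambda(e) - 1) over all hyperedges.

    lambda(e) is the number of distinct blocks containing a pin of e.
    Unit weights are assumed when `weights` is not given.
    """
    if weights is None:
        weights = [1.0] * len(hyperedges)
    total = 0.0
    for pins, weight in zip(hyperedges, weights):
        blocks = {block_of[v] for v in pins}
        if blocks:  # an empty hyperedge contributes nothing
            total += weight * (len(blocks) - 1)
    return total


def is_balanced(block_of: Dict[Hashable, int], k: int, epsilon: float) -> bool:
    """Check an assumed balance bound: every block holds at most
    (1 + epsilon) * ceil(n / k) vertices (unit vertex weights)."""
    limit = (1 + epsilon) * math.ceil(len(block_of) / k)
    sizes = Counter(block_of.values())
    valid_ids = all(0 <= b < k for b in sizes)
    return valid_ids and max(sizes.values(), default=0) <= limit


# Hypothetical toy instance: six vertices, four hyperedges, two blocks.
example_edges = [[0, 1, 2], [2, 3], [3, 4, 5], [0, 5]]
example_partition = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(connectivity_objective(example_edges, example_partition))
print(is_balanced(example_partition, k=2, epsilon=0.0))
```

In the toy instance, the hyperedges [2, 3] and [0, 5] each span both blocks, so each contributes 1 to the objective, while the other two lie inside a single block and contribute 0.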

    Extracting Temporal and Causal Relations between Events

    Structured information resulting from temporal information processing is crucial for a variety of natural language processing tasks, for instance to generate timeline summaries of events from news documents, or to answer temporal and causal questions about events. In this thesis we present a framework for an integrated temporal and causal relation extraction system. We first develop a robust extraction component for each type of relation, i.e. temporal order and causality. We then combine the two extraction components into an integrated relation extraction system, CATENA (CAusal and Temporal relation Extraction from NAtural language texts), by utilizing the presumption about event precedence in causality, namely that causing events must happen BEFORE resulting events. Several resources and techniques to improve our relation extraction systems are also discussed, including word embeddings and training data expansion. Finally, we report our efforts to adapt temporal information processing to languages other than English, namely Italian and Indonesian. Comment: PhD Thesis
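
The precedence presumption mentioned in this abstract can be illustrated with a minimal sketch: whenever a causal link (cause, effect) is extracted, the temporal label for that pair is set to BEFORE. The data representation and the function name `apply_causal_precedence` are hypothetical and chosen only for this example; they are not CATENA's actual interface, and the real system may combine its two components differently.

```python
from typing import Dict, Set, Tuple

EventPair = Tuple[str, str]


def apply_causal_precedence(
    temporal: Dict[EventPair, str],
    causal: Set[EventPair],
) -> Dict[EventPair, str]:
    """Return a copy of `temporal` in which each causal pair
    (cause, effect) carries the temporal label BEFORE.

    `temporal` maps ordered event pairs to labels such as "BEFORE" or
    "AFTER" (assumed label names). The input mapping is not modified.
    """
    result = dict(temporal)
    for cause, effect in causal:
        # Drop a label stored for the reversed pair so that the result
        # does not contain two statements about the same two events.
        result.pop((effect, cause), None)
        result[(cause, effect)] = "BEFORE"
    return result


# Hypothetical event identifiers for illustration.
temporal_links = {("e1", "e2"): "AFTER", ("e2", "e3"): "BEFORE"}
causal_links = {("e1", "e2")}
print(apply_causal_precedence(temporal_links, causal_links))
```

Here the temporal classifier's label for (e1, e2) is overridden because a causal link from e1 to e2 was found, while the unrelated pair (e2, e3) is left unchanged.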

    Advanced Knowledge Technologies at the Midterm: Tools and Methods for the Semantic Web

    The University of Edinburgh and research sponsors are authorised to reproduce and distribute reprints and on-line copies for their purposes notwithstanding any copyright annotation hereon. The views and conclusions contained herein are the author’s and shouldn’t be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of other parties. In a celebrated essay on the new electronic media, Marshall McLuhan wrote in 1962: “Our private senses are not closed systems but are endlessly translated into each other in that experience which we call consciousness. Our extended senses, tools, technologies, through the ages, have been closed systems incapable of interplay or collective awareness. Now, in the electric age, the very instantaneous nature of co-existence among our technological instruments has created a crisis quite new in human history. Our extended faculties and senses now constitute a single field of experience which demands that they become collectively conscious. Our technologies, like our private senses, now demand an interplay and ratio that makes rational co-existence possible. As long as our technologies were as slow as the wheel or the alphabet or money, the fact that they were separate, closed systems was socially and psychically supportable. This is not true now when sight and sound and movement are simultaneous and global in extent.” (McLuhan 1962, p.5, emphasis in original) Over forty years later, the seamless interplay that McLuhan demanded between our technologies is still barely visible. McLuhan’s predictions of the spread, and increased importance, of electronic media have of course been borne out, and the worlds of business, science and knowledge storage and transfer have been revolutionised. 
Yet the integration of electronic systems as open systems remains in its infancy. Advanced Knowledge Technologies (AKT) aims to address this problem, to create a view of knowledge and its management across its lifecycle, and to research and create the services and technologies that such unification will require. Halfway through its six-year span, the results are beginning to come through, and this paper will explore some of the services, technologies and methodologies that have been developed. We hope to give a sense in this paper of the potential for the next three years, to discuss the insights and lessons learnt in the first phase of the project, and to articulate the challenges and issues that remain. The WWW provided the original context that made the AKT approach to knowledge management (KM) possible. AKT was initially proposed in 1999; it brought together an interdisciplinary consortium with the technological breadth and complementarity to create the conditions for a unified approach to knowledge across its lifecycle. The combination of this expertise, and the time and space afforded the consortium by the IRC structure, suggested the opportunity for a concerted effort to develop an approach to advanced knowledge technologies, based on the WWW as a basic infrastructure. The technological context of AKT altered for the better in the short period between the development of the proposal and the beginning of the project itself with the development of the semantic web (SW), which foresaw much more intelligent manipulation and querying of knowledge. The opportunities that the SW provided for, e.g., more intelligent retrieval put AKT in the centre of information technology innovation and knowledge management services; the AKT skill set would clearly be central for the exploitation of those opportunities. The SW, as an extension of the WWW, provides an interesting set of constraints to the knowledge management services AKT tries to provide. 
As a medium for the semantically-informed coordination of information, it has suggested a number of ways in which the objectives of AKT can be achieved, most obviously through the provision of knowledge management services delivered over the web as opposed to the creation and provision of technologies to manage knowledge. AKT is working on the assumption that many web services will be developed and provided for users. The KM problem in the near future will be one of deciding which services are needed and of coordinating them. Many of these services will be largely or entirely legacies of the WWW, and so the capabilities of the services will vary. As well as providing useful KM services in their own right, AKT will be aiming to exploit this opportunity, by reasoning over services, brokering between them, and providing essential meta-services for SW knowledge service management. Ontologies will be a crucial tool for the SW. The AKT consortium brings a lot of expertise on ontologies together, and ontologies were always going to be a key part of the strategy. All kinds of knowledge sharing and transfer activities will be mediated by ontologies, and ontology management will be an important enabling task. Different applications will need to cope with inconsistent ontologies, or with the problems that will follow the automatic creation of ontologies (e.g. merging of pre-existing ontologies to create a third). Ontology mapping, and the elimination of conflicts of reference, will be important tasks. All of these issues are discussed along with our proposed technologies. Similarly, specifications of tasks will be used for the deployment of knowledge services over the SW, but in general it cannot be expected that in the medium term there will be standards for task (or service) specifications. 
The brokering meta-services that are envisaged will have to deal with this heterogeneity. The emerging picture of the SW is one of great opportunity but it will not be a well-ordered, certain or consistent environment. It will comprise many repositories of legacy data, outdated and inconsistent stores, and requirements for common understandings across divergent formalisms. There is clearly a role for standards to play to bring much of this context together; AKT is playing a significant role in these efforts. But standards take time to emerge, they take political power to enforce, and they have been known to stifle innovation (in the short term). AKT is keen to understand the balance between principled inference and statistical processing of web content. Logical inference on the Web is tough. Complex queries using traditional AI inference methods bring most distributed computer systems to their knees. Do we set up semantically well-behaved areas of the Web? Is any part of the Web in which semantic hygiene prevails interesting enough to reason in? These and many other questions need to be addressed if we are to provide effective knowledge technologies for our content on the web.

    Application of advanced on-board processing concepts to future satellite communications systems: Bibliography

    Abstracts from a literature survey of reports on the application of signal processing concepts are presented. Approximately 300 references are included.

    Automated detection and analysis of fluorescence changes evoked by molecular signalling

    Fluorescent dyes and genetically encoded fluorescence indicators (GEFI) are common tools for visualizing concentration changes of specific ions and messenger molecules during intra- as well as intercellular communication. While fluorescent dyes have to be directly loaded into target cells and function only transiently, the expression of GEFIs can be controlled in a cell- and time-specific fashion, even allowing long-term analysis in living organisms. Dye- and GEFI-based fluorescence fluctuations, recorded using advanced imaging technologies, are the foundation for the analysis of physiological molecular signaling. Analyzing the plethora of complex fluorescence signals is a laborious and time-consuming task. An automated analysis of fluorescent signals circumvents user bias and time constraints. However, it requires overcoming several challenges, including correct estimation of fluorescence fluctuations at basal concentrations of messenger molecules, detection and extraction of the events themselves, proper segmentation of neighboring events, and tracking of propagating events. Moreover, event detection algorithms need to be sensitive enough to accurately capture localized, low-amplitude events with a limited spatial extent. This thesis presents three novel algorithms, PBasE, CoRoDe and KalEve, for the automated analysis of fluorescence events, developed to overcome the aforementioned challenges. The algorithms are integrated into a graphical application called MSparkles, developed in MATLAB and specifically designed for the analysis of fluorescence signals. The capabilities of the algorithms are demonstrated by analyzing astroglial Ca2+ events, recorded in anesthetized and awake mice and visualized using the genetically encoded Ca2+ indicators (GECIs) GCaMP3 and GCaMP5. The results were compared to those obtained by other software packages. 
In addition, the analysis of neuronal Na+ events recorded in acute brain slices using SBFI-AM serves to indicate the putatively broad application range of the presented algorithms. Finally, due to increasing evidence of the pivotal role of astrocytes in neurodegenerative diseases such as epilepsy, a metric to assess the synchronous occurrence of fluorescence events is introduced. In a proof-of-principle analysis, this metric is used to correlate astroglial Ca2+ events with EEG measurements.
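
To make the baseline and detection steps named in this abstract more tangible, the following is a deliberately simple sketch. It does not reproduce PBasE, CoRoDe or KalEve; it substitutes a generic percentile baseline and a fixed threshold on the relative change (F - F0) / F0, both of which are assumptions made for this illustration, as are the function names and parameter values. It is written in Python for brevity, whereas MSparkles itself is a MATLAB application.

```python
from typing import List, Sequence, Tuple


def delta_f_over_f0(
    trace: Sequence[float], baseline_percentile: float = 10.0
) -> List[float]:
    """Normalise a fluorescence trace as (F - F0) / F0.

    F0 is estimated as a low percentile of the trace (nearest-rank
    style). This is a stand-in baseline, not the PBasE algorithm.
    """
    if not trace:
        raise ValueError("trace must not be empty")
    ordered = sorted(trace)
    index = int(round(baseline_percentile / 100.0 * (len(ordered) - 1)))
    f0 = ordered[index]
    if f0 <= 0:
        raise ValueError("baseline must be positive")
    return [(f - f0) / f0 for f in trace]


def detect_events(
    dff: Sequence[float], threshold: float
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices, inclusive, of every run
    where the normalised signal stays above `threshold`."""
    events: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(dff):
        if value > threshold and start is None:
            start = i  # rising edge: an event begins
        elif value <= threshold and start is not None:
            events.append((start, i - 1))  # falling edge: event ends
            start = None
    if start is not None:  # event still running at end of the trace
        events.append((start, len(dff) - 1))
    return events


# Hypothetical single-pixel trace in arbitrary intensity units.
raw = [100, 100, 100, 150, 160, 100, 100, 130, 100, 100]
print(detect_events(delta_f_over_f0(raw), threshold=0.2))
```

A fixed global threshold like this one would miss exactly the localized, low-amplitude events the abstract highlights, which is one reason dedicated algorithms are needed.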