
    Image Processing Applications in Real Life: 2D Fragmented Image and Document Reassembly and Frequency Division Multiplexed Imaging

    In this era of modern technology, image processing is one of the most studied disciplines of signal processing, and its applications can be found in every aspect of our daily life. In this work, three main applications of image processing have been studied. In chapter 1, frequency division multiplexed imaging (FDMI), a novel idea in the field of computational photography, is introduced. Using FDMI, multiple images are captured simultaneously in a single shot and can later be extracted from the multiplexed image. This is achieved by spatially modulating the images so that they are placed at different locations in the Fourier domain. Finally, a Texas Instruments digital micromirror device (DMD) based implementation of FDMI is presented and results are shown. Chapter 2 discusses the problem of image reassembly, which is to restore an image back to its original form from its pieces after it has been fragmented for various destructive reasons. We propose an efficient algorithm for the 2D image fragment reassembly problem based on solving a variation of the Longest Common Subsequence (LCS) problem. Our processing pipeline has three steps. First, the boundary of each fragment is extracted automatically; second, a novel boundary matching is performed by solving the LCS problem to identify the best possible adjacency relationships among image fragment pairs; finally, a multi-piece global alignment is used to filter out incorrect pairwise matches and compose the final image. We perform experiments on complicated image fragment datasets and compare our results with existing methods to show the improved efficiency and robustness of our method. The problem of reassembling a hand-torn or machine-shredded document back to its original form is another useful version of the image reassembly problem. Reassembling a shredded document is different from reassembling an ordinary image because the geometric shapes of the fragments do not carry much valuable information if the document has been machine-shredded rather than hand-torn. On the other hand, matching words and context can be used as an additional tool to help improve the task of reassembly. In the final chapter, the document reassembly problem is addressed by solving a graph optimization problem.
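
    As a hedged illustration of the LCS-based boundary matching step described above, the sketch below assumes each fragment boundary has already been reduced to a sequence of discrete symbols (e.g., quantized curvature codes); the symbols and scoring are invented for demonstration and are not the thesis's actual pipeline.

```python
# Minimal sketch: LCS-based boundary matching between two image fragments.
# Assumes each fragment boundary has already been reduced to a sequence of
# discrete symbols (e.g., quantized curvature or edge-color codes); the
# symbol extraction itself is not shown and is hypothetical here.

def lcs_length(a, b):
    """Classic O(len(a) * len(b)) dynamic program for the Longest Common Subsequence."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

def boundary_match_score(boundary_a, boundary_b):
    """Score a candidate adjacency: one boundary is reversed because two
    touching fragments traverse their shared edge in opposite directions."""
    score = lcs_length(boundary_a, list(reversed(boundary_b)))
    return score / max(len(boundary_a), len(boundary_b))

# Toy usage with made-up curvature codes:
frag1 = ['A', 'B', 'B', 'C', 'A', 'D']
frag2 = ['D', 'A', 'C', 'B', 'B', 'X']
print(boundary_match_score(frag1, frag2))  # higher score = better candidate pair
```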

    Evolution of whole genomes through inversions:models and algorithms for duplicates, ancestors, and edit scenarios

    Advances in sequencing technology are yielding DNA sequence data at an alarming rate – a rate reminiscent of Moore's law. Biologists' abilities to analyze this data, however, have not kept pace. On the other hand, the discrete and mechanical nature of the cell life-cycle has been tantalizing to computer scientists. Thus in the 1980s, pioneers of the field now called Computational Biology began to uncover a wealth of computer science problems, some confronting modern Biologists and some hidden in the annals of the biological literature. In particular, many interesting twists were introduced to classical string matching, sorting, and graph problems. One such problem, first posed in 1941 but rediscovered in the early 1980s, is that of sorting by inversions (also called reversals): given two permutations, find the minimum number of inversions required to transform one into the other, where an inversion inverts the order of a subpermutation. Indeed, many genomes have evolved mostly or only through inversions. Thus it becomes possible to trace evolutionary histories by inferring sequences of such inversions that led to today's genomes from a distant common ancestor. But unlike the classic edit distance problem, where string editing was relatively simple, editing permutations in this way has proved to be more complex. In this dissertation, we extend the theory so as to make these edit distances more broadly applicable and faster to compute, and work towards more powerful tools that can accurately infer evolutionary histories. In particular, we present work that for the first time considers genomic distances between any pair of genomes, with no limitation on the number of occurrences of a gene. Next we show that there are conditions under which an ancestral genome (or one close to the true ancestor) can be reliably reconstructed. Finally we present new methodology that computes a minimum-length sequence of inversions to transform one permutation into another in, on average, O(n log n) steps, whereas the best worst-case algorithm to compute such a sequence uses O(n√n log n) steps.
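
    To make the inversion (reversal) operation concrete, here is an illustrative brute-force breadth-first search for the inversion distance between two small permutations. It is exponential in general and is not the dissertation's algorithm; it only demonstrates the problem statement.

```python
# Illustrative only: brute-force breadth-first search for the inversion
# (reversal) distance between two small permutations. The dissertation's
# algorithms are far more sophisticated; this just makes the operation concrete.
from collections import deque

def invert(perm, i, j):
    """Reverse the sub-permutation perm[i..j] (inclusive)."""
    return perm[:i] + tuple(reversed(perm[i:j + 1])) + perm[j + 1:]

def inversion_distance(source, target):
    source, target = tuple(source), tuple(target)
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        perm, d = queue.popleft()
        if perm == target:
            return d
        for i in range(len(perm)):
            for j in range(i + 1, len(perm)):
                nxt = invert(perm, i, j)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, d + 1))

print(inversion_distance((3, 1, 2, 4), (1, 2, 3, 4)))  # 2 inversions suffice
```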

    Techniques of design optimisation for algorithms implemented in software

    The overarching objective of this thesis was to develop tools for parallelising, optimising, and implementing algorithms on parallel architectures, in particular General Purpose Graphics Processors (GPGPUs). Two projects were chosen from different application areas in which GPGPUs are used: a defence application involving image compression, and a modelling application in bioinformatics (computational immunology). Each project had its own specific objectives, as well as supporting the overall research goal. The defence / image compression project was carried out in collaboration with the Jet Propulsion Laboratories. The specific questions were: to what extent an algorithm designed for bit-serial hardware implementation of the lossless compression of hyperspectral images on board unmanned aerial vehicles (UAVs) could be parallelised, whether GPGPUs could be used to implement that algorithm, and whether a software implementation with or without GPGPU acceleration could match the throughput of a dedicated hardware (FPGA) implementation. The dependencies within the algorithm were analysed, and the algorithm parallelised. The algorithm was implemented in software for GPGPU and optimised. During the optimisation process, profiling revealed less than optimal device utilisation, but no further optimisations resulted in an improvement in speed. The design had hit a local maximum of performance. Analysis of the arithmetic intensity and data flow exposed flaws in kernel occupancy, the standard metric used for GPU optimisation. Redesigning the implementation with revised criteria (fused kernels, lower occupancy, and greater data locality) led to a new implementation with 10x higher throughput. GPGPUs were shown to be viable for on-board implementation of the CCSDS lossless hyperspectral image compression algorithm, exceeding the performance of the hardware reference implementation and providing sufficient throughput for the next generation of image sensors as well. The second project was carried out in collaboration with biologists at the University of Arizona and involved modelling a complex biological system – VDJ recombination, which is involved in the formation of T-cell receptors (TCRs). Generation of immune receptors (T cell receptors and antibodies) by VDJ recombination is an enormously complex process, which can theoretically synthesize more than 10^18 variants. Originally thought to be a random process, the underlying mechanisms clearly have a non-random nature that preferentially creates a small subset of immune receptors in many individuals. Understanding this bias is a longstanding problem in the field of immunology. Modelling the process of VDJ recombination to determine the number of ways each immune receptor can be synthesized, previously thought to be untenable, is a key first step in determining how this special population is made. The computational tools developed in this thesis have allowed immunologists for the first time to comprehensively test and invalidate a longstanding theory (convergent recombination) for how this special population is created, while generating the data needed to develop novel hypotheses.
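
    The occupancy-versus-locality trade-off described above can be illustrated with a toy, hardware-agnostic sketch. This is not the CCSDS compressor or a GPU kernel; it only contrasts a chain of separate element-wise passes with a single fused pass over the same data.

```python
# Not the CCSDS compressor: a toy illustration of kernel fusion, i.e. running an
# entire per-element pipeline in one pass over the data instead of a chain of
# separate passes that each re-read and re-write the whole array.

def unfused(pixels, mean):
    # Three passes, three intermediate lists (analogous to three separate kernels
    # that each round-trip through memory).
    shifted = [p - mean for p in pixels]
    scaled = [s * 0.5 for s in shifted]
    return [max(-128.0, min(127.0, s)) for s in scaled]

def fused(pixels, mean):
    # One pass: every element goes through the whole pipeline before the next
    # element is touched, so intermediates stay in locals rather than arrays.
    out = []
    for p in pixels:
        v = (p - mean) * 0.5
        out.append(max(-128.0, min(127.0, v)))
    return out

data = [10.0, 42.0, 300.0, -7.0]
mean = sum(data) / len(data)
assert unfused(data, mean) == fused(data, mean)  # same result, fewer passes
```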

    Algorithms for string and graph layout

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004. Includes bibliographical references (p. 121-125). Many graph optimization problems can be viewed as graph layout problems. A layout of a graph is a geometric arrangement of the vertices subject to given constraints. For example, the vertices of a graph can be arranged on a line or a circle, on a two- or three-dimensional lattice, etc. The goal is usually to place all the vertices so as to optimize some specified objective function. We develop combinatorial methods as well as models based on linear and semidefinite programming for graph layout problems. We apply these techniques to some well-known optimization problems. In particular, we give improved approximation algorithms for the string folding problem on the two- and three-dimensional square lattices. This combinatorial graph problem is motivated by the protein folding problem, which is central in computational biology. We then present a new semidefinite programming formulation for the linear ordering problem (also known as the maximum acyclic subgraph problem) and show that it provides an improved bound on the value of an optimal solution for random graphs. This is the first relaxation that improves on the trivial "all edges" bound for random graphs. By Alantha Newman.
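
    As a hedged illustration of the linear ordering (maximum acyclic subgraph) objective and the trivial "all edges" bound mentioned above, the following sketch scores a random vertex ordering on a random directed graph; the graph model and parameters are arbitrary and not taken from the thesis.

```python
# Sketch of the linear ordering (maximum acyclic subgraph) objective: given a
# directed graph, order the vertices to maximize the number of edges pointing
# "forward". The trivial upper bound mentioned above is simply |E| (all edges).
import random

def forward_edges(order, edges):
    pos = {v: i for i, v in enumerate(order)}
    return sum(1 for u, v in edges if pos[u] < pos[v])

# Random directed graph on n vertices; each ordered pair is kept with prob p.
n, p = 30, 0.2
vertices = list(range(n))
edges = [(u, v) for u in vertices for v in vertices if u != v and random.random() < p]

# An ordering and its reverse together keep every edge exactly once, so the
# better of the two already certifies a solution of value >= |E| / 2.
order = vertices[:]
random.shuffle(order)
best = max(forward_edges(order, edges), forward_edges(list(reversed(order)), edges))
print(f"{best} of {len(edges)} edges kept (trivial 'all edges' bound: {len(edges)})")
```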

    Utilization of automated location tracking for clinical workflow analytics and visualization

    The analysis of clinical workflow offers many challenges to clinical stakeholders and researchers, especially in environments characterized by dynamic and concurrent processes. Workflow analysis in such environments is essential for monitoring performance and finding bottlenecks and sources of error. Clinical workflow analysis has been enhanced with the inclusion of modern technologies. One such intervention is automated location tracking, a system that detects the movement of clinicians and equipment. Utilizing the data produced by automated location tracking technologies can lead to the development of novel workflow analytics that complement more traditional approaches such as ethnography and grounded-theory based qualitative methods. The goals of this research are to: (i) develop a series of analytic techniques to derive deeper workflow-related insight in an emergency department setting, (ii) overlay data from disparate sources (quantitative and qualitative) to develop strategies that facilitate workflow redesign, and (iii) incorporate visual analytics methods to improve the targeted visual feedback received by providers based on the findings. The overarching purpose is to create a framework that demonstrates the utility of automated location tracking data used in conjunction with clinical data such as EHR logs, and its vital role in the future of clinical workflow analysis/analytics. This document is organized around the two primary aims of the research. The first aim deals with the use of automated location tracking data to develop a novel methodological/exploratory framework for clinical workflow. The second aim is to overlay the quantitative data generated from the first aim on data from qualitative observation and shadowing studies (mixed methods) to develop a deeper view of clinical workflow that can be used to facilitate workflow redesign. The final sections of the document speculate on the direction of this work, discussing its potential for the creation of fully integrated clinical environments, i.e., environments with state-of-the-art location tracking and other data collection mechanisms. The main purpose of this research is to demonstrate ways in which clinical processes can be continuously monitored, allowing for proactive adaptations in the face of technological and process changes to minimize any negative impact on the quality of patient care and provider satisfaction. Doctoral Dissertation, Biomedical Informatics, 201
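
    A hypothetical sketch of the kind of overlay described above: joining automated location tracking (RTLS) badge events with EHR audit-log activity to compare physical presence with documented work. The column names, timestamps, and schema are invented for illustration and do not reflect the study's actual data.

```python
# Hypothetical sketch: deriving clinician dwell times from automated location
# tracking (RTLS) events and overlaying them with EHR audit-log activity.
# All column names and values are assumptions, not the study's schema.
import pandas as pd

# RTLS stream: one row per badge read (clinician_id, room, timestamp).
rtls = pd.DataFrame({
    "clinician_id": ["c1", "c1", "c1", "c2"],
    "room": ["triage", "bed_3", "triage", "bed_3"],
    "timestamp": pd.to_datetime(
        ["2023-01-01 08:00", "2023-01-01 08:12", "2023-01-01 08:40", "2023-01-01 08:05"]),
}).sort_values(["clinician_id", "timestamp"])

# Dwell time in a room = gap until the clinician's next badge read.
rtls["dwell_min"] = (
    rtls.groupby("clinician_id")["timestamp"].shift(-1) - rtls["timestamp"]
).dt.total_seconds() / 60

# EHR audit log: one row per documented action, used here only as an activity count.
ehr = pd.DataFrame({
    "clinician_id": ["c1", "c1", "c2"],
    "room": ["bed_3", "bed_3", "bed_3"],
    "actions": [1, 1, 1],
})
activity = ehr.groupby(["clinician_id", "room"], as_index=False)["actions"].sum()

# Overlay: time physically present vs. documented EHR activity, per room.
overlay = rtls.groupby(["clinician_id", "room"], as_index=False)["dwell_min"].sum().merge(
    activity, on=["clinician_id", "room"], how="left")
print(overlay)
```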

    Linking Music Metadata.

    The internet has facilitated music metadata production and distribution on an unprecedented scale. A contributing factor to this data deluge is a change in the authorship of this data from the expert few to the untrained crowd. The resulting unordered flood of imperfect annotations provides challenges and opportunities in identifying accurate metadata and linking it to the music audio in order to provide a richer listening experience. We advocate novel adaptations of Dynamic Programming for music metadata synchronisation, ranking, and comparison. This thesis introduces Windowed Time Warping, Greedy, Constrained On-Line Time Warping for synchronisation and the Concurrence Factor for automatically ranking metadata. We begin by examining the availability of various music metadata on the web. We then review Dynamic Programming methods for aligning and comparing two source sequences, while presenting novel, specialised adaptations for efficient, realtime synchronisation of music and metadata that improve on existing algorithms in speed and accuracy. The Concurrence Factor, which measures the degree to which an annotation of a song agrees with its peers, is proposed in order to utilise the wisdom of the crowd to establish a ranking system. This attribute uses a combination of the standard Dynamic Programming methods Levenshtein Edit Distance, Dynamic Time Warping, and Longest Common Subsequence to compare annotations. We present a synchronisation application for applying the aforementioned methods, as well as a tablature-parsing application for mining and analysing guitar tablatures from the web. We evaluate the Concurrence Factor as a ranking system on a large-scale collection of guitar tablatures and lyrics and show a correlation with accuracy that is superior to existing methods currently used in internet search engines, which are based on popularity and human ratings. Engineering and Physical Sciences Research Council; travel grant from the Royal Engineering Society.
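
    The following is a rough sketch of the Concurrence Factor idea described above, scoring each annotation by its average similarity to its peers. It uses only a normalized Levenshtein similarity; the thesis's actual combination of Levenshtein Edit Distance, Dynamic Time Warping, and Longest Common Subsequence is not reproduced here.

```python
# Rough sketch of the Concurrence Factor idea: rank each annotation (e.g., a
# lyric transcription) by how strongly it agrees with its peers. Agreement here
# is plain normalized Levenshtein similarity; the thesis combines it with DTW
# and LCS, which this sketch omits.

def levenshtein(a, b):
    """Standard edit distance via a rolling one-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def similarity(a, b):
    return 1.0 - levenshtein(a, b) / max(len(a), len(b), 1)

def concurrence_factor(annotation, peers):
    """Mean similarity of one annotation to all other annotations of the same song."""
    return sum(similarity(annotation, p) for p in peers) / len(peers)

annotations = ["hello darkness my old friend",
               "hello darkness my old freind",
               "yellow blandness my odd fiend"]
for i, a in enumerate(annotations):
    peers = annotations[:i] + annotations[i + 1:]
    print(round(concurrence_factor(a, peers), 3), a)
```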

    Data Integration: Techniques and Evaluation

    Within the DIECOFIS framework, ec3, the Division of Business Statistics from the Vienna University of Economics and Business Administration, and ISTAT worked together to find methods to create a comprehensive database of enterprise data required for taxation microsimulations via integration of existing disparate enterprise data sources. This paper provides an overview of the broad spectrum of investigated methodology (including exact and statistical matching as well as imputation) and related statistical quality indicators, and emphasises the relevance of data integration, especially for official statistics, as a means of using available information more efficiently and improving the quality of a statistical agency's products. Finally, an outlook on an empirical study comparing different exact matching procedures in the maintenance of Statistics Austria's Business Register is presented.
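
    As a minimal, hypothetical illustration of exact matching between two enterprise data sources on a shared identifier (with unmatched records left as candidates for statistical matching or imputation), consider the sketch below; the keys and fields are invented and unrelated to the DIECOFIS data.

```python
# Minimal illustration of exact record matching between two enterprise data
# sources on a shared key (a fictitious enterprise ID). Records that fail to
# match exactly would be candidates for statistical matching or imputation.

register = {
    "AT001": {"name": "Alpha GmbH", "employees": 120},
    "AT002": {"name": "Beta KG", "employees": 15},
}
tax_data = {
    "AT001": {"turnover": 4.2e6},
    "AT003": {"turnover": 9.9e5},
}

matched, unmatched = {}, []
for key, record in register.items():
    if key in tax_data:
        matched[key] = {**record, **tax_data[key]}   # integrated record
    else:
        unmatched.append(key)                        # needs statistical matching / imputation

print(matched)    # integrated record for 'AT001'
print(unmatched)  # ['AT002']
```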

    Making Thin Data Thick: User Behavior Analysis with Minimum Information

    With the rise of social media, user-generated content has become available at an unprecedented scale. On Twitter, 1 billion tweets are posted every 5 days, and on Facebook, 20 million links are shared every 20 minutes. These massive collections of user-generated content constitute big data on human behavior. This big data has brought about countless opportunities for analyzing human behavior at scale. However, is this data enough? Unfortunately, the data available at the individual level is limited for most users. This limited individual-level data is often referred to as thin data. Hence, researchers face a big-data paradox, where this big data is a large collection of mostly limited individual-level information. Researchers are often constrained to derive meaningful insights regarding online user behavior with this limited information. Simply put, they have to make thin data thick. This dissertation investigates how thin data on human behavior can be made thick. The chief objective is to demonstrate how traces of human behavior can be efficiently gleaned from the, often limited, individual-level information, hence introducing an all-inclusive user behavior analysis methodology that considers social media users with different levels of information availability. To that end, the absolute minimum information, in terms of both link and content data, that is available for any social media user is determined. Utilizing only minimum information in different applications on social media, such as prediction or recommendation tasks, allows for solutions that are (1) generalizable to all social media users and (2) easy to implement. However, are applications that employ only minimum information as effective as, or comparable to, applications that use more information? In this dissertation, it is shown that common research challenges such as detecting malicious users or friend recommendation (i.e., link prediction) can be effectively performed using only minimum information. More importantly, it is demonstrated that unique user identification can be achieved using minimum information. Theoretical boundaries of unique user identification are obtained by introducing social signatures. Social signatures allow for user identification in any large-scale network on social media. The results on single-site user identification are generalized to multiple sites, and it is shown how the same user can be uniquely identified across multiple sites using only minimum link or content information. The findings in this dissertation allow finding the same user across multiple sites, which in turn has multiple implications. In particular, by identifying the same users across sites, (1) patterns that users exhibit across sites are identified, (2) how user behavior varies across sites is determined, and (3) activities that are observed only across sites are identified and studied. Doctoral Dissertation, Computer Science, 201
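
    The sketch below is an illustrative stand-in for cross-site matching with minimum information, comparing accounts across two sites using nothing but usernames. It is not the dissertation's social-signature method; the usernames and the similarity measure are assumptions for demonstration only.

```python
# Illustrative only: matching candidate accounts across two sites using nothing
# but usernames (a stand-in for "minimum information"). The dissertation's
# social-signature approach is more principled; this just shows the setting.
from difflib import SequenceMatcher

site_a = ["data_wrangler", "jsmith1984", "night_owl"]
site_b = ["datawrangler", "j.smith.1984", "owlish"]

def name_similarity(u, v):
    """Crude string similarity between two usernames, case-insensitive."""
    return SequenceMatcher(None, u.lower(), v.lower()).ratio()

for u in site_a:
    best = max(site_b, key=lambda v: name_similarity(u, v))
    print(f"{u:>15} -> {best:<15} (similarity {name_similarity(u, best):.2f})")
```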

    Accelerating Event Stream Processing in On- and Offline Systems

    Due to a growing number of data producers and their ever-increasing data volume, the ability to ingest, analyze, and store potentially never-ending streams of data is a mission-critical task in today's data processing landscape. A widespread form of data stream is the event stream, which consists of continuously arriving notifications about some real-world phenomenon. For example, a temperature sensor naturally generates an event stream by periodically measuring the temperature and reporting it, together with the measurement time, whenever it changes substantially from the previous measurement. In this thesis, we consider two kinds of event stream processing: online and offline. Online refers to processing events solely in main memory as soon as they arrive, while offline means processing event data previously persisted to non-volatile storage. Both modes are supported by widely used scale-out general-purpose stream processing engines (SPEs) like Apache Flink or Spark Streaming. However, such engines suffer from two significant deficiencies that severely limit their processing performance. First, for offline processing, they load the entire stream from non-volatile secondary storage and replay all data items into the associated online engine in order of their original arrival. While this naturally ensures unified query semantics for on- and offline processing, the costs for reading the entire stream from non-volatile storage quickly dominate the overall processing costs. Second, modern SPEs focus on scaling out computations across the nodes of a cluster, but use only a fraction of the available resources of individual nodes. This thesis tackles those problems with three different approaches. First, we present novel techniques for the offline processing of two important query types (windowed aggregation and sequential pattern matching). Our methods utilize well-understood indexing techniques to reduce the total amount of data to read from non-volatile storage. We show that this improves the overall query runtime significantly. In particular, this thesis develops the first index-based algorithms for pattern queries expressed with the Match_Recognize clause, a new and powerful language feature of SQL that has received little attention so far. Second, we show how to maximize resource utilization of single nodes by exploiting the capabilities of modern hardware. To this end, we develop a prototypical shared-memory CPU-GPU-enabled event processing system. The system provides implementations of all major event processing operators (filtering, windowed aggregation, windowed join, and sequential pattern matching). Our experiments reveal that, regarding resource utilization and processing throughput, such a hardware-enabled system is superior to hardware-agnostic general-purpose engines. Finally, we present TPStream, a new operator for pattern matching over temporal intervals. TPStream achieves low processing latency and, in contrast to sequential pattern matching, is easily parallelizable even for unpartitioned input streams. This results in maximized resource utilization, especially for modern CPUs with multiple cores.
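
    A tiny sketch of one of the operators discussed above, tumbling-window aggregation over an event stream of timestamped temperature readings. It illustrates the query type only; the index-based offline techniques and the CPU-GPU system from the thesis are not shown.

```python
# Tiny sketch of a tumbling-window aggregation over an event stream. Events are
# (timestamp_seconds, temperature) pairs; windows are fixed and non-overlapping.
from collections import defaultdict

def tumbling_window_avg(events, window_seconds):
    """Group events into fixed, non-overlapping windows and average each one."""
    windows = defaultdict(list)
    for ts, value in events:
        windows[ts // window_seconds].append(value)
    return {w * window_seconds: sum(vals) / len(vals)
            for w, vals in sorted(windows.items())}

stream = [(0, 20.1), (12, 20.4), (61, 22.0), (75, 22.4), (122, 21.0)]
print(tumbling_window_avg(stream, window_seconds=60))
# {0: 20.25, 60: 22.2, 120: 21.0}
```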