3,318 research outputs found

    Supervised manifold distance segmentation

    In this paper, I propose a simple and robust method for image and volume data segmentation based on manifold distance metrics. In this approach, pixels are not treated merely as colour values arranged in a grid: a transform function maps the traditional 2D image or 3D volume to a manifold in a higher-dimensional feature space. Several candidate feature spaces, including position, gradient, and probabilistic measures, are studied and evaluated experimentally. The method combines a graph algorithm with probabilistic classification. Both the time and space complexity of the algorithm are O(N). With an appropriate choice of feature vector, the method produces qualitative and quantitative results comparable to algorithms such as Level Sets and Random Walks. An analysis of sensitivity to parameters is presented, and a comparison between segmentation results and ground-truth images validates the robustness of the method.
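The core idea, treating pixels as points in a feature space and measuring distance along the image manifold rather than straight through colour space, can be sketched as a seeded shortest-path labelling. This is a simplified, single-feature (intensity) stand-in for the method described above, not its implementation: the function and parameter names are our own, and the Dijkstra search used here is O(N log N) rather than the O(N) the thesis achieves.

```python
import heapq
import numpy as np

def manifold_segment(image, seeds, spatial_weight=1.0):
    """Label every pixel with the seed that is nearest by manifold
    (geodesic) distance: distances accumulate along 4-connected grid
    paths, with edge weights taken from feature-space differences.
    `seeds` maps (row, col) -> integer label. Illustrative sketch only.
    """
    h, w = image.shape[:2]
    dist = np.full((h, w), np.inf)
    label = np.zeros((h, w), dtype=int)
    heap = []
    for (r, c), lab in seeds.items():
        dist[r, c] = 0.0
        label[r, c] = lab
        heapq.heappush(heap, (0.0, r, c))
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r, c]:
            continue  # stale heap entry
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < h and 0 <= nc < w:
                # Edge weight: intensity difference plus a small spatial step
                # cost. A richer feature vector (gradient, probabilities)
                # would replace this single term.
                step = abs(float(image[nr, nc]) - float(image[r, c])) + spatial_weight
                nd = d + step
                if nd < dist[nr, nc]:
                    dist[nr, nc] = nd
                    label[nr, nc] = label[r, c]
                    heapq.heappush(heap, (nd, nr, nc))
    return label
```

On a synthetic two-region image, one seed per region is enough: crossing the region boundary costs a large intensity jump, so each pixel stays with the seed on its own side.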

    Database System Acceleration on FPGAs

    Relational database systems provide various services and applications with an efficient means for storing, processing, and retrieving their data. The performance of these systems has a direct impact on the quality of service of the applications that rely on them. Therefore, it is crucial that database systems are able to adapt and grow in tandem with the demands of these applications, ensuring that their performance scales accordingly. In the past, Moore's law and algorithmic advancements have been sufficient to meet these demands. However, with the slowdown of Moore's law, researchers have begun exploring alternative methods, such as application-specific technologies, to satisfy the more challenging performance requirements. One such technology is field-programmable gate arrays (FPGAs), which provide ideal platforms for developing and running custom architectures for accelerating database systems. The goal of this thesis is to develop a domain-specific architecture that can enhance the performance of in-memory database systems when executing analytical queries. Our research is guided by a combination of academic and industrial requirements that seek to strike a balance between generality and performance. The former ensures that our platform can be used to process a diverse range of workloads, while the latter makes it an attractive solution for high-performance use cases. Throughout this thesis, we present the development of a system-on-chip for database system acceleration that meets our requirements. The resulting architecture, called CbMSMK, is capable of processing the projection, sort, aggregation, and equi-join database operators and can also run some complex TPC-H queries. CbMSMK employs a shared sort-merge pipeline for executing all these operators, which results in an efficient use of FPGA resources. This approach enables the instantiation of multiple acceleration cores on the FPGA, allowing it to serve multiple clients simultaneously. 
CbMSMK can process both arbitrarily deep and wide tables efficiently. The former is achieved through the use of the sort-merge algorithm, which utilizes the FPGA RAM for buffering intermediate sort results. The latter is achieved through the use of KeRRaS, a novel variant of the forward radix sort algorithm introduced in this thesis. KeRRaS allows CbMSMK to process a table a few columns at a time, incrementally generating the final result through multiple iterations. Given that acceleration is a key objective of our work, CbMSMK benefits from many performance optimizations. For instance, multi-way merging is employed to reduce the number of merge passes required for the execution of the sort-merge algorithm, thus improving the performance of all our pipeline-breaking operators. Another example is our in-depth analysis of early aggregation, which led to the development of a novel cache-based algorithm that significantly enhances aggregation performance. Our experiments demonstrate that CbMSMK performs on average 5 times faster than the state-of-the-art CPU-based database management system MonetDB.

Table of contents:
I Database Systems & FPGAs
1 INTRODUCTION 1.1 Databases & the Importance of Performance 1.2 Accelerators & FPGAs 1.3 Requirements 1.4 Outline & Summary of Contributions
2 BACKGROUND ON DATABASE SYSTEMS 2.1 Databases 2.1.1 Storage Model 2.1.2 Storage Medium 2.2 Database Operators 2.2.1 Projection 2.2.2 Filter 2.2.3 Sort 2.2.4 Aggregation 2.2.5 Join 2.2.6 Operator Classification 2.3 Database Queries 2.4 Impact of Acceleration
3 BACKGROUND ON FPGAS 3.1 FPGA 3.1.1 Logic Element 3.1.2 Block RAM (BRAM) 3.1.3 Digital Signal Processor (DSP) 3.1.4 IO Element 3.1.5 Programmable Interconnect 3.2 FPGA Design Flow 3.2.1 Specifications 3.2.2 RTL Description 3.2.3 Verification 3.2.4 Synthesis, Mapping, Placement, and Routing 3.2.5 Timing Analysis 3.2.6 Bitstream Generation and FPGA Programming 3.3 Implementation Quality Metrics 3.4 FPGA Cards 3.5 Benefits of Using FPGAs 3.6 Challenges of Using FPGAs
4 RELATED WORK 4.1 Summary of Related Work 4.2 Platform Type 4.2.1 Accelerator Card 4.2.2 Coprocessor 4.2.3 Smart Storage 4.2.4 Network Processor 4.3 Implementation 4.3.1 Loop-based Implementation 4.3.2 Sort-based Implementation 4.3.3 Hash-based Implementation 4.3.4 Mixed Implementation 4.4 A Note on Quantitative Performance Comparisons
II Cache-Based Morphing Sort-Merge with KeRRaS (CbMSMK)
5 OBJECTIVES AND ARCHITECTURE OVERVIEW 5.1 From Requirements to Objectives 5.2 Architecture Overview 5.3 Outline of Part II
6 COMPARATIVE ANALYSIS OF OPENCL AND RTL FOR SORT-MERGE PRIMITIVES ON FPGAS 6.1 Programming FPGAs 6.2 Related Work 6.3 Architecture 6.3.1 Global Architecture 6.3.2 Sorter Architecture 6.3.3 Merger Architecture 6.3.4 Scalability and Resource Adaptability 6.4 Experiments 6.4.1 OpenCL Sort-Merge Implementation 6.4.2 RTL Sorters 6.4.3 RTL Mergers 6.4.4 Hybrid OpenCL-RTL Sort-Merge Implementation 6.5 Summary & Discussion
7 RESOURCE-EFFICIENT ACCELERATION OF PIPELINE-BREAKING DATABASE OPERATORS ON FPGAS 7.1 The Case for Resource Efficiency 7.2 Related Work 7.3 Architecture 7.3.1 Sorters 7.3.2 Sort-Network 7.3.3 X:Y Mergers 7.3.4 Merge-Network 7.3.5 Join Materialiser (JoinMat) 7.4 Experiments 7.4.1 Experimental Setup 7.4.2 Implementation Description & Tuning 7.4.3 Sort Benchmarks 7.4.4 Aggregation Benchmarks 7.4.5 Join Benchmarks 7.5 Summary
8 KERRAS: COLUMN-ORIENTED WIDE TABLE PROCESSING ON FPGAS 8.1 The Scope of Database System Accelerators 8.2 Related Work 8.3 Key-Reduce Radix Sort (KeRRaS) 8.3.1 Time Complexity 8.3.2 Space Complexity (Memory Utilization) 8.3.3 Discussion and Optimizations 8.4 Architecture 8.4.1 MSM 8.4.2 MSMK: Extending MSM with KeRRaS 8.4.3 Payload, Aggregation and Join Processing 8.4.4 Limitations 8.5 Experiments 8.5.1 Experimental Setup 8.5.2 Datasets 8.5.3 MSMK vs. MSM 8.5.4 Payload-Less Benchmarks 8.5.5 Payload-Based Benchmarks 8.5.6 Flexibility 8.6 Summary
9 A STUDY OF EARLY AGGREGATION IN DATABASE QUERY PROCESSING ON FPGAS 9.1 Early Aggregation 9.2 Background & Related Work 9.2.1 Sort-Based Early Aggregation 9.2.2 Cache-Based Early Aggregation 9.3 Simulations 9.3.1 Datasets 9.3.2 Metrics 9.3.3 Sort-Based Versus Cache-Based Early Aggregation 9.3.4 Comparison of Set-Associative Caches 9.3.5 Comparison of Cache Structures 9.3.6 Comparison of Replacement Policies 9.3.7 Cache Selection Methodology 9.4 Cache System Architecture 9.4.1 Window Aggregator 9.4.2 Compressor & Hasher 9.4.3 Collision Detector 9.4.4 Collision Resolver 9.4.5 Cache 9.5 Experiments 9.5.1 Experimental Setup 9.5.2 Resource Utilization and Parameter Tuning 9.5.3 Datasets 9.5.4 Benchmarks on Synthetic Data 9.5.5 Benchmarks on Real Data 9.6 Summary
10 THE FULL PICTURE 10.1 System Architecture 10.2 Benchmarks 10.3 Meeting the Objectives
III Conclusion
11 SUMMARY AND OUTLOOK ON FUTURE RESEARCH 11.1 Summary 11.2 Future Work
BIBLIOGRAPHY, LIST OF FIGURES, LIST OF TABLES
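The multi-way merging optimization mentioned in the abstract above can be illustrated in software. This is a host-side analogue, not CbMSMK's hardware pipeline, and the function names are our own:

```python
import heapq
import math

def multiway_merge(runs):
    """Merge k sorted runs in a single pass with a min-heap, the
    software analogue of a hardware multi-way merger."""
    return list(heapq.merge(*runs))

def merge_passes(num_runs, fan_in):
    """Number of passes over the data needed to fully merge
    `num_runs` sorted runs with a merger of the given fan-in.
    Each pass reduces the run count by a factor of `fan_in`."""
    passes = 0
    while num_runs > 1:
        num_runs = math.ceil(num_runs / fan_in)
        passes += 1
    return passes
```

With 64 runs, a binary merger needs 6 passes while an 8-way merger needs only 2, which is why raising the merge fan-in cuts the memory traffic of every sort-merge-based pipeline-breaking operator.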

    Extracting the Structure and Conformations of Biological Entities from Large Datasets

    In biology, structure determines function, and function often proceeds via changes in conformation. Efficient means for determining structure exist, but mapping conformations continues to present a serious challenge. Single-particle approaches, such as cryogenic electron microscopy (cryo-EM) and emerging diffract-and-destroy X-ray techniques, are in principle ideally positioned to overcome these challenges. But the algorithmic ability to extract information from large heterogeneous datasets consisting of unsorted snapshots - each emanating from an unknown orientation of an object in an unknown conformation - remains elusive. It is the objective of this thesis to describe and validate a powerful suite of manifold-based algorithms able to extract structural and conformational information from large datasets. These computationally efficient algorithms offer a new approach to determining the structure and conformations of viruses and macromolecules. After an introduction, we demonstrate a distributed, exact k-Nearest Neighbor Graph (k-NNG) construction method in order to establish a firm algorithmic basis for manifold-based analysis. The proposed algorithm uses Graphics Processing Units (GPUs), exploits multiple levels of parallelism in a distributed computational environment, and scales to different cluster sizes, with each compute node in the cluster containing multiple GPUs. Next, we present applications of manifold-based analysis in determining structure and conformational variability. Using the Diffusion Map algorithm, a new approach is presented that is capable of determining the structure of symmetric objects, such as viruses, to 1/100th of the object diameter, using low-signal diffraction snapshots. This is demonstrated by means of a successful 3D reconstruction of the Satellite Tobacco Necrosis Virus (STNV) to atomic resolution from simulated diffraction snapshots with and without noise.
We next present a new approach for determining discrete conformational changes of the enzyme Adenylate kinase (ADK) from very large datasets of up to 20 million snapshots, each with ~10^4 pixels. This exceeds by an order of magnitude the largest dataset previously analyzed. Finally, we present a theoretical framework and an algorithmic pipeline for capturing continuous conformational changes of the ribosome from ultralow-signal (-12 dB) experimental cryo-EM data. Our analysis shows a smooth, concerted change in molecular structure in two-dimensional projection, which might be indicative of the way the ribosome functions as a molecular machine. The thesis ends with a summary and future prospects.
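As a point of reference for the k-NNG construction step, here is what the finished graph contains, computed by single-node brute force. The thesis's contribution is the distributed multi-GPU construction; this NumPy sketch, with names of our own choosing, only defines the output:

```python
import numpy as np

def knn_graph(points, k):
    """Exact k-nearest-neighbour graph by brute force.

    Returns an (n, k) index array: row i holds the indices of the
    k points closest to point i (the point itself is excluded).
    O(n^2) in time and memory, so viable only for small n.
    """
    # All pairwise squared Euclidean distances.
    diff = points[:, None, :] - points[None, :, :]
    d2 = np.einsum('ijk,ijk->ij', diff, diff)
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbour
    return np.argsort(d2, axis=1)[:, :k]
```

A distributed exact construction must produce exactly this array, only partitioned across compute nodes and accelerated on their GPUs.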

    Generating renderers

    Most production renderers developed for the film industry are huge pieces of software able to render extremely complex scenes. Unfortunately, they are implemented using the currently available programming models, which are not well suited to modern computing hardware like CPUs with vector units or GPUs. Thus, they have to deal with the added complexity of expressing parallelism and using hardware features in those models. Because compilers cannot, on their own, optimize and generate efficient programs for every type of hardware - the optimization spaces are large and the underlying compiler problems are complex - programmers have to rely on compiler-specific hardware intrinsics or write non-portable code. The consequence of these limitations is that programmers write the same code twice when they need to port their algorithm to a different architecture, and that the code itself becomes difficult to maintain, as algorithmic details are buried under hardware details. Thankfully, there are solutions to this problem, taking the form of Domain-Specific Languages. As their name suggests, these languages are tailored for one domain, and compilers can therefore use domain-specific knowledge to optimize algorithms and choose the best execution policy for a given target hardware. In this thesis, we opt for another way of encoding domain-specific knowledge: We implement a generic, high-level, and declarative rendering and traversal library in a functional language, and later refine it for a target machine by providing partial evaluation annotations. The partial evaluator then specializes the entire renderer according to the available knowledge of the scene: Shaders are specialized when their inputs are known, and in general, all redundant computations are eliminated.
Our results show that the generated renderers are faster and more portable than renderers written with state-of-the-art competing libraries, and that, in comparison, our rendering library requires less implementation effort. This work was supported by the Federal Ministry of Education and Research (BMBF) as part of the Metacca and ProThOS projects, as well as by the Intel Visual Computing Institute (IVCI) and the Cluster of Excellence on Multimodal Computing and Interaction (MMCI) at Saarland University. Parts of it were also co-funded by the European Union (EU) as part of the Dreamspace project.
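The specialization idea described above, folding known shader inputs into the generated code, can be loosely imitated with a closure. This is a Python stand-in, not the thesis's functional-language partial evaluator, and all names are illustrative:

```python
def specialize_shader(light_dir):
    """'Partially evaluate' a diffuse shader for a light direction
    known ahead of time: the direction is unpacked once, and the
    returned function carries no redundant per-call work."""
    lx, ly, lz = light_dir
    def shader(nx, ny, nz):
        # Lambertian term with the light direction baked in.
        return max(0.0, nx * lx + ny * ly + nz * lz)
    return shader
```

A real partial evaluator performs this folding at compile time across the whole renderer, specializing every shader whose inputs are known, rather than one function at run time.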

    Leader Labeling of Employees within Organizations: Descriptions, Daily Patterns, and Contextual Factors

    This study explored how formally assigned organizational leaders perceive their employees, using an explanatory sequential mixed-method approach. Applying the tropes associated with labeling theory (i.e., the perceptual frame within the labeling process) and positive organizational elements (i.e., positive deviance and positive leadership), the research determined what potential labels leaders assign to employees they supervise, examined the degree to which self-assessed positive leaders assign more positive descriptors, and identified contextual factors that influence the leaders’ labeling process. As part of an eligibility process for the study, leaders completed a positive leader self-assessment (n = 62), of which a sample (n = 46) participated in a diary study throughout one workweek. As a group, the leaders assigned positive descriptors to their employees 78% of the time during the study. Leaders who assessed themselves as effective positive leaders (M = 20.42, SD = 4.010) used more positive descriptors than those who did not (M = 15.24, SD = 5.533). Of the descriptors that were considered potential labels, 34% were positive and only 4% were negative. Leader labeling of person-related deviances (rather than job-related) was more likely used to describe extreme traits, behaviors, and emotions that the leader did or did not value. A more meaningful understanding of what labels leaders apply to employees, why they apply them, and whether they relate to self-assessed positivity can improve leadership within organizations. Empowered with this understanding, leaders can improve self-awareness and more positively influence employees.

    Simulation in medical education : a case study evaluating the efficacy of high-fidelity patient simulation

    Indiana University-Purdue University Indianapolis (IUPUI). High-fidelity patient simulation (HFPS) recreates clinical scenarios by combining mock patients and realistic environments to prepare learners with practical experience to meet the demands of modern clinical practice while ensuring patient safety. This research investigated the efficacy of HFPS in medical education through a case study of the Indiana University Bloomington Interprofessional Simulation Center. The goal of this research was to understand the role of simulated learning in attaining clinical self-efficacy and how HFPS training impacts performance. Three research questions were addressed to investigate HFPS in medical education using a mixed methods study design. Clinical competence and self-efficacy were quantified among medical students at IUSM-Bloomington utilizing HFPS, compared to two IUSM campuses that did not incorporate this instructional intervention. Clinical competence was measured as performance on the Objective Structured Clinical Examination (OSCE), while self-efficacy of medical students was measured through a validated questionnaire. Although the effect of HFPS on quantitative results was not definitive, general trends allude to the ability of HFPS to recalibrate learners’ perceived and actual performance. Additionally, perceptual data regarding HFPS from both medical students and medical residents were analyzed. Qualitative results revealed the utility of HFPS for developing the clinical mental framework of a physician, fundamental psychomotor skills, and essential practice in communicating and functioning as a healthcare team during interprofessional education simulations. Continued studies of HFPS are necessary to fully elucidate the value of this instructional adjunct; however, this study found positive outcomes of simulated learning for both medical students and medical residents, contributing to the existing HFPS literature.

    Ore sorting using microwave resonant cavities

    Bibliography: leaves 71-73

    Scalable Hash Tables

    The term scalability, with regard to this dissertation, has two meanings: it means taking the best possible advantage of the provided resources (both computational and memory resources), and it also means scaling data structures in the literal sense, i.e., growing their capacity by “rescaling” the table. Scaling well to computational resources implies constructing the fastest, best-performing algorithms and data structures. On today’s many-core machines, the best performance is immediately associated with parallelism. Since CPU frequencies stopped growing about 10-15 years ago, parallelism is the only way to take advantage of growing computational resources. But for data structures in general, and hash tables in particular, performance is not only linked to faster computations. Most of the execution time is actually spent waiting for memory. Thus, optimizing data structures to reduce the number of memory accesses, or to take better advantage of the memory hierarchy, especially through predictable access patterns and prefetching, is just as important. In terms of scaling the size of hash tables, we have identified three domains where scalable hash-based data structures have previously been lacking: space-efficient growing, concurrent hash tables, and Approximate Membership Query data structures (AMQ filters). Throughout this dissertation, we describe the problems in these areas and develop efficient solutions. We highlight three different libraries that we have developed over the course of this dissertation, each containing multiple implementations that have shown throughout our testing to be among the best implementations in their respective domains. In this composition, they offer a comprehensive toolbox that can be used to solve many kinds of hashing-related problems or to develop individual solutions for further ones. DySECT is a library for space-efficient hash tables, specifically growing hash tables that scale with their input size.
It contains the namesake DySECT data structure in addition to a number of different probing- and cuckoo-based implementations. Growt is a library for highly efficient concurrent hash tables. It contains a very fast base table and a number of extensions to adapt this table to any purpose. All extensions can be combined to create a variety of different interfaces. In our extensive experimental evaluation, each adaptation has been shown to be among the best hash tables for its specific purpose. Lpqfilter is a library for concurrent approximate membership query (AMQ) data structures. It contains some original data structures, like the linear probing quotient filter, as well as some novel approaches to dynamically sized quotient filters.
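For contrast with the libraries described above, the textbook baseline they improve on, an open-addressing table that “rescales” by doubling, looks like this. This is a minimal sketch, not the DySECT or Growt design; the class and parameter names are our own:

```python
class LinearProbingTable:
    """Minimal open-addressing hash table that grows by rescaling:
    when the load factor would exceed `max_load`, capacity doubles
    and every stored element is re-inserted. The full re-insertion
    on every growth step is exactly the cost that space-efficient
    growing schemes try to avoid."""

    def __init__(self, capacity=8, max_load=0.5):
        self.slots = [None] * capacity
        self.count = 0
        self.max_load = max_load

    def _probe(self, key):
        # Linear probing: scan from the hash position until we find
        # the key or an empty slot (never full, since max_load < 1).
        i = hash(key) % len(self.slots)
        while self.slots[i] is not None and self.slots[i][0] != key:
            i = (i + 1) % len(self.slots)
        return i

    def put(self, key, value):
        if (self.count + 1) / len(self.slots) > self.max_load:
            old = [s for s in self.slots if s is not None]
            self.slots = [None] * (2 * len(self.slots))  # rescale
            for k, v in old:
                self.slots[self._probe(k)] = (k, v)
        i = self._probe(key)
        if self.slots[i] is None:
            self.count += 1
        self.slots[i] = (key, value)

    def get(self, key, default=None):
        slot = self.slots[self._probe(key)]
        return slot[1] if slot else default
```

Even this baseline shows the memory-access concern discussed above: linear probing touches consecutive slots, which is cache- and prefetcher-friendly, one reason probing-based layouts remain competitive.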

    Protein sorting to the apical membrane of epithelial cells

    The structure and functions of lipid rafts and the mechanisms of intracellular membrane trafficking are major topics in current cell biological research. Rafts have been proposed to act as sorting platforms during biosynthetic transport, especially along pathways that deliver proteins to the apical membrane of polarised cells. Based on this, the aim of this work was to contribute to the understanding of apical sorting in epithelial cells. The study of how lipid rafts are structured has been hampered by the scarcity of techniques for their purification. Rafts are thought to be partially resistant to solubilisation by mild detergents, which has made the isolation of detergent-resistant membranes (DRMs) the primary method to characterise them biochemically. While a growing number of detergents is being used to prepare DRMs, it is not clear what can be inferred about the native structure of cell membranes from the composition of different DRMs. This issue was addressed by an analysis of DRMs prepared with a variety of mild detergents. The protein and lipid content of different DRMs from two cell lines, Madin-Darby canine kidney (MDCK) and Jurkat cells, was compared. It was shown that the detergents differed considerably in their ability to selectively solubilise membrane proteins and lipids. These results make it unlikely that different DRMs reflect the same underlying principle of membrane organisation. Another obstacle for understanding apical sorting is that the evidence implicating certain proteins in this process has come from various disparate approaches. It would be helpful to re-examine the putative components of the apical sorting machinery in a single experimental system. To this end, a retroviral system for RNA interference (RNAi) in MDCK cells was established. Efficient suppression of thirteen genes was achieved by retroviral co-expression of short hairpin RNAs and a selectable marker. 
In addition, the system was extended to simultaneously target two genes, giving rise to double knockdowns. Retroviral RNAi was applied to deplete proteins implicated in apical sorting. Surprisingly, none of the knockdowns analysed caused defects in surface delivery of influenza virus hemagglutinin, a common marker protein for apical transport. Therefore, none of the proteins examined is absolutely required for transport to the apical membrane of MDCK cells. Cells may adapt to the depletion of proteins involved in membrane trafficking by activating alternative pathways. To avoid such adaptation, a visual transport assay was established. It is based on the adenoviral expression of fluorescent marker proteins whose surface transport can be followed microscopically as soon as RNAi has become effective. With this assay, it should now be possible to screen the knockdowns for defects in surface transport. Taken together, this work has provided a number of experimental tools for the study of membrane trafficking in epithelial cells. First, the biochemical analysis of DRMs highlighted that DRMs obtained with different detergents are unlikely to correspond to distinct types of membrane microdomains in cell membranes. Second, the retroviral RNAi system should be valuable for defining the function of proteins, not only in membrane transport, but also in processes like epithelial polarisation. Third, the visual assay for monitoring the surface transport of adenovirally expressed marker proteins should be suitable to detect defects in polarised sorting.