197 research outputs found

    Parallel Rendering and Large Data Visualization

    We are living in the big data age: an ever-increasing amount of data is being produced through data acquisition and computer simulations. While large-scale analysis and simulation have received significant attention in cloud and high-performance computing, software to efficiently visualise large data sets is struggling to keep up. Visualization has proven to be an efficient tool for understanding data; in particular, visual analysis is a powerful way to gain intuitive insight into the spatial structure and relations of 3D data sets. Large-scale visualization setups are becoming ever more affordable, high-resolution tiled display walls are within reach even for small institutions, and virtual reality has arrived in the consumer space, making it accessible to a large audience. This thesis addresses these developments by advancing the field of parallel rendering. We formalise the design of system software for large data visualization through parallel rendering, provide a reference implementation of a parallel rendering framework, introduce novel algorithms to accelerate the rendering of large amounts of data, and validate this research and development with new applications for large data visualization. Applications built using our framework enable domain scientists and large data engineers to better extract meaning from their data, making it feasible to explore more data and enabling the use of high-fidelity visualization installations to see more detail of the data. Comment: PhD thesis
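    A standard building block in this area, which the abstract's "parallel rendering" refers to in general terms, is sort-last compositing: each render node draws its own subset of the data, and per-pixel depth comparison merges the partial images. The sketch below illustrates only that general technique; the data layout and names are hypothetical and are not taken from the thesis's framework.

```python
# Illustrative sketch of sort-last compositing (a standard parallel-rendering
# technique, not the framework's actual implementation). Each node renders its
# data subset into a (colour, depth) buffer; compositing keeps, per pixel,
# the fragment closest to the viewer (smallest depth).

def composite(buffers):
    """Merge per-node (colour, depth) buffers into one final image."""
    width = len(buffers[0])
    final = []
    for px in range(width):
        # Pick the fragment with the smallest depth value at this pixel.
        colour, depth = min((buf[px] for buf in buffers), key=lambda cd: cd[1])
        final.append(colour)
    return final

# Two nodes rendered overlapping geometry; per pixel, the nearest fragment wins.
node0 = [("red", 5.0), ("green", 1.0), ("bg", 9.9)]
node1 = [("blue", 2.0), ("yellow", 3.0), ("bg", 9.9)]
print(composite([node0, node1]))  # → ['blue', 'green', 'bg']
```

    In a real system the compositing step itself is parallelised and pipelined across nodes, which is where much of the algorithmic work in parallel rendering lies.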

    On pulsar radio emission

    This work aims to contribute to the understanding of the radio emission of pulsars. Pulsars are neutron stars with a radius of about 10^6 cm and a mass of about one to three solar masses that rotate with periods between seconds and milliseconds. They exhibit tremendous magnetic fields of 10^8 to 10^13 Gauss. These fields facilitate the conversion of rotational energy into mainly dipole radiation, X-ray emission and the pulsar wind. Less than a thousandth of the total energy loss is emitted as radio emission. This contribution, however, is generated by a collective plasma radiation process that acts coherently on time scales of nanoseconds and below. Since the topic has been an active field of research for nearly half a century, Chapter 1 introduces the resulting theoretical concepts and ideas for an emission process and for the appearance of the so-called "magnetosphere", the plasma-filled volume around a pulsar. We show that many basic questions have been answered satisfactorily. Questions concerning the emission process, however, remain subject to uncertainty; in particular, the exact energy source of the radio emission is unclear. The early works of Goldreich and Julian [1969] and Ruderman and Sutherland [1975] predict that high electric fields arise which are capable of driving a strong electric current. To supply the energy that powers the radio emission, rather mildly relativistic particle energies and a moderate current are favourable. How the system converts current into flow is unclear. In fact, the earlier theories are contradicted by recent simulations, which also do not predict a relativistic flow near the pulsar. In Chapter 2 we examine the observed radiation and its form, especially in light of the models outlined above. We note that the radio emission is generated on extremely short time scales, comparable to the inverse of the plasma frequency.
    We elaborate why this places high demands on the theoretical models, leaving in fact only one viable candidate process. We conclude that profound questions of energy flow and energy source remain unanswered by current theory. Furthermore, the compression of the available energy in space and time down to a few centimetres and nanoseconds remains unexplained, especially given that only a small fraction of the theoretically available energy is actually converted. Since the fluctuations relevant to this compression of energy take place on an intermediate scale of nanoseconds to micro- and milliseconds, it should be possible to detect them observationally. To facilitate this, Chapter 3 analyses the statistics of the receiver equation of radio radiation, which is also relevant to other topics of pulsar research. The results presented in Chapter 4 show that the developed Bayesian method outperforms conventional methods at extracting parameters from observational data in both precision and accuracy. For example, the method weights rotation-phase measurements differently than conventional techniques and assigns more accurate error estimates to single measurements. This is of great relevance to gravitational-wave searches with so-called "pulsar timing arrays", since the validity of the total measurement depends substantially on understanding the accuracy assigned to the single observations. However, the work on single-observation data with Bayesian techniques also exposes the numerical limits of this method. It is desirable to enable algorithms to include single-observation data in the analysis. Chapter 5 therefore presents a runtime library we developed that writes currently unneeded data out to hard disk and is capable of managing huge data sets (substantial fractions of the hard disk space, not just the main memory). This library has been written in a generic form so that it can also be used in other data-intensive areas of research.
    While this lays the foundations for evaluating fluctuation models against observational data, Chapter 6 approaches the problem from theoretical grounds. We propose that the energetic coupling of the radio emission could be of magnetic origin, as this is also a relevant mechanism in solar-flare physics. We argue in a general way that, for topological reasons, the rotation of the pulsar pumps energy into the magnetic field. This energy can be released again by current decay. We show that even the annihilation of electrons and positrons may suffice to generate radio emission on non-negligible energy scales. This mechanism does not depend on relativistic flow and thus does not suffer from the problem of requiring high kinetic particle energies. We conclude that the existing gaps in the theory of the radio emission process could possibly be closed in the future if observational data are analysed with greater statistical accuracy and, especially, if more effort is put into understanding the problem of energy transport. This thesis serves as an example that scientific investigation of a very theoretical question, such as the origin of radio emission, can lead to results that may be used directly in other areas of research.
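    The abstract states that the Bayesian method of Chapter 4 weights single phase measurements by their individual accuracy, unlike conventional techniques. As a minimal stand-in for that idea (not the thesis's actual method), the standard inverse-variance weighted mean shows the effect: precise measurements dominate the estimate, noisy ones contribute little.

```python
# Inverse-variance weighting: each measurement contributes in proportion to
# 1/sigma^2, so a measurement that is 10x more precise carries 100x the weight.
# This is a generic statistical illustration, not the method from Chapter 4.

def weighted_mean(values, sigmas):
    weights = [1.0 / s**2 for s in sigmas]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

phases = [0.10, 0.50]   # two phase measurements (arbitrary units)
sigmas = [0.01, 0.10]   # the first measurement is 10x more precise

plain = sum(phases) / len(phases)          # conventional unweighted average
weighted = weighted_mean(phases, sigmas)   # accuracy-aware average
print(plain, weighted)  # the weighted estimate stays close to 0.10
```

    The same reasoning explains the relevance to pulsar timing arrays: the combined timing solution is only as trustworthy as the error estimate attached to each single observation.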

    Case Studies on Optimizing Algorithms for GPU Architectures

    Modern GPUs are complex, massively multi-threaded, and high-performance. Programmers naturally gravitate towards taking advantage of this high performance to achieve faster results. To do so successfully, however, programmers must first understand and then master a new set of skills: writing parallel code, using different types of parallelism, adapting to GPU architectural features, and understanding issues that limit performance. To ease this learning process and help GPU programmers become productive more quickly, this dissertation introduces three data access skeletons (DASks) -- Block, Column, and Row -- and two block access skeletons (BASks) -- Block-by-Block and Warp-by-Warp. Each "skeleton" provides a high-performance implementation framework that partitions data arrays into data blocks and then iterates over those blocks. The programmer must still write "body" methods on individual data blocks to solve their specific problem. These skeletons provide efficient, machine-dependent data access patterns for use on GPUs. DASks group n data elements into m fixed-size data blocks. These m data blocks are then partitioned across p thread blocks using a 1D or 2D layout pattern. The fixed-size data blocks are parameterized by three C++ template parameters -- nWork, WarpSize, and nWarps. Generic programming techniques use these three parameters to enable performance experiments on three different types of parallelism: instruction-level parallelism (ILP), data-level parallelism (DLP), and thread-level parallelism (TLP). The different DASks and BASks are introduced using a simple memory I/O (Copy) case study. A nearest-neighbor search case study resulted in the development of DASks and BASks but does not use these skeletons itself. Three additional case studies -- Reduce/Scan, Histogram, and Radix Sort -- demonstrate DASks and BASks in action on parallel primitives and provide further valuable performance lessons. Doctor of Philosophy
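    The partitioning arithmetic described above (n elements grouped into fixed-size blocks of nWork * WarpSize * nWarps elements, dealt out across p thread blocks) can be sketched on the host side. The Python below is only an illustration of that index mapping under assumed parameter values; the actual skeletons are C++/CUDA templates, and a 1D round-robin layout is assumed here.

```python
# Host-side sketch of a Block-By-Block data access pattern: how m fixed-size
# data blocks map onto p thread blocks. The parameter names (nWork, WarpSize,
# nWarps) come from the abstract; the round-robin 1D layout is an assumption.

def block_by_block(n, p, nWork=4, WarpSize=32, nWarps=2):
    block_size = nWork * WarpSize * nWarps      # elements per data block
    m = (n + block_size - 1) // block_size      # number of data blocks (ceil)
    # Thread block tb iterates over data blocks tb, tb + p, tb + 2p, ...
    return {tb: list(range(tb, m, p)) for tb in range(p)}

schedule = block_by_block(n=4096, p=3)
# 4096 elements at 4*32*2 = 256 elements per block -> 16 data blocks
print(schedule[0])  # → [0, 3, 6, 9, 12, 15]
```

    Varying nWork (work items per thread), WarpSize, and nWarps changes the block size and thus trades ILP against TLP, which is exactly the kind of experiment the template parameterization enables.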

    NLP Methods in Host-based Intrusion Detection Systems: A Systematic Review and Future Directions

    A Host-based Intrusion Detection System (HIDS) is an effective last line of defense against cyber security attacks after perimeter defenses (e.g., Network-based Intrusion Detection Systems and firewalls) have failed or been bypassed. HIDS is widely adopted in industry, ranking among the top two most used security tools in organizations' Security Operation Centers (SOCs). Although an effective and efficient HIDS is highly desirable for industrial organizations, the evolution of increasingly complex attack patterns causes several challenges that degrade HIDS performance (e.g., high false-alert rates creating alert fatigue for SOC staff). Since Natural Language Processing (NLP) methods are well suited to identifying complex attack patterns, an increasing number of HIDS leverage advances in NLP that have shown effective and efficient performance in precisely detecting low-footprint, zero-day attacks and predicting the next steps of attackers. This active research trend of using NLP in HIDS demands a synthesized and comprehensive body of knowledge on NLP-based HIDS. We therefore conducted a systematic review of the literature on the end-to-end pipeline of NLP use in HIDS development. For this pipeline, we identify, taxonomically categorize and systematically compare the state of the art of NLP method usage in HIDS, the attacks detected by these NLP methods, and the datasets and evaluation metrics used to evaluate NLP-based HIDS. We highlight the relevant prevalent practices, considerations, advantages and limitations to support HIDS developers, and we outline future research directions for NLP-based HIDS development.
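    The connection between NLP and HIDS becomes concrete once a host's system-call trace is treated as a token sequence. The classic fixed-window baseline, sketched below, flags traces containing n-grams never seen during normal operation; the NLP-based systems surveyed above replace this with far richer sequence models, so this is only an illustrative baseline, not any surveyed system.

```python
# Minimal n-gram anomaly detector over system-call traces: a trace is a
# sequence of tokens, "training" records all n-grams seen in normal traces,
# and the anomaly score of a new trace is the fraction of its n-grams that
# were never observed during normal operation.

def ngrams(trace, n=3):
    return {tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)}

def train(normal_traces, n=3):
    model = set()
    for trace in normal_traces:
        model |= ngrams(trace, n)
    return model

def anomaly_score(model, trace, n=3):
    grams = ngrams(trace, n)
    return len(grams - model) / len(grams)   # fraction of never-seen n-grams

normal = [["open", "read", "read", "close"],
          ["open", "read", "write", "close"]]
model = train(normal)
print(anomaly_score(model, ["open", "read", "read", "close"]))   # → 0.0
print(anomaly_score(model, ["open", "exec", "socket", "close"])) # → 1.0
```

    Thresholding this score is what produces alerts, and the false-alert rate mentioned above is precisely the cost of setting that threshold too aggressively.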

    Achieving Energy Efficiency on Networking Systems with Optimization Algorithms and Compressed Data Structures

    To cope with the increasing quantity, capacity and energy consumption of transmission and routing equipment in the Internet, the energy efficiency of communication networks has attracted more and more attention from researchers around the world. In this dissertation, we propose three methodologies for achieving energy efficiency on networking devices: heuristics for NP-complete optimization problems, compressed data structures, and a combination of the two. We first consider the problem of achieving energy efficiency in Data Center Networks (DCNs). We generalize the energy-efficient networking problem in data centers as an optimal flow assignment problem, which is NP-complete, and then propose CARPO, a correlation-aware power optimization algorithm that dynamically consolidates traffic flows onto a small set of links and switches in a DCN and then shuts down unused network devices for power savings. We then achieve energy efficiency on Internet routers by using a compressed data structure: a novel structure called the Probabilistic Bloom Filter (PBF), which extends the classical Bloom filter in a probabilistic direction so that it can effectively identify heavy hitters with a small memory footprint, reducing the energy consumption of network measurement. To achieve energy efficiency on Wireless Sensor Networks (WSNs), we developed a data collection protocol called EDAL (Energy-efficient Delay-aware Lifetime-balancing data collection). Based on the Open Vehicle Routing problem, EDAL exploits the topology requirements of Compressive Sensing (CS) and then applies CS to save more energy on sensor nodes.
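    The PBF itself is defined in the dissertation; as a generic illustration of the underlying goal (identifying heavy-hitter flows in small, fixed memory), the sketch below uses a count-min sketch, a related and well-known summary structure, not the PBF. Each flow increments one counter per hash row, and a flow's estimated count is the minimum over its counters, which can over-count due to collisions but never under-counts.

```python
# Count-min sketch for heavy-hitter detection in O(rows * width) memory,
# independent of the number of distinct flows. Shown as a stand-in for the
# general small-memory heavy-hitter idea, not the dissertation's PBF.
import hashlib

class CountMinSketch:
    def __init__(self, rows=4, width=64):
        self.rows, self.width = rows, width
        self.table = [[0] * width for _ in range(rows)]

    def _index(self, row, item):
        # Derive an independent-ish hash per row by salting with the row id.
        h = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(h[:4], "big") % self.width

    def add(self, item, count=1):
        for r in range(self.rows):
            self.table[r][self._index(r, item)] += count

    def estimate(self, item):
        return min(self.table[r][self._index(r, item)] for r in range(self.rows))

cms = CountMinSketch()
cms.add("flow-A", count=1000)   # heavy hitter
cms.add("flow-B")               # mouse flow
print(cms.estimate("flow-A") >= 1000)  # → True: never under-counts
```

    On a router line card, such a structure replaces per-flow state, which is what yields the memory (and hence energy) savings for network measurement.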

    Goddard Conference on Mass Storage Systems and Technologies, Volume 1

    Copies of nearly all of the technical papers and viewgraphs presented at the Goddard Conference on Mass Storage Systems and Technologies, held in September 1992, are included. The conference served as an information exchange forum for topics primarily relating to the ingestion and management of massive amounts of data and the attendant problems (data ingestion rates now approach the order of terabytes per day). Discussion topics include the IEEE Mass Storage System Reference Model, data archiving standards, high-performance storage devices, magnetic and magneto-optic storage systems, magnetic and optical recording technologies, high-performance helical-scan recording systems, and low-end helical-scan tape drives. Additional topics addressed the evolution of the identifiable unit for processing purposes as data ingestion rates increase dramatically, and the present state of the art in mass storage technology.

    Content-aware compression for big textual data analysis

    A substantial amount of information on the Internet is present in the form of text. The value of this semi-structured and unstructured data has been widely acknowledged, with consequent scientific and commercial exploitation. The ever-increasing rate of data production, however, pushes data analytic platforms to their limits. This thesis proposes techniques for more efficient textual big data analysis suitable for the Hadoop analytic platform. The research explores the direct processing of compressed textual data, focusing on novel compression methods with a number of desirable properties to support text-based big data analysis in distributed environments. The novel contributions of this work include the following. Firstly, a Content-aware Partial Compression (CaPC) scheme is developed. CaPC distinguishes between informational and functional content and compresses only the informational content; the compressed data thus remains transparent to existing software libraries, which often rely on functional content to work. Secondly, a context-free, bit-oriented compression scheme (Approximated Huffman Compression) based on the Huffman algorithm is developed. It uses a hybrid data structure that allows pattern searching in compressed data in linear time. Thirdly, several modern compression schemes are extended so that the compressed data can be safely split with respect to logical data records in distributed file systems. Furthermore, an innovative two-layer compression architecture is used, in which each compression layer is appropriate for the corresponding stage of data processing. Peripheral libraries are developed that seamlessly link the proposed compression schemes to existing analytic platforms and computational frameworks, and also make the use of the compressed data transparent to developers.
    The compression schemes have been evaluated on a number of standard MapReduce analysis tasks using a collection of real-world datasets. In comparison with existing solutions, they show substantial improvements in performance and significant reductions in system resource requirements.
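    The informational/functional distinction at the heart of CaPC can be shown in miniature. In the sketch below, frequent informational tokens are replaced by short codes while functional content (here, tab delimiters) is left verbatim, so record- and field-splitting logic keeps working on the compressed form. The dictionary and code bytes are hypothetical; this is only the general idea, not CaPC's actual encoding.

```python
# Content-aware partial compression in miniature: compress word tokens,
# leave delimiters untouched so downstream splitting still works.
# The code table below is a made-up example, not the scheme from the thesis.

CODES = {"ERROR": "\x01", "WARNING": "\x02", "connection": "\x03"}
DECODES = {v: k for k, v in CODES.items()}

def compress(line):
    # Only word tokens are rewritten; tab delimiters survive verbatim.
    return "\t".join(CODES.get(tok, tok) for tok in line.split("\t"))

def decompress(line):
    return "\t".join(DECODES.get(tok, tok) for tok in line.split("\t"))

record = "2024-01-01\tERROR\tconnection\trefused"
packed = compress(record)
assert "\t" in packed                  # delimiters intact: still splittable
assert decompress(packed) == record    # lossless round trip
print(len(packed), "<", len(record))   # fewer bytes, same structure
```

    Because the functional structure survives, a MapReduce input format can split and tokenize the compressed records exactly as it would the plain text, which is the transparency property the thesis exploits.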