22,146 research outputs found

    Online horizontal partitioning of heterogeneous data

    In an increasing number of use cases, databases face the challenge of managing heterogeneous data. Heterogeneous data is characterized by a quickly evolving variety of entities without a common set of attributes. These entities do not show enough regularity to be captured in a traditional database schema. A common solution is to centralize the diverse entities in a universal table. Usually, this leads to a very sparse table. Although today's techniques allow efficient storage of sparse universal tables, query efficiency is still a problem. Queries that address only a subset of attributes have to read the whole universal table, including many irrelevant entities. A solution is to partition the table, which allows partitions of irrelevant entities to be pruned before they are touched. Creating and maintaining such a partitioning manually is very laborious or even infeasible, due to the enormous complexity. Thus, an autonomous solution is desirable. In this article, we define the Online Partitioning Problem for heterogeneous data. We sketch how an optimal solution for this problem can be determined based on hypergraph partitioning. Although it leads to the optimal partitioning, the hypergraph approach is inappropriate for an implementation in a database system. We present Cinderella, an autonomous online algorithm for horizontal partitioning of heterogeneous entities in universal tables. Cinderella is designed to keep its overhead low by operating online; it incrementally assigns entities to partitions while they are touched anyway during modifications. This enables a reasonable physical database design at runtime instead of static modeling.

    A regional scale modeling analysis of aerosol and trace gas distributions over the eastern Pacific during the INTEX-B field campaign

    The Sulfur Transport and dEposition Model (STEM) is applied to the analysis of observations obtained during the Intercontinental Chemical Transport Experiment-Phase B (INTEX-B), conducted over the eastern Pacific Ocean during spring 2006. Predicted trace gas and aerosol distributions over the Pacific are presented and discussed in terms of transport and source region contributions. Trace species distributions show a strong west (high) to east (low) gradient, with the bulk of the pollutant transport over the central Pacific occurring between ~20°N and 50°N in the 2-6 km altitude range. These distributions are evaluated in the eastern Pacific by comparison with the NASA DC-8 and NSF/NCAR C-130 airborne measurements along with observations from the Mt. Bachelor (MBO) surface site. Thirty different meteorological, trace gas and aerosol parameters are compared. In general, the meteorological fields are better predicted than gas phase species, which in turn are better predicted than aerosol quantities. PAN is found to be significantly overpredicted over the eastern Pacific, which is attributed to uncertainties in the chemical reaction mechanisms used in current atmospheric chemistry models in general and to the specifically high PAN production in the SAPRC-99 mechanism used in the regional model. A systematic underprediction of the elevated sulfate layer in the eastern Pacific observed by the C-130 is another issue that is identified and discussed. Results from source region tagged CO simulations are used to estimate how the different source regions around the Pacific contribute to the trace gas species distributions. During this period the largest contributions were from China and from fires in South/Southeast and North Asia. For the C-130 flights, which operated off the coast of the Northwest US, the regional CO contributions range as follows: China (35%), South/Southeast Asia fires (35%), North America anthropogenic (20%), and North Asia fires (10%). The transport of pollution into the western US is studied at MBO, and a variety of events with elevated Asian dust, and periods with contributions from China and fires from both Asia and North America, are discussed. The role of heterogeneous chemistry on the composition over the eastern Pacific is also studied. The impacts of heterogeneous reactions at specific times can be significant, increasing sulfate and nitrate aerosol production and reducing gas phase nitric acid levels appreciably (~50%).

    Testing microelectronic biofluidic systems

    According to the 2005 International Technology Roadmap for Semiconductors, the integration of emerging nondigital CMOS technologies will require radically different test methods, posing a major challenge for designers and test engineers. One such technology is microelectronic fluidic (MEF) arrays, which have rapidly gained importance in many biological, pharmaceutical, and industrial applications. The advantages of these systems, such as operation speed, use of very small amounts of liquid, on-board droplet detection, signal conditioning, and vast digital signal processing, make them very promising. However, testable design of these devices in a mass-production environment is still in its infancy, hampering their low-cost introduction to the market. This article describes analog and digital MEF design and testing methods.

    Integrative Dynamic Reconfiguration in a Parallel Stream Processing Engine

    Load balancing, operator instance collocations and horizontal scaling are critical issues in Parallel Stream Processing Engines to achieve low data processing latency, optimized cluster utilization and minimized communication cost, respectively. In previous work, these issues are typically tackled separately and independently. We argue that these problems are tightly coupled in the sense that they all need to determine the allocations of workloads and migrate computational states at runtime. Optimizing them independently would result in suboptimal solutions. Therefore, in this paper, we investigate how these three issues can be modeled as one integrated optimization problem. In particular, we first consider jobs where workload allocations have little effect on the communication cost, and model the problem of load balance as a Mixed-Integer Linear Program. Afterwards, we present an extended solution called ALBIC, which supports general jobs. We implement the proposed techniques on top of Apache Storm, an open-source Parallel Stream Processing Engine. The extensive experimental results over both synthetic and real datasets show that our techniques clearly outperform existing approaches.

    HeteroCore GPU to exploit TLP-resource diversity
