Performance of customized DCT quantization tables on scientific data
We show that it is desirable to use data-specific or customized quantization tables for scaling the spatial frequency coefficients obtained using the Discrete Cosine Transform (DCT). The DCT is widely used for image and video compression (MP89, PM93), but applications typically use default quantization matrices. Using actual scientific data gathered from diverse sources such as spacecraft and electron microscopes, we show that the default compression/quality tradeoffs can be significantly improved upon by using customized tables. We also show that significant improvements are possible for the standard test images Lena and Baboon. This work is part of an effort to develop a practical scheme for optimizing quantization matrices for any given image or video stream, under any given quality or compression constraints.
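As a rough illustration of where a customized table plugs in (not the optimization scheme described above), the following sketch quantizes an 8x8 block by dividing its DCT coefficients by a quantization table and rounding. The table shown is the default JPEG luminance table; a data-specific table would simply replace it.

```python
# Minimal sketch: block-wise DCT quantization with a replaceable table.
import numpy as np
from scipy.fftpack import dct, idct

# Default JPEG luminance quantization table; a "customized" table derived from
# the statistics of the target data would be substituted here.
Q_DEFAULT = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
], dtype=np.float64)

def dct2(block):
    """2-D type-II DCT of an 8x8 block."""
    return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

def idct2(coeffs):
    """Inverse 2-D DCT."""
    return idct(idct(coeffs, axis=0, norm='ortho'), axis=1, norm='ortho')

def quantize_block(block, q_table=Q_DEFAULT):
    """Transform a block and scale each spatial frequency by its table entry."""
    return np.round(dct2(block) / q_table)

def dequantize_block(quantized, q_table=Q_DEFAULT):
    """Undo the scaling and return to the spatial domain."""
    return idct2(quantized * q_table)

# Example: compress/decompress one block and measure the reconstruction error.
block = np.random.rand(8, 8) * 255
restored = dequantize_block(quantize_block(block))
print("mean abs error:", np.abs(block - restored).mean())
```

Swapping Q_DEFAULT for a table tuned to the data changes the compression/quality tradeoff without touching the rest of the pipeline, which is the lever the abstract is concerned with.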
Method and system for data clustering for very large databases
Multi-dimensional data contained in very large databases is efficiently and accurately clustered to determine patterns therein and extract useful information from such patterns. Conventional computer processors with limited memory capacity and conventional operating speed may be used, allowing massive data sets to be processed in a reasonable time and with reasonable computer resources. The clustering process is organized using a clustering feature tree structure, wherein each clustering feature comprises the number of data points in the cluster, the linear sum of the data points in the cluster, and the square sum of the data points in the cluster. A dense region of data points is treated collectively as a single cluster, and points in sparsely occupied regions can be treated as outliers and removed from the clustering feature tree. The clustering can be carried out continuously, with new data points being received and processed and with the clustering feature tree being restructured as necessary to accommodate the information from the newly received data points.
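A minimal sketch of the clustering feature described above (point count, linear sum, and square sum), showing why a dense region can be summarized and merged without retaining raw points. This is an illustration of the summary statistic only, not the patented system or a full clustering feature tree.

```python
# Sketch of a clustering feature (CF): N, linear sum LS, square sum SS.
import numpy as np

class ClusteringFeature:
    def __init__(self, point):
        point = np.asarray(point, dtype=float)
        self.n = 1                      # number of data points in the cluster
        self.ls = point.copy()          # linear sum of the points
        self.ss = float(point @ point)  # square sum of the points

    def add(self, point):
        """Absorb a new data point; all three statistics update incrementally."""
        point = np.asarray(point, dtype=float)
        self.n += 1
        self.ls += point
        self.ss += float(point @ point)

    def merge(self, other):
        """CFs are additive, so two clusters merge without any raw points."""
        self.n += other.n
        self.ls += other.ls
        self.ss += other.ss

    def centroid(self):
        return self.ls / self.n

    def radius(self):
        """RMS distance of the cluster's points from its centroid."""
        c = self.centroid()
        return float(np.sqrt(max(self.ss / self.n - c @ c, 0.0)))

# Example usage
cf = ClusteringFeature([1.0, 2.0])
cf.add([1.5, 1.5])
print(cf.n, cf.centroid(), cf.radius())
```

Because the three statistics are additive, restructuring the tree (splitting and merging nodes) never requires revisiting the original data, which is what makes the incremental, memory-bounded clustering described above possible.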
Multiclass Query Scheduling in Real-Time Database Systems
In recent years, a demand for real-time systems that can manipulate large amounts of shared data has led to the emergence of real-time database systems (RTDBS) as a research area. This paper focuses on the problem of scheduling queries in RTDBSs. We introduce and evaluate a new algorithm called Priority Adaptation Query Resource Scheduling (PAQRS) for handling both single-class and multiclass query workloads. The performance objective of the algorithm is to minimize the number of missed deadlines, while at the same time ensuring that any deadline misses are scattered across the different classes according to an administratively defined miss distribution. This objective is achieved by dynamically adapting the system's admission, memory allocation, and priority assignment policies according to its current resource configuration and workload characteristics. A series of experiments confirms that PAQRS is very effective for real-time query scheduling.
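The following toy sketch illustrates the stated objective rather than the PAQRS algorithm itself: queries are served earliest-deadline-first, but classes that have already absorbed more than their administratively defined share of misses are served ahead of classes still within their share. The class names and structure are illustrative assumptions.

```python
# Toy illustration of deadline scheduling biased by per-class miss shares.
import heapq

class MissBalancedScheduler:
    def __init__(self, target_miss_share):
        # target_miss_share: class name -> fraction of total misses it may absorb
        self.target = target_miss_share
        self.misses = {c: 0 for c in target_miss_share}
        self.queue = []

    def _over_share(self, cls):
        total = sum(self.misses.values()) or 1
        return self.misses[cls] / total > self.target[cls]

    def submit(self, query_id, cls, deadline):
        # Classes already over their miss share sort first (0 < 1), then by deadline.
        key = (0 if self._over_share(cls) else 1, deadline)
        heapq.heappush(self.queue, (key, query_id, cls, deadline))

    def next_query(self, now):
        while self.queue:
            _, query_id, cls, deadline = heapq.heappop(self.queue)
            if deadline < now:            # already missed: record it and move on
                self.misses[cls] += 1
                continue
            return query_id
        return None

# Example usage with two hypothetical classes
sched = MissBalancedScheduler({"interactive": 0.2, "batch": 0.8})
sched.submit(1, "interactive", deadline=5.0)
sched.submit(2, "batch", deadline=3.0)
print(sched.next_query(now=0.0))   # query 2: earlier deadline, no class over its share
```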
Transactional Client-Server Cache Consistency: Alternatives and Performance
Client-server database systems based on a page server model can
exploit client memory resources by caching copies of pages across
transaction boundaries. Caching reduces the need to obtain data from
servers or other sites on the network. In order to ensure that such
caching does not result in the violation of transaction semantics, a cache
consistency maintenance algorithm is required. Many such algorithms have
been proposed in the literature and, as all provide the same
functionality, performance is a primary concern in choosing among them. In
this paper we provide a taxonomy that describes the design space for
transactional cache consistency maintenance algorithms and show how
proposed algorithms relate to one another. We then investigate the
performance of six of these algorithms, and use these results to examine
the tradeoffs inherent in the design choices identified in the taxonomy.
The insight gained in this manner is then used to reflect upon the
characteristics of other algorithms that have been proposed. The results
show that the interactions among dimensions of the design space can impact
performance in many ways, and that classifications of algorithms as simply
"Pessimistic" or "Optimistic" do not accurately characterize the
similarities and differences among the many possible cache consistency
algorithms.
(Also cross-referenced as UMIACS-TR-95-84)
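For concreteness, here is a minimal sketch of one point in that design space: a server-side, invalidation-based (callback-style) scheme in which the server tracks which clients cache each page and invalidates remote copies when a writer commits. It illustrates inter-transaction caching with consistency maintenance in general, not any specific algorithm evaluated in the paper.

```python
# Sketch of invalidation-based cache consistency in a page-server model.
from collections import defaultdict

class PageServer:
    def __init__(self):
        self.cached_at = defaultdict(set)   # page_id -> set of client ids

    def register_cache(self, client_id, page_id):
        """A client retains this page across transaction boundaries."""
        self.cached_at[page_id].add(client_id)

    def commit(self, writer_id, written_pages, clients):
        """On commit, invalidate every other client's cached copy."""
        for page_id in written_pages:
            for client_id in list(self.cached_at[page_id]):
                if client_id != writer_id:
                    clients[client_id].invalidate(page_id)
                    self.cached_at[page_id].discard(client_id)

class Client:
    def __init__(self):
        self.cache = {}                     # page_id -> page contents

    def invalidate(self, page_id):
        self.cache.pop(page_id, None)

# Example: client 0's stale copy is invalidated when client 1 commits a write.
server, c1, c2 = PageServer(), Client(), Client()
c1.cache["p1"] = "v1"; server.register_cache(0, "p1")
c2.cache["p1"] = "v1"; server.register_cache(1, "p1")
server.commit(writer_id=1, written_pages=["p1"], clients={0: c1, 1: c2})
print("p1" in c1.cache)   # False
```

Whether invalidations are sent eagerly at commit (as here), deferred, or replaced by validation at access time is exactly the kind of design dimension the taxonomy above is organized around.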
New science on the Open Science Grid
The Open Science Grid (OSG) includes work to enable new science, new scientists, and new modalities in support of computationally based research. The transformation from the existing to the new frequently requires significant sociological and organizational changes. OSG leverages its deliverables to the large-scale physics experiment member communities to benefit new communities at all scales through activities in education, engagement, and the distributed facility. This paper gives both a brief general description and specific examples of new science enabled on the OSG. More information is available at the OSG web site: www.opensciencegrid.org.
Flexible Session Management in a Distributed Environment
Many secure communication libraries used by distributed systems, such as SSL,
TLS, and Kerberos, fail to make a clear distinction between the authentication,
session, and communication layers. In this paper we introduce CEDAR, the secure
communication library used by the Condor High Throughput Computing software,
and present the advantages to a distributed computing system resulting from
CEDAR's separation of these layers. Regardless of the authentication method
used, CEDAR establishes a secure session key, which has the flexibility to be
used for multiple capabilities. We demonstrate how a layered approach to
security sessions can avoid round-trips and latency inherent in network
authentication. The creation of a distinct session management layer allows for
optimizations to improve scalability by way of delegating sessions to other
components in the system. This session delegation creates a chain of trust that
reduces the overhead of establishing secure connections and enables centralized
enforcement of system-wide security policies. Additionally, secure channels
based upon UDP datagrams are often overlooked by existing libraries; we show
how CEDAR's structure accommodates this as well. As an example of the utility
of this work, we show how the use of delegated security sessions and other
techniques inherent in CEDAR's architecture enables US CMS to meet their
scalability requirements in deploying Condor over large-scale, wide-area grid
systems.
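A minimal sketch of the layering idea, not CEDAR's actual API: authentication runs once to establish a cached session key, the key is then used for per-message protection, and a session can be exported to (delegated to) another component so that it can skip authentication entirely. The peer name and the authenticate callback are placeholders.

```python
# Sketch of a session layer decoupled from the authentication layer.
import os, hmac, hashlib

class SessionCache:
    def __init__(self):
        self.sessions = {}                    # peer -> session key

    def establish(self, peer, authenticate):
        """Run the (possibly slow) authentication only if no session exists."""
        if peer not in self.sessions:
            authenticate(peer)                # e.g. a round-trip-heavy exchange
            self.sessions[peer] = os.urandom(32)
        return self.sessions[peer]

    def mac(self, peer, message):
        """Integrity-protect a message under the cached session key."""
        return hmac.new(self.sessions[peer], message, hashlib.sha256).hexdigest()

    def export(self, peer):
        """Delegate the session: hand the key to a trusted component."""
        return peer, self.sessions[peer]

    def import_session(self, peer, key):
        """The delegate installs the session without re-authenticating."""
        self.sessions[peer] = key

# Example usage (hypothetical peer name, trivial placeholder authentication)
cache = SessionCache()
cache.establish("schedd.example.org", authenticate=lambda peer: None)
print(cache.mac("schedd.example.org", b"status query"))
```

Because only the key, not the authentication exchange, needs to travel, the delegate avoids the authentication round trips while the original component remains the root of the chain of trust.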
Validation of rice genome sequence by optical mapping
Background: Rice feeds much of the world and possesses the simplest genome analyzed to date within the grass family, making it an economically relevant model system for other cereal crops. Although the rice genome is sequenced, validation and gap-closing efforts require purely independent means for accurate finishing of sequence build data.
Results: To facilitate ongoing sequence finishing and validation efforts, we have constructed a whole-genome SwaI optical restriction map of the rice genome. The physical map consists of 14 contigs, covering 12 chromosomes, with a total genome size of 382.17 Mb; this value is about 11% smaller than original estimates. Nine of the 14 optical map contigs are without gaps, covering chromosomes 1, 2, 3, 4, 5, 7, 8, 10, and 12 in their entirety, including centromeres and telomeres. Alignments between optical and in silico restriction maps constructed from IRGSP (International Rice Genome Sequencing Project) and TIGR (The Institute for Genomic Research) genome sequence sources are comprehensive and informative, evidenced by map coverage across virtually all published gaps, discovery of new ones, and characterization of sequence misassemblies, all totalling ~14 Mb. Furthermore, since optical maps are ordered restriction maps, identified discordances are pinpointed on a reliable physical scaffold, providing an independent resource for closure of gaps and rectification of misassemblies.
Conclusion: Analysis of sequence and optical mapping data effectively validates genome sequence assemblies constructed from large, repeat-rich genomes. Given this conclusion, we envision new applications of such single-molecule analysis that will merge the advantages offered by high-resolution optical maps with the inexpensive but short sequence reads generated by emerging sequencing platforms. Lastly, the map construction techniques presented here point the way to new types of comparative genome analysis that would focus on discernment of structural differences revealed by optical maps constructed from a broad range of rice subspecies and varieties.
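A small sketch of the in silico side of such a comparison: computing an ordered restriction map (fragment lengths between SwaI sites, recognition sequence ATTTAAAT) directly from sequence, which could then be aligned against an optical map. This is illustrative only and is not the alignment pipeline used in the study.

```python
# Sketch: in silico SwaI restriction map (ordered fragment lengths) of a sequence.
import re

SWAI_SITE = "ATTTAAAT"   # SwaI recognition sequence (blunt-cutting 8-cutter)

def in_silico_map(sequence, site=SWAI_SITE):
    """Return ordered fragment lengths from a complete digest (blunt cut mid-site)."""
    sequence = sequence.upper()
    cut_positions = [m.start() + len(site) // 2 for m in re.finditer(site, sequence)]
    boundaries = [0] + cut_positions + [len(sequence)]
    return [b - a for a, b in zip(boundaries, boundaries[1:]) if b > a]

# Example on a toy sequence with two sites: fragments of 8, 14, and 6 bases
print(in_silico_map("GGGG" + "ATTTAAAT" + "CCCCCC" + "ATTTAAAT" + "TT"))
```

Discordances between such a sequence-derived map and the measured optical map (missing cuts, extra cuts, or fragment-length differences) are what flag gaps and misassemblies on the physical scaffold.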
Practical resource monitoring for robust high throughput computing
Robust high throughput computing requires effective monitoring and enforcement of a variety of resources, including CPU cores, memory, disk, and network traffic. Without effective monitoring and enforcement, it is easy to overload machines, causing failures and slowdowns, or to underutilize machines, which results in wasted opportunities. This paper explores how to describe, measure, and enforce resources used by computational tasks. We focus on tasks running in distributed execution systems, in which a task requests the resources it needs and the execution system ensures the availability of such resources. This presents two non-trivial problems: how to measure the resources consumed by a task, and how to monitor and report resource exhaustion in a robust and timely manner. For both of these problems, operating systems have a variety of mechanisms with different degrees of availability, accuracy, overhead, and intrusiveness. We describe various forms of monitoring and the available mechanisms in contemporary operating systems. We then present two specific monitoring tools that choose different tradeoffs in overhead and accuracy, and evaluate them on a selection of benchmarks.
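As a concrete example of the polling end of that tradeoff (low intrusiveness, accuracy bounded by the sampling interval), the sketch below runs a command, samples its resident memory with the third-party psutil package, and kills it when a limit is exceeded. It is not one of the tools presented in the paper, and the command and limits are arbitrary.

```python
# Sketch: polling-based resource monitoring and enforcement for a child process.
import subprocess, time, psutil

def run_with_limits(cmd, max_rss_bytes, max_wall_seconds, interval=0.5):
    """Run cmd, sampling memory periodically; kill it if a limit is exceeded."""
    proc = psutil.Popen(cmd)
    start = time.time()
    peak_rss = 0
    while proc.poll() is None:
        try:
            rss = proc.memory_info().rss      # resident set size in bytes
        except psutil.NoSuchProcess:
            break
        peak_rss = max(peak_rss, rss)
        if rss > max_rss_bytes or time.time() - start > max_wall_seconds:
            proc.kill()                       # enforcement: stop the runaway task
            break
        time.sleep(interval)                  # interval trades accuracy for overhead
    return proc.wait(), peak_rss

# Example: a short sleep well within the limits
print(run_with_limits(["sleep", "1"], max_rss_bytes=512 * 1024**2, max_wall_seconds=10))
```

Shortening the interval improves accuracy and timeliness of detection but raises overhead, which is exactly the tradeoff the paper's two tools resolve differently.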
The CMS Integration Grid Testbed
The CMS Integration Grid Testbed (IGT) comprises USCMS Tier-1 and Tier-2
hardware at the following sites: the California Institute of Technology, Fermi
National Accelerator Laboratory, the University of California at San Diego, and
the University of Florida at Gainesville. The IGT runs jobs using the Globus
Toolkit with a DAGMan and Condor-G front end. The virtual organization (VO) is
managed using VO management scripts from the European Data Grid (EDG). Gridwide
monitoring is accomplished using local tools such as Ganglia interfaced into
the Globus Metadata Directory Service (MDS) and the agent-based MonALISA.
Domain-specific software is packaged and installed using the Distribution
After Release (DAR) tool of CMS, while middleware under the auspices of the
Virtual Data Toolkit (VDT) is distributed using Pacman. During a continuous
two month span in Fall of 2002, over 1 million official CMS GEANT based Monte
Carlo events were generated and returned to CERN for analysis while being
demonstrated at SC2002. In this paper, we describe the process that led to one
of the world's first continuously available, functioning grids.
(Comment: CHEP 2003 MOCT01)