The Helioseismic and Magnetic Imager (HMI) Vector Magnetic Field Pipeline: SHARPs -- Space-weather HMI Active Region Patches
A new data product from the Helioseismic and Magnetic Imager (HMI) onboard
the Solar Dynamics Observatory (SDO) called Space-weather HMI Active Region
Patches (SHARPs) is now available. SDO/HMI is the first space-based instrument
to map the full-disk photospheric vector magnetic field with high cadence and
continuity. The SHARP data series provide maps in patches that encompass
automatically tracked magnetic concentrations for their entire lifetime; map
quantities include the photospheric vector magnetic field and its uncertainty,
along with Doppler velocity, continuum intensity, and line-of-sight magnetic
field. Furthermore, keywords in the SHARP data series provide several
parameters that concisely characterize the magnetic-field distribution and its
deviation from a potential-field configuration. These indices may be useful for
active-region event forecasting and for identifying regions of interest. The
indices are calculated per patch and are available on a twelve-minute cadence.
Quick-look data are available within approximately three hours of observation;
definitive science products are produced approximately five weeks later. SHARP
data are available at http://jsoc.stanford.edu and maps are available in either
of two different coordinate systems. This article describes the SHARP data
products and presents examples of SHARP data and parameters.
Comment: 27 pages, 7 figures. Accepted to Solar Physics.
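As an illustration of the kind of summary index carried in the SHARP keywords, the toy sketch below computes a total-unsigned-flux-like quantity from a patch's radial-field map. The pixel area, the function name, and the simple sum are illustrative assumptions, not the pipeline's actual implementation.

```python
# Toy sketch: a SHARP-like total unsigned flux index computed from a
# patch's radial-field map. The pixel area below is an assumed constant,
# not the pipeline's actual CEA pixel area.

PIXEL_AREA_CM2 = 1.33e15  # assumed area of one map pixel, cm^2

def total_unsigned_flux(br_map, pixel_area_cm2=PIXEL_AREA_CM2):
    """Sum |B_r| (gauss) over all pixels and convert to maxwells."""
    return sum(abs(b) for row in br_map for b in row) * pixel_area_cm2

patch = [[100.0, -250.0], [0.0, 50.0]]  # gauss; 2x2 toy patch
flux = total_unsigned_flux(patch)       # 400 G x 1.33e15 cm^2 = 5.32e17 Mx
```

In the real data series such indices are precomputed and stored as keywords, so users query them directly rather than recomputing from the maps.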
The Helioseismic and Magnetic Imager (HMI) Vector Magnetic Field Pipeline: Overview and Performance
The Helioseismic and Magnetic Imager (HMI) began near-continuous full-disk
solar measurements on 1 May 2010 from the Solar Dynamics Observatory (SDO). An
automated processing pipeline keeps pace with observations to produce
observable quantities, including the photospheric vector magnetic field, from
sequences of filtergrams. The primary 720 s observables were released in
mid-2010, including Stokes polarization parameters measured at six wavelengths as
well as intensity, Doppler velocity, and the line-of-sight magnetic field. More
advanced products, including the full vector magnetic field, are now available.
Automatically identified HMI Active Region Patches (HARPs) track the location
and shape of magnetic regions throughout their lifetime.
The vector field is computed using the Very Fast Inversion of the Stokes
Vector (VFISV) code optimized for the HMI pipeline; the remaining 180 degree
azimuth ambiguity is resolved with the Minimum Energy (ME0) code. The
Milne-Eddington inversion is performed on all full-disk HMI observations. The
disambiguation, until recently run only on HARP regions, is now implemented for
the full disk. Vector and scalar quantities in the patches are used to derive
active region indices potentially useful for forecasting; the data maps and
indices are collected in the SHARP data series, hmi.sharp_720s. Patches are
provided in both CCD and heliographic coordinates.
HMI provides continuous coverage of the vector field, but has modest spatial,
spectral, and temporal resolution. Coupled with limitations of the analysis and
interpretation techniques, effects of the orbital velocity, and instrument
performance, the resulting measurements have a certain dynamic range and
sensitivity and are subject to systematic errors and uncertainties that are
characterized in this report.
Comment: 42 pages, 19 figures. Accepted to Solar Physics.
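The 180-degree ambiguity means every transverse-field azimuth has two candidates separated by 180 degrees. ME0 resolves the ambiguity globally by minimizing an energy functional; as a much simpler stand-in, the sketch below applies the classic acute-angle method, which picks whichever candidate lies within 90 degrees of a reference (e.g. potential-field) azimuth. The function name and degree-based interface are illustrative.

```python
import math  # not strictly needed here; kept for angle-math extensions

def acute_angle_disambiguate(azimuth_obs_deg, azimuth_pot_deg):
    """Return azimuth_obs or azimuth_obs + 180 (degrees), whichever lies
    within 90 degrees of the reference potential-field azimuth.

    This is the acute-angle method, a simpler alternative to the
    Minimum Energy (ME0) disambiguation used in the HMI pipeline."""
    for cand in (azimuth_obs_deg % 360, (azimuth_obs_deg + 180) % 360):
        # smallest angular separation between cand and the reference
        diff = abs((cand - azimuth_pot_deg + 180) % 360 - 180)
        if diff <= 90:
            return cand
    return azimuth_obs_deg % 360  # degenerate: exactly perpendicular
```

ME0 improves on this per-pixel rule by considering neighboring pixels jointly, which matters where the potential-field reference is a poor guide.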
ROOT - A C++ Framework for Petabyte Data Storage, Statistical Analysis and Visualization
ROOT is an object-oriented C++ framework conceived in the high-energy physics
(HEP) community, designed for storing and analyzing petabytes of data in an
efficient way. Any instance of a C++ class can be stored into a ROOT file in a
machine-independent compressed binary format. In ROOT the TTree object
container is optimized for statistical data analysis over very large data sets
by using vertical data storage techniques. These containers can span a large
number of files on local disks, the web, or a number of different shared file
systems. To analyze these data, the user can choose from a wide set of
mathematical and statistical functions, including linear algebra classes,
numerical algorithms such as integration and minimization, and various methods
for performing regression analysis (fitting). In particular, ROOT offers
packages for complex data modeling and fitting, as well as multivariate
classification based on machine learning techniques. A central piece in these
analysis tools are the histogram classes which provide binning of one- and
multi-dimensional data. Results can be saved in high-quality graphical formats
like PostScript and PDF or in bitmap formats like JPG or GIF. The result can
also be stored into ROOT macros that allow a full recreation and rework of the
graphics. Users typically create their analysis macros step by step, making use
of the interactive C++ interpreter CINT, while running over small data samples.
Once the development is finished, they can run these macros at full compiled
speed over large data sets, using on-the-fly compilation, or by creating a
stand-alone batch program. Finally, if processing farms are available, the user
can reduce the execution time of intrinsically parallel tasks - e.g. data
mining in HEP - by using PROOF, which will take care of optimally distributing
the work over the available resources in a transparent way.
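The "vertical" (columnar) storage idea behind TTree can be sketched in a few lines: each branch is stored contiguously, so a statistical scan over one variable touches only that column. The toy class below is illustrative only and is not ROOT's API.

```python
# Sketch of columnar ("vertical") storage in the spirit of ROOT's TTree:
# one contiguous list per branch, so scanning a single variable never
# touches the other branches. Class and method names are illustrative.

class ToyTree:
    def __init__(self, branch_names):
        self.branches = {name: [] for name in branch_names}

    def fill(self, event):
        """Append one event's values, one per branch."""
        for name, value in event.items():
            self.branches[name].append(value)

    def scan(self, branch, selector):
        """Iterate a single column, returning values passing the cut."""
        return [v for v in self.branches[branch] if selector(v)]

t = ToyTree(["energy", "charge"])
for e, q in [(1.2, -1), (5.7, 1), (3.4, -1)]:
    t.fill({"energy": e, "charge": q})
high = t.scan("energy", lambda v: v > 3.0)  # [5.7, 3.4]
```

The payoff of this layout is exactly the one the abstract describes: analyses that cut on a few variables out of many read only those branches from disk.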
The FZ Strategy to Compress the Bitmap Index for Data Warehouses
Data warehouses contain data consolidated from several operational databases and provide historical, summarized data that are more appropriate for analysis than detailed individual records. Fast response time is essential for on-line decision support. A bitmap index can reach this goal in read-mostly environments. For high-cardinality data in data warehouses, a bitmap index consists of many bitmap vectors, and the size of the index can greatly exceed the capacity of the disk. The WAH strategy has been proposed to address this storage overhead. However, as the bit density and the clustering factor of 1's increase, the bit strings of the WAH strategy become less compressible. In this paper, we therefore propose the FZ strategy, which compresses each bitmap vector to reduce storage space while providing efficient bitwise operations without decompressing the bitmap vectors. Our performance simulations show that the FZ strategy reduces storage space more than the WAH strategy.
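The word-aligned run-length idea behind WAH, which FZ builds on, can be sketched as follows: the bit string is cut into 31-bit groups, consecutive all-zero or all-one groups collapse into fill tokens, and mixed groups stay literal. Real WAH packs these into 32-bit machine words; the token representation here is a simplification for readability.

```python
# Simplified word-aligned (WAH-style) run-length compression sketch.
# Real WAH packs the output into 32-bit words; here the output is a list
# of ("fill", bit, group_count) and ("lit", group) tokens instead.

GROUP = 31  # literal bits per word in 32-bit WAH

def wah_compress(bits):
    tokens = []
    for i in range(0, len(bits), GROUP):
        g = bits[i:i + GROUP]
        if set(g) <= {"0"} or set(g) <= {"1"}:
            b = g[0]
            # extend the previous fill token if it has the same bit value
            if tokens and tokens[-1][0] == "fill" and tokens[-1][1] == b:
                tokens[-1] = ("fill", b, tokens[-1][2] + 1)
            else:
                tokens.append(("fill", b, 1))
        else:
            tokens.append(("lit", g))
    return tokens

tokens = wah_compress("0" * 62 + "1" * 31)  # two fill tokens, no literals
```

The compressibility problem the FZ paper targets is visible here: as 1's become dense and unclustered, few groups are uniform, so most tokens stay literal and little space is saved.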
Reordering Rows for Better Compression: Beyond the Lexicographic Order
Sorting database tables before compressing them improves the compression
rate. Can we do better than the lexicographical order? For minimizing the
number of runs in a run-length encoding compression scheme, the best approaches
to row-ordering are derived from traveling salesman heuristics, although there
is a significant trade-off between running time and compression. A new
heuristic, Multiple Lists, which is a variant on Nearest Neighbor that trades
off compression for a major running-time speedup, is a good option for very
large tables. However, for some compression schemes, it is more important to
generate long runs rather than few runs. For this case, another novel
heuristic, Vortex, is promising. We find that we can improve run-length
encoding up to a factor of 3 whereas we can improve prefix coding by up to 80%:
these gains are on top of the gains due to lexicographically sorting the table.
In a few cases, we prove that the new row reordering comes within 10% of
optimal at minimizing the runs of identical values within columns.
Comment: to appear in ACM TODS.
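The baseline these heuristics improve on, lexicographic row sorting, can be illustrated by counting runs of identical values per column before and after sorting a toy table:

```python
# Count runs of identical values per column: the quantity a run-length
# encoder pays for, and the one that row reordering tries to minimize.

def column_runs(table):
    if not table:
        return 0
    runs = 0
    for col in range(len(table[0])):
        # one run to start, plus one more per value change down the column
        runs += 1 + sum(
            1 for r in range(1, len(table)) if table[r][col] != table[r - 1][col]
        )
    return runs

rows = [("b", 1), ("a", 2), ("b", 2), ("a", 1)]
before = column_runs(rows)         # 7 runs unsorted
after = column_runs(sorted(rows))  # 6 runs after lexicographic sorting
```

Traveling-salesman-style heuristics like those in the paper search for orderings that beat this lexicographic baseline, at additional running-time cost.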
Accelerating Network Traffic Analytics Using Query-Driven Visualization
Realizing operational analytics solutions, where large and complex data must be analyzed in a time-critical fashion, entails integrating many different types of technology. This paper focuses on an interdisciplinary combination of scientific data management and visualization/analysis technologies aimed at reducing the time required for data filtering, querying, hypothesis testing, and knowledge discovery in the domain of network connection data analysis. We show that compressed bitmap indexing can quickly answer queries in an interactive visual data analysis application, and we compare its performance with two alternatives for serial and parallel filtering/querying on 2.5 billion records of network connection data collected over a period of 42 weeks. Our approach to visual network connection data exploration centers on two primary factors: interactive ad hoc and multiresolution query formulation and execution over n dimensions, and visual display of the n-dimensional histogram results. This combination is applied in a case study to detect a distributed network scan and then identify the set of remote hosts participating in the attack. Our approach is sufficiently general to apply to a diverse set of data-understanding problems and to be used in conjunction with a diverse set of analysis and visualization tools.
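A minimal sketch of the bitmap-index idea underlying such query engines, assuming an equality-encoded index (one bit-vector per distinct value) where an IN predicate becomes a bitwise OR of the matching vectors; the plain-list representation and function names are simplifications (production systems compress these vectors):

```python
# Equality-encoded bitmap index sketch: one bit-vector per distinct
# value, with IN-predicate queries answered by OR-ing the vectors.
from collections import defaultdict

def build_bitmap_index(column):
    """One bit per row per distinct value in the column."""
    index = defaultdict(lambda: [0] * len(column))
    for row, value in enumerate(column):
        index[value][row] = 1
    return dict(index)

def query_in(index, values, n_rows):
    """Bit-vector of rows matching `column IN values`."""
    result = [0] * n_rows
    for v in values:
        for i, bit in enumerate(index.get(v, [])):
            result[i] |= bit
    return result

ports = [80, 22, 443, 80, 53]
idx = build_bitmap_index(ports)
web = query_in(idx, {80, 443}, len(ports))  # [1, 0, 1, 1, 0]
```

Multi-dimensional predicates compose the same way, AND-ing the per-column result vectors, which is what makes ad hoc drill-down queries over many dimensions fast.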
The NWRA Classification Infrastructure: Description and Extension to the Discriminant Analysis Flare Forecasting System (DAFFS)
A classification infrastructure built upon Discriminant Analysis has been
developed at NorthWest Research Associates for examining the statistical
differences between samples of two known populations. Originating to examine
the physical differences between flare-quiet and flare-imminent solar active
regions, we describe herein some details of the infrastructure including:
parametrization of large datasets, schemes for handling "null" and "bad" data
in multi-parameter analysis, application of non-parametric multi-dimensional
Discriminant Analysis, an extension through Bayes' theorem to probabilistic
classification, and methods invoked for evaluating classifier success. The
classifier infrastructure is applicable to a wide range of scientific questions
in solar physics. We demonstrate its application to the question of
distinguishing flare-imminent from flare-quiet solar active regions, updating
results from the original publications that were based on different data and
much smaller sample sizes. Finally, as a demonstration of "Research to
Operations" efforts in the space-weather forecasting context, we present the
Discriminant Analysis Flare Forecasting System (DAFFS), a near-real-time
operationally-running solar flare forecasting tool that was developed from the
research-directed infrastructure.
Comment: J. Space Weather Space Climate: accepted / in press; access
supplementary materials through the journal; some figures are at less than
full resolution for arXiv.
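In its simplest one-parameter, equal-variance form, two-class discriminant analysis reduces to classifying by the nearer class mean. The sketch below uses toy numbers and that simplification; it omits the non-parametric, multi-dimensional, and Bayesian-probability machinery the paper actually describes.

```python
# Toy two-class discriminant in one parameter, assuming equal variances,
# so the decision boundary is simply the midpoint of the class means.
# Values and names are illustrative, not NWRA's infrastructure.

def discriminant_boundary(sample_a, sample_b):
    """Midpoint of the two class means: classify by the nearer mean."""
    mean_a = sum(sample_a) / len(sample_a)
    mean_b = sum(sample_b) / len(sample_b)
    return (mean_a + mean_b) / 2

flare_quiet = [1.0, 2.0, 1.5]      # toy values of some magnetic index
flare_imminent = [5.0, 6.0, 7.0]
threshold = discriminant_boundary(flare_quiet, flare_imminent)  # 3.75

def predict_imminent(x):
    return x > threshold
```

Replacing the point decision with class-conditional densities and Bayes' theorem, as the paper does, turns the same construction into a probabilistic forecast rather than a yes/no label.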
Hybrid query optimization for hard-to-compress bit-vectors
Bit-vectors are widely used for indexing and summarizing data due to their efficient processing in modern computers. Sparse bit-vectors can be further compressed to reduce their space requirement. Special compression schemes based on run-length encoders have been designed to avoid explicit decompression and minimize the decoding overhead during query execution. Moreover, highly compressed bit-vectors can exhibit faster query times than non-compressed ones. However, for hard-to-compress bit-vectors, compression does not speed up queries and can add considerable overhead. In these cases, bit-vectors are often stored verbatim (non-compressed). On the other hand, queries are answered by executing a cascade of bitwise operations involving indexed bit-vectors and intermediate results. Often, even when the original bit-vectors are hard to compress, the intermediate results become sparse, so it can be feasible to improve query performance by compressing these bit-vectors as the query executes. In this scenario, verbatim and compressed bit-vectors must be operated on together. In this paper, we propose a hybrid framework in which compressed and verbatim bitmaps coexist, and we design algorithms to execute queries under this hybrid model. Our query optimizer decides at run time when to compress or decompress a bit-vector. Our experiments show that applications using higher-density bitmaps can benefit from this hybrid model, improving both their query time and memory utilization.
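The run-time compress-or-not decision can be sketched with a density heuristic: keep dense bit-vectors verbatim and run-length encode sparse ones. The threshold value and both representations below are illustrative assumptions, not the paper's tuned optimizer.

```python
# Sketch of a run-time compress-or-not heuristic in the spirit of the
# hybrid model: dense bit-vectors stay verbatim, sparse ones are
# run-length encoded. The density cut-off is an illustrative assumption.

DENSITY_THRESHOLD = 0.05  # assumed cut-off, not the paper's tuned value

def bit_density(bits):
    """Fraction of set bits in the vector."""
    return bits.count("1") / len(bits) if bits else 0.0

def rle(bits):
    """Run-length encode a bit string as (bit, run_length) pairs."""
    runs, count = [], 1
    for prev, cur in zip(bits, bits[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append((prev, count))
            count = 1
    if bits:
        runs.append((bits[-1], count))
    return runs

def store(bits):
    """Choose a representation for one bit-vector at run time."""
    if bit_density(bits) < DENSITY_THRESHOLD:
        return ("compressed", rle(bits))
    return ("verbatim", bits)
```

In the paper's setting this choice is revisited for each intermediate result of a query cascade, since intermediates often turn sparse even when the indexed vectors are dense.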