Enabling adaptive scientific workflows via trigger detection
Next generation architectures necessitate a shift away from traditional
workflows in which the simulation state is saved at prescribed frequencies for
post-processing analysis. While the need to shift to in situ workflows has been
acknowledged for some time, much of the current research is focused on static
workflows, where the analysis that would have been done as a post-process is
performed concurrently with the simulation at user-prescribed frequencies.
Recently, research efforts have been striving to enable adaptive workflows, in
which the frequency, composition, and execution of computational and data
manipulation steps dynamically depend on the state of the simulation. Adapting
the workflow to the state of the simulation in such a data-driven fashion puts
extremely strict efficiency requirements on the analysis capabilities that are
used to identify the transitions in the workflow. In this paper we build upon
earlier work on trigger detection using sublinear techniques to drive adaptive
workflows. Here we propose a methodology to detect the time when sudden heat
release occurs in simulations of turbulent combustion. Our proposed method
provides an alternative metric that can be used along with our former metric to
increase the robustness of trigger detection. We show the effectiveness of our
metric empirically for predicting heat release for two use cases.
Comment: arXiv admin note: substantial text overlap with arXiv:1506.0825
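As an illustration of the trigger idea, a minimal sketch in Python: a scalar metric computed from each simulation snapshot is monitored, and an in situ analysis step fires when the metric jumps well above its running baseline. The function, threshold, and data here are illustrative assumptions, not the paper's sublinear statistic.

```python
# Minimal sketch of a trigger detector for an adaptive workflow: a scalar
# metric is computed from each simulation snapshot, and an analysis step is
# triggered when the metric's relative jump exceeds a threshold. The metric
# and threshold are illustrative, not the paper's sublinear technique.

def detect_trigger(metric_series, rel_jump=2.0, warmup=3):
    """Return the first index where the metric exceeds `rel_jump` times
    the running mean of all earlier values, or None if it never does."""
    for t in range(warmup, len(metric_series)):
        baseline = sum(metric_series[:t]) / t
        if metric_series[t] > rel_jump * baseline:
            return t
    return None

# Synthetic heat-release proxy: quiescent, then a sudden rise.
series = [1.0, 1.1, 0.9, 1.0, 1.05, 4.2, 6.0]
print(detect_trigger(series))  # -> 5
```

The warmup period avoids triggering before a baseline has been established; a production detector would also need robustness to noise spikes, which is exactly why the paper advocates a second, complementary metric.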
Detection of a signal in linear subspace with bounded mismatch
We consider the problem of detecting a signal of interest in a background of noise with unknown covariance matrix, taking into account a possible mismatch between the actual steering vector and the presumed one. We assume that the former belongs to a known linear subspace, up to a fraction of its energy. When the subspace of interest consists of the presumed steering vector only, this amounts to assuming that the angle between the actual steering vector and the presumed steering vector is upper bounded. Within this framework, we derive the generalized likelihood ratio test (GLRT). We show that it involves solving a minimization problem with the constraint that the signal of interest lies inside a cone. We present a computationally efficient algorithm to find the maximum likelihood estimator (MLE) based on the Lagrange multiplier technique. Numerical simulations illustrate the performance and the robustness of this new detector, and compare it with the adaptive coherence estimator, which assumes that the steering vector lies entirely in a subspace.
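For concreteness, a sketch of the comparison detector named in the abstract, the adaptive coherence estimator (ACE); the cone-constrained GLRT of the paper itself is not reproduced here. The data and covariance below are toy assumptions.

```python
import numpy as np

# Sketch of the adaptive coherence estimator (ACE): R is a covariance
# estimate from training data, v the presumed steering vector, x the test
# snapshot. The statistic is the squared cosine of the angle between x and
# v in the whitened (R^{-1}-weighted) inner product, so it lies in [0, 1].

def ace(x, v, R):
    Ri = np.linalg.inv(R)
    num = np.abs(np.vdot(v, Ri @ x)) ** 2
    den = np.real(np.vdot(v, Ri @ v)) * np.real(np.vdot(x, Ri @ x))
    return num / den

rng = np.random.default_rng(0)
n = 4
v = np.ones(n) / np.sqrt(n)   # presumed steering vector
R = np.eye(n)                 # toy covariance
print(ace(v, v, R))                         # matched signal: statistic is 1
print(ace(rng.standard_normal(n), v, R) <= 1.0)
```

A mismatched steering vector drags the statistic toward zero, which is the sensitivity the bounded-mismatch GLRT is designed to mitigate.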
A Deep Neural Network for Pixel-Level Electromagnetic Particle Identification in the MicroBooNE Liquid Argon Time Projection Chamber
We have developed a convolutional neural network (CNN) that can make a
pixel-level prediction of objects in image data recorded by a liquid argon time
projection chamber (LArTPC) for the first time. We describe the network design,
training techniques, and software tools developed to train this network. The
goal of this work is to develop a complete deep neural network based data
reconstruction chain for the MicroBooNE detector. We show the first
demonstration of a network's validity on real LArTPC data using MicroBooNE
collection plane images. The demonstration is performed on stopping-muon and
charged-current neutral-pion data samples.
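A pixel-level prediction assigns a class label to every pixel rather than one label per image. A toy fully convolutional sketch of that idea, with hand-made filters standing in for the trained MicroBooNE network:

```python
import numpy as np

# Toy sketch of pixel-level (segmentation-style) prediction: one score map
# per class is produced by convolution, and the per-pixel argmax yields the
# label image. The filters are hand-made illustrations, not a trained CNN.

def conv2d_same(img, kernel):
    kh, kw = kernel.shape
    p = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(img, dtype=float)
    H, W = img.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(p[i:i + kh, j:j + kw] * kernel)
    return out

def segment(img, kernels):
    scores = np.stack([conv2d_same(img, k) for k in kernels])
    return scores.argmax(axis=0)    # label image, same shape as input

img = np.zeros((5, 5)); img[2, :] = 1.0            # a horizontal "track"
k_bg = np.full((3, 3), 0.1)                        # weak background score
k_track = np.array([[-1, -1, -1],
                    [ 2,  2,  2],
                    [-1, -1, -1]], dtype=float)    # horizontal-line detector
labels = segment(img, [k_bg, k_track])
print(labels[2, 2], labels[0, 0])                  # -> 1 0
```

The real network learns its filters from simulated and real LArTPC images, but the output contract is the same: a label per pixel, here separating track from background.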
Measurement and modelling of spectrum occupancy
Based on the concept of spectrum sharing, cognitive radio has emerged as a promising technology for optimizing utilization of the radio spectrum in the next generation of wireless communications. To adopt this technology, the current spectrum allocation strategy has to be reformed and real spectrum occupancy has to be systematically investigated. To assess the feasibility of cognitive radio, the statistics of present spectral occupancy need to be examined thoroughly; this forms the basis of the spectrum occupancy project. We studied the 100-2500 MHz spectrum with traditional radio monitoring systems whose technical details are fully recorded in this thesis. To detect frequency-agile signals, a channel sounder capable of scanning 300 MHz of spectrum within 4 ms with multiple channel inputs was used as a dedicated radio receiver in our measurements. The statistical results of the spectrum monitoring experiments show that spectrum occupancy from 100-2500 MHz is indeed low in the measured locations and period: the average occupancy for most bands is less than 20%, and in the 1 GHz to 2.5 GHz range it is less than 5%. Time series analysis was introduced into spectrum occupancy analysis as a tool to model occupancy variations with time; for instance, the time series Airline model fits the GSM band occupancy data well. In this thesis, generalized linear models were used as complementary tools to relate occupancy data to other parameters such as signal amplitude. The direction-of-arrival algorithms (EM and SAGE) were validated in an anechoic chamber, by which we can determine spectrum occupancy in the space domain.
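The occupancy figures quoted above are duty cycles: the fraction of measurement samples in which a channel's power exceeds a decision threshold. A minimal sketch, with an illustrative threshold and synthetic trace:

```python
import numpy as np

# Sketch of a spectrum-occupancy (duty-cycle) computation: a channel is
# counted as occupied whenever its measured power exceeds a decision
# threshold. The threshold and trace below are illustrative assumptions.

def occupancy(power_dbm, threshold_dbm):
    """Fraction of time samples in which the channel is occupied."""
    power = np.asarray(power_dbm)
    return float(np.mean(power > threshold_dbm))

# Synthetic trace: noise floor near -100 dBm, occasional bursts at -60 dBm.
trace = np.full(100, -100.0)
trace[::20] = -60.0             # 5 bursts out of 100 samples
print(occupancy(trace, -90.0))  # -> 0.05
```

In practice the threshold is set relative to the measured noise floor, and the statistic is aggregated per band and per location before being compared against figures like the 20% and 5% averages reported in the thesis.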
Regularized Covariance Matrix Estimation in Complex Elliptically Symmetric Distributions Using the Expected Likelihood Approach - Part 2: The Under-Sampled Case
In the first part of this series of two papers, we extended the expected likelihood (EL) approach, originally developed in the Gaussian case, to the broader class of complex elliptically symmetric (CES) distributions and complex angular central Gaussian (ACG) distributions. More precisely, we demonstrated that the probability density function (p.d.f.) of the likelihood ratio (LR) for the (unknown) actual scatter matrix Σ_0 does not depend on the latter: it only depends on the density generator of the CES distribution, and it is distribution-free in the case of ACG distributed data, i.e., it only depends on the matrix dimension and the number of independent training samples, assumed to be at least as large as the dimension. Additionally, regularized scatter matrix estimates based on the EL methodology were derived. In this second part, we consider the under-sampled scenario, in which the number of samples is smaller than the matrix dimension; this case deserves a specific treatment since conventional maximum likelihood estimates do not exist. Indeed, inference about the scatter matrix can only be made in the subspace spanned by the columns of the data matrix, whose dimension equals the number of samples. We extend the results derived under the Gaussian assumption to the CES and ACG classes of distributions. Invariance properties of the under-sampled likelihood ratio evaluated at Σ_0 are presented. Remarkably enough, in the ACG case, the p.d.f. of this LR can be written in a rather simple form as a product of beta distributed random variables. The regularized schemes derived in the first part, based on the EL principle, are extended to the under-sampled scenario and assessed through numerical simulations.
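To see why the under-sampled case needs regularization, a sketch of generic linear shrinkage: the sample covariance is singular when there are fewer samples than dimensions, but a convex combination with a scaled identity is always positive definite. The fixed shrinkage weight below is an illustrative assumption; the paper instead selects the regularization via the expected-likelihood principle.

```python
import numpy as np

# Sketch of regularized scatter-matrix estimation in the under-sampled
# regime: with fewer samples T than dimension p, the sample covariance S is
# singular, but shrinking it toward a scaled identity restores invertibility.
# alpha is fixed here for illustration; it is NOT the paper's EL-based choice.

def shrinkage_cov(X, alpha=0.5):
    """X: p x T data matrix (columns are samples). Returns a regularized estimate."""
    p, T = X.shape
    S = (X @ X.conj().T) / T
    target = (np.trace(S).real / p) * np.eye(p)
    return alpha * S + (1 - alpha) * target

rng = np.random.default_rng(1)
p, T = 8, 3                        # under-sampled: T < p
X = rng.standard_normal((p, T))
S = (X @ X.T) / T
R = shrinkage_cov(X)
print(np.linalg.matrix_rank(S))            # rank at most T = 3: S is singular
print(np.all(np.linalg.eigvalsh(R) > 0))   # regularized estimate is PD -> True
```

The rank deficiency of S is exactly the statement in the abstract that inference is confined to the subspace spanned by the data columns.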
New Bioinformatic Techniques for the Analysis of Large Datasets
A new era of chemical analysis is upon us. In the past, a small number of samples were selected from a population for use as a statistical representation of the entire population. More recently, advances in data collection rate, computer memory, and processing speed have allowed entire populations to be sampled and analyzed. The result is massive amounts of data that convey relatively little useful information for their size. These large quantities of data have already begun to cause bottlenecks in areas such as genetics, drug development, and chemical imaging. The problem is straightforward: condense a large quantity of data into only the useful portions without ignoring or discarding anything important. Better still is performing the condensation in the hardware of the instrument, before the data ever reach a computer. The proposed research tests the hypothesis that clusters of data may be rapidly identified by linear fitting of quantile-quantile plots produced from each principal component of a principal component analysis. Integrated Sensing and Processing (ISP) is tested as a means of generating clusters of principal component scores from samples in a hyperspectral near-field scanning optical microscope. Distances from the centers of these multidimensional clusters to all other points in hyperspace can then be calculated. The result is a novel digital staining technique for identifying anomalies in hyperspectral microscopic and nanoscopic imaging of human atherosclerotic tissue. This general method can be applied to other analytical problems as well.
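The core hypothesis can be sketched directly: project data onto a principal component and check how well a straight line fits the normal quantile-quantile plot of the scores. A single Gaussian cluster gives a nearly linear Q-Q plot; well-separated clusters bend it. The reference quantiles from a large Monte Carlo normal sample are an implementation shortcut, not the thesis's procedure.

```python
import numpy as np

# Sketch: PCA scores + linear fit of the normal Q-Q plot as a fast
# cluster-presence test. One Gaussian cluster -> nearly linear Q-Q plot
# (R^2 close to 1); two separated clusters -> S-shaped plot, lower R^2.

def pc_scores(X):
    """Scores of the first principal component of the rows of X."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[0]

def qq_r2(scores, rng):
    """R^2 of a straight-line fit to the normal Q-Q plot of `scores`."""
    s = np.sort((scores - scores.mean()) / scores.std())
    n = len(s)
    q = (np.arange(n) + 0.5) / n                    # plotting positions
    ref = np.quantile(rng.standard_normal(200_000), q)  # Monte Carlo quantiles
    slope, intercept = np.polyfit(ref, s, 1)
    resid = s - (slope * ref + intercept)
    return 1.0 - np.sum(resid ** 2) / np.sum((s - s.mean()) ** 2)

rng = np.random.default_rng(0)
one_cluster = rng.standard_normal((200, 2))
two_clusters = np.vstack([rng.standard_normal((100, 2)) + [-4, 0],
                          rng.standard_normal((100, 2)) + [4, 0]])
r2_one = qq_r2(pc_scores(one_cluster), rng)
r2_two = qq_r2(pc_scores(two_clusters), rng)
print(r2_one > r2_two)   # clustered scores bend the Q-Q plot -> True
```

Because it needs only a sort and a least-squares line per component, the test is cheap enough to be a plausible candidate for in-hardware condensation.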
An introduction to low-level analysis methods of DNA microarray data
This article gives an overview of the methods used in the low-level analysis of gene expression data generated using DNA microarrays. This type of experiment makes it possible to determine relative levels of nucleic acid abundance in a set of tissues or cell populations for thousands of transcripts or loci simultaneously. Careful statistical design and analysis are essential to improve the efficiency and reliability of microarray experiments throughout the data acquisition and analysis process. This includes the design of probes, the experimental design, the image analysis of scanned microarray images, the normalization of fluorescence intensities, the assessment of microarray data quality and the incorporation of quality information into subsequent analyses, the combination of information across arrays and across sets of experiments, the discovery and recognition of expression patterns at the single-gene and multiple-gene levels, and the assessment of the significance of these findings, given the substantial noise, and hence random features, in the data. For all of these components, access to a flexible and efficient statistical computing environment is essential.
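As a concrete example of one low-level step, a sketch of quantile normalization, a standard way to force every array's intensity distribution onto a common reference. This is one of several normalization options, shown as an illustration rather than the article's full pipeline; the toy matrix is a made-up example.

```python
import numpy as np

# Sketch of quantile normalization for a genes x arrays intensity matrix:
# each array (column) is mapped, rank for rank, onto the mean of the sorted
# columns, so all arrays end up with identical value distributions.

def quantile_normalize(M):
    """M: genes x arrays intensity matrix. Returns a normalized copy."""
    order = np.argsort(M, axis=0)            # per-array ranks
    ref = np.sort(M, axis=0).mean(axis=1)    # mean of the order statistics
    out = np.empty_like(M, dtype=float)
    for j in range(M.shape[1]):
        out[order[:, j], j] = ref            # rank k gets reference value k
    return out

M = np.array([[5.0, 4.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 4.0, 6.0],
              [4.0, 2.0, 8.0]])
N = quantile_normalize(M)
# After normalization each array (column) holds the same set of values.
print(np.allclose(np.sort(N, axis=0), np.sort(N, axis=0)[:, [0]]))  # -> True
```

This naive version breaks ties arbitrarily via argsort; production implementations average the reference values over tied ranks.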