Field spectroradiometer data : acquisition, organisation, processing and analysis on the example of New Zealand native plants : a thesis presented in fulfilment of the requirements for the degree of Master of Philosophy in Earth Science at Massey University, Palmerston North, New Zealand
The purpose of this research was to investigate the acquisition, storage, processing and analysis of hyperspectral data for vegetation applications, using New Zealand native plants as a case study. Data covering the spectral range 350 nm to 2500 nm were collected with a portable spectroradiometer. Hyperspectral data collection results in large datasets that need pre-processing before any analysis can be carried out. A review of the techniques used since the advent of hyperspectral field data showed that the following general procedure was followed:
1. Removal of noisy or uncalibrated bands
2. Data smoothing
3. Reduction of dimensionality
4. Transformation into feature space
5. Analysis
Steps 1 to 4, which are concerned with the pre-processing of data, were found to be repetitive procedures and thus had a high potential for automation. The pre-processing had a major impact on the results obtained in the analysis stage, and finding the ideal pre-processing parameters involved repeated processing of the data. Hyperspectral field data should be stored in a structured way, so the use of a relational database was a logical approach. A hierarchical data structure reflecting the real world and the setup of sampling campaigns was designed and transformed into a logical data model. The database also held information needed for pre-processing and statistical analysis, which enabled the calculation of separability measures such as the Jeffries-Matusita (JM) distance and the application of discriminant analysis. Software was written to provide a graphical user interface to the database and to implement the pre-processing and analysis functionality. The acquisition, processing and analysis steps were applied to New Zealand native vegetation. A high degree of separability between species was achieved, and a classification accuracy of 87.87% was reached on independent data.
This outcome required smoothing, Hyperion band synthesis and a principal components transformation to be applied to the data prior to classification, which used a generalized squared distance discriminant function. The mixed-signature problem was addressed in experiments under controlled laboratory conditions, which revealed that certain combinations of plants could not be unmixed successfully, while mixtures of vegetation and artificial materials yielded very good abundance estimates. The combination of a relational database with associated software for data processing was found to be highly efficient when dealing with hyperspectral field data.
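The JM distance mentioned above can be sketched as follows. This is a generic implementation under the standard Gaussian-class assumption (Bhattacharyya distance mapped into the bounded JM range), not code from the thesis:

```python
import numpy as np

def jeffries_matusita(m1, s1, m2, s2):
    """Jeffries-Matusita distance between two Gaussian class signatures.

    m1, m2: mean spectra, shape (n,); s1, s2: covariance matrices, shape (n, n).
    Returns a value in [0, 2]; values near 2 indicate high separability.
    """
    s = (s1 + s2) / 2.0
    dm = (m1 - m2).reshape(-1, 1)
    # Bhattacharyya distance for two Gaussian distributions
    b = (dm.T @ np.linalg.inv(s) @ dm / 8.0)[0, 0] \
        + 0.5 * np.log(np.linalg.det(s) /
                       np.sqrt(np.linalg.det(s1) * np.linalg.det(s2)))
    # Map the unbounded Bhattacharyya distance into the bounded JM range
    return 2.0 * (1.0 - np.exp(-b))
```

Identical class statistics give a distance of 0; well-separated classes approach the saturation value of 2, which is why JM is a convenient screening measure before discriminant analysis.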
OpenADAM: an open source genome-wide association data management system for Affymetrix SNP arrays
BACKGROUND: Large scale genome-wide association studies have become popular since the introduction of high throughput genotyping platforms. Efficient management of the vast array of data generated poses many challenges. DESCRIPTION: We have developed an open source web-based data management system for the large amount of genotype data generated from the Affymetrix GeneChip Mapping Array and Affymetrix Genome-Wide Human SNP Array platforms. The database supports genotype calling using the DM, BRLMM, BRLMM-P or Birdseed algorithms provided by the Affymetrix Power Tools. The genotype and corresponding pedigree data are stored in a relational database for efficient downstream data manipulation and analysis, such as calculation of allele and genotype frequencies, sample identity checking, and export of genotype data in various file formats for analysis using commonly available software. A novel method for genotyping error estimation is implemented using linkage disequilibrium information from the HapMap project. All functionalities are accessible via a web-based user interface. CONCLUSION: OpenADAM provides an open source database system for management of Affymetrix genome-wide association SNP data.
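The allele-frequency calculation mentioned in the description can be sketched in a few lines. This is a generic illustration rather than OpenADAM code, and the genotype encoding ("AA"/"AB"/"BB" calls with "NoCall" for failures) is an assumption:

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Compute allele frequencies from a list of biallelic SNP genotype calls.

    genotypes: calls such as "AA", "AB", "BB"; "NoCall" entries are skipped.
    Returns a dict mapping each allele to its relative frequency.
    """
    counts = Counter()
    for g in genotypes:
        if g == "NoCall":
            continue
        counts.update(g)  # each call contributes its two allele characters
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}
```

In a system like the one described, this aggregation would typically be pushed into SQL over the relational genotype tables rather than done in application code.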
Parallel Processing of Large Graphs
More and more large data collections are gathered worldwide in various IT systems. Many of them have a networked nature and need to be processed and analysed as graph structures. Due to their size, they very often require a parallel paradigm for efficient computation. Three parallel techniques are compared in the paper: MapReduce, its map-side join extension, and Bulk Synchronous Parallel (BSP). They are implemented for two different graph problems: calculation of single-source shortest paths (SSSP) and collective classification of graph nodes by means of relational influence propagation (RIP). The methods and algorithms are applied to several network datasets differing in size and structural profile, originating from three domains: telecommunication, multimedia and microblogging. The results reveal that iterative graph processing with the BSP implementation consistently and significantly outperforms MapReduce, by up to a factor of 10, especially for algorithms with many iterations and sparse communication. The map-side join extension of MapReduce also usually offers noticeably better efficiency, although not as much as BSP. Nevertheless, MapReduce remains a good alternative for enormous networks whose data structures do not fit in local memories. Preprint submitted to Future Generation Computer Systems.
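The BSP style of iterative graph processing compared above can be illustrated with a single-machine sketch of SSSP: each loop iteration plays the role of one superstep, with a simulated message exchange between vertices. This is illustrative only; a real BSP system distributes the vertices and messages across workers with a barrier between supersteps:

```python
import math

def bsp_sssp(graph, source):
    """Single-source shortest paths computed in BSP-style supersteps.

    graph: {node: {neighbour: edge_weight}}. In each superstep, every node
    whose distance improved in the previous step sends candidate distances
    to its neighbours; iteration stops once no message improves anything.
    """
    dist = {v: math.inf for v in graph}
    dist[source] = 0
    active = {source}
    while active:                      # one superstep per loop iteration
        messages = {}                  # simulated message exchange
        for u in active:
            for v, w in graph[u].items():
                cand = dist[u] + w
                if cand < messages.get(v, math.inf):
                    messages[v] = cand
        active = {v for v, d in messages.items() if d < dist[v]}
        for v in active:
            dist[v] = messages[v]
    return dist
```

The termination condition (an empty active set) is exactly the "vote to halt" mechanism of BSP frameworks, and it explains the paper's finding: algorithms with many short supersteps avoid the per-iteration job-launch overhead that MapReduce pays.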
Context-Free Path Querying by Matrix Multiplication
Graph data models are widely used in many areas, for example bioinformatics and graph databases. In these areas it is often required to process queries on large graphs. Some of the most common graph queries are navigational queries. The result of query evaluation is a set of implicit relations between nodes of the graph, i.e. paths in the graph. A natural way to specify these relations is to specify paths using formal grammars over the alphabet of edge labels. An answer to a context-free path query in this approach is usually a set of triples (A, m, n) such that there is a path from node m to node n whose labelling is derived from a non-terminal A of the given context-free grammar. Queries of this type are evaluated using the relational query semantics. Another example of path query semantics is the single-path query semantics, which requires presenting a single path from node m to node n whose labelling is derived from a non-terminal A, for every triple (A, m, n) evaluated under the relational query semantics. There are a number of algorithms for query evaluation under these semantics, but all of them perform poorly on large graphs. One of the most common techniques for efficient big-data processing is the use of a graphics processing unit (GPU) to perform computations, but these algorithms do not allow this technique to be used efficiently. In this paper, we show how context-free path query evaluation under these semantics can be reduced to the calculation of the matrix transitive closure. We also propose an algorithm for context-free path query evaluation which uses the relational query semantics and is based on matrix operations, making it possible to speed up computations by using a GPU.
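The reduction described above can be sketched as a Boolean-matrix fixpoint iteration for a grammar in Chomsky normal form. This follows the general matrix-based CFPQ scheme rather than the paper's exact algorithm, and it runs on the CPU; a GPU version would replace the matrix product with a device kernel:

```python
import numpy as np

def cfpq(n, edges, terminal_rules, binary_rules):
    """Context-free path querying via Boolean matrix multiplication.

    n: number of graph nodes; edges: list of (i, label, j);
    terminal_rules: {nonterminal: {label, ...}} for rules A -> a;
    binary_rules: list of (A, B, C) for rules A -> B C (grammar in CNF).
    Returns {nonterminal: Boolean matrix M} where M[i, j] is True iff some
    path from node i to node j derives from that nonterminal.
    """
    nts = set(terminal_rules)
    for a, b, c in binary_rules:
        nts |= {a, b, c}
    m = {a: np.zeros((n, n), dtype=bool) for a in nts}
    for i, label, j in edges:          # initialise from rules A -> label
        for a, labels in terminal_rules.items():
            if label in labels:
                m[a][i, j] = True
    changed = True
    while changed:                     # iterate to the fixpoint (closure)
        changed = False
        for a, b, c in binary_rules:
            # Boolean product: a path i->k for B followed by k->j for C
            prod = (m[b].astype(np.uint8) @ m[c].astype(np.uint8)) > 0
            new = m[a] | prod
            if (new != m[a]).any():
                m[a] = new
                changed = True
    return m
```

For example, the language a^n b^n over edge labels can be expressed in CNF as S -> A X | A B, X -> S B, A -> a, B -> b; on a labelled path graph the resulting S matrix marks exactly the balanced node pairs.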
Model-driven performance evaluation for service engineering
Service engineering and service-oriented architecture as an integration and platform technology are a recent approach to software systems integration. Software quality aspects such as performance are of central importance for the integration of heterogeneous, distributed service-based systems. Empirical performance evaluation is the process of measuring and calculating performance metrics of the implemented software. We present an approach to the empirical, model-based performance evaluation of services and service compositions in the context of model-driven service engineering. Temporal database theory is utilised for the empirical performance evaluation of model-driven developed service systems.
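The empirical side of such an evaluation boils down to sampling latencies of a deployed service operation and deriving metrics from the timestamped samples. The sketch below is a hypothetical helper, not from the paper; the temporal-database angle is reflected only in that each measurement is kept as a (timestamp, duration) fact:

```python
import statistics
import time

def measure_service(call, n=100):
    """Run a service operation n times and derive latency metrics.

    call: a zero-argument callable wrapping one service invocation.
    Each sample is recorded as a (wall_clock_time, duration) pair, i.e. a
    row one could store in a temporal table for later interval queries.
    """
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        call()
        samples.append((time.time(), time.perf_counter() - start))
    durations = sorted(d for _, d in samples)
    return {
        "mean": statistics.mean(durations),
        "p95": durations[int(0.95 * (len(durations) - 1))],
        "max": durations[-1],
    }
```

Keeping the raw timestamped samples, rather than only the aggregates, is what allows time-windowed re-analysis later (e.g. performance during a load spike).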
Automatic Unbounded Verification of Alloy Specifications with Prover9
Alloy is an increasingly popular lightweight specification language based on
relational logic. Alloy models can be automatically verified within a bounded
scope using off-the-shelf SAT solvers. Since false assertions can usually be
disproved using small counter-examples, this approach suffices for most
applications. Unfortunately, it can sometimes lead to a false sense of
security, and in critical applications a more traditional unbounded proof may
be required. The automatic theorem prover Prover9 has been shown to be
particularly effective for proving theorems of relation algebras [7], a
quantifier-free (or point-free) axiomatization of a fragment of relational
logic. In this paper we propose a translation from Alloy specifications to fork
algebras (an extension of relation algebras with the same expressive power as
relational logic) which enables their unbounded verification in Prover9. This
translation covers not only logic assertions, but also the structural aspects
(namely type declarations), and was successfully implemented and applied to several examples.
AiiDA: Automated Interactive Infrastructure and Database for Computational Science
Computational science has seen in the last decades a spectacular rise in the scope, breadth, and depth of its efforts. Notwithstanding this prevalence and impact, it is often still performed using the renaissance model of individual artisans gathered in a workshop, under the guidance of an established practitioner. Great benefits could follow instead from adopting concepts and tools from computer science to manage, preserve, and share these computational efforts. We illustrate here the paradigm sustaining this vision, based around the four pillars of Automation, Data, Environment, and Sharing. We then discuss its implementation in the open-source AiiDA platform (http://www.aiida.net), which has been tuned first to the demands of computational materials science. AiiDA's design is based on directed acyclic graphs to track the provenance of data and calculations, and to ensure preservation and searchability. Remote computational resources are managed transparently, and automation is coupled with data storage to ensure reproducibility. Last, complex sequences of calculations can be encoded into scientific workflows. We believe that AiiDA's design and its sharing capabilities will encourage the creation of social ecosystems to disseminate codes, data, and scientific workflows.
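The directed-acyclic-graph provenance model described above can be illustrated with a toy structure. This is a hypothetical sketch in the spirit of that design, not the AiiDA API: edges run from input data to calculations and from calculations to their outputs, so walking the ancestors of any result recovers everything it depends on:

```python
from collections import defaultdict

class ProvenanceGraph:
    """Toy DAG linking data nodes and calculation nodes (illustrative only)."""

    def __init__(self):
        self.parents = defaultdict(set)   # node -> its direct inputs

    def add_calculation(self, calc, inputs, outputs):
        """Record one calculation with its input and output data nodes."""
        self.parents[calc].update(inputs)
        for out in outputs:
            self.parents[out].add(calc)

    def provenance(self, node):
        """All ancestors of a node: every datum and calculation it depends on."""
        seen, stack = set(), [node]
        while stack:
            for p in self.parents[stack.pop()]:
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen
```

Because the graph is acyclic, the ancestor walk terminates, and re-running the recorded calculations along it in topological order is what makes a result reproducible from its stored inputs.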
XQ2P: Efficient XQuery P2P Time Series Processing
In this demonstration, we propose a model for the management of XML time series (TS) using the new XQuery 1.1 window operator. We argue that centralized computation is slow, and demonstrate XQ2P, our prototype for efficient XQuery P2P TS computation in the context of financial analysis of large data sets (more than one million values).
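Windowed aggregation of the kind the XQuery window clause expresses declaratively can be mirrored with a small procedural sketch; the tumbling-window helper below is hypothetical and only illustrates the concept, not the XQ2P system:

```python
def tumbling_windows(series, size):
    """Partition a time series into fixed-size tumbling windows and average each.

    series: list of (timestamp, value) pairs; size: window length in the
    same units as the timestamps. Returns {window_start: mean_of_values}.
    """
    windows = {}
    for t, v in series:
        start = (t // size) * size        # align to the window boundary
        windows.setdefault(start, []).append(v)
    return {start: sum(vs) / len(vs) for start, vs in sorted(windows.items())}
```

In a P2P setting such as the one demonstrated, each peer would evaluate windows over its own partition of the series, with only the per-window aggregates exchanged.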