Metric Learning for Temporal Sequence Alignment
In this paper, we propose to learn a Mahalanobis distance to perform
alignment of multivariate time series. The learning examples for this task are
time series for which the true alignment is known. We cast the alignment
problem as a structured prediction task, and propose realistic losses between
alignments for which the optimization is tractable. We provide experiments on
real data in the audio to audio context, where we show that the learning of a
similarity measure leads to improvements in the performance of the alignment
task. We also propose to use this metric learning framework to perform feature
selection and, from basic audio features, build a combination of these with
better performance for the alignment
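As a rough illustration of how a Mahalanobis distance enters an alignment, the sketch below runs standard dynamic time warping (DTW) with the local cost (x - y)^T W (x - y). It is a minimal sketch under stated assumptions, not the paper's method: the matrix W is supplied by hand rather than learned, and the function names `mahalanobis_sq` and `dtw_align` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def mahalanobis_sq(x: Vector, y: Vector, W: Matrix) -> float:
    """Squared Mahalanobis distance (x - y)^T W (x - y).

    W is assumed to be symmetric positive semi-definite.
    """
    d = [a - b for a, b in zip(x, y)]
    k = len(d)
    return sum(d[i] * W[i][j] * d[j] for i in range(k) for j in range(k))


def dtw_align(
    A: Sequence[Vector], B: Sequence[Vector], W: Matrix
) -> Tuple[float, List[Tuple[int, int]]]:
    """Align two multivariate series with DTW under the metric W.

    Returns the total alignment cost and the warping path as a list of
    (index in A, index in B) pairs.
    """
    n, m = len(A), len(B)
    inf = float("inf")
    # cost[i][j] = best cost of aligning A[:i] with B[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = mahalanobis_sq(A[i - 1], B[j - 1], W)
            cost[i][j] = local + min(
                cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1]
            )

    # Backtrack from the end to recover the warping path.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path
```

Choosing a W with a zero on the diagonal makes the alignment ignore the corresponding feature, which hints at how a learned metric can double as feature selection.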
Site investigation techniques for DNAPL source and plume zone characterisation
Establishing the location of the Source Area BioREmediation (SABRE)
research cell was a primary objective of the site characterisation
programme. This bulletin describes the development of a two-stage site
characterisation methodology that combined qualitative and
quantitative data to guide and inform an assessment of dense nonaqueous
phase liquid (DNAPL) distribution at the site.
DNAPL site characterisation has traditionally involved multiple phases of
site investigation, characterised by rigid sampling and analysis
programmes, expensive mobilisations and long decision-making
timeframes (Crumbling, 2001a), resulting in site investigations that are
costly and long in duration. Here we follow the principles of an
innovative framework, termed Triad (Crumbling, 2001a, 2001b;
Crumbling et al., 2001; Crumbling et al., 2003), which describes a
systematic approach for the characterisation and remediation of
contaminated sites. The Triad approach to site characterisation focuses
on three main components: a) systematic planning, implemented with a
preliminary conceptual site model built from existing data, in which the
desired outcomes are planned and decision uncertainties are evaluated;
b) dynamic work strategies that focus on the need for flexibility as site
characterisation progresses, so that new information can guide the
investigation in real time; and c) real-time measurement technologies,
which are critical in making dynamic work strategies possible.
Key to this approach is the selection of suitable measurement
technologies, of which there are two main categories (Crumbling et al.,
2003). The first category provides qualitative, dense spatial data, often
with detection limits over a preset value. These methods are generally of
lower cost, produce real-time data and are primarily used to identify site
areas that require further investigation. Examples of such "decision-quality"
methods are laser-induced fluorescence (Kram et al., 2001),
membrane interface probing (McAndrews et al., 2003) and cone
penetrometer testing (Robertson, 1990), all of which produce data in
continuous vertical profiles. Because these methods are rapid, many
profiles can be generated and hence the subsurface data density is
greatly improved. These qualitative results are used to guide the
sampling strategy for the application of the second category of
technologies that generate quantitative, precise data that have low
detection limits and are analyte-specific. These methods tend to be high
cost with long turnaround times that preclude on-site decision making,
hence applying them for quantification, rather than to develop the
conceptual model, yields a key cost saving. Examples include instrumental
laboratory analyses such as soil solvent extractions (Parker et al., 2004)
and water analyses (USEPA, 1996). Where these two categories of
measurement technologies are used in tandem, a more complete and
accurate dataset is achieved without additional site mobilisations.
The aim of the site characterisation programme at the SABRE site was to
delineate the DNAPL source zone rapidly and identify a location for the
in situ research cell. The site characterisation objectives were to: a) test
whether semi-quantitative measurement techniques could reliably
determine geological interfaces, contaminant mass distribution and
inform the initial site conceptual model; and b) quantitatively determine
DNAPL source zone distribution, guided by the qualitative site
conceptual model
A Survey on Mapping Semi-Structured Data and Graph Data to Relational Data
The data produced by various services should be stored and managed in an appropriate format so that valuable knowledge can be gained conveniently. This has led to the emergence of various data models, including relational, semi-structured, and graph models. Because mature relational databases built on the relational data model are still predominant in today's market, interest has grown in storing and processing semi-structured data and graph data in relational databases, so that their mature and powerful capabilities can be applied to these data. In this survey, we review existing methods for mapping semi-structured data and graph data into relational tables, analyze their major features, and give a detailed classification of those methods. We also summarize the merits and demerits of each method, introduce open research challenges, and present future research directions. With this comprehensive investigation of existing methods and open problems, we hope this survey can motivate new mapping approaches by drawing lessons from each model's mapping strategies, as well as a new research topic: mapping multi-model data into relational tables.
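To make the idea of storing graph data in relational tables concrete, the sketch below uses a generic node table plus edge table and answers a two-hop query with a self-join. This is an assumed, simplified scheme chosen for illustration, not a specific method taken from the survey; the table names `node` and `edge` and the helper functions are hypothetical.

```python
import sqlite3
from typing import Iterable, List, Tuple


def load_graph(
    conn: sqlite3.Connection,
    nodes: Iterable[Tuple[int, str]],
    edges: Iterable[Tuple[int, int, str]],
) -> None:
    """Create a node table and an edge table, then load a labelled graph."""
    conn.execute(
        "CREATE TABLE node (id INTEGER PRIMARY KEY, label TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE edge ("
        " src INTEGER NOT NULL REFERENCES node(id),"
        " dst INTEGER NOT NULL REFERENCES node(id),"
        " label TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO node (id, label) VALUES (?, ?)", nodes)
    conn.executemany(
        "INSERT INTO edge (src, dst, label) VALUES (?, ?, ?)", edges
    )
    conn.commit()


def two_hop_labels(conn: sqlite3.Connection, start_id: int) -> List[str]:
    """Labels of nodes reachable in exactly two edges from start_id.

    A graph traversal becomes a self-join on the edge table.
    """
    rows = conn.execute(
        "SELECT DISTINCT n.label"
        " FROM edge e1"
        " JOIN edge e2 ON e1.dst = e2.src"
        " JOIN node n ON n.id = e2.dst"
        " WHERE e1.src = ?"
        " ORDER BY n.label",
        (start_id,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    load_graph(
        connection,
        nodes=[(1, "alice"), (2, "bob"), (3, "carol"), (4, "dave")],
        edges=[(1, 2, "knows"), (2, 3, "knows"), (2, 4, "knows")],
    )
    print(two_hop_labels(connection, 1))
```

Each additional hop adds another join, which is one reason path-heavy queries are a recurring concern when graph data is stored relationally.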
Storage Solutions for Big Data Systems: A Qualitative Study and Comparison
Big data systems development is full of challenges in view of the variety of
application areas and domains that this technology promises to serve.
Typically, fundamental design decisions involved in big data systems design
include choosing appropriate storage and computing infrastructures. In this age
of heterogeneous systems that integrate different technologies into a solution
optimized for a specific real-world problem, big data systems are no exception.
For the storage aspect of a big data system, the primary facet is the storage
infrastructure, and NoSQL seems to be the technology that fulfills its
requirements. However,
every big data application has variable data characteristics and thus, the
corresponding data fits into a different data model. This paper presents a
feature and use-case analysis and comparison of the four main data models,
namely document-oriented, key-value, graph and wide-column. Moreover, a feature
analysis of 80 NoSQL solutions has been provided, elaborating on the criteria
and points that a developer must consider while making a possible choice.
Typically, big data storage needs to communicate with the execution engine and
other processing and visualization technologies to create a comprehensive
solution. This brings the second facet of big data storage, big data file
formats, into the picture. The second half of the research paper compares the
advantages, shortcomings and possible use cases of available big data file
formats for Hadoop, which is the foundation for most big data computing
technologies. Decentralized storage and blockchain are seen as the next
generation of big data storage, and their challenges and future prospects have
also been discussed
An efficient parallel immersed boundary algorithm using a pseudo-compressible fluid solver
We propose an efficient algorithm for the immersed boundary method on
distributed-memory architectures, with the computational complexity of a
completely explicit method and excellent parallel scaling. The algorithm
utilizes the pseudo-compressibility method recently proposed by Guermond and
Minev [Comptes Rendus Mathematique, 348:581-585, 2010] that uses a directional
splitting strategy to discretize the incompressible Navier-Stokes equations,
thereby reducing the linear systems to a series of one-dimensional tridiagonal
systems. We perform numerical simulations of several fluid-structure
interaction problems in two and three dimensions and study the accuracy and
convergence rates of the proposed algorithm. For these problems, we compare the
proposed algorithm against other second-order projection-based fluid solvers.
Lastly, the strong and weak scaling properties of the proposed algorithm are
investigated
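The abstract notes that directional splitting reduces the linear systems to one-dimensional tridiagonal systems. A single such system can be solved in linear time with the Thomas algorithm, sketched below. This is a serial illustration only, under the assumption that the matrix is diagonally dominant so no pivoting is needed; it does not reproduce the paper's parallel solver, and the name `solve_tridiagonal` is hypothetical.

```python
from typing import List, Sequence


def solve_tridiagonal(
    lower: Sequence[float],
    diag: Sequence[float],
    upper: Sequence[float],
    rhs: Sequence[float],
) -> List[float]:
    """Solve a tridiagonal system with the Thomas algorithm.

    Row i of the system reads
        lower[i] * x[i-1] + diag[i] * x[i] + upper[i] * x[i+1] = rhs[i].
    All four inputs have the same length n; lower[0] and upper[n-1] are
    ignored. No pivoting is done, so the matrix is assumed to be
    diagonally dominant (or otherwise safe for elimination without pivoting).
    """
    n = len(diag)
    c_prime = [0.0] * n
    d_prime = [0.0] * n

    # Forward sweep: eliminate the sub-diagonal.
    c_prime[0] = upper[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c_prime[i - 1]
        c_prime[i] = upper[i] / denom
        d_prime[i] = (rhs[i] - lower[i] * d_prime[i - 1]) / denom

    # Back substitution.
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x
```

The cost is O(n) per system, which is consistent with the abstract's claim of explicit-method complexity when many independent one-dimensional systems are solved per time step.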
Assessment of Semi-Mechanistic Bubble Departure Diameter Modelling for the CFD Simulation of Boiling Flows
In Eulerian-Eulerian two-fluid computational fluid dynamic (CFD) models, increasingly often applied to the prediction of nucleate boiling in nuclear reactor thermal hydraulics, boiling at the wall is usually accounted for by partitioning the heat flux between the different mechanisms of heat transfer involved. Among the numerous closures required, the bubble departure diameter in particular has a significant influence on the predicted interfacial area concentration and void distribution within the flow. In the present work, following evidence of the limited accuracy and reliability of the empirically-based correlations normally applied in CFD models, more mechanistic formulations of bubble departure have been introduced into the STAR-CCM+ code. The performance of these models, based on a balance of the hydrodynamic forces acting on a bubble, and their compatibility with existing implementations in a CFD framework, are assessed against two different data sets for vertically upward subcooled boiling flows. In general, a significant amount of modelling is required by these mechanistic models, and some recommendations are made on different modelling choices. The model is extended to include a more physically-consistent coupled calculation of the frequency of bubble departure, and the modelling of the local subcooling acting on the bubble cap is analyzed. In general, predictions of void distribution and wall temperature reach a satisfactory accuracy, even if numerous numerical and modelling uncertainties are still present. In view of this, several areas for future work and modelling improvement are identified