Investigation into Indexing XML Data Techniques
The rapid development of XML technology has improved the WWW: XML data has many advantages and has become a common technology for transferring data across the Internet. The objective of this research is therefore to investigate XML indexing techniques in terms of their structures. The main goal of this investigation is to identify the main limitations of these techniques and any other open issues.
Furthermore, this research considers the most common XML indexing techniques and compares them. This work then argues from that comparison to identify these limitations. To conclude, the main problem of all the XML indexing techniques is the trade-off between the size and the efficiency of the indexes. All of the indexes become large in order to perform well, and none of them is suitable for all users' requirements. However, each of these techniques has some advantages.
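The size/efficiency trade-off described above can be illustrated with a minimal sketch of one simple structural index, a path index that maps every root-to-element label path to the elements it reaches. This is not one of the techniques surveyed in the paper; the function name `build_path_index` and the sample document are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_path_index(root):
    """Map each root-to-element label path to the elements reached by it.

    Hypothetical helper for illustration: lookups by full path become a
    single dictionary access, at the cost of one entry per distinct path.
    """
    index = defaultdict(list)
    stack = [(root, "/" + root.tag)]
    while stack:
        node, path = stack.pop()
        index[path].append(node)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return index


# Small made-up document, used only to exercise the sketch.
doc = ET.fromstring(
    "<lib>"
    "<book><title>A</title></book>"
    "<book><title>B</title><year>2001</year></book>"
    "</lib>"
)
index = build_path_index(doc)

# Query answered from the index alone, without walking the tree again.
titles = sorted(e.text for e in index["/lib/book/title"])
print(titles, len(index))
```

The index answers a path query in one lookup, but its size grows with the number of distinct paths and indexed elements, which is the kind of cost the abstract refers to.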
Tangos: the agile numerical galaxy organization system
We present Tangos, a Python framework and web interface for database-driven
analysis of numerical structure formation simulations. To understand the role
that such a tool can play, consider constructing a history for the absolute
magnitude of each galaxy within a simulation. The magnitudes must first be
calculated for all halos at all timesteps and then linked using a merger tree;
folding the required information into a final analysis can entail significant
effort. Tangos is a generic solution to this information organization problem,
aiming to free users from the details of data management. At the querying
stage, our example of gathering properties over history is reduced to a few
clicks or a simple, single-line Python command. The framework is highly
extensible; in particular, users are expected to define their own properties
which tangos will write into the database. A variety of parallelization options
are available and the raw simulation data can be read using existing libraries
such as pynbody or yt. Finally, tangos-based databases and analysis pipelines
can easily be shared with collaborators or the broader community to ensure
reproducibility. User documentation is provided separately.
Comment: Clarified various points and further improved code performance;
accepted for publication in ApJS. Tutorials (including video) at
http://tiny.cc/tango
An introduction to Graph Data Management
A graph database is a database where the data structures for the schema
and/or instances are modeled as a (labeled) (directed) graph or generalizations
of it, and where querying is expressed by graph-oriented operations and type
constructors. In this article we present the basic notions of graph databases,
give a historical overview of their main developments, and study the main current
systems that implement them.
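As a minimal sketch of the definition above, the following models instance data as a labeled directed graph and answers a query with a graph-oriented operation (reachability along one edge label). The class `LabeledGraph`, its methods, and the sample data are hypothetical and do not correspond to any system discussed in the article.

```python
from collections import defaultdict, deque


class LabeledGraph:
    """Tiny in-memory labeled directed graph (illustrative only)."""

    def __init__(self):
        self._out = defaultdict(list)  # node -> [(label, target), ...]

    def add_edge(self, source, label, target):
        self._out[source].append((label, target))

    def neighbors(self, node, label):
        """Targets of edges leaving `node` that carry `label`."""
        return [t for (l, t) in self._out.get(node, []) if l == label]

    def reachable(self, start, label):
        """Nodes reachable from `start` via one or more `label` edges."""
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in self.neighbors(node, label):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen


g = LabeledGraph()
g.add_edge("ann", "knows", "bob")
g.add_edge("bob", "knows", "cy")
g.add_edge("ann", "works_at", "acme")

friends_of_friends = g.reachable("ann", "knows")
employer = g.neighbors("ann", "works_at")
print(sorted(friends_of_friends), employer)
```

The reachability query is expressed directly over edges and labels, which is what distinguishes graph-oriented querying from join-based formulations over tables.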
Structured Parallel Programming with Trees
High-level abstractions for parallel programming are still immature. Computations on complicated data structures such as pointer structures are considered irregular algorithms. General graph structures, which irregular algorithms typically deal with, are difficult to divide and conquer. Because the divide-and-conquer paradigm is essential for load balancing in parallel algorithms and is a key to parallel programming, it is understandable that general graphs are difficult to handle. Trees, however, lead to divide-and-conquer computations by definition and are sufficiently general and powerful as a programming tool. We therefore deal with abstractions of tree-based computations. Our study started from Matsuzaki's work on tree skeletons. We have improved the usability of tree skeletons by enriching their implementation aspect. Specifically, we have dealt with two issues. First, we have implemented loose coupling between skeletons and data structures and developed a flexible tree skeleton library. Second, we have implemented a parallelizer that transforms sequential recursive functions in C into parallel programs that use tree skeletons implicitly. This parallelizer hides the complicated API of tree skeletons and lets programmers use tree skeletons with no extra burden. Unfortunately, the practicality of tree skeletons has not improved. On the basis of observations from the practice of tree skeletons, we deal with two application domains: program analysis and neighborhood computation. In the domain of program analysis, compilers treat input programs as control-flow graphs (CFGs) and perform analysis on CFGs. Program analysis is therefore difficult to divide and conquer. To resolve this problem, we have developed divide-and-conquer methods for program analysis in a syntax-directed manner on the basis of Rosen's high-level approach. Specifically, we have dealt with data-flow analysis based on Tarjan's formalization and value-graph construction based on a functional formalization. In the domain of neighborhood computations, a primary issue is locality. A naive parallel neighborhood computation without locality enhancement causes many cache misses. The divide-and-conquer paradigm is known to be useful for locality enhancement as well. We have therefore applied algebraic formalizations and a tree-segmenting technique derived from tree skeletons to the locality enhancement of neighborhood computations.
The University of Electro-Communications, 201
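The claim that trees lead to divide-and-conquer computations by definition can be sketched with a reduction over a binary tree. This is a sequential illustration under stated assumptions, not the tree skeleton library or the parallelizer described in the abstract; `Node` and `tree_reduce` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def tree_reduce(node, combine, empty):
    """Reduce a binary tree bottom-up (illustrative, sequential).

    The two recursive calls touch disjoint subtrees, so they are
    independent subproblems that a parallel runtime could evaluate
    concurrently before `combine` merges their results.
    """
    if node is None:
        return empty
    left = tree_reduce(node.left, combine, empty)
    right = tree_reduce(node.right, combine, empty)
    return combine(left, node.value, right)


tree = Node(1, Node(2, Node(4)), Node(3))

# Two different computations expressed with the same skeleton-like function.
total = tree_reduce(tree, lambda l, v, r: l + v + r, 0)
height = tree_reduce(tree, lambda l, v, r: 1 + max(l, r), 0)
print(total, height)
```

Only the `combine` function changes between the two computations, which is the separation of traversal from per-node logic that skeleton-style abstractions aim for.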
Joint Regression and Ranking for Image Enhancement
Research on automated image enhancement has gained momentum in recent years,
partially due to the need for easy-to-use tools for enhancing pictures captured
by ubiquitous cameras on mobile devices. Many of the existing leading methods
employ machine-learning-based techniques, by which some enhancement parameters
for a given image are found by relating the image to the training images with
known enhancement parameters. While knowing the structure of the parameter
space can facilitate search for the optimal solution, none of the existing
methods has explicitly modeled and learned that structure. This paper presents
an end-to-end, novel joint regression and ranking approach to model the
interaction between desired enhancement parameters and images to be processed,
employing a Gaussian process (GP). GP allows searching for ideal parameters
using only the image features. The model naturally leads to a ranking technique
for comparing images in the induced feature space. A comparative evaluation
against ground truth from the MIT-Adobe FiveK dataset, plus subjective tests on
an additional dataset, demonstrates the effectiveness of the
proposed approach.
Comment: WACV 201
XML Schema Clustering with Semantic and Hierarchical Similarity Measures
With the growing popularity of XML as a data representation language, collections of XML data have exploded in number. Methods are required to manage these collections and to discover useful information in them for improved document handling. We present a schema clustering process that organises heterogeneous XML schemas into groups. The methodology considers not only the linguistic and contextual properties of the elements but also their hierarchical structural similarity. We support our findings with experiments and analysis.
Streaming Active Learning Strategies for Real-Life Credit Card Fraud Detection: Assessment and Visualization
Credit card fraud detection is a very challenging problem because of the
specific nature of transaction data and the labeling process. The transaction
data are peculiar because they are obtained in a streaming fashion, are
strongly imbalanced, and are prone to non-stationarity. The labeling is the outcome
of an active learning process, as every day human investigators contact only a
small number of cardholders (associated with the riskiest transactions) and
obtain the class (fraud or genuine) of the related transactions. An adequate
selection of the set of cardholders is therefore crucial for an efficient fraud
detection process. In this paper, we present a number of active learning
strategies and we investigate their fraud detection accuracies. We compare
different criteria (supervised, semi-supervised and unsupervised) to query
unlabeled transactions. Finally, we highlight the existence of an
exploitation/exploration trade-off for active learning in the context of fraud
detection, which has so far been overlooked in the literature.
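The exploitation/exploration trade-off mentioned above can be sketched as a split of the daily investigation budget: part goes to the highest-scored transactions (exploitation), the rest to randomly chosen lower-scored ones (exploration). This is a simplified illustration, not one of the strategies evaluated in the paper; the function name, the `explore_fraction` parameter, and the scores are all assumptions.

```python
import random


def select_for_investigation(scores, budget, explore_fraction, rng):
    """Pick transactions to label under a fixed budget (illustrative).

    scores: dict mapping transaction id -> estimated fraud probability.
    Returns (exploit, explore): the top-scored ids, and ids sampled
    uniformly at random from the remaining transactions.
    """
    n_explore = int(budget * explore_fraction)
    n_exploit = budget - n_explore
    ranked = sorted(scores, key=scores.get, reverse=True)
    exploit = ranked[:n_exploit]
    rest = ranked[n_exploit:]
    explore = rng.sample(rest, min(n_explore, len(rest)))
    return exploit, explore


# Made-up scores, used only to exercise the sketch.
scores = {"t1": 0.90, "t2": 0.80, "t3": 0.30, "t4": 0.20, "t5": 0.10, "t6": 0.05}
exploit, explore = select_for_investigation(
    scores, budget=4, explore_fraction=0.5, rng=random.Random(0)
)
print(exploit, explore)
```

Setting `explore_fraction` to 0 recovers a purely greedy policy that only queries the riskiest transactions, while larger values spend more of the budget on transactions the current model considers unlikely to be fraud.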