Experimental analysis of the accessibility of drawings with few segments
The visual complexity of a graph drawing is defined as the number of
geometric objects needed to represent all its edges. In particular, one object
may represent multiple edges, e.g., one needs only one line segment to draw two
collinear incident edges. We study whether drawings with few segments
have greater aesthetic appeal and help the user to assess the underlying graph.
We design an experiment that investigates two different graph types (trees and
sparse graphs), three different layout algorithms for trees, and two different
layout algorithms for sparse graphs. We ask the users to give an aesthetic
ranking of the layouts and to perform a furthest-pair or shortest-path task on
the drawings.

Comment: Appears in the Proceedings of the 25th International Symposium on Graph Drawing and Network Visualization (GD 2017)
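The definition above hinges on a simple geometric fact: two incident edges pq and qr can be drawn as one segment exactly when p, q, r are collinear and q lies between p and r. A minimal sketch of that test (function name and integer coordinates are illustrative assumptions, not from the paper):

```python
def share_segment(p, q, r):
    """Return True iff incident edges pq and qr can be drawn as a single
    straight-line segment: p, q, r collinear with q between p and r."""
    (px, py), (qx, qy), (rx, ry) = p, q, r
    # Cross product of (q - p) and (r - p); zero means collinear.
    cross = (qx - px) * (ry - py) - (qy - py) * (rx - px)
    if cross != 0:
        return False
    # The shared endpoint q must lie between the outer endpoints p and r.
    return min(px, rx) <= qx <= max(px, rx) and min(py, ry) <= qy <= max(py, ry)
```

Counting edges minus such merge opportunities along each drawn line gives the segment count of a drawing.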
Competitive versions of vertex ranking and game acquisition, and a problem on proper colorings
In this thesis we study certain functions on graphs. Chapters 2 and 3 deal with variations on vertex ranking, a type of node-labeling scheme that models a parallel processing problem. A k-ranking of a graph G is a labeling of its vertices from {1,...,k} such that any nontrivial path whose endpoints have the same label contains a vertex with a larger label. In Chapter 2, we investigate the problem of list ranking, wherein every vertex of G is assigned a set of possible labels, and a ranking must be constructed by labeling each vertex from its list; the list ranking number of G is the minimum k such that if every vertex is assigned a set of k possible labels, then G is guaranteed to have a ranking from these lists. We compute the list ranking numbers of paths, cycles, and trees with many leaves. In Chapter 3, we investigate the problem of on-line ranking, which asks for an algorithm to rank the vertices of G as they are revealed one at a time in the subgraph of G induced by the vertices revealed so far. The on-line ranking number of G is the minimum over all such labeling algorithms of the largest label that the algorithm can be forced to use. We give algorithmic bounds on the on-line ranking number of trees in terms of maximum degree, diameter, and number of internal vertices.
Chapter 4 is concerned with the connectedness and Hamiltonicity of the graph G^j_k(H), whose vertices are the proper k-colorings of a given graph H, with edges joining colorings that differ only on a set of vertices contained within a connected subgraph of H on at most j vertices. We introduce and study the parameters g_k(H) and h_k(H), which denote the minimum j such that G^j_k(H) is connected or Hamiltonian, respectively. Finally, in Chapter 5 we compute the game acquisition number of complete bipartite graphs. An acquisition move in a weighted graph G consists of a vertex v taking all the weight from a neighbor whose weight is at most the weight of v. In the acquisition game on G, each vertex initially has weight 1, and players Min and Max alternate acquisition moves until the set of vertices remaining with positive weight is an independent set. Min seeks to minimize the size of the final independent set, while Max seeks to maximize it; the game acquisition number is the size of the final set under optimal play.
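The k-ranking condition from Chapter 2 can be verified mechanically via an equivalent formulation: for each label value c, in the subgraph induced by vertices labeled at most c, no connected component may contain two vertices labeled exactly c (a path between two such vertices would use only labels ≤ c, violating the definition). A sketch of that check, assuming an adjacency-list representation (the function name is hypothetical):

```python
def is_ranking(adj, labels):
    """Check that `labels` (dict vertex -> int) is a valid ranking of the
    graph with adjacency lists `adj`: every nontrivial path whose endpoints
    share a label must contain a vertex with a strictly larger label."""
    for c in set(labels.values()):
        allowed = {v for v, lab in labels.items() if lab <= c}
        seen = set()
        for start in allowed:
            if start in seen:
                continue
            # Explore one component of the subgraph induced by labels <= c.
            comp, stack = {start}, [start]
            while stack:
                u = stack.pop()
                for w in adj[u]:
                    if w in allowed and w not in comp:
                        comp.add(w)
                        stack.append(w)
            seen |= comp
            # Two label-c vertices in one component => invalid ranking.
            if sum(1 for v in comp if labels[v] == c) > 1:
                return False
    return True
```

For example, 1-2-1-3 is a valid 3-ranking of the path on four vertices, while 1-2-2-1 is not.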
How to measure metallicity from five-band photometry with supervised machine learning algorithms
We demonstrate that it is possible to measure metallicity from the SDSS
five-band photometry to better than 0.1 dex using supervised machine learning
algorithms. Using spectroscopic estimates of metallicity as ground truth, we
build, optimize and train several estimators to predict metallicity. We use the
observed photometry, as well as derived quantities such as stellar mass and
photometric redshift, as features, and we build two sample data sets at median
redshifts of 0.103 and 0.218 and median r-band magnitudes of 17.5 and 18.3,
respectively. We find that ensemble methods, such as Random Forests of Trees
and Extremely Randomized Trees, and Support Vector Machines all perform
comparably well and can measure metallicity with a Root Mean Square Error
(RMSE) of 0.081 and 0.090 for the two data sets when all objects are included.
The fraction of outliers (objects for which |Z_true - Z_pred| > 0.2 dex) is 2.2%
and 3.9%, respectively, and the RMSE decreases to 0.068 and 0.069 if those
objects are excluded. Because of the ability of these algorithms to capture
complex relationships between data and target, our technique performs better
than previously proposed methods that sought to fit metallicity using an
analytic fitting formula, and has 3x more constraining power than SED
fitting-based methods. Additionally, this method is extremely forgiving of
contamination in the training set, and can be used with very satisfactory
results for training sample sizes of just a few hundred objects. We distribute
all the routines to reproduce our results and apply them to other data sets.

Comment: Minor revisions, matching version published in MNRAS
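The pipeline the abstract describes (photometric features in, a trained regressor out, scored by RMSE and a 0.2 dex outlier cut) can be sketched with scikit-learn. The synthetic data below merely stands in for the SDSS features (five-band photometry plus stellar mass and photometric redshift); the numbers it produces are not the paper's:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Stand-in sample: 7 features mimicking five magnitudes + mass + photo-z,
# with spectroscopic metallicity as a noisy linear target (illustrative only).
X = rng.normal(size=(2000, 7))
Z = X @ (rng.normal(size=7) * 0.05) + rng.normal(scale=0.05, size=2000)

X_tr, X_te, Z_tr, Z_te = train_test_split(X, Z, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, Z_tr)
Z_pred = model.predict(X_te)

# Metrics used in the abstract: RMSE, outlier fraction at 0.2 dex,
# and RMSE with the outliers excluded.
rmse = np.sqrt(np.mean((Z_te - Z_pred) ** 2))
outliers = np.abs(Z_te - Z_pred) > 0.2
frac_out = outliers.mean()
rmse_clipped = np.sqrt(np.mean((Z_te[~outliers] - Z_pred[~outliers]) ** 2))
```

Swapping `RandomForestRegressor` for `ExtraTreesRegressor` or `SVR` reproduces the comparison between estimator families described above.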
Finding Patterns in a Knowledge Base using Keywords to Compose Table Answers
We aim to provide table answers to keyword queries against knowledge bases.
For queries referring to multiple entities, like "Washington cities population"
and "Mel Gibson movies", it is better to represent each relevant answer as a
table which aggregates a set of entities or entity-joins within the same table
scheme or pattern. In this paper, we study how to find highly relevant patterns
in a knowledge base for user-given keyword queries to compose table answers. A
knowledge base can be modeled as a directed graph called knowledge graph, where
nodes represent entities in the knowledge base and edges represent the
relationships among them. Each node and edge is labeled with a type and text. A
pattern is an aggregation of subtrees that contain all keywords in their texts
and have the same structure and types on nodes/edges. We propose efficient
algorithms to find patterns that are relevant to the query for a class of
scoring functions. We show the hardness of the problem in theory, and propose
path-based indexes that are affordable in memory. Two query-processing
algorithms are proposed: one is fast in practice for small queries (with small
patterns as answers) by utilizing the indexes; and the other one is better in
theory, with running time linear in the sizes of indexes and answers, which can
handle large queries better. We also conduct an extensive experimental study to
compare our approaches with a naive adaptation of known techniques.

Comment: VLDB 201
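The core idea of grouping keyword matches by their type signature can be illustrated on a toy knowledge graph. The sketch below handles only one-edge subtrees (the paper's patterns aggregate larger subtrees via path-based indexes); the graph contents and function names are invented for illustration:

```python
from collections import defaultdict

# Toy knowledge graph: node -> (type, text); edges carry a type too.
nodes = {
    1: ("City", "Seattle"), 2: ("City", "Spokane"),
    3: ("State", "Washington"), 4: ("Number", "737015 population"),
    5: ("Number", "228989 population"),
}
edges = [(1, 3, "locatedIn"), (2, 3, "locatedIn"),
         (1, 4, "hasPopulation"), (2, 5, "hasPopulation")]

def match_nodes(keyword):
    """Nodes whose text contains the keyword (case-insensitive)."""
    return {v for v, (_, text) in nodes.items() if keyword.lower() in text.lower()}

def one_edge_patterns(kw1, kw2):
    """Group keyword-covering single edges by (type, edge-type, type)
    signature; each group is a pattern whose instances form the rows
    of one table answer."""
    out = defaultdict(list)
    hits1, hits2 = match_nodes(kw1), match_nodes(kw2)
    for u, v, etype in edges:
        if u in hits1 and v in hits2:
            sig = (nodes[u][0], etype, nodes[v][0])
            out[sig].append((nodes[u][1], nodes[v][1]))
    return dict(out)
```

A query like `one_edge_patterns("Seattle", "population")` yields one pattern with signature ("City", "hasPopulation", "Number"), whose instances aggregate into a table under that shared schema.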
Ensemble of Example-Dependent Cost-Sensitive Decision Trees
Several real-world classification problems are example-dependent
cost-sensitive in nature, where the costs due to misclassification vary between
examples and not only within classes. However, standard classification methods
do not take these costs into account, and assume a constant cost of
misclassification errors. In previous works, methods that incorporate the
financial costs into the training of different algorithms have been
proposed, with the example-dependent cost-sensitive decision tree algorithm
being the one that gives the highest savings. In this paper we propose a new
framework of ensembles of example-dependent cost-sensitive decision trees. The
framework consists of creating different example-dependent cost-sensitive
decision trees on random subsamples of the training set, and then combining
them using three different combination approaches. Moreover, we propose two new
cost-sensitive combination approaches: cost-sensitive weighted voting and
cost-sensitive stacking, the latter being based on the cost-sensitive logistic
regression method. Finally, using five different databases, from four
real-world applications: credit card fraud detection, churn modeling, credit
scoring and direct marketing, we evaluate the proposed method against
state-of-the-art example-dependent cost-sensitive techniques, namely,
cost-proportionate sampling, Bayes minimum risk and cost-sensitive decision
trees. The results show that the proposed algorithms achieve better results for
all databases, in the sense of higher savings.

Comment: 13 pages, 6 figures, Submitted for possible publication
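Two of the ingredients above can be sketched concretely: the savings measure (cost avoided relative to the cheaper trivial all-positive/all-negative policy) and cost-sensitive weighted voting, where each base classifier's vote is weighted by its savings instead of its accuracy. This is a simplified reading of the approach, with invented names; real ensemble members would be trained trees exposing `predict`:

```python
import numpy as np

def savings(y_true, y_pred, cost_fp, cost_fn):
    """Example-dependent savings of y_pred: 1 means zero cost incurred,
    0 means no better than the cheaper trivial policy."""
    cost = np.where(y_pred == 1,
                    np.where(y_true == 1, 0.0, cost_fp),
                    np.where(y_true == 1, cost_fn, 0.0)).sum()
    base = min(cost_fn[y_true == 1].sum(),   # cost of predicting all 0
               cost_fp[y_true == 0].sum())   # cost of predicting all 1
    return (base - cost) / base

def cs_weighted_vote(predictors, weights, X):
    """Combine base classifiers; each vote is scaled by its weight
    (e.g. its savings on a validation set) before thresholding."""
    votes = np.array([p(X) for p in predictors], dtype=float)
    w = np.asarray(weights, dtype=float)[:, None]
    return ((w * votes).sum(axis=0) / w.sum() >= 0.5).astype(int)
```

Cost-sensitive stacking would instead feed the base predictions into a cost-sensitive logistic regression, as the abstract notes.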
Training linear ranking SVMs in linearithmic time using red-black trees
We introduce an efficient method for training the linear ranking support
vector machine. The method combines cutting-plane optimization with a
red-black-tree-based approach to subgradient calculations, and has O(m*s + m*log(m)) time
complexity, where m is the number of training examples, and s the average
number of non-zero features per example. The best previously known training
algorithms achieve the same efficiency only for restricted special cases,
whereas the proposed approach allows any real valued utility scores in the
training data. Experiments demonstrate the superior scalability of the proposed
approach, when compared to the fastest existing RankSVM implementations.

Comment: 20 pages, 4 figures
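The reason pairwise ranking losses need not cost O(m^2) is an order-statistics trick: sorting by model score and counting inversions against the utilities tallies all mis-ranked pairs in O(m log m). The paper maintains this with red-black trees inside cutting-plane iterations; the merge-sort sketch below (names hypothetical) just illustrates the counting idea, ignoring ties:

```python
def count_inversions(a):
    """Return (sorted copy of a, number of pairs i < j with a[i] > a[j]),
    computed in O(m log m) by merge sort."""
    if len(a) <= 1:
        return list(a), 0
    mid = len(a) // 2
    left, inv_l = count_inversions(a[:mid])
    right, inv_r = count_inversions(a[mid:])
    merged, inv, i, j = [], inv_l + inv_r, 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            # right[j] precedes all remaining left elements: that many inversions.
            merged.append(right[j]); j += 1
            inv += len(left) - i
    merged += left[i:] + right[j:]
    return merged, inv

def swapped_pairs(utilities, scores):
    """Pairs the model ranks in the wrong order relative to the true
    (real-valued) utility scores."""
    order = sorted(range(len(scores)), key=lambda k: scores[k])
    _, inv = count_inversions([utilities[k] for k in order])
    return inv
```

Note the sketch works for any real-valued utilities, matching the generality the abstract claims over methods restricted to special cases.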