
    Experimental analysis of the accessibility of drawings with few segments

    The visual complexity of a graph drawing is defined as the number of geometric objects needed to represent all its edges. In particular, one object may represent multiple edges; e.g., only one line segment is needed to draw two collinear incident edges. We study the question of whether drawings with few segments have a better aesthetic appeal and help the user to assess the underlying graph. We design an experiment that investigates two different graph types (trees and sparse graphs), three different layout algorithms for trees, and two different layout algorithms for sparse graphs. We asked the users to give an aesthetic ranking of the layouts and to perform a furthest-pair or shortest-path task on the drawings. Comment: Appears in the Proceedings of the 25th International Symposium on Graph Drawing and Network Visualization (GD 2017)
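
    To make the collinearity criterion concrete, here is a small Python sketch (our own illustration, not code from the paper): two incident edges can be drawn with a single segment exactly when their three endpoints are collinear and the shared vertex lies between the other two. The function names and coordinates are hypothetical.

        def cross(o, a, b):
            """2D cross product of the vectors o->a and o->b."""
            return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

        def can_share_segment(pos, u, v, w):
            """True iff edges (u, v) and (v, w) can be drawn as one segment:
            u, v, w are collinear and the shared vertex v lies between u and w."""
            if cross(pos[u], pos[v], pos[w]) != 0:
                return False
            return (min(pos[u][0], pos[w][0]) <= pos[v][0] <= max(pos[u][0], pos[w][0])
                    and min(pos[u][1], pos[w][1]) <= pos[v][1] <= max(pos[u][1], pos[w][1]))

        # A path 0-1-2 laid out on one line: both edges fit on a single segment.
        pos = {0: (0.0, 0.0), 1: (1.0, 1.0), 2: (2.0, 2.0)}
        print(can_share_segment(pos, 0, 1, 2))  # True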

    Competitive versions of vertex ranking and game acquisition, and a problem on proper colorings

    In this thesis we study certain functions on graphs. Chapters 2 and 3 deal with variations on vertex ranking, a type of node-labeling scheme that models a parallel processing problem. A k-ranking of a graph G is a labeling of its vertices from {1,...,k} such that any nontrivial path whose endpoints have the same label contains a vertex with a larger label. In Chapter 2, we investigate the problem of list ranking, wherein every vertex of G is assigned a set of possible labels, and a ranking must be constructed by labeling each vertex from its list; the list ranking number of G is the minimum k such that if every vertex is assigned a set of k possible labels, then G is guaranteed to have a ranking from these lists. We compute the list ranking numbers of paths, cycles, and trees with many leaves. In Chapter 3, we investigate the problem of on-line ranking, which asks for an algorithm to rank the vertices of G as they are revealed one at a time in the subgraph of G induced by the vertices revealed so far. The on-line ranking number of G is the minimum over all such labeling algorithms of the largest label that the algorithm can be forced to use. We give algorithmic bounds on the on-line ranking number of trees in terms of maximum degree, diameter, and number of internal vertices. Chapter 4 is concerned with the connectedness and Hamiltonicity of the graph G^j_k(H), whose vertices are the proper k-colorings of a given graph H, with edges joining colorings that differ only on a set of vertices contained within a connected subgraph of H on at most j vertices. We introduce and study the parameters g_k(H) and h_k(H), which denote the minimum j such that G^j_k(H) is connected or Hamiltonian, respectively. Finally, in Chapter 5 we compute the game acquisition number of complete bipartite graphs. An acquisition move in a weighted graph G consists of a vertex v taking all the weight from a neighbor whose weight is at most the weight of v. In the acquisition game on G, each vertex initially has weight 1, and players Min and Max alternate acquisition moves until the set of vertices remaining with positive weight is an independent set. Min seeks to minimize the size of the final independent set, while Max seeks to maximize it; the game acquisition number is the size of the final set under optimal play.
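
    To make the ranking condition concrete, the following Python sketch (our own, not from the thesis) verifies a labeling via a standard equivalent characterization: a labeling is a ranking if and only if, for every label value c, each connected component of the subgraph induced by the vertices with label at most c contains at most one vertex labeled exactly c.

        def is_ranking(adj, label):
            """adj: dict vertex -> set of neighbors; label: dict vertex -> int."""
            for c in set(label.values()):
                allowed = {v for v in adj if label[v] <= c}
                seen = set()
                for start in allowed:
                    if start in seen:
                        continue
                    # Walk the component of `start` in the induced subgraph,
                    # counting vertices whose label is exactly c.
                    seen.add(start)
                    stack, count = [start], 0
                    while stack:
                        v = stack.pop()
                        if label[v] == c:
                            count += 1
                        for w in adj[v]:
                            if w in allowed and w not in seen:
                                seen.add(w)
                                stack.append(w)
                    if count > 1:
                        return False
            return True

        adj = {0: {1}, 1: {0, 2}, 2: {1}}           # the path 0-1-2
        print(is_ranking(adj, {0: 1, 1: 2, 2: 1}))  # True: a 2-ranking
        print(is_ranking(adj, {0: 1, 1: 1, 2: 2}))  # False: edge 0-1 repeats label 1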

    How to measure metallicity from five-band photometry with supervised machine learning algorithms

    We demonstrate that it is possible to measure metallicity from the SDSS five-band photometry to better than 0.1 dex using supervised machine learning algorithms. Using spectroscopic estimates of metallicity as ground truth, we build, optimize, and train several estimators to predict metallicity. We use the observed photometry, as well as derived quantities such as stellar mass and photometric redshift, as features, and we build two sample data sets at median redshifts of 0.103 and 0.218 and median r-band magnitudes of 17.5 and 18.3, respectively. We find that ensemble methods, such as Random Forests of Trees and Extremely Randomized Trees, and Support Vector Machines all perform comparably well and can measure metallicity with a Root Mean Square Error (RMSE) of 0.081 and 0.090 for the two data sets when all objects are included. The fraction of outliers (objects for which |Z_true - Z_pred| > 0.2 dex) is 2.2% and 3.9%, respectively, and the RMSE decreases to 0.068 and 0.069 if those objects are excluded. Because of the ability of these algorithms to capture complex relationships between data and target, our technique performs better than previously proposed methods that sought to fit metallicity using an analytic fitting formula, and has 3x more constraining power than SED-fitting-based methods. Additionally, this method is extremely forgiving of contamination in the training set, and can be used with very satisfactory results for training sample sizes of just a few hundred objects. We distribute all the routines needed to reproduce our results and apply them to other data sets. Comment: Minor revisions, matching version published in MNRAS
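
    A minimal sketch of this kind of pipeline, assuming scikit-learn and stand-in data (the features and the synthetic relation below are illustrative; this is not the authors' released code):

        import numpy as np
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        # Stand-in features: e.g. five magnitudes plus derived stellar mass
        # and photometric redshift; replace with real catalog columns.
        X = rng.normal(size=(1000, 7))
        # Synthetic "spectroscopic" metallicity playing the role of ground truth.
        Z = 8.7 + 0.3 * X[:, 2] + 0.05 * rng.normal(size=1000)

        X_tr, X_te, z_tr, z_te = train_test_split(X, Z, random_state=0)
        model = RandomForestRegressor(n_estimators=300, random_state=0)
        model.fit(X_tr, z_tr)

        z_pred = model.predict(X_te)
        rmse = np.sqrt(np.mean((z_te - z_pred) ** 2))
        outlier_frac = np.mean(np.abs(z_te - z_pred) > 0.2)  # > 0.2 dex cut, as in the paper
        print(f"RMSE = {rmse:.3f} dex, outlier fraction = {outlier_frac:.1%}")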

    Finding Patterns in a Knowledge Base using Keywords to Compose Table Answers

    We aim to provide table answers to keyword queries against knowledge bases. For queries referring to multiple entities, like "Washington cities population" and "Mel Gibson movies", it is better to represent each relevant answer as a table that aggregates a set of entities or entity-joins within the same table scheme or pattern. In this paper, we study how to find highly relevant patterns in a knowledge base for user-given keyword queries in order to compose table answers. A knowledge base can be modeled as a directed graph called a knowledge graph, where nodes represent entities in the knowledge base and edges represent the relationships among them. Each node/edge is labeled with a type and text. A pattern is an aggregation of subtrees that contain all keywords in their texts and have the same structure and types on nodes/edges. We propose efficient algorithms to find patterns that are relevant to the query for a class of scoring functions. We show the hardness of the problem in theory, and propose path-based indexes that are affordable in memory. Two query-processing algorithms are proposed: one is fast in practice for small queries (those with small patterns as answers) by utilizing the indexes; the other is better in theory, with running time linear in the sizes of the indexes and answers, and can handle large queries better. We also conduct an extensive experimental study to compare our approaches with a naive adaptation of known techniques. Comment: VLDB 201
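
    As a toy illustration (our own, much simplified from the paper's algorithms) of the modeling step: subtrees whose texts contain all keywords are grouped by their node/edge type structure, and each group, i.e. pattern, can be rendered as one table. All node, edge, and type names below are invented.

        from collections import defaultdict

        nodes = {  # node id -> (type, text); all values invented
            1: ("City", "Washington Seattle"),
            2: ("City", "Washington Spokane"),
            3: ("Number", "population 737015"),
            4: ("Number", "population 228989"),
        }
        edges = [  # (source, edge type, target)
            (1, "hasPopulation", 3),
            (2, "hasPopulation", 4),
        ]

        def matches(keywords, texts):
            """True if every keyword occurs somewhere in the subtree's texts."""
            joined = " ".join(texts).lower()
            return all(k.lower() in joined for k in keywords)

        def single_edge_patterns(keywords):
            """Group keyword-matching one-edge subtrees by their type structure."""
            patterns = defaultdict(list)
            for src, etype, dst in edges:
                if matches(keywords, [nodes[src][1], nodes[dst][1]]):
                    patterns[(nodes[src][0], etype, nodes[dst][0])].append((src, dst))
            return dict(patterns)

        # Both matching subtrees share one pattern, so the answer can be
        # rendered as a single two-column table (city, population).
        print(single_edge_patterns(["washington", "population"]))
        # {('City', 'hasPopulation', 'Number'): [(1, 3), (2, 4)]}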

    Ensemble of Example-Dependent Cost-Sensitive Decision Trees

    Several real-world classification problems are example-dependent cost-sensitive in nature, where the costs due to misclassification vary between examples and not only between classes. However, standard classification methods do not take these costs into account and assume a constant cost for misclassification errors. In previous works, some methods that incorporate the financial costs into the training of different algorithms have been proposed, with the example-dependent cost-sensitive decision tree algorithm being the one that gives the highest savings. In this paper we propose a new framework of ensembles of example-dependent cost-sensitive decision trees. The framework consists of creating different example-dependent cost-sensitive decision trees on random subsamples of the training set and then combining them using three different combination approaches. Moreover, we propose two new cost-sensitive combination approaches: cost-sensitive weighted voting and cost-sensitive stacking, the latter being based on the cost-sensitive logistic regression method. Finally, using five different databases from four real-world applications (credit card fraud detection, churn modeling, credit scoring, and direct marketing), we evaluate the proposed method against state-of-the-art example-dependent cost-sensitive techniques, namely cost-proportionate sampling, Bayes minimum risk, and cost-sensitive decision trees. The results show that the proposed algorithms perform better on all databases, in the sense of higher savings. Comment: 13 pages, 6 figures, Submitted for possible publication
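
    The sketch below (our own simplification, using plain scikit-learn trees in place of the paper's cost-sensitive trees, whose splitting criterion is itself cost-based) illustrates cost-sensitive weighted voting: each tree is trained on a random subsample and its vote is weighted by the savings it achieves on held-out data. All data and constants are synthetic.

        import numpy as np
        from sklearn.tree import DecisionTreeClassifier

        def cost(y_true, y_pred, c_fp, c_fn):
            """Total example-dependent cost: c_fp per false positive, c_fn per false negative."""
            return np.sum(np.where(y_pred > y_true, c_fp, 0.0) +
                          np.where(y_pred < y_true, c_fn, 0.0))

        def savings(y_true, y_pred, c_fp, c_fn):
            """Cost saved relative to the cheaper of the two constant predictions."""
            base = min(cost(y_true, np.zeros_like(y_true), c_fp, c_fn),
                       cost(y_true, np.ones_like(y_true), c_fp, c_fn))
            return (base - cost(y_true, y_pred, c_fp, c_fn)) / base

        rng = np.random.default_rng(0)
        X = rng.normal(size=(600, 5))
        y = (X[:, 0] + 0.3 * rng.normal(size=600) > 0).astype(int)
        c_fp = np.full(600, 1.0)                 # synthetic example-dependent costs
        c_fn = rng.uniform(1.0, 20.0, size=600)
        tr, va = slice(0, 400), slice(400, 600)  # train/validation split

        trees, weights = [], []
        for _ in range(25):  # ensemble members trained on random subsamples
            idx = rng.choice(400, size=200, replace=False)
            t = DecisionTreeClassifier(max_depth=4).fit(X[tr][idx], y[tr][idx])
            trees.append(t)
            # Weight each tree by its (non-negative) savings on validation data.
            weights.append(max(savings(y[va], t.predict(X[va]), c_fp[va], c_fn[va]), 0.0))

        w = np.array(weights) / np.sum(weights)
        votes = np.array([t.predict(X[va]) for t in trees])  # shape (25, 200)
        y_hat = (w @ votes > 0.5).astype(int)                # savings-weighted vote
        print("ensemble savings:", savings(y[va], y_hat, c_fp[va], c_fn[va]))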

    Training linear ranking SVMs in linearithmic time using red-black trees

    We introduce an efficient method for training the linear ranking support vector machine. The method combines cutting-plane optimization with a red-black-tree-based approach to subgradient calculations, and has O(ms + m log m) time complexity, where m is the number of training examples and s is the average number of non-zero features per example. The best previously known training algorithms achieve the same efficiency only for restricted special cases, whereas the proposed approach allows arbitrary real-valued utility scores in the training data. Experiments demonstrate the superior scalability of the proposed approach compared to the fastest existing RankSVM implementations. Comment: 20 pages, 4 figures
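
    A sketch of why order statistics give the speed-up, restricted for brevity to the bipartite special case with binary relevance labels (the paper's red-black trees handle arbitrary real-valued utilities and dynamic updates): the subgradient of the pairwise hinge loss can be computed in O(m log m) with sorting and binary search instead of enumerating all O(m^2) pairs. This is our own illustration, not the paper's code.

        import numpy as np

        def pairwise_hinge_subgradient(scores, y):
            """Subgradient w.r.t. scores of
            sum over (i positive, j negative) of max(0, 1 - (s_i - s_j))."""
            pos = np.sort(scores[y == 1])
            neg = np.sort(scores[y == 0])
            g = np.zeros_like(scores)
            # Positive i gets -1 for each negative j with s_j > s_i - 1.
            g[y == 1] = -(len(neg) - np.searchsorted(neg, scores[y == 1] - 1, side="right"))
            # Negative j gets +1 for each positive i with s_i < s_j + 1.
            g[y == 0] = np.searchsorted(pos, scores[y == 0] + 1, side="left")
            return g

        # Brute-force O(m^2) check on random data.
        rng = np.random.default_rng(0)
        s = rng.normal(size=8)
        y = rng.integers(0, 2, size=8)
        g = np.zeros(8)
        for i in np.flatnonzero(y == 1):
            for j in np.flatnonzero(y == 0):
                if 1 - (s[i] - s[j]) > 0:  # pair (i, j) has an active hinge term
                    g[i] -= 1
                    g[j] += 1
        print(np.allclose(g, pairwise_hinge_subgradient(s, y)))  # True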