
    Cost-Sensitive Decision Trees with Completion Time Requirements

    In many classification tasks, managing costs and completion times is the main concern. In this paper, we assume that the completion time for classifying an instance is determined by its class label, and that a late penalty cost is incurred if the deadline is not met. This time requirement enriches the classification problem but poses a challenge for developing a solution algorithm. We propose an innovative approach to decision tree induction, which produces multiple candidate trees by allowing more than one splitting attribute at each node. The user can specify the maximum number of candidate trees to control the computational effort required to produce the final solution. In the tree-induction process, an allocation scheme dynamically distributes the given number of candidate trees among splitting attributes according to their estimated contributions to cost reduction. The algorithm finds the final tree by backtracking. An extensive experiment shows that the algorithm outperforms the top-down heuristic and can effectively obtain optimal or near-optimal decision trees without excessive computation time. Keywords: classification, decision tree, cost and time sensitive learning, late penalty

    Induction of Ordinal Decision Trees

    This paper focuses on the problem of monotone decision trees from the point of view of the multicriteria decision aid methodology (MCDA). By taking into account the preferences of the decision maker, an attempt is made to bring closer similar research within machine learning and MCDA. The paper addresses the question of how to label the leaves of a tree in a way that guarantees the monotonicity of the resulting tree. Two approaches are proposed for that purpose, dynamic and static labeling, which are also compared experimentally. The paper further considers the problem of splitting criteria in the context of monotone decision trees. Two criteria from the literature are compared experimentally, the entropy criterion and the number of con criterion, in an attempt to find out which one better fits the specifics of monotone problems and which one better handles monotonicity noise. Keywords: monotone decision trees; noise; multicriteria decision aid; multicriteria sorting; ordinal classification
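The monotonicity requirement mentioned in this abstract can be illustrated with a minimal sketch. It assumes the usual definition (if one attribute vector is componentwise less than or equal to another, its ordinal label must not be larger) and simply lists the pairs of labeled examples that break it. The function names `dominates` and `monotonicity_violations` are hypothetical, and this is not the labeling or splitting procedure of the paper.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

# A labeled example: (attribute vector, ordinal class label).
Example = Tuple[Sequence[float], int]


def dominates(x: Sequence[float], y: Sequence[float]) -> bool:
    """Return True if x <= y in every attribute (componentwise order)."""
    return all(a <= b for a, b in zip(x, y))


def monotonicity_violations(data: List[Example]) -> List[Tuple[Example, Example]]:
    """List pairs (lower, upper) where lower <= upper componentwise
    but the label of lower is strictly larger than the label of upper."""
    violations = []
    for (x, lx), (y, ly) in combinations(data, 2):
        if dominates(x, y) and lx > ly:
            violations.append(((x, lx), (y, ly)))
        elif dominates(y, x) and ly > lx:
            violations.append(((y, ly), (x, lx)))
    return violations


# Illustrative data: (2, 2) is below (2, 3) but carries a larger label.
sample = [((1, 1), 0), ((2, 2), 1), ((3, 1), 0), ((2, 3), 0)]
noisy_pairs = monotonicity_violations(sample)
```

A data set or a labeled tree is monotone in this sense exactly when the returned list is empty; pairs in the list are one simple way to picture monotonicity noise.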

    Decision Stream: Cultivating Deep Decision Trees

    Various modifications of decision trees have been extensively used during the past years due to their high efficiency and interpretability. Tree node splitting based on relevant feature selection is a key step of decision tree learning, and at the same time its major shortcoming: recursive node partitioning leads to a geometric reduction of the amount of data in the leaf nodes, which causes excessive model complexity and overfitting. In this paper, we present a novel architecture, the Decision Stream, aimed at overcoming this problem. Instead of building a tree structure during the learning process, we propose merging nodes from different branches based on their similarity, which is estimated with two-sample test statistics. This leads to the generation of a deep directed acyclic graph of decision rules that can consist of hundreds of levels. To evaluate the proposed solution, we test it on several common machine learning problems: credit scoring, Twitter sentiment analysis, aircraft flight control, MNIST and CIFAR image classification, and synthetic data classification and regression. Our experimental results reveal that the proposed approach significantly outperforms the standard decision tree learning methods on both regression and classification tasks, yielding a prediction error decrease of up to 35%.
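The idea of merging nodes whose samples look statistically similar can be sketched as follows. This is only an illustration: it assumes Welch's t statistic as the two-sample test and a fixed cut-off of 2.0, both chosen here for simplicity. The abstract does not say which statistic or threshold the authors use, and `welch_t` and `should_merge` are hypothetical names.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Absolute Welch t statistic for two samples (each needs >= 2 values)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if standard_error == 0.0:
        # Both samples are constant: identical means give 0, otherwise infinity.
        return 0.0 if mean(a) == mean(b) else float("inf")
    return abs(mean(a) - mean(b)) / standard_error


def should_merge(a: Sequence[float], b: Sequence[float], threshold: float = 2.0) -> bool:
    """Merge two nodes when their target values are not clearly different.

    The threshold is an assumed illustrative cut-off, not a value from the paper.
    """
    return welch_t(a, b) < threshold


# Target values collected in three hypothetical leaf nodes.
node_a = [1.0, 1.1, 0.9, 1.0]
node_b = [1.05, 0.95, 1.0, 1.1]
node_c = [5.0, 5.2, 4.9, 5.1]
```

With these values, `node_a` and `node_b` would be merged while `node_c` would stay separate, which is how merging counteracts the shrinking of leaf samples described in the abstract.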

    Hardness of Finding Independent Sets in 2-Colorable Hypergraphs and of Satisfiable CSPs

    This work revisits the PCP Verifiers used in the works of Hastad [Has01], Guruswami et al. [GHS02], Holmerin [Hol02] and Guruswami [Gur00] for satisfiable Max-E3-SAT and Max-Ek-Set-Splitting, and independent set in 2-colorable 4-uniform hypergraphs. We provide simpler and more efficient PCP Verifiers to prove the following improved hardness results, assuming that NP \not\subseteq DTIME(N^{O(\log\log N)}): (i) there is no polynomial time algorithm that, given an n-vertex 2-colorable 4-uniform hypergraph, finds an independent set of n/(log n)^c vertices, for some constant c > 0; (ii) there is no polynomial time algorithm that satisfies a 7/8 + 1/(log n)^c fraction of the clauses of a satisfiable Max-E3-SAT instance of size n, for some constant c > 0; (iii) for any fixed k >= 4, there is no polynomial time algorithm that finds a partition splitting a (1 - 2^{-k+1}) + 1/(log n)^c fraction of the k-sets of a satisfiable Max-Ek-Set-Splitting instance of size n, for some constant c > 0. Our hardness factor for independent set in 2-colorable 4-uniform hypergraphs is an exponential improvement over the previous results of Guruswami et al. [GHS02] and Holmerin [Hol02]. Similarly, our inapproximability of (log n)^{-c} beyond the random assignment threshold for Max-E3-SAT and Max-Ek-Set-Splitting is an exponential improvement over the previous bounds proved in [Has01], [Hol02] and [Gur00]. The PCP Verifiers used in our results avoid the variable bias parameter used in previous works, which leads to the improved hardness thresholds in addition to simplifying the analysis substantially. Apart from standard techniques from Fourier analysis, for the first result we use a mixing estimate for Markov chains based on uniform reverse hypercontractivity over general product spaces from the work of Mossel et al. [MOS13]. Comment: 23 Page
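The "random assignment threshold" values 7/8 and 1 - 2^{-k+1} in this abstract follow from a short expectation calculation, sketched here under the standard assumption that every clause contains exactly three literals on distinct variables and every set contains exactly k distinct elements. The symbols C and S are introduced only for this illustration.

```latex
% Max-E3-SAT: a uniformly random assignment falsifies a clause C
% only if all three of its literals are false.
\Pr[\,C \text{ is satisfied}\,] = 1 - 2^{-3} = \tfrac{7}{8}

% Max-Ek-Set-Splitting: a uniformly random 2-partition fails to split a
% k-set S only if all k elements land on the same side (two ways).
\Pr[\,S \text{ is split}\,] = 1 - 2 \cdot 2^{-k} = 1 - 2^{-k+1}
```

By linearity of expectation, a random assignment achieves these fractions on average, so the stated results say that no polynomial time algorithm can beat this trivial baseline by more than 1/(log n)^c, even on satisfiable instances.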

    Twistless KAM tori, quasi flat homoclinic intersections, and other cancellations in the perturbation series of certain completely integrable Hamiltonian systems. A review

    Rotators interacting with a pendulum via small, velocity-independent potentials are considered. If the interaction potential does not depend on the pendulum position, then the pendulum and the rotators are decoupled and we study the invariant tori of the rotators system at fixed rotation numbers: we exhibit cancellations, to all orders of perturbation theory, that allow proving the stability and analyticity of the diophantine tori. We find in this way a proof of the KAM theorem by direct bounds of the k-th order coefficient of the perturbation expansion of the parametric equations of the tori in terms of their average anomalies: this extends Siegel's approach from the linearization of analytic maps to KAM theory; the convergence radius does not depend, in this case, on the twist strength, which could even vanish ("twistless KAM tori"). The same ideas apply to the case in which the potential couples the pendulum and the rotators: in this case the invariant tori with diophantine rotation numbers are unstable and have stable and unstable manifolds ("whiskers"). Instead of studying the perturbation theory of the invariant tori, we look for the cancellations that must be present because the homoclinic intersections of the whiskers are "quasi flat" if the rotation velocity of the quasi-periodic motion on the tori is large. We rederive in this way the result that, under suitable conditions, the homoclinic splitting is smaller than any power in the period of the forcing, and find the exact asymptotics in the two-dimensional cases (e.g. in the case of a periodically forced pendulum). The technique can be applied to study other quantities: we mention, as another example, the homoclinic scattering phase shifts. Comment: 46 pages, Plain TeX, generates four figures named f1.ps, f2.ps, f3.ps, f4.ps.
This paper replaces a preceding version which contained an error in the last paragraph of section 6, invalidating section 7 (but not the rest of the paper). The error is corrected here. If you already printed the previous paper, only p. 1, 3, p. 29 and section 7 with the appendices 3, 4 need to be reprinted (i.e. p. 30, 31, 32 and 4

    Chord Diagrams and Gauss Codes for Graphs

    Chord diagrams on circles and their intersection graphs (also known as circle graphs) have been intensively studied, and have many applications to the study of knots and knot invariants, among others. However, chord diagrams on more general graphs have not been studied, and are potentially equally valuable in the study of spatial graphs. We will define chord diagrams for planar embeddings of planar graphs and their intersection graphs, and prove some basic results. Then, as an application, we will introduce Gauss codes for immersions of graphs in the plane and give algorithms to determine whether a particular crossing sequence is realizable as the Gauss code of an immersed graph. Comment: 20 pages, many figures. This version has been substantially rewritten, and the results are stronge
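To give a feel for what a realizability test on crossing sequences looks like, the sketch below checks Gauss's classical parity condition for a single closed curve in the plane: each crossing label must occur exactly twice, with an even number of symbols between its two occurrences. This condition is necessary but not sufficient, it concerns closed curves rather than the immersed graphs treated in the paper, and it is not the paper's algorithm. The function name `satisfies_gauss_parity` is hypothetical.

```python
from typing import Dict, Hashable, List, Sequence


def satisfies_gauss_parity(code: Sequence[Hashable]) -> bool:
    """Check Gauss's necessary parity condition on a crossing sequence.

    Returns False if some label does not occur exactly twice, or if an odd
    number of symbols lies between the two occurrences of some label.
    """
    positions: Dict[Hashable, List[int]] = {}
    for index, label in enumerate(code):
        positions.setdefault(label, []).append(index)

    for occurrences in positions.values():
        if len(occurrences) != 2:
            return False
        symbols_between = occurrences[1] - occurrences[0] - 1
        if symbols_between % 2 != 0:
            return False
    return True


trefoil_like = [1, 2, 3, 1, 2, 3]   # passes the parity test
interlaced_pair = [1, 2, 1, 2]      # fails: one symbol between the two 1s
```

A sequence that fails this test cannot be the Gauss code of a closed plane curve; a sequence that passes still needs further checks before it can be declared realizable.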