Cost-Sensitive Decision Trees with Completion Time Requirements
In many classification tasks, managing costs and completion times are the main concerns. In this paper, we assume that the completion time for classifying an instance is determined by its class label, and that a late penalty cost is incurred if the deadline is not met. This time requirement enriches the classification problem but poses a challenge to developing a solution algorithm. We propose an innovative approach to decision tree induction, which produces multiple candidate trees by allowing more than one splitting attribute at each node. The user can specify the maximum number of candidate trees to control the computational effort required to produce the final solution. In the tree-induction process, an allocation scheme is used to dynamically distribute the given number of candidate trees to splitting attributes according to their estimated contributions to cost reduction. The algorithm finds the final tree by backtracking. An extensive experiment shows that the algorithm outperforms the top-down heuristic and can effectively obtain optimal or near-optimal decision trees without excessive computation time.
Keywords: classification, decision tree, cost and time sensitive learning, late penalty
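The abstract does not spell out the allocation scheme, so the following is only a minimal sketch of one plausible reading: a fixed budget of candidate trees is split across splitting attributes in proportion to their estimated cost reduction, using largest-remainder rounding. The function name `allocate_candidates`, the attribute names, and the proportional rule itself are assumptions made for illustration, not the authors' method.

```python
def allocate_candidates(cost_reductions, budget):
    """Distribute `budget` candidate trees over attributes.

    Hypothetical rule (not taken from the paper): shares are proportional
    to each attribute's estimated cost reduction, rounded with the
    largest-remainder method so the shares sum exactly to `budget`.
    """
    # Attributes with no estimated benefit receive no candidate trees.
    positive = {a: r for a, r in cost_reductions.items() if r > 0}
    allocation = {a: 0 for a in cost_reductions}
    if budget <= 0 or not positive:
        return allocation

    total = sum(positive.values())
    shares = {a: budget * r / total for a, r in positive.items()}

    # Start from the integer part of every proportional share.
    for a, share in shares.items():
        allocation[a] = int(share)

    # Hand out what is left by largest fractional part,
    # breaking ties in favour of the larger estimated reduction.
    leftover = budget - sum(allocation.values())
    order = sorted(
        positive,
        key=lambda a: (shares[a] - int(shares[a]), positive[a]),
        reverse=True,
    )
    for a in order[:leftover]:
        allocation[a] += 1
    return allocation


if __name__ == "__main__":
    estimates = {"age": 6.0, "income": 3.0, "zip": 1.0, "id": 0.0}
    print(allocate_candidates(estimates, budget=5))
```

In a recursive induction procedure, a rule of this kind would be re-applied at each node with the budget that node inherited from its parent, which is one way to keep the total number of candidate trees within the user-specified maximum.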
Induction of Ordinal Decision Trees
This paper focuses on the problem of monotone decision trees from the point of view of the multicriteria decision aid methodology (MCDA). By taking into account the preferences of the decision maker, an attempt is made to bring similar research within machine learning and MCDA closer together. The paper addresses the question of how to label the leaves of a tree in a way that guarantees the monotonicity of the resulting tree. Two approaches are proposed for that purpose, dynamic and static labeling, which are also compared experimentally. The paper further considers the problem of splitting criteria in the context of monotone decision trees. Two criteria from the literature are compared experimentally, the entropy criterion and the number of con criterion, in an attempt to find out which one better fits the specifics of monotone problems and which one better handles monotonicity noise.
Keywords: monotone decision trees; noise; multicriteria decision aid; multicriteria sorting; ordinal classification
Decision Stream: Cultivating Deep Decision Trees
Various modifications of decision trees have been extensively used during the
past years due to their high efficiency and interpretability. Tree node
splitting based on relevant feature selection is a key step of decision tree
learning and, at the same time, their major shortcoming: recursive node
partitioning leads to a geometric reduction of data quantity in the leaf nodes,
which causes excessive model complexity and data overfitting. In this paper,
we present a novel architecture, the Decision Stream, aimed at overcoming this
problem. Instead of building a tree structure during the learning process, we
propose merging nodes from different branches based on their similarity that is
estimated with two-sample test statistics, which leads to generation of a deep
directed acyclic graph of decision rules that can consist of hundreds of
levels. To evaluate the proposed solution, we test it on several common machine
learning problems: credit scoring, Twitter sentiment analysis, aircraft flight
control, MNIST and CIFAR image classification, synthetic data classification
and regression. Our experimental results reveal that the proposed approach
significantly outperforms the standard decision tree learning methods on both
regression and classification tasks, yielding a prediction error decrease of up to
35%.
Hardness of Finding Independent Sets in 2-Colorable Hypergraphs and of Satisfiable CSPs
This work revisits the PCP Verifiers used in the works of Hastad [Has01],
Guruswami et al. [GHS02], Holmerin [Hol02] and Guruswami [Gur00] for satisfiable
Max-E3-SAT and Max-Ek-Set-Splitting, and independent set in 2-colorable
4-uniform hypergraphs. We provide simpler and more efficient PCP Verifiers to
prove the following improved hardness results, assuming that
NP \not\subseteq DTIME(N^{O(log log N)}):
(1) There is no polynomial time algorithm that, given an n-vertex 2-colorable
4-uniform hypergraph, finds an independent set of n/(log n)^c vertices, for
some constant c > 0.
(2) There is no polynomial time algorithm that satisfies a 7/8 + 1/(log n)^c
fraction of the clauses of a satisfiable Max-E3-SAT instance of size n, for
some constant c > 0.
(3) For any fixed k >= 4, there is no polynomial time algorithm that finds a
partition splitting a (1 - 2^{-k+1}) + 1/(log n)^c fraction of the k-sets of a
satisfiable Max-Ek-Set-Splitting instance of size n, for some constant c > 0.
Our hardness factor for independent set in 2-colorable 4-uniform hypergraphs
is an exponential improvement over the previous results of Guruswami et
al. [GHS02] and Holmerin [Hol02]. Similarly, our inapproximability of (log
n)^{-c} beyond the random assignment threshold for Max-E3-SAT and
Max-Ek-Set-Splitting is an exponential improvement over the previous bounds
proved in [Has01], [Hol02] and [Gur00]. The PCP Verifiers used in our results
avoid the use of a variable bias parameter used in previous works, which leads
to the improved hardness thresholds in addition to simplifying the analysis
substantially. Apart from standard techniques from Fourier Analysis, for the
first mentioned result we use a mixing estimate of Markov Chains based on
uniform reverse hypercontractivity over general product spaces from the work of
Mossel et al. [MOS13].
Comment: 23 pages
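As background for the "random assignment threshold" referred to in this abstract, the short calculation below shows where the constants 7/8 and 1 - 2^{-k+1} come from. It is a standard argument and is not part of the abstract itself; it assumes each clause contains exactly three literals on distinct variables and each set contains exactly k distinct elements.

```latex
% Max-E3-SAT: a uniformly random assignment falsifies a clause only if
% all three of its literals are false.
\Pr[\text{clause satisfied}] = 1 - 2^{-3} = \tfrac{7}{8}.

% Max-Ek-Set-Splitting: a uniformly random 2-partition fails to split a
% k-set only if all k elements land on the same side (two ways).
\Pr[\text{$k$-set split}] = 1 - 2 \cdot 2^{-k} = 1 - 2^{-k+1}.

% By linearity of expectation, a random assignment achieves these
% fractions in expectation over all clauses (resp. all k-sets).
\mathbb{E}[\text{fraction satisfied}] = \tfrac{7}{8},
\qquad
\mathbb{E}[\text{fraction split}] = 1 - 2^{-k+1}.
```

The hardness results therefore say that, even on satisfiable instances, no polynomial time algorithm beats these trivial guarantees by more than an additive 1/(log n)^c under the stated assumption.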
Twistless KAM tori, quasi flat homoclinic intersections, and other cancellations in the perturbation series of certain completely integrable hamiltonian systems. A review
Rotators interacting with a pendulum via small, velocity independent,
potentials are considered. If the interaction potential does not depend on the
pendulum position then the pendulum and the rotators are decoupled and we study
the invariant tori of the rotators system at fixed rotation numbers: we exhibit
cancellations, to all orders of perturbation theory, that allow proving the
stability and analyticity of the diophantine tori. We find in this way a proof
of the KAM theorem by direct bounds of the k-th order coefficient of the
perturbation expansion of the parametric equations of the tori in terms of
their average anomalies: this extends Siegel's approach, from the linearization
of analytic maps to the KAM theory; the convergence radius does not depend, in
this case, on the twist strength, which could even vanish ("twistless KAM
tori"). The same ideas apply to the case in which the potential couples the
pendulum and the rotators: in this case the invariant tori with diophantine
rotation numbers are unstable and have stable and unstable manifolds
("whiskers"): instead of studying the perturbation theory of the invariant tori
we look for the cancellations that must be present because the homoclinic
intersections of the whiskers are "quasi flat", if the rotation velocity
of the quasi periodic motion on the tori is large. We rederive in this way the
result that, under suitable conditions, the homoclinic splitting is smaller
than any power in the period of the forcing and find the exact asymptotics in
the two dimensional cases (e.g. in the case of a periodically forced
pendulum). The technique can be applied to study other quantities: we mention,
as another example, the homoclinic scattering phase shifts.
Comment: 46 pages, Plain TeX, generates four figures named f1.ps, f2.ps,
f3.ps, f4.ps. This paper replaces a preceding version which contained an error
at the last paragraph of section 6, invalidating section 7 (but not the rest
of the paper). The error is corrected here. If you already printed the
previous paper only p.1,3, p.29 and section 7 with the appendices 3,4 need to
be reprinted (ie: p. 30,31,32 and 4
Chord Diagrams and Gauss Codes for Graphs
Chord diagrams on circles and their intersection graphs (also known as circle
graphs) have been intensively studied, and have many applications to the study
of knots and knot invariants, among others. However, chord diagrams on more
general graphs have not been studied, and are potentially equally valuable in
the study of spatial graphs. We will define chord diagrams for planar
embeddings of planar graphs and their intersection graphs, and prove some basic
results. Then, as an application, we will introduce Gauss codes for immersions
of graphs in the plane and give algorithms to determine whether a particular
crossing sequence is realizable as the Gauss code of an immersed graph.
Comment: 20 pages, many figures. This version has been substantially
rewritten, and the results are stronger