Decision Stream: Cultivating Deep Decision Trees
Various modifications of decision trees have been extensively used during the
past years due to their high efficiency and interpretability. Tree node
splitting based on relevant feature selection is a key step of decision tree
learning, and at the same time their major shortcoming: recursive node
partitioning leads to a geometric reduction of the data quantity in the leaf nodes,
which causes excessive model complexity and overfitting. In this paper,
we present a novel architecture, the Decision Stream, aimed at overcoming this
problem. Instead of building a tree structure during the learning process, we
propose merging nodes from different branches based on their similarity that is
estimated with two-sample test statistics, which leads to generation of a deep
directed acyclic graph of decision rules that can consist of hundreds of
levels. To evaluate the proposed solution, we test it on several common machine
learning problems - credit scoring, Twitter sentiment analysis, aircraft flight
control, MNIST and CIFAR image classification, synthetic data classification
and regression. Our experimental results reveal that the proposed approach
significantly outperforms the standard decision tree learning methods on both
regression and classification tasks, yielding a prediction error decrease of up to
35%.
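The node-merging idea in this abstract can be illustrated with a minimal sketch. The abstract says only that similarity is estimated with two-sample test statistics; the choice of the Kolmogorov-Smirnov test, the 5% threshold, the greedy grouping order, and the names `ks_statistic`, `similar`, and `merge_similar_nodes` are all assumptions made for illustration, not the authors' method.

```python
from bisect import bisect_right
from math import sqrt


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def similar(a, b, c_alpha=1.358):
    """True when the KS test does not reject equality at roughly the 5% level.

    c_alpha = 1.358 is the usual large-sample coefficient for alpha = 0.05
    (an assumed threshold, not taken from the abstract).
    """
    n, m = len(a), len(b)
    return ks_statistic(a, b) <= c_alpha * sqrt((n + m) / (n * m))


def merge_similar_nodes(nodes):
    """Greedily merge nodes (lists of target values) whose samples look alike."""
    merged = []
    for values in nodes:
        for group in merged:
            if similar(group, values):
                group.extend(values)  # pool the data of the two nodes
                break
        else:
            merged.append(list(values))
    return merged


# Hypothetical nodes from different branches of one level.
node_a = [float(x) for x in range(8)]
node_b = [100.0 + x for x in range(8)]
node_c = [x + 0.5 for x in range(8)]
groups = merge_similar_nodes([node_a, node_b, node_c])
```

Here `node_a` and `node_c` are pooled because their empirical distributions nearly coincide, while `node_b` stays separate. Merged nodes keep more samples than the leaves of a plain tree, which is the effect the abstract attributes to the approach.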
Adaptive content mapping for internet navigation
The Internet, as the biggest human library ever assembled, keeps on growing. Although all kinds of information carriers (e.g. audio/video/hybrid file formats) are available, text-based documents dominate. It is estimated that about 80% of all information stored electronically worldwide exists in (or can be converted into) text form. More and more, all kinds of documents are generated by means of a text processing system and are therefore available electronically. Nowadays, many printed journals are also published online and may even cease to appear in print form tomorrow. This development has many convincing advantages: the documents are available both faster (cf. prepress services) and more cheaply, they can be searched more easily, the physical storage needs only a fraction of the space previously necessary, and the medium will not age. For most people, fast and easy access is the most interesting feature of the new age; computer-aided search for specific documents or Web pages becomes the basic tool for information-oriented work. But this tool has problems. The current keyword-based search engines available on the Internet are not really appropriate for such a task; either (far) too many documents matching the specified keywords are presented, or none at all. The problem lies in the fact that it is often very difficult to choose appropriate terms describing the desired topic in the first place. This contribution discusses the current state-of-the-art techniques in content-based searching (along with common visualization/browsing approaches) and proposes a particular adaptive solution for intuitive Internet document navigation, which not only enables the user to provide full texts instead of manually selected keywords (if available), but also allows him/her to explore the whole database.
GENERATION OF TRIANGULAR MESHES FOR COMPLEX DOMAINS ON THE PLANE
Many physical phenomena can be modeled by partial differential equations. The development of numerical methods based on the spatial subdivision of a domain into finite elements immediately extended interests to the tasks of generating a mesh. With the availability of versatile field solvers and powerful computers, the simulations of ever increasing geometrical and physical complexity are attempted. At some point the main bottleneck becomes the mesh generation itself. The paper presents a detailed description of the triangular mesh generation scheme on the plane based upon the Delaunay triangulation. A mesh generator should be fully automatic and simplify input data as much as possible. It should offer rapid gradation from small to large sizes of elements. The generated mesh must be always valid and of good quality. All these requirements were taken into account during the selection and elaboration of utilized algorithms. Successive chapters describe procedures connected with the specification of a modeled domain, generation and triangulation of boundary vertices, introducing inner nodes, improving the quality of the created mesh, and renumbering of vertices.
Multivariate Approaches to Classification in Extragalactic Astronomy
Clustering objects into synthetic groups is a natural activity of any
science. Astrophysics is not an exception and is now facing a deluge of data.
For galaxies, the century-old Hubble classification and the Hubble tuning
fork are still largely in use, together with numerous mono- or bivariate
classifications most often made by eye. However, a classification must be
driven by the data, and sophisticated multivariate statistical tools are used
more and more often. In this paper we review these different approaches in
order to situate them in the general context of unsupervised and supervised
learning. We emphasize the astrophysical outcomes of these studies to show that
multivariate analyses provide an obvious path toward a renewal of our
classification of galaxies and are invaluable tools to investigate the physics
and evolution of galaxies. Comment: Open Access paper.
http://www.frontiersin.org/milky_way_and_galaxies/10.3389/fspas.2015.00003/abstract
<10.3389/fspas.2015.00003>
Cell Segmentation in 3D Confocal Images using Supervoxel Merge-Forests with CNN-based Hypothesis Selection
Automated segmentation approaches are crucial to quantitatively analyze
large-scale 3D microscopy images. Particularly in deep tissue regions,
automatic methods still fail to provide error-free segmentations. To improve
the segmentation quality throughout imaged samples, we present a new
supervoxel-based 3D segmentation approach that outperforms current methods and
reduces the manual correction effort. The algorithm consists of gentle
preprocessing and a conservative supervoxel generation method followed by
supervoxel agglomeration based on local signal properties and a postprocessing
step to fix under-segmentation errors using a Convolutional Neural Network. We
validate the functionality of the algorithm on manually labeled 3D confocal
images of the plant Arabidopsis thaliana and compare the results to a
state-of-the-art meristem segmentation algorithm. Comment: 5 pages, 3 figures, 1 table
Using rule extraction to improve the comprehensibility of predictive models.
Whereas newer machine learning techniques, like artificial neural networks and support vector machines, have shown superior performance in various benchmarking studies, the application of these techniques remains largely restricted to research environments. A more widespread adoption of these techniques is hindered by their lack of explanation capability, which is required in some application areas, like medical diagnosis or credit scoring. To overcome this restriction, various algorithms have been proposed to extract a meaningful description of the underlying `black box' models. These algorithms have a dual goal: to mimic the behavior of the black box as closely as possible while ensuring that the extracted description is maximally comprehensible. In this research report, we first develop a formal definition of rule extraction and comment on the inherent trade-off between accuracy and comprehensibility. Afterwards, we develop a taxonomy by which rule extraction algorithms can be classified and discuss some criteria by which these algorithms can be evaluated. Finally, an in-depth review of the most important algorithms is given. This report is concluded by pointing out some general shortcomings of existing techniques and opportunities for future research. Keywords: Models; Model; Algorithms; Criteria; Opportunities; Research; Learning; Neural networks; Networks; Performance; Benchmarking; Studies; Area; Credit; Credit scoring; Behavior; Time