3,258 research outputs found
Efficient multi-label classification for evolving data streams
Many real world problems involve data which can be considered as multi-label data streams. Efficient methods exist for multi-label classification in non streaming scenarios. However, learning in evolving streaming scenarios is more challenging, as the learners must be able to adapt to change using limited time and memory.
This paper proposes a new experimental framework for studying multi-label evolving stream classification, and new efficient methods that combine the best practices in streaming scenarios with the best practices in multi-label classification. We present a Multi-label Hoeffding Tree with multilabel classifiers at the leaves as a base classifier. We obtain fast and accurate methods, that are well suited for this challenging multi-label classification streaming task. Using the new experimental framework, we test our methodology by performing an evaluation study on synthetic and real-world datasets. In comparison to well-known batch multi-label methods, we obtain encouraging results
Throughput-Distortion Computation Of Generic Matrix Multiplication: Toward A Computation Channel For Digital Signal Processing Systems
The generic matrix multiply (GEMM) function is the core element of
high-performance linear algebra libraries used in many
computationally-demanding digital signal processing (DSP) systems. We propose
an acceleration technique for GEMM based on dynamically adjusting the
imprecision (distortion) of computation. Our technique employs adaptive scalar
companding and rounding to input matrix blocks followed by two forms of packing
in floating-point that allow for concurrent calculation of multiple results.
Since the adaptive companding process controls the increase of concurrency (via
packing), the increase in processing throughput (and the corresponding increase
in distortion) depends on the input data statistics. To demonstrate this, we
derive the optimal throughput-distortion control framework for GEMM for the
broad class of zero-mean, independent identically distributed, input sources.
Our approach converts matrix multiplication in programmable processors into a
computation channel: when increasing the processing throughput, the output
noise (error) increases due to (i) coarser quantization and (ii) computational
errors caused by exceeding the machine-precision limitations. We show that,
under certain distortion in the GEMM computation, the proposed framework can
significantly surpass 100% of the peak performance of a given processor. The
practical benefits of our proposal are shown in a face recognition system and a
multi-layer perceptron system trained for metadata learning from a large music
feature database.Comment: IEEE Transactions on Signal Processing (vol. 60, 2012
Multi-test Decision Tree and its Application to Microarray Data Classification
Objective:
The desirable property of tools used to investigate biological data is
easy to understand models and predictive decisions.
Decision trees are particularly promising in this regard due to their comprehensible nature that resembles the hierarchical process of human decision making. However, existing algorithms for learning decision trees have tendency to underfit gene expression data. The main aim of this work is to improve the performance and stability of decision trees with only a small increase in their complexity.
Methods:
We propose a multi-test decision tree (MTDT); our main contribution is the application of several univariate tests in each non-terminal node of the decision tree. We also search for alternative, lower-ranked features in order to obtain more stable and reliable predictions.
Results:
Experimental validation was performed on several real-life gene expression datasets. Comparison results with eight classifiers show that MTDT has a statistically significantly higher accuracy than popular decision tree classifiers, and it was highly competitive with ensemble learning algorithms. The proposed solution managed to outperform its baseline algorithm on datasets by an average percent. A study performed on one of the datasets showed that the discovered genes used in the MTDT classification model
are supported by biological evidence in the literature.
Conclusion:
This paper introduces a new type of decision tree which is more suitable for solving biological problems.
MTDTs are relatively easy to analyze and much more powerful in modeling high dimensional microarray data than their popular counterparts
- …