Linear and Order Statistics Combiners for Pattern Classification
Several researchers have experimentally shown that substantial improvements
can be obtained in difficult pattern recognition problems by combining or
integrating the outputs of multiple classifiers. This chapter provides an
analytical framework to quantify the improvements in classification results due
to combining. The results apply to both linear combiners and order statistics
combiners. We first show that, to a first-order approximation, the error rate
obtained over and above the Bayes error rate is directly proportional to the
variance of the actual decision boundaries around the Bayes optimum boundary.
Combining classifiers in output space reduces this variance, and hence reduces
the "added" error. If N unbiased classifiers are combined by simple averaging,
the added error rate can be reduced by a factor of N if the individual errors
in approximating the decision boundaries are uncorrelated. Expressions are then
derived for linear combiners which are biased or correlated, and the effect of
output correlations on ensemble performance is quantified. For order statistics
based non-linear combiners, we derive expressions that indicate how much the
median, the maximum and in general the ith order statistic can improve
classifier performance. The analysis presented here facilitates the
understanding of the relationships among error rates, classifier boundary
distributions, and combining in output space. Experimental results on several
public domain data sets are provided to illustrate the benefits of combining
and to support the analytical results.
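The central claim above, that averaging N unbiased classifiers with uncorrelated boundary errors reduces the added error variance by a factor of N, can be checked numerically. A minimal sketch (not the chapter's derivation; the Gaussian boundary noise is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
bayes_boundary = 0.0            # the (unknown) Bayes-optimal boundary
n_classifiers, n_trials = 10, 100_000

# Each unbiased classifier places its boundary at the optimum plus noise.
boundaries = bayes_boundary + rng.normal(0.0, 1.0, size=(n_trials, n_classifiers))

single_var = boundaries[:, 0].var()           # variance of one classifier
averaged_var = boundaries.mean(axis=1).var()  # variance of the averaged ensemble

print(single_var / averaged_var)  # ratio close to N (= 10)
```

With correlated errors the ratio drops below N, which is exactly the effect the abstract says its bias/correlation expressions quantify.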
Ensemble learning for blending gridded satellite and gauge-measured precipitation data
Regression algorithms are regularly used for improving the accuracy of
satellite precipitation products. In this context, ground-based measurements
are the dependent variable and the satellite data are the predictor variables,
together with topography factors. Alongside this, it is increasingly recognised
in many fields that combinations of algorithms through ensemble learning can
lead to substantial predictive performance improvements. Still, a sufficient
number of ensemble learners for improving the accuracy of satellite
precipitation products and their large-scale comparison are currently missing
from the literature. In this work, we fill this specific gap by proposing 11
new ensemble learners in the field and by extensively comparing them for the
entire contiguous United States and for a 15-year period. We use monthly data
from the PERSIANN (Precipitation Estimation from Remotely Sensed Information
using Artificial Neural Networks) and IMERG (Integrated Multi-satellitE
Retrievals for GPM) gridded datasets. We also use gauge-measured precipitation
data from the Global Historical Climatology Network monthly database, version 2
(GHCNm). The ensemble learners combine the predictions by six regression
algorithms (base learners), namely the multivariate adaptive regression splines
(MARS), multivariate adaptive polynomial splines (poly-MARS), random forests
(RF), gradient boosting machines (GBM), extreme gradient boosting (XGBoost) and
Bayesian regularized neural networks (BRNN), and each of them is based on a
different combiner. The combiners include the equal-weight combiner, the median
combiner, two best learners and seven variants of a sophisticated stacking
method. The latter stacks a regression algorithm on top of the base
learners to combine their independent predictions...
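The equal-weight, median, and stacking combiners mentioned above can be sketched with invented numbers. The base-learner predictions and gauge values below are hypothetical, and the stacking step is simplified to an ordinary least-squares linear stack rather than the paper's specific variants:

```python
import numpy as np

# Hypothetical predictions of six base learners for five grid cells (mm/month).
base_preds = np.array([
    [90.0, 92.0, 88.0, 95.0, 91.0, 89.0],
    [40.0, 38.0, 41.0, 37.0, 42.0, 39.0],
    [10.0, 12.0,  9.0, 11.0, 10.0, 13.0],
    [70.0, 71.0, 69.0, 72.0, 68.0, 70.0],
    [55.0, 54.0, 56.0, 53.0, 57.0, 55.0],
])

equal_weight = base_preds.mean(axis=1)        # equal-weight combiner
median_comb = np.median(base_preds, axis=1)   # median combiner

# Stacking: fit a linear combiner (with intercept) against gauge truth.
gauge = np.array([91.0, 39.5, 11.0, 70.0, 55.0])   # hypothetical gauge values
X = np.column_stack([np.ones(len(gauge)), base_preds])
weights, *_ = np.linalg.lstsq(X, gauge, rcond=None)
stacked = X @ weights
```

In practice the stacking weights would be fitted on held-out data, and the "best learner" combiners would simply select the base learner with the lowest validation error.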
Predicting business failure using artificial intelligence system
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London.
Predicting business insolvency is considered one of the main supportive sources of information
for decision making for financial institutions, investors, creditors, and other participants in the
business market. Financial reporting systems provide relevant information that can be used to
assess the financial position of firms. It is crucial to have classification and prediction models
that can analyse this financial information and provide accurate assurance for users about
business health. Recent studies have explored the use of machine learning tools as substitutes
for traditional statistical methods to develop classification models that classify firm insolvency
according to financial statement information. However, none of these models is an ideal classifier,
since each produces a certain percentage of wrong outputs, which is a crucial consideration:
every wrong response can mean massive financial losses for stakeholders.
Therefore, this study proposes new insolvency classification and prediction models based on
machine learning modelling techniques to develop an improved classifier.
Individual modelling techniques using statistical methods and machine learning were used to
develop the classification model of business insolvency. The results showed that the machine
learning methods outperformed the statistical methods. Deep Learning (DPL) achieved the highest
performance on all performance measurements used in the study and was the best
individual classifier, with an average accuracy of 97.2% using the all-years dataset. The Ensemble-
Boosted Decision Tree classifier ranked second, followed by the Decision Tree classifier. Thus,
the DPL modelling approach has proven useful for business insolvency classification.
A key contribution to enhancing individual classifier outputs is the use of traditional combining
methods alongside two aggregation methods new to business insolvency (Fuzzy Logic and the
Consensus Approach). The Consensus Approach showed the best improvement in the results
of all individual classifiers, with an average accuracy of 97.7%, and it is considered the best
classification method in comparison not only with the individual classifiers but also with the
traditional combiners.
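The thesis does not spell out its Fuzzy Logic and Consensus aggregation rules here, but the traditional combiners it benchmarks against can be illustrated by the simplest one, majority voting over individual classifier outputs. A minimal sketch with invented labels:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine class labels from several classifiers by simple majority."""
    counts = Counter(predictions)
    return counts.most_common(1)[0][0]

# Three hypothetical classifiers assessing one firm (1 = insolvent, 0 = solvent).
print(majority_vote([1, 1, 0]))  # -> 1
```

A consensus-style scheme would additionally require a qualified level of agreement (e.g. unanimity) before committing to a label, deferring otherwise; that refinement is omitted here.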
This study pioneers the development of a time-series business insolvency prediction model
with Big Data for UK businesses. The aim of the model is to provide early prediction of a
business's health. Three prediction models were developed, based on the Nonlinear Autoregressive
with Exogenous Input model (NARX), the Nonlinear Autoregressive Neural Network (NAR),
and the Deep Learning Time-series model (DPL-SA); they achieved average accuracy rates of
83.6%, 89.5%, and 91.35%, respectively. The results show relatively high performance in
comparison with the best individual classifier (deep learning).
A Network Topology for Composable Infrastructures
This paper proposes a passive optical backplane as a new network topology for composable computing infrastructures. The topology provides a high-capacity, low-latency and flexible fabric that interconnects disaggregated resource components. The network topology is dedicated to inter-resource communication between composed logical hosts to ensure effective performance. We formulated a mixed integer linear programming (MILP) model that dynamically creates logical networks to support intra-logical-host communication over the physical network topology. The MILP performs energy-efficient logical network instantiation given each application's resource demand. The topology can achieve 1 Tbps capacity per resource node given an appropriate wavelength transmission data rate and number of wavelengths per node.
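The 1 Tbps-per-node figure is a simple product of per-wavelength rate and wavelength count. As a back-of-the-envelope check (the per-wavelength rates below are assumed for illustration, not taken from the paper):

```python
# Wavelengths per node x per-wavelength rate must reach the target capacity.
target_gbps = 1000                       # 1 Tbps per resource node
for rate_gbps in (25, 50, 100):          # assumed per-wavelength data rates
    wavelengths = target_gbps // rate_gbps
    print(f"{rate_gbps} Gbps/wavelength -> {wavelengths} wavelengths per node")
```

So, for example, 40 wavelengths at 25 Gbps or 10 wavelengths at 100 Gbps would meet the stated node capacity.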
Neural-Based Ensembles and Unorganized Machines to Predict Streamflow Series from Hydroelectric Plants
Estimating future streamflows is a key step in producing electricity for countries with
hydroelectric plants. Accurate predictions are particularly important due to the environmental and economic impacts they entail. In order to analyze the forecasting capability of models on monthly seasonal streamflow series, we performed an extensive investigation considering six versions of unorganized machines: extreme learning machines (ELM) with and without a regularization coefficient (RC), and echo state networks (ESN) using the reservoirs of Jaeger and of Ozturk et al., with and without RC. Additionally, we addressed the ELM as the combiner of a neural-based ensemble, an investigation not yet accomplished in this context. A comparative analysis was performed utilizing two linear approaches (the autoregressive model (AR) and the autoregressive moving average model (ARMA)), four artificial neural networks (multilayer perceptron, radial basis function, Elman network, and Jordan network), and four ensembles. The tests were conducted at five hydroelectric plants, using horizons of 1, 3, 6, and 12 steps ahead. The results indicated that the unorganized machines and the ELM ensembles performed better than the linear models in all simulations. Moreover, the errors showed that the unorganized machines and the ELM-based ensembles achieved the best overall performance.
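An ELM of the kind described above is a single-hidden-layer network whose input weights are drawn at random and never trained; only the output weights are solved in closed form. A minimal numpy sketch on an invented toy series (the hidden-layer size, lag order, and RC value are illustrative assumptions, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(42)

def elm_fit(X, y, hidden=50, rc=1e-3):
    """Extreme learning machine: random hidden layer, ridge-solved output
    weights. `rc` plays the role of the regularization coefficient (RC)."""
    W = rng.normal(size=(X.shape[1], hidden))  # fixed random input weights
    H = np.tanh(X @ W)                         # hidden-layer activations
    # Regularized least squares for the output weights.
    beta = np.linalg.solve(H.T @ H + rc * np.eye(hidden), H.T @ y)
    return W, beta

def elm_predict(X, W, beta):
    return np.tanh(X @ W) @ beta

# Toy "streamflow" series: predict the next value from the previous three.
series = np.sin(np.linspace(0, 20, 200)) + 0.05 * rng.normal(size=200)
X = np.column_stack([series[i:-3 + i] for i in range(3)])
y = series[3:]
W, beta = elm_fit(X, y)
pred = elm_predict(X, W, beta)
```

Used as an ensemble combiner, the same model would take the individual forecasters' outputs as the columns of X instead of lagged series values.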
Deep Space Network information system architecture study
The purpose of this article is to describe an architecture for the Deep Space Network (DSN) information system in the years 2000-2010 and to provide guidelines for its evolution during the 1990s. The study scope is defined to be from the front-end areas at the antennas to the end users (spacecraft teams, principal investigators, archival storage systems, and non-NASA partners). The architectural vision provides guidance for major DSN implementation efforts during the next decade. A strong motivation for the study is an expected dramatic improvement in information-systems technologies, such as the following: computer processing, automation technology (including knowledge-based systems), networking and data transport, software and hardware engineering, and human-interface technology. The proposed Ground Information System has the following major features: unified architecture from the front-end area to the end user; open-systems standards to achieve interoperability; DSN production of level 0 data; delivery of level 0 data from the Deep Space Communications Complex, if desired; dedicated telemetry processors for each receiver; security against unauthorized access and errors; and highly automated monitor and control
Improving binary classification using filtering based on k-NN proximity graphs
© 2020, The Author(s). One of the ways of increasing recognition ability in a classification problem is removing outlier entries, as well as redundant and unnecessary features, from the training set. Filtering and feature selection can have a large impact on classifier accuracy and area under the curve (AUC), as noisy data can confuse a classifier and lead it to catch wrong patterns in the training data. The common approach to data filtering is using proximity graphs. However, the problem of selecting optimal filtering parameters is still insufficiently researched. In this paper a filtering procedure based on the k-nearest-neighbours proximity graph is used. Filtering parameter selection is cast as an outlier minimization problem: the k-NN proximity graph, power-of-distance, and threshold parameters are selected in order to minimize the outlier percentage in the training data. Then the performance of six commonly used classifiers (Logistic Regression, Naïve Bayes, Neural Network, Random Forest, Support Vector Machine and Decision Tree) and one heterogeneous classifier combiner (DES-LA) is compared with and without filtering. Dynamic ensemble selection (DES) systems work by estimating the level of competence of each classifier in a pool of classifiers; only the most competent ones are selected to classify a given test sample. This is achieved by defining a criterion to measure the level of competence of the base classifiers, such as their accuracy in local regions of the feature space around the query instance. In our case the combiner is based on the local accuracy of single classifiers, and its output is a linear combination of the single classifiers' rankings. After filtering, the accuracy of the DES-LA combiner shows a large increase on low-accuracy datasets, but filtering does not have a substantial impact on DES-LA performance on high-accuracy datasets.
The results are discussed, and the classifiers whose performance was most affected by the pre-processing filtering step are identified. The main contribution of the paper is introducing modifications to the DES-LA combiner, as well as a comparative analysis of the impact of filtering on classifiers of various types. Testing the filtering algorithm on a real-world dataset (the Taiwan default credit card dataset) confirmed the efficiency of the automatic filtering approach.
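The core idea of k-NN proximity-graph filtering can be sketched in a few lines: drop any training point whose k nearest neighbours mostly disagree with its label. This is a simplified edited-nearest-neighbour style filter with assumed parameter values, not the paper's optimized procedure (which also tunes the power of distance and the threshold):

```python
import numpy as np

def knn_filter(X, y, k=3, threshold=0.5):
    """Drop training points whose k nearest neighbours mostly disagree
    with their label (a simple edited-nearest-neighbour style filter)."""
    # Pairwise squared Euclidean distances.
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)            # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]      # indices of the k nearest neighbours
    agree = (y[nn] == y[:, None]).mean(axis=1)
    keep = agree >= threshold              # keep points supported by neighbours
    return X[keep], y[keep]

# Two clusters, plus one mislabelled point sitting inside the other class.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0],
              [5.0, 0.0], [5.1, 0.0], [5.2, 0.0],
              [0.05, 0.05]])
y = np.array([0, 0, 0, 1, 1, 1, 1])        # last point has an outlier label
Xf, yf = knn_filter(X, y)                  # the mislabelled point is removed
```

The parameter-selection step described in the abstract would then search over k, the distance exponent, and the threshold to minimize the outlier percentage.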
Revisiting the Energy-Efficient Hybrid D-A Precoding and Combining Design For mm-Wave Systems
Hybrid digital-analog (D-A) precoding is widely used in millimeter-wave systems to reduce the power consumption and implementation complexity incurred by the radio frequency (RF) chains, which consume much of the transmitted power in such systems. In this paper, an optimal number of RF chains is proposed to achieve the desired energy efficiency (EE). The optimization problem is formulated as a fractional programming maximization, resulting in a method with a twofold novelty. First, the optimal number of RF chains is determined by the proposed bisection algorithm, which yields an optimized number of data streams. Second, the optimal analog precoders/combiners are designed by eigenvalue decomposition and a power iteration algorithm, followed by the digital precoders/combiners, which are designed based on the singular value decomposition of the proposed effective uplink and downlink channel gains. Furthermore, the proposed D-A systems are designed carefully to attain lower complexity than the existing D-A algorithms while achieving reasonable performance. Finally, the impact of utilizing different numbers of quantization bits of resolution on the EE is investigated. Simulation results show that the proposed algorithms outperform existing algorithms in terms of EE, spectral efficiency, and computational complexity.
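The power iteration step mentioned for the analog precoder/combiner design extracts the dominant eigenvector of a Hermitian matrix without a full decomposition. A minimal illustrative sketch on a random toy channel (the dimensions and the use of H^H H as the target matrix are assumptions, not the paper's exact formulation):

```python
import numpy as np

def power_iteration(A, iters=200):
    """Dominant eigenvector of a Hermitian matrix by power iteration."""
    v = np.ones(A.shape[0]) / np.sqrt(A.shape[0])  # arbitrary start vector
    for _ in range(iters):
        v = A @ v
        v /= np.linalg.norm(v)                     # renormalize each step
    return v

# Toy Hermitian "channel Gram matrix": H^H H for a random complex channel H.
rng = np.random.default_rng(1)
H = rng.normal(size=(8, 4)) + 1j * rng.normal(size=(8, 4))
A = H.conj().T @ H
v = power_iteration(A)
rayleigh = np.real(v.conj() @ A @ v)   # converges to the largest eigenvalue
```

In a hardware-constrained analog stage, the resulting vector would additionally be projected onto the feasible set (e.g. unit-modulus phase-shifter entries), a step omitted here.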