Forecasting bus passenger flows by using a clustering-based support vector regression approach
As a significant component of the intelligent transportation system, forecasting bus passenger
flows plays a key role in resource allocation, network planning, and frequency setting. However, it remains
challenging to recognize high fluctuations, nonlinearity, and periodicity of bus passenger flows due to
varied destinations and departure times. For this reason, a novel forecasting model named affinity
propagation-based support vector regression (AP-SVR) is proposed based on clustering and nonlinear
simulation. In the proposed approach, a clustering algorithm is first used to generate clustering-based
intervals. A support vector regression (SVR) is then exploited to forecast the passenger flow for each
cluster, with the use of particle swarm optimization (PSO) for obtaining the optimized parameters. Finally,
the SVR prediction results are rearranged in chronological order. The proposed model
is tested using real bus passenger data from a bus line over four months. Experimental results demonstrate
that the proposed model performs better than other peer models in terms of absolute percentage error and
mean absolute percentage error. The results suggest that a deterministic clustering technique with stable
cluster results (AP) can significantly improve forecasting performance.
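The pipeline described in this abstract (cluster the data, fit one model per cluster, restore chronological order, then score with APE and MAPE) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: cluster labels are taken as given instead of being produced by affinity propagation, a per-cluster mean stands in for the PSO-tuned SVR, and all function names and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def fit_cluster_means(train):
    """train: iterable of (cluster_id, flow) pairs.

    Returns {cluster_id: mean flow}. The mean is only a stand-in for the
    per-cluster SVR described in the abstract.
    """
    grouped = defaultdict(list)
    for cluster_id, flow in train:
        grouped[cluster_id].append(flow)
    return {c: mean(v) for c, v in grouped.items()}


def predict_chronological(models, test):
    """test: iterable of (time_index, cluster_id) pairs.

    Predicts each point with its cluster's model, then restores
    chronological order by sorting on time_index.
    """
    per_cluster = defaultdict(list)
    for t, c in test:
        per_cluster[c].append((t, models[c]))
    merged = [p for preds in per_cluster.values() for p in preds]
    return [y for _, y in sorted(merged)]


def ape(actual, predicted):
    """Absolute percentage error per point (actual values must be non-zero)."""
    return [abs(a - p) / abs(a) * 100.0 for a, p in zip(actual, predicted)]


def mape(actual, predicted):
    """Mean absolute percentage error over all points."""
    return mean(ape(actual, predicted))


# Toy data (hypothetical): cluster 0 = peak hours, cluster 1 = off-peak hours.
train = [(0, 100.0), (0, 120.0), (1, 40.0), (1, 60.0)]
test = [(0, 1), (1, 0), (2, 1), (3, 0)]
actual = [50.0, 100.0, 40.0, 110.0]

models = fit_cluster_means(train)
predicted = predict_chronological(models, test)
print(predicted, mape(actual, predicted))
```

Swapping the per-cluster mean for a real regressor only changes fit_cluster_means and the lookup in predict_chronological; the reordering and error metrics stay the same.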
Neural Networks for Complex Data
Artificial neural networks are simple and efficient machine learning tools.
Defined originally in the traditional setting of simple vector data, neural
network models have evolved to address more and more difficulties of complex
real world problems, ranging from time evolving data to sophisticated data
structures such as graphs and functions. This paper summarizes advances on
those themes from the last decade, with a focus on results obtained by members
of the SAMM team of Université Paris
Graphs in machine learning: an introduction
Graphs are commonly used to characterise interactions between objects of
interest. Because they are based on a straightforward formalism, they are used
in many scientific fields from computer science to historical sciences. In this
paper, we give an introduction to some methods relying on graphs for learning.
This includes both unsupervised and supervised methods. Unsupervised learning
algorithms usually aim at visualising graphs in latent spaces and/or clustering
the nodes. Both focus on extracting knowledge from graph topologies. While most
existing techniques are only applicable to static graphs, where edges do not
evolve through time, recent developments have shown that they could be extended
to deal with evolving networks. In a supervised context, one generally aims at
inferring labels or numerical values attached to nodes using both the graph
and, when they are available, node characteristics. Balancing the two sources
of information can be challenging, especially as they can disagree locally or
globally. In both contexts, supervised and unsupervised, data can be
relational (augmented with one or several global graphs) as described above, or
graph valued. In this latter case, each object of interest is given as a full
graph (possibly completed by other characteristics). In this context, natural
tasks include graph clustering (as in producing clusters of graphs rather than
clusters of nodes in a single graph), graph classification, etc.

1 Real networks

One of the first practical studies on graphs can be dated back to the
original work of Moreno [51] in the 30s. Since then, there has been a growing
interest in graph analysis associated with strong developments in the modelling
and the processing of these data. Graphs are now used in many scientific
fields. In Biology [54, 2, 7], for instance, metabolic networks can describe
pathways of biochemical reactions [41], while in social sciences networks are
used to represent relation ties between actors [66, 56, 36, 34]. Other examples
include power grids [71] and the web [75]. Recently, networks have also been
considered in other areas such as geography [22] and history [59, 39]. In
machine learning, networks are seen as powerful tools to model problems in
order to extract information from data and for prediction purposes. This is the
object of this paper. For more complete surveys, we refer to [28, 62, 49, 45].
In this section, we introduce notations and highlight properties shared by most
real networks. In Section 2, we then consider methods aiming at extracting
information from a unique network. We will particularly focus on clustering
methods where the goal is to find clusters of vertices. Finally, in Section 3,
techniques that take a series of networks into account, where each network i
Two-Stage Bagging Pruning for Reducing the Ensemble Size and Improving the Classification Performance
Ensemble methods, such as the traditional bagging algorithm, can usually improve the performance of a single classifier. However, they usually require large storage space and relatively time-consuming predictions. Many approaches have been developed to reduce the ensemble size and improve classification performance by pruning the traditional bagging algorithm. In this article, we propose a two-stage strategy to prune the traditional bagging algorithm by combining two simple approaches: accuracy-based pruning (AP) and distance-based pruning (DP). These two methods, as well as their two combinations, “AP+DP” and “DP+AP”, used as two-stage pruning strategies, were all examined. Compared with the single pruning methods, the two-stage pruning methods further reduce the ensemble size and improve classification. The “AP+DP” method generally performs better than the “DP+AP” method when using four base classifiers: decision tree, Gaussian naive Bayes, K-nearest neighbor, and logistic regression. Moreover, compared with traditional bagging, the two-stage method “AP+DP” improved classification accuracy by 0.88%, 4.06%, 1.26%, and 0.96%, respectively, averaged over 28 datasets under the four base classifiers. “AP+DP” also outperformed three other existing algorithms, Brag, Nice, and TB, when assessed on 8 common datasets. In summary, the proposed two-stage pruning methods are simple and promising approaches that can both reduce the ensemble size and improve classification accuracy.
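The “AP+DP” ordering can be illustrated with a small sketch. The abstract does not define the pruning rules in detail, so everything below is an assumption made for illustration: AP is taken to keep the members with the highest validation accuracy, DP is taken to drop any member whose validation predictions are too close (in Hamming distance) to a member already kept, and the function names, thresholds, and toy predictions are hypothetical.

```python
from collections import Counter


def accuracy(preds, labels):
    """Fraction of validation labels a member predicts correctly."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def accuracy_prune(members, labels, keep):
    """Stage AP (assumed rule): keep the `keep` most accurate members."""
    ranked = sorted(members, key=lambda m: accuracy(m, labels), reverse=True)
    return ranked[:keep]


def hamming(a, b):
    """Number of validation points on which two members disagree."""
    return sum(x != y for x, y in zip(a, b))


def distance_prune(members, min_dist):
    """Stage DP (assumed rule): greedily keep a member only if it differs
    from every already-kept member on at least `min_dist` points."""
    kept = []
    for m in members:
        if all(hamming(m, k) >= min_dist for k in kept):
            kept.append(m)
    return kept


def majority_vote(members):
    """Combine the members' predictions point by point."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*members)]


# Toy validation predictions from five bagged members (hypothetical).
labels = [1, 0, 1, 1, 0, 0]
members = [
    [1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1],
    [1, 0, 1, 1, 0, 1],  # duplicate of the previous member
    [0, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 1, 1],
]

after_ap = accuracy_prune(members, labels, keep=4)
after_dp = distance_prune(after_ap, min_dist=1)
print(len(after_dp), majority_vote(after_dp))
```

Running DP after AP means the distance filter only has to compare members that already passed the accuracy cut, which is one plausible reading of why the order of the two stages matters.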
Extended morphometric analysis of neuronal cells with Minkowski valuations
Minkowski valuations provide a systematic framework for quantifying different
aspects of morphology. In this paper we apply vector- and tensor-valued
Minkowski valuations to neuronal cells from the cat's retina in order to
describe their morphological structure in a comprehensive way. We introduce the
framework of Minkowski valuations, discuss their implementation for neuronal
cells and show how they can discriminate between cells of different types.
Comment: 14 pages, 18 postscript figure
Gaussian mixture model based probabilistic modeling of images for medical image segmentation
In this paper, we propose a novel image segmentation algorithm that is based on the probability distributions of the object and background. It uses the variational level sets formulation with a novel region-based term in addition to the edge-based term, giving a complementary functional that can potentially result in a robust segmentation of the images. The main theme of the method is that in most medical imaging scenarios, the objects are characterized by some typical characteristics such as color, texture, etc. Consequently, an image can be modeled as a Gaussian mixture of distributions corresponding to the object and background. During the procedure of curve evolution, a novel term is incorporated in the segmentation framework which is based on the maximization of the distance between the GMMs corresponding to the object and background. The maximization of this distance using differential calculus potentially leads to the desired segmentation results. The proposed method has been used for segmenting images from three distinct imaging modalities, i.e., magnetic resonance imaging (MRI), dermoscopy, and chromoendoscopy. Experiments show the effectiveness of the proposed method, giving better qualitative and quantitative results when compared with the current state of the art.
INDEX TERMS Gaussian Mixture Model, Level Sets, Active Contours, Biomedical Engineerin
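The idea of modeling an image as a mixture of an object distribution and a background distribution can be illustrated with a small sketch. It covers only the mixture-model ingredient, assuming one 1-D Gaussian per class over pixel intensity. It does not implement the level-set evolution or the GMM distance term from the abstract, and the function names and parameter values are hypothetical.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def object_posterior(x, w_obj, mu_obj, sigma_obj, mu_bg, sigma_bg):
    """Posterior probability that intensity x belongs to the object class
    under a two-component mixture with object weight w_obj (Bayes' rule)."""
    p_obj = w_obj * gaussian_pdf(x, mu_obj, sigma_obj)
    p_bg = (1.0 - w_obj) * gaussian_pdf(x, mu_bg, sigma_bg)
    return p_obj / (p_obj + p_bg)


# Hypothetical parameters: bright object (mean 200), dark background (mean 50).
params = dict(w_obj=0.5, mu_obj=200.0, sigma_obj=20.0, mu_bg=50.0, sigma_bg=20.0)

for intensity in (60.0, 125.0, 190.0):
    print(intensity, object_posterior(intensity, **params))
```

With equal weights and equal variances, the posterior crosses 0.5 midway between the two means, so thresholding it gives a crude intensity-based labeling that a region-based term could refine.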