On the Suitability of Genetic-Based Algorithms for Data Mining
Data mining aims to extract knowledge from large databases. A database may be considered as a search space consisting of an enormous number of elements, and a mining algorithm as a search strategy. In general, an exhaustive search of the space is infeasible. Therefore, efficient search strategies are of vital importance. Search strategies based on genetic algorithms have been applied successfully in a wide range of applications. We focus on the suitability of genetic-based algorithms for data mining. We discuss the design and implementation of a genetic-based algorithm for data mining and illustrate its potential.
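The idea of treating mining as a search over candidate solutions can be illustrated with a minimal genetic-algorithm sketch. It is not the algorithm from the paper: the bit-string encoding, the function name `genetic_search`, and all parameter values are assumptions chosen for illustration, and the caller supplies the `fitness` function.

```python
import random


def genetic_search(fitness, n_bits, pop_size=30, generations=50,
                   mutation_rate=0.02, seed=0):
    """Search bit strings of length n_bits (n_bits >= 2) for a high-fitness one.

    Hypothetical encoding: each bit could switch one condition of a
    candidate rule on or off; `fitness` scores the candidate.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def pick():
        # Binary tournament: draw two individuals, keep the fitter one.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        nxt = [best[:]]  # elitism: the best individual always survives
        while len(nxt) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n_bits)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            nxt.append(child)
        pop = nxt
        best = max(pop, key=fitness)
    return best
```

For example, `genetic_search(sum, n_bits=20)` searches for a string with many ones. Because the best individual is carried over unchanged, the best fitness never decreases from one generation to the next.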
Scalable Deep Traffic Flow Neural Networks for Urban Traffic Congestion Prediction
Tracking congestion throughout the road network is a critical component of
intelligent transportation network management systems. Understanding how the
traffic flows and short-term prediction of congestion occurrence due to
rush-hour or incidents can be beneficial to such systems to effectively manage
and direct the traffic to the most appropriate detours. Many of the current
traffic flow prediction systems are designed by utilizing a central processing
component where the prediction is carried out through aggregation of the
information gathered from all measuring stations. However, centralized systems
are not scalable and fail to provide real-time feedback to the system, whereas in a
decentralized scheme, each node is responsible for predicting its own short-term
congestion based on the local current measurements in neighboring nodes.
We propose a decentralized deep learning-based method where each node
accurately predicts its own congestion state in real-time based on the
congestion state of the neighboring stations. Moreover, historical data from
the deployment site is not required, which makes the proposed method more
suitable for newly installed stations. In order to achieve higher performance,
we introduce a regularized Euclidean loss function that favors high congestion
samples over low congestion samples to avoid the impact of the unbalanced
training dataset. A novel dataset for this purpose is designed based on the
traffic data obtained from traffic control stations in northern California.
Extensive experiments conducted on the designed benchmark reflect a successful
congestion prediction.
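The abstract does not give the form of the regularized Euclidean loss. One way a squared-error loss can favor high-congestion samples is sketched below, assuming congestion labels in [0, 1] and a per-sample weight of `1 + alpha * y_true`. Both the weighting scheme and the name `weighted_euclidean_loss` are assumptions for illustration, not the paper's definition.

```python
def weighted_euclidean_loss(y_true, y_pred, alpha=1.0):
    """Half mean squared error with larger weights on high-congestion samples.

    Assumed weighting: weight = 1 + alpha * y_true, so alpha = 0 recovers
    the plain Euclidean loss and larger alpha penalizes errors on
    congested samples more heavily.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = 0.0
    for t, p in zip(y_true, y_pred):
        weight = 1.0 + alpha * t
        total += weight * (t - p) ** 2
    return total / (2 * len(y_true))
```

Under this weighting, the same absolute error costs more on a congested sample (`y_true = 1.0`) than on a free-flowing one (`y_true = 0.0`), which counteracts a training set dominated by low-congestion samples.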
Parallel sampling of decomposable graphs using Markov chain on junction trees
Bayesian inference for undirected graphical models is mostly restricted to
the class of decomposable graphs, as they enjoy a rich set of properties making
them amenable to high-dimensional problems. While parameter inference is
straightforward in this setup, inferring the underlying graph is a challenge
driven by the computational difficulty in exploring the space of decomposable
graphs. This work makes two contributions to address this problem. First, we
provide sufficient and necessary conditions for when multi-edge perturbations
maintain decomposability of the graph. Using these, we characterize a simple
class of partitions that efficiently classify all edge perturbations by whether
they maintain decomposability. Second, we propose a new parallel non-reversible
Markov chain Monte Carlo sampler for distributions over junction tree
representations of the graph, where at every step, all edge perturbations
within a partition are executed simultaneously. Through simulations, we
demonstrate the efficiency of our new edge perturbation conditions and class of
partitions. We find that our parallel sampler yields improved mixing properties
in comparison to the single-move variant, and outperforms current methods. The
implementation of our work is available in a Python package. Comment: 20 pages, 10 figures, with appendix and supplementary material
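To make "maintaining decomposability" concrete: an undirected graph is decomposable exactly when it is chordal, which can be tested by repeatedly removing a simplicial vertex (one whose remaining neighbors are pairwise adjacent). The brute-force check below is a sketch of that test only. It is not the paper's perturbation conditions, which are designed to avoid re-checking the whole graph, and the name `is_decomposable` is hypothetical.

```python
from itertools import combinations


def is_decomposable(nodes, edges):
    """Return True if the undirected graph is chordal (decomposable).

    Greedy elimination: a graph is chordal if and only if simplicial
    vertices can be removed one at a time until no vertex is left.
    """
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    remaining = set(nodes)
    while remaining:
        simplicial = next(
            (v for v in remaining
             if all(b in adj[a]
                    for a, b in combinations(adj[v] & remaining, 2))),
            None,
        )
        if simplicial is None:
            return False  # no simplicial vertex left: a chordless cycle exists
        remaining.remove(simplicial)
    return True
```

A 4-cycle is the smallest non-decomposable graph. Adding either diagonal makes it decomposable, and removing that diagonal again breaks decomposability, which is the kind of edge perturbation the sampler has to classify.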
A hierarchical Bayesian model for predicting ecological interactions using scaled evolutionary relationships
Identifying undocumented or potential future interactions among species is a
challenge facing modern ecologists. Recent link prediction methods rely on
trait data; however, large species interaction databases are typically sparse
and covariates are limited to only a fraction of species. On the other hand,
evolutionary relationships, encoded as phylogenetic trees, can act as proxies
for underlying traits and historical patterns of parasite sharing among hosts.
We show that using a network-based conditional model, phylogenetic information
provides strong predictive power in a recently published global database of
host-parasite interactions. By scaling the phylogeny using an evolutionary
model, our method allows for biological interpretation often missing from
latent variable models. To further improve on the phylogeny-only model, we
combine a hierarchical Bayesian latent score framework for bipartite graphs
that accounts for the number of interactions per species with the host
dependence informed by phylogeny. Combining the two information sources yields
significant improvement in predictive accuracy over each of the submodels
alone. As many interaction networks are constructed from presence-only data, we
extend the model by integrating a correction mechanism for missing
interactions, which proves valuable in reducing uncertainty in unobserved
interactions. Comment: To appear in the Annals of Applied Statistics
A Skew-Normal Copula-Driven Generalized Linear Mixed Model for Longitudinal Data
Using the advancements of Arellano-Valle et al. [2005], which characterize the likelihood function of a linear mixed model (LMM) under a skew-normal distribution for the random effects, this thesis attempts to construct a copula-driven generalized linear mixed model (GLMM). Assuming a multivariate distribution from the exponential family for the response variable and a skew-normal copula, we derive a complete characterization of the general likelihood function. For estimation, we apply a Monte Carlo expectation maximization (MC-EM) algorithm. Some special cases are discussed, in particular, the exponential and gamma distributions. Simulations with multiple link functions are shown alongside a real data example from the Framingham Heart Study