156 research outputs found
Opinion mining of text documents written in Macedonian language
The ability to extract public opinion from web portals such as review sites,
social networks and blogs will enable companies and individuals to form a view,
an attitude and make decisions without having to do lengthy and costly
research and surveys. In this paper, machine learning techniques are used to
determine the polarity of forum posts on Kajgana that are written in the
Macedonian language. The posts are classified as positive, negative or
neutral. We test different feature metrics and classifiers and provide a detailed
evaluation of their contribution to the overall performance on a
manually generated dataset. By achieving 92% accuracy, we show that the
performance of systems for automated opinion mining is comparable to a human
evaluator, thus making it a viable option for text data analysis. Finally, we
present a few statistics derived from the forum posts using the developed
system.
Comment: In press, MASA proceedings
Random walks on networks: cumulative distribution of cover time
We derive an exact closed-form analytical expression for the distribution of
the cover time for a random walk over an arbitrary graph. As special cases, we
derive simplified exact expressions for the distributions of cover time for a
complete graph, a cycle graph, and a path graph. An accurate approximation for
the cover time distribution, with computational complexity of O(2n), is also
presented. The approximation is numerically tested only for graphs with n <= 1000
nodes.
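The quantity this abstract studies can be illustrated empirically. The sketch below does not implement the paper's closed-form expression or its approximation; it only estimates the cumulative distribution of the cover time by Monte Carlo simulation on a small cycle graph. The helper names, the graph size, the sample count, and the seed are all assumptions made for illustration.

```python
import random
from collections import Counter


def cycle_graph(n):
    """Adjacency lists of a cycle on n nodes (labels 0..n-1)."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def cover_time(adj, start, rng):
    """Steps a simple random walk takes until every node has been visited."""
    seen = {start}
    node, steps = start, 0
    while len(seen) < len(adj):
        node = rng.choice(adj[node])  # move to a uniformly chosen neighbour
        seen.add(node)
        steps += 1
    return steps


def empirical_cdf(samples):
    """Map each observed value t to the fraction of samples <= t."""
    counts = Counter(samples)
    cdf, running = {}, 0
    for t in sorted(counts):
        running += counts[t]
        cdf[t] = running / len(samples)
    return cdf


rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
n = 8                   # hypothetical graph size
graph = cycle_graph(n)
samples = [cover_time(graph, 0, rng) for _ in range(2000)]
cdf = empirical_cdf(samples)
mean_cover = sum(samples) / len(samples)
```

On a cycle the walk needs at least n - 1 steps to cover the graph, so the empirical distribution has no mass below that value. Swapping `cycle_graph` for another adjacency-list builder gives the same estimate for other graphs.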
Analytically solvable processes on networks
We introduce a broad class of analytically solvable processes on networks. As
special cases, they reduce to the random walk and the consensus process, the two most
basic processes on networks. Our class differs from previous models of
interactions (such as the stochastic Ising model, cellular automata, infinite
particle systems, and the voter model) in several ways, the two most important being:
(i) the model is analytically solvable even when the dynamical equation for
each node may be different and the network may have an arbitrary finite graph
and influence structure; and (ii) in addition, when the local dynamics are described
by the same evolution equation, the model is decomposable: the equilibrium
behavior of the system can be expressed as an explicit function of network
topology and node dynamics.
Comment: 10 pages, 3 figures
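The two special cases named in this abstract, the random walk and the consensus process, can be sketched side by side. This is not the paper's general class of processes, only the textbook linear versions: a lazy random walk with a row-stochastic matrix W, and the consensus iteration x(t+1) = W x(t). The three-node path graph, the function names, and the iteration count are assumptions chosen for illustration.

```python
def lazy_walk_matrix(adj):
    """Row-stochastic matrix W = (I + D^-1 A) / 2 of a lazy random walk."""
    n = len(adj)
    W = [[0.0] * n for _ in range(n)]
    for i, nbrs in adj.items():
        W[i][i] = 0.5                     # stay put with probability 1/2
        for j in nbrs:
            W[i][j] += 0.5 / len(nbrs)    # otherwise pick a neighbour uniformly
    return W


def consensus_step(W, x):
    """x(t+1) = W x(t): each node averages over itself and its neighbours."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def walk_step(W, p):
    """p(t+1) = p(t) W: distribution of the walker after one more step."""
    n = len(p)
    return [sum(p[i] * W[i][j] for i in range(n)) for j in range(n)]


adj = {0: [1], 1: [0, 2], 2: [1]}   # hypothetical path graph 0 - 1 - 2
W = lazy_walk_matrix(adj)
x = [1.0, 0.0, 0.0]                  # initial node values for consensus
p = [1.0, 0.0, 0.0]                  # walker starts at node 0
for _ in range(200):
    x = consensus_step(W, x)
    p = walk_step(W, p)
```

Both iterations use the same matrix. The walker's distribution converges to the stationary distribution, which is proportional to node degree, and every node's consensus value converges to the average of the initial values weighted by that stationary distribution (0.25 in this example).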
A unifying definition of synchronization for dynamical systems
We propose a unified definition for synchronization. By example we show that
the synchronization phenomena discussed in the dynamical systems literature can
be described within the framework of this definition.
Comment: 4 pages, no figures, submitted for publication
Tunneling of electrons via rotor-stator molecular interfaces: combined ab initio and model study
Tunneling of electrons through rotor-stator anthracene aldehyde molecular
interfaces is studied with a combined ab initio and model approach. Molecular
electronic structure calculated from first principles is utilized to model
different shapes of tunneling barriers. Together with a rectangular barrier, we
also consider a sinusoidal shape that captures the effects of the molecular
internal structure more realistically. A quasiclassical approach with
Simmons' formula for current density is implemented. Special attention is paid
to the conformational dependence of the tunneling current. Our results confirm that
the presence of the side aldehyde group enhances the interesting electronic
properties of the pure anthracene molecule, making it a bistable system with
geometry-dependent transport properties. We also investigate the transition
voltage and show that conformation-dependent field emission could be
observed in these molecular interfaces at realistically low voltages. The
present study accompanies our previous work, in which we investigated coherent
transport via a strongly coupled delocalized orbital using the
non-equilibrium Green's function formalism.
Modeling the Spread of Multiple Contagions on Multilayer Networks
A susceptible-infected-susceptible (SIS) model of multiple contagions on
multilayer networks is developed to incorporate different spreading channels
and disease mutations. The basic reproduction number for this model is
estimated analytically. In the special case of compartmental
models, we analyze analytically an example of a model with mutation-driven
strain persistence, characterized by the absence of an epidemic threshold. This
persistence does not depend on the network topology and can be observed in both
compartmental models and models on networks. The novel multiple-contagion SIS
model on a multilayer network could help in the understanding of other
spreading phenomena including communicable diseases, cultural characteristics,
addictions, or information spread through e-mail messages, web blogs, and
computer networks.
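For orientation, the epidemic threshold that the abstract refers to can be seen in the classic single-strain compartmental SIS model, di/dt = beta*i*(1 - i) - gamma*i, where i is the infected fraction. The sketch below is that textbook model only, not the paper's multiple-contagion multilayer model; the parameter values, step size, and function names are assumptions.

```python
def simulate_sis(beta, gamma, i0=0.01, dt=0.01, t_end=200.0):
    """Euler integration of di/dt = beta*i*(1-i) - gamma*i; returns the final i."""
    i = i0
    for _ in range(int(t_end / dt)):
        i += dt * (beta * i * (1.0 - i) - gamma * i)
    return i


def endemic_level(beta, gamma):
    """Stable equilibrium of the model: 1 - gamma/beta if R0 > 1, else 0."""
    r0 = beta / gamma  # basic reproduction number of this simple model
    return max(0.0, 1.0 - 1.0 / r0)


# Hypothetical parameters on either side of the threshold R0 = 1.
above = simulate_sis(beta=0.5, gamma=0.25)   # R0 = 2.0
below = simulate_sis(beta=0.2, gamma=0.25)   # R0 = 0.8
```

With R0 above 1 the infection settles at the endemic level, and with R0 below 1 it dies out. A model without an epidemic threshold, as in the mutation-driven example mentioned in the abstract, would lack this second regime.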
Stacking and stability
Stacking is a general approach for combining multiple models toward greater
predictive accuracy. It has found various applications across different domains,
owing to its meta-learning nature. Nevertheless, our understanding of how
and why stacking works remains intuitive and lacking in theoretical insight. In
this paper, we use the stability of learning algorithms as an elemental
analysis framework suitable for addressing the issue. To this end, we analyze
the hypothesis stability of stacking, bag-stacking, and dag-stacking and
establish a connection between bag-stacking and weighted bagging. We show that
the hypothesis stability of stacking is a product of the hypothesis stability
of each of the base models and the combiner. Moreover, in bag-stacking and
dag-stacking, the hypothesis stability depends on the sampling strategy used to
generate the training set replicates. Our findings suggest that 1) subsampling
and bootstrap sampling improve the stability of stacking, and 2) stacking
improves the stability of both subbagging and bagging.
Comment: 15 pages, 1 figure
On the structure of the world economy: An absorbing Markov chain approach
The expansion of global production networks has raised many important
questions about the interdependence among countries and how future changes in
the world economy are likely to affect the countries' positioning in global
value chains. We approach the structure and lengths of value chains from
a completely different perspective than has been available so far. By assigning
a random endogenous variable to a network linkage representing the number of
intermediate sales/purchases before absorption (final use or value added), the
discrete-time absorbing Markov chains proposed here shed new light on the world
input/output networks. The variance of this variable can help assess the risk
when shaping the chain length and optimize the level of production. Contrary to
what might be expected simply on the basis of comparative advantage, the
results reveal that both the input and output chains exhibit the same
quasi-stationary product distribution. Put differently, the expected proportion
of time spent in a state before absorption is invariant to changes in the
network type. Finally, the several global metrics proposed here, including the
probability distribution of global value added/final output, provide guidance
for policy makers when estimating the resilience of the world trading system and
forecasting macroeconomic developments.
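The mean and variance of the number of steps before absorption, which the abstract uses as chain-length and risk measures, follow from standard absorbing Markov chain theory. The sketch below computes both by fixed-point iteration on the transient block Q of the transition matrix. The two-state matrix is invented for illustration and is not taken from any input/output table. The step count here includes the final transition into the absorbing state, which may differ from the paper's counting convention.

```python
def steps_before_absorption(Q, iters=1000):
    """Mean and variance of the number of steps until absorption.

    Q[i][j] is the transition probability between transient states i and j;
    the probability mass missing from each row leads to absorption.
    Uses the recursions  m = 1 + Q m  and  s = 1 + 2 Q m + Q s,
    where m is the first and s the second moment of the step count.
    """
    n = len(Q)
    mean = [0.0] * n
    second = [0.0] * n
    for _ in range(iters):
        q_mean = [sum(Q[i][j] * mean[j] for j in range(n)) for i in range(n)]
        q_second = [sum(Q[i][j] * second[j] for j in range(n)) for i in range(n)]
        mean = [1.0 + q_mean[i] for i in range(n)]
        second = [1.0 + 2.0 * q_mean[i] + q_second[i] for i in range(n)]
    variance = [second[i] - mean[i] ** 2 for i in range(n)]
    return mean, variance


# Hypothetical transient block: two sectors trading intermediates with each other.
Q = [[0.0, 0.5],
     [0.2, 0.3]]
mean, variance = steps_before_absorption(Q)
# For this Q both means and both variances come out at approximately 2.
```

The iteration converges because every row of Q sums to less than one. For large systems, solving the linear system (I - Q) m = 1 directly would be the more usual choice.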
Stability of decision trees and logistic regression
Decision trees and logistic regression are among the most popular and
well-known machine learning algorithms, frequently used to solve a variety of
real-world problems. Stability of learning algorithms is a powerful tool for
analyzing their performance and sensitivity, which in turn allows researchers to
draw reliable conclusions. The stability of these two algorithms has remained
obscure. To that end, in this paper, we derive two stability notions for
decision trees and logistic regression: hypothesis and pointwise hypothesis
stability. Additionally, we derive these notions for L2-regularized logistic
regression and confirm existing findings that it is uniformly stable. We show
that the stability of decision trees depends on the number of leaves in the
tree, i.e., its depth, while for logistic regression, it depends on the
smallest eigenvalue of the Hessian matrix of the cross-entropy loss. We show
that logistic regression is not a stable learning algorithm. We construct the
upper bounds on the generalization error of all three algorithms. Moreover, we
present a novel stability measuring framework that allows one to measure the
aforementioned notions of stability. The measures are equivalent to estimates
of expected loss differences at an input example, and they leverage bootstrap
sampling to yield statistically reliable estimates. Finally, we apply this
framework to the three algorithms analyzed in this paper to confirm our
theoretical findings and, in addition, we discuss the possibilities of
developing new training techniques to optimize the stability of logistic
regression, and hence decrease its generalization error.
Comment: 13 pages
Beyond network structure: How heterogenous susceptibility modulates the spread of epidemics
The compartmental models used to study epidemic spreading often assume the
same susceptibility for all individuals and are therefore agnostic about the
effects that differences in susceptibility can have on epidemic spreading. Here
we show that, for the SIS model, differential susceptibility can make networks
more vulnerable to the spread of diseases when the correlation between a node's
degree and susceptibility is positive, and less vulnerable when this
correlation is negative. Moreover, we show that networks become more likely to
contain a pocket of infection when individuals are more likely to connect with
others that have similar susceptibility (the network is segregated). These
results show that the failure to include differential susceptibility in
epidemic models can lead to a systematic over- or underestimation of fundamental
epidemic parameters when the structure of the networks is not independent of
the susceptibility of the nodes or when there are correlations between the
susceptibility of connected individuals.
Comment: 13 pages, 2 figures