Triadic Measures on Graphs: The Power of Wedge Sampling
Graphs are used to model interactions in a variety of contexts, and there is
a growing need to quickly assess the structure of a graph. Some of the most
useful graph metrics, especially those measuring social cohesion, are based on
triangles. Despite the importance of these triadic measures, associated
algorithms can be extremely expensive. We propose a new method based on wedge
sampling. This versatile technique allows for the fast and accurate
approximation of all current variants of clustering coefficients and enables
rapid uniform sampling of the triangles of a graph. Our methods come with
provable and practical time-approximation tradeoffs for all computations. We
provide extensive results that show our methods are orders of magnitude faster
than the state-of-the-art, while providing nearly the accuracy of full
enumeration. Our results will enable more wide-scale adoption of triadic
measures for analysis of extremely large graphs, as demonstrated on several
real-world examples
Wedge Sampling for Computing Clustering Coefficients and Triangle Counts on Large Graphs
Graphs are used to model interactions in a variety of contexts, and there is
a growing need to quickly assess the structure of such graphs. Some of the most
useful graph metrics are based on triangles, such as those measuring social
cohesion. Algorithms to compute them can be extremely expensive, even for
moderately-sized graphs with only millions of edges. Previous work has
considered node and edge sampling; in contrast, we consider wedge sampling,
which provides faster and more accurate approximations than competing
techniques. Additionally, wedge sampling enables estimation of local clustering
coefficients, degree-wise clustering coefficients, uniform triangle sampling,
and directed triangle counts. Our methods come with provable and practical
probabilistic error estimates for all computations. We provide extensive
results that show our methods are both more accurate and faster than
state-of-the-art alternatives.
Comment: Full version of SDM 2013 paper "Triadic Measures on Graphs: The Power of Wedge Sampling" (arXiv:1202.5230).
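A minimal sketch of the wedge-sampling idea from the two papers above, assuming an adjacency-set graph representation (the function name and parameters are illustrative, not the authors' implementation). The key point is that a wedge (a length-2 path) is sampled uniformly by picking its center with probability proportional to the number of wedges centered there:

```python
import random

def estimate_clustering(adj, samples=1000, rng=None):
    """Estimate the global clustering coefficient by uniform wedge sampling.

    adj maps each vertex to its set of neighbors (undirected graph).
    A wedge centered at v is an unordered pair of distinct neighbors of v;
    it is closed iff that pair is itself an edge.  The global clustering
    coefficient is the fraction of closed wedges.
    """
    rng = rng or random.Random()
    verts = [v for v in adj if len(adj[v]) >= 2]
    # number of wedges centered at v: C(deg(v), 2)
    weights = [len(adj[v]) * (len(adj[v]) - 1) // 2 for v in verts]
    closed = 0
    for _ in range(samples):
        v = rng.choices(verts, weights=weights)[0]  # center ~ wedge count
        a, b = rng.sample(sorted(adj[v]), 2)        # two distinct neighbors
        if b in adj[a]:                             # closed wedge?
            closed += 1
    return closed / samples
```

On a complete graph every wedge is closed, so the estimate is exactly 1; on real graphs the sampling error shrinks as the number of samples grows, independent of graph size, which is the source of the speedup.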
Exchange Rate Forecasting: Evidence from the Emerging Central and Eastern European Economies
There is a vast literature on exchange rate forecasting focusing on developed economies. Since the early 1990s, many developing economies have liberalized their financial accounts and become an integral part of the international financial system. A series of financial crises experienced by these emerging market economies led them to switch to some form of a flexible exchange rate regime, coupled with inflation targeting. These developments, in turn, accentuate the need for exchange rate forecasting in such economies. This paper is a first attempt to compile data from the emerging Central and Eastern European (CEE) economies, to evaluate the performance of versions of the monetary model of exchange rate determination, and time series models for forecasting exchange rates. Forecast performance of these models at various horizons is evaluated against that of a random walk, which, overwhelmingly, was found to be the best exchange rate predictor for developed economies in the previous literature. Following Clark and West (2006, 2007) for forecast performance analysis, we report that at short horizons, structural models and time series models outperform the random walk for the six CEE countries in the data set.
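The random-walk benchmark mentioned above is simply the no-change forecast. A minimal sketch of scoring it by RMSE on a toy series follows (the numbers are invented; the Clark and West procedure used in the paper is a formal test and is not shown here):

```python
import math

def rmse(errors):
    """Root mean squared forecast error."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def random_walk_forecast(series, horizon=1):
    """Driftless random walk: the forecast for t + horizon is the value at t."""
    return [series[t] for t in range(len(series) - horizon)]

# toy exchange-rate series (invented numbers, for illustration only)
rates = [1.00, 1.02, 1.01, 1.03, 1.05, 1.04, 1.06]
h = 1
preds = random_walk_forecast(rates, h)
errors = [a - p for a, p in zip(rates[h:], preds)]
print(round(rmse(errors), 4))  # RMSE of the no-change benchmark
```

A structural or time series model "outperforms the random walk" at a horizon when its out-of-sample RMSE is significantly below this benchmark value.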
A Scalable Null Model for Directed Graphs Matching All Degree Distributions: In, Out, and Reciprocal
Degree distributions are arguably the most important property of real world
networks. The classic edge configuration model or Chung-Lu model can generate
an undirected graph with any desired degree distribution. This serves as a good
null model to compare algorithms or perform experimental studies. Furthermore,
there are scalable algorithms that implement these models and they are
invaluable in the study of graphs. However, networks in the real-world are
often directed, and have a significant proportion of reciprocal edges. A
stronger relation exists between two nodes when they each point to one another
(reciprocal edge) as compared to when only one points to the other (one-way
edge). Despite their importance, reciprocal edges have been disregarded by most
directed graph models.
We propose a null model for directed graphs inspired by the Chung-Lu model
that matches the in-, out-, and reciprocal-degree distributions of the real
graphs. Our algorithm is scalable and requires O(m) random numbers to
generate a graph with m edges. We perform a series of experiments on real
datasets and compare with existing graph models.
Comment: Camera-ready version for IEEE Workshop on Network Science; fixed some typos in tables.
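The undirected Chung-Lu model referenced above can be sketched with a fast variant that draws both endpoints of each of m edges in proportion to the desired degree; this is an illustrative simplification of the undirected case, not the authors' directed/reciprocal algorithm:

```python
import random

def fast_chung_lu(degrees, rng=None):
    """Sketch of a fast Chung-Lu-style generator: emit m = sum(degrees)/2
    edges, drawing each endpoint independently with probability proportional
    to its desired degree, so expected degrees match `degrees`.  Self-loops
    and repeated edges can occur; for large sparse graphs they are rare and
    usually discarded.
    """
    rng = rng or random.Random()
    nodes = list(degrees)
    weights = [degrees[v] for v in nodes]
    m = sum(weights) // 2
    return [(rng.choices(nodes, weights=weights)[0],
             rng.choices(nodes, weights=weights)[0]) for _ in range(m)]
```

Because each edge costs a constant number of random draws, the whole graph takes O(m) random numbers, which is what makes this family of models scalable as a null model.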
Degree Relations of Triangles in Real-world Networks and Models
Triangles are an important building block and distinguishing feature of
real-world networks, but their structure is still poorly understood. Despite
numerous reports on the abundance of triangles, there is very little
information on what these triangles look like. We initiate the study of
degree-labeled triangles -- specifically, degree homogeneity versus
heterogeneity in triangles. This yields new insight into the structure of
real-world graphs. We observe that networks coming from social and
collaborative situations are dominated by homogeneous triangles, i.e., degrees
of vertices in a triangle are quite similar to each other. On the other hand,
information networks (e.g., web graphs) are dominated by heterogeneous
triangles, i.e., the degrees in triangles are quite disparate. Surprisingly,
nodes within the top 1% of degrees participate in the vast majority of
triangles in heterogeneous graphs. We also ask whether current graph models
reproduce the types of triangles observed in real data, and show that most
models fail to accurately capture these salient features.
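One way to make the homogeneity-versus-heterogeneity distinction concrete is to compute a degree-spread ratio per triangle; this measure is illustrative and not necessarily the exact statistic used in the paper:

```python
from itertools import combinations

def triangle_degree_spread(adj):
    """For each triangle, return max-degree / min-degree over its vertices.
    Values near 1 indicate homogeneous triangles (similar degrees); large
    values indicate heterogeneous ones (disparate degrees).

    adj maps each vertex to its set of neighbors (undirected graph).
    """
    deg = {v: len(adj[v]) for v in adj}
    spreads = []
    for u in adj:
        # combinations of sorted neighbors gives v < w automatically
        for v, w in combinations(sorted(adj[u]), 2):
            # require u < v (< w) so each triangle is counted exactly once
            if u < v and w in adj[v]:
                ds = sorted((deg[u], deg[v], deg[w]))
                spreads.append(ds[2] / ds[0])
    return spreads
```

A social network dominated by homogeneous triangles would show a spread distribution concentrated near 1, while a web graph would show a heavy tail of large ratios.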
Dynamics of Trust Reciprocation in Heterogenous MMOG Networks
Understanding the dynamics of reciprocation is of great interest in sociology
and computational social science. The recent growth of Massively Multi-player
Online Games (MMOGs) has provided unprecedented access to large-scale data
which enables us to study such complex human behavior in a more systematic
manner. In this paper, we consider three different networks in the EverQuest2
game: chat, trade, and trust. The chat network has the highest level of
reciprocation (33%) because there are essentially no barriers to it. The trade
network has a lower rate of reciprocation (27%) because it has the obvious
barrier of requiring more goods or money for exchange; moreover, there is no
clear benefit to returning a trade link except in terms of social connections.
The trust network has the lowest reciprocation (14%) because this equates to
sharing certain within-game assets such as weapons, and so there is a high
barrier for such connections because they require faith in the players that are
granted such high access. In general, we observe that reciprocation rate is
inversely related to the barrier level in these networks. We also note that
reciprocation has connections across the heterogeneous networks. Our
experiments indicate that players make use of the medium-barrier reciprocations
to strengthen a relationship. We hypothesize that lower-barrier interactions
are an important component to predicting higher-barrier ones. We verify our
hypothesis using predictive models for trust reciprocations using features from
trade interactions. Using the number of trades (both before and after the
initial trust link) boosts our ability to predict if the trust will be
reciprocated by up to 11% with respect to the AUC.
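The AUC used to evaluate the reciprocation predictor can be computed directly as a rank statistic. A minimal self-contained sketch on hypothetical data follows (the feature, a prior trade count, follows the paper's idea, but the numbers are invented):

```python
def auc(scores, labels):
    """Area under the ROC curve: the probability that a randomly chosen
    positive example is scored above a randomly chosen negative one
    (ties count as 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# hypothetical data: number of trades between a pair before the trust link,
# and whether the trust link was later reciprocated
trades_before = [0, 4, 5, 2, 8, 1, 6, 3]
reciprocated  = [0, 1, 1, 0, 1, 0, 0, 1]
print(auc(trades_before, reciprocated))  # → 0.8125
```

An AUC of 0.5 means the feature carries no signal; the paper's reported gain of up to 11% in AUC from trade features is measured against a model without them.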
Malicious code detection in Android: the role of sequence characteristics and disassembling methods
The acceptance and widespread use of the Android operating system drew the attention of both legitimate developers and malware authors, resulting in a significant number of benign and malicious applications available on various online markets. Since signature-based methods fall short of detecting malicious software effectively given the vast number of applications, machine learning techniques in this field have also become widespread. In this context, reporting accuracy values derived from contingency tables has become a popular and efficient practice in malware detection studies, enabling researchers to evaluate their methodologies comparatively. In this study, we investigate and emphasize the factors that may affect the accuracy of such models, particularly the disassembly method and the characteristics of the input data. First, we developed a model that tackles the malware detection problem from a Natural Language Processing (NLP) perspective using Long Short-Term Memory (LSTM). Then, we experimented with different base units (instruction, basic block, method, and class) and representations of source code obtained from three commonly used disassembling tools (JEB, IDA, and Apktool) and examined the results. Our findings show that the disassembly method and different input representations affect the model results. More specifically, datasets produced by Apktool achieved better results than those from the other two disassemblers.
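Before an LSTM can consume disassembled code, token sequences (e.g. Dalvik opcode mnemonics) must be mapped to fixed-length integer vectors. A minimal sketch of that preprocessing step follows, with hypothetical opcode sequences; the LSTM itself would be built in a deep-learning framework and is omitted:

```python
def build_vocab(sequences):
    """Map each distinct token (e.g. a Dalvik opcode mnemonic) to an integer
    id; id 0 is reserved for padding."""
    vocab = {"<pad>": 0}
    for seq in sequences:
        for tok in seq:
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode(seq, vocab, max_len):
    """Encode one opcode sequence as a fixed-length id vector for an LSTM:
    truncate long sequences, pad short ones with 0."""
    ids = [vocab[t] for t in seq[:max_len]]
    return ids + [0] * (max_len - len(ids))

# hypothetical opcode sequences from two disassembled methods
seqs = [["invoke-virtual", "move-result", "if-eqz", "return-void"],
        ["const/4", "invoke-virtual", "return-void"]]
vocab = build_vocab(seqs)
encoded = [encode(s, vocab, max_len=5) for s in seqs]
print(encoded)
```

Changing the base unit (instruction vs. basic block vs. method) or the disassembler changes which token sequences this step sees, which is precisely the effect on model accuracy the study measures.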