12,707 research outputs found
Language in Our Time: An Empirical Analysis of Hashtags
Hashtags in online social networks have gained tremendous popularity during
the past five years. The resulting large quantity of data has provided a new
lens into modern society. Previously, researchers mainly rely on data collected
from Twitter to study either a certain type of hashtags or a certain property
of hashtags. In this paper, we perform the first large-scale empirical analysis
of hashtags shared on Instagram, the major platform for hashtag-sharing. We
study hashtags from three different dimensions including the temporal-spatial
dimension, the semantic dimension, and the social dimension. Extensive
experiments performed on three large-scale datasets with more than 7 million
hashtags in total provide a series of interesting observations. First, we show
that the temporal patterns of hashtags can be categorized into four different
clusters, and people tend to share fewer hashtags at certain places and more
hashtags at others. Second, we observe that a non-negligible proportion of
hashtags exhibit large semantic displacement. We demonstrate hashtags that are
more uniformly shared among users, as quantified by the proposed hashtag
entropy, are less prone to semantic displacement. In the end, we propose a
bipartite graph embedding model to summarize users' hashtag profiles, and rely
on these profiles to perform friendship prediction. Evaluation results show
that our approach achieves an effective prediction with AUC (area under the ROC
curve) above 0.8 which demonstrates the strong social signals possessed in
hashtags.Comment: WWW 201
edge2vec: Representation learning using edge semantics for biomedical knowledge discovery
Representation learning provides new and powerful graph analytical approaches
and tools for the highly valued data science challenge of mining knowledge
graphs. Since previous graph analytical methods have mostly focused on
homogeneous graphs, an important current challenge is extending this
methodology for richly heterogeneous graphs and knowledge domains. The
biomedical sciences are such a domain, reflecting the complexity of biology,
with entities such as genes, proteins, drugs, diseases, and phenotypes, and
relationships such as gene co-expression, biochemical regulation, and
biomolecular inhibition or activation. Therefore, the semantics of edges and
nodes are critical for representation learning and knowledge discovery in real
world biomedical problems. In this paper, we propose the edge2vec model, which
represents graphs considering edge semantics. An edge-type transition matrix is
trained by an Expectation-Maximization approach, and a stochastic gradient
descent model is employed to learn node embedding on a heterogeneous graph via
the trained transition matrix. edge2vec is validated on three biomedical domain
tasks: biomedical entity classification, compound-gene bioactivity prediction,
and biomedical information retrieval. Results show that by considering
edge-types into node embedding learning in heterogeneous graphs,
\textbf{edge2vec}\ significantly outperforms state-of-the-art models on all
three tasks. We propose this method for its added value relative to existing
graph analytical methodology, and in the real world context of biomedical
knowledge discovery applicability.Comment: 10 page
Free associations mirroring self- and world-related concepts: Implications for personal construct theory, psycholinguistics and philosophical psychology
People construe reality by using words as basic units of meaningful categorization. The present theory-driven study applied the method of a free association task to explore how people express the concepts of the world and the self in words. The respondents were asked to recall any five words relating with the word world. Afterwards they were asked to recall any five words relating with the word self. The method of free association provided the respondents with absolute freedom to choose any words they wanted. Such free recall task is suggested as being a relatively direct approach to the respondents’ self- and world-related conceptual categories, without enormous rational processing. The results provide us, first, with associative ranges for constructs of the world and the self, where some associative dimensions are defined by semantic polarities in the meanings of peripheral categories (e.g., Nature vs. Culture). Second, our analysis showed that some groups of verbal categories that were associated with the words world and self are central, while others are peripheral with respect to the central position. Third, the analysis of category networks revealed that some categories play the role of a transmitter, mediating the pathway between other categories in the network
Easing Embedding Learning by Comprehensive Transcription of Heterogeneous Information Networks
Heterogeneous information networks (HINs) are ubiquitous in real-world
applications. In the meantime, network embedding has emerged as a convenient
tool to mine and learn from networked data. As a result, it is of interest to
develop HIN embedding methods. However, the heterogeneity in HINs introduces
not only rich information but also potentially incompatible semantics, which
poses special challenges to embedding learning in HINs. With the intention to
preserve the rich yet potentially incompatible information in HIN embedding, we
propose to study the problem of comprehensive transcription of heterogeneous
information networks. The comprehensive transcription of HINs also provides an
easy-to-use approach to unleash the power of HINs, since it requires no
additional supervision, expertise, or feature engineering. To cope with the
challenges in the comprehensive transcription of HINs, we propose the HEER
algorithm, which embeds HINs via edge representations that are further coupled
with properly-learned heterogeneous metrics. To corroborate the efficacy of
HEER, we conducted experiments on two large-scale real-words datasets with an
edge reconstruction task and multiple case studies. Experiment results
demonstrate the effectiveness of the proposed HEER model and the utility of
edge representations and heterogeneous metrics. The code and data are available
at https://github.com/GentleZhu/HEER.Comment: 10 pages. In Proceedings of the 24th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, London, United Kingdom,
ACM, 201
- …