57 research outputs found

Data-driven Computational Social Science: A Survey

Author: Lin Yu-Ru
Tong Hanghang
Wang Wei
Xia Feng
Zhang Jun
Publication venue: 'Elsevier BV'
Publication date: 01/01/2020

Social science concerns issues on individuals, relationships, and the whole society. The complexity of research topics in social science makes it the amalgamation of multiple disciplines, such as economics, political science, and sociology. For centuries, scientists have conducted many studies to understand the mechanisms of society. However, due to the limitations of traditional research methods, many critical social issues remain to be explored. To solve those issues, computational social science has emerged, driven by rapid advances in computation technologies and profound studies in social science. With the aid of advanced research techniques, various kinds of data from diverse areas can now be acquired, and they can help us look at social problems with a fresh eye. As a result, utilizing various data to reveal issues in the computational social science area has attracted more and more attention. In this paper, we present what is, to the best of our knowledge, the first survey on data-driven computational social science, which primarily focuses on reviewing application domains involving human dynamics. The state-of-the-art research on human dynamics is reviewed from three aspects: individuals, relationships, and collectives. Specifically, the research methodologies used to address research challenges in the aforementioned application domains are summarized. In addition, some important open challenges with respect to both emerging research topics and research methods are discussed. Comment: 28 pages, 8 figures

arXiv.org e-Print Archive

Federation ResearchOnline

Network-based ranking in social systems: three challenges

Author: Lü Linyuan
Mariani Manuel S.
Publication venue: 'IOP Publishing'
Publication date: 29/05/2020

Ranking algorithms are pervasive in our increasingly digitized societies, with important real-world applications including recommender systems, search engines, and influencer marketing practices. From a network science perspective, network-based ranking algorithms solve fundamental problems related to the identification of vital nodes for the stability and dynamics of a complex system. Despite the ubiquitous and successful applications of these algorithms, we argue that our understanding of their performance and their applications to real-world problems faces three fundamental challenges: (1) rankings might be biased by various factors; (2) their effectiveness might be limited to specific problems; and (3) agents' decisions driven by rankings might result in potentially vicious feedback mechanisms and unhealthy systemic consequences. Methods rooted in network science and agent-based modeling can help us to understand and overcome these challenges. Comment: Perspective article. 9 pages, 3 figures

arXiv.org e-Print Archive

ZORA

ComLittee: Literature Discovery with Personal Elected Author Committees

Author: Bragg Jonathan
Chang Joseph Chee
Kang Hyeonsu B.
Latzke Matt
Soliman Nouran
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 13/02/2023

In order to help scholars understand and follow a research topic, significant research has been devoted to creating systems that help scholars discover relevant papers and authors. Recent approaches have shown the usefulness of highlighting relevant authors while scholars engage in paper discovery. However, these systems do not capture and utilize users' evolving knowledge of authors. We reflect on the design space and introduce ComLittee, a literature discovery system that supports author-centric exploration. In contrast to paper-centric interaction in prior systems, ComLittee's author-centric interaction supports curation of research threads from individual authors, finding new authors and papers with combined signals from a paper recommender and the curated authors' authorship graphs, and understanding them in the context of those signals. In a within-subjects experiment that compares to an author-highlighting approach, we demonstrate how ComLittee leads to higher efficiency, quality, and novelty in author discovery, which also improves paper discovery.

arXiv.org e-Print Archive

Representation Learning for Texts and Graphs: A Unified Perspective on Efficiency, Multimodality, and Adaptability

Author: Galke Lukas Paul Achatius
Publication venue: Universitätsbibliothek Kiel
Publication date: 01/01/2023

[...] This thesis is situated between natural language processing and graph representation learning and investigates selected connections. First, we introduce matrix embeddings as an efficient text representation sensitive to word order. [...] Experiments with ten linguistic probing tasks, 11 supervised, and five unsupervised downstream tasks reveal that vector and matrix embeddings have complementary strengths and that a jointly trained hybrid model outperforms both. Second, a popular pretrained language model, BERT, is distilled into matrix embeddings. [...] The results on the GLUE benchmark show that these models are competitive with other recent contextualized language models while being more efficient in time and space. Third, we compare three model types for text classification: bag-of-words, sequence-, and graph-based models. Experiments on five datasets show that, surprisingly, a wide multilayer perceptron on top of a bag-of-words representation is competitive with recent graph-based approaches, questioning the necessity of graphs synthesized from the text. [...] Fourth, we investigate the connection between text and graph data in document-based recommender systems for citations and subject labels. Experiments on six datasets show that the title as side information improves the performance of autoencoder models. [...] We find that the meaning of item co-occurrence is crucial for the choice of input modalities and an appropriate model. Fifth, we introduce a generic framework for lifelong learning on evolving graphs in which new nodes, edges, and classes appear over time. [...] The results show that by reusing previous parameters in incremental training, it is possible to employ smaller history sizes with only a slight decrease in accuracy compared to training with complete history. Moreover, weighting the binary cross-entropy loss function is crucial to mitigate the problem of class imbalance when detecting newly emerging classes. [...]
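
The class-imbalance remark at the end of this abstract can be illustrated with a minimal sketch of a weighted binary cross-entropy. The function name `weighted_bce` and the `pos_weight` parameter are hypothetical, and the abstract does not state which weighting scheme the thesis uses, so this shows only the general idea of up-weighting errors on a rare positive class.

```python
import math


def weighted_bce(y_true, p_pred, pos_weight=1.0, eps=1e-12):
    """Mean binary cross-entropy with the positive-class term scaled by pos_weight.

    y_true: iterable of 0/1 labels.
    p_pred: iterable of predicted probabilities for label 1.
    pos_weight: hypothetical knob; values above 1 penalise missed positives more.
    """
    total = 0.0
    count = 0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp so that log() stays finite
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
        count += 1
    return total / count


# One rare positive among three negatives, all predicted with low probability.
labels = [1, 0, 0, 0]
probs = [0.2, 0.1, 0.1, 0.1]
plain = weighted_bce(labels, probs)
reweighted = weighted_bce(labels, probs, pos_weight=3.0)
print(plain, reweighted)
```

With `pos_weight=3.0` the single missed positive contributes three times as much to the loss as it does unweighted, while the negative terms are unchanged.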

MACAU: Open Access Repository of Kiel University

NLP Driven Models for Automatically Generating Survey Articles for Scientific Topics.

Author: Jha Rahul Kumar
Publication date: 01/01/2015

This thesis presents new methods that use natural language processing (NLP) driven models for summarizing research in scientific fields. Given a topic query in the form of a text string, we present methods for finding research articles relevant to the topic as well as summarization algorithms that use lexical and discourse information present in the text of these articles to generate coherent and readable extractive summaries of past research on the topic. In addition to summarizing prior research, good survey articles should also forecast future trends. With this motivation, we present work on forecasting future impact of scientific publications using NLP driven features. PhD, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/113407/1/rahuljha_1.pd

Deep Blue Documents at the University of Michigan

Structure-oriented prediction in complex networks

Author: An Zeng
Yi-Cheng Zhang
Zhuo-Ming Ren
Publication date: 23/11/2018

Complex systems are extremely hard to predict due to their highly nonlinear interactions and rich emergent properties. Thanks to the rapid development of network science, our understanding of the structure of real complex systems and the dynamics on them has been remarkably deepened, which in turn has largely stimulated the growth of effective prediction approaches on these systems. In this article, we aim to review different network-related prediction problems, summarize and classify relevant prediction methods, analyze their advantages and disadvantages, and point out the forefront as well as critical challenges of the field.

Crossref

RERO DOC Digital Library

Statistical Tools for Network Data: Prediction and Resampling

Author: Li Tianxi
Publication date: 01/01/2018

Advances in data collection and social media have led to more and more network data appearing in diverse areas, such as social sciences, internet, transportation and biology. This thesis develops new principled statistical tools for network analysis, with emphasis on both appealing statistical properties and computational efficiency. Our first project focuses on building prediction models for network-linked data. Prediction algorithms typically assume the training data are independent samples, but in many modern applications samples come from individuals connected by a network. For example, in adolescent health studies of risk-taking behaviors, information on the subjects' social network is often available and plays an important role through network cohesion, the empirically observed phenomenon of friends behaving similarly. Taking cohesion into account in prediction models should allow us to improve their performance. We propose a network-based penalty on individual node effects to encourage similarity between predictions for linked nodes, and show that incorporating it into prediction leads to improvement over traditional models both theoretically and empirically when network cohesion is present. The penalty can be used with many loss-based prediction methods, such as regression, generalized linear models, and Cox's proportional hazard model. Applications to predicting levels of recreational activity and marijuana usage among teenagers from the AddHealth study based on both demographic covariates and friendship networks are discussed in detail. Our approach to taking friendships into account can significantly improve predictions of behavior while providing interpretable estimates of covariate effects. Resampling, data splitting, and cross-validation are powerful general strategies in statistical inference, but resampling from a network remains a challenging problem. 
Many statistical models and methods for networks need model selection and tuning parameters, which could be done by cross-validation if we had a good method for splitting network data; however, splitting network nodes into groups requires deleting edges and destroys some of the structure. Here we propose a new network cross-validation strategy based on splitting edges rather than nodes, which avoids losing information and is applicable to a wide range of network models. We provide a theoretical justification for our method in a general setting and demonstrate how our method can be used in a number of specific model selection and parameter tuning tasks, with extensive numerical results on simulated networks. We also apply the method to analysis of a citation network of statisticians and obtain meaningful research communities. Finally, we consider the problem of community detection on partially observed networks. However, in practice, network data are often collected through sampling mechanisms, such as survey questionnaires, instead of direct observation. The noise and bias introduced by such sampling mechanisms can obscure the community structure and invalidate the assumptions of standard community detection methods. We propose a model to incorporate neighborhood sampling, through a model reflective of survey designs, into community detection for directed networks, since friendship networks obtained from surveys are naturally directed. We model the edge sampling probabilities as a function of both individual preferences and community parameters, and fit the model by a combination of spectral clustering and the method of moments. The algorithm is computationally efficient and comes with a theoretical guarantee of consistency. We evaluate the proposed model in extensive simulation studies and apply it to a faculty hiring dataset, discovering a meaningful hierarchy of communities among US business schools. PhD, Statistics, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/145894/1/tianxili_1.pd
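
The network-based penalty on individual node effects described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: it uses a squared-difference penalty over edges (a graph-Laplacian form, which the abstract does not spell out), drops the covariates entirely, and the name `smooth_node_effects` and the parameter `lam` are hypothetical.

```python
def smooth_node_effects(y, edges, lam, sweeps=500):
    """Minimise sum_i (y[i] - a[i])**2 + lam * sum_{(i, j) in edges} (a[i] - a[j])**2.

    y: observed response per node.
    edges: undirected edges as (i, j) index pairs.
    lam: assumed cohesion strength; 0 means no smoothing across edges.
    Solved by coordinate descent: each update is the exact minimiser
    over one node effect with all the others held fixed.
    """
    n = len(y)
    neighbours = [[] for _ in range(n)]
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    a = list(y)
    for _ in range(sweeps):
        for i in range(n):
            pull = sum(a[j] for j in neighbours[i])
            a[i] = (y[i] + lam * pull) / (1.0 + lam * len(neighbours[i]))
    return a


# Path 0 - 1 - 2 plus an isolated node 3.
responses = [1.0, 0.0, 1.0, 5.0]
friendships = [(0, 1), (1, 2)]
effects = smooth_node_effects(responses, friendships, lam=2.0)
print(effects)
```

Linked nodes are pulled toward each other, so node 1 moves up toward its two neighbours, while the isolated node 3 keeps its observed value because no edge term touches it.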

Deep Blue Documents at the University of Michigan

Mining and Analyzing the Academic Network

Author: Yang Zaihan
Publication venue: Lehigh Preserve

Social network research has attracted the interest of many researchers, not only in analyzing online social networking applications, such as Facebook and Twitter, but also in providing comprehensive services in the scientific research domain. We define an Academic Network as a social network which integrates scientific factors, such as authors, papers, affiliations, and publishing venues, and their relationships, such as co-authorship among authors and citations among papers. By mining and analyzing the academic network, we can provide users with comprehensive services such as searching for research experts, published papers, and conferences, as well as detecting research communities or the evolution of hot research topics. We can also provide recommendations to users on with whom to collaborate, whom to cite, and where to submit. In this dissertation, we investigate two main tasks that have fundamental applications in academic network research. In the first, we address the problem of expertise retrieval, also known as expert finding or ranking, in which we identify and return a ranked list of researchers, based upon their estimated expertise or reputation, in response to user-specified queries. In the second, we address the problem of research action recommendation (prediction), specifically, the tasks of publishing venue recommendation, citation recommendation, and coauthor recommendation. For both tasks, our principal goal is to effectively mine and integrate heterogeneous information and thereby develop well-functioning ranking or recommender systems.
For the task of expertise retrieval, we first proposed or applied three modified versions of PageRank-like algorithms to citation network analysis; we then proposed an enhanced author-topic model by simultaneously modeling citation and publishing venue information; we finally incorporated the pair-wise learning-to-rank algorithm into the traditional topic modeling process, and further improved the model by integrating groups of author-specific features. For the task of research action recommendation, we first proposed an improved neighborhood-based collaborative filtering approach for publishing venue recommendation; we then applied our proposed enhanced author-topic model and demonstrated its effectiveness in both cited author prediction and publishing venue prediction; finally, we proposed an extended latent factor model that can jointly model several relations in an academic environment in a unified way and verified its performance in four recommendation tasks: author-co-authorship, author-paper citation, paper-paper citation, and paper-venue submission. Extensive experiments conducted on large-scale real-world data sets demonstrated the superiority of our proposed models over other existing state-of-the-art methods.
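
As background for the PageRank-like algorithms mentioned in this abstract, here is a minimal sketch of plain PageRank by power iteration on a toy citation graph. It is the textbook algorithm, not any of the three modified versions the dissertation proposes; the function name, the damping value of 0.85, and the example papers are illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Plain PageRank by power iteration.

    links: dict mapping each paper to the list of papers it cites.
    Papers that cite nothing (dangling nodes) spread their rank uniformly.
    """
    nodes = sorted(set(links) | {v for targets in links.values() for v in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        # Teleportation share plus the evenly spread dangling mass.
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for u, targets in links.items():
            for v in targets:
                new_rank[v] += damping * rank[u] / len(targets)
        rank = new_rank
    return rank


# Toy citation graph: papers A and B both cite C; C cites nothing.
citations = {"A": ["C"], "B": ["C"], "C": []}
scores = pagerank(citations)
print(sorted(scores, key=scores.get, reverse=True))
```

The scores sum to one, and the twice-cited paper C ranks above A and B, which receive equal scores by symmetry.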

Lehigh University: Lehigh Preserve

Higher-Order Networks in Complex Systems: Temporality and Interconnectivity

Author: Wider Nicolas
Publication venue: ETH Zürich
Publication date: 01/01/2016

Repository for Publications and Research Data

Learning representations for graph-structured socio-technical systems

Author: Piaggesi Simone <1992>
Publication venue: Alma Mater Studiorum - Università di Bologna
Publication date: 16/06/2023

The recent widespread use of social media platforms and web services has led to a vast amount of behavioral data that can be used to model socio-technical systems. A significant part of this data can be represented as graphs or networks, which have become the prevalent mathematical framework for studying the structure and the dynamics of complex interacting systems. However, analyzing and understanding these data presents new challenges due to their increasing complexity and diversity. For instance, the characterization of real-world networks requires accounting for their temporal dimension and incorporating higher-order interactions beyond the traditional pairwise formalism. The ongoing growth of AI has led to the integration of traditional graph mining techniques with representation learning and low-dimensional embeddings of networks to address current challenges. These methods capture the underlying similarities and geometry of graph-shaped data, generating latent representations that enable the resolution of various tasks, such as link prediction, node classification, and graph clustering. As these techniques gain popularity, there is also growing concern about their responsible use. In particular, there has been an increased emphasis on addressing the limitations of interpretability in graph representation learning. This thesis contributes to the advancement of knowledge in the field of graph representation learning and has potential applications in a wide range of complex systems domains. We initially focus on forecasting problems related to face-to-face contact networks with time-varying graph embeddings. Then, we study hyperedge prediction and reconstruction with simplicial complex embeddings. Finally, we analyze the problem of interpreting latent dimensions in node embeddings for graphs.
The proposed models are extensively evaluated in multiple experimental settings and the results demonstrate their effectiveness and reliability, achieving state-of-the-art performance and providing valuable insights into the properties of the learned representations.

AMS Tesi di Dottorato