
    The Interdependence of Scientists in the Era of Team Science: An Exploratory Study Using Temporal Network Analysis

    How is the rise in team science, and the emergence of the research group as the fundamental unit of organization in science, affecting scientists’ opportunities to collaborate? Are most scientists becoming dependent on a select subset of their peers to organize the intergroup collaborations that are becoming the norm in science? This dissertation explores the evolving nature of scientists’ interdependence in team-based research environments. The research was motivated by the desire to reconcile emerging views on the organization of scientific collaboration with the theoretical and methodological tendency to think about and study scientists as autonomous actors who negotiate collaboration dyadically. Complex Adaptive Social Systems served as the framework for understanding the dynamics involved in the formation of collaborative relationships. Temporal network analysis at the mesoscopic level was used to study the collaboration dynamics of a specific research community: the genomic research community emerging around GenBank, the international nucleotide sequence databank. The investigation into the dynamics of the mesoscopic layer of a scientific collaboration network revealed the following: (1) collaborative relationships have a prominent half-life; (2) the half-life can be used to construct weighted decay networks for extracting the group structure that influences collaboration; (3) scientists at all levels of status are becoming increasingly interdependent, with the qualification that interdependence is highly asymmetrical; and (4) the group structure is increasingly influential on scientists’ collaborative interactions. The results advance theoretical and empirical understanding of scientific collaboration in team-based research environments, as well as methodological approaches to studying temporal networks at the mesoscopic level.
The findings also have implications for policy researchers interested in the career cycles of scientists and in maintaining and building scientific capacity in research areas of national interest.
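The half-life finding suggests a simple way to weight collaboration ties over time. The sketch below is an illustration under stated assumptions, not the dissertation's method: it assumes each co-authorship event loses half its weight every `half_life` years (exponential decay) and that an edge's weight is the sum of its decayed events. The function names, the `(author_a, author_b, year)` input format, and the half-life value in the usage are hypothetical.

```python
from collections import defaultdict


def decay_weight(event_year, current_year, half_life):
    """Weight of one past collaboration: 1.0 in the current year, halving every `half_life` years."""
    age = current_year - event_year
    return 0.5 ** (age / half_life)


def build_decay_network(coauthorships, current_year, half_life):
    """Sum decayed event weights per undirected author pair (hypothetical helper).

    `coauthorships` is an iterable of (author_a, author_b, year) tuples;
    events after `current_year` are skipped so the network reflects only the past.
    """
    weights = defaultdict(float)
    for a, b, year in coauthorships:
        if year > current_year:
            continue
        edge = tuple(sorted((a, b)))  # canonical key for an undirected edge
        weights[edge] += decay_weight(year, current_year, half_life)
    return dict(weights)


# Usage with made-up events and an assumed half-life of 4 years
events = [("A", "B", 2000), ("A", "B", 2004), ("B", "C", 2004)]
network = build_decay_network(events, current_year=2004, half_life=4)
print(network)  # {('A', 'B'): 1.5, ('B', 'C'): 1.0}
```

Pairs whose weight falls below some threshold could then be dropped before a community-detection step extracts groups; both the threshold and the detection algorithm are further choices that the abstract does not specify.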

    User identification and community exploration via mining big personal data in online platforms

    User-generated big data mining is vitally important for large online platforms in terms of security, profit improvement, product recommendation, and system management. Personal attribute recognition, user behavior prediction, user identification, and community detection are among the most critical and interesting issues, and they remain challenging in many real applications in terms of accuracy, efficiency, and data security. An online platform with tens of thousands of users is always vulnerable to malicious users who threaten other, innocent users and consume unnecessary resources, so accurate user identification is urgently required to prevent such malicious attempts. Meanwhile, accurate prediction of user behavior helps large platforms provide satisfactory recommendations and allocate resources efficiently across users. Beyond individual identification, exploring the communities in large social networks formed from online databases can also help managers understand how a community evolves. Such large-scale and diverse social networks can also be used to validate network theories that were previously developed on synthetic networks or small real networks. In this thesis, we study several specific cases that address key challenges remaining in different types of large online platforms, such as user behavior prediction for cold-start users, privacy protection for user-generated data, and large-scale, diverse social community analysis. The first case concerns online education: as an emerging business, it has attracted tens of thousands of users because it can provide diverse courses that satisfy a wide range of student demands. Because of the limitations of public school systems, many students pursue private supplementary tutoring to improve their academic performance.
Similar to online shopping platforms, an online education system is a user-product based service, where users usually have to select and purchase the courses that meet their demands. It is therefore important to build course recommendation and user behavior prediction systems based on user attributes or user-generated data. Item recommendation in current online shopping systems is usually based on interactions between users and products, since most personal attributes are unnecessary for online shopping services and users often provide false information during registration. It is therefore not possible to recommend items by exploiting the similarity of personal attributes among users, such as education level, age, school, and gender. Unlike most online shopping platforms, online education platforms have access to a large number of credible personal attributes, since accurate personal information is important for education services, and user behaviors could be predicted from user attributes alone. Moreover, previous work on learning individual attributes relies primarily on panel survey data, which ensures credibility but lacks efficiency; as a result, most studies include only hundreds or thousands of users. Using 3 years of learning data from more than 200,000 anonymous K-12 students on one of the world's largest online extra-curricular education platforms, we uncover students' online learning behaviors and infer the impact of students' home location, family socioeconomic situation, and attended school's reputation/rank on their private tutoring course participation and learning outcomes. Further analysis suggests that this impact may be largely attributable to unequal access to educational resources across cities and to inequality in family socioeconomic status.
Finally, we study the predictability of students' performance and behaviors using machine learning algorithms with different groups of features, showing that students' online learning performance can be predicted from personal attributes and user-generated data with MAE < 10%. As mentioned above, user attributes on most online platforms are usually false, and online platforms are usually vulnerable to malicious users, so it is important to identify users or verify their attributes. Many studies have used user-generated mobile phone data (which include sensitive information) to identify diverse user attributes, such as socioeconomic status, age, education level, and profession. Most of these approaches leverage original sensitive user data to build feature-rich models that take private information as input, such as exact locations, app usage, and call detail records. However, accessing users' raw mobile phone data may violate increasingly strict private data protection policies and regulations (e.g., GDPR). We observe that appropriate statistical methods can eliminate private information while preserving personal characteristics, enabling the identification of user attributes without privacy concerns. For example, identifying an unfamiliar caller's profession is important for protecting citizens' personal safety and property. Because various popular online services, such as taxi hailing and takeout ordering, offer limited data protection in some countries, many users now receive an increasing number of phone calls from strangers. The situation may be aggravated when criminals pretend to be such service delivery staff, threatening individual users as well as society. In addition, more and more people suffer from excessive digital marketing and fraudulent phone calls because of personal information leakage. Real-time identification of unfamiliar callers is therefore urgently needed.
We explore the feasibility of user identification with privacy-preserved, user-generated mobile phone data, and we develop CPFinder, a system that implements automatic caller identification on end devices. The system mainly identifies four categories of users: taxi drivers, delivery and takeout staff, telemarketers and fraudsters, and normal users (other professions). Our evaluation on an anonymized dataset of 1,282 users over a period of 3 months in Shanghai shows that CPFinder achieves an accuracy of more than 75% for multi-class classification and more than 92.35% for binary classification. Beyond mining personal attributes and behaviors, community mining of large groups of people based on online big data has also attracted much attention, owing to the accessibility of large-scale social networks on online platforms. As one of the most important types of social network, scientific collaboration networks have been studied for decades, because large online publication databases are easy to access and many user attributes are available. Academic collaboration has become routine, and connections among researchers have grown closer with the prosperity of globalized academic communication. It has been found that many computer science conferences are closed communities in terms of accepting newcomers' papers, especially the well-regarded conferences [cabot2018cs]. However, an in-depth study of how different conferences differ in closeness and structural features, and of what causes these differences, is still missing. More analysis is needed to determine whether the network is closed or has other properties. We envision that social connections play an increasing role in the academic community and influence the paper selection process.
These influences are not restricted to visible links; they also extend to weak ties that connect two distant nodes. Previous studies of coauthor networks did not adequately consider the central role of some authors in publication venues, such as the program committee (PC) chairs of conferences. Such people could influence the evolutionary patterns of coauthor networks through their authority, the trust placed in them to select accepted papers, and their core positions in the community. Thus, in addition to the ratio of newcomers' papers, it would be useful to quantify PC-chair-related metrics that measure the closure of a conference from the perspective of established authors' papers. Additionally, analyzing how conferences differ in the evolution of their coauthor networks and in their degree of closeness may reveal how closed communities form. We therefore present the different outcomes observed across several typical conferences with different structural characteristics. In this paper, using the DBLP dataset of computer science publications and a PC chair dataset, we show evidence for the existence of strong and weak ties in coauthor networks, and we confirm that PC chairs' influence is related to tie strength and network structural properties. Several PC-chair-related metrics based on coauthor networks are introduced to measure the closure and efficiency of a conference.
2021-10-2
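The closure measure mentioned in the thesis abstract, the ratio of newcomers' papers, can be computed from a list of accepted papers. This is a minimal sketch assuming one possible definition, not necessarily the one used in the thesis or in [cabot2018cs]: a paper counts as a newcomer paper if none of its authors published at the same conference in an earlier year. The function name and the `(year, authors)` input format are hypothetical.

```python
from collections import defaultdict


def newcomer_paper_ratio(papers):
    """Per-year share of accepted papers written only by newcomers (hypothetical helper).

    `papers` is an iterable of (year, authors) pairs for ONE conference. An author is
    a newcomer in a given year if they had no paper at this conference in any earlier year.
    """
    by_year = defaultdict(list)
    for year, authors in papers:
        by_year[year].append(set(authors))

    seen = set()  # authors with at least one paper in an earlier edition
    ratios = {}
    for year in sorted(by_year):
        edition = by_year[year]
        newcomer_papers = sum(1 for authors in edition if not authors & seen)
        ratios[year] = newcomer_papers / len(edition)
        for authors in edition:  # update only after the whole edition is scored
            seen |= authors
    return ratios


# Usage with made-up data
accepted = [
    (2019, ("A", "B")),
    (2019, ("C",)),
    (2020, ("A", "D")),  # A published in 2019, so this is not a newcomer paper
    (2020, ("E",)),
]
print(newcomer_paper_ratio(accepted))  # {2019: 1.0, 2020: 0.5}
```

Under this definition a persistently low ratio would point to a more closed conference; comparing the yearly series across venues is one way to quantify the differences in closeness that the abstract describes.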

    Gender Disparities in Science? Dropout, Productivity, Collaborations and Success of Male and Female Computer Scientists

    Scientific collaborations shape ideas as well as innovations and are both the substrate for, and the outcome of, academic careers. Recent studies show that gender inequality is still present in many scientific practices, ranging from hiring to peer review and grant applications. In this work, we investigate gender-specific differences in the collaboration patterns of more than one million computer scientists over the course of 47 years. We explore how these patterns change over the years and across career ages, and how they affect scientific success. Our results highlight that successful male and female scientists exhibit the same collaboration patterns: compared to other scientists of the same career age, they tend to collaborate with more colleagues, seek innovation as brokers, and establish longer-lasting and more repetitive collaborations. However, women are on average less likely to adopt the collaboration patterns associated with success and more likely to be embedded in ego networks devoid of structural holes; they also exhibit stronger gender homophily and a consistently higher dropout rate than men at all career ages.
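The abstract does not say which structural-hole measure was used. As one standard option, Burt's effective size of an ego network reduces, for an undirected and unweighted network, to n - 2t/n, where n is the number of the ego's contacts and t is the number of ties among those contacts (Borgatti's simplification). An effective size well below n indicates a redundant ego network with few structural holes. The sketch below assumes an adjacency-set representation without self-loops; the function name is hypothetical.

```python
def effective_size(graph, ego):
    """Effective size of `ego`'s network: n - 2t/n (undirected, unweighted, no self-loops).

    `graph` maps each node to the set of its neighbours.
    """
    alters = graph[ego]
    n = len(alters)
    if n == 0:
        return 0.0
    # Each tie between two alters is seen once from each end, hence the division by 2.
    t = sum(1 for u in alters for v in graph[u] if v in alters) / 2
    return n - 2 * t / n


# A broker whose three contacts do not know each other: many structural holes
star = {"E": {"A", "B", "C"}, "A": {"E"}, "B": {"E"}, "C": {"E"}}
# An ego whose two contacts are tied to each other: no structural hole
triangle = {"E": {"A", "B"}, "A": {"E", "B"}, "B": {"E", "A"}}
print(effective_size(star, "E"), effective_size(triangle, "E"))  # 3.0 1.0
```

For larger networks, NetworkX provides `effective_size` and `constraint` functions that implement Burt's measures, including weighted variants.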

    The Social Structure of the Information Systems Collaboration Network: Centers of Influence and Antecedents of Tie Formation

    In this study, we examine the historical information systems (IS) research collaboration network. We build the network using coauthorship information in the Senior Scholars’ basket of eight journals, from the publication of MISQ’s first issue in April 1977 to November 2015. The different journals vary widely in their network configurations. We examine the influence of gender homophily, geographic homophily, and field-tenure heterophily on coauthorship in the network. Using exponential random graph modeling (ERGM) on a randomly selected subset of the network, we present preliminary evidence that ties in the IS collaboration network exhibit homophily by gender and geography. Conversely, coauthorship seems to exhibit heterophily along the temporal dimension: short-tenured researchers in the field prefer to collaborate with long-tenured researchers. ERGM enables one to make statistical inferences about the influence of node attributes and structural variables on network formation, which is hard to do with logistic regression because network relationships violate the assumption of independent observations. We also identify the current center of the IS collaboration network and, based on this center, propose a metric to measure a researcher’s connectedness in the network.
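To make the ERGM terms concrete: gender homophily is commonly modelled with a "nodematch" term whose statistic counts ties between nodes that share an attribute, and in an ERGM the log-odds of a tie, conditional on the rest of the network, equal the sum of each parameter times the change that tie causes in its statistic. The sketch below assumes a model with only an edges term and a gender-nodematch term; the function names and parameter values are made up for illustration and are not estimates from the study.

```python
import math


def nodematch_count(edges, attr):
    """Nodematch statistic: number of undirected ties whose endpoints share the attribute."""
    return sum(1 for i, j in edges if attr[i] == attr[j])


def conditional_tie_probability(i, j, attr, theta_edges, theta_match):
    """P(tie i-j | rest of network) for an ERGM with edges + nodematch terms only.

    Adding the tie raises the edge count by 1 and the nodematch count by 1
    if i and j share the attribute, else by 0; the log-odds are the
    parameter-weighted sum of these change statistics.
    """
    delta_edges = 1
    delta_match = 1 if attr[i] == attr[j] else 0
    log_odds = theta_edges * delta_edges + theta_match * delta_match
    return 1 / (1 + math.exp(-log_odds))


gender = {"a": "F", "b": "F", "c": "M"}
ties = [("a", "b"), ("b", "c")]
print(nodematch_count(ties, gender))  # 1
# Hypothetical parameters: a negative edges term and a positive homophily term
print(conditional_tie_probability("a", "b", gender, -1.0, 1.0))  # 0.5
print(conditional_tie_probability("a", "c", gender, -1.0, 1.0))
```

With only these two terms the model is dyad-independent and reduces to a logistic regression on dyads. The advantage the abstract describes appears once dependence terms (for example, terms for shared partners or triangles) are added: their change statistics depend on the rest of the network, and estimation then typically relies on MCMC methods, as in the R `ergm` package.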

    Attention-Based Deep Learning Model for Predicting Collaborations Between Different Research Affiliations

    Predicting collaborations between different entities is challenging but important; in academia, for example, it would enable finding and evaluating trends in scientific research collaboration and providing decision support for policy formulation and incentive measures. In this paper, we propose an attention-based Long Short-Term Memory Convolutional Neural Network (LSTM-CNN) model to predict collaborations between different research affiliations, which takes into account both the influence of research articles and time (year) relationships. The experimental results show that the proposed model outperforms the competing Support Vector Machine (SVM), CNN, and LSTM methods. Compared with these methods, it significantly improves prediction precision, by 3.23 to 10.80 percentage points, and improves the F1-score by 13.48, 4.85, and 4.24 percentage points, respectively.
    This work was supported in part by the Humanities and Social Science Research Project of the Ministry of Education in China under Grant 17YJCZH262 and Grant 18YJAZH136, in part by the National Natural Science Foundation of China under Grant 61303167, Grant 61702306, Grant 61433012, Grant U1435215, and Grant 71772107, in part by the Natural Science Foundation of Shandong Province under Grant ZR2018BF013 and Grant ZR2017BF015, in part by the Innovative Research Foundation of Qingdao under Grant 18-2-2-41-jch, in part by the Key Project of Industrial Transformation and Upgrading in China under Grant TC170A5SW, and in part by the Scientific Research Foundation of SDUST for Innovative Team under Grant 2015TDJH102.
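The LSTM-CNN abstract does not detail its attention mechanism. As a rough illustration, and not the authors' model, the sketch below shows dot-product attention pooling over a sequence of hidden vectors, such as one LSTM output per year: each vector is scored against a query vector, the scores are normalized with a softmax, and the output is the weighted sum. In a trained model the query would be learned; here it and the hidden vectors are made-up numbers, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Dot-product attention pooling.

    Returns (weights, context), where weights[t] is the softmax of
    hidden_states[t] . query and context is sum_t weights[t] * hidden_states[t].
    """
    scores = [sum(h_k * q_k for h_k, q_k in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h[k] for w, h in zip(weights, hidden_states)) for k in range(dim)]
    return weights, context


# Three "yearly" hidden vectors; the query favours the first dimension
years = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
weights, context = attention_pool(years, query=[1.0, 0.0])
print([round(w, 3) for w in weights])  # [0.245, 0.09, 0.665]
```

The third vector aligns best with the query and therefore dominates the pooled context; a learned query lets the model decide which years matter most for a given prediction.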