    All liaisons are dangerous when all your friends are known to us

    Online Social Networks (OSNs) are used by millions of users worldwide. Academically speaking, there is little doubt about the usefulness of demographic studies conducted on OSNs and, hence, methods to label unknown users from small labeled samples are very useful. However, from the general public's point of view, this can be a serious privacy concern. Both topics are tackled in this paper. First, a new algorithm to perform user profiling in social networks is described, and its performance is reported and discussed. Second, the experiments (conducted on information usually considered sensitive) reveal that merely publicizing one's contacts puts privacy at risk; measures to minimize privacy leaks due to social graph data mining are therefore outlined. Comment: 10 pages, 5 tables
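To make the privacy point concrete, here is a minimal sketch of how a hidden attribute might be guessed from a public contact list alone. It uses a plain majority vote over labeled contacts, which is an assumed stand-in and not the algorithm proposed in the paper; the names `majority_label`, `contacts`, and `known_labels` are hypothetical.

```python
from collections import Counter


def majority_label(user, contacts, known_labels):
    """Guess a user's label from the labels of their contacts.

    Assumption: a simple majority vote, NOT the paper's algorithm.
    contacts: dict mapping a user to an iterable of contact ids.
    known_labels: dict mapping some users to a known label.
    Returns None when no contact has a known label.
    """
    votes = Counter(
        known_labels[c] for c in contacts.get(user, ()) if c in known_labels
    )
    if not votes:
        return None
    # most_common(1) returns a list holding the single top (label, count) pair
    return votes.most_common(1)[0][0]


# Hypothetical toy graph: "u" never disclosed a label, but three contacts did.
toy_contacts = {"u": ["a", "b", "c", "d"]}
toy_labels = {"a": "x", "b": "x", "c": "y"}
guess = majority_label("u", toy_contacts, toy_labels)
```

Even this naive rule only needs the contact list and a small labeled sample, which is the kind of leak the abstract warns about.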

    Controlling for Unobserved Confounds in Classification Using Correlational Constraints

    As statistical classifiers become integrated into real-world applications, it is important to consider not only their accuracy but also their robustness to changes in the data distribution. In this paper, we consider the case where there is an unobserved confounding variable $z$ that influences both the features $\mathbf{x}$ and the class variable $y$. When the influence of $z$ changes from training to testing data, we find that the classifier accuracy can degrade rapidly. In our approach, we assume that we can predict the value of $z$ at training time with some error. The prediction for $z$ is then fed to Pearl's back-door adjustment to build our model. Because of the attenuation bias caused by measurement error in $z$, standard approaches to controlling for $z$ are ineffective. In response, we propose a method to properly control for the influence of $z$ by first estimating its relationship with the class variable $y$, then updating predictions for $z$ to match that estimated relationship. By adjusting the influence of $z$, we show that we can build a model that exceeds competing baselines on accuracy as well as on robustness over a range of confounding relationships. Comment: 9 pages
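For readers unfamiliar with the back-door adjustment mentioned above, the textbook form for a discrete confounder is P(y | do(x)) = sum over z of P(y | x, z) P(z). The sketch below evaluates that sum for one fixed feature vector. It illustrates only the standard formula, not the paper's correlation-matching correction, and the function name and probability tables are hypothetical.

```python
def backdoor_adjust(p_y_given_xz, p_z):
    """Textbook back-door adjustment for a discrete confounder z.

    p_y_given_xz: dict mapping each z value to P(y = 1 | x, z) for a fixed x.
    p_z: dict mapping each z value to its marginal probability P(z).
    Returns the sum over z of P(y = 1 | x, z) * P(z).
    """
    if abs(sum(p_z.values()) - 1.0) > 1e-9:
        raise ValueError("P(z) must sum to 1")
    return sum(p_y_given_xz[z] * p_z[z] for z in p_z)


# Hypothetical numbers: the classifier output depends strongly on z,
# but averaging over the marginal P(z) removes that dependence.
adjusted = backdoor_adjust({0: 0.2, 1: 0.8}, {0: 0.5, 1: 0.5})
```

Because the sum weights each stratum by the marginal P(z) rather than by P(z | x), a shift in the relationship between z and y at test time has less influence on the adjusted prediction.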

    Workplace Segregation in the United States: Race, Ethnicity, and Skill

    We study workplace segregation in the United States using a unique matched employer-employee data set that we have created. We present measures of workplace segregation by education and language (as skilled workers may be more complementary with other skilled workers than with unskilled workers) and by race and ethnicity, using simulation methods to measure segregation beyond what would occur randomly as workers are distributed across establishments. We also assess the role of education- and language-related skill differentials in generating workplace segregation by race and ethnicity, as skill is often correlated with race and ethnicity. Finally, we attempt to distinguish between segregation by skill based on general crowding of unskilled poor English speakers into a narrow set of jobs, and segregation based on common language for reasons such as complementarity among workers speaking the same language. Our results indicate that there is considerable segregation by education and language in the workplace. Racial segregation in the workplace is of the same order of magnitude as education segregation, and segregation between Hispanics and whites is larger yet. Only a tiny portion of racial segregation in the workplace is driven by education differences between blacks and whites, but a substantial fraction of ethnic segregation in the workplace can be attributed to differences in language proficiency.

    Workplace Segregation in the United States: Race, Ethnicity, and Skill

    We study workplace segregation in the United States using a unique matched employer-employee data set that we have created. We present measures of workplace segregation by education and language, and by race and ethnicity, and (since skill is often correlated with race and ethnicity) we assess the role of education- and language-related skill differentials in generating workplace segregation by race and ethnicity. We define segregation based on the extent to which workers are more or less likely to be in workplaces with members of the same group, and we measure segregation as the observed percentage relative to maximum segregation. Our results indicate that there is considerable segregation by education and language in the workplace. Among whites, for example, observed segregation by education is 17% (of the maximum), and for Hispanics, observed segregation by language ability is 29%. Racial (black-white) segregation in the workplace is of a similar magnitude to education segregation (14%), and ethnic (Hispanic-white) segregation is somewhat higher (20%). Only a tiny portion (3%) of racial segregation in the workplace is driven by education differences between blacks and whites, but a substantial fraction of ethnic segregation in the workplace (32%) can be attributed to differences in language proficiency. Finally, additional evidence suggests that segregation by language likely reflects complementarity among workers speaking the same language. Keywords: Segregation; Language; Skill; Race; Ethnicity
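The abstract defines segregation by how likely workers are to share a workplace with members of their own group. One way to turn that idea into a number is a coworker-share comparison, sketched below. This is an illustrative assumption and not necessarily the authors' exact index: the function name, the exclusion of the worker's own person from the coworker count, and the toy establishment counts are all hypothetical.

```python
def coworker_segregation(workplaces):
    """Coworker-based segregation between two groups, in percentage points.

    workplaces: list of (n_a, n_b) head counts per establishment.
    For each worker, compute the share of coworkers (self excluded) who
    belong to group A. Return the average share seen by A workers minus
    the average share seen by B workers, times 100.
    Assumption: an illustrative index, not the paper's exact definition.
    """
    a_sum = a_n = b_sum = b_n = 0.0
    for n_a, n_b in workplaces:
        total = n_a + n_b
        if total < 2:
            continue  # a lone worker has no coworkers
        a_sum += n_a * (n_a - 1) / (total - 1)  # A workers looking at A coworkers
        a_n += n_a
        b_sum += n_b * n_a / (total - 1)  # B workers looking at A coworkers
        b_n += n_b
    if a_n == 0 or b_n == 0:
        raise ValueError("both groups need at least one worker with coworkers")
    return 100.0 * (a_sum / a_n - b_sum / b_n)


# Hypothetical extremes: complete separation versus evenly mixed workplaces.
fully_segregated = coworker_segregation([(5, 0), (0, 5)])
evenly_mixed = coworker_segregation([(5, 5), (5, 5)])
```

With complete separation the index reaches its maximum of 100, which is the reference point for statements such as "17% (of the maximum)". In the evenly mixed toy case the index is slightly negative rather than zero, because each worker is excluded from their own coworker count.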

    Changes in Workplace Segregation in the United States between 1990 and 2000: Evidence from Matched Employer-Employee Data

    We present evidence on changes in workplace segregation by education, race, ethnicity, and sex, from 1990 to 2000. The evidence indicates that racial and ethnic segregation at the workplace level remained quite pervasive in 2000. At the same time, there was fairly substantial segregation by skill, as measured by education. Putting together the 1990 and 2000 data, we find no evidence of declines in workplace segregation by race and ethnicity; indeed, black-white segregation increased. Over this decade, segregation by education also increased. In contrast, workplace segregation by sex fell over the decade, and would have fallen by more had the services industry - a heavily female industry in which sex segregation is relatively high - not experienced rapid employment growth.

    Examining Scientific Writing Styles from the Perspective of Linguistic Complexity

    Publishing articles in high-impact English journals is difficult for scholars around the world, especially for non-native English-speaking scholars (NNESs), most of whom struggle with proficiency in English. In order to uncover the differences in English scientific writing between native English-speaking scholars (NESs) and NNESs, we collected a large-scale data set containing more than 150,000 full-text articles published in PLoS between 2006 and 2015. We divided these articles into three groups according to the ethnic backgrounds of the first and corresponding authors, obtained by Ethnea, and examined the scientific writing styles in English from a two-fold perspective of linguistic complexity: (1) syntactic complexity, including measurements of sentence length and sentence complexity; and (2) lexical complexity, including measurements of lexical diversity, lexical density, and lexical sophistication. The observations suggest marginal differences between groups in syntactical and lexical complexity. Comment: 6 figures
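Two of the measures named above, sentence length and lexical diversity, have simple common proxies: mean words per sentence and the type-token ratio. The sketch below computes both. The regex-based tokenization and the function names are assumptions made for illustration, and the paper's own measurements may be defined differently.

```python
import re

WORD = re.compile(r"[A-Za-z']+")


def mean_sentence_length(text):
    """Average number of word tokens per sentence.

    Sentences are split naively on runs of '.', '!' or '?'.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(WORD.findall(s)) for s in sentences) / len(sentences)


def type_token_ratio(text):
    """Lexical diversity as distinct lowercase words over total words."""
    tokens = [t.lower() for t in WORD.findall(text)]
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


sample = "The cat sat. The cat ran away."
avg_len = mean_sentence_length(sample)
ttr = type_token_ratio(sample)
```

The raw type-token ratio falls as texts get longer, so comparing full-length articles would normally call for a length-normalized variant.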