All liaisons are dangerous when all your friends are known to us
Online Social Networks (OSNs) are used by millions of users worldwide.
Academically speaking, there is little doubt about the usefulness of
demographic studies conducted on OSNs and, hence, methods to label unknown
users from small labeled samples are very useful. However, from the general
public point of view, this can be a serious privacy concern. Thus, both topics
are tackled in this paper: First, a new algorithm to perform user profiling in
social networks is described, and its performance is reported and discussed.
Secondly, the experiments --conducted on information usually considered
sensitive-- reveal that by just publicizing one's contacts privacy is at risk
and, thus, measures to minimize privacy leaks due to social graph data mining
are outlined. Comment: 10 pages, 5 tables
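The abstract does not describe the new profiling algorithm itself, so the following is only a minimal sketch of the general idea it warns about: inferring an attribute of an unlabeled user from the labels of their public contacts. It uses a plain majority vote over labeled neighbors, which is a swapped-in baseline and not the paper's method. The names `predict_label`, `graph`, and `known_labels`, and the toy data, are hypothetical.

```python
from collections import Counter


def predict_label(user, graph, known_labels):
    """Guess a user's label as the most common label among their labeled contacts.

    Returns None when the user has no labeled contacts.
    """
    votes = Counter(
        known_labels[contact]
        for contact in graph.get(user, ())
        if contact in known_labels
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical toy data: adjacency lists of public contacts and a small labeled sample.
graph = {
    "u1": ["a", "b", "c"],
    "u2": ["c", "d"],
    "u3": [],
}
known_labels = {"a": "X", "b": "X", "c": "Y", "d": "Y"}

predictions = {user: predict_label(user, graph, known_labels) for user in graph}
```

Even this naive vote needs nothing but the contact list, which illustrates why publicizing one's contacts alone can leak sensitive attributes.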
Controlling for Unobserved Confounds in Classification Using Correlational Constraints
As statistical classifiers become integrated into real-world applications, it
is important to consider not only their accuracy but also their robustness to
changes in the data distribution. In this paper, we consider the case where
there is an unobserved confounding variable z that influences both the
features x and the class variable y. When the influence of z
changes from training to testing data, we find that the classifier accuracy can
degrade rapidly. In our approach, we assume that we can predict the value of
z at training time with some error. The prediction for z is then fed to
Pearl's back-door adjustment to build our model. Because of the attenuation
bias caused by measurement error in z, standard approaches to controlling for
z are ineffective. In response, we propose a method to properly control for
the influence of z by first estimating its relationship with the class
variable y, then updating predictions for z to match that estimated
relationship. By adjusting the influence of z, we show that we can build a
model that exceeds competing baselines on accuracy as well as on robustness
over a range of confounding relationships. Comment: 9 pages
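As a rough illustration of the back-door adjustment step mentioned in the abstract above, the sketch below averages a conditional class probability over the marginal distribution of a discrete confounder, i.e. it computes the sum over z of p(y | x, z) p(z). It does not implement the paper's correction for measurement error in the confounder. The function names, the probability table, and the 50/50 confounder distribution are all hypothetical.

```python
def backdoor_adjust(p_y_given_xz, p_z, x):
    """Return the sum over z of p(y=1 | x, z) * p(z) for a discrete confounder z."""
    return sum(p_y_given_xz(x, z) * prob for z, prob in p_z.items())


def toy_model(x, z):
    """Hypothetical conditional probabilities p(y=1 | x, z), for illustration only."""
    table = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.6, (1, 1): 0.8}
    return table[(x, z)]


# Assumed marginal distribution of the confounder.
p_z = {0: 0.5, 1: 0.5}

adjusted = backdoor_adjust(toy_model, p_z, x=1)  # 0.6 * 0.5 + 0.8 * 0.5, about 0.7
```

Because the sum weights each confounder value by p(z) rather than by p(z | x), the prediction no longer depends on how strongly the confounder and the features happened to be associated in the training data.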
Workplace Segregation in the United States: Race, Ethnicity, and Skill
We study workplace segregation in the United States using a unique matched employer-employee data set that we have created. We present measures of workplace segregation by education and language (as skilled workers may be more complementary with other skilled workers than with unskilled workers) and by race and ethnicity, using simulation methods to measure segregation beyond what would occur randomly as workers are distributed across establishments. We also assess the role of education- and language-related skill differentials in generating workplace segregation by race and ethnicity, as skill is often correlated with race and ethnicity. Finally, we attempt to distinguish between segregation by skill based on general crowding of unskilled poor English speakers into a narrow set of jobs, and segregation based on common language for reasons such as complementarity among workers speaking the same language. Our results indicate that there is considerable segregation by education and language in the workplace. Racial segregation in the workplace is of the same order of magnitude as education segregation, and segregation between Hispanics and whites is larger yet. Only a tiny portion of racial segregation in the workplace is driven by education differences between blacks and whites, but a substantial fraction of ethnic segregation in the workplace can be attributed to differences in language proficiency.
Workplace Segregation in the United States: Race, Ethnicity, and Skill
We study workplace segregation in the United States using a unique matched employer-employee data set that we have created. We present measures of workplace segregation by education and language, and by race and ethnicity, and, since skill is often correlated with race and ethnicity, we assess the role of education- and language-related skill differentials in generating workplace segregation by race and ethnicity. We define segregation based on the extent to which workers are more or less likely to be in workplaces with members of the same group, and we measure segregation as the observed percentage relative to maximum segregation. Our results indicate that there is considerable segregation by education and language in the workplace. Among whites, for example, observed segregation by education is 17% (of the maximum), and for Hispanics, observed segregation by language ability is 29%. Racial (black-white) segregation in the workplace is of a similar magnitude to education segregation (14%), and ethnic (Hispanic-white) segregation is somewhat higher (20%). Only a tiny portion (3%) of racial segregation in the workplace is driven by education differences between blacks and whites, but a substantial fraction of ethnic segregation in the workplace (32%) can be attributed to differences in language proficiency. Finally, additional evidence suggests that segregation by language likely reflects complementarity among workers speaking the same language.
Keywords: Segregation; Language; Skill; Race; Ethnicity
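The abstract above gives only a verbal definition of its segregation measure, so the sketch below is one simplified reading of it, not the authors' exact estimator. It compares the average share of a group in the workplaces of that group's members (isolation) with the average share of the group in the workplaces of everyone else (exposure), and reports the gap as a percentage of the maximum possible gap. For simplicity each worker is counted as part of their own workplace, and no adjustment is made for segregation that would arise randomly. The function names and toy workplaces are hypothetical.

```python
def group_share_seen_by(workplaces, group, viewers_in_group):
    """Average workplace share of `group`, taken over group members or non-members."""
    total, count = 0.0, 0
    for workers in workplaces:
        if not workers:
            continue  # skip empty establishments
        share = sum(1 for w in workers if w == group) / len(workers)
        viewers = sum(1 for w in workers if (w == group) == viewers_in_group)
        total += share * viewers
        count += viewers
    return total / count if count else 0.0


def segregation_pct(workplaces, group):
    """Isolation minus exposure, as a percentage of the maximum gap (100)."""
    isolation = group_share_seen_by(workplaces, group, True)
    exposure = group_share_seen_by(workplaces, group, False)
    return 100.0 * (isolation - exposure)


# Hypothetical toy establishments, each a list of workers' group labels.
fully_segregated = [["A", "A"], ["B", "B"]]
evenly_mixed = [["A", "B"], ["A", "B"]]
partly_segregated = [["A", "A", "A", "B"], ["A", "B", "B", "B"]]
```

On these toy inputs the measure ranges from 0 for evenly mixed workplaces to 100 for complete separation, which matches the "percentage of the maximum" framing used for the figures in the abstract.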
Changes in Workplace Segregation in the United States between 1990 and 2000: Evidence from Matched Employer-Employee Data
We present evidence on changes in workplace segregation by education, race, ethnicity, and sex, from 1990 to 2000. The evidence indicates that racial and ethnic segregation at the workplace level remained quite pervasive in 2000. At the same time, there was fairly substantial segregation by skill, as measured by education. Putting together the 1990 and 2000 data, we find no evidence of declines in workplace segregation by race and ethnicity; indeed, black-white segregation increased. Over this decade, segregation by education also increased. In contrast, workplace segregation by sex fell over the decade, and would have fallen by more had the services industry - a heavily female industry in which sex segregation is relatively high - not experienced rapid employment growth.
Examining Scientific Writing Styles from the Perspective of Linguistic Complexity
Publishing articles in high-impact English journals is difficult for scholars
around the world, especially for non-native English-speaking scholars (NNESs),
most of whom struggle with proficiency in English. In order to uncover the
differences in English scientific writing between native English-speaking
scholars (NESs) and NNESs, we collected a large-scale data set containing more
than 150,000 full-text articles published in PLoS between 2006 and 2015. We
divided these articles into three groups according to the ethnic backgrounds of
the first and corresponding authors, obtained by Ethnea, and examined the
scientific writing styles in English from a two-fold perspective of linguistic
complexity: (1) syntactic complexity, including measurements of sentence length
and sentence complexity; and (2) lexical complexity, including measurements of
lexical diversity, lexical density, and lexical sophistication. The
observations suggest marginal differences between groups in syntactical and
lexical complexity. Comment: 6 figures
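To make two of the complexity dimensions named in the abstract above concrete, the following sketch computes mean sentence length (a simple syntactic measure) and the type-token ratio (a common proxy for lexical diversity). These are standard textbook measures chosen for illustration; the abstract does not say which exact formulas or tools the study used. The function names, the regular-expression tokenizer, and the sample text are assumptions.

```python
import re

WORD_PATTERN = re.compile(r"[A-Za-z']+")


def mean_sentence_length(text):
    """Average number of words per sentence, splitting on ., ! and ?."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = WORD_PATTERN.findall(text)
    return len(words) / len(sentences) if sentences else 0.0


def type_token_ratio(text):
    """Distinct words divided by total words, ignoring case."""
    words = [w.lower() for w in WORD_PATTERN.findall(text)]
    return len(set(words)) / len(words) if words else 0.0


sample = "The cat sat. The cat ran away quickly."
length = mean_sentence_length(sample)   # 8 words over 2 sentences
diversity = type_token_ratio(sample)    # 6 distinct words over 8 words
```

The type-token ratio falls as texts get longer, so comparisons across full-text articles of different lengths would normally need a length-robust variant.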