125 research outputs found

    An algorithmic approach to social networks

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005.Includes bibliographical references (p. 109-120).Social networks consist of a set of individuals and some form of social relationship that ties the individuals together. In this thesis, we use algorithmic techniques to study three aspects of social networks: (1) we analyze the "small-world" phenomenon by examining the geographic patterns of friendships in a large-scale social network, showing how this linkage pattern can itself explain the small-world results; (2) using existing patterns of friendship in a social network and a variety of graph-theoretic techniques, we show how to predict new relationships that will form in the network in the near future; and (3) we show how to infer social connections over which information flows in a network, by examining the times at which individuals in the network exhibit certain pieces of information, or interest in certain topics. Our approach is simultaneously theoretical and data-driven, and our results are based upon real experiments on real social-network data in addition to theoretical investigations of mathematical models of social networks.by David Liben-Nowell.Ph.D

    Measuring internet activity: a (selective) review of methods and metrics

    Get PDF
    Two Decades after the birth of the World Wide Web, more than two billion people around the world are Internet users. The digital landscape is littered with hints that the affordances of digital communications are being leveraged to transform life in profound and important ways. The reach and influence of digitally mediated activity grow by the day and touch upon all aspects of life, from health, education, and commerce to religion and governance. This trend demands that we seek answers to the biggest questions about how digitally mediated communication changes society and the role of different policies in helping or hindering the beneficial aspects of these changes. Yet despite the profusion of data the digital age has brought upon us—we now have access to a flood of information about the movements, relationships, purchasing decisions, interests, and intimate thoughts of people around the world—the distance between the great questions of the digital age and our understanding of the impact of digital communications on society remains large. A number of ongoing policy questions have emerged that beg for better empirical data and analyses upon which to base wider and more insightful perspectives on the mechanics of social, economic, and political life online. This paper seeks to describe the conceptual and practical impediments to measuring and understanding digital activity and highlights a sample of the many efforts to fill the gap between our incomplete understanding of digital life and the formidable policy questions related to developing a vibrant and healthy Internet that serves the public interest and contributes to human wellbeing. Our primary focus is on efforts to measure Internet activity, as we believe obtaining robust, accurate data is a necessary and valuable first step that will lead us closer to answering the vitally important questions of the digital realm. Even this step is challenging: the Internet is difficult to measure and monitor, and there is no simple aggregate measure of Internet activity—no GDP, no HDI. In the following section we present a framework for assessing efforts to document digital activity. The next three sections offer a summary and description of many of the ongoing projects that document digital activity, with two final sections devoted to discussion and conclusions

    Application of Common Sense Computing for the Development of a Novel Knowledge-Based Opinion Mining Engine

    Get PDF
    The ways people express their opinions and sentiments have radically changed in the past few years thanks to the advent of social networks, web communities, blogs, wikis and other online collaborative media. The distillation of knowledge from this huge amount of unstructured information can be a key factor for marketers who want to create an image or identity in the minds of their customers for their product, brand, or organisation. These online social data, however, remain hardly accessible to computers, as they are specifically meant for human consumption. The automatic analysis of online opinions, in fact, involves a deep understanding of natural language text by machines, from which we are still very far. Hitherto, online information retrieval has been mainly based on algorithms relying on the textual representation of web-pages. Such algorithms are very good at retrieving texts, splitting them into parts, checking the spelling and counting their words. But when it comes to interpreting sentences and extracting meaningful information, their capabilities are known to be very limited. Existing approaches to opinion mining and sentiment analysis, in particular, can be grouped into three main categories: keyword spotting, in which text is classified into categories based on the presence of fairly unambiguous affect words; lexical affinity, which assigns arbitrary words a probabilistic affinity for a particular emotion; statistical methods, which calculate the valence of affective keywords and word co-occurrence frequencies on the base of a large training corpus. Early works aimed to classify entire documents as containing overall positive or negative polarity, or rating scores of reviews. Such systems were mainly based on supervised approaches relying on manually labelled samples, such as movie or product reviews where the opinionist’s overall positive or negative attitude was explicitly indicated. However, opinions and sentiments do not occur only at document level, nor they are limited to a single valence or target. Contrary or complementary attitudes toward the same topic or multiple topics can be present across the span of a document. In more recent works, text analysis granularity has been taken down to segment and sentence level, e.g., by using presence of opinion-bearing lexical items (single words or n-grams) to detect subjective sentences, or by exploiting association rule mining for a feature-based analysis of product reviews. These approaches, however, are still far from being able to infer the cognitive and affective information associated with natural language as they mainly rely on knowledge bases that are still too limited to efficiently process text at sentence level. In this thesis, common sense computing techniques are further developed and applied to bridge the semantic gap between word-level natural language data and the concept-level opinions conveyed by these. In particular, the ensemble application of graph mining and multi-dimensionality reduction techniques on two common sense knowledge bases was exploited to develop a novel intelligent engine for open-domain opinion mining and sentiment analysis. The proposed approach, termed sentic computing, performs a clause-level semantic analysis of text, which allows the inference of both the conceptual and emotional information associated with natural language opinions and, hence, a more efficient passage from (unstructured) textual information to (structured) machine-processable data. The engine was tested on three different resources, namely a Twitter hashtag repository, a LiveJournal database and a PatientOpinion dataset, and its performance compared both with results obtained using standard sentiment analysis techniques and using different state-of-the-art knowledge bases such as Princeton’s WordNet, MIT’s ConceptNet and Microsoft’s Probase. Differently from most currently available opinion mining services, the developed engine does not base its analysis on a limited set of affect words and their co-occurrence frequencies, but rather on common sense concepts and the cognitive and affective valence conveyed by these. This allows the engine to be domain-independent and, hence, to be embedded in any opinion mining system for the development of intelligent applications in multiple fields such as Social Web, HCI and e-health. Looking ahead, the combined novel use of different knowledge bases and of common sense reasoning techniques for opinion mining proposed in this work, will, eventually, pave the way for development of more bio-inspired approaches to the design of natural language processing systems capable of handling knowledge, retrieving it when necessary, making analogies and learning from experience

    Application of Common Sense Computing for the Development of a Novel Knowledge-Based Opinion Mining Engine

    Get PDF
    The ways people express their opinions and sentiments have radically changed in the past few years thanks to the advent of social networks, web communities, blogs, wikis and other online collaborative media. The distillation of knowledge from this huge amount of unstructured information can be a key factor for marketers who want to create an image or identity in the minds of their customers for their product, brand, or organisation. These online social data, however, remain hardly accessible to computers, as they are specifically meant for human consumption. The automatic analysis of online opinions, in fact, involves a deep understanding of natural language text by machines, from which we are still very far. Hitherto, online information retrieval has been mainly based on algorithms relying on the textual representation of web-pages. Such algorithms are very good at retrieving texts, splitting them into parts, checking the spelling and counting their words. But when it comes to interpreting sentences and extracting meaningful information, their capabilities are known to be very limited. Existing approaches to opinion mining and sentiment analysis, in particular, can be grouped into three main categories: keyword spotting, in which text is classified into categories based on the presence of fairly unambiguous affect words; lexical affinity, which assigns arbitrary words a probabilistic affinity for a particular emotion; statistical methods, which calculate the valence of affective keywords and word co-occurrence frequencies on the base of a large training corpus. Early works aimed to classify entire documents as containing overall positive or negative polarity, or rating scores of reviews. Such systems were mainly based on supervised approaches relying on manually labelled samples, such as movie or product reviews where the opinionist’s overall positive or negative attitude was explicitly indicated. However, opinions and sentiments do not occur only at document level, nor they are limited to a single valence or target. Contrary or complementary attitudes toward the same topic or multiple topics can be present across the span of a document. In more recent works, text analysis granularity has been taken down to segment and sentence level, e.g., by using presence of opinion-bearing lexical items (single words or n-grams) to detect subjective sentences, or by exploiting association rule mining for a feature-based analysis of product reviews. These approaches, however, are still far from being able to infer the cognitive and affective information associated with natural language as they mainly rely on knowledge bases that are still too limited to efficiently process text at sentence level. In this thesis, common sense computing techniques are further developed and applied to bridge the semantic gap between word-level natural language data and the concept-level opinions conveyed by these. In particular, the ensemble application of graph mining and multi-dimensionality reduction techniques on two common sense knowledge bases was exploited to develop a novel intelligent engine for open-domain opinion mining and sentiment analysis. The proposed approach, termed sentic computing, performs a clause-level semantic analysis of text, which allows the inference of both the conceptual and emotional information associated with natural language opinions and, hence, a more efficient passage from (unstructured) textual information to (structured) machine-processable data. The engine was tested on three different resources, namely a Twitter hashtag repository, a LiveJournal database and a PatientOpinion dataset, and its performance compared both with results obtained using standard sentiment analysis techniques and using different state-of-the-art knowledge bases such as Princeton’s WordNet, MIT’s ConceptNet and Microsoft’s Probase. Differently from most currently available opinion mining services, the developed engine does not base its analysis on a limited set of affect words and their co-occurrence frequencies, but rather on common sense concepts and the cognitive and affective valence conveyed by these. This allows the engine to be domain-independent and, hence, to be embedded in any opinion mining system for the development of intelligent applications in multiple fields such as Social Web, HCI and e-health. Looking ahead, the combined novel use of different knowledge bases and of common sense reasoning techniques for opinion mining proposed in this work, will, eventually, pave the way for development of more bio-inspired approaches to the design of natural language processing systems capable of handling knowledge, retrieving it when necessary, making analogies and learning from experience

    The structural determinants of media contagion

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2005.Includes bibliographical references (p. 157-166).Informal exchanges between friends, family and acquaintances play a crucial role in the dissemination of news and opinion. These casual interactions are embedded in a network of communication that spans our society, allowing information to spread from any one person to another via some set of intermediary ties. Weblogs have recently emerged as a part of our media ecology and incidentally engender this process of media contagion; because weblog authors are tied by social networks of readership, contagious media events happen frequently, and in a form that is immediately measurable. The generally accepted notion of media diffusion is that it occurs through two channels: externally, as applied by a constant force such as the mass media, and internally through socio-structural means. Sitting between our traditional notions of mass media and the public, weblogs problematize this classical theory of mass media influence. This thesis aims to elucidate the role of weblogs in media contagion through a sociological study of this community in two parts: First, I will address the issues of modeling the social structure of weblogs as observed through their readership network, and the various media events that occur therein.(cont.) Using a large weblog corpus collected over a one-month period, I have constructed a model describing the structure of popularity and influence from the extracted readership network, and will show that this model more accurately describes the weblog network. I will also derive a typology of media events from collected examples using features of structural and non-structural diffusion. Second, the extent to which these data are reflective of actual social processes as opposed to artifacts of data collection and aggregation will be explored. To validate the models presented in part one, I have conducted a survey of randomly selected authors to examine their social behaviors, both in weblog use and otherwise. I will characterize the range of weblog uses and practices, presenting an analysis of personal influence in the blogging community.by Cameron Alexander Marlow.Ph.D

    Semantic Assistance for Data Utilization and Curation

    Get PDF
    We propose that most data stores for large organizations are ill-designed for the future, due to limited searchability of the databases. The study of the Semantic Web has been an emerging technology since first proposed by Berners-Lee. New vocabularies have emerged, such as FOAF, Dublin Core, and PROV-O ontologies. These vocabularies, combined, can relate people, places, things, and events. Technologies developed for the Semantic Web, namely the standardized vocabularies for expressing metadata, will make data easier to utilize. We gathered use cases for various data sources, from human resources to big enterprise. Most of our use cases reflect real-world data. We developed a software package for transforming data into these semantic vocabularies, and developed a method of querying via graphical constructs. The development and testing proved itself to be useful. We conclude that data can be preserved or revived through the use of the metadata techniques for the Semantic Web

    Semantic Assistance for Data Utilization and Curation

    Get PDF
    We propose that most data stores for large organizations are ill-designed for the future, due to limited searchability of the databases. The study of the Semantic Web has been an emerging technology since first proposed by Berners-Lee. New vocabularies have emerged, such as FOAF, Dublin Core, and PROV-O ontologies. These vocabularies, combined, can relate people, places, things, and events. Technologies developed for the Semantic Web, namely the standardized vocabularies for expressing metadata, will make data easier to utilize. We gathered use cases for various data sources, from human resources to big enterprise. Most of our use cases reflect real-world data. We developed a software package for transforming data into these semantic vocabularies, and developed a method of querying via graphical constructs. The development and testing proved itself to be useful. We conclude that data can be preserved or revived through the use of the metadata techniques for the Semantic Web

    The right to privacy in a Big Data society. Merits and limits of the GDPR

    Get PDF
    With the non-stop development of technology, Big Data generation has seen a rise like no other. The rise of Big Data has given a possibility to numerous ways in which personal data of consumers could be used leaving the people vulnerable. The European Union came up with GDPR as the latest way of protecting the rights of citizens. In this paper, we analyze different aspects of Big Data such as legal framework, consent, and anonymization and see in what ways GDPR has benefitted in protecting personal data and what its limitations are

    Computational Sociolinguistics: A Survey

    Get PDF
    Language is a social phenomenon and variation is inherent to its social nature. Recently, there has been a surge of interest within the computational linguistics (CL) community in the social dimension of language. In this article we present a survey of the emerging field of "Computational Sociolinguistics" that reflects this increased interest. We aim to provide a comprehensive overview of CL research on sociolinguistic themes, featuring topics such as the relation between language and social identity, language use in social interaction and multilingual communication. Moreover, we demonstrate the potential for synergy between the research communities involved, by showing how the large-scale data-driven methods that are widely used in CL can complement existing sociolinguistic studies, and how sociolinguistics can inform and challenge the methods and assumptions employed in CL studies. We hope to convey the possible benefits of a closer collaboration between the two communities and conclude with a discussion of open challenges.Comment: To appear in Computational Linguistics. Accepted for publication: 18th February, 201
    • …
    corecore