23 research outputs found

    From sparse to dense and from assortative to disassortative in online social networks

    Inspired by the analysis of several empirical online social networks, we propose a simple reaction-diffusion-like coevolving model, in which individuals are activated to create links based on their states, influenced by local dynamics and their own intention. It is shown that the model can reproduce the remarkable properties observed in empirical online social networks; in particular, the assortative coefficients are neutral or negative, and the power law exponents are smaller than 2. Moreover, we demonstrate that, under appropriate conditions, the model network naturally makes transition(s) from assortative to disassortative, and from sparse to dense in their characteristics. The model is useful in understanding the formation and evolution of online social networks. Comment: 10 pages, 7 figures and 2 tables.

    Reliable online social network data collection

    Large quantities of information are shared through online social networks, making them attractive sources of data for social network research. When studying the usage of online social networks, these data may not properly describe users’ behaviours. For instance, the data collected often include only content shared by the users, or content accessible to the researchers, hence obscuring a large amount of data that would help in understanding users’ behaviours and privacy concerns. Moreover, the data collection methods employed in experiments may also affect data reliability when participants self-report inaccurate information or are observed while using a simulated application. Understanding the effects of these collection methods on data reliability is paramount for the study of social networks; for understanding user behaviour; for designing socially-aware applications and services; and for mining data collected from such social networks and applications. This chapter reviews previous research which has looked at social network data collection and user behaviour in these networks. We highlight shortcomings in the methods used in these studies, and introduce our own methodology and user study based on the Experience Sampling Method; we claim our methodology leads to the collection of more reliable data by capturing both those data which are shared and those which are not shared. We conclude with suggestions for collecting and mining data from online social networks. Postprint

    The biophysical effects of deuterium oxide on biomolecules and living cells through open notebook science.

    This dissertation explores various effects of deuterium oxide (D2O, also known as heavy water) in nature. Water is everywhere and interacts with just about everything. As such, it would be quite a daunting task to characterize every effect that water has on everything in the universe. This research is a small piece of the puzzle, and provides some fundamental understanding of how water interacts with other molecules. This is done from two viewpoints: (1) the effects of heavy water on living cells and (2) the effects of heavy water on molecules. Varying concentrations of deuterium oxide were used as the growing solvent for four different organisms: S. cerevisiae, E. coli, A. thaliana, and N. tabacum. In each case growth rates and morphology were assessed and compared to the wild type. Organisms were surveyed for potential phenotypes exhibited in the presence of extremely low and high concentrations of D2O. In every organism, growth is increasingly inhibited in higher concentrations of D2O compared to lower concentrations of D2O. In the case of tobacco, a root hair phenotype was exhibited in the presence of deuterium-depleted water (DDW). Roots also grew faster in 1% D2O and DDW, compared to natural water. For Arabidopsis, root germination is statistically indistinguishable between DI water and 33% D2O. Growth of the plant in 10% D2O is identical to that in natural water, and potentially healthier. Meanwhile, plants grown in 60% D2O exhibit slower growth and leaf discoloration. Tests on E. coli reveal inconsistent growth rates, but show increased growth in DDW when the cells are adapted to D2O. Cellular and colonial morphology also differ markedly from the wild type. Cells appear to remain joined after cellular fission, while colonies exhibit brain-like structures. Yeast morphology is quite different. Yeast cells remain joined after mitosis in 99% D2O, causing large cellular aggregates, while colonies become slightly asymmetric. Adaptation of yeast to D2O was not possible.
Molecular effects were examined using a variety of tools including dynamic light scattering (DLS), Fourier transform infrared spectroscopy (FT-IR), cavity ring-down spectroscopy (CRDS), and optical tweezers. Heat-induced protein aggregation was possible in H2O, but was prevented in the presence of D2O, as analyzed via DLS. Deuterium exchange and replacement was observed and quantified using both FT-IR and CRDS. With FT-IR it was possible to identify differences between solvents, while the time-scale of hydrogen-deuterium exchange was quantified for bulk water with CRDS. Using optical tweezers, DNA was overstretched in both H2O and D2O. The average force for DNA overstretching was found to be ~2.5 pN higher in D2O compared to H2O. Deuterium oxide has a stabilizing effect on biomolecules, which prevents protein denaturing and can affect the timing of cellular processes. It is because of this molecular property that D2O is observed to affect organisms grown with D2O instead of H2O. Despite this, there seems to be an optimal concentration of deuterium which is above the natural concentration of 155.6 ppm. In the presence of deuterium-depleted water, cells exhibit signs of stress, further demonstrating that deuterium isn't merely tolerated in solution, but actually required, as hypothesized by Gilbert N. Lewis in 1934.

    Construction de corpus généraux et spécialisés à partir du Web

    At the beginning of the first chapter the interdisciplinary setting between linguistics, corpus linguistics, and computational linguistics is introduced. Then, the notion of corpus is put into focus. Existing corpus and text definitions are discussed. Several milestones of corpus design are presented, from pre-digital corpora at the end of the 1950s to web corpora in the 2000s and 2010s. The continuities and changes between the linguistic tradition and web native corpora are exposed. In the second chapter, methodological insights on automated text scrutiny in computer science, computational linguistics and natural language processing are presented. The state of the art on text quality assessment and web text filtering exemplifies current interdisciplinary research trends on web texts. Readability studies and automated text classification are used as a paragon of methods to find salient features in order to grasp text characteristics. Text visualization exemplifies corpus processing in the digital humanities framework. As a conclusion, guiding principles for research practice are listed, and reasons are given to find a balance between quantitative analysis and corpus linguistics, in an environment which is spanned by technological innovation and artificial intelligence techniques. Third, current research on web corpora is summarized. I distinguish two main approaches to web document retrieval: restricted retrieval and web crawling. The notion of web corpus preprocessing is introduced and salient steps are discussed. The impact of the preprocessing phase on research results is assessed. I explain why the importance of preprocessing should not be underestimated and why it is an important task for linguists to learn new skills in order to confront the whole data gathering and preprocessing phase. I present my work on web corpus construction in the fourth chapter.
My analyses concern two main aspects, first the question of corpus sources (or prequalification), and secondly the problem of including valid, desirable documents in a corpus (or document qualification). Last, I present work on corpus visualization consisting of extracting certain corpus characteristics in order to give indications on corpus contents and quality. The first chapter opens with a description of the interdisciplinary context. The concept of corpus is then presented in light of the state of the art. The need for evidence that is linguistic in nature yet spans different disciplines is illustrated by several research scenarios. Several key stages of corpus construction are retraced, from pre-digital corpora at the end of the 1950s to the web corpora of the 2000s and 2010s. The continuities and changes between the linguistic tradition and web-derived corpora are laid out. The second chapter gathers methodological considerations. The state of the art in text quality estimation is described. Then the methods used in readability studies and in automated text classification are summarized. Common denominators are identified. Finally, text visualization demonstrates the value of corpus analysis for the digital humanities. The reasons for finding a balance between quantitative analysis and corpus linguistics are discussed. The third chapter summarizes the contribution of the thesis to research on corpora drawn from the internet. The question of data collection is examined with particular attention, especially the case of source URLs. The notion of web corpus preprocessing is introduced and its major steps are outlined. The impact of preprocessing on the results is evaluated.
The question of the simplicity and reproducibility of corpus construction is brought to the fore. The fourth part describes the contribution of the thesis to corpus construction proper, through the question of sources and the problem of invalid or undesirable documents. An approach that uses a light scout to prepare the web crawl is presented. Then the work on selecting documents just before their inclusion in a corpus is summarized: insights from readability studies and machine learning techniques can be used during corpus construction. A set of textual features tested on annotated samples evaluates the effectiveness of the process. Finally, the work on corpus visualization is discussed: extracting features at the scale of a corpus in order to give indications of its composition and quality.

    Full Issue: vol. 65, no. 2

    Webometrics benefitting from web mining? An investigation of methods and applications of two research fields

    Webometrics and web mining are two fields where research is focused on quantitative analyses of the web. This literature review outlines definitions of the fields, and then focuses on their methods and applications. It also discusses the potential of closer contact and collaboration between them. A key difference between the fields is that webometrics has focused on exploratory studies, whereas web mining has been dominated by studies focusing on the development of methods and algorithms. Differences in type of data can also be seen, with webometrics more focused on analyses of the structure of the web and web mining more focused on web content and usage, even though both fields have been embracing the possibilities of user-generated content. It is concluded that research problems where big data is needed can benefit from collaboration between webometricians, with their tradition of exploratory studies, and web miners, with their tradition of developing methods and algorithms.

    Networks, Epidemics and Collective Behavior: from Physics to Data Science

    In the final quarter of the twentieth century, the classical reductionist approach that had been driving the development of physics was questioned. Instead, it was proposed that systems were arranged in hierarchies so that the upper level had to conform to the rules of the lower level, but at the same time it could also exhibit its own laws that could not be inferred from those of its fundamental constituents. This observation led to the creation of a new field known as complex systems. This novel view was, however, not restricted to purely physical systems. It was soon noticed that very different systems covering a huge array of fields, from ecology to sociology or economics, could also be analyzed as complex systems. Furthermore, it allowed physicists to contribute their knowledge and methods to the development of research in those areas. In this thesis we tackle problems covering three areas of complex systems: networks, which are one of the main mathematical tools used to study complex systems; epidemic spreading, which is one of the fields in which the application of a complex systems perspective has been most successful; and the study of collective behavior, which has attracted a lot of attention since huge amounts of data on human behavior have been made available thanks to social networks. In fact, data is also the main driver of our discussion of the other two areas. In particular, we use novel sources of data to challenge some of the classical assumptions that have been made in the study of networks as well as in the development of models of epidemic spreading. In the case of networks, the problem of null models is addressed using tools coming from statistical physics. We show that anomalies in networks can be just a consequence of model oversimplification. Then, we extend the framework to generate contact networks for the spreading of diseases in populations in which both the contact structure and the age distribution of the population are important.
Next, we follow the historical development of mathematical epidemiology and revisit the assumptions that were made when there was no data about the real behavior of these kinds of systems. We show that one of the most important quantities used in these kinds of studies, the basic reproduction number, is not properly defined for real systems. Similarly, we extend the theoretical framework of epidemic spreading on directed networks to multilayer systems. Furthermore, we show that the challenge of incorporating data into models is not restricted to the problem of obtaining it: it is also important to be aware of its characteristics in order to do so properly. Lastly, we conclude the thesis by studying two examples of collective behavior using data extracted from online systems. We do so using techniques that were originally developed for other purposes, such as earthquake prediction. Yet, we demonstrate that they can also be used to study this new type of system. Furthermore, we show that, despite their unique characteristics, these systems possess properties similar to the ones that have been observed in the offline world. This not only means that modern societies are intertwined with the online world, but also signals that, if we aim to understand socio-technical systems, a holistic approach such as the one proposed by complex systems is indispensable.

    Integração e divulgação de informação desportiva em redes sociais através de dispositivos móveis

    The need for access to information is a defining feature of today's society. There are several ways in which people can obtain this information, and mobile devices with Internet access are one of them. The mobile device segment is therefore growing, and smartphone applications have more and more users, who become truly dependent on them. From this perspective, the sports sector has genuine fans who are hungry for information at all hours. This dissertation aims to develop a solution based on smart mobile devices and social networks that gives its users a richer and more diverse experience of sports content, enabling them to subscribe to and access specific information channels. The infrastructure supporting the aggregation and categorization of content as it is created will also be designed. It is worth noting here the fundamental role of social networks, such as Twitter or Facebook, as a data source. One of the strengths of the solution developed is that it lets users decide on and/or personalize the content they receive, both because they themselves choose the range of content that interests them and because a set of rules can be defined to filter the information presented. To validate the solution, a set of tests was carried out at real sporting events in two sports, tennis and football, which yielded quite satisfactory results that gave value and meaning to the work done. The need to access information is a striking factor of today's society. There are several ways in which people can gather information, and mobile devices with Internet access are one of those possibilities.
Thus, the segment of mobile devices is growing and applications for smartphones have an increasing group of users who become totally dependent on them. Accordingly, the Sports sector has real fans, hungry for information all the time. This thesis intends to develop a solution based on intelligent mobile devices that allows its users a richer and more diverse experience related to sports content, enabling them to access and subscribe to specific channels of information. Furthermore, the infrastructure to support the processes of aggregation and categorization of content as it is created will be designed. For this process it is important to note the central role of social networks like Twitter and Facebook as data sources. The implemented approach presents an added-value solution to its users, giving them the possibility to decide or personalize the information they wish to receive, both by providing them with the opportunity to choose the content of interest and by allowing the definition of a set of rules which filters the information presented. In order to validate the developed solution, a set of tests was performed on real sport events related to two main sports, tennis and football. The results obtained were very satisfying, which gave value and meaning to the work done.