
    On cross-domain social semantic learning

    Approximately 2.4 billion people are now connected to the Internet, generating massive amounts of data through laptops, mobile phones, sensors and other electronic devices or gadgets. Not surprisingly then, ninety percent of the world's digital data was created in the last two years. This massive explosion of data provides a tremendous opportunity to study, model and improve the conceptual and physical systems from which the data is produced. It also permits scientists to test pre-existing hypotheses in various fields with large-scale experimental evidence. Thus, developing computational algorithms that automatically explore this data is the holy grail of the current generation of computer scientists. Making sense of this data algorithmically can be a complex process, chiefly for two reasons. Firstly, the data is generated by different devices, capturing different aspects of information and residing in different web resources and platforms on the Internet. Therefore, even if two pieces of data bear singular conceptual similarity, their generation, format and domain of existence on the web can make them seem considerably dissimilar. Secondly, since humans are social creatures, the data often possesses inherent but murky correlations, primarily caused by the causal nature of direct or indirect social interactions. This drastically alters what algorithms must now achieve, necessitating intelligent comprehension of the underlying social nature and semantic contexts within the disparate domain data and a quantifiable way of transferring knowledge gained from one domain to another. Finally, the data is often encountered as a stream and not as static pages on the Internet. Therefore, we must learn, and re-learn, as the stream propagates. The main objective of this dissertation is to develop learning algorithms that can identify specific patterns in one domain of data which can consequently augment predictive performance in another domain.
The research explores the existence of specific data domains which can function in synergy with one another and, more importantly, proposes models to quantify the synergetic information transfer among such domains. We include large-scale data from various domains in our study: social media data from Twitter, multimedia video data from YouTube, video search query data from Bing Videos, natural language search queries from the web, Internet resources in the form of web logs (blogs) and spatio-temporal social trends from Twitter. Our work presents a series of solutions to address the key challenges in cross-domain learning, particularly in the field of social and semantic data. We propose the concept of bridging media from disparate sources by building a common latent topic space, which represents one of the first attempts toward answering sociological problems using cross-domain (social) media. This allows information transfer between social and non-social domains, fostering real-time socially relevant applications. We also engineer a concept network from the semantic web, called semNet, that can assist in identifying concept relations and modeling information granularity for robust natural language search. Further, by studying spatio-temporal patterns in this data, we can discover categorical concepts that stimulate collective attention within user groups. Includes bibliographical references (pages 210-214).

    TweeProfiles4: a weighted multidimensional stream clustering algorithm

    The emergence of social media made it possible for users to easily share their thoughts on different topics, which constitutes a rich source of information for many fields. Microblogging platforms have experienced large and steady growth over the last few years. Twitter is the most popular microblogging site, making it an interesting source of data for pattern extraction. One of the main challenges of analyzing social media data is its continuous flow, which makes it hard to apply traditional data mining processes. Mining stream data has therefore received a lot of attention recently. TweeProfiles is a data mining tool for analyzing and visualizing Twitter data over four dimensions: spatial (the geographic location of the tweet), temporal (the publication date of the tweet), content (the text of the tweet) and social (the relationship graph). This is an ongoing project which still has many aspects that can be improved. For instance, it was recently improved by replacing the original clustering algorithm, which could not handle the continuous flow of data, with a streaming method. The goal of this dissertation is to continue the development of TweeProfiles. First, a new stream clustering algorithm will be proposed to improve on the existing one. This will be achieved by developing an incremental algorithm with support for multi-dimensional streaming data. Moreover, it should make it possible for the user to dynamically change the relative importance of each dimension in the clustering. Additionally, the empirical evaluation of the results will be improved: suitable measures to evaluate the extracted patterns will be identified and implemented. The empirical study will use georeferenced tweets obtained through SocialBus.

    Understanding the bi-directional relationship between analytical processes and interactive visualization systems

    Interactive visualizations leverage the human visual and reasoning systems to increase the scale of information with which we can effectively work, therefore improving our ability to explore and analyze large amounts of data. Interactive visualizations are often designed with target domains in mind, such as analyzing unstructured textual information, which is a main thrust in this dissertation. Since each domain has its own existing procedures of analyzing data, a good start to a well-designed interactive visualization system is to understand the domain experts' workflow and analysis processes. This dissertation recasts the importance of understanding domain users' analysis processes and incorporating such understanding into the design of interactive visualization systems. To meet this aim, I first introduce considerations guiding the gathering of general and domain-specific analysis processes in text analytics. Two interactive visualization systems are designed by following the considerations. The first system is Parallel-Topics, a visual analytics system supporting analysis of large collections of documents by extracting semantically meaningful topics. Based on lessons learned from Parallel-Topics, this dissertation further presents a general visual text analysis framework, I-Si, to present meaningful topical summaries and temporal patterns, with the capability to handle large-scale textual information. Both systems have been evaluated by expert users and deemed successful in addressing domain analysis needs. The second contribution lies in preserving domain users' analysis process while using interactive visualizations. Our research suggests the preservation could serve multiple purposes. On the one hand, it could further improve the current system. On the other hand, users often need help in recalling and revisiting their complex and sometimes iterative analysis process with an interactive visualization system. 
This dissertation introduces multiple types of evidence available for capturing a user's analysis process within an interactive visualization and analyzes the cost/benefit ratios of the capturing methods. It concludes that tracking interaction sequences is the most unobtrusive and feasible way to capture part of a user's analysis process. To validate this claim, a user study is presented to theoretically analyze the relationship between interactions and problem-solving processes. The results indicate that constraining the way a user interacts with a mathematical puzzle does have an effect on the problem-solving process. As later evidenced in an evaluative study, a fair amount of high-level analysis can be recovered merely by analyzing interaction logs.

    A planetary nervous system for social mining and collective awareness

    We present a research roadmap of a Planetary Nervous System (PNS), capable of sensing and mining the digital breadcrumbs of human activities and unveiling the knowledge hidden in the big data for addressing the big questions about social complexity. We envision the PNS as a globally distributed, self-organizing, techno-social system for answering analytical questions about the status of world-wide society, based on three pillars: social sensing, social mining and the idea of trust networks and privacy-aware social mining. We discuss the ingredients of a science and a technology necessary to build the PNS upon the three mentioned pillars, beyond the limitations of their respective state of the art. Social sensing is aimed at developing better methods for harvesting the big data from the techno-social ecosystem and making them available for mining, learning and analysis at a properly high abstraction level. Social mining is the problem of discovering patterns and models of human behaviour from the sensed data across the various social dimensions by data mining, machine learning and social network analysis. Trusted networks and privacy-aware social mining are aimed at creating a new deal around the questions of privacy and data ownership, empowering individual persons with full awareness of and control over their own personal data, so that users may allow access to and use of their data for their own good and the common good. The PNS will provide a goal-oriented knowledge discovery framework, made of technology and people, able to configure itself to the aim of answering questions about the pulse of global society. Given an analytical request, the PNS activates a process composed of a variety of interconnected tasks exploiting the social sensing and mining methods within the transparent ecosystem provided by the trusted network. The PNS we foresee is the key tool for individual and collective awareness for the knowledge society.
We need such a tool for everyone to become fully aware of how powerful the knowledge of our society can be when we leverage our wisdom as a crowd, and of how important it is that everybody participates both as a consumer and as a producer of social knowledge, for it to become a trustable, accessible, safe and useful public good. Seventh Framework Programme (European Commission) (grant agreement No. 284709)

    A history and theory of textual event detection and recognition
