87 research outputs found

    Big data analytics for large-scale wireless networks: Challenges and opportunities

    © 2019 Association for Computing Machinery. The wide proliferation of wireless communication systems and wireless devices has led to the arrival of the big data era in large-scale wireless networks. Big data in large-scale wireless networks is characterized by wide variety, high volume, real-time velocity, and huge value, which lead to unique research challenges that differ from those of existing computing systems. In this article, we present a survey of state-of-the-art big data analytics (BDA) approaches for large-scale wireless networks. In particular, we categorize the life cycle of BDA into four consecutive stages: Data Acquisition, Data Preprocessing, Data Storage, and Data Analytics. We then present a detailed survey of the technical solutions to the challenges in BDA for large-scale wireless networks according to each stage in the life cycle of BDA. Moreover, we discuss the open research issues and outline future directions in this promising area.

    Web Data Extraction, Applications and Techniques: A Survey

    Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims to provide a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provide a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool for data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques make it possible to gather large amounts of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users, which offers unprecedented opportunities to analyze human behavior at a very large scale. We also discuss the potential of cross-fertilization, i.e., the possibility of reusing Web Data Extraction techniques originally designed for a given domain in other domains. Comment: Knowledge-based System

    Introduction to the second international symposium of platial information science

    People ‘live’ and constitute places every day through recurrent practices and experience. Our everyday lives, however, are complex, and so are places. In contrast to abstract space, the way people experience places includes a range of aspects like physical setting, meaning, and emotional attachment. This inherent complexity requires researchers to investigate the concept of place from a variety of viewpoints. The formal representation of place, a major goal in GIScience related to place, is no exception and can only be successfully addressed if we consider geographical, psychological, anthropological, sociological, cognitive, and other perspectives. This year’s symposium brings together place-based researchers from different disciplines to discuss the current state of platial research. Accordingly, this volume contains contributions from a range of fields including geography, psychology, cognitive science, linguistics, and cartography.

    Inferring user interests in microblogging social networks: a survey

    With the growing popularity of microblogging services such as Twitter in recent years, an increasing number of users are using these services in their daily lives. The huge volume of information generated by users raises new opportunities in various applications and areas. Inferring user interests plays a significant role in providing personalized recommendations on microblogging services, and also on third-party applications providing social logins via these services, especially in cold-start situations. In this survey, we review the user modeling strategies that previous studies have used to infer user interests. To this end, we focus on four dimensions of inferring user interest profiles: (1) data collection, (2) representation of user interest profiles, (3) construction and enhancement of user interest profiles, and (4) the evaluation of the constructed profiles. Through this survey, we aim to provide an overview of state-of-the-art user modeling strategies for inferring user interest profiles on microblogging social networks with respect to the four dimensions. For each dimension, we review and summarize previous studies based on specified criteria. Finally, we discuss some challenges and opportunities for future work in this research domain.

    Trustworthiness in Social Big Data Incorporating Semantic Analysis, Machine Learning and Distributed Data Processing

    This thesis presents several state-of-the-art approaches constructed for the purpose of (i) studying the trustworthiness of users in Online Social Network platforms, (ii) deriving concealed knowledge from their textual content, and (iii) classifying and predicting the domain knowledge of users and their content. The developed approaches are refined through proof-of-concept experiments, several benchmark comparisons, and appropriate and rigorous evaluation metrics to verify and validate their effectiveness and efficiency, and hence those of the applied frameworks.

    High-Performance Modelling and Simulation for Big Data Applications

    This open access book was prepared as a Final Publication of the COST Action IC1406 “High-Performance Modelling and Simulation for Big Data Applications (cHiPSet)” project. Long considered important pillars of the scientific method, Modelling and Simulation have evolved from traditional discrete numerical methods to complex data-intensive continuous analytical optimisations. Resolution, scale, and accuracy have become essential to predict and analyse natural and complex systems in science and engineering. When their level of abstraction rises to give a better discernment of the domain at hand, their representation becomes increasingly demanding of computational and data resources. On the other hand, High Performance Computing typically entails the effective use of parallel and distributed processing units coupled with efficient storage, communication and visualisation systems to underpin complex data-intensive applications in distinct scientific and technical domains. A seamless interaction of High Performance Computing with Modelling and Simulation is then arguably required in order to store, compute, analyse, and visualise large data sets in science and engineering. Funded by the European Commission, cHiPSet has provided a dynamic trans-European forum for its members and distinguished guests to openly discuss novel perspectives and topics of interest for these two communities. This cHiPSet compendium presents a set of selected case studies related to healthcare, biological data, computational advertising, multimedia, finance, bioinformatics, and telecommunications.

    Measures of Privacy Protection on Social Environments

    Thesis by compendium. [EN] Online social networks (OSNs) have become a mainstream cultural phenomenon for millions of Internet users. Social networks are an ideal environment for generating all kinds of social benefits for users. Users share experiences, keep in touch with their family, friends and acquaintances, and earn economic benefits from the power of their influence (which translates into new job opportunities). However, using social networks and sharing information imply a loss of users’ privacy. Recently, great interest in protecting users’ privacy has emerged, driven by documented cases of users regretting their actions, company scandals caused by misuse of personal information, and the biases introduced by privacy mechanisms. Social network providers have improved their systems to reduce users’ privacy risks; for example, by making default privacy policies more restrictive, adding new privacy settings, and designing quick and easy shortcuts to configure user privacy settings. In privacy research, new advances have been proposed to improve privacy mechanisms, most of them focused on automation, fine-grained systems, and the use of features extracted from the user’s profile information and interactions to recommend the best privacy policy for the user. Despite these advances, many studies have shown that users’ concern for privacy does not match the decisions they ultimately make in social networks. This misalignment in users’ behavior might be due to the complexity of the privacy concept itself, which causes users to disregard privacy risks or perceive them as temporally distant. Another cause might be the complexity of the privacy decision-making process: to make an appropriate privacy decision, users have to consider all possible scenarios and the factors involved (e.g., the number of friends, the relationship type, the context of the information, etc.). The main contributions of this thesis are the development of metrics to assess privacy risks, and the proposal of explainable privacy mechanisms (using the developed metrics) to assist users and raise their awareness during the privacy decision process. Based on the definition of the concept of privacy, the dimensions of information scope and information sensitivity have been considered in this thesis to assess privacy risks. For the explainable privacy mechanisms, soft paternalism techniques and gamification elements that make use of the proposed metrics have been designed. These mechanisms have been integrated into the social network PESEDIA and evaluated in experiments with real users. PESEDIA is a social network developed in the framework of the Master’s thesis of the Ph.D. student [15], this thesis, and the national projects “Privacy in Social Educational Environments during Childhood and Adolescence” (TIN2014-55206-R) and “Intelligent Agents for Privacy Advice in Social Networks” (TIN2017-89156-R). The findings confirm the validity of the proposed metrics for computing the users’ scope and the sensitivity of social network publications. For the scope metric, the results also showed that it can be estimated through local and social centrality metrics in scenarios with limited information access. For the sensitivity metric, the results also revealed disagreement among users for some information types and consensus for the majority of them. Using these metrics in messages that inform users about the potential consequences of privacy policy choices and information sharing actions had positive effects on users’ behavior regarding privacy. Furthermore, the findings of exploring users’ trade-off between costs and benefits when disclosing personal information showed significant relationships with the usual social circles (family members, friends, coworkers, and unknown users) and their properties. This allowed designing better privacy mechanisms that appropriately restrict access to information and reduce regrets. Finally, gamification elements applied to social networks and users’ privacy showed a positive effect on users’ behavior towards privacy and safe practices in social networks.
    Alemany Bordera, J. (2020). Measures of Privacy Protection on Social Environments [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/151456

    Data quality measures for identity resolution

    The explosion in popularity of online social networks has led to increased interest in identity resolution from security practitioners. Being able to connect the multiple online accounts of a user can be of use in verifying identity attributes and in tracking the activity of malicious users. At the same time, privacy researchers are exploring the same phenomenon with interest in identifying privacy risks caused by re-identification attacks. Existing literature has explored how particular components of an online identity may be used to connect profiles, but few if any studies have attempted to assess the comparative value of information attributes. In addition, few of the methods being reported are easily comparable, due to difficulties with obtaining and sharing ground-truth data. Attempts to gain a comprehensive understanding of the identifiability of profile attributes are hindered by these issues. With a focus on overcoming these hurdles to effective research, this thesis first develops a methodology for sampling ground-truth data from online social networks. Building on this with reference to both existing literature and samples of real profile data, this thesis describes and grounds a comprehensive matching schema of profile attributes. The work then defines data quality measures which are important for identity resolution, and measures the availability, consistency and uniqueness of the schema’s contents. The developed measurements are then applied in a feature selection scheme to reduce the impact of missing data issues common in identity resolution. Finally, this thesis addresses the purposes to which identity resolution may be applied, defining the further application-oriented data quality measurements of novelty, veracity and relevance, and demonstrating their calculation and application for a particular use case: evaluating the social engineering vulnerability of an organisation.