62 research outputs found

    Unsupervised learning on social data

    Algorithms and Models for the Web Graph

    Digital traces and urban research : Barcelona through social media data

    Most of the world’s population now resides in urban areas, and almost all of the planet’s growth is expected to be concentrated in them over the next 30 years, making the improvement of the quality of life in cities one of the big challenges of this century. To that end, it is crucial to have information on how people use the spaces in the city, so that urban planning can successfully respond to their needs. This dissertation proposes using data shared voluntarily by the millions of users that make up social network communities as a valuable tool for the study of the complexity of the city, because of its capacity to provide an unprecedented volume of urban information, with geographic, temporal, semantic and multimedia components. However, the volume and variety of the data raise important challenges regarding its retrieval, manipulation, analysis and representation, requiring the adoption of best practices in data science and a multi-faceted approach in the field of urban studies, with a strong emphasis on the reproducibility of the developed methodologies. This research focuses on the case study of the city of Barcelona, using public data collected from Panoramio, Flickr, Twitter and Instagram. After a literature review, the methods to access the different services are discussed, along with their available data and limitations. Next, the retrieved data is analyzed at different spatial and temporal scales. The first approach to the data focuses on the origins of users who took geotagged pictures of Barcelona, geocoding the hometowns that appear in their public Flickr profiles, which allows the identification of the regions, countries and cities with the largest influx of visitors, and relates the results to multiple indicators at a global scale.
The next scale of analysis considers the city as a whole, developing methodologies for representing the spatial distribution of the collected locations while avoiding the artifacts produced by overplotting. To this end, locations are aggregated in regular tessellations, whose size is determined empirically from their spatial distribution. Two spatial statistics techniques (Moran’s I and Getis-Ord’s G*) are used to visualize the local spatial autocorrelation of the areas with exceptionally high or low densities, under a statistical significance framework. Finally, kernel density estimation is introduced as a non-parametric alternative. The third level of detail follows the official administrative division of Barcelona into 73 neighborhoods and 12 districts, which is based on historical, morphological and functional criteria. Micromaps are introduced as a representation technique capable of providing a geographical context to commonly used statistical graphics, along with a methodology to produce these micromaps automatically. This technique is compared to annotated scatterplots to relate picture intensity to different urban indicators at a neighborhood scale. The hypothesis of spatial homogeneity is abandoned at the most detailed scale, where the analysis focuses on the street network. Two techniques to assign events to road segments in the street graph are presented (directly by shortest distance, or by proxy through postal addresses), as well as the generalization of kernel density estimation from Euclidean space to a network topology.
Beyond the spatial domain, the interactions of three temporal cycles are further analyzed using the timestamps available in the picture metadata: daytime/nighttime (daily cycle), work/leisure (weekly cycle) and seasonal (yearly cycle).
Postprint (published version)

    5th International Conference on Advanced Research Methods and Analytics (CARMA 2023)

    Research methods in economics and social sciences are evolving with the increasing availability of Internet and Big Data sources of information. As these sources, methods, and applications become more interdisciplinary, the 5th International Conference on Advanced Research Methods and Analytics (CARMA) is a forum for researchers and practitioners to exchange ideas and advances on how emerging research methods and sources are applied to different fields of social sciences as well as to discuss current and future challenges.
    Martínez Torres, MDR.; Toral Marín, S. (2023). 5th International Conference on Advanced Research Methods and Analytics (CARMA 2023). Editorial Universitat Politècnica de València. https://doi.org/10.4995/CARMA2023.2023.1700

    Analyzing Granger causality in climate data with time series classification methods

    Attribution studies in climate science aim to scientifically ascertain the influence of climatic variations on natural or anthropogenic factors. Many of those studies adopt the concept of Granger causality to infer statistical cause-effect relationships, while relying on traditional autoregressive models. In this article, we investigate the potential of state-of-the-art time series classification techniques to enhance causal inference in climate science. We conduct a comparative experimental study of different types of algorithms on a large test suite that comprises a unique collection of datasets from the area of climate-vegetation dynamics. The results indicate that specialized time series classification methods are able to improve existing inference procedures. Substantial differences are observed among the methods that were tested.

    Brain-Inspired Computing

    This open access book constitutes revised selected papers from the 4th International Workshop on Brain-Inspired Computing, BrainComp 2019, held in Cetraro, Italy, in July 2019. The 11 papers presented in this volume were carefully reviewed and selected for inclusion in this book. They deal with research on brain atlasing, multi-scale models and simulation, HPC and data infrastructures for neuroscience, as well as artificial and natural neural architectures.

    Efficient tree-based content-based routing schemes

    This thesis is about routing and forwarding for inherently multicast communication such as the communication typical of information-centric networks. The notion of Information-Centric Networking (ICN) is an evolution of the Internet from the current host-centric architecture to a new architecture in which communication is based on “named information”. The ambitious goal of ICN is to effectively support the exchange and use of information in an ever more connected world, with billions of devices, many of which are mobile, producing and consuming large amounts of data. ICN is intended to support scalable content distribution, mobility, and security, for such applications as video on demand and networks of sensors or the so-called Internet of Things. Many ICN architectures have emerged in the past decade, and the ICN community has made significant progress in terms of infrastructure, test-bed deployments, and application case studies. And yet, despite the impressive research effort, the fundamental problems of routing and forwarding remain open. In particular, none of the proposed architectures has developed truly scalable name-based routing schemes and efficient name-based forwarding algorithms. This is not surprising, since the problem of routing based on names, in its most general formulation, is known to be fundamentally difficult. In general, one would want to support application-defined names (as opposed to network-defined addresses) with a compact routing scheme (small routing tables) that uses optimal paths, minimizes congestion, and admits a fast forwarding algorithm. Furthermore, one would want to construct this routing scheme with a decentralized and incremental protocol for administrative autonomy and efficient dynamic updates. However, there are clear theoretical limits that simply make it impossible to achieve all these goals.
In this thesis we explore the design space of routing and forwarding in an information-centric network. Our purpose is to develop routing schemes and forwarding algorithms that combine many desirable properties. We consider two forms of addressing, one tied to network locations, and one based on more expressive content descriptors. We then consider trees as basic routing structures, and with those we develop routing schemes that are intended to minimize path lengths and congestion, separately or together. For one of these schemes based on expressive content descriptors, we also develop a fast forwarding algorithm specialized for massively parallel architectures such as GPUs. In summary, this thesis presents two efficient and scalable routing algorithms for two different types of networks, plus one scalable forwarding algorithm. We summarize each individual contribution below.
Low-congestion geographic routing for wireless networks. We develop a low-congestion, multicast routing scheme designed specifically for wireless networks. The scheme supports geographical multicast routing, meaning routing to a set of nodes addressed by their physical position. The scheme builds a geometric minimum spanning tree connecting the source to all the destinations. Then, for each edge in this tree, the scheme routes a message through a random intermediate node, chosen independently of the set of multicast requests. The intermediate node is chosen in the vicinity of the corresponding edge such that congestion is reduced without stretching routes by more than a constant factor.
Multi-tree scheme for content-based routing in ICN. We develop a tree-based routing scheme designed for large-scale wired networks such as the Internet. The scheme supports two forms of addresses: application-defined content descriptors, and network-defined locators. We first show that the scheme is effective in terms of stretch and congestion on the current AS-level Internet graph even with only a few spanning trees. Then we show that our content descriptors, which consist of sets of tags and are more expressive than the name prefixes used in mainstream ICN, aggregate well in practice under our scheme. We also explain in detail how to use descriptors and locators, together with unique content identifiers, to support the efficient transmission and sharing of information through scalable and loop-free routes.
Tag-based forwarding (partial matching) algorithm on GPUs. To accompany our ICN routing scheme, we develop a fast forwarding algorithm that matches incoming packets against forwarding tables with tens of millions of entries. To achieve high performance, we develop a practical solution for the partial matching problem that lies at the heart of this forwarding scheme. This solution amounts to a massively parallel algorithm specifically designed for a hybrid CPU/GPU architecture.

    Recent Developments in Smart Healthcare

    Medicine is undergoing a sector-wide transformation thanks to advances in computing and networking technologies. Healthcare is changing from reactive and hospital-centered to preventive and personalized, from disease-focused to well-being-centered. In essence, healthcare systems, as well as fundamental medical research, are becoming smarter. We anticipate significant improvements in areas ranging from molecular genomics and proteomics, to decision support for healthcare professionals through big data analytics, to support for behavior change through technology-enabled self-management and social and motivational support. Furthermore, with smart technologies, healthcare delivery could also be made more efficient, higher quality, and lower cost. For this special issue, we received a total of 45 submissions and accepted 19 outstanding papers that span several interesting topics in smart healthcare, including public health, health information technology (Health IT), and smart medicine.