    Approximation algorithms for facility location and other supply chain problems

    Advisors: Flávio Keidi Miyazawa, Maxim Sviridenko. Doctoral thesis, Universidade Estadual de Campinas, Instituto de Computação. Abstract: the abstract is available with the full electronic document. Doctorate in Computer Science.

    A systematic approach to bound factor-revealing LPs and its application to the metric and squared metric facility location problems

    A systematic technique to bound factor-revealing linear programs is presented. We show how to derive a family of upper-bound factor-revealing programs (UPFRPs), and show that each such program can be solved by a computer to bound the approximation factor of an associated algorithm. Obtaining a UPFRP is straightforward, and it can be used as an alternative to analytical proofs, which are usually very long and tedious. We apply this technique to the metric facility location problem (MFLP) and to a generalization where the distance function is a squared metric. We call this generalization the squared metric facility location problem (SMFLP) and prove that there is no approximation factor better than 2.04, assuming P ≠ NP. Then, we analyze the best known algorithms for the MFLP, based on primal-dual and LP-rounding techniques, when they are applied to the SMFLP. We prove very tight bounds for these algorithms, and show that the LP-rounding algorithm achieves a ratio of 2.04, and therefore has the best possible factor for the SMFLP. We use UPFRPs in the dual-fitting analysis of the primal-dual algorithms for both the SMFLP and the MFLP, improving some of the previous analyses for the MFLP. Supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) [grants 309657/2009-1, 306860/2010-4, 473867/2010-9, 477692/2012-5], Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) [grant 2010/20710-4], and NUMEC (Núcleo de Apoio à Pesquisa em Modelagem Estocástica e Complexidade).
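
    To give a concrete feel for the technique, the sketch below solves a tiny factor-revealing LP numerically for growing instance sizes k, which is the workflow the paper automates with UPFRPs. The constraints encode the classical greedy set-cover analysis (whose factor-revealing LP has value H_k), not the paper's actual MFLP/SMFLP formulations; every variable name and constraint here is an illustrative assumption.

        import numpy as np
        from scipy.optimize import linprog

        def factor_revealing_lp(k: int) -> float:
            # Variables: alpha_1..alpha_k (client payments), d_1..d_k
            # (connection costs), f (facility cost). Maximize the total
            # payment subject to a normalized optimum f + sum(d) = 1 and
            # the toy constraints alpha_i <= d_i + f / (k - i + 1).
            n = 2 * k + 1
            c = np.zeros(n)
            c[:k] = -1.0                  # linprog minimizes, so negate
            A_eq = np.zeros((1, n))
            A_eq[0, k:2 * k] = 1.0        # sum of the d_i ...
            A_eq[0, -1] = 1.0             # ... plus f equals 1
            A_ub = np.zeros((k, n))
            for i in range(k):            # row for client i+1 (1-based)
                A_ub[i, i] = 1.0
                A_ub[i, k + i] = -1.0
                A_ub[i, -1] = -1.0 / (k - i)
            res = linprog(c, A_ub=A_ub, b_ub=np.zeros(k), A_eq=A_eq,
                          b_eq=[1.0], bounds=[(0, None)] * n, method="highs")
            return -res.fun               # the LP value z_k bounds the factor

        for k in (1, 2, 5, 10, 100):
            print(k, factor_revealing_lp(k))  # equals H_k = 1 + 1/2 + ... + 1/k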

    Design, analysis and implementation of advanced methodologies to measure the socio-economic impact of personal data in large online services

    The web ecosystem is enormous, and it is largely sustained by one intangible asset that supports the majority of free services: the exploitation of users' personal information. Over the years, concerns about how services use personal data have increased and attracted the attention of the media, governments, regulators, and users themselves. This collection of personal information is nowadays the primary source of revenue on the Internet, and online advertising is the piece that supports it all. Without the existence of personal data in communion with online advertising, the Internet would probably not be the giant we know today. Online advertising is a very complex ecosystem in which multiple stakeholders take part. It is the main engine generating revenue on the web, and in only a few years it has evolved to reach billions of users worldwide. While browsing, users generate valuable data about themselves that advertisers later use to offer them relevant products they could be interested in. It is a two-way exchange, since advertisers pay intermediaries to show ads to the public that is, in principle, most interested. However, this trading, sharing, and processing of personal data and behavior patterns, apart from opening up new advertising avenues, exposes users' privacy. This incessant collection and commercialization of personal information usually takes place behind an opaque wall, where users often do not know what their data is used for. Privacy and transparency initiatives have increased over the years to empower the user in a business that moves billions of US dollars in revenue. Not surprisingly, after several scandals, such as Facebook's Cambridge Analytica scandal, businesses and regulators have joined forces to create transparency and protect users against the harmful practices derived from the use of their personal information. The General Data Protection Regulation (GDPR) is the most promising example of such regulation: it affects all member states of the European Union (EU) and advocates for the protection of users. This thesis uses that legislation as a reference.

    For all these reasons, the purpose of this thesis is to provide tools and methodologies that reveal inappropriate uses of personal data by large companies in the online advertising ecosystem, create transparency for users, and provide solutions with which users can protect themselves. The thesis offers the design, analysis, and implementation of methodologies that measure the social and economic impact of online personal information on large Internet services. It focuses mainly on Facebook (FB), one of the largest social networks and services on the web, with more than 2.8B Monthly Active Users (MAU) worldwide and more than $84B in online advertising revenue in 2020 alone.

    First, this thesis presents a solution, in the form of a browser extension called the Data Valuation Tool for Facebook users (FDVT), that provides users with a personalized, real-time estimation of the money they are generating for FB. By analyzing the number of ads and interactions in a session, users learn their value within this social network. The add-on has seen significant impact and adoption both by users, having been installed more than 10k times since its public launch in October 2016, and by the media, appearing in more than 100 media outlets.

    Second, the creation of such solutions should be accompanied by the study of the potential risks associated with processing users' data. In this context, this thesis uncovers striking results on the usage of personal information: (i) it quantifies the number of users affected by the use of sensitive attributes for advertising on FB, taking the GDPR's definition of sensitive data as a reference. The thesis relies on Natural Language Processing (NLP) to identify sensitive attributes, and then uses the FB Ads Manager to retrieve the number of users assigned this sensitive information. Two-thirds of FB users are affected by the use of sensitive personal data attributed to them. Moreover, the legislation appears to have had no effect on FB's use of sensitive attributes, which presents severe risks to users. (ii) It models how many non-Personally Identifiable Information (non-PII) attributes suffice to uniquely identify an individual in a database of billions of users, and proves that reaching a single user is plausible even without knowing any PII about them. The results demonstrate that 22 interests taken at random from a user are enough to identify them uniquely with 90% probability, and 4 suffice when taking the least popular ones.

    Finally, this thesis was affected by the outbreak of the COVID-19 pandemic, which led to a side contribution: an analysis of how the online advertising market evolved during this period. The research shows that the online advertising market exhibits an almost perfect inelasticity of supply and that its composition changed due to a shift in users' online behavior. It also illustrates the potential of using data from large online services, which already have high adoption rates, and presents a protocol for tracing contacts who have been potentially exposed to people who tested positive for COVID-19, in contrast to the failure of newly deployed contact-tracing apps.

    In conclusion, the research in this thesis showcases the social and economic impact of online advertising and large online services on users. The methodology developed and deployed serves to highlight and quantify the risks derived from personal data in online services, and it demonstrates the need for such tools and methodologies in line with new legislation and users' wishes. In the search for transparency and privacy, this thesis presents easily implementable solutions and measures to prevent these risks and empower users to control their personal information.

    This work was supported by the Ministerio de Educación, Cultura y Deporte, Spain, through FPU Grant FPU16/05852; the Ministerio de Ciencia e Innovación, Spain, through the ACHILLES project (Grant PID2019-104207RB-I00); the H2020 EU-funded SMOOTH project (Grant 786741); and the H2020 EU-funded PIMCITY project (Grant 871370). Doctoral Programme in Telematic Engineering, Universidad Carlos III de Madrid. Committee: David Larrabeiti López (chair), Gregorio Ignacio López López (secretary), Noel Cresp
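
    The uniqueness result in (ii) is easy to illustrate with a toy simulation. The sketch below builds a synthetic population whose interests follow a Zipf-like popularity curve and estimates how often k randomly chosen interests of a user single that user out; the population size, the popularity model, and all parameters are invented for illustration and are not the thesis's data or methodology.

        import random

        random.seed(0)
        N_USERS, N_INTERESTS, PER_USER = 10_000, 2_000, 50

        # Zipf-like popularity: interest j is drawn with weight 1/(j+1), so a
        # few interests are very common and most are rare.
        weights = [1.0 / (j + 1) for j in range(N_INTERESTS)]
        users = [frozenset(random.choices(range(N_INTERESTS), weights, k=PER_USER))
                 for _ in range(N_USERS)]

        def p_unique(k: int, trials: int = 100) -> float:
            # Fraction of trials in which k interests sampled from one user's
            # profile are contained in no other user's profile.
            hits = 0
            for _ in range(trials):
                target = random.choice(users)
                while len(target) < k:        # resample if profile too small
                    target = random.choice(users)
                combo = set(random.sample(sorted(target), k))
                if sum(combo <= u for u in users) == 1:  # only the target matches
                    hits += 1
            return hits / trials

        for k in (4, 8, 16, 22):
            print(k, p_unique(k))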

    Operational Research: Methods and Applications

    Throughout its history, Operational Research has evolved to include a variety of methods, models and algorithms that have been applied to a diverse and wide range of contexts. This encyclopedic article consists of two main sections: methods and applications. The first aims to summarise up-to-date knowledge and provide an overview of the state-of-the-art methods and key developments in the various subdomains of the field. The second offers a wide-ranging list of areas where Operational Research has been applied. The article is meant to be read in a nonlinear fashion. It should be used as a point of reference or first port of call for a diverse pool of readers: academics, researchers, students, and practitioners. The entries within the methods and applications sections are presented in alphabetical order. The authors dedicate this paper to the 2023 Turkey/Syria earthquake victims. We sincerely hope that advances in OR will play a role in minimising the pain and suffering caused by this and future catastrophes.

    A decision aid for me, Neolithic man and other impaired decision makers

    Application of multiplex bead serological assays to integrated monitoring of neglected tropical diseases

    BACKGROUND: Neglected tropical diseases remain a significant burden to global health, despite substantial efforts among global governments and partnerships to control them. In recent years, considerable attention has been directed towards integrating control platforms across these geographically co-endemic diseases, given the demonstrated resource efficiency and sustainability of doing so. As surveillance is a key component of disease control, effective surveillance tools are needed to support these integrated disease platforms. Serological multiplex bead assays can monitor numerous pathogens simultaneously by including antigens from each pathogen of interest. However, gaps in knowledge pertaining to the interpretation of serological data from these assays, and to their utility in public health surveillance, necessitate further research to determine their capacity to support concurrent surveillance of multiple diseases. This thesis aims to determine appropriate methods of interpreting serological data from multiplex bead assays and to assess their utility in public health surveillance of neglected tropical diseases.

    METHODS: Three datasets from Haiti (2015 Tracking Results Continuously Survey, n=4,438; 2017 Artibonite Easy Access Group Survey, n=6,004; and 2017 Artibonite Community Household Survey, n=21,222) and one dataset from Malaysia (2015 Sabah Household Cluster Survey, n=10,100) were used in this thesis. Collectively, these datasets included antigens of twelve different pathogens that were analysed using serological multiplex bead assays. The first objective of this thesis was to identify existing methods of characterising serological data on neglected tropical diseases from multiplex bead assays. This was done through a systematic review of the literature examining existing and potential methods of characterising antibody responses from these assays. Several of these methods were then applied to two case studies to evaluate how the choice of method affects public health interpretation. The second objective was to assess the utility of multiplex bead assays for multi-disease surveillance. To do this, panels of diverse tropical disease antigens (encompassing twelve pathogens collectively) from the three Haitian datasets and the Malaysian dataset were used to analyse concurrent monitoring of multiple diseases and to assess demographic and spatial risk factors for co-endemic pathogens. The third objective was to determine the capacity of serological multiplex bead assays to support different sampling strategies. This was done by formally comparing outputs from a community household active sampling strategy and an easy-access group convenience sampling strategy in Haiti.

    RESULTS: The review of the literature revealed that serological data from multiplex bead assays are typically converted to seroprevalence for programme interpretation; however, there is currently no standard approach to determining serological prevalence estimates. Instead, seven different approaches were identified in the literature. Comparing three of these approaches produced varying disease prevalence estimates in both Haiti and Malaysia, suggesting that the choice of classification approach can affect subsequent programmatic interpretation in both case studies. Multiplex bead assays provided concurrent estimates of exposure to various pathogens at the national and subnational levels of surveillance within the Haitian and Malaysian study populations. Demographic and spatial data collected alongside the serological surveys identified several risk factors that were consistent across the antigens assessed, including age, wealth, and gender, and also allowed visualisation of spatial trends in disease exposure in both settings. Using multiplex bead assays in two different sampling approaches demonstrated their capacity to support multi-disease surveillance under either design; the differences observed when comparing prevalence estimates between surveys may be attributable to inherent biases in the sampling populations and designs.

    CONCLUSIONS: The research in this thesis contributes to the understanding of the utility of serological multiplex bead assays for multi-disease surveillance and provides a foundation for the broader study of applying these platforms to neglected tropical disease control. It also highlights the need to develop standardised approaches to sampling, laboratory protocols, and analysis for these platforms to ensure consistent and confident disease estimates and data reporting.
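
    To make one classification approach concrete, the sketch below applies a technique that is common in the serosurveillance literature (offered purely as an illustration, not necessarily one of the seven approaches identified in the review): fit a two-component Gaussian mixture to log-transformed median fluorescence intensity (MFI) values and call samples in the higher-mean component seropositive. The data are synthetic.

        import numpy as np
        from sklearn.mixture import GaussianMixture

        rng = np.random.default_rng(1)
        # Synthetic log-MFI values: a seronegative bulk plus a smaller
        # seropositive tail.
        log_mfi = np.concatenate([rng.normal(2.0, 0.5, 900),   # seronegative
                                  rng.normal(5.0, 0.8, 100)])  # seropositive

        x = log_mfi.reshape(-1, 1)
        gm = GaussianMixture(n_components=2, random_state=0).fit(x)
        upper = int(np.argmax(gm.means_.ravel()))  # component with larger mean
        seropositive = gm.predict(x) == upper
        print(f"Estimated seroprevalence: {seropositive.mean():.1%}")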

    Clustering-based Algorithms for Big Data Computations

    In the age of big data, the amount of information that applications need to process often exceeds the computational capabilities of single machines. To cope with this deluge of data, new computational models have been defined. The MapReduce model allows the development of distributed algorithms targeted at large clusters, where each machine can store only a small fraction of the data. In the streaming model, a single processor processes an incoming stream of data on the fly, using only limited memory. The specific characteristics of these models, combined with the need to process very large datasets, rule out the adoption of known algorithmic strategies in many cases, prompting the development of new ones. In this context, clustering, the process of grouping elements according to some proximity measure, is a valuable tool that allows one to build succinct summaries of the input data. In this thesis we develop novel algorithms for some fundamental problems where clustering is either a key ingredient for coping with very large instances or is itself the ultimate target.

    First, we consider the problem of approximating the diameter of an undirected graph, a fundamental metric in graph analytics for which the known exact algorithms are too costly on very large inputs. We develop a MapReduce algorithm for this problem which, for the important class of graphs of bounded doubling dimension, features a polylogarithmic approximation guarantee, uses linear memory, and executes in a number of parallel rounds that can be made sublinear in the input graph's diameter. To the best of our knowledge, ours is the first parallel algorithm with these guarantees. The algorithm leverages a novel clustering primitive to extract a concise summary of the input graph on which to compute the diameter approximation. We complement the theoretical analysis with an extensive experimental evaluation, finding that the algorithm achieves an approximation quality significantly better than the theoretical upper bound, as well as high scalability.

    Next, we consider the problem of clustering uncertain graphs, that is, graphs where each edge has a probability of existence specified as part of the input. These graphs, whose applications range from biology to privacy in social networks, have an exponential number of possible deterministic realizations, which imposes a big-data perspective. We develop the first algorithms for clustering uncertain graphs with provable approximation guarantees, aiming to maximize the probability that each node is connected to the center of its assigned cluster. A preliminary suite of experiments provides evidence that the quality of the clusterings returned by our algorithms compares very favorably with that of previous approaches lacking theoretical guarantees.

    Finally, we deal with the problem of diversity maximization, a fundamental primitive in big data analytics: given a set of points in a metric space, we are asked to provide a small subset maximizing some notion of diversity. We provide efficient streaming and MapReduce algorithms with approximation guarantees that can be made arbitrarily close to those of the best available sequential algorithms. The algorithms crucially rely on a k-center clustering primitive to extract a succinct summary of the data, and their analysis is expressed in terms of the doubling dimension of the input point set. Moreover, unlike previously known algorithms, ours feature an interesting tradeoff between approximation quality and memory requirements. Our theoretical findings are supported by the first experimental analysis of diversity maximization algorithms in streaming and MapReduce, which highlights the tradeoffs of our algorithms on both real-world and synthetic datasets. The algorithms also exhibit good scalability and significantly better performance than the approaches proposed in previous work.
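
    As a taste of the summary-extraction primitive mentioned above, here is a minimal sequential sketch of greedy farthest-first traversal for k-center (Gonzalez's classic 2-approximation). The streaming and MapReduce variants developed in the thesis are considerably more involved; this toy version only illustrates the basic idea of covering a dataset with a few representative centers.

        import math
        import random

        def k_center_greedy(points, k):
            # Repeatedly pick the point farthest from the chosen centers; the
            # final covering radius is at most twice the optimal k-center radius.
            centers = [random.choice(points)]
            dist = [math.dist(p, centers[0]) for p in points]
            while len(centers) < k:
                i = max(range(len(points)), key=dist.__getitem__)
                centers.append(points[i])
                dist = [min(d, math.dist(p, points[i])) for p, d in zip(points, dist)]
            return centers, max(dist)   # centers and their covering radius

        random.seed(0)
        pts = [(random.random(), random.random()) for _ in range(1000)]
        centers, radius = k_center_greedy(pts, 16)
        print(f"{len(centers)} centers, covering radius {radius:.3f}")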