111 research outputs found

    Interventions to mitigate the risks of COVID-19 for people experiencing homelessness, and their effectiveness: systematic review

    Objectives: People experiencing homelessness also experience poorer clinical outcomes of COVID-19. Various interventions were implemented for people experiencing homelessness in 2020-22 in different countries, in response to varied national guidance, to limit this impact of COVID-19. It is important to understand what was done and the effectiveness of such interventions. This systematic review aims to describe interventions to mitigate the risks of COVID-19 in people experiencing homelessness and their effectiveness. Methods: A protocol was developed and registered in PROSPERO. Nine databases were searched for studies on interventions to mitigate the impact of COVID-19 in people experiencing homelessness. Included studies were summarised with narrative synthesis. Results: From 8233 references retrieved from the database searches and handsearching, 15 were included. There was a variety of interventions, including early identification of potential COVID-19 infections, provision of isolation space, healthcare support, and urgent provision of housing regardless of COVID-19 infection. Conclusion: The strategies identified were generally found to be effective, feasible and transferable. This review must be interpreted with caution due to the low volume of eligible studies and the low quality of the evidence available.

    MethOSM: a methodology for computing composite indicators derived from OpenStreetMap data

    The task of computing composite indicators to define and analyze complex social, economic, political, or environmental phenomena has traditionally been the exclusive competence of statistical offices. Nowadays, the availability of increasing volumes of data and the emergence of the open data movement have given individuals and businesses affordable access to all kinds of datasets that can be used as valuable input for computing indicators. OpenStreetMap (OSM) is a good example of this: it has been used as a baseline to compute indicators in areas where official data is scarce or difficult to access. Although the extraction and application of OSM data to compute indicators is an attractive proposition, this practice is by no means hassle-free. The use of OSM reveals a number of challenges that are usually addressed with ad hoc and often overlapping solutions. In this context, this paper proposes MethOSM, a systematic methodology for computing indicators derived from OSM data. By applying MethOSM, the computation task is divided into four steps, each with a clear goal and a set of guidelines to apply. In this way, the methodology contributes to an effective and efficient use of OSM data for the purpose of computing indicators. To demonstrate its use, we apply MethOSM to a number of indicators used for real estate valuation of properties in Italy.
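
The abstract does not spell out MethOSM's formulas, so the following is only a minimal illustrative sketch of the general idea of deriving an indicator from OSM data: counting tagged amenities in a named area via the public Overpass API and combining the counts with assumed weights. The area name, amenity tags and weighting scheme are placeholders, not part of the paper.

```python
# Illustrative sketch only: computes a toy composite indicator (amenity density)
# from OpenStreetMap via the public Overpass API. The indicator, area and
# weights are assumptions, not the formulas defined by MethOSM.
import requests

OVERPASS_URL = "https://overpass-api.de/api/interpreter"

def count_amenities(area_name: str, amenity: str) -> int:
    """Count OSM nodes tagged with a given amenity inside a named area."""
    query = f"""
    [out:json][timeout:60];
    area["name"="{area_name}"]->.a;
    node(area.a)["amenity"="{amenity}"];
    out count;
    """
    resp = requests.post(OVERPASS_URL, data={"data": query}, timeout=90)
    resp.raise_for_status()
    return int(resp.json()["elements"][0]["tags"]["nodes"])

def composite_indicator(area_name: str, weights: dict[str, float]) -> float:
    """Weighted sum of amenity counts: a stand-in for a real composite indicator."""
    return sum(w * count_amenities(area_name, amenity) for amenity, w in weights.items())

if __name__ == "__main__":
    # Hypothetical example: weigh hospitals and pharmacies for one city.
    print(composite_indicator("Milano", {"hospital": 0.6, "pharmacy": 0.4}))
```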

    Crawling deep web entity pages

    Deep-web crawl is concerned with the problem of surfacing hidden content behind search interfaces on the Web. While many deep-web sites maintain document-oriented textual content (e.g., Wikipedia, PubMed, Twitter, etc.), which has traditionally been the focus of the deep-web literature, we observe that a significant portion of deep-web sites, including almost all online shopping sites, curate structured entities as opposed to text documents. Although crawling such entity-oriented content is clearly useful for a variety of purposes, existing crawling techniques optimized for document-oriented content are not best suited for entity-oriented sites. In this work, we describe a prototype system we have built that specializes in crawling entity-oriented deep-web sites. We propose techniques tailored to tackle important subproblems including query generation, empty page filtering and URL deduplication in the specific context of entity-oriented deep-web sites. These techniques are experimentally evaluated and shown to be effective.
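
As a rough illustration of two of the subproblems named above, the sketch below shows a simplified URL-deduplication key (sorted query string with assumed tracking parameters dropped) and a crude empty-page heuristic. The parameter list, content marker and threshold are assumptions, not the paper's techniques.

```python
# Simplified sketch of two subproblems mentioned in the abstract: URL
# deduplication (normalising query strings) and empty-page filtering
# (a crude content heuristic). Thresholds and parameter names are assumptions.
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

TRACKING_PARAMS = {"utm_source", "utm_medium", "sessionid", "ref"}  # assumed noise params

def normalize_url(url: str) -> str:
    """Canonical form used as a deduplication key: sorted query, noise params dropped."""
    parts = urlparse(url)
    query = sorted((k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS)
    return urlunparse((parts.scheme, parts.netloc.lower(), parts.path.rstrip("/"),
                       "", urlencode(query), ""))

def looks_empty(html: str, min_entities: int = 1) -> bool:
    """Heuristic empty-page filter: too few result markers on the page."""
    return html.lower().count('class="product"') < min_entities

seen: set[str] = set()
def should_crawl(url: str) -> bool:
    """Skip URLs whose canonical form has already been visited."""
    key = normalize_url(url)
    if key in seen:
        return False
    seen.add(key)
    return True
```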

    CleanML: A Study for Evaluating the Impact of Data Cleaning on ML Classification Tasks

    Data quality affects machine learning (ML) model performance, and data scientists spend a considerable amount of time on data cleaning before model training. However, to date, there has been no rigorous study of how exactly cleaning affects ML: the ML community usually focuses on developing ML algorithms that are robust to particular noise types of certain distributions, while the database (DB) community has mostly studied the problem of data cleaning alone, without considering how the data is consumed by downstream ML analytics. We propose CleanML, a study that systematically investigates the impact of data cleaning on ML classification tasks. The open-source and extensible CleanML study currently includes 14 real-world datasets with real errors, five common error types, seven different ML models, and multiple cleaning algorithms for each error type (including both algorithms commonly used in practice and state-of-the-art solutions from the academic literature). We control the randomness in ML experiments using statistical hypothesis testing, and we control the false discovery rate using the Benjamini-Yekutieli (BY) procedure. We analyze the results systematically to derive many interesting and nontrivial observations, and we put forward multiple research directions for researchers. Comment: published in ICDE.
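
The evaluation pattern described here (paired comparisons of models trained on dirty versus cleaned data, with the Benjamini-Yekutieli procedure controlling the false discovery rate across many comparisons) can be sketched as follows. The data is synthetic and the effect size is assumed; only the statistical machinery mirrors the abstract.

```python
# Sketch of the evaluation pattern described in the abstract: compare models
# trained on "dirty" vs "cleaned" data over repeated runs, then control the
# false discovery rate across comparisons with Benjamini-Yekutieli.
# Data here is synthetic; the actual CleanML datasets and pipelines differ.
import numpy as np
from scipy.stats import ttest_rel
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)

def paired_accuracies(n_runs: int = 20) -> tuple[np.ndarray, np.ndarray]:
    """Stand-in for accuracies of a model trained on dirty vs cleaned data."""
    dirty = rng.normal(0.80, 0.02, n_runs)
    clean = dirty + rng.normal(0.01, 0.01, n_runs)  # assumed small cleaning benefit
    return dirty, clean

# One p-value per (dataset, model, error type) comparison.
pvals = []
for _ in range(10):
    dirty, clean = paired_accuracies()
    pvals.append(ttest_rel(clean, dirty).pvalue)

reject, adjusted, _, _ = multipletests(pvals, alpha=0.05, method="fdr_by")
print("significant after BY correction:", reject.sum(), "of", len(pvals))
```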

    Counterfactual Memorization in Neural Language Models

    Modern neural language models that are widely used in various NLP tasks risk memorizing sensitive information from their training data. Understanding this memorization is important in real-world applications and also from a learning-theoretical perspective. An open question in previous studies of language model memorization is how to filter out "common" memorization. In fact, most memorization criteria strongly correlate with the number of occurrences in the training set, capturing memorized familiar phrases, public knowledge, templated texts, or other repeated data. We formulate a notion of counterfactual memorization which characterizes how a model's predictions change if a particular document is omitted during training. We identify and study counterfactually-memorized training examples in standard text datasets. We estimate the influence of each memorized training example on the validation set and on generated texts, showing how this can provide direct evidence of the source of memorization at test time. Comment: NeurIPS 2023; 42 pages, 33 figures.
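
The definition can be illustrated numerically: counterfactual memorization of an example is the gap between a model's expected performance on that example when it was in the training subset and when it was not, averaged over many models trained on random subsets. The toy "model" below is a stand-in chosen only to keep the sketch runnable; the paper's experiments use neural language models.

```python
# Toy estimator of counterfactual memorization:
# mem(x) ~= E[score(x) | x in training subset] - E[score(x) | x not in subset],
# averaged over models trained on random subsets. A trivial stand-in "model"
# replaces the language models used in the actual paper.
import numpy as np

rng = np.random.default_rng(0)
n_examples, n_models = 100, 200
data = rng.normal(size=n_examples)  # placeholder training examples

in_scores = [[] for _ in range(n_examples)]
out_scores = [[] for _ in range(n_examples)]

for _ in range(n_models):
    subset = rng.random(n_examples) < 0.5          # random half of the data
    model_bias = data[subset].mean()               # stand-in for a trained model
    scores = -np.abs(data - model_bias)            # stand-in per-example performance
    for i in range(n_examples):
        (in_scores if subset[i] else out_scores)[i].append(scores[i])

memorization = np.array([np.mean(a) - np.mean(b) for a, b in zip(in_scores, out_scores)])
print("most counterfactually memorized example:", int(memorization.argmax()))
```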

    Generalized Lineage-Aware Temporal Windows: Supporting Outer and Anti Joins in Temporal-Probabilistic Databases

    The result of a temporal-probabilistic (TP) join with negation includes, at each time point, the probability with which a tuple of a positive relation p matches none of the tuples in a negative relation n, for a given join condition θ. TP outer and anti joins thus resemble the characteristics of relational outer and anti joins also in the case when there exist time points at which input tuples from p have non-zero probabilities to be true and input tuples from n have non-zero probabilities to be false, respectively. For the computation of TP joins with negation, we introduce generalized lineage-aware temporal windows, a mechanism that binds an output interval to the lineages of all the matching valid tuples of each input relation. We group the windows of two TP relations into three disjoint sets based on the way attributes, lineage expressions and intervals are produced. We compute all windows in an incremental manner, and we show that pipelined computations allow for the direct integration of our approach into PostgreSQL. We thereby alleviate the prevalent redundancies in the interval computations of existing approaches, which is proven by an extensive experimental evaluation with real-world datasets.
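
At a single time point, and assuming tuple independence, the probability computed by a TP anti join can be written as P(p) · Π(1 − P(n_i)) over the negative tuples that are valid at that point and satisfy the join condition. The sketch below evaluates exactly that expression for a key-equality join; the paper's lineage-aware windows and incremental PostgreSQL integration are not reproduced here.

```python
# Minimal sketch of the probability a TP anti join assigns at one time point,
# assuming independent tuples: P(p true) * product over matching negative
# tuples n_i valid at t of (1 - P(n_i true)). Lineage-aware windows and the
# incremental computation from the paper are out of scope for this sketch.
from dataclasses import dataclass

@dataclass
class TPTuple:
    key: str          # join attribute
    start: int        # valid-time interval [start, end)
    end: int
    prob: float       # probability the tuple is true

def anti_join_prob(p: TPTuple, negatives: list[TPTuple], t: int) -> float:
    """Probability that p is true and matches no negative tuple at time t."""
    if not (p.start <= t < p.end):
        return 0.0
    result = p.prob
    for n in negatives:
        if n.start <= t < n.end and n.key == p.key:   # join condition: key equality
            result *= (1.0 - n.prob)
    return result

p = TPTuple("a", 0, 10, 0.9)
ns = [TPTuple("a", 0, 5, 0.5), TPTuple("a", 3, 8, 0.2)]
print(anti_join_prob(p, ns, 4))   # 0.9 * 0.5 * 0.8 = 0.36
```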

    A Textual Data-Oriented Method for Doctor Selection in Online Health Communities

    As doctor–patient interactive platforms, online health communities (OHCs) offer patients massive amounts of information, including doctor basic information and online patient reviews. However, how to develop a systematic framework for doctor selection in OHCs based on doctor basic information and online patient reviews is a challenging issue, which we explore in this study. For doctor basic information, we define a quantification method and aggregate the results to characterize the relative influence of doctors. For online patient reviews, data analysis techniques (i.e., topic extraction and sentiment analysis) are used to mine the core attributes and evaluations. Subsequently, frequency weights and position weights are determined by a frequency-oriented formula and a position score-based formula, respectively, and are integrated to obtain the final importance of attributes. Probabilistic linguistic-prospect theory-multiplicative multiobjective optimization by ratio analysis (PL-PT-MULTIMOORA) is proposed to analyze patient satisfaction with doctors. Finally, selection rules are made according to doctor influence and patient satisfaction so as to choose optimal and suboptimal doctors for rational or emotional patients. The designed textual data-driven method is successfully applied to analyze doctors from Haodf.com, and some suggestions are given to help patients pick out optimal and suboptimal doctors. Funding: National Natural Science Foundation of China (NSFC) 72171182, 71801175, 71871171, 72031009; Project of Service Science and Innovation Key Laboratory of Sichuan Province KL2105; Project of China Scholarship Council 202107000064, 202007000143; Andalusian government B-TIC-590-UGR20; FEDER/Junta de Andalucia-Consejeria de Transformacion Economica, Industria, Conocimiento y Universidades P20 00673; PID2019-103880RB-I00 MCIN/AEI/10.13039/50110001103
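
The frequency-oriented and position score-based formulas are not given in the abstract, so the sketch below uses stand-ins (normalized mention frequencies, inverse-rank position scores, and a simple convex combination) purely to show the shape of the attribute-weighting step; every formula and value in it is an assumption.

```python
# Hedged sketch of the attribute-weighting step: normalized mention frequencies
# and inverse-rank position scores combined with a convex mix. The paper's
# actual frequency-oriented and position score-based formulas are not shown
# in the abstract, so these are assumptions for illustration only.
def frequency_weights(mention_counts: dict[str, int]) -> dict[str, float]:
    total = sum(mention_counts.values())
    return {a: c / total for a, c in mention_counts.items()}

def position_weights(first_positions: dict[str, int]) -> dict[str, float]:
    scores = {a: 1.0 / (1 + pos) for a, pos in first_positions.items()}  # earlier = heavier
    total = sum(scores.values())
    return {a: s / total for a, s in scores.items()}

def combined_weights(freq: dict[str, float], pos: dict[str, float],
                     alpha: float = 0.5) -> dict[str, float]:
    return {a: alpha * freq[a] + (1 - alpha) * pos[a] for a in freq}

# Hypothetical review-derived attributes.
counts = {"attitude": 120, "effectiveness": 90, "waiting_time": 40}
positions = {"attitude": 0, "effectiveness": 2, "waiting_time": 5}
print(combined_weights(frequency_weights(counts), position_weights(positions)))
```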

    Comparing Supervised Machine Learning Strategies and Linguistic Features to Search for Very Negative Opinions

    In this paper, we examine the performance of several classifiers in the process of searching for very negative opinions. More precisely, we conduct an empirical study that analyzes the influence of three types of linguistic features (n-grams, word embeddings, and polarity lexicons) and their combinations when they are used to feed different supervised machine learning classifiers: Naive Bayes (NB), Decision Tree (DT), and Support Vector Machine (SVM). The experiments we have carried out show that SVM clearly outperforms NB and DT in all datasets, taking into account all features individually as well as their combinations. This research was funded by project TelePares (MINECO, ref: FFI2014-51978-C2-1-R), the Consellería de Cultura, Educación e Ordenación Universitaria (accreditation 2016-2019, ED431G/08) and the European Regional Development Fund (ERDF).
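
A comparison of this kind can be reproduced in miniature with scikit-learn, using one of the three feature types (word n-grams via TF-IDF) and the three classifiers mentioned; the inline dataset and hyperparameters below are placeholders, not the paper's corpora or settings.

```python
# Sketch of the comparison described in the abstract, using one feature type
# (word n-grams via TF-IDF) and the three classifiers mentioned. The tiny
# inline dataset and default hyperparameters are placeholders only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

texts = ["absolutely awful, never again", "worst service I have ever had",
         "it was fine, nothing special", "pretty good overall",
         "terrible, rude staff and dirty rooms", "excellent, highly recommended"] * 10
labels = [1, 1, 0, 0, 1, 0] * 10   # 1 = very negative opinion

for name, clf in [("NB", MultinomialNB()),
                  ("DT", DecisionTreeClassifier(random_state=0)),
                  ("SVM", LinearSVC())]:
    pipe = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), clf)
    print(name, cross_val_score(pipe, texts, labels, cv=5).mean())
```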

    3cixty: Building comprehensive knowledge bases for city exploration

    Planning a visit to Expo Milano 2015 or simply touring in Milan are activities that require a certain amount of a priori knowledge of the city. In this paper, we present the process of building such comprehensive knowledge bases, containing descriptions of events and activities, places and sights, transportation facilities as well as social activities, collected from numerous static, near- and real-time, local and global data providers, including hyper-local sources such as the Expo Milano 2015 official services and several social media platforms. Entities in the 3cixty KB are deduplicated, interlinked and enriched using semantic technologies. The 3cixty KB powers the ExplorMI 360 multi-device application, which has been officially endorsed by the E015 Technical Management Board and has gained the patronage of Expo Milano 2015, offering a unique testing scenario for the 20 million visitors over the 6 months of the exhibit. In 2016-2017, new knowledge bases were created for the cities of London, Madeira and Singapore, as well as for the entire French Côte d'Azur area. The 3cixty KB is accessible at https://kb.3cixty.com/sparql and ExplorMI 360 at https://www.3cixty.com and in the Google Play Store and Apple App Store.
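
The abstract points to a public SPARQL endpoint; a minimal query such as the one below (counting instances per class, assuming only standard rdf:type) illustrates how the KB can be accessed. Endpoint availability and the KB schema may have changed since publication.

```python
# Minimal example of querying the SPARQL endpoint mentioned in the abstract
# (https://kb.3cixty.com/sparql), counting instances per class. Only standard
# rdf:type is assumed; the endpoint and schema may have changed since 2015.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://kb.3cixty.com/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    SELECT ?type (COUNT(?s) AS ?n)
    WHERE { ?s a ?type }
    GROUP BY ?type
    ORDER BY DESC(?n)
    LIMIT 10
""")
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["type"]["value"], row["n"]["value"])
```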

    SECURING THE DATA STORAGE AND PROCESSING IN CLOUD COMPUTING ENVIRONMENT

    Organizations increasingly utilize cloud computing architectures to reduce costs and energy consumption, both in the data warehouse and on mobile devices, by better utilizing the available computing resources. However, the security and privacy issues of publicly available cloud computing infrastructures have not been studied in sufficient depth for organizations and individuals to be fully informed of the risks; neither private nor public clouds are prepared to properly secure their connections as middle-men between mobile devices, which use encryption, and external data providers, which neglect to encrypt their data. Furthermore, cloud computing providers are not well informed of the risks associated with their policies or of the techniques they could implement to mitigate those risks. In this dissertation, we present a new layered understanding of public cloud computing. At the high level, we concentrate on the overall architecture and how information is processed and transmitted. The key idea is to secure information from outside attack and monitoring. We use techniques such as separating virtual machine roles, re-spawning virtual machines in high succession, and cryptography-based access control to achieve a high-level assurance of public cloud computing security and privacy. At the low level, we explore security and privacy issues at the memory management level. We present a mechanism for the prevention of automatic virtual machine memory guessing attacks.
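
One of the ingredients named above, cryptography-based access control, can be illustrated in a heavily simplified form: each role holds its own symmetric key, so records encrypted for one role are unreadable to others. This is a toy sketch using the cryptography package, not the dissertation's actual mechanism.

```python
# Simplified illustration of cryptography-based access control: each role holds
# its own symmetric key, so records encrypted for one role are unreadable to
# others. This is a toy sketch, not the dissertation's actual design.
from cryptography.fernet import Fernet, InvalidToken

role_keys = {role: Fernet.generate_key() for role in ("admin", "analyst")}

def encrypt_for(role: str, data: bytes) -> bytes:
    return Fernet(role_keys[role]).encrypt(data)

def decrypt_as(role: str, token: bytes) -> bytes | None:
    try:
        return Fernet(role_keys[role]).decrypt(token)
    except InvalidToken:
        return None   # role does not hold the right key

record = encrypt_for("admin", b"customer ledger")
print(decrypt_as("admin", record))    # b'customer ledger'
print(decrypt_as("analyst", record))  # None
```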