111 research outputs found
Interventions to mitigate the risks of COVID-19 for people experiencing homelessness, and their effectiveness: systematic review
Objectives: People experiencing homelessness experience poorer clinical outcomes from COVID-19. Various interventions were implemented for people experiencing homelessness in 2020-22 in different countries, in response to varied national guidance, to limit this impact of COVID-19. It is important to understand what was done and how effective such interventions were. This systematic review aims to describe interventions to mitigate the risks of COVID-19 in people experiencing homelessness and their effectiveness. Methods: A protocol was developed and registered in PROSPERO. Nine databases were searched for studies on interventions to mitigate the impact of COVID-19 in people experiencing homelessness. Included studies were summarised with narrative synthesis. Results: From 8233 references retrieved from the database searches and handsearching, 15 were included. There was a variety of interventions, including early identification of potential COVID-19 infections, provision of isolation space, healthcare support, and urgent provision of housing regardless of COVID-19 infection. Conclusion: The strategies identified were generally found to be effective, feasible and transferable. This review must be interpreted with caution due to the low volume of eligible studies and the low quality of the available evidence.
MethOSM: a methodology for computing composite indicators derived from OpenStreetMap data
The task of computing composite indicators to define and analyze complex social, economic, political, or environmental phenomena has traditionally been the exclusive competence of statistical offices. Nowadays, the availability of increasing volumes of data and the emergence of the open data movement have given individuals and businesses affordable access to all kinds of datasets that can be used as valuable input to compute indicators. OpenStreetMap (OSM) is a good example of this. It has been used as a baseline to compute indicators in areas where official data is scarce or difficult to access. Although the extraction and application of OSM data to compute indicators is an attractive proposition, this practice is by no means hassle-free. The use of OSM reveals a number of challenges that are usually addressed with ad-hoc and often overlapping solutions. In this context, this paper proposes MethOSM, a systematic methodology for computing indicators derived from OSM data. By applying MethOSM, the computation task is divided into four steps, with each step having a clear goal and a set of guidelines to apply. In this way, the methodology contributes to an effective and efficient use of OSM data for the purpose of computing indicators. To demonstrate its use, we apply MethOSM to a number of indicators used for real estate valuation of properties in Italy.
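As a generic illustration of the kind of computation such a methodology targets (this is not the specific MethOSM pipeline; the indicator names, areas, and weights below are hypothetical), a composite indicator can be built by min-max normalizing OSM-derived counts per area and aggregating them with a weighted sum:

```python
def composite_indicator(areas, weights):
    """Aggregate per-area sub-indicators into one composite score.

    areas:   {area_id: {indicator_name: raw_value}}, e.g. counts of shops
             or transport stops extracted from OSM for each area
    weights: {indicator_name: weight}, assumed to sum to 1
    """
    names = list(weights)
    # min-max bounds per sub-indicator, across all areas
    lo = {n: min(a[n] for a in areas.values()) for n in names}
    hi = {n: max(a[n] for a in areas.values()) for n in names}
    scores = {}
    for area_id, vals in areas.items():
        # normalize each sub-indicator to [0, 1], then take the weighted sum
        norm = [(vals[n] - lo[n]) / (hi[n] - lo[n]) if hi[n] > lo[n] else 0.0
                for n in names]
        scores[area_id] = sum(w * x for w, x in zip(weights.values(), norm))
    return scores
```

For example, an area with the maximum count of every amenity scores 1.0 and an area with the minimum of every amenity scores 0.0.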
Crawling deep web entity pages
Deep-web crawl is concerned with the problem of surfacing hidden content behind search interfaces on the Web. While many deep-web sites maintain document-oriented textual content (e.g., Wikipedia, PubMed, Twitter, etc.), which has traditionally been the focus of the deep-web literature, we observe that a significant portion of deep-web sites, including almost all online shopping sites, curate structured entities as opposed to text documents. Although crawling such entity-oriented content is clearly useful for a variety of purposes, existing crawling techniques optimized for document-oriented content are not best suited for entity-oriented sites. In this work, we describe a prototype system we have built that specializes in crawling entity-oriented deep-web sites. We propose techniques tailored to tackle important subproblems including query generation, empty page filtering and URL deduplication in the specific context of entity-oriented deep-web sites. These techniques are experimentally evaluated and shown to be effective.
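URL deduplication of the kind mentioned above is commonly approached by canonicalizing entity-page URLs before comparing them, so that syntactic variants of the same entity page collapse to one key. A minimal sketch (the list of noise parameters is an assumption for illustration, not the paper's actual rules):

```python
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

# Hypothetical set of non-content-bearing query parameters to discard
TRACKING_PARAMS = {"sessionid", "utm_source", "ref"}

def canonicalize(url: str) -> str:
    """Normalize a URL so syntactic variants map to the same key."""
    parts = urlparse(url)
    # keep only content-bearing query parameters, in a fixed (sorted) order
    query = sorted((k, v) for k, v in parse_qsl(parts.query)
                   if k.lower() not in TRACKING_PARAMS)
    return urlunparse((parts.scheme, parts.netloc.lower(),
                       parts.path.rstrip("/"), "", urlencode(query), ""))

def deduplicate(urls):
    """Return the input URLs with canonical duplicates removed."""
    seen, unique = set(), []
    for url in urls:
        key = canonicalize(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```

Here `http://shop.example/item?id=7&sessionid=abc` and `http://shop.example/item/?id=7` collapse to a single entity page.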
CleanML: A Study for Evaluating the Impact of Data Cleaning on ML Classification Tasks
Data quality affects machine learning (ML) model performance, and data
scientists spend a considerable amount of time on data cleaning before model
training. However, to date, there does not exist a rigorous study on how
exactly cleaning affects ML -- the ML community usually focuses on developing
ML algorithms that are robust to particular noise types of certain
distributions, while the database (DB) community has mostly studied the
problem of data cleaning alone without considering how data is consumed by
downstream ML analytics. We propose a CleanML study that systematically
investigates the impact of data cleaning on ML classification tasks. The
open-source and extensible CleanML study currently includes 14 real-world
datasets with real errors, five common error types, seven different ML models,
and multiple cleaning algorithms for each error type (including both commonly
used algorithms in practice as well as state-of-the-art solutions in academic
literature). We control the randomness in ML experiments using statistical
hypothesis testing, and we also control false discovery rate in our experiments
using the Benjamini-Yekutieli (BY) procedure. We analyze the results in a
systematic way to derive many interesting and nontrivial observations. We also
put forward multiple research directions for researchers. Comment: published in ICDE 202
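The Benjamini-Yekutieli (BY) procedure mentioned in the abstract can be sketched in a few lines; this is a generic textbook implementation of the procedure, not CleanML's own code:

```python
def benjamini_yekutieli(pvals, alpha=0.05):
    """Return a per-hypothesis rejection decision under BY FDR control.

    BY tightens the Benjamini-Hochberg thresholds by the harmonic-sum
    factor c(m) = sum_{i=1..m} 1/i, which keeps the false discovery rate
    below alpha under arbitrary dependence between the tests.
    """
    m = len(pvals)
    c_m = sum(1.0 / i for i in range(1, m + 1))
    # sort p-values, remembering their original positions
    order = sorted(range(m), key=lambda i: pvals[i])
    # find the largest rank k with p_(k) <= k * alpha / (m * c(m))
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * alpha / (m * c_m):
            k_max = rank
    # reject the hypotheses with the k_max smallest p-values
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject
```

For instance, with p-values [0.001, 0.8, 0.02, 0.9] and alpha = 0.05 only the first hypothesis is rejected, since the BY-corrected threshold for rank 2 (about 0.012) is already below 0.02.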
Counterfactual Memorization in Neural Language Models
Modern neural language models that are widely used in various NLP tasks risk
memorizing sensitive information from their training data. Understanding this
memorization is important in real-world applications and also from a
learning-theoretical perspective. An open question in previous studies of
language model memorization is how to filter out "common" memorization. In
fact, most memorization criteria strongly correlate with the number of
occurrences in the training set, capturing memorized familiar phrases, public
knowledge, templated texts, or other repeated data. We formulate a notion of
counterfactual memorization which characterizes how a model's predictions
change if a particular document is omitted during training. We identify and
study counterfactually-memorized training examples in standard text datasets.
We estimate the influence of each memorized training example on the validation
set and on generated texts, showing how this can provide direct evidence of the
source of memorization at test time. Comment: NeurIPS 2023; 42 pages, 33 figures
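The counterfactual notion described above compares models trained with and without a given example. A simplified estimator sketch (the scoring function and model sets are placeholders; the paper's estimator averages over many models trained on random training subsets):

```python
def counterfactual_memorization(example, models_with, models_without, score):
    """Estimate counterfactual memorization of one training example.

    models_with:    models whose training subsets contained `example`
    models_without: models whose training subsets excluded it
    score:          per-example performance function (model, example) -> float,
                    e.g. token-level accuracy or log-likelihood

    Memorization is the gap between expected performance on `example`
    when it was seen during training and when it was not.
    """
    perf_in = sum(score(m, example) for m in models_with) / len(models_with)
    perf_out = sum(score(m, example) for m in models_without) / len(models_without)
    return perf_in - perf_out
```

A large positive value indicates the example is only predicted well by models that saw it, i.e. it is counterfactually memorized rather than common knowledge.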
Generalized Lineage-Aware Temporal Windows: Supporting Outer and Anti Joins in Temporal-Probabilistic Databases
The result of a temporal-probabilistic (TP) join with negation includes, at each time point, the probability with which a tuple of the positive relation matches none of the tuples in the negative relation, for a given join condition. TP outer and anti joins thus resemble the characteristics of relational outer and anti joins also in the case when there exist time points at which input tuples from the positive and the negative relation, respectively, hold only with non-zero probability. For the computation of TP joins with negation, we introduce generalized lineage-aware temporal windows, a mechanism that binds an output interval to the lineages of all the matching valid tuples of each input relation. We group the windows of two TP relations into three disjoint sets based on the way attributes, lineage expressions and intervals are produced. We compute all windows in an incremental manner, and we show that pipelined computations allow for the direct integration of our approach into PostgreSQL. We thereby alleviate the prevalent redundancies in the interval computations of existing approaches, as demonstrated by an extensive experimental evaluation with real-world datasets.
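To fix intuition for the interval computations involved, here is a deterministic, simplified sketch of a temporal anti join: each positive tuple survives exactly on the sub-intervals where no same-key negative tuple overlaps. This illustrates only the interval arithmetic; the paper's TP version additionally tracks lineage expressions and per-point probabilities, which are omitted here:

```python
def temporal_anti_join(pos, neg):
    """Temporal anti join over (key, start, end) tuples, end-exclusive.

    For each positive tuple, emit the sub-intervals during which no
    negative tuple with the same key is valid.
    """
    out = []
    for key, s, e in pos:
        gaps = [(s, e)]  # intervals not yet covered by a negative tuple
        for nkey, ns, ne in neg:
            if nkey != key:
                continue
            new_gaps = []
            for gs, ge in gaps:
                if ne <= gs or ns >= ge:       # no overlap: gap survives
                    new_gaps.append((gs, ge))
                else:                          # overlap: keep the remainders
                    if gs < ns:
                        new_gaps.append((gs, ns))
                    if ne < ge:
                        new_gaps.append((ne, ge))
            gaps = new_gaps
        out.extend((key, gs, ge) for gs, ge in gaps)
    return out
```

For example, a positive tuple valid on [1, 10) anti-joined with a negative tuple valid on [3, 5) yields the two output intervals [1, 3) and [5, 10).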
A Textual Data-Oriented Method for Doctor Selection in Online Health Communities
As doctor–patient interactive platforms, online health communities (OHCs) offer patients a massive amount of information, including doctors' basic information and online patient reviews. However, how to develop a systematic framework for doctor selection in OHCs according to doctor basic information and online patient reviews is a challenging issue, which is explored in this study. For doctor basic information, we define a quantification method and aggregate the results to characterize the relative influence of doctors. For online patient reviews, data analysis techniques (i.e., topic extraction and sentiment analysis) are used to mine the core attributes and evaluations. Subsequently, frequency weights and position weights are determined by a frequency-oriented formula and a position score-based formula, respectively, and are integrated to obtain the final importance of attributes. Probabilistic linguistic-prospect theory-multiplicative multiobjective optimization by ratio analysis (PL-PT-MULTIMOORA) is proposed to analyze patient satisfaction with doctors. Finally, selection rules are made according to doctor influence and patient satisfaction so as to choose optimal and suboptimal doctors for rational or emotional patients. The designed textual data-driven method is successfully applied to analyze doctors from Haodf.com, and some suggestions are given to help patients pick out optimal and suboptimal doctors.
Funding: National Natural Science Foundation of China (NSFC) 72171182, 71801175, 71871171, 72031009; Project of Service Science and Innovation Key Laboratory of Sichuan Province KL2105; Project of China Scholarship Council 202107000064, 202007000143; Andalusian government B-TIC-590-UGR20; FEDER/Junta de Andalucia-Consejeria de Transformacion Economica, Industria, Conocimiento y Universidades P20 00673; PID2019-103880RB-I00 MCIN/AEI/10.13039/50110001103
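The integration of frequency and position weights described above can be illustrated with a simple convex combination (a hypothetical sketch; the paper's actual aggregation formula, attribute names, and parameter values are not reproduced here):

```python
def attribute_importance(freq, pos_score, beta=0.5):
    """Combine frequency weights and position weights into attribute importance.

    freq:      {attribute: mention count in reviews}
    pos_score: {attribute: position-based score}
    beta:      trade-off between the two weight types (assumed 0.5)

    Each weight type is normalized to sum to 1, then mixed linearly,
    so the returned importances also sum to 1.
    """
    f_total = sum(freq.values())
    p_total = sum(pos_score.values())
    weights = {}
    for attr in freq:
        fw = freq[attr] / f_total          # frequency weight
        pw = pos_score[attr] / p_total     # position weight
        weights[attr] = beta * fw + (1 - beta) * pw
    return weights
```

With counts {attitude: 6, skill: 4} and position scores {attitude: 1, skill: 3}, "skill" ends up more important despite being mentioned less often, because its position score dominates.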
Comparing Supervised Machine Learning Strategies and Linguistic Features to Search for Very Negative Opinions
In this paper, we examine the performance of several classifiers in the process of searching for very negative opinions. More precisely, we carry out an empirical study that analyzes the influence of three types of linguistic features (n-grams, word embeddings, and polarity lexicons) and their combinations when they are used to feed different supervised machine learning classifiers: Naive Bayes (NB), Decision Tree (DT), and Support Vector Machine (SVM). The experiments we have carried out show that SVM clearly outperforms NB and DT in all datasets, taking into account all features individually as well as their combinations. Funding: project TelePares (MINECO, ref: FFI2014-51978-C2-1-R), the Consellería de Cultura, Educación e Ordenación Universitaria (accreditation 2016-2019, ED431G/08) and the European Regional Development Fund (ERDF).
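A comparison of this kind is typically set up as feature-extraction plus classifier pipelines. A minimal sketch using scikit-learn (assumed available; the toy corpus below and the choice of TF-IDF n-gram features are illustrations, not the paper's actual datasets or feature sets):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Toy corpus: 1 = very negative opinion, 0 = not very negative
texts = ["awful, would never buy again",
         "terrible service and rude staff",
         "great product, works perfectly",
         "decent value for the price"]
labels = [1, 1, 0, 0]

for clf in (MultinomialNB(), DecisionTreeClassifier(random_state=0), LinearSVC()):
    # unigram + bigram TF-IDF features feed each supervised classifier
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), clf)
    model.fit(texts, labels)
    print(type(clf).__name__, model.score(texts, labels))
```

In a real evaluation one would of course score on held-out folds rather than the training texts, and add the embedding and lexicon features as extra pipeline branches.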
3cixty: Building comprehensive knowledge bases for city exploration
Planning a visit to Expo Milano 2015 or simply touring in Milan are activities that require a certain amount of a priori knowledge of the city. In this paper, we present the process of building comprehensive knowledge bases that contain descriptions of events and activities, places and sights, transportation facilities as well as social activities, collected from numerous static, near- and real-time, local and global data providers, including hyper-local sources such as the Expo Milano 2015 official services and several social media platforms. Entities in the 3cixty KB are deduplicated, interlinked and enriched using semantic technologies. The 3cixty KB powers the ExplorMI 360 multi-device application, which has been officially endorsed by the E015 Technical Management Board and has gained the patronage of Expo Milano 2015, thus offering a unique testing scenario for the 20 million visitors over the 6 months of the exhibit. In 2016-2017, new knowledge bases were created for the cities of London, Madeira and Singapore, as well as for the entire French Cote d'Azur area. The 3cixty KB is accessible at https://kb.3cixty.com/sparql, and ExplorMI 360 at https://www.3cixty.com and in the Google Play Store and Apple App Store.
SECURING THE DATA STORAGE AND PROCESSING IN CLOUD COMPUTING ENVIRONMENT
Organizations increasingly utilize cloud computing architectures to reduce costs and energy consumption both in the data warehouse and on mobile devices by better utilizing the computing resources available. However, the security and privacy issues with publicly available cloud computing infrastructures have not been studied in sufficient depth for organizations and individuals to be fully informed of the risks; neither private nor public clouds are prepared to properly secure their connections as middlemen between mobile devices that use encryption and external data providers that neglect to encrypt their data. Furthermore, cloud computing providers are not well informed of the risks, nor of the policies and techniques they could implement to mitigate those risks.
In this dissertation, we present a new layered understanding of public cloud computing. At the high level, we concentrate on the overall architecture and how information is processed and transmitted. The key idea is to secure information from outside attack and monitoring. We use techniques such as separating virtual machine roles, re-spawning virtual machines in high succession, and cryptography-based access control to achieve a high-level assurance of public cloud computing security and privacy. At the low level, we explore security and privacy issues at the memory management level. We present a mechanism for the prevention of automatic virtual machine memory-guessing attacks.