On Collaborative Predictive Blacklisting
Collaborative predictive blacklisting (CPB) makes it possible to forecast future attack
sources based on logs and alerts contributed by multiple organizations.
Unfortunately, however, research on CPB has only focused on increasing the
number of predicted attacks but has not considered the impact on false
positives and false negatives. Moreover, sharing alerts is often hindered by
confidentiality, trust, and liability issues, which motivates the need for
privacy-preserving approaches to the problem. In this paper, we present a
measurement study of state-of-the-art CPB techniques, aiming to shed light on
the actual impact of collaboration. To this end, we reproduce and measure two
systems: a non privacy-friendly one that uses a trusted coordinating party with
access to all alerts (Soldo et al., 2010) and a peer-to-peer one using
privacy-preserving data sharing (Freudiger et al., 2015). We show that, while
collaboration boosts the number of predicted attacks, it also yields high false
positives, ultimately leading to poor accuracy. This motivates us to present a
hybrid approach, using a semi-trusted central entity, aiming to increase
utility from collaboration while, at the same time, limiting information
disclosure and false positives. This leads to a better trade-off of true and
false positive rates, while at the same time addressing privacy concerns.
Comment: A preliminary version of this paper appears in ACM SIGCOMM's Computer
Communication Review (Volume 48 Issue 5, October 2018). This is the full
versio
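The trade-off between true and false positive rates mentioned in the abstract can be illustrated with a minimal sketch. It uses the standard definitions TPR = TP / (TP + FN) and FPR = FP / (FP + TN); the function name `blacklist_rates`, the notion of a fixed `candidates` universe of sources, and the sample addresses are assumptions made for illustration, not the evaluation setup of the paper.

```python
def blacklist_rates(predicted, attackers, candidates):
    """Return (true positive rate, false positive rate) for one blacklist.

    predicted  -- sources the blacklist flags for the next time window
    attackers  -- sources that actually attacked in that window
    candidates -- hypothetical universe of sources under consideration
    """
    candidates = set(candidates)
    predicted = set(predicted) & candidates
    attackers = set(attackers) & candidates
    benign = candidates - attackers

    true_positives = len(predicted & attackers)
    false_positives = len(predicted & benign)

    # Guard against empty classes so the sketch never divides by zero.
    tpr = true_positives / len(attackers) if attackers else 0.0
    fpr = false_positives / len(benign) if benign else 0.0
    return tpr, fpr


# Illustrative addresses from the documentation range 192.0.2.0/24.
universe = {f"192.0.2.{i}" for i in range(1, 9)}
flagged = {"192.0.2.1", "192.0.2.2", "192.0.2.3"}
actual = {"192.0.2.1", "192.0.2.2", "192.0.2.4"}
tpr, fpr = blacklist_rates(flagged, actual, universe)
```

A blacklist that flags more sources can raise the first number only at the risk of raising the second, which is the tension the abstract describes.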
Differentially Private Mixture of Generative Neural Networks
Generative models are used in a wide range of applications building on large
amounts of contextually rich information. Due to possible privacy violations of
the individuals whose data is used to train these models, however, publishing
or sharing generative models is not always viable. In this paper, we present a
novel technique for privately releasing generative models and entire
high-dimensional datasets produced by these models. We model the generator
distribution of the training data with a mixture of generative neural
networks. These are trained together and collectively learn the generator
distribution of a dataset. Data is divided into clusters, using a novel
differentially private kernel k-means, then each cluster is given to separate
generative neural networks, such as Restricted Boltzmann Machines or
Variational Autoencoders, which are trained only on their own cluster using
differentially private gradient descent. We evaluate our approach using the
MNIST dataset, as well as call detail records and transit datasets, showing
that it produces realistic synthetic samples, which can also be used to
accurately compute an arbitrary number of counting queries.
Comment: A shorter version of this paper appeared at the 17th IEEE
International Conference on Data Mining (ICDM 2017). This is the full
version, published in IEEE Transactions on Knowledge and Data Engineering
(TKDE
ReMasker: Imputing Tabular Data with Masked Autoencoding
We present ReMasker, a new method of imputing missing values in tabular data
by extending the masked autoencoding framework. Compared with prior work,
ReMasker is both simple -- besides the missing values (i.e., naturally masked),
we randomly ``re-mask'' another set of values, optimize the autoencoder by
reconstructing this re-masked set, and apply the trained model to predict the
missing values; and effective -- with extensive evaluation on benchmark
datasets, we show that ReMasker performs on par with or outperforms
state-of-the-art methods in terms of both imputation fidelity and utility under
various missingness settings, while its performance advantage often increases
with the ratio of missing data. We further explore theoretical justification
for its effectiveness, showing that ReMasker tends to learn
missingness-invariant representations of tabular data. Our findings indicate
that masked modeling represents a promising direction for further research on
tabular data imputation. The code is publicly available
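The re-masking step described in the abstract can be sketched as follows. This is only an illustration of how a second set of observed cells might be hidden for one table row; the function name `remask`, the `ratio` parameter, and the boolean-list representation are assumptions, and the autoencoder itself is omitted.

```python
import random


def remask(observed, ratio, seed=0):
    """Hide a random fraction of the observed cells of one row.

    observed -- list of booleans, True where a value is present
    ratio    -- hypothetical fraction of observed cells to re-mask
    Returns (visible, target): cells the encoder may see, and the
    re-masked cells whose values the model is trained to reconstruct.
    """
    rng = random.Random(seed)
    observed_idx = [i for i, is_obs in enumerate(observed) if is_obs]
    n_hide = int(round(ratio * len(observed_idx)))
    hidden = set(rng.sample(observed_idx, n_hide))

    visible = [is_obs and i not in hidden for i, is_obs in enumerate(observed)]
    target = [i in hidden for i in range(len(observed))]
    return visible, target


# False marks a naturally missing value; those cells are never targets.
row_observed = [True, True, False, True, True, False, True, True]
visible, target = remask(row_observed, ratio=0.5)
```

Because the targets are drawn only from observed cells, their true values are known and can serve as the reconstruction loss; the naturally missing cells are filled in by the trained model afterwards.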
Building and evaluating privacy-preserving data processing systems
Large-scale data processing prompts a number of important challenges, including guaranteeing that collected or published data is not misused, preventing disclosure of sensitive information, and deploying privacy protection frameworks that support usable and scalable services. In this dissertation, we study and build systems geared for privacy-friendly data processing, enabling computational scenarios and applications where potentially sensitive data can be used to extract useful knowledge, and which would otherwise be impossible without such strong privacy guarantees. For instance, we show how to privately and efficiently aggregate data from many sources and large streams, and how to use the aggregates to extract useful statistics and train simple machine learning models. We also present a novel technique for privately releasing generative machine learning models and entire high-dimensional datasets produced by these models. Finally, we demonstrate that the data used by participants in training generative and collaborative learning models may be vulnerable to inference attacks and discuss possible mitigation strategies
Novel homogeneous selective electrocatalysts for CO2 reduction: an electrochemical and computational study of cyclopentadienyl-phenylendiamino-cobalt complexes
Four cyclopentadienyl-phenylendiamino-cobalt complexes [CoCp(bqdi)] with different substituents (R) at the phenylene moiety (bqdi, I; o-perfluoro-bqdi, II; p-NO2-bqdi, III; p-COOH-bqdi, IV) have been studied to investigate their capability as catalysts for CO2 reduction. These compounds were characterized by cyclic voltammetry measurements under both nitrogen and CO2 atmospheres, showing an increase in the cathodic current ranging from 3.36 (III) to 5.59 times (II) that of the measurement under nitrogen. Moreover, with the addition of water, the current enhancement in the presence of CO2 reaches 31.07 times that of the case of complex II. Interestingly, these complexes exhibit very good selectivity toward CO2 reduction irrespective of hydrogen even in the presence of water. The relative turnover frequencies were also estimated, giving values ranging from 3.23 (III) to 187.21 s−1 (II) in the presence of water. In addition, these results were analysed by means of density functional theory (DFT) calculations and Fukui function analysis. In particular, the DFT results clearly show the effects of the different substituents on the electrochemical properties of these compounds, whereas the Fukui function analysis indicates that the most favourable positions for an electrophilic attack on the reduced complex are the nitrogen and cobalt atoms
Sulphur vs NH Group: Effects on the CO2 Electroreduction Capability of Phenylenediamine-Cp Cobalt Complexes
The cobalt complex (I) with cyclopentadienyl and 2-aminothiophenolate ligands was investigated as a homogeneous catalyst for electrochemical CO2 reduction. By comparing its behavior with that of an analogous complex with phenylenediamine (II), the effect of the sulfur atom as a substituent has been evaluated. As a result, a positive shift of the reduction potential and the reversibility of the corresponding redox process have been observed, also suggesting a higher stability of the compound with sulfur. Under anhydrous conditions, complex I showed a higher current enhancement in the presence of CO2 (9.41) in comparison with II (4.12). Moreover, the presence of only one -NH group in I explained the difference in the observed increases in the catalytic activity toward CO2 due to the presence of water, with current enhancements of 22.73 and 24.40 for I and II, respectively. DFT calculations confirmed the effect of sulfur on the lowering of the energy of the frontier orbitals of I, highlighted by electrochemical measurements. Furthermore, the condensed Fukui function f⁻ values agreed very well with the current enhancement observed in the absence of water
On the Use of Tri-Stereo Pleiades Images for the Morphometric Measurement of Dolines in the Basaltic Plateau of Azrou (Middle Atlas, Morocco)
Hundreds of large and deep collapse dolines dot the surface of the Quaternary basaltic plateau of Azrou, in the Middle Atlas of Morocco. In the absence of detailed topographic maps, the morphometric study of such a large number of features requires the use of remote sensing techniques. We present the processing, extraction, and validation of depth measurements of 89 dolines using tri-stereo Pleiades images acquired in 2018–2019 (the European Space Agency (ESA) © CNES 2018, distributed by Airbus DS). Satellite image-derived DEMs were field-verified using traditional mapping techniques, which showed very good agreement between field and remote sensing measurements. The high resolution of these tri-stereo images allowed the automatic generation of accurate morphometric datasets covering not only the planimetric parameters of the dolines (diameters, contours, orientation of long axes), but also their depth and altimetric profiles. Our study demonstrates the potential of using these types of images on rugged morphologies and for the measurement of steep depressions, where traditional remote sensing techniques may be hindered by shadow zones and blind portions. Tri-stereo images might also be suitable for the measurement of deep and steep depressions (skylights and collapses) on Martian and Lunar lava flows, suitable targets for future planetary cave exploration
Evaluating Privacy Leakage in Split Learning
Privacy-preserving machine learning (PPML) can help us train and deploy
models that utilize private information. In particular, on-device machine
learning allows us to avoid sharing raw data with a third-party server during
inference. On-device models are typically less accurate when compared to their
server counterparts due to the fact that (1) they typically only rely on a
small set of on-device features and (2) they need to be small enough to run
efficiently on end-user devices. Split Learning (SL) is a promising approach
that can overcome these limitations. In SL, a large machine learning model is
divided into two parts, with the bigger part residing on the server side and a
smaller part executing on-device, aiming to incorporate the private features.
However, end-to-end training of such models requires exchanging gradients at
the cut layer, which might encode private features or labels. In this paper, we
provide insights into potential privacy risks associated with SL. Furthermore,
we also investigate the effectiveness of various mitigation strategies. Our
results indicate that the gradients significantly improve the attackers'
effectiveness in all tested datasets, reaching almost perfect reconstruction
accuracy for some features. However, a small amount of differential privacy
(DP) can effectively mitigate this risk without causing significant training
degradation.
Comment: 10 page
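One common way to add differential privacy noise to a gradient before it is exchanged is to clip its L2 norm and add Gaussian noise. The sketch below illustrates that general recipe only; the abstract does not say which mechanism the paper uses, and the function name `privatize_gradient` and the parameters `clip_norm` and `noise_multiplier` are hypothetical.

```python
import math
import random


def privatize_gradient(grad, clip_norm, noise_multiplier, seed=0):
    """Clip a gradient vector to an L2 norm bound, then add Gaussian noise.

    Assumed recipe (not confirmed by the source): the noise standard
    deviation is noise_multiplier * clip_norm, as in the usual
    Gaussian-mechanism formulation.
    """
    rng = random.Random(seed)
    norm = math.sqrt(sum(g * g for g in grad))
    # Scale down only when the norm exceeds the bound.
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    sigma = noise_multiplier * clip_norm
    return [g * scale + rng.gauss(0.0, sigma) for g in grad]


# Hypothetical gradient at the cut layer for one example.
cut_layer_grad = [3.0, 4.0]
clipped_only = privatize_gradient(cut_layer_grad, clip_norm=1.0, noise_multiplier=0.0)
noisy = privatize_gradient(cut_layer_grad, clip_norm=1.0, noise_multiplier=0.5)
```

With `noise_multiplier` set to zero the function only clips, which makes the effect of the norm bound easy to inspect; larger values trade training signal for less information about individual features or labels in the exchanged gradient.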
Moving towards happiness? Understanding travel moods through twitter data in Turin
The paper addresses the following questions: does urban mobility matter for health, and mental health in particular? How does each transport mode relate to our level of stress/happiness? A previous study conducted on Turin (Melis et al. 2015) showed that among indicators related to urban structure and social composition, ‘accessibility by public transport’ seems to be the one with the strongest relation to mental health (depression) outcomes. Starting from these results, we decided to further explore this association through the use of data from social media. Recent trends in the use of social networks have opened up new opportunities in the field of urban and transport studies: the great amount of data coming from Twitter is an example, providing easily available, often geo-referenced, marginally costly datasets offering new insights on individual and collective life. The accuracy and reliability, as well as the representativeness, of the results coming from the use of this new source of data in the mobility and planning field are undoubtedly growing. The project uses Twitter data collected for the metropolitan area of Turin (IT) and analyses it using a Semantic Analysis algorithm to show spatiotemporal levels of happiness (valence) of users, related to the transport mode they have been using. Geographic Information Systems (GIS) and spatial analysis techniques are then used to visualize spatial patterns and associations among happiness levels and contextual variables, such as land use. From a methodological point of view, results can be compared to research conducted on US cities by Flint University (Rybarczyk and Banerjee 2015), as the method used is the same. The purpose of the study is exploratory, in order to understand what use can be made of such a rich data source as social media information.
Therefore, the results may be used to promote the use of social media data by transportation planners and public health officials for developing more effective transportation plans and policies, as well as to understand the degree of satisfaction/stress linked to different transport modes