3,105 research outputs found
Towards a New Science of a Clinical Data Intelligence
In this paper we define Clinical Data Intelligence as the analysis of data
generated in the clinical routine with the goal of improving patient care. We
define a science of a Clinical Data Intelligence as a data analysis that
permits the derivation of scientific, i.e., generalizable and reliable results.
We argue that a science of a Clinical Data Intelligence is sensible in the
context of a Big Data analysis, i.e., with data from many patients and with
complete patient information. We discuss that Clinical Data Intelligence
requires the joint efforts of knowledge engineering, information extraction
(from textual and other unstructured data), and statistics and statistical
machine learning. We describe some of our main results as conjectures and
relate them to a recently funded research project involving two major German
university hospitals.Comment: NIPS 2013 Workshop: Machine Learning for Clinical Data Analysis and
Healthcare, 201
TeD-SPAD: Temporal Distinctiveness for Self-supervised Privacy-preservation for video Anomaly Detection
Video anomaly detection (VAD) without human monitoring is a complex computer
vision task that can have a positive impact on society if implemented
successfully. While recent advances have made significant progress in solving
this task, most existing approaches overlook a critical real-world concern:
privacy. With the increasing popularity of artificial intelligence
technologies, it becomes crucial to implement proper AI ethics into their
development. Privacy leakage in VAD allows models to pick up and amplify
unnecessary biases related to people's personal information, which may lead to
undesirable decision making. In this paper, we propose TeD-SPAD, a
privacy-aware video anomaly detection framework that destroys visual private
information in a self-supervised manner. In particular, we propose the use of a
temporally-distinct triplet loss to promote temporally discriminative features,
which complements current weakly-supervised VAD methods. Using TeD-SPAD, we
achieve a positive trade-off between privacy protection and utility anomaly
detection performance on three popular weakly supervised VAD datasets:
UCF-Crime, XD-Violence, and ShanghaiTech. Our proposed anonymization model
reduces private attribute prediction by 32.25% while only reducing frame-level
ROC AUC on the UCF-Crime anomaly detection dataset by 3.69%. Project Page:
https://joefioresi718.github.io/TeD-SPAD_webpage/Comment: ICCV 202
Learning a Neural Semantic Parser from User Feedback
We present an approach to rapidly and easily build natural language
interfaces to databases for new domains, whose performance improves over time
based on user feedback, and requires minimal intervention. To achieve this, we
adapt neural sequence models to map utterances directly to SQL with its full
expressivity, bypassing any intermediate meaning representations. These models
are immediately deployed online to solicit feedback from real users to flag
incorrect queries. Finally, the popularity of SQL facilitates gathering
annotations for incorrect predictions using the crowd, which is directly used
to improve our models. This complete feedback loop, without intermediate
representations or database specific engineering, opens up new ways of building
high quality semantic parsers. Experiments suggest that this approach can be
deployed quickly for any new target domain, as we show by learning a semantic
parser for an online academic database from scratch.Comment: Accepted at ACL 201
Context-Aware Generative Adversarial Privacy
Preserving the utility of published datasets while simultaneously providing
provable privacy guarantees is a well-known challenge. On the one hand,
context-free privacy solutions, such as differential privacy, provide strong
privacy guarantees, but often lead to a significant reduction in utility. On
the other hand, context-aware privacy solutions, such as information theoretic
privacy, achieve an improved privacy-utility tradeoff, but assume that the data
holder has access to dataset statistics. We circumvent these limitations by
introducing a novel context-aware privacy framework called generative
adversarial privacy (GAP). GAP leverages recent advancements in generative
adversarial networks (GANs) to allow the data holder to learn privatization
schemes from the dataset itself. Under GAP, learning the privacy mechanism is
formulated as a constrained minimax game between two players: a privatizer that
sanitizes the dataset in a way that limits the risk of inference attacks on the
individuals' private variables, and an adversary that tries to infer the
private variables from the sanitized dataset. To evaluate GAP's performance, we
investigate two simple (yet canonical) statistical dataset models: (a) the
binary data model, and (b) the binary Gaussian mixture model. For both models,
we derive game-theoretically optimal minimax privacy mechanisms, and show that
the privacy mechanisms learned from data (in a generative adversarial fashion)
match the theoretically optimal ones. This demonstrates that our framework can
be easily applied in practice, even in the absence of dataset statistics.Comment: Improved version of a paper accepted by Entropy Journal, Special
Issue on Information Theory in Machine Learning and Data Scienc
Recommended from our members
Data standardization
With data rapidly becoming the lifeblood of the global economy, the ability to improve its use significantly affects both social and private welfare. Data standardization is key to facilitating and improving the use of data when data portability and interoperability are needed. Absent data standardization, a “Tower of Babel” of different databases may be created, limiting synergetic knowledge production. Based on interviews with data scientists, this Article identifies three main technological obstacles to data portability and interoperability: metadata uncertainties, data transfer obstacles, and missing data. It then explains how data standardization can remove at least some of these obstacles and lead to smoother data flows and better machine learning. The Article then identifies and analyzes additional effects of data standardization. As shown, data standardization has the potential to support a competitive and distributed data collection ecosystem and lead to easier policing in cases where rights are infringed or unjustified harms are created by data-fed algorithms. At the same time, increasing the scale and scope of data analysis can create negative externalities in the form of better profiling, increased harms to privacy, and cybersecurity harms. Standardization also has implications for investment and innovation, especially if lock-in to an inefficient standard occurs. The Article then explores whether market-led standardization initiatives can be relied upon to increase welfare, and the role governmental-facilitated data standardization should play, if at all
- …