Cross-Modal Data Programming Enables Rapid Medical Machine Learning
Labeling training datasets has become a key barrier to building medical
machine learning models. One strategy is to generate training labels
programmatically, for example by applying natural language processing pipelines
to text reports associated with imaging studies. We propose cross-modal data
programming, which generalizes this intuitive strategy in a
theoretically-grounded way that enables simpler, clinician-driven input,
reduces required labeling time, and improves with additional unlabeled data. In
this approach, clinicians generate training labels for models defined over a
target modality (e.g. images or time series) by writing rules over an auxiliary
modality (e.g. text reports). The resulting technical challenge consists of
estimating the accuracies and correlations of these rules; we extend a recent
unsupervised generative modeling technique to handle this cross-modal setting
in a provably consistent way. Across four applications in radiography, computed
tomography, and electroencephalography, and using only several hours of
clinician time, our approach matches or exceeds the efficacy of
physician-months of hand-labeling with statistical significance, demonstrating
a fundamentally faster and more flexible way of building machine learning
models in medicine.
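The core idea of writing rules over the auxiliary modality (text reports) to label the target modality (images) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' method. The rule names, the keyword logic, and the label convention are hypothetical. The paper estimates rule accuracies and correlations with an unsupervised generative model, and this sketch swaps in plain unweighted majority vote as a much simpler stand-in.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical label convention for this sketch: 1 = abnormal, 0 = normal, None = abstain.
ABNORMAL, NORMAL = 1, 0
LabelingFunction = Callable[[str], Optional[int]]


def lf_fracture(report: str) -> Optional[int]:
    """Hypothetical rule over the text report (the auxiliary modality)."""
    text = report.lower()
    if "no fracture" in text:
        return NORMAL
    if "fracture" in text:
        return ABNORMAL
    return None  # abstain when the report says nothing about fractures


def lf_normal_study(report: str) -> Optional[int]:
    """Hypothetical rule: explicit 'normal study' phrasing votes normal."""
    return NORMAL if "normal study" in report.lower() else None


def lf_effusion(report: str) -> Optional[int]:
    """Hypothetical rule: a mentioned, non-negated effusion votes abnormal."""
    text = report.lower()
    return ABNORMAL if "effusion" in text and "no effusion" not in text else None


LFS: List[LabelingFunction] = [lf_fracture, lf_normal_study, lf_effusion]


def majority_vote(report: str, lfs: List[LabelingFunction]) -> Optional[int]:
    """Combine non-abstaining votes; return None when there are no votes or a tie."""
    votes = [v for v in (lf(report) for lf in lfs) if v is not None]
    if not votes:
        return None
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def label_images(pairs: List[Tuple[str, str]], lfs: List[LabelingFunction]) -> Dict[str, int]:
    """Map each image ID (target modality) to a weak label derived from its paired report."""
    labels: Dict[str, int] = {}
    for image_id, report in pairs:
        label = majority_vote(report, lfs)
        if label is not None:
            labels[image_id] = label
    return labels


pairs = [
    ("img-001", "Acute fracture of the distal radius."),
    ("img-002", "Normal study. No fracture."),
    ("img-003", "Unremarkable."),
]
print(label_images(pairs, LFS))  # {'img-001': 1, 'img-002': 0}
```

The weakly labeled image IDs would then serve as training data for a model that sees only the images; the reports are needed at labeling time, not at inference time.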
Measuring Online Social Bubbles
Social media have quickly become a prevalent channel to access information,
spread ideas, and influence opinions. However, it has been suggested that
social and algorithmic filtering may cause exposure to less diverse points of
view, and even foster polarization and misinformation. Here we explore and
validate this hypothesis quantitatively for the first time, at the collective
and individual levels, by mining three massive datasets of web traffic, search
logs, and Twitter posts. Our analysis shows that collectively, people access
information from a significantly narrower spectrum of sources through social
media and email, compared to search. The significance of this finding for
individual exposure is revealed by investigating the relationship between the
diversity of information sources experienced by users at the collective and
individual level. There is a strong correlation between collective and
individual diversity, supporting the notion that when we use social media we
find ourselves inside "social bubbles". Our results could lead to a deeper
understanding of how technology biases our exposure to new information.
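The abstract does not state how the "diversity of information sources" is quantified. A common choice, assumed here rather than taken from the paper, is the Shannon entropy of the distribution of visits over source domains, where a narrower spectrum of sources yields lower entropy. The click streams below are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def source_diversity(domains: Iterable[str]) -> float:
    """Shannon entropy, in bits, of the distribution of visits over source domains."""
    counts = Counter(domains)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    # The negated sum is never negative; abs() only turns -0.0 into 0.0 for a single source.
    return abs(entropy)


# Hypothetical click streams for the same population through two channels.
search_clicks = ["a.com", "b.org", "c.net", "d.com"]
social_clicks = ["a.com", "a.com", "a.com", "b.org"]

print(source_diversity(search_clicks))  # 2.0
print(round(source_diversity(social_clicks), 3))  # 0.811
```

The same function can be applied to all traffic through a channel (collective diversity) or to each user's own clicks (individual diversity), which is the pair of quantities the abstract correlates.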
Prochlo: Strong Privacy for Analytics in the Crowd
The large-scale monitoring of computer users' software activities has become
commonplace, e.g., for application telemetry, error reporting, or demographic
profiling. This paper describes a principled systems architecture called Encode,
Shuffle, Analyze (ESA) for performing such monitoring with high utility while
also protecting user privacy. The ESA design, and its Prochlo implementation,
are informed by our practical experiences with an existing, large deployment of
privacy-preserving software monitoring.
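The three ESA stages can be illustrated with a toy pipeline. This is a minimal sketch under stated assumptions, not Prochlo's implementation: the class and function names are hypothetical, encryption is omitted entirely, and the shuffler's privacy step is modeled as stripping metadata, discarding any crowd reported by fewer than `threshold` distinct users, and randomly permuting what remains.

```python
import random
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class EncodedReport:
    user_id: str    # identifying metadata, seen by the shuffler but never forwarded
    timestamp: int  # arrival metadata, also stripped by the shuffler
    crowd_id: str   # grouping key for thresholding (assumed here to equal the value)
    payload: str    # a real deployment would encrypt this for the analyzer


def encode(user_id: str, timestamp: int, value: str) -> EncodedReport:
    """Encode step (client side): package one observation. Encryption is omitted."""
    return EncodedReport(user_id, timestamp, crowd_id=value, payload=value)


def shuffle(reports: List[EncodedReport], threshold: int, rng: random.Random) -> List[str]:
    """Shuffle step: drop metadata, suppress small crowds, and permute the batch."""
    users_per_crowd: Dict[str, Set[str]] = defaultdict(set)
    for r in reports:
        users_per_crowd[r.crowd_id].add(r.user_id)
    kept = [r.payload for r in reports
            if len(users_per_crowd[r.crowd_id]) >= threshold]
    rng.shuffle(kept)  # break any link between arrival order and payload
    return kept


def analyze(payloads: List[str]) -> Dict[str, int]:
    """Analyze step: aggregate the anonymous payloads (sorted for stable output)."""
    return dict(sorted(Counter(payloads).items()))


observations = [("u1", "browser"), ("u2", "browser"), ("u3", "browser"),
                ("u1", "editor"), ("u4", "editor"), ("u5", "rare-tool")]
batch = [encode(uid, t, value) for t, (uid, value) in enumerate(observations)]
print(analyze(shuffle(batch, threshold=2, rng=random.Random(0))))
# {'browser': 3, 'editor': 2}
```

The design point the sketch tries to capture is separation of duties: the analyzer never sees user IDs or arrival order, and a value reported by a single user ("rare-tool") never reaches it at all.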
The Neurocognitive Process of Digital Radicalization: A Theoretical Model and Analytical Framework
Recent studies suggest that empathy induced by narrative messages can effectively facilitate persuasion and reduce psychological reactance. Although still limited, emerging research on the etiology of radical political behavior has begun to explore the role of narratives in shaping the beliefs, attitudes, and intentions that culminate in an individual’s radicalization. Existing studies focus exclusively on the influence of narrative persuasion on the individual, but they overlook the necessity of empathy: in its absence, persuasion is not salient. We argue that terrorist organizations strategically cultivate empathetic-persuasive messages using audiovisual materials and disseminate them through digital media. In this paper, we therefore propose a theoretical model and analytical framework to help us better understand the neurocognitive process of digital radicalization.
Heliophysics Event Knowledgebase for the Solar Dynamics Observatory and Beyond
The immense volume of data generated by the suite of instruments on SDO
requires new tools for efficiently identifying and accessing the data most
relevant to research investigations. We have developed the Heliophysics Events
Knowledgebase (HEK) to fill this need. The HEK system combines automated data
mining using feature-detection methods and high-performance visualization
systems for data markup. In addition, web services and clients are provided for
searching the resulting metadata, reviewing results, and efficiently accessing
the data. We review these components and present examples of their use with SDO
data. Comment: 17 pages, 4 figures
The SOAR Gravitational Arc Survey - I: Survey overview and photometric catalogs
We present the first results of the SOAR (Southern Astrophysical Research)
Gravitational Arc Survey (SOGRAS). The survey imaged 47 clusters in two
redshift intervals centered at and , targeting the richest
clusters in each interval. Images were obtained in the , and
bands using the SOAR Optical Imager (SOI), with a median seeing of 0.83, 0.76
and 0.71 arcsec, respectively, in these filters. Most of the survey clusters
are located within the Sloan Digital Sky Survey (SDSS) Stripe 82 region and all
of them are in the SDSS footprint. Photometric calibration was therefore
performed using SDSS stars located in our SOI fields. For galaxies in all fields, we
reached the detection limits of , and for a signal-to-noise ratio (S/N) = 3. As a by-product of the image
processing, we generated a source catalogue with 19760 entries, the vast
majority of which are galaxies; for each entry we list its position, magnitudes and
shape parameters. We compared our galaxy shape measurements to those of local
galaxies and concluded that they were not strongly affected by seeing. From the
catalogue data, we are able to identify a red sequence of galaxies in most
clusters in the lower redshift range. We found 16 gravitational arc candidates
around 8 clusters in our sample. They tend to be bluer than the central
galaxies in the lensing cluster. A preliminary analysis indicates that of the clusters have arcs around them, with a possible indication of a
larger efficiency associated with the high-redshift systems when compared to the
low-redshift ones. Deeper follow-up images with Gemini strengthen the case for the
strong lensing nature of the candidates found in this survey. Comment: 17 pages, 11 figures (most of them multi-panel), MNRAS (2013)
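The abstract says only that photometric calibration used SDSS stars in the SOI fields; colour terms, extinction corrections, and the exact estimator are not described. The sketch below assumes the simplest version: a single additive zero point per field and filter, estimated as the median offset between SDSS and instrumental magnitudes of matched stars. The function names and star magnitudes are hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def instrumental_mag(counts: float, exposure_s: float) -> float:
    """Instrumental magnitude from background-subtracted counts and exposure time."""
    return -2.5 * math.log10(counts / exposure_s)


def zero_point(instrumental: Sequence[float], reference: Sequence[float]) -> float:
    """Median offset between reference (e.g. SDSS) and instrumental magnitudes of matched stars."""
    if len(instrumental) != len(reference) or len(reference) == 0:
        raise ValueError("need the same, non-zero number of matched stars")
    return statistics.median([ref - inst for ref, inst in zip(reference, instrumental)])


def calibrate(instrumental: Sequence[float], zp: float) -> List[float]:
    """Apply one zero point to every source measured in the same field and filter."""
    return [m + zp for m in instrumental]


# Hypothetical matched stars in one field and filter.
star_inst = [-5.0, -4.0, -3.0, -6.0]
star_sdss = [20.1, 21.1, 22.0, 19.1]
zp = zero_point(star_inst, star_sdss)
print(round(zp, 2))  # 25.1
print([round(m, 2) for m in calibrate([-2.0], zp)])  # [23.1]
```

The median is used instead of the mean so that a single mismatched or variable star does not drag the zero point.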
Transparent government, not transparent citizens: a report on privacy and transparency for the Cabinet Office
1. Privacy is extremely important to transparency. The political legitimacy of a transparency programme will depend crucially on its ability to retain public confidence. Privacy protection should therefore be embedded in any transparency programme, rather than bolted on as an afterthought.
2. Privacy and transparency are compatible, as long as the former is carefully protected and considered at every stage.
3. Under the current transparency regime, in which public data is specifically understood not to include personal data, most data releases will not raise privacy concerns. However, some will, especially as we move toward a more demand-driven scheme.
4. Discussion about deanonymisation has been driven largely by legal considerations, with a consequent neglect of the input of the technical community.
5. There are no complete legal or technical fixes to the deanonymisation problem. We should continue to anonymise sensitive data, being initially cautious about releasing such data under the Open Government Licence while we continue to take steps to manage and research the risks of deanonymisation. Further investigation to determine the level of risk would be very welcome.
6. There should be a focus on procedures to output an auditable debate trail. Transparency about transparency (metatransparency) is essential for preserving trust and confidence.
Fourteen recommendations are made to address these conclusions.
- …