    Cross-Modal Data Programming Enables Rapid Medical Machine Learning

    Labeling training datasets has become a key barrier to building medical machine learning models. One strategy is to generate training labels programmatically, for example by applying natural language processing pipelines to text reports associated with imaging studies. We propose cross-modal data programming, which generalizes this intuitive strategy in a theoretically grounded way that enables simpler, clinician-driven input, reduces required labeling time, and improves with additional unlabeled data. In this approach, clinicians generate training labels for models defined over a target modality (e.g. images or time series) by writing rules over an auxiliary modality (e.g. text reports). The resulting technical challenge consists of estimating the accuracies and correlations of these rules; we extend a recent unsupervised generative modeling technique to handle this cross-modal setting in a provably consistent way. Across four applications in radiography, computed tomography, and electroencephalography, and using only several hours of clinician time, our approach matches or exceeds the efficacy of physician-months of hand-labeling with statistical significance, demonstrating a fundamentally faster and more flexible way of building machine learning models in medicine.
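
    The idea of writing rules over text reports to label the paired images can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the rule names, the regular expressions, and the label values are hypothetical, and a simple majority vote stands in for the generative model that the abstract says is used to estimate rule accuracies and correlations.

```python
import re
from collections import Counter

# Hypothetical label values; ABSTAIN means a rule has no opinion.
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

def lf_mentions_finding(report):
    """Vote POSITIVE if the report names an abnormal finding (illustrative keywords)."""
    return POSITIVE if re.search(r"\b(fracture|opacity|mass)\b", report, re.I) else ABSTAIN

def lf_negated(report):
    """Vote NEGATIVE if the report contains a common negation phrase."""
    return NEGATIVE if re.search(r"\bno (acute|evidence of)\b", report, re.I) else ABSTAIN

def lf_normal(report):
    """Vote NEGATIVE if the report describes the study as normal."""
    return NEGATIVE if re.search(r"\b(normal|unremarkable)\b", report, re.I) else ABSTAIN

LABELING_FUNCTIONS = [lf_mentions_finding, lf_negated, lf_normal]

def majority_vote(report):
    """Combine rule votes by unweighted majority; abstain on no votes or a tie.

    This is a stand-in only: it treats every rule as equally accurate and
    independent, which is exactly what a learned label model would not assume.
    """
    votes = [lf(report) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    (top, n), *rest = Counter(votes).most_common()
    if rest and rest[0][1] == n:
        return ABSTAIN  # conflicting rules with equal support
    return top

reports = [
    "Displaced fracture of the distal radius.",
    "Normal chest radiograph. No acute cardiopulmonary process.",
    "No evidence of fracture.",
]
# Each label would be attached to the image paired with the report.
labels = [majority_vote(r) for r in reports]
```

    The third report shows why unweighted voting is a weak combiner: the keyword rule and the negation rule disagree, so the vote abstains, whereas a model that knows which rule is more reliable could still emit a label.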

    Measuring Online Social Bubbles

    Social media have quickly become a prevalent channel to access information, spread ideas, and influence opinions. However, it has been suggested that social and algorithmic filtering may cause exposure to less diverse points of view, and even foster polarization and misinformation. Here we explore and validate this hypothesis quantitatively for the first time, at the collective and individual levels, by mining three massive datasets of web traffic, search logs, and Twitter posts. Our analysis shows that collectively, people access information from a significantly narrower spectrum of sources through social media and email, compared to search. The significance of this finding for individual exposure is revealed by investigating the relationship between the diversity of information sources experienced by users at the collective and individual level. There is a strong correlation between collective and individual diversity, supporting the notion that when we use social media we find ourselves inside "social bubbles". Our results could lead to a deeper understanding of how technology biases our exposure to new information.
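
    One way to make "diversity of information sources" concrete is Shannon entropy over the share of visits each source receives. The abstract above does not state which measure the authors use, so the choice of entropy, the function name, and the example domains are all assumptions for illustration.

```python
import math
from collections import Counter

def source_entropy(visits):
    """Shannon entropy (in bits) of the distribution of visits over sources.

    Higher values mean visits are spread across more sources more evenly;
    0 means every visit went to a single source. Assumed measure, not
    necessarily the one used in the paper.
    """
    counts = Counter(visits)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical click logs for two channels.
search_visits = ["a.example", "b.example", "c.example", "d.example"]
social_visits = ["a.example", "a.example", "a.example", "b.example"]

search_diversity = source_entropy(search_visits)
social_diversity = source_entropy(social_visits)
```

    In this toy data the search log spreads evenly over four sources while the social log concentrates on one, so the search entropy is the larger of the two.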

    Prochlo: Strong Privacy for Analytics in the Crowd

    The large-scale monitoring of computer users' software activities has become commonplace, e.g., for application telemetry, error reporting, or demographic profiling. This paper describes a principled systems architecture (Encode, Shuffle, Analyze, or ESA) for performing such monitoring with high utility while also protecting user privacy. The ESA design, and its Prochlo implementation, are informed by our practical experiences with an existing, large deployment of privacy-preserving software monitoring. (cont.; see the paper)
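
    The three stage names suggest a pipeline that can be sketched in a few lines. This is a toy illustration only: the abstract names the stages but does not describe them, so the behavior of each function below (dropping the user identifier, randomly permuting a batch, discarding values reported by fewer than `min_crowd` users, and counting) is an assumption, and none of the names are taken from Prochlo.

```python
import random
from collections import Counter

def encode(user_id, value):
    """Client side (assumed): keep only the reported value, drop the identifier."""
    return {"value": value}

def shuffle(reports, min_crowd=2, rng=None):
    """Intermediary (assumed): permute the batch and drop rare values.

    Permuting breaks the link between arrival order and sender; the
    min_crowd filter removes values too rare to hide in a crowd.
    """
    rng = rng or random.Random()
    batch = list(reports)
    rng.shuffle(batch)
    counts = Counter(r["value"] for r in batch)
    return [r for r in batch if counts[r["value"]] >= min_crowd]

def analyze(reports):
    """Analyzer (assumed): sees only anonymous, shuffled values and counts them."""
    return Counter(r["value"] for r in reports)

# Hypothetical telemetry: two users hit crash_A, one hits crash_B.
raw = [("u1", "crash_A"), ("u2", "crash_A"), ("u3", "crash_B")]
encoded = [encode(uid, value) for uid, value in raw]
shuffled = shuffle(encoded, min_crowd=2, rng=random.Random(0))
histogram = analyze(shuffled)
```

    The analyzer's histogram contains only `crash_A`, because the single `crash_B` report falls below the assumed crowd threshold and is dropped before analysis.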

    The Neurocognitive Process of Digital Radicalization: A Theoretical Model and Analytical Framework

    Recent studies suggest that empathy induced by narrative messages can effectively facilitate persuasion and reduce psychological reactance. Although limited, emerging research on the etiology of radical political behavior has begun to explore the role of narratives in shaping an individual’s beliefs, attitudes, and intentions that culminate in radicalization. The existing studies focus exclusively on the influence of narrative persuasion on an individual, but they overlook the necessity of empathy and the fact that, in the absence of empathy, persuasion is not salient. We argue that terrorist organizations are strategic in cultivating empathetic-persuasive messages using audiovisual materials, and in disseminating their message within the digital medium. Therefore, in this paper we propose a theoretical model and analytical framework capable of helping us better understand the neurocognitive process of digital radicalization.

    Heliophysics Event Knowledgebase for the Solar Dynamics Observatory and Beyond

    The immense volume of data generated by the suite of instruments on SDO requires new tools for efficiently identifying and accessing the data most relevant to research investigations. We have developed the Heliophysics Events Knowledgebase (HEK) to fill this need. The HEK system combines automated data mining using feature-detection methods and high-performance visualization systems for data markup. In addition, web services and clients are provided for searching the resulting metadata, reviewing results, and efficiently accessing the data. We review these components and present examples of their use with SDO data. Comment: 17 pages, 4 figures

    The SOAR Gravitational Arc Survey - I: Survey overview and photometric catalogs

    We present the first results of the SOAR (Southern Astrophysical Research) Gravitational Arc Survey (SOGRAS). The survey imaged 47 clusters in two redshift intervals centered at z = 0.27 and z = 0.55, targeting the richest clusters in each interval. Images were obtained in the g', r' and i' bands using the SOAR Optical Imager (SOI), with a median seeing of 0.83, 0.76 and 0.71 arcsec, respectively, in these filters. Most of the survey clusters are located within the Sloan Digital Sky Survey (SDSS) Stripe 82 region and all of them are in the SDSS footprint. Photometric calibration was therefore performed using SDSS stars located in our SOI fields. For galaxies in all fields, we reached detection limits of g ∼ 23.5, r ∼ 23 and i ∼ 22.5 at a signal-to-noise ratio (S/N) of 3. As a by-product of the image processing, we generated a source catalogue with 19760 entries, the vast majority of which are galaxies, where we list their positions, magnitudes and shape parameters. We compared our galaxy shape measurements to those of local galaxies and concluded that they were not strongly affected by seeing. From the catalogue data, we are able to identify a red sequence of galaxies in most clusters in the lower z range. We found 16 gravitational arc candidates around 8 clusters in our sample. They tend to be bluer than the central galaxies in the lensing cluster. A preliminary analysis indicates that ∼10% of the clusters have arcs around them, with a possible indication of a larger efficiency associated with the high-z systems when compared to the low-z ones. Deeper follow-up images with Gemini strengthen the case for the strong lensing nature of the candidates found in this survey. Comment: 17 pages, 11 figures (most of them multi-panel), MNRAS (2013)

    Transparent government, not transparent citizens: a report on privacy and transparency for the Cabinet Office

    1. Privacy is extremely important to transparency. The political legitimacy of a transparency programme will depend crucially on its ability to retain public confidence. Privacy protection should therefore be embedded in any transparency programme, rather than bolted on as an afterthought.
    2. Privacy and transparency are compatible, as long as the former is carefully protected and considered at every stage.
    3. Under the current transparency regime, in which public data is specifically understood not to include personal data, most data releases will not raise privacy concerns. However, some will, especially as we move toward a more demand-driven scheme.
    4. Discussion about deanonymisation has been driven largely by legal considerations, with a consequent neglect of the input of the technical community.
    5. There are no complete legal or technical fixes to the deanonymisation problem. We should continue to anonymise sensitive data, being initially cautious about releasing such data under the Open Government Licence while we continue to take steps to manage and research the risks of deanonymisation. Further investigation to determine the level of risk would be very welcome.
    6. There should be a focus on procedures to output an auditable debate trail. Transparency about transparency – metatransparency – is essential for preserving trust and confidence.
    Fourteen recommendations are made to address these conclusions.