Cross-Modal Data Programming Enables Rapid Medical Machine Learning
Labeling training datasets has become a key barrier to building medical
machine learning models. One strategy is to generate training labels
programmatically, for example by applying natural language processing pipelines
to text reports associated with imaging studies. We propose cross-modal data
programming, which generalizes this intuitive strategy in a
theoretically-grounded way that enables simpler, clinician-driven input,
reduces required labeling time, and improves with additional unlabeled data. In
this approach, clinicians generate training labels for models defined over a
target modality (e.g. images or time series) by writing rules over an auxiliary
modality (e.g. text reports). The resulting technical challenge consists of
estimating the accuracies and correlations of these rules; we extend a recent
unsupervised generative modeling technique to handle this cross-modal setting
in a provably consistent way. Across four applications in radiography, computed
tomography, and electroencephalography, and using only several hours of
clinician time, our approach matches or exceeds the efficacy of
physician-months of hand-labeling with statistical significance, demonstrating
a fundamentally faster and more flexible way of building machine learning
models in medicine.
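The idea of writing rules over text reports to label a paired image can be sketched as follows. This is a minimal illustration, not the authors' method: the rule names, keyword lists and label values are hypothetical, and a simple majority vote stands in for the generative model that the abstract says is used to estimate rule accuracies and correlations.

```python
import re
from collections import Counter

# Label values are assumptions made for this sketch.
ABSTAIN, NORMAL, ABNORMAL = -1, 0, 1


def lf_normal_phrases(report):
    """Vote NORMAL when the report uses a typical 'nothing found' phrase."""
    pattern = r"\b(unremarkable|no acute|within normal limits)\b"
    return NORMAL if re.search(pattern, report, re.IGNORECASE) else ABSTAIN


def lf_abnormal_terms(report):
    """Vote ABNORMAL when a finding term is not directly negated."""
    pattern = r"(?<!no )(?<!no acute )\b(fracture|effusion|opacity)\b"
    return ABNORMAL if re.search(pattern, report, re.IGNORECASE) else ABSTAIN


def lf_follow_up(report):
    """Vote ABNORMAL when the report recommends follow-up."""
    pattern = r"\b(follow-up|recommend)"
    return ABNORMAL if re.search(pattern, report, re.IGNORECASE) else ABSTAIN


LABELING_FUNCTIONS = [lf_normal_phrases, lf_abnormal_terms, lf_follow_up]


def majority_vote(votes):
    """Combine rule votes; abstain when no rule fires or the vote is tied."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


def label_report(report):
    """Return a training label for the image paired with this text report."""
    return majority_vote([lf(report) for lf in LABELING_FUNCTIONS])
```

The label returned for a report would then be attached to the associated image or time series; reports on which every rule abstains contribute no training label.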
Word Sense Disambiguation with LSTM: Do We Really Need 100 Billion Words?
Recently, Yuan et al. (2016) have shown the effectiveness of using Long
Short-Term Memory (LSTM) for performing Word Sense Disambiguation (WSD). Their
proposed technique outperformed the previous state of the art on several
benchmarks, but neither the training data nor the source code was released.
This paper presents the results of a reproduction study of this technique using
only openly available datasets (GigaWord, SemCor, OMSTI) and software
(TensorFlow). The study shows that state-of-the-art results can be
obtained with much less data than suggested by Yuan et al. All code and trained
models are made freely available.
Design and Evaluation of a Collective IO Model for Loosely Coupled Petascale Programming
Loosely coupled programming is a powerful paradigm for rapidly creating
higher-level applications from scientific programs on petascale systems,
typically using scripting languages. This paradigm is a form of many-task
computing (MTC) which focuses on the passing of data between programs as
ordinary files rather than messages. While it has the significant benefits of
decoupling producer and consumer and allowing existing application programs to
be executed in parallel with no recoding, its typical implementation using
shared file systems places a high performance burden on the overall system and
on the user who will analyze and consume the downstream data. Previous efforts
have achieved great speedups with loosely coupled programs, but have done so
with careful manual tuning of all shared file system access. In this work, we
evaluate a prototype collective IO model for file-based MTC. The model enables
efficient and easy distribution of input data files to computing nodes and
gathering of output results from them. It eliminates the need for such manual
tuning and makes the programming of large-scale clusters using a loosely
coupled model easier. Our approach, inspired by in-memory approaches to
collective operations for parallel programming, builds on fast local file
systems to provide high-speed local file caches for parallel scripts, uses a
broadcast approach to handle distribution of common input data, and uses
efficient scatter/gather and caching techniques for input and output. We
describe the design of the prototype model and its implementation on the Blue
Gene/P supercomputer, and present preliminary measurements of its performance
on synthetic benchmarks and on a large-scale molecular dynamics application.
Comment: IEEE Many-Task Computing on Grids and Supercomputers (MTAGS08) 200
Bivariate galaxy luminosity functions in the Sloan Digital Sky Survey
Bivariate luminosity functions (LFs) are computed for galaxies in the New York Value-Added Galaxy Catalogue, based on the Sloan Digital Sky Survey Data Release 4. The galaxy properties investigated are the morphological type, inverse concentration index, Sérsic index, absolute effective surface brightness (SB), reference frame colours, absolute radius, eClass spectral type, stellar mass and galaxy environment. The morphological sample is flux limited to galaxies with r < 15.9 and consists of 37 047 classifications to an rms accuracy of ± half a class in the sequence E, S0, Sa, Sb, Sc, Sd, Im. These were assigned by an artificial neural network, based on a training set of 645 eyeball classifications. The other samples use r < 17.77 with a median redshift of z ∼ 0.08, and a limiting redshift of z < 0.15 to minimize the effects of evolution. Other cuts, for example in axis ratio, are made to minimize biases. A wealth of detail is seen, with clear variations between the LFs according to absolute magnitude and the second parameter. They are consistent with an early-type, bright, concentrated, red population and a late-type, faint, less concentrated, blue, star-forming population. This bimodality suggests two major underlying physical processes, which in agreement with previous authors we hypothesize to be merger and accretion, associated with the properties of bulges and discs, respectively. The bivariate luminosity–SB distribution is fit with the Chołoniewski function (a Schechter function in absolute magnitude and Gaussian in SB). The fit is found to be poor, as might be expected if there are two underlying processes.
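For orientation, the form in which the Chołoniewski function is commonly written is sketched below. The abstract does not give the parametrization, so the symbols are assumptions: M is absolute magnitude, \mu is the effective surface brightness, \phi^*, M^* and \alpha are the usual Schechter parameters, and \mu^*, \sigma_\mu and \beta are the characteristic SB, the Gaussian width and the slope of the luminosity-SB relation.

```latex
\phi(M,\mu) =
  \frac{0.4\,\ln 10}{\sqrt{2\pi}\,\sigma_\mu}\,
  \phi^{*}\,
  10^{\,0.4(\alpha+1)(M^{*}-M)}\,
  \exp\!\left[-10^{\,0.4(M^{*}-M)}\right]
  \exp\!\left\{-\frac{1}{2}
    \left[\frac{\mu-\mu^{*}-\beta\,(M-M^{*})}{\sigma_\mu}\right]^{2}\right\}
```

Integrating this expression over \mu recovers the ordinary Schechter function in absolute magnitude, which is why a single such surface cannot follow two distinct populations well.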
WiLiTV: A Low-Cost Wireless Framework for Live TV Services
With the evolution of HDTV and Ultra HDTV, the bandwidth requirement for
IP-based TV content is rapidly increasing. Consumers demand uninterrupted
service with a high Quality of Experience (QoE). Service providers are
constantly trying to differentiate themselves by innovating new ways of
distributing content more efficiently with lower cost and higher penetration.
In this work, we propose a cost-efficient wireless framework (WiLiTV) for
delivering live TV services, consisting of a mix of wireless access
technologies (e.g. Satellite, WiFi and LTE overlay links). In the proposed
architecture, live TV content is injected into the network at a few residential
locations using satellite dishes. The content is then further distributed to
other homes using a house-to-house WiFi network or via an overlay LTE network.
Our problem is to construct an optimal TV distribution network with the minimum
number of satellite injection points, while preserving the highest QoE, for
different neighborhood densities. We evaluate the framework using realistic
time-varying demand patterns and a diverse set of home location data. Our study
demonstrates that the architecture requires 75-90% fewer satellite injection
points than traditional architectures. Furthermore, we show that most
cost savings can be obtained using simple and practical relay routing
solutions.
Target Tracking in Confined Environments with Uncertain Sensor Positions
To ensure safety in confined environments such as mines or subway tunnels, a
(wireless) sensor network can be deployed to monitor various environmental
conditions. One of its most important applications is to track personnel,
mobile equipment and vehicles. However, the state-of-the-art algorithms assume
that the positions of the sensors are perfectly known, which is not necessarily
true due to imprecise placement and/or dropping of sensors. Therefore, we
propose an automatic approach for simultaneous refinement of sensors' positions
and target tracking. We divide the considered area into a finite number of cells,
define dynamic and measurement models, and apply a discrete variant of belief
propagation that can efficiently solve this high-dimensional problem and
handle all non-Gaussian uncertainties expected in this kind of environment.
Finally, we use ray-tracing simulation to generate an artificial mine-like
environment and generate synthetic measurement data. According to our extensive
simulation study, the proposed approach performs significantly better than
standard Bayesian target tracking and localization algorithms, and provides
robustness against outliers.
Comment: IEEE Transactions on Vehicular Technology, 201
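The cell-based formulation can be illustrated with a one-dimensional histogram filter over a tunnel. This is a simplified sketch under stated assumptions, not the paper's algorithm: the sensor position is taken as known (the paper refines it jointly through belief propagation), and the motion model, the range-measurement model and every parameter value are hypothetical. The uniform mixture term in the likelihood is one simple way to keep the model non-Gaussian and tolerant of outliers.

```python
import math


def predict(belief, p_stay=0.6):
    """Dynamic model: the target stays or moves one cell left or right."""
    n = len(belief)
    p_move = (1.0 - p_stay) / 2.0
    predicted = [0.0] * n
    for i, b in enumerate(belief):
        predicted[i] += p_stay * b
        predicted[max(i - 1, 0)] += p_move * b      # tunnel end: mass stays put
        predicted[min(i + 1, n - 1)] += p_move * b
    return predicted


def range_likelihood(n_cells, sensor_cell, measured_range,
                     sigma=1.0, p_outlier=0.05):
    """Measurement model: Gaussian range error mixed with a uniform outlier term."""
    likelihood = []
    for cell in range(n_cells):
        error = abs(cell - sensor_cell) - measured_range
        gaussian = math.exp(-0.5 * (error / sigma) ** 2)
        likelihood.append((1.0 - p_outlier) * gaussian + p_outlier / n_cells)
    return likelihood


def update(belief, likelihood):
    """Bayes update: multiply cell by cell, then renormalize."""
    posterior = [b * l for b, l in zip(belief, likelihood)]
    total = sum(posterior)
    if total == 0.0:
        raise ValueError("measurement is incompatible with every cell")
    return [p / total for p in posterior]


# One predict/update cycle on a 10-cell tunnel with a sensor in cell 0.
n_cells = 10
belief = [1.0 / n_cells] * n_cells
belief = predict(belief)
belief = update(belief, range_likelihood(n_cells, sensor_cell=0, measured_range=7))
estimate = max(range(n_cells), key=lambda i: belief[i])
```

Because the belief is an explicit table over cells, any likelihood shape can be plugged into `update`, which is the property that makes a discrete formulation attractive when the uncertainties are not Gaussian.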
Spleen histology in children with sickle cell disease and hereditary spherocytosis: Hints on the disease pathophysiology
Hereditary spherocytosis (HS) and sickle cell disease (SCD) are associated with splenomegaly and spleen dysfunction in pediatric patients. Scant data exist on possible correlations between spleen morphology and function in HS and SCD. This study aimed to assess the histological and morphometric features of HS and SCD spleens in order to identify possible correlations with disease pathophysiology. In a large series of spleens from SCD, HS and control patients, the following parameters were considered: (i) macroscopic features; (ii) lymphoid follicle (LF) density; (iii) presence of peri-follicular marginal zones (MZs); (iv) presence of Gamna-Gandy bodies; (v) density of CD8-positive sinusoids; (vi) density of CD34-positive microvessels; (vii) presence/distribution of fibrosis and SMA-positive myoid cells; (viii) density of CD68-positive macrophages. SCD and HS spleens had similar macroscopic features. SCD spleens had lower LF density and fewer MZs than HS spleens and controls. SCD also showed lower CD8-positive sinusoid density, increased CD34-positive microvessel density and SMA-positive myoid cells, and higher prevalence of fibrosis and Gamna-Gandy bodies. HS had lower LF and CD8-positive sinusoid density than controls. No significant differences were noted in red pulp macrophages. By multivariate analysis, the majority of HS spleens clustered with controls, while SCD grouped separately. A multi-parametric score could predict the degree of spleen changes irrespective of the underlying disease. In conclusion, SCD spleens display greater histologic effacement than HS, and SCD-related changes suggest impaired function due to vascular damage. These observations may help guide the clinical management of patients.
Alaggio, Rita; Gamba, Piergiorgi