Redundant disk arrays: Reliable, parallel secondary storage
During the past decade, advances in processor and memory technology have given rise to increases in computational performance that far outstrip increases in the performance of secondary storage technology. Coupled with emerging small-disk technology, disk arrays provide the cost, volume, and capacity of current disk subsystems but, by leveraging parallelism, many times their performance. Unfortunately, arrays of small disks may have much higher failure rates than the single large disks they replace. Redundant arrays of inexpensive disks (RAID) use simple redundancy schemes to provide high data reliability. The data encoding, performance, and reliability of redundant disk arrays are investigated. Organizing redundant data into a disk array is treated as a coding problem. Among the alternatives examined, codes as simple as parity are shown to effectively correct single, self-identifying disk failures.
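The parity idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes equal-length blocks and a single failed disk whose position is known (a "self-identifying" failure). The function names and sample data are hypothetical and are not taken from the paper.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of a list of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data_blocks):
    """Parity block: XOR of all data blocks (one block per disk)."""
    return xor_blocks(data_blocks)


def reconstruct(surviving_blocks, parity):
    """Rebuild the single missing block from the survivors plus parity."""
    return xor_blocks(surviving_blocks + [parity])


# Hypothetical example: three data disks and one parity disk.
data = [b"disk", b"arra", b"ys!!"]
parity = compute_parity(data)

# Suppose disk 1 fails and reports its own failure.
lost_index = 1
survivors = [blk for i, blk in enumerate(data) if i != lost_index]
recovered = reconstruct(survivors, parity)
```

Because XOR is its own inverse, XOR-ing the surviving blocks with the parity block cancels every known block and leaves the missing one. The scheme does not locate a failure by itself, which is why the failed disk must identify itself.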
Structure-Aware Dynamic Scheduler for Parallel Machine Learning
Training large machine learning (ML) models with many variables or parameters
can take a long time if one employs sequential procedures even with stochastic
updates. A natural solution is to turn to distributed computing on a cluster;
however, naive, unstructured parallelization of ML algorithms does not usually
lead to a proportional speedup and can even result in divergence, because
dependencies between model elements can attenuate the computational gains from
parallelization and compromise correctness of inference. Recent efforts toward
this issue have benefited from exploiting the static, a priori block structures
residing in ML algorithms. In this paper, we take this path further by
exploring the dynamic block structures and workloads therein present during ML
program execution, which offers new opportunities for improving convergence,
correctness, and load balancing in distributed ML. We propose and showcase a
general-purpose scheduler, STRADS, for coordinating distributed updates in ML
algorithms, which harnesses the aforementioned opportunities in a systematic
way. We provide theoretical guarantees for our scheduler, and demonstrate its
efficacy versus static block structures on Lasso and Matrix Factorization.
High-Performance Distributed ML at Scale through Parameter Server Consistency Models
As Machine Learning (ML) applications increase in data size and model
complexity, practitioners turn to distributed clusters to satisfy the increased
computational and memory demands. Unfortunately, effective use of clusters for
ML requires considerable expertise in writing distributed code, while
highly-abstracted frameworks like Hadoop have not, in practice, approached the
performance seen in specialized ML implementations. The recent Parameter Server
(PS) paradigm is a middle ground between these extremes, allowing easy
conversion of single-machine parallel ML applications into distributed ones,
while maintaining high throughput through relaxed "consistency models" that
allow inconsistent parameter reads. However, due to insufficient theoretical
study, it is not clear which of these consistency models can really ensure
correct ML algorithm output; at the same time, there remain many
theoretically-motivated but undiscovered opportunities to maximize
computational throughput. Motivated by this challenge, we study both the
theoretical guarantees and empirical behavior of iterative-convergent ML
algorithms in existing PS consistency models. We then use the gleaned insights
to improve a consistency model using an "eager" PS communication mechanism, and
implement it as a new PS system that enables ML algorithms to reach their
solution more quickly.
Comment: 19 pages, 2 figures
Luminosity Evolution of Early-type Galaxies to z=0.83: Constraints on Formation Epoch and Omega
We present deep spectroscopy with the Keck telescope of eight galaxies in the
luminous X-ray cluster MS1054-03 at z=0.83. The data are combined with imaging
observations from the Hubble Space Telescope (HST). The spectroscopic data are
used to measure the internal kinematics of the galaxies, and the HST data to
measure their structural parameters. Six galaxies have early-type spectra, and
two have "E+A" spectra. The galaxies with early-type spectra define a tight
Fundamental Plane (FP) relation. The evolution of the mass-to-light ratio is
derived from the FP. The M/L ratio evolves as \Delta log M/L_B \propto -0.40 z
(Omega_m=0.3, Omega_Lambda=0). The observed evolution of the M/L ratio provides
a combined constraint on the formation redshift of the stars, the IMF, and
cosmological parameters. For a Salpeter IMF (x=2.35) we find that z_form>2.8
and Omega_m<0.86 with 95% confidence. The constraint on the formation redshift
is weaker if Omega_Lambda>0: z_form>1.7 if Omega_m=0.3 and Omega_Lambda=0.7. At
present the limiting factor in constraining z_form and Omega from the observed
luminosity evolution of early-type galaxies is the poor understanding of the
IMF. We find that if Omega_m=1 the IMF must be significantly steeper than the
Salpeter IMF (x>2.6).
Comment: To be published in ApJ Letters, Volume 504, September 1, 1998. 5
pages, 4 figures
Approaches to Capacity Building for Machine Learning and Artificial Intelligence Applications in Health
Many health systems and research institutes are interested in supplementing their traditional analyses of linked data with machine learning (ML) and other artificial intelligence (AI) methods and tools. However, the availability of individuals who have the required skills to develop and/or implement ML/AI is a constraint, as there is high demand for ML/AI talent in many sectors. The three organizations presenting are all actively involved in training and capacity building for ML/AI broadly, and each has a focus on, and/or discrete initiatives for, particular trainees.
P. Alison Paprica, Vector Institute for Artificial Intelligence, Institute for Clinical Evaluative Sciences, University of Toronto, Canada. Alison is VP, Health Strategy and Partnerships at Vector, responsible for health strategy and also playing a lead role in “1000AIMs”, a Vector-led initiative in support of the Province of Ontario’s $30 million investment to increase the number of AI-related master’s program graduates to 1,000 per year within five years.
Frank Sullivan, University of St Andrews, Scotland. Frank is a family physician and an associate director of HDRUK@Scotland. Health Data Research UK (https://hdruk.ac.uk/) has recently provided funding to six sites across the UK to address challenging healthcare issues through the use of data science. A Doctoral Training Scheme in AI for 50 PhD students has also been announced. Each site works in close partnership with National Health Service bodies and the public to translate research findings into benefits for patients and populations.
Yin Aphinyanaphongs – INTREPID NYU clinical training program for incoming clinical fellows. Yin is the Director of the Clinical Informatics Training Program at NYU Langone Health. He is deeply interested in the intersection of computer science and health care, and as a physician and a scientist he has a unique perspective on how to train medical professionals for a data-driven world. One version of this teaching process is demonstrated in the INTREPID clinical training program. Yin teaches clinicians to work with large-scale data within the R environment and to generate hypotheses and insights.
The session will begin with three brief presentations, followed by a facilitated session where all participants share their insights about the essential skills and competencies required for different kinds of ML/AI applications and contributions. Live polling and voting will be used at the end of the session to capture participants’ views on the key learnings and takeaway points.
The intended outputs and outcomes of the session are:
• Participants will have a better understanding of the skills and competencies required for individuals to contribute to AI applications in health in various ways
• Participants will gain knowledge about different options for capacity building, from targeted enhancement of the skills of clinical fellows, to producing large numbers of applied master’s graduates, to doctoral-level training
After the session, the co-leads will work together to create a resource that summarizes the learnings from the session and make it public (through publication in a peer-reviewed journal and/or through the IPDLN website).
A Database of Cepheid Distance Moduli and TRGB, GCLF, PNLF and SBF Data Useful for Distance Determinations
We present a compilation of Cepheid distance moduli and data for four
secondary distance indicators that employ stars in the old stellar populations:
the planetary nebula luminosity function (PNLF), the globular cluster
luminosity function (GCLF), the tip of the red giant branch (TRGB), and the
surface brightness fluctuation (SBF) method. The database includes all data
published as of July 15, 1999. The main strength of this compilation resides in
all data being on a consistent and homogeneous system: all Cepheid distances
are derived using the same calibration of the period-luminosity relation, the
treatment of errors is consistent for all indicators, and measurements which are
not considered reliable are excluded. As such, the database is ideal for
inter-comparing any of the distance indicators considered, or for deriving a
Cepheid calibration to any secondary distance indicator. Specifically, the
database includes: 1) Cepheid distances, extinctions and metallicities; 2)
apparent magnitudes of the PNLF cutoff; 3) apparent magnitudes and colors of
the turnover of the GCLF (both in the V- and B-bands); 4) apparent magnitudes
of the TRGB (in the I-band) and V-I colors at and 0.5 magnitudes fainter than
the TRGB; 5) apparent surface brightness fluctuation magnitudes I, K', K_short,
and using the F814W filter with the HST/WFPC2. In addition, for every galaxy in
the database we give reddening estimates from DIRBE/IRAS as well as HI maps,
J2000 coordinates, Hubble and T-type morphological classification, apparent
total magnitude in B, and systemic velocity. (Abridged)
Comment: Accepted for publication in the Astrophysical Journal Supplement
Series. Because of space limitations, the figures included are low resolution
bitmap images. Original figures can be found at
http://www.astro.ucla.edu/~laura/pub.ht
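For readers unfamiliar with the quantity tabulated in this database, the following sketch shows the standard relation between a distance modulus and a distance, mu = 5 log10(d / 10 pc). It assumes extinction has already been corrected for. The function names and the sample value are hypothetical and are not drawn from the database.

```python
import math


def distance_pc(mu):
    """Distance in parsecs for an extinction-corrected distance modulus mu."""
    return 10 ** (mu / 5 + 1)


def distance_modulus(d_pc):
    """Distance modulus m - M for a distance given in parsecs."""
    return 5 * math.log10(d_pc) - 5


# Hypothetical illustrative value, not a database entry.
mu_example = 18.5
d_example = distance_pc(mu_example)        # distance in parsecs
mu_roundtrip = distance_modulus(d_example)  # recovers mu_example
```

A modulus of 0 corresponds to 10 pc by definition, and each additional 5 magnitudes multiplies the distance by 100.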