Evaluating a Cluster of Low-Power ARM64 Single-Board Computers with MapReduce
With the meteoric rise of enormous data collection in science, industry, and the cloud, methods for processing massive datasets have become more crucial than ever. MapReduce is a restricted programming model for expressing parallel computations as simple serial functions, and an execution framework for distributing those computations over large datasets residing on clusters of commodity hardware. MapReduce abstracts away the challenging low-level synchronization and scalability details that parallel and distributed computing often necessitate, reducing the conceptual burden on programmers and scientists who require data processing at scale. Typically, MapReduce clusters are implemented using inexpensive commodity hardware, emphasizing quantity over quality because the MapReduce execution framework is fault-tolerant. The recent explosion of inexpensive single-board computers designed around multi-core 64-bit ARM processors, such as the Raspberry Pi 3, Pine64, and Odroid C2, has opened new avenues for inexpensive and low-power cluster computing. In this thesis, we implement a novel cluster built from low-power ARM64 single-board computers and the Disco Python MapReduce execution framework. We use MapReduce to empirically evaluate our cluster by solving the Word Count and Inverted Link Index problems for the Wikipedia article dataset. We benchmark our MapReduce solutions against local solutions of the same algorithms on a conventional low-power x86 platform. We show that our cluster outperforms the conventional platform on larger benchmarks, demonstrating low-power single-board computers as a viable avenue for data-intensive cluster computing.
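The map, shuffle, and reduce phases behind the two benchmark problems can be sketched in plain, single-process Python. This is an illustration under stated assumptions, not the thesis's Disco code: `run_mapreduce`, `wc_map`, `wc_reduce`, `link_map`, and `link_reduce` are hypothetical names, and the link-index input format (a page name paired with its list of outgoing link targets) is assumed.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, Iterator, List, Tuple

# Hypothetical single-process stand-in for a MapReduce framework (not the Disco API).
def run_mapreduce(
    records: Iterable,
    mapper: Callable[[object], Iterator[Tuple[Hashable, object]]],
    reducer: Callable[[Hashable, List[object]], Tuple[Hashable, object]],
) -> Dict[Hashable, object]:
    # Map phase: each record independently emits intermediate (key, value) pairs.
    groups: Dict[Hashable, List[object]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            # Shuffle phase: group values by key (a real cluster partitions keys across nodes).
            groups[key].append(value)
    # Reduce phase: each key group is reduced independently, so reducers could run in parallel.
    return dict(reducer(key, values) for key, values in groups.items())


# Word Count: the mapper emits (word, 1) for each token; the reducer sums the ones.
def wc_map(line: str) -> Iterator[Tuple[str, int]]:
    for word in line.lower().split():
        yield word, 1

def wc_reduce(word: str, ones: List[int]) -> Tuple[str, int]:
    return word, sum(ones)


# Inverted Link Index: for each outgoing link, the mapper emits (target, source);
# the reducer collects the distinct pages that link to each target.
def link_map(page: Tuple[str, List[str]]) -> Iterator[Tuple[str, str]]:
    source, targets = page
    for target in targets:
        yield target, source

def link_reduce(target: str, sources: List[str]) -> Tuple[str, List[str]]:
    return target, sorted(set(sources))


counts = run_mapreduce(["the quick brown fox", "The lazy dog", "the end"], wc_map, wc_reduce)
index = run_mapreduce([("A", ["B", "C"]), ("C", ["B"]), ("B", ["A"])], link_map, link_reduce)
print(counts["the"])  # 3
print(index["B"])     # ['A', 'C']
```

In a real framework the same serial mapper and reducer functions would be shipped to worker nodes and the grouping step would become a distributed shuffle; the user-written functions themselves would not need to change.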
Linear Mode Connectivity in Sparse Neural Networks
With rising interest in sparse neural networks, we study how neural
network pruning with synthetic data leads to sparse networks with unique
training properties. We find that distilled data, a synthetic summarization of
the real data, paired with Iterative Magnitude Pruning (IMP) unveils a new
class of sparse networks that are more stable to SGD noise on the real data
than either the dense model or subnetworks found with real data in IMP. That
is, synthetically chosen subnetworks often train to the same minimum, exhibiting
linear mode connectivity. We study this through linear interpolation, loss
landscape visualizations, and measurements of the diagonal of the Hessian. While
dataset distillation as a field is still young, we find that these properties
lead to synthetic subnetworks matching the performance of traditional IMP with
up to 150x fewer training points in settings where distilled data applies.
Comment: Published in NeurIPS 2023 UniReps Workshop
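Linear mode connectivity between two trained solutions is commonly checked by evaluating the loss along the straight line theta(alpha) = (1 - alpha) * theta_A + alpha * theta_B for alpha in [0, 1] and measuring how far it rises above the linear interpolation of the two endpoint losses. The sketch below uses one common definition of this "loss barrier" on toy losses; the function names, the 21-point grid, and the toy losses are illustrative assumptions, not the paper's experimental setup.

```python
from typing import Callable, List, Sequence

Params = Sequence[float]

def interpolate(theta_a: Params, theta_b: Params, alpha: float) -> List[float]:
    # Point on the straight line between two parameter vectors: (1 - alpha) * a + alpha * b.
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(theta_a, theta_b)]

def loss_barrier(loss_fn: Callable[[Params], float], theta_a: Params, theta_b: Params,
                 num_points: int = 21) -> float:
    # Largest rise of the loss along the path above the straight line joining the
    # endpoint losses; a value near zero suggests the two solutions are linearly connected.
    loss_a, loss_b = loss_fn(theta_a), loss_fn(theta_b)
    barrier = 0.0
    for i in range(num_points):
        alpha = i / (num_points - 1)
        baseline = (1.0 - alpha) * loss_a + alpha * loss_b
        path_loss = loss_fn(interpolate(theta_a, theta_b, alpha))
        barrier = max(barrier, path_loss - baseline)
    return barrier

def double_well(theta: Params) -> float:
    # Toy non-convex loss with two separate minima at w = -1 and w = +1.
    w = theta[0]
    return (w * w - 1.0) ** 2

def bowl(theta: Params) -> float:
    # Toy convex loss: any two points are linearly connected with no barrier.
    return sum(w * w for w in theta)

print(loss_barrier(double_well, [-1.0], [1.0]))     # 1.0 (the two minima are separated)
print(loss_barrier(bowl, [-1.0, 2.0], [1.0, 0.5]))  # 0.0
```

For real networks, `loss_fn` would evaluate the model on data at the interpolated weights; the grid resolution trades evaluation cost against the chance of missing a narrow barrier.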
On the analysis of tuberculosis studies with intermittent missing sputum data
In randomized studies evaluating treatments for tuberculosis (TB), individuals are scheduled to be routinely evaluated for the presence of TB using sputum cultures. One important endpoint in such studies is the time of culture conversion, the first visit at which a patient’s sputum culture is negative and remains negative. This article addresses how to draw inference about treatment effects when sputum cultures are intermittently missing for some patients. We discuss inference under a novel benchmark assumption and under a class of assumptions indexed by a treatment-specific sensitivity parameter that quantifies departures from the benchmark assumption. We motivate and illustrate our approach using data from a randomized trial comparing the effectiveness of two treatments for adult TB patients in Brazil.
Affiliation: Scharfstein, Daniel. Johns Hopkins University; United States
Affiliation: Rotnitzky, Andrea Gloria. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Torcuato Di Tella. Departamento de Economía; Argentina
Affiliation: Abraham, Maria. Statistics Collaborative; United States
Affiliation: McDermott, Aidan. Johns Hopkins University; United States
Affiliation: Chaisson, Richard. Johns Hopkins University; United States
Affiliation: Geiter, Lawrence. Otsuka Novel Products; United States
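The culture-conversion endpoint defined in the abstract above can be made concrete with a short sketch. It assumes each scheduled visit's culture is encoded as `True` (positive), `False` (negative), or `None` (missing); the function names are hypothetical. With complete data the conversion visit is well defined; with intermittently missing cultures, imputing every missing culture as negative or as positive only brackets the answer. This bracketing is a simple illustration of why missingness matters, not the article's benchmark assumption or its sensitivity-parameter analysis.

```python
from typing import Optional, Sequence, Tuple

def conversion_visit(results: Sequence[bool]) -> Optional[int]:
    # First visit index from which every culture is negative (True = positive culture).
    # Returns None when the last culture is positive, i.e. no conversion was observed.
    visit: Optional[int] = None
    for i in range(len(results) - 1, -1, -1):
        if results[i]:
            break
        visit = i
    return visit

def conversion_bounds(results: Sequence[Optional[bool]]) -> Tuple[Optional[int], Optional[int]]:
    # Treating missing cultures (None) as negative gives the earliest possible conversion;
    # treating them as positive gives the latest (None there means possibly no conversion).
    earliest = conversion_visit([False if r is None else r for r in results])
    latest = conversion_visit([True if r is None else r for r in results])
    return earliest, latest

print(conversion_visit([True, True, False, True, False, False]))  # 4
print(conversion_bounds([True, True, None, False, False]))        # (2, 3)
```

A single missing culture between positive and negative results is already enough to make the conversion time ambiguous, which is why an explicit assumption about the missing visits is needed for inference.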
Neuroqueer Literacies in a Physics Context: A Discussion on Changing the Physics Classroom Using a Neuroqueer Literacy Framework
Life experience, identity, and the relationship between ourselves and the world
around us, among other factors, all affect and shape how we, as scientists, construct
knowledge. Neurodiversity, the diversity of minds, is an interesting concept
in this light. Being neurodivergent, or neuroqueer (viewing
neurodivergence as queer, together with the intersection of
neurodiversity and queerness), means having non-neurotypical ways of perceiving
and interacting with the world, and especially of creating knowledge about the
rules and regulations, both natural and societal, that govern it locally and
broadly. Neuroqueer physicists, therefore, have unique non-normative ways of
doing physics, the (societally conducted) study of the rules that govern
the natural world. When teaching neurodivergent students, it is imperative
that we encourage and support this non-normative way of thinking about
physics, help them do physics in ways in which they can succeed, and
support the development of Neuroqueer (Scientific) Literacies, drawing on Kleekamp's
and Smilges's work on literacy. Here we present a brief overview of Neuroqueer
Literacies and how to apply them in the physics classroom.
Comment: 8 pages, 1 figure. Submitted to The Physics Teacher
The TACL model: A framework for safeguarding children with a disability in sport
This study represents the first investigation of how children with a disability can be safeguarded in Rugby Union. In study 1, a questionnaire containing quantitative questions was completed by 389 safeguarding volunteers regarding their experiences of working with a child with a disability in their role. Descriptive statistics revealed that 76% of this sample had worked with a child with a disability in Rugby Union and that 28% continue to do so on a weekly basis. In study 2, a qualitative survey was completed by 329 safeguarding volunteers and interviews were conducted with a geographically representative sample of 14 Safeguarding Officers. This study focused on developing a model of promising practice for safeguarding children with a disability in Rugby Union. Based on an inductive thematic analysis of the qualitative survey and interview data, the TACL model was developed: Trigger (creating a system that sensitively identifies children with a disability), Action Plan (creating an individualized approach so that the child is effectively included and protected), Communicate (ensuring that all key stakeholders are informed about the plan), and Learn (ensuring that cases of good practice are identified and disseminated). The name TACL (pronounced tackle) was chosen to promote proactive strategies and to provide a label relevant to the language of Rugby Union. These strategies are proposed as the basis for the safeguarding of children with a disability.
Validation of the Social Security Death Index (SSDI): An Important Readily-Available Outcomes Database for Researchers
Study Objective: To determine the accuracy of the online Social Security Death Index (SSDI) for determining death outcomes. Methods: We selected 30 patients who were determined to be dead and 90 patients thought to be alive after an emergency department (ED) visit, as determined by a web-based search of the SSDI. For those thought to be dead, we requested death certificates. We then had a research coordinator, blinded to the results of the SSDI search, complete direct follow-up by contacting the patients, their families, or their primary care physicians to determine vital status. To determine the sensitivity and specificity of the SSDI for death at six months in this cohort, we used direct follow-up as the criterion reference and calculated 95% confidence intervals. Results: Direct follow-up was completed for 90% (108 of 120) of the patients. Of those patients, 20 were determined to be dead and 88 alive. The dead were more likely to be male (57%) and older (mean age 83.9 years [95% CI 79.1–88.7] vs. 60.9 years [95% CI 56.4–65.4] for those alive). The sensitivity of the SSDI for those with completed direct follow-up was 100% (95% CI 91–100%), with a specificity of 100% (95% CI 98–100%). Of the 12 patients who could not be contacted through direct follow-up, the SSDI indicated that 10 were dead and two were alive. Conclusions: The SSDI is an accurate measure of death outcomes and appears to have the advantage of identifying deaths among patients lost to follow-up.
UniCat: Crafting a Stronger Fusion Baseline for Multimodal Re-Identification
Multimodal Re-Identification (ReID) is a popular retrieval task that aims to
re-identify objects across diverse data streams, prompting many researchers to
integrate multiple modalities into a unified representation. While such fusion
promises a holistic view, our investigations shed light on potential pitfalls.
We uncover that prevailing late-fusion techniques often produce suboptimal
latent representations when compared to methods that train modalities in
isolation. We argue that this effect is largely due to the inadvertent
relaxation of the training objectives on individual modalities when using
fusion, a phenomenon others have termed modality laziness. We present a nuanced
point of view: this relaxation can lead to certain modalities failing to
fully harness available task-relevant information, yet it also offers a protective
veil to noisy modalities, preventing them from overfitting to task-irrelevant
data. Our findings also show that unimodal concatenation (UniCat) and other
late-fusion ensembling of unimodal backbones, when paired with best-known
training techniques, exceed the current state-of-the-art performance across
several multimodal ReID benchmarks. By unveiling the double-edged sword of
"modality laziness", we motivate future research in balancing local modality
strengths with global representations.
Comment: Accepted at NeurIPS 2023 UniReps, 9 pages, 4 tables
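Unimodal concatenation can be pictured as late fusion at inference time: each modality's embedding comes from its own separately trained backbone, and the fused descriptor is their concatenation, which is then used for nearest-neighbor retrieval. In the sketch below, the per-modality L2 normalization, the squared Euclidean ranking, and the modality names are assumptions for illustration rather than details confirmed by the abstract; the backbones are replaced by fixed toy vectors.

```python
import math
from typing import Dict, List, Sequence

def l2_normalize(v: Sequence[float]) -> List[float]:
    # Scale a vector to unit length so no single modality dominates by magnitude alone.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def unicat(per_modality: Dict[str, Sequence[float]], order: Sequence[str]) -> List[float]:
    # Late fusion: concatenate unimodal embeddings in a fixed modality order.
    fused: List[float] = []
    for modality in order:
        fused.extend(l2_normalize(per_modality[modality]))
    return fused

def rank_gallery(query: Sequence[float], gallery: Dict[str, Sequence[float]]) -> List[str]:
    # Return gallery identities sorted from closest to farthest (squared Euclidean distance).
    def sq_dist(g: Sequence[float]) -> float:
        return sum((q - x) ** 2 for q, x in zip(query, g))
    return sorted(gallery, key=lambda name: sq_dist(gallery[name]))

order = ["rgb", "infrared"]  # hypothetical modality names
query = unicat({"rgb": [1.0, 0.0], "infrared": [0.0, 2.0]}, order)
gallery = {
    "id_a": unicat({"rgb": [0.9, 0.1], "infrared": [0.1, 1.0]}, order),
    "id_b": unicat({"rgb": [0.0, 1.0], "infrared": [1.0, 0.0]}, order),
}
print(rank_gallery(query, gallery))  # ['id_a', 'id_b']
```

Because each backbone is trained on its own objective, no modality can lean on the others during training; this is the property the abstract contrasts with the relaxed objectives of jointly trained fusion.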
A Meta-Narrative Review on the Use of R.O.S.E in Telecytology for the Patient, Pathologist, and Cytologist
Advances in technology have given workers convenient, flexible ways to work remotely. A new tool in laboratory diagnostics is telecytology for Rapid On-Site Evaluation (R.O.S.E.). A cytologist processes a specimen on site and captures an image or video of the findings. The media are sent directly to a pathologist for further evaluation, and a diagnosis is then given to the patient. Because this is a relatively new practice, we need to ask what its advantages and disadvantages are for everyone involved: the patient, the cytologist, and the pathologist. We identified articles published within the last five years and reviewed their methodologies. We found many advantages, including decreased diagnosis time, availability to patients in rural areas, and fewer repeated procedures. We also found disadvantages, such as extensive training requirements and the possibility of incorrect diagnoses. Our findings indicate that telecytology for R.O.S.E. is used successfully, but that it has some shortcomings. As technology continues to advance, we expect more studies to be conducted that highlight the success of telecytology with Rapid On-Site Evaluation.
GraFT: Gradual Fusion Transformer for Multimodal Re-Identification
Object Re-Identification (ReID) is pivotal in computer vision, and demand for
effective multimodal representation learning is escalating. Current models,
although promising, show scalability limitations as the number of modalities grows
because they rely heavily on late fusion, which postpones the integration of
modality-specific insights. To address this, we introduce the Gradual
Fusion Transformer (GraFT) for multimodal ReID. At its core, GraFT employs
learnable fusion tokens that guide self-attention across encoders, adeptly
capturing both modality-specific and object-specific features. Further
bolstering its efficacy, we introduce a novel training paradigm combined with
an augmented triplet loss, optimizing the ReID feature embedding space. We
demonstrate these enhancements through extensive ablation studies and show that
GraFT consistently surpasses established baselines on multimodal ReID benchmarks.
Additionally, aiming for deployment versatility, we integrate neural
network pruning into GraFT, offering a balance between model size and
performance.
Comment: 3 Borderline Reviews at WACV, 8 pages, 5 figures, 8 tables
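The abstract does not specify what GraFT's "augmented" triplet loss adds, so the sketch below shows only the standard hinge-form triplet loss that such a variant presumably extends: an anchor embedding should be closer to a same-identity positive than to a different-identity negative by at least a margin. The margin of 0.3 and the function names are assumptions for illustration, and none of GraFT's augmentations are reproduced.

```python
import math
from typing import Sequence

def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    # Straight-line distance between two embeddings.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 0.3) -> float:
    # Zero once the negative is at least `margin` farther from the anchor than the positive;
    # otherwise the loss grows linearly with the size of the violation.
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

anchor, positive = [0.0, 0.0], [0.0, 1.0]
print(triplet_loss(anchor, positive, [3.0, 4.0]))  # 0.0: the negative is already far enough away
print(triplet_loss(anchor, positive, [0.0, 1.1]))  # about 0.2: the margin is violated
```

In training, this per-triplet loss is averaged over many (anchor, positive, negative) triplets drawn from a batch, which shapes the embedding space used for ReID retrieval.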
Randomized Controlled Trial of Prophylactic Antibiotics for Dog Bites with Refined Cost Model
Reprints available through open access a