Leveraging Coding Techniques for Speeding up Distributed Computing
Large scale clusters leveraging distributed computing frameworks such as
MapReduce routinely process data on the order of petabytes or more.
The sheer size of the data precludes the processing of the data on a single
computer. The philosophy in these methods is to partition the overall job into
smaller tasks that are executed on different servers; this is called the map
phase. This is followed by a data shuffling phase where appropriate data is
exchanged between the servers. The final, so-called reduce phase completes the
computation.
One potential approach, explored in prior work for reducing the overall
execution time, is to exploit a natural tradeoff between computation and
communication. Specifically, the idea is to run redundant copies of map tasks
that are placed on judiciously chosen servers. The shuffle phase exploits the
location of the nodes and utilizes coded transmission. The main drawback of
this approach is that it requires the original job to be split into a number of
map tasks that grows exponentially in the system parameters. This is
problematic, as we demonstrate that splitting jobs too finely can in fact
adversely affect the overall execution time.
In this work we show that one can simultaneously obtain low communication
loads while ensuring that jobs do not need to be split too finely. Our approach
uncovers a deep relationship between this problem and a class of combinatorial
structures called resolvable designs. Appropriate interpretation of resolvable
designs can allow for the development of coded distributed computing schemes
where the splitting levels are exponentially lower than prior work. We present
experimental results obtained on Amazon EC2 clusters for a widely known
distributed algorithm, namely TeraSort. We obtain over a 4.69x improvement in
speedup over the baseline approach and more than a 2.6x improvement over the
current state of the art.
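To make the coded shuffle idea concrete, here is a toy sketch under an assumed 3-server, 3-file setup (this is not the paper's resolvable-design construction): because map tasks are replicated across servers, one XOR-coded multicast can serve two servers at once, each of which cancels the part it already computed locally.

```python
# Toy illustration of a coded shuffle exchange (assumed setup, not the
# resolvable-design scheme of the paper): 3 servers, 3 input files; server s is
# responsible for reduce task s and maps every file except file s.
import numpy as np

def intermediate(task, file):
    # Deterministic stand-in for the map output v[task][file]; every server
    # that mapped `file` can compute this value locally.
    return np.random.default_rng(10 * task + file).integers(0, 256, 16, dtype=np.uint8)

# Shuffle phase, one coded multicast: server 0 mapped files 1 and 2, so it can
# form v[1][1] XOR v[2][2] and send a single packet to servers 1 and 2.
packet = intermediate(1, 1) ^ intermediate(2, 2)

# Server 1 is missing v[1][1] (it never mapped file 1) and cancels v[2][2],
# which it computed locally from file 2; server 2 does the reverse.
v11_at_server1 = packet ^ intermediate(2, 2)
v22_at_server2 = packet ^ intermediate(1, 1)

assert np.array_equal(v11_at_server1, intermediate(1, 1))
assert np.array_equal(v22_at_server2, intermediate(2, 2))
# One multicast delivered two distinct missing values, halving the shuffle
# traffic of this exchange relative to two separate unicasts.
```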
Real Analysis in Functional Equations
In this article, we will showcase some analytical concepts that can be used
to tackle Functional Equations (FE) over the domain of positive real numbers.
Such concepts and related techniques have occasionally appeared in recent High
School Math Olympiads, and they often combine neatly with other known
techniques. In each section, we develop a theoretical background; next, we
briefly mention methods that employ the theory, and we conclude the article by
providing unsolved problems that the reader can try independently.
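As an illustration of the kind of analytic argument the article describes (this standard example is not taken from the article itself), monotonicity can pin down a multiplicative Cauchy-type equation on the positive reals:

```latex
\textbf{Example.} Let $f:\mathbb{R}^{+}\to\mathbb{R}^{+}$ be monotone with
\[
  f(xy) = f(x)\,f(y) \qquad \text{for all } x, y > 0 .
\]
Setting $g(t) = \ln f(e^{t})$ turns this into Cauchy's equation
$g(s+t) = g(s) + g(t)$ on $\mathbb{R}$, with $g$ monotone. Additivity gives
$g(qt) = q\,g(t)$ for every rational $q$, and monotonicity extends this to all
real multipliers, so $g(t) = ct$ for some constant $c$. Hence $f(x) = x^{c}$
for all $x > 0$.
```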
Genome-Wide Proteomics and Quantitative Analyses on Halophilic Archaea
The aerobic, haloalkaliphilic archaeon Natronomonas pharaonis is able to survive in salt-saturated lakes of pH 11. With genome-wide shotgun proteomics, 886 soluble proteins (929 proteins in total) of the theoretical Natronomonas pharaonis soluble proteome of 2187 proteins have been confidently identified by MS/MS. Comparing the identified proteins of Natronomonas pharaonis with homologues from other organisms revealed both extreme diversity between halophiles and occasional extraordinary sequence conservation with proteins from unrelated species, substantiating genetic exchange between evolutionarily nearly unrelated organisms as a means of coping with several extreme conditions. Alternative and largely overlapping open reading frames (so-called overprinting) could not be identified in the genomes of either Natronomonas pharaonis or Halobacterium salinarum, leading to the conclusion that in halophiles no more than one protein is produced from the same genomic sequence stretch.
In the second part of this work, analyses at both the transcriptional and translational levels have been performed on the halophilic archaeon Halobacterium salinarum to gain insights into the lifestyle changes underlying its cellular response to heat shock. To this end, quantitative proteomic data obtained from two approaches differing in the labeling method (ICPL; SILAC), the fractionation of the protein or peptide mixtures (2DE; 1DE-LC), the mass spectrometric analysis (MALDI-TOF/TOF; ESI Q-TOF), and the choice of growth medium (complex; synthetic) were integrated with data from whole-genome DNA microarrays, real-time quantitative PCR (RTqPCR), and Northern analyses.
A number of genes congruently displayed substantial induction after heat shock at both the transcript and protein levels, among them the thermosome, two AAA-type ATPases, a Dps-like ferritin protein (DpsA), an hsp5-type molecular chaperone, and the transcription initiation factor tfbB. In contrast, the dnaK operon (hsp70) did not exhibit any significant upregulation in either approach. Some genes encoding enzymes of the TCA cycle, of pathways feeding into it, and of pathways leading to pyrimidine synthesis were only translationally induced. Finally, differential transcriptional induction of the transcription initiation factors tfbB and tfbA, determined by RTqPCR, led to the conclusion that they may regulate genes by reciprocal action.
The multiple proteomics and transcriptomics methods complement one another, broadening coverage on the one hand and corroborating some unexpected findings on the other.
Numerical analysis of coalescence-induced jumping droplets on superhydrophobic surfaces
Bio-inspired superhydrophobic surfaces are used in numerous technological applications due to their self-cleaning ability. One of the several mechanisms reported in the literature as responsible for self-cleaning is the phenomenon of coalescence-induced jumping of droplets from such surfaces. The phenomenon is observed at scales below the capillary length, where gravity is negligible. Primary applications of this technology are heat exchangers and other devices with surfaces for which anti-icing and water-repellency properties are desired. This thesis comprises two publications that involve high-fidelity numerical investigations of fundamental features of the jumping-droplet phenomenon and focuses on two important aspects. The first is a study of the coalescence and jumping of microdroplets (R < 10 µm). The differences in the jumping process (for example, the reduction of the merged droplet's jumping velocity) are characterized as a function of the initial droplet size. Through an analysis of the energy budget, varying degrees of dissipation are found, which are attributed to a competition between viscosity and the strong capillary forces at the interface. The second publication focuses on the interaction of the merged droplet with a superhydrophobic surface exhibiting contact angle hysteresis. It is found that the jumping velocity in this case is reduced compared to the no-hysteresis case. Using a dynamic contact angle model helps capture the receding contact angle and provides a more accurate estimate of the overall process. In this work, a combined Immersed Boundary and Volume-of-Fluid method with different contact angle models and a Navier-slip boundary condition is used. The numerical framework has been extensively validated.
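For context (standard scaling background from the jumping-droplet literature, not a result of the thesis), the competition between capillarity and viscosity mentioned above is usually framed in terms of the capillary-inertial velocity and the Ohnesorge number:

```latex
% Capillary-inertial velocity scale for two coalescing droplets of radius R,
% surface tension \sigma and density \rho:
\[
  u_{ci} = \sqrt{\frac{\sigma}{\rho R}},
\]
% and the Ohnesorge number measuring viscous effects relative to capillarity:
\[
  \mathrm{Oh} = \frac{\mu}{\sqrt{\rho\,\sigma R}} .
\]
% Since Oh grows as R shrinks, a larger fraction of the released surface energy
% is dissipated for microdroplets, consistent with the reduced jumping
% velocities discussed above.
```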
Erasure coding for distributed matrix multiplication for matrices with bounded entries
Distributed matrix multiplication is widely used in several scientific
domains. It is well recognized that computation times on distributed clusters
are often dominated by the slowest workers (called stragglers). Recent work has
demonstrated that straggler mitigation can be viewed as a problem of designing
erasure codes. For matrices A and B, the technique essentially maps the
computation of the product of A and B into the multiplication of smaller
(coded) submatrices. The stragglers are treated as
erasures in this process. The computation can be completed as long as a certain
number of workers (called the recovery threshold) complete their assigned
tasks.
We present a novel coding strategy for this problem when the absolute values
of the matrix entries are sufficiently small. We demonstrate a tradeoff between
the assumed absolute value bounds on the matrix entries and the recovery
threshold. At one extreme, we are optimal with respect to the recovery
threshold, and at the other extreme, we match the threshold of prior work.
Experimental results on cloud-based clusters validate the benefits of our
method.
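For orientation, below is a minimal numpy sketch of the baseline polynomial-code idea this line of work builds on, recovering A^T B (a common formulation); it is not the paper's bounded-entry scheme, and the block counts, evaluation points, and straggler simulation are illustrative assumptions.

```python
# Minimal sketch of a polynomial code for straggler-resilient distributed
# multiplication (illustrative assumptions throughout; not the paper's
# bounded-entry construction). Target: A^T B, recoverable from any p*q workers.
import numpy as np

rng = np.random.default_rng(0)
t, r, w = 60, 8, 6                       # A is t x r, B is t x w
p, q = 2, 3                              # column splits of A and B
A, B = rng.standard_normal((t, r)), rng.standard_normal((t, w))
A_blocks = np.split(A, p, axis=1)        # each block is t x (r/p)
B_blocks = np.split(B, q, axis=1)        # each block is t x (w/q)

n_workers, threshold = 8, p * q          # recovery threshold = p*q
xs = np.linspace(-1.0, 1.0, n_workers)   # distinct evaluation points

def encode(x):
    # Coded inputs for the worker at point x:
    # A~(x) = sum_i A_i x^i,  B~(x) = sum_j B_j x^(j*p)
    A_tilde = sum(Ai * x**i for i, Ai in enumerate(A_blocks))
    B_tilde = sum(Bj * x**(j * p) for j, Bj in enumerate(B_blocks))
    return A_tilde, B_tilde

# Each worker multiplies its small coded matrices; any `threshold` of them
# suffice, so simulate stragglers by keeping a random subset of that size.
results = {}
for k in range(n_workers):
    A_t, B_t = encode(xs[k])
    results[k] = A_t.T @ B_t             # an (r/p) x (w/q) coded product
finished = sorted(rng.choice(n_workers, size=threshold, replace=False))

# Worker k returned sum_m C_m x_k^m with C_{i+j*p} = A_i^T B_j, so the
# coefficient blocks follow from a Vandermonde solve (erasure decoding).
V = np.vander(xs[finished], threshold, increasing=True)
R = np.stack([results[k].ravel() for k in finished])
C = np.linalg.solve(V, R).reshape(threshold, r // p, w // q)

# Reassemble A^T B from the decoded blocks.
AtB = np.block([[C[i + j * p] for j in range(q)] for i in range(p)])
assert np.allclose(AtB, A.T @ B)
```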
CAMR: Coded Aggregated MapReduce
Many big data algorithms executed on MapReduce-like systems have a shuffle phase that often dominates the overall job execution time. Recent work has demonstrated schemes where the communication load in the shuffle phase can be traded off against the computation load in the map phase. In this work, we focus on a class of distributed algorithms, broadly used in deep learning, where intermediate computations of the same task can be combined. Even though prior techniques reduce the communication load significantly, they require a number of jobs that grows exponentially in the system parameters. This limitation is crucial and may diminish the load gains as the algorithm scales. We propose a new scheme that achieves the same load as the state of the art while ensuring that the number of jobs, as well as the number of subfiles into which the data set needs to be split, remains small.
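As a deliberately simplified illustration of the property being exploited (an assumed setup, not the CAMR construction): when the reduce operation is a sum, a mapper can pre-combine the intermediate values it produced for the same reduce task and ship one aggregate instead of one value per file.

```python
# Minimal sketch of intermediate-value aggregation (assumed setup, not the CAMR
# construction): when the reduce operation is a sum, each mapper can pre-sum
# its own contributions per reduce task before the shuffle.
import numpy as np

rng = np.random.default_rng(2)
num_files, num_reducers, dim = 6, 3, 4
# v[f, t] is the intermediate vector produced from file f for reduce task t
# (think of a partial gradient); random here purely for illustration.
v = rng.standard_normal((num_files, num_reducers, dim))

# Partition files among mappers (disjoint here for simplicity; replicated
# placements would change the bookkeeping, not the aggregation idea).
mappers = {0: [0, 1], 1: [2, 3], 2: [4, 5]}
t = 1                                          # inspect one reduce task
aggregates = {m: v[files, t].sum(axis=0) for m, files in mappers.items()}

# One message per mapper instead of one per file, with no loss of information:
messages_without_aggregation = num_files
messages_with_aggregation = len(mappers)
assert np.allclose(sum(aggregates.values()), v[:, t].sum(axis=0))
print(messages_without_aggregation, "messages reduced to", messages_with_aggregation)
```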
Robotic Hiatal Hernia Repair
Robotic surgery has revolutionized medicine during the last 16 years by transforming classic operating theaters into computer-mediated workstations. Numerous procedures have been proved feasible and safe using the various, continuously evolving robotic platforms. From the early beginnings of this revolution, challenging operations, such as those concerning the gastroesophageal junction, especially in super-obese patients or during redo operations, have proved to have certain benefits when performed robotically, both for patients and for surgeons.