Proteus: Simulating the Performance of Distributed DNN Training
DNN models are becoming increasingly larger to achieve unprecedented
accuracy, and the accompanying increased computation and memory requirements
necessitate the employment of massive clusters and elaborate parallelization
strategies to accelerate DNN training. In order to better optimize the
performance and analyze the cost, it is indispensable to model the training
throughput of distributed DNN training. However, complex parallelization
strategies and the resulting complex runtime behaviors make it challenging to
construct an accurate performance model. In this paper, we present Proteus, the
first standalone simulator to model the performance of complex parallelization
strategies through simulation execution. Proteus first models complex
parallelization strategies with a unified representation named Strategy Tree.
Then, it compiles the strategy tree into a distributed execution graph and
simulates the complex runtime behaviors, namely computation-communication
overlap and bandwidth sharing, with a Hierarchical Topo-Aware Executor (HTAE). We finally evaluate
Proteus across a wide variety of DNNs on three hardware configurations.
Experimental results show that Proteus achieves average prediction
error and preserves order for training throughput of various parallelization
strategies. Compared to state-of-the-art approaches, Proteus reduces prediction
error by up to
Uptake, sequestration and tolerance of cadmium at cellular levels in the hyperaccumulator plant species Sedum alfredii.
Sedum alfredii is one of a few plant species known to hyperaccumulate cadmium (Cd). Uptake, localization, and tolerance of Cd at cellular levels in shoots were compared in hyperaccumulating (HE) and non-hyperaccumulating (NHE) ecotypes of Sedum alfredii. X-ray fluorescence images of Cd in stems and leaves showed only a slight Cd signal restricted within vascular bundles in the NHEs, while enhanced localization of Cd, with significant tissue- and age-dependent variations, was detected in HEs. In contrast to the vascular-enriched Cd in young stems, parenchyma cells in leaf mesophyll, stem pith and cortex tissues served as terminal storage sites for Cd sequestration in HEs. Kinetics of Cd transport into individual leaf protoplasts of the two ecotypes showed little difference in Cd accumulation. However, far more efficient storage of Cd in vacuoles was apparent in HEs. Subsequent analysis of cell viability and hydrogen peroxide levels suggested that HE protoplasts exhibited higher resistance to Cd than NHE protoplasts. These results suggest that efficient sequestration into vacuoles, as opposed to rapid transport into parenchyma cells, is a pivotal process in Cd accumulation and homeostasis in shoots of HE S. alfredii. This is in addition to its efficient root-to-shoot translocation of Cd.
Optimizing Video Object Detection via a Scale-Time Lattice
High-performance object detection relies on expensive convolutional networks
to compute features, often leading to significant challenges in applications,
e.g. those that require detecting objects from video streams in real time. The
key to this problem is to trade accuracy for efficiency in an effective way,
i.e. reducing the computing cost while maintaining competitive performance. To
seek a good balance, previous efforts usually focus on optimizing the model
architectures. This paper explores an alternative approach, that is, to
reallocate the computation over a scale-time space. The basic idea is to
perform expensive detection sparsely and propagate the results across both
scales and time with substantially cheaper networks, by exploiting the strong
correlations among them. Specifically, we present a unified framework that
integrates detection, temporal propagation, and across-scale refinement on a
Scale-Time Lattice. On this framework, one can explore various strategies to
balance performance and cost. Taking advantage of this flexibility, we further
develop an adaptive scheme with the detector invoked on demand and thus obtain
an improved tradeoff. On the ImageNet VID dataset, the proposed method can achieve a
competitive mAP of 79.6% at 20 fps, or 79.0% at 62 fps as a performance/speed
tradeoff. Comment: Accepted to CVPR 2018. Project page:
http://mmlab.ie.cuhk.edu.hk/projects/ST-Lattice
Quantum Image Processing and Its Application to Edge Detection: Theory and Experiment
Processing of digital images is continuously gaining in volume and relevance,
with concomitant demands on data storage, transmission and processing power.
Encoding the image information in quantum-mechanical systems instead of
classical ones and replacing classical with quantum information processing may
alleviate some of these challenges. By encoding and processing the image
information in quantum-mechanical systems, we here demonstrate the framework of
quantum image processing, where a pure quantum state encodes the image
information: we encode the pixel values in the probability amplitudes and the
pixel positions in the computational basis states. Our quantum image
representation reduces the required number of qubits compared to existing
implementations, and we present image processing algorithms that provide
exponential speed-up over their classical counterparts. For the commonly used
task of detecting the edge of an image, we propose and implement a quantum
algorithm that completes the task with only one single-qubit operation,
independent of the size of the image. This demonstrates the potential of
quantum image processing for highly efficient image and video processing in the
big data era. Comment: 13 pages, including 9 figures and 5 appendixes
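The encoding described in this abstract (pixel values in the probability amplitudes, pixel positions in the computational basis states) can be illustrated with a small classical simulation. The sketch below is not the authors' implementation: the function names are hypothetical, and it assumes that the "one single-qubit operation" is a Hadamard gate on the least significant qubit, which turns each pair of neighbouring amplitudes into their sum and difference. Under that assumption only boundaries inside pixel pairs (2k, 2k+1) show up in the difference terms.

```python
import numpy as np

def amplitude_encode(image):
    """Flatten an image into a unit-norm state vector (classical simulation).

    Pixel k becomes the amplitude of basis state |k>; the vector is
    zero-padded to a power-of-two length so that it fits n qubits.
    """
    pixels = np.asarray(image, dtype=float).ravel()
    n_qubits = max(1, int(np.ceil(np.log2(pixels.size))))
    padded = np.zeros(2 ** n_qubits)
    padded[:pixels.size] = pixels
    norm = np.linalg.norm(padded)
    if norm == 0:
        raise ValueError("image must contain at least one non-zero pixel")
    return padded / norm, n_qubits

def hadamard_on_last_qubit(state):
    """Apply I (x) H, i.e. a Hadamard on the least significant qubit.

    Assumption: this stands in for the single-qubit operation mentioned
    in the abstract. Amplitudes (c[2k], c[2k+1]) are mapped to
    (c[2k] + c[2k+1], c[2k] - c[2k+1]) / sqrt(2).
    """
    h = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
    pairs = state.reshape(-1, 2)
    return (pairs @ h.T).ravel()

if __name__ == "__main__":
    # A 2x4 toy image with a jump from 2 to 6 inside the second pixel pair.
    state, n_qubits = amplitude_encode([[2, 2, 2, 6], [6, 6, 6, 6]])
    out = hadamard_on_last_qubit(state)
    # Odd-indexed amplitudes hold the pairwise differences.
    print(n_qubits, np.round(out[1::2], 3))
```

An 8-pixel image needs only 3 qubits here, which illustrates the logarithmic qubit count such an amplitude encoding gives. Detecting boundaries that fall between pairs would need an additional shifted copy of the image, which this sketch leaves out.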
Rapid assessment of T-cell receptor specificity of the immune repertoire
Accurate assessment of T-cell-receptor (TCR)–antigen specificity across the whole immune repertoire lies at the heart of improved cancer immunotherapy, but predictive models capable of high-throughput assessment of TCR–peptide pairs are lacking. Recent advances in deep sequencing and crystallography have enriched the data available for studying TCR–peptide systems. Here, we introduce RACER, a pairwise energy model capable of rapid assessment of TCR–peptide affinity for entire immune repertoires. RACER applies supervised machine learning to efficiently and accurately resolve strong TCR–peptide binding pairs from weak ones. The trained parameters further enable a physical interpretation of interacting patterns encoded in each TCR–peptide system. When applied to simulate thymic selection of a major-histocompatibility-complex (MHC)-restricted T-cell repertoire, RACER accurately estimates recognition rates for tumor-associated neoantigens and foreign peptides, thus demonstrating its utility in helping address the computational challenge of reliably identifying properties of tumor antigen-specific T-cells at the level of an individual patient’s immune repertoire.
Active YAP promotes pancreatic cancer cell motility, invasion and tumorigenesis in a mitotic phosphorylation-dependent manner through LPAR3.
The transcriptional co-activator Yes-associated protein, YAP, is a main effector in the Hippo tumor suppressor pathway. We recently defined a mechanism for positive regulation of YAP through CDK1-mediated mitotic phosphorylation. Here, we show that active YAP promotes pancreatic cancer cell migration, invasion and anchorage-independent growth in a mitotic phosphorylation-dependent manner. Mitotic phosphorylation is essential for YAP-driven tumorigenesis in animals. YAP reduction significantly impairs cell migration and invasion. Immunohistochemistry shows significant upregulation and nuclear localization of YAP in metastases when compared with primary tumors and normal tissue in humans. Mitotic phosphorylation of YAP controls a unique transcriptional program in pancreatic cells. Expression profiles reveal LPAR3 (lysophosphatidic acid receptor 3) as a mediator for mitotic phosphorylation-driven pancreatic cell motility and invasion. Together, this work identifies YAP as a novel regulator of pancreatic cancer cell motility, invasion and metastasis, and as a potential therapeutic target for invasive pancreatic cancer.
Clinical Study Efficacy of Combined Laparoscopic and Hysteroscopic Repair of Post-Cesarean Section Uterine Diverticulum: A Retrospective Analysis
Background. Diverticulum, one of the long-term sequelae of cesarean section, can cause abnormal uterine bleeding and increase the risk of uterine scar rupture. In this study, we aimed to evaluate the efficacy of combined laparoscopic and hysteroscopic repair, a newly emerging method for treating post-cesarean section uterine scar diverticulum. Methods. Data relating to 40 patients with post-cesarean section uterine diverticulum who underwent combined laparoscopic and hysteroscopic repair were retrospectively analyzed. Preoperative clinical manifestations, size of uterine defects, thickness of the lower uterine segment (LUS), and duration of menstruation were compared with follow-up findings at 1, 3, and 6 months after surgery. Results. The average preoperative length and width of uterine diverticula and thickness of the lower uterine segment were recorded and analyzed. The average durations of menstruation at 1, 3, and 6 months after surgery were each significantly shorter than the preoperative duration (P < 0.05). At 6 months after surgery, the overall improvement rate of surgery was 90% (36/40). Three patients (3/40, 7.5%) showed partial improvement, and one patient (1/40, 2.5%) was lost to follow-up. Conclusions. Our findings showed that combined treatment with laparoscopy and hysteroscopy was an effective method for the repair of post-cesarean section uterine diverticulum.