Deep Regionlets for Object Detection
In this paper, we propose a novel object detection framework named "Deep
Regionlets" by establishing a bridge between deep neural networks and
conventional detection schema for accurate generic object detection. Motivated
by the abilities of regionlets for modeling object deformation and multiple
aspect ratios, we incorporate regionlets into an end-to-end trainable deep
learning framework. The deep regionlets framework consists of a region
selection network and a deep regionlet learning module. Specifically, given a
detection bounding box proposal, the region selection network provides guidance
on where to select regions to learn the features from. The regionlet learning
module focuses on local feature selection and transformation to alleviate local
variations. To this end, we first realize non-rectangular region selection
within the detection framework to accommodate variations in object appearance.
Moreover, we design a "gating network" within the regionlet learning module to
enable soft regionlet selection and pooling. The Deep Regionlets framework is
trained end-to-end without additional efforts. We perform ablation studies and
conduct extensive experiments on the PASCAL VOC and Microsoft COCO datasets.
The proposed framework outperforms state-of-the-art algorithms, such as
RetinaNet and Mask R-CNN, even without additional segmentation labels.
Comment: Accepted to ECCV 201
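The soft regionlet selection and pooling enabled by the gating network can be illustrated with a tiny gated-pooling sketch. The sigmoid gate, the feature shapes, and the sum pooling here are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

def soft_regionlet_pool(region_feats, gate_logits):
    """Conceptual sketch of soft region selection via a gating network.

    Each candidate region's feature vector is weighted by a gate in (0, 1)
    (a sigmoid of a hypothetical gating score) and the weighted features are
    summed -- a differentiable alternative to hard region selection.
    """
    gates = 1.0 / (1.0 + np.exp(-np.asarray(gate_logits)))  # soft selection weights
    feats = np.asarray(region_feats)
    return (gates[:, None] * feats).sum(axis=0)             # gated sum pooling

# With strongly positive/negative gating scores, the first region dominates.
feats = [[1.0, 0.0], [0.0, 1.0]]
pooled = soft_regionlet_pool(feats, [10.0, -10.0])
assert pooled[0] > 0.99 and pooled[1] < 0.01
```

Because the gates are smooth functions of the logits, region selection stays differentiable and can be trained end-to-end along with the rest of the network.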
Correcting Soft Errors Online in Fast Fourier Transform
While many algorithm-based fault tolerance (ABFT) schemes have been proposed to detect soft errors offline in the fast Fourier transform (FFT) after the computation finishes, none of the existing ABFT schemes detect soft errors online before the computation finishes. This paper presents an online ABFT scheme for the FFT so that soft errors can be detected online and the corrupted computation can be terminated in a much more timely manner. We also extend our scheme to tolerate both arithmetic errors and memory errors, develop strategies to reduce its fault tolerance overhead and improve its numerical stability and fault coverage, and finally incorporate it into the widely used FFTW library - one of today's fastest FFT software implementations. Experimental results demonstrate that: (1) the proposed online ABFT scheme introduces much lower overhead than the existing offline ABFT schemes; (2) it detects errors in a much more timely manner; and (3) it also has higher numerical stability and better fault coverage.
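The checksum idea behind ABFT for the FFT can be illustrated with Parseval's theorem: for NumPy's unnormalized forward FFT, the input energy Σ|x|² must equal (1/N)·Σ|X|², so a mismatch signals a soft error. This is a minimal offline sketch of the general checksum approach, not the paper's online scheme:

```python
import numpy as np

def fft_with_parseval_check(x, tol=1e-8):
    """Compute an FFT and verify it with a Parseval-style checksum.

    Returns the transform and a flag indicating whether the energy
    checksum held (i.e., no soft error was detected).
    """
    X = np.fft.fft(x)
    energy_in = np.sum(np.abs(x) ** 2)
    energy_out = np.sum(np.abs(X) ** 2) / len(x)   # Parseval: should match
    ok = abs(energy_in - energy_out) <= tol * max(energy_in, 1.0)
    return X, ok

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)

X, ok = fft_with_parseval_check(x)
assert ok  # a clean transform passes the checksum

# Simulate a soft error corrupting one output element: the checksum breaks.
X[17] += 1000.0
energy_in = np.sum(np.abs(x) ** 2)
energy_out = np.sum(np.abs(X) ** 2) / len(x)
assert abs(energy_in - energy_out) > 1e-8 * energy_in
```

An online scheme, as proposed in the paper, performs such checks during the butterfly stages rather than only after the full transform, so a corrupted run can be aborted early.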
Association of Intraoperative Hypotension with Acute Kidney Injury after Noncardiac Surgery in Patients Younger than 60 Years Old
Background/Aims: Intraoperative hypotension (IOH) may be associated with surgery-related acute kidney injury (AKI). However, the duration of hypotension that triggers AKI is poorly understood. The incidence of AKI with various durations of IOH and mean arterial pressures (MAPs) was investigated. Materials: A retrospective cohort study of 4,952 patients undergoing noncardiac surgery (2011 to 2016) with MAP monitoring and a length of stay of one or more days was performed. The exclusion criteria were a preoperative estimated glomerular filtration rate (eGFR) ≤60 mL/min/1.73 m², a preoperative MAP less than 65 mm Hg, dialysis dependence, urologic surgery, age older than 60 years, and a surgical duration of less than 60 min. The primary exposure was IOH, and the primary outcome was AKI (a 50% or 0.3 mg/dL increase in creatinine) during the first 7 postoperative days. Multivariable logistic regression was used to model the exposure-outcome relationship. Results: AKI occurred in 186 (3.76%) noncardiac surgery patients. The adjusted odds ratio for surgery-related AKI at a MAP of less than 55 mm Hg was 14.11 (95% confidence interval: 5.02–39.69) for an exposure of more than 20 min. Age was not an interaction factor between AKI and IOH. Conclusion: There was a considerably increased risk of postoperative AKI when intraoperative MAP was less than 55 mm Hg for more than 10 min. Strict blood pressure management is recommended even for patients younger than 60 years old.
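The exposure in this study (cumulative time with MAP below a threshold) can be sketched as a small helper. The fixed one-minute sampling interval and the function name are illustrative assumptions, not the study's actual monitoring protocol:

```python
def minutes_below_threshold(map_readings, threshold_mmhg=55, interval_min=1.0):
    """Cumulative intraoperative time (minutes) spent below a MAP threshold.

    Hypothetical helper: assumes MAP is sampled at a fixed interval
    (here once per minute); real monitoring cadence and episode
    definitions may differ.
    """
    return sum(interval_min for m in map_readings if m < threshold_mmhg)

# Example: 21 of these 25 one-minute readings are below 55 mm Hg, which
# would place a patient in the > 20 min exposure group.
readings = [70, 68, 60] + [52] * 21 + [65]
assert minutes_below_threshold(readings) == 21
```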
Exploring Interprocess Techniques for High-Performance MPI Communication
In the exascale computing era, applications run at larger scales than ever before, which raises the scalability requirements for communication library design. The Message Passing Interface (MPI) is widely adopted by parallel applications for interprocess communication, and communication performance can significantly impact the overall performance of applications, especially at large scale.
Many aspects of MPI communication need to be explored to achieve the maximal message rate and network throughput. Communication load balance is essential for high-performance applications: unbalanced communication can cause severe performance degradation, even in computation-balanced Bulk Synchronous Parallel (BSP) applications. Yet the MPI communication imbalance issue has not been investigated as thoroughly as computational load balance. Since communication is not fully controlled by application developers, designing communication-balanced applications is challenging because of the diverse communication implementations in the underlying runtime system.
In addition, MPI provides nonblocking point-to-point and one-sided communication models in which asynchronous progress is required to guarantee the completion of MPI communication and to achieve better communication-computation overlap. Traditional mechanisms either spawn an additional background thread on each MPI process or launch a fixed number of helper processes on each node. For complex multiphase applications, unfortunately, these mechanisms can cause severe performance degradation because communication characteristics change dynamically.
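The traditional background-progress-thread mechanism mentioned above can be sketched conceptually. This Python threading stand-in (a helper thread draining a queue of pending "communication" tasks while the main thread computes) is an illustrative analogy, not MPI runtime code:

```python
import queue
import threading

def progress_thread_demo():
    """Sketch: a dedicated helper thread drives pending communication
    tasks to completion while the main thread keeps computing -- the
    overlap that nonblocking/one-sided MPI models need."""
    pending = queue.Queue()
    completed = []
    stop = threading.Event()

    def progress_loop():
        # Keep polling until told to stop AND the queue is drained.
        while not stop.is_set() or not pending.empty():
            try:
                task = pending.get(timeout=0.01)
            except queue.Empty:
                continue
            completed.append(task())  # "advance" one communication

    helper = threading.Thread(target=progress_loop)
    helper.start()

    # Main thread posts nonblocking "sends" and immediately computes on.
    for i in range(4):
        pending.put(lambda i=i: f"msg-{i} delivered")
    acc = sum(range(1000))  # overlapping computation

    stop.set()
    helper.join()
    return acc, sorted(completed)

acc, done = progress_thread_demo()
assert len(done) == 4  # all communications completed in the background
```

The cost this model pays, as the text notes, is a core (or oversubscribed thread) permanently dedicated to progress even when there is little communication to advance, which is what dynamic approaches aim to avoid.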
On the other hand, as the number of CPU cores and nodes used by applications grows, even small-message MPI collectives can incur huge communication overhead at large scale if they are not carefully designed. Existing MPI collective algorithms have been hierarchically designed to saturate inter-node network bandwidth for maximal communication performance. Meanwhile, advanced shared memory techniques such as XPMEM, KNEM, and CMA have been adopted to accelerate intra-node MPI collective communication. Unfortunately, these studies mainly focus on large-message collective optimization, leaving small- and medium-message MPI collectives suboptimal, and they cannot achieve optimal performance due to the limitations of the shared memory techniques.
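The hierarchical collective structure described above (reduce inside each node over cheap shared memory, exchange only once per node over the network, then broadcast back within the node) can be sketched in plain Python over a simulated process grid. This shows the algorithmic shape only, not an MPI implementation:

```python
def hierarchical_allreduce(values_per_node):
    """Simulate a hierarchical sum-allreduce.

    `values_per_node` is a list of nodes, each a list of per-process
    values. Step 1: intra-node reduce (shared-memory traffic only).
    Step 2: node leaders reduce across nodes (the only inter-node step).
    Step 3: leaders broadcast the global result back inside each node.
    """
    node_sums = [sum(node) for node in values_per_node]            # step 1
    global_sum = sum(node_sums)                                    # step 2
    return [[global_sum] * len(node) for node in values_per_node]  # step 3

# 3 nodes with 4, 2, and 3 processes: every process ends with the global sum.
grid = [[1, 2, 3, 4], [5, 6], [7, 8, 9]]
result = hierarchical_allreduce(grid)
assert all(v == 45 for node in result for v in node)
```

The payoff is that the expensive inter-node step involves one value per node rather than one per process, which is why such designs saturate network bandwidth well for large messages; the thesis targets the small- and medium-message regime these designs leave suboptimal.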
To solve these issues, we first present CAB-MPI, an MPI implementation that can identify idle processes inside MPI and use these idle resources to dynamically balance communication workload on the node. We design throughput-optimized strategies to ensure efficient stealing of data movement tasks. Experimental results with a set of microbenchmarks and proxy applications on Intel Xeon and Xeon Phi platforms show the benefits of CAB-MPI for several internal processes in MPI, including intra-node data transfer, pack/unpack for noncontiguous communication, and computation in one-sided accumulates. Then, we propose a novel Dynamic Asynchronous Progress Stealing model (Daps) to fully address the asynchronous progress complication. Daps is implemented inside the MPI runtime and dynamically leverages idle MPI processes to steal communication progress tasks from busy computing processes located on the same node. We compare Daps with state-of-the-art asynchronous progress approaches using both microbenchmarks and HPC proxy applications; the results show that Daps outperforms the baselines and reduces idleness during asynchronous communication. Finally, to further improve MPI collective performance, we propose the Process-in-Process based Multiobject Interprocess MPI Collective (PiP-MColl) design to maximize small- and medium-message MPI collective performance at large scale. Different from previous studies, PiP-MColl is designed with efficient multiple-sender and multiple-receiver collective algorithms and adopts the Process-in-Process shared memory technique to avoid unnecessary system call and page fault overhead, achieving the best intra- and inter-node message rate and throughput. We focus on three widely used MPI collectives, MPI_Scatter, MPI_Allgather, and MPI_Allreduce, and apply PiP-MColl to them. Our microbenchmark and real-world HPC application results show that PiP-MColl significantly improves collective performance at large scale compared with the baseline PiP-MPICH and other widely used MPI libraries such as OpenMPI, MVAPICH2, and Intel MPI.
On the Difference Between Shared Memory and Shared Address Space in HPC Communication
Shared memory mechanisms, e.g., POSIX shmem or XPMEM, are widely used to implement efficient intra-node communication among processes running on the same node. While POSIX shmem allows other processes to access only newly allocated memory, XPMEM allows accessing any existing data and thus enables more efficient communication, because the send buffer content can be copied directly to the receive buffer. Recently, the shared address space model has been proposed, where processes on the same node are mapped into the same address space at process creation time, allowing processes to access any data in the shared address space. Process-in-Process (PiP) is an implementation of this mechanism. The functionalities of shared memory mechanisms and the shared address space model look very similar (both allow accessing the data of other processes), but the shared address space model subsumes the shared memory model, and their internal mechanisms are notably different. This paper clarifies the differences between the shared memory and shared address space models, both qualitatively and quantitatively. The paper does not showcase applications of the shared address space model; instead, through minimal modifications to an existing MPI implementation, it highlights the basic differences between the two models. The following four MPI configurations are evaluated and compared: 1) POSIX Shmem, 2) XPMEM, 3) PiP-Shmem, where intra-node communication utilizes POSIX shmem but MPI processes share the same address space, and 4) PiP-XPMEM, where XPMEM functions are implemented by the PiP library (without the need to link to the XPMEM library). Evaluation is done using the Intel MPI benchmark suite and six HPC benchmarks (HPCCG, miniGhost, LULESH2.0, miniMD, miniAMR, and mpiGraph). Most notably, the mpiGraph performance of PiP-XPMEM outperforms the XPMEM implementation by almost 1.5x. The performance numbers of HPCCG, miniGhost, miniMD, and LULESH2.0 running with PiP-Shmem and PiP-XPMEM are comparable with those of POSIX Shmem and XPMEM. PiP is not only a practical implementation of the shared address space model; it also provides opportunities for developing new optimization techniques, which the paper further elaborates on.
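Python's `multiprocessing.shared_memory` wraps POSIX-shmem-style named segments, so it can illustrate the key property contrasted above: only an explicitly created segment is shared, and a peer must attach to it by name, while ordinary heap data stays private (unlike in the shared address space model, where any existing data is reachable). A minimal sketch, not the paper's MPI code:

```python
from multiprocessing import shared_memory

# Ordinary heap data: private to this process under the shmem model.
private_data = [1, 2, 3]

# Only a newly allocated, named segment can be shared.
seg = shared_memory.SharedMemory(create=True, size=4)
try:
    seg.buf[0] = 42                                      # "sender" writes
    peer = shared_memory.SharedMemory(name=seg.name)     # "receiver" attaches by name
    value = peer.buf[0]                                  # reads through the segment
    peer.close()
finally:
    seg.close()
    seg.unlink()

assert value == 42
```

This attach-by-name step, and the copy of send-buffer contents into such a segment, is exactly the overhead that XPMEM-style mappings and the shared address space model avoid.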
Tailoring the Microstructure of Porous Carbon Spheres as High Rate Performance Anodes for Lithium-Ion Batteries
Benefiting from their high surface areas, excellent conductivity, and environmental friendliness, porous carbon nanospheres (PCSs) are particularly attractive as anodes for lithium-ion batteries (LIBs). However, how to regulate carbon nanospheres with controlled pore distribution and graphitization to deliver high Li+ storage performance is still under investigation. Here, we provide a facile approach to obtain PCSs with different microstructures by modulating the carbonization temperature. With a processing temperature of 850 °C, the optimized PCSs exhibit an increased surface area, higher electrical conductivity, and an enhanced specific capacity (202 mA h g−1 at 2 A g−1) compared to the PCSs carbonized at lower temperatures. Additionally, PCSs-850 provide excellent cyclability, with a capacity retention of 83% over 500 cycles. This work can pave a new pathway to achieving carbon nanospheres with excellent performance in LIBs.
