10 research outputs found

    ZNN -- A Fast and Scalable Algorithm for Training 3D Convolutional Networks on Multi-core and Many-Core Shared Memory Machines

    Convolutional networks (ConvNets) have become a popular approach to computer vision. It is important to accelerate ConvNet training, which is computationally costly. We propose a novel parallel algorithm based on decomposition into a set of tasks, most of which are convolutions or FFTs. Applying Brent's theorem to the task dependency graph implies that linear speedup with the number of processors is attainable within the PRAM model of parallel computation, for wide network architectures. To attain such performance on real shared-memory machines, our algorithm computes convolutions converging on the same node of the network with temporal locality to reduce cache misses, and sums the convergent convolution outputs via an almost wait-free concurrent method to reduce time spent in critical sections. We implement the algorithm in a publicly available software package called ZNN. Benchmarking with multi-core CPUs shows that ZNN can attain speedup roughly equal to the number of physical cores. We also show that ZNN can attain over 90x speedup on a many-core CPU (Xeon Phi Knights Corner). These speedups are achieved for network architectures with widths that are in common use. The task parallelism of the ZNN algorithm is suited to CPUs, while the SIMD parallelism of previous algorithms is compatible with GPUs. Through examples, we show that ZNN can be either faster or slower than certain GPU implementations depending on specifics of the network architecture, filter sizes, and density and size of the output patch. ZNN may be less costly to develop and maintain, due to the relative ease of general-purpose CPU programming.
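
    The Brent's theorem argument in this abstract can be illustrated with a small sketch. It uses the standard form of the bound, T_p <= T_1/p + T_inf, where T_1 is the total work and T_inf is the length of the critical path in the task dependency graph. The function names and the numbers below are hypothetical and are not taken from the paper or from ZNN.

```python
def brent_upper_bound(total_work, critical_path, processors):
    """Upper bound on run time with `processors` workers (Brent's theorem).

    total_work    -- T_1, time to run every task on one processor
    critical_path -- T_inf, longest chain of dependent tasks
    """
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return total_work / processors + critical_path


def guaranteed_speedup(total_work, critical_path, processors):
    """Lower bound on speedup T_1 / T_p implied by the bound above."""
    return total_work / brent_upper_bound(total_work, critical_path, processors)


# Hypothetical "wide" network: many independent unit-cost tasks per layer,
# so the total work is large compared with the critical path.
wide = guaranteed_speedup(total_work=10_000, critical_path=10, processors=8)

# Hypothetical "narrow" network: same depth, far less work per layer.
narrow = guaranteed_speedup(total_work=100, critical_path=10, processors=8)

print(f"wide network speedup bound:   {wide:.2f}")
print(f"narrow network speedup bound: {narrow:.2f}")
```

    Under these assumed numbers the wide case stays close to the processor count, while the narrow case is dominated by the critical path, which is one way to read the abstract's restriction to wide architectures.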

    Integrating human behaviour and building vulnerability for the assessment and mitigation of seismic risk in historic centres: Proposal of a holistic human-centred simulation-based approach

    The complexity of historic centres implies that risk assessment in those areas should be based on joint analyses of the characteristics of the built environment and the population's features, exposure and interaction with the surrounding environment. Such a holistic approach is urgently needed to evaluate the impact of mitigation strategies, especially in sudden-onset disasters and, above all, earthquakes. In fact, the effectiveness of retrofitting interventions and emergency management strategies on the safety level depends greatly on such interactions, also in relation to the path network features. This work proposes a PDCA-based methodology for earthquake risk assessment which innovatively combines built environment damage assessment with a simulation of human evacuation behaviour so as to identify potentially inaccessible evacuation paths and urban areas, define the related path and area safety levels, and evaluate the impact of proposed retrofitting and management strategies on the population's safety in an emergency. To this end, a validated seismic vulnerability index method for masonry façade walls is combined with empirical damage assessment correlations (debris depth estimation in outdoor spaces) to create post-earthquake damage scenarios. These scenarios are then used as input data for evacuation process assessment through an existing earthquake pedestrian evacuation simulator. Risk indices for paths and safe areas are proposed to evaluate the main behavioural issues in emergency conditions. Finally, different solutions aimed at improving evacuation safety (i.e. emergency plans, rescuers' access strategies and retrofitting of buildings) are proposed and discussed for a significant case study, the historic centre of Coimbra, Portugal.

    Space-time wiring specificity supports direction selectivity in the retina

    How does the mammalian retina detect motion? This classic problem in visual neuroscience has remained unsolved for 50 years. In search of clues, here we reconstruct Off-type starburst amacrine cells (SACs) and bipolar cells (BCs) in serial electron microscopic images with help from EyeWire, an online community of 'citizen neuroscientists'. On the basis of quantitative analyses of contact area and branch depth in the retina, we find evidence that one BC type prefers to wire with a SAC dendrite near the SAC soma, whereas another BC type prefers to wire far from the soma. The near type is known to lag the far type in time of visual response. A mathematical model shows how such 'space-time wiring specificity' could endow SAC dendrites with receptive fields that are oriented in space-time and therefore respond selectively to stimuli that move in the outward direction from the soma.
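
    The timing argument in this abstract can be illustrated with a toy calculation. This is a minimal sketch and not the authors' mathematical model: it assumes one lagging input near the soma and one faster input far from it, each producing a Gaussian-shaped response, and it compares the peak of their sum for outward and inward motion. All names and parameter values are hypothetical.

```python
import math


def peak_summed_response(velocity, near_lag=0.05, far_lag=0.0,
                         near_pos=0.0, far_pos=50.0, width=0.01):
    """Peak of the summed response of a 'near' and a 'far' input.

    velocity -- stimulus speed in um/s; positive means outward (soma to tip)
    near_lag, far_lag -- response delays of the two input types, in seconds
    near_pos, far_pos -- input positions along the dendrite, in um
    width -- standard deviation of each Gaussian response, in seconds
    """
    # Time at which each input responds: stimulus arrival time plus its lag.
    t_near = near_pos / velocity + near_lag
    t_far = far_pos / velocity + far_lag

    def bump(t, centre):
        return math.exp(-((t - centre) ** 2) / (2.0 * width ** 2))

    # Sample the summed response from -0.5 s to 0.5 s in 1 ms steps.
    times = [i * 0.001 - 0.5 for i in range(1001)]
    return max(bump(t, t_near) + bump(t, t_far) for t in times)


# Outward motion reaches the lagging near input first, so the two
# responses coincide; inward motion pulls them further apart in time.
outward = peak_summed_response(velocity=1000.0)
inward = peak_summed_response(velocity=-1000.0)
print(f"outward peak: {outward:.3f}, inward peak: {inward:.3f}")
```

    With these assumed values the travel time between the two inputs equals the extra lag of the near input, so outward motion aligns the two responses and yields the larger peak.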

    A Multicore Path to Connectomics-on-Demand

    The current design trend in large-scale machine learning is to use distributed clusters of CPUs and GPUs with MapReduce-style programming. Some have been led to believe that this type of horizontal scaling can reduce or even eliminate the need for traditional algorithm development, careful parallelization, and performance engineering. This paper is a case study showing the contrary: that the benefits of algorithms, parallelization, and performance engineering can sometimes be so vast that it is possible to solve "cluster-scale" problems on a single commodity multicore machine. Connectomics is an emerging area of neurobiology that uses cutting-edge machine learning and image processing to extract brain connectivity graphs from electron microscopy images. It has long been assumed that the processing of connectomics data will require mass storage and farms of CPUs/GPUs, and will take months (if not years) of processing time. We present a high-throughput connectomics-on-demand system that runs on a multicore machine with fewer than 100 cores and extracts connectomes at the terabyte-per-hour pace of modern electron microscopes.
    Funding: National Science Foundation (U.S.) (grant IIS-1447786); National Science Foundation (U.S.) (grant CCF1563880); United States. Intelligence Advanced Research Projects Activity (grant 138076-5093555)