
    RetScan: efficient fovea and optic disc detection in retinographies

    Master's dissertation in Engenharia de Informática. The fovea and optic disc are anatomical eye structures relevant to the diagnosis of various diseases. Their automatic detection can both reduce the cost of screening large populations and improve the effectiveness of ophthalmologists and optometrists. This dissertation describes a methodology to automatically detect these structures and analyses a CPU-only MATLAB implementation of it. RetScan is a port of this methodology to a freeware environment; its functionality and performance are evaluated and compared to the original. The results of both evaluations lead to a discussion of possible improvements to the methodology that affect functionality and performance. These improvements are implemented and integrated into RetScan. To further improve performance, a parallelization of RetScan that takes advantage of a multi-core architecture or of a CUDA-enabled accelerator was designed, coded and evaluated. This evaluation reveals that RetScan achieves its best throughput using a multi-core architecture alone and analysing several images at once. For single-image usage, multi-core alone is also the best solution, but with a small speed-up. The use of CUDA-enabled accelerators is not recommended in this scope, as the images are small and the cost of transferring data to and from the accelerator has a severe impact on performance.
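    The batch-throughput strategy described above (analysing several images at once on CPU cores) can be sketched as follows. The function `detect_structures` is a hypothetical stand-in for the actual detection pipeline, which is not reproduced here:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def detect_structures(image):
    """Stand-in for per-image fovea/optic-disc detection (hypothetical)."""
    # The optic disc tends to be the brightest region; argmax is a crude proxy.
    y, x = np.unravel_index(np.argmax(image), image.shape)
    return int(y), int(x)

def analyse_batch(images, workers=4):
    # Throughput mode: several retinographies are analysed concurrently.
    # NumPy releases the GIL for array work, so a thread pool can use
    # several cores; the actual port may use a different parallel runtime.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(detect_structures, images))

rng = np.random.default_rng(0)
batch = [rng.random((64, 64)) for _ in range(8)]   # synthetic "retinographies"
coords = analyse_batch(batch)
```

    Because each image is independent, the speed-up comes almost for free; this is also why, for a single small image, the transfer cost to a CUDA accelerator is hard to amortize.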

    A rigorous definition of axial lines: ridges on isovist fields

    We suggest that 'axial lines', defined by Hillier and Hanson (1984) as lines of uninterrupted movement within urban streetscapes or buildings, appear as ridges in isovist fields (Benedikt, 1979). These ridges are formed from the maximum diametric lengths of the individual isovists, sometimes called viewsheds, that make up these fields (Batty and Rana, 2004). We present an image processing technique for the identification of lines from ridges, discuss the current strengths and weaknesses of the method, and show how it can be implemented easily and effectively. Comment: 18 pages, 5 figures
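    The ridge idea can be illustrated on a toy isovist field, where each cell stores the maximum diametric length of the isovist at that point. The criterion below (a strict directional local maximum) is a deliberate simplification, not the paper's full image-processing pipeline:

```python
import numpy as np

def ridge_mask(field):
    """Mark cells that are strict local maxima along rows or along columns.

    `field[i, j]` holds the maximum diametric isovist length at cell (i, j);
    ridges of this scalar field approximate axial lines (toy criterion).
    """
    f = np.pad(field, 1, mode="edge")
    c = f[1:-1, 1:-1]
    along_x = (c > f[1:-1, :-2]) & (c > f[1:-1, 2:])   # max across columns
    along_y = (c > f[:-2, 1:-1]) & (c > f[2:, 1:-1])   # max across rows
    return along_x | along_y

# Synthetic "street": diametric lengths peak along the middle row.
field = np.exp(-((np.arange(9)[:, None] - 4) ** 2) / 4.0) * np.ones((9, 9))
mask = ridge_mask(field)   # only the middle row is marked as ridge
```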

    Parallelized Inference for Gravitational-Wave Astronomy

    Bayesian inference is the workhorse of gravitational-wave astronomy, for example, determining the mass and spins of merging black holes, revealing the neutron star equation of state, and unveiling the population properties of compact binaries. The science enabled by these inferences comes with a computational cost that can limit the questions we are able to answer. This cost is expected to grow. As detectors improve, the detection rate will go up, allowing less time to analyze each event. Improvement in low-frequency sensitivity will yield longer signals, increasing the number of computations per event. The growing number of entries in the transient catalog will drive up the cost of population studies. While Bayesian inference calculations are not entirely parallelizable, key components are embarrassingly parallel: calculating the gravitational waveform and evaluating the likelihood function. Graphics processing units (GPUs) are adept at such parallel calculations. We report on progress porting gravitational-wave inference calculations to GPUs. Using a single code - which takes advantage of GPU architecture if it is available - we compare computation times using modern GPUs (NVIDIA P100) and CPUs (Intel Gold 6140). We demonstrate speed-ups of ~50× for compact binary coalescence gravitational waveform generation and likelihood evaluation, and of more than 100× for population inference, within the lifetime of current detectors. Further improvement is likely with continued development. Our Python-based code is publicly available and can be used without familiarity with the parallel computing platform CUDA. Comment: 5 pages, 4 figures, submitted to PRD, code can be found at https://github.com/ColmTalbot/gwpopulation https://github.com/ColmTalbot/GPUCBC https://github.com/ADACS-Australia/ADACS-SS18A-RSmith Add demonstration of improvement in BNS spi
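    A "single code that uses the GPU if it is available" is often expressed in Python as a drop-in array-module switch. The sketch below uses that pattern with a toy Gaussian log-likelihood over a batch of trial waveforms; it illustrates the embarrassingly parallel structure, not the actual gravitational-wave likelihood of the paper:

```python
# Use CuPy (GPU arrays) when present, otherwise fall back to NumPy on CPU.
try:
    import cupy as xp
except ImportError:
    import numpy as xp

def log_likelihood(data, templates, sigma=1.0):
    """Gaussian log-likelihood of `data` against a batch of templates.

    Each row of `templates` is one candidate waveform; the whole batch is
    evaluated in a single vectorised (embarrassingly parallel) operation.
    """
    resid = templates - data[None, :]
    return -0.5 * xp.sum(resid ** 2, axis=1) / sigma ** 2

t = xp.linspace(0.0, 1.0, 256)
signal = xp.sin(40.0 * t)                            # toy "observed" strain
phases = xp.linspace(0.0, 0.3, 8)
bank = xp.sin(40.0 * t[None, :] + phases[:, None])   # 8 trial waveforms
logl = log_likelihood(signal, bank)
best = int(xp.argmax(logl))                          # phase matching the signal
```

    On a GPU the batch dimension maps onto thousands of threads; on a CPU the same code runs through NumPy's vectorised kernels.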

    Evaluation of High Performance Fortran through Application Kernels

    Since the definition of the High Performance Fortran (HPF) standard, we have been maintaining a suite of application kernel codes with the aim of using them to evaluate the available compilers. This paper presents the results and conclusions from this study, for sixteen codes, on compilers from IBM, DEC, and the Portland Group Inc. (PGI), and on three machines: a DEC Alphafarm, an IBM SP-2, and a Cray T3D. From this, we hope to show the prospective HPF user that scalable performance is possible with modest effort, yet also where the current weaknesses lie

    Optimization of image processing algorithms via communication hiding in distributed processing systems

    Real-time image processing is an important topic studied in the realm of computer systems. The task of real-time image processing is found in a wide range of applications, from multimedia systems to automobiles to military systems. Typically these systems require high throughput and low latency to perform at their required specifications. Therefore, hardware, software, and communications optimizations in these systems are very important factors in meeting these specifications. This thesis analyzes the implementation and optimization of a real-world image processing system destined for an aircraft environment. It discusses the steps of optimizing the software in the system, and then looks at how the system can be distributed over multiple processing nodes via functional pipelining. Next, the thesis discusses the optimization of interprocessor communication via communication hiding. Finally, it analyzes whether communication hiding is even necessary given today's high-speed networking and communication interfaces
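    The communication-hiding idea can be sketched with a prefetching thread: while the main thread processes frame k, a reader thread fetches frame k+1, so transfer latency overlaps computation. The `fetch` and `process` functions below are hypothetical stand-ins with simulated latencies:

```python
import threading
import queue
import time

def fetch(frame_id):
    """Stand-in for receiving a frame over the interconnect (hypothetical)."""
    time.sleep(0.01)            # simulated transfer latency
    return frame_id

def process(frame):
    time.sleep(0.01)            # simulated per-frame computation
    return frame * 2

def pipeline(n_frames):
    # Communication hiding: the reader thread fetches the next frame while
    # the main thread is still processing the current one.
    q = queue.Queue(maxsize=1)  # one-frame prefetch buffer
    def reader():
        for k in range(n_frames):
            q.put(fetch(k))
        q.put(None)             # sentinel: no more frames
    threading.Thread(target=reader, daemon=True).start()
    out = []
    while (frame := q.get()) is not None:
        out.append(process(frame))
    return out

results = pipeline(4)
```

    With overlap, total time approaches max(transfer, compute) per frame instead of their sum; whether that matters depends on how large the transfer term is on a given interconnect, which is exactly the question the thesis closes with.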

    GPU accelerated parallel Iris segmentation

    A biometric system provides automatic identification of an individual based on a unique feature or characteristic possessed by the person. Iris recognition systems are among the most definitive biometric systems, since the complex random patterns of the iris are unique to each individual and do not change with time. Iris recognition is basically divided into three steps, namely iris segmentation (or localization), feature extraction and template matching. To obtain a performance gain for the entire system it becomes vital to improve the performance of each individual step. Localization of the iris borders in an eye image can be considered a vital step in the iris recognition process due to the high processing power required. Iris segmentation algorithms are currently implemented on general purpose sequential processing systems, such as common Central Processing Units (CPUs). In this thesis, an attempt has been made to present a more direct, parallel processing alternative using the graphics processing unit (GPU), which was originally used exclusively for visualization purposes and has evolved into an extremely powerful coprocessor, offering an opportunity to increase speed and potentially intensify overall system performance. To realize a speed-up in iris segmentation, NVIDIA's Compute Unified Device Architecture (CUDA) programming model has been used. Iris localization is achieved by applying the circular Hough transform to an edge image obtained with the Canny edge detection technique; parallelism is employed in the Hough transform step
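    The data parallelism exploited in the Hough step comes from the fact that every edge pixel votes independently for candidate circle centres. The sketch below expresses the same structure with NumPy broadcasting (on a GPU, each pixel/angle pair would map to a thread); it is a minimal fixed-radius illustration, not the thesis implementation:

```python
import numpy as np

def hough_circle_accumulator(edge_mask, radius):
    """Vote for circle centres of a fixed radius given a binary edge map."""
    h, w = edge_mask.shape
    ys, xs = np.nonzero(edge_mask)
    thetas = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
    # Each (edge pixel, theta) pair proposes one centre candidate.
    cy = np.rint(ys[:, None] - radius * np.sin(thetas)[None, :]).astype(int)
    cx = np.rint(xs[:, None] - radius * np.cos(thetas)[None, :]).astype(int)
    ok = (cy >= 0) & (cy < h) & (cx >= 0) & (cx < w)
    acc = np.zeros((h, w), dtype=np.int32)
    np.add.at(acc, (cy[ok], cx[ok]), 1)   # accumulate all votes
    return acc

# Synthetic edge map (stand-in for Canny output): circle of radius 10 at (32, 32).
edges = np.zeros((64, 64), dtype=bool)
t = np.linspace(0.0, 2.0 * np.pi, 128, endpoint=False)
edges[np.rint(32 + 10 * np.sin(t)).astype(int),
      np.rint(32 + 10 * np.cos(t)).astype(int)] = True
acc = hough_circle_accumulator(edges, 10)
centre = np.unravel_index(np.argmax(acc), acc.shape)   # peak near (32, 32)
```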

    Parallelization of an algorithm for the automatic detection of deformable objects

    This work presents the parallelization of an algorithm for the detection of deformable objects in digital images. The parallelization has been implemented with the message passing paradigm, using a master-slave model. Two versions have been developed, with synchronous and asynchronous communications
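    The master-slave pattern described above can be sketched as follows; here threads and queues stand in for the message-passing layer (the work itself uses message passing across processes), and `max` is a placeholder for the per-chunk detection step:

```python
import queue
import threading

def slave(tasks, results):
    # Each slave repeatedly receives a sub-image, runs detection on it,
    # and sends the result back to the master.
    while (job := tasks.get()) is not None:
        idx, chunk = job
        results.put((idx, max(chunk)))   # stand-in for object detection

tasks, results = queue.Queue(), queue.Queue()
chunks = [[1, 5, 2], [9, 3], [4, 4, 8], [7]]          # image partitions
workers = [threading.Thread(target=slave, args=(tasks, results))
           for _ in range(2)]
for w in workers:
    w.start()
for job in enumerate(chunks):                          # master distributes work
    tasks.put(job)
for _ in workers:
    tasks.put(None)                                    # one stop sentinel per slave
for w in workers:
    w.join()
found = dict(results.get() for _ in chunks)            # master gathers results
```

    The synchronous variant would have the master wait for each reply before sending the next chunk; the asynchronous variant, as above, keeps the task queue filled so slaves never idle.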

    Algorithms for Vision-Based Quality Control of Circularly Symmetric Components

    Quality inspection in the industrial production field is experiencing strong technological development that benefits from the combination of vision-based techniques with artificial intelligence algorithms. This paper initially addresses the problem of defect identification for circularly symmetric mechanical components characterized by the presence of periodic elements. In the specific case of knurled washers, we compare the performance of a standard algorithm for the analysis of grey-scale images with a Deep Learning (DL) approach. The standard algorithm is based on the extraction of pseudo-signals obtained by converting the grey-scale image along concentric annuli. In the DL approach, the component inspection is shifted from the entire sample to specific areas, repeated along the object profile, where a defect may occur. The standard algorithm provides better results in terms of accuracy and computational time than the DL approach. Nevertheless, DL reaches an accuracy higher than 99% when performance is evaluated targeting the identification of damaged teeth. The possibility of extending the methods and the results to other circularly symmetric components is analyzed and discussed
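    One simple reading of the pseudo-signal idea is to sample grey levels along a concentric annulus, turning the periodic teeth into a 1-D periodic signal whose dominant frequency equals the tooth count; deviations from that periodicity would flag defects. This is an illustrative simplification, not the exact conversion used by the authors:

```python
import numpy as np

def annulus_signal(image, centre, radius, n_samples=360):
    """Sample grey levels along one concentric annulus -> 1-D pseudo-signal."""
    t = np.linspace(0.0, 2.0 * np.pi, n_samples, endpoint=False)
    ys = np.rint(centre[0] + radius * np.sin(t)).astype(int)
    xs = np.rint(centre[1] + radius * np.cos(t)).astype(int)
    return image[ys, xs]

# Synthetic knurled washer: 24 teeth -> a 24-cycle pattern around the annulus.
n = 128
yy, xx = np.mgrid[0:n, 0:n]
theta = np.arctan2(yy - 64, xx - 64)
img = 0.5 + 0.5 * np.cos(24 * theta)

sig = annulus_signal(img, (64, 64), 40)
spectrum = np.abs(np.fft.rfft(sig - sig.mean()))
dominant = int(np.argmax(spectrum))      # cycles per revolution = tooth count
```

    On a defect-free part the spectrum is concentrated at the tooth frequency; a damaged tooth spreads energy into other bins, which gives a cheap grey-scale test to compare against the DL classifier.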