174 research outputs found
RetScan: efficient fovea and optic disc detection in retinographies
Master's dissertation in Informatics Engineering. The Fovea and Optic Disc are relevant anatomical eye structures for diagnosing various diseases. Their automatic detection can both reduce the cost of screening large populations and improve the effectiveness of ophthalmologists and optometrists.
This dissertation describes a methodology to automatically detect these structures and analyses a CPU-only MATLAB implementation of this methodology.
RetScan is a port of this methodology to an open-source environment; its functionality and performance are evaluated and compared with the original. The results of both evaluations lead to a discussion of possible improvements to the methodology that affect functionality and performance. The resulting improvements are implemented and integrated into RetScan.
To further improve performance, a parallelization of RetScan that takes advantage of a multi-core architecture or of a CUDA-enabled accelerator was designed, coded and evaluated. This evaluation reveals that RetScan achieves its best throughput when using a multi-core architecture alone and analysing several images at once. For single-image usage, multi-core alone is also the best solution, although with only a small speed-up. The use of CUDA-enabled accelerators is not recommended in this setting, as the images are small and the cost of transferring data to and from the accelerator has a severe impact on performance.
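The closing claim about transfer costs can be illustrated with a toy cost model. All figures below (PCIe bandwidth, latency, per-pixel costs) are illustrative assumptions, not measurements from RetScan:

```python
# Toy cost model of GPU offload for small images. All figures here are
# illustrative assumptions, not measurements from RetScan.

def offload_speedup(pixels, cpu_ns_per_px=1.0, gpu_ns_per_px=0.05,
                    pcie_bytes_per_s=12e9, latency_s=20e-6, bytes_per_px=4):
    """Estimated GPU-over-CPU speed-up, counting transfers both ways."""
    cpu_time = pixels * cpu_ns_per_px * 1e-9
    transfer = 2 * (latency_s + pixels * bytes_per_px / pcie_bytes_per_s)
    gpu_time = pixels * gpu_ns_per_px * 1e-9 + transfer
    return cpu_time / gpu_time

single = offload_speedup(640 * 480)        # one small retinography
batch = offload_speedup(100 * 640 * 480)   # 100 images moved at once
```

Even with a kernel assumed 20x faster per pixel, the modelled speed-up stays below 1.5x for both cases because the transfer term dominates, which is the behaviour the evaluation reports.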
A rigorous definition of axial lines: ridges on isovist fields
We suggest that 'axial lines', defined by Hillier and Hanson (1984) as lines
of uninterrupted movement within urban streetscapes or buildings, appear as
ridges in isovist fields (Benedikt, 1979). These are formed from the maximum
diametric lengths of the individual isovists, sometimes called viewsheds, that
make up these fields (Batty and Rana, 2004). We present an image processing
technique for the identification of lines from ridges, discuss current
strengths and weaknesses of the method, and show how it can be implemented
easily and effectively.
Comment: 18 pages, 5 figures
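A minimal sketch of the ridge idea on a toy occupancy grid. This is an assumption-laden stand-in for the paper's method: ray casts in four directions replace full 360-degree isovists, and a ridge is taken as a local maximum of the diametric-length field:

```python
# Toy-grid sketch of the ridge idea (assumptions: ray casts in four
# directions stand in for full isovists; 0 = open space, 1 = wall).

def diametric_length(grid, x, y, directions=((1, 0), (0, 1), (1, 1), (1, -1))):
    """Longest unobstructed straight line through open cell (x, y)."""
    w, h = len(grid[0]), len(grid)
    best = 0
    for dx, dy in directions:
        length = 1                       # the cell itself
        for s in (1, -1):                # walk forward, then backward
            cx, cy = x + s * dx, y + s * dy
            while 0 <= cx < w and 0 <= cy < h and grid[cy][cx] == 0:
                length += 1
                cx, cy = cx + s * dx, cy + s * dy
        best = max(best, length)
    return best

def ridge_cells(grid):
    """Cells whose diametric length is a local maximum of the field --
    these trace the candidate axial lines of the open space."""
    w, h = len(grid[0]), len(grid)
    field = [[diametric_length(grid, x, y) if grid[y][x] == 0 else 0
              for x in range(w)] for y in range(h)]
    ridges = {(x, y)
              for y in range(h) for x in range(w)
              if grid[y][x] == 0 and field[y][x] >= max(
                  field[y + dy][x + dx]
                  for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                  if (dx or dy) and 0 <= x + dx < w and 0 <= y + dy < h)}
    return field, ridges
```

On a map with a single open corridor, every corridor cell carries the corridor's full length and the ridge coincides with the corridor, i.e. the axial line.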
Parallelized Inference for Gravitational-Wave Astronomy
Bayesian inference is the workhorse of gravitational-wave astronomy, for
example, determining the mass and spins of merging black holes, revealing the
neutron star equation of state, and unveiling the population properties of
compact binaries. The science enabled by these inferences comes with a
computational cost that can limit the questions we are able to answer. This
cost is expected to grow. As detectors improve, the detection rate will go up,
allowing less time to analyze each event. Improvement in low-frequency
sensitivity will yield longer signals, increasing the number of computations
per event. The growing number of entries in the transient catalog will drive up
the cost of population studies. While Bayesian inference calculations are not
entirely parallelizable, key components are embarrassingly parallel:
calculating the gravitational waveform and evaluating the likelihood function.
Graphical processor units (GPUs) are adept at such parallel calculations. We
report on progress porting gravitational-wave inference calculations to GPUs.
Using a single code - which takes advantage of GPU architecture if it is
available - we compare computation times using modern GPUs (NVIDIA P100) and
CPUs (Intel Gold 6140). We demonstrate speed-ups for
compact binary coalescence gravitational waveform generation and likelihood
evaluation, and for population inference, within the
lifetime of current detectors. Further improvement is likely with continued
development. Our python-based code is publicly available and can be used
without familiarity with the parallel computing platform, CUDA.
Comment: 5 pages, 4 figures, submitted to PRD; code can be found at
https://github.com/ColmTalbot/gwpopulation
https://github.com/ColmTalbot/GPUCBC
https://github.com/ADACS-Australia/ADACS-SS18A-RSmith
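The "single code, GPU if it is available" pattern the abstract describes can be sketched as follows. The function and variable names are illustrative, not the actual gwpopulation/GPUCBC API; the point is that one vectorized likelihood runs unchanged under numpy (CPU) or cupy (CUDA GPU):

```python
# One code path for CPU and GPU: cupy mirrors the numpy array API, so the
# same vectorized likelihood runs on whichever backend imports.
try:
    import cupy as xp              # GPU path, if CUDA is present
except ImportError:
    import numpy as xp             # CPU fallback, identical array API

def log_likelihood(data, templates, sigma=1.0):
    """Gaussian log-likelihood of one data stream against many waveform
    templates at once -- the embarrassingly parallel inner loop."""
    resid = data[None, :] - templates           # (n_templates, n_samples)
    return -0.5 * xp.sum((resid / sigma) ** 2, axis=1)

# toy usage: three candidate "waveforms"; the data matches the second
data = xp.asarray([0.0, 1.0, 0.0, -1.0])
templates = xp.asarray([[0.0] * 4, [0.0, 1.0, 0.0, -1.0], [1.0] * 4])
best = int(xp.argmax(log_likelihood(data, templates)))
```

Because every template row is independent, the subtraction and reduction map directly onto GPU threads, which is where the reported speed-ups come from.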
Evaluation of High Performance Fortran through Application Kernels
Since the definition of the High Performance Fortran (HPF) standard, we have been maintaining a suite of application kernel codes with the aim of using them to evaluate the available compilers. This paper presents the results and conclusions from this study, for sixteen codes, on compilers from IBM, DEC, and the Portland Group Inc. (PGI), and on three machines: a DEC Alphafarm, an IBM SP-2, and a Cray T3D. From this, we hope to show the prospective HPF user that scalable performance is possible with modest effort, yet also where the current weaknesses lie.
Optimization of image processing algorithms via communication hiding in distributed processing systems
Real-time image processing is an important topic studied in the realm of computer systems. The task of real-time image processing is found in a wide range of applications, from multimedia systems to automobiles to military systems. Typically these systems require high throughput and low latency to perform at their required specifications. Therefore, hardware, software, and communications optimizations in these systems are very important factors in meeting these specifications. This thesis analyzes the implementation and optimization of a real-world image processing system destined for an aircraft environment. It discusses the steps of optimizing the software in the system, and then looks at how the system can be distributed over multiple processing nodes via functional pipelining. Next, the thesis discusses the optimization of interprocessor communication via communication hiding. Finally, it analyzes whether communication hiding is even necessary given today's high-speed networking and communication interfaces.
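The communication-hiding idea can be sketched with a double-buffered producer-consumer pipeline. This is illustrative only: a thread and a sleep stand in for the interprocessor link and the per-frame computation of the thesis:

```python
# Double-buffering sketch of communication hiding: while frame N is being
# processed, frame N+1 is already in flight, so transfer time overlaps
# computation instead of adding to it.
import queue
import threading
import time

def sender(frames, q):
    """'Communication' side: push frames over the simulated link."""
    for frame in frames:
        time.sleep(0.01)               # simulated transfer time
        q.put(frame)
    q.put(None)                        # end-of-stream sentinel

def process(frame):
    """'Computation' side: per-frame image processing stand-in."""
    time.sleep(0.01)                   # simulated compute time
    return sum(frame)

def pipeline(frames):
    q = queue.Queue(maxsize=2)         # double buffer: two frames in flight
    threading.Thread(target=sender, args=(frames, q), daemon=True).start()
    results = []
    while (frame := q.get()) is not None:
        results.append(process(frame)) # overlaps with the next transfer
    return results
```

For N frames the pipelined time approaches one transfer plus N compute steps, instead of N transfers plus N computes; whether that gain still matters on modern interconnects is exactly the question the thesis closes with.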
GPU accelerated parallel Iris segmentation
A biometric system provides automatic identification of an individual based on a unique feature or characteristic possessed by the person. Iris recognition systems are among the most reliable biometric systems, since complex random iris patterns are unique to each individual and do not change with time. Iris recognition is basically divided into three steps, namely Iris Segmentation or Localization, Feature Extraction, and Template Matching. To gain performance for the entire system it becomes vital to improve the performance of each individual step. Localization of the iris borders in an eye image can be considered a vital step in the iris recognition process due to the high processing effort required. Iris segmentation algorithms are currently implemented on general-purpose sequential processing systems, such as common Central Processing Units (CPUs). In this thesis, an attempt has been made to present a more straightforward, parallel alternative using the graphics processing unit (GPU), which was originally used exclusively for visualization purposes and has evolved into an extremely powerful coprocessor, offering an opportunity to increase speed and potentially intensify the resulting system performance. To realize a speed-up in iris segmentation, NVIDIA's Compute Unified Device Architecture (CUDA) programming model has been used. Iris localization is achieved by applying the Circular Hough Transform to an edge image obtained with the Canny edge detection technique. Parallelism is employed in the Hough Transform step.
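A sequential reference sketch of the Circular Hough voting step clarifies what the thesis parallelizes with CUDA. Parameters and the synthetic edge image are illustrative; each edge pixel votes for every centre/radius pair (a, b, r) it could lie on, and the most-voted bin wins:

```python
# Circular Hough voting, sequential reference version. The three nested
# loops are independent per edge pixel, which is what makes the voting
# embarrassingly parallel on a GPU.
import math
from collections import Counter

def hough_circle(edge_points, radii, step_deg=10):
    votes = Counter()
    for x, y in edge_points:
        for r in radii:
            for deg in range(0, 360, step_deg):
                t = math.radians(deg)
                a = round(x - r * math.cos(t))   # candidate centre x
                b = round(y - r * math.sin(t))   # candidate centre y
                votes[(a, b, r)] += 1
    return votes.most_common(1)[0][0]            # best (a, b, r) bin

# toy usage: Canny-like edge pixels of a circle centred at (50, 50), r = 20
edge = [(round(50 + 20 * math.cos(math.radians(d))),
         round(50 + 20 * math.sin(math.radians(d))))
        for d in range(0, 360, 5)]
a, b, r = hough_circle(edge, radii=range(18, 23))
```

In a CUDA version, each thread would take one edge pixel and add its votes to the accumulator with atomic increments, which is the parallelization the thesis describes.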
Parallelization of an algorithm for the automatic detection of deformable objects
This work presents the parallelization of an algorithm for the detection of deformable objects in digital images. The parallelization has been implemented with the message-passing paradigm, using a master-slave model. Two versions have been developed, with synchronous and asynchronous communications.
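The master-slave decomposition can be sketched as follows. A thread pool stands in for the message-passing processes of the original, and the detection step is a trivial placeholder; all names are illustrative:

```python
# Master-slave sketch: the master splits the image into horizontal strips,
# each slave reports candidate detections in its strip, and the master
# merges the results.
from multiprocessing.dummy import Pool   # thread-backed stand-in for MPI

def detect_in_strip(strip):
    """Slave work: report positions of 'objects' (here, nonzero pixels)."""
    offset, rows = strip
    return [(offset + y, x)
            for y, row in enumerate(rows)
            for x, v in enumerate(row) if v]

def master(image, n_slaves=2):
    h = len(image)
    chunk = (h + n_slaves - 1) // n_slaves
    strips = [(i, image[i:i + chunk]) for i in range(0, h, chunk)]
    with Pool(n_slaves) as pool:
        # pool.map is the synchronous variant; imap_unordered gives the
        # asynchronous one, merging results as each slave finishes
        parts = pool.map(detect_in_strip, strips)
    return sorted(p for part in parts for p in part)
```

The synchronous/asynchronous distinction of the two versions maps onto blocking collection of all slave results versus merging them as they arrive.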
Algorithms for Vision-Based Quality Control of Circularly Symmetric Components
Quality inspection in the industrial production field is experiencing strong technological development that benefits from the combination of vision-based techniques with artificial intelligence algorithms. This paper initially addresses the problem of defect identification for circularly symmetric mechanical components, characterized by the presence of periodic elements. In the specific case of knurled washers, we compare the performance of a standard grey-scale image-analysis algorithm with a Deep Learning (DL) approach. The standard algorithm is based on the extraction of pseudo-signals derived from converting the grey-scale image into concentric annuli. In the DL approach, the component inspection is shifted from the entire sample to specific areas repeated along the object profile where the defect may occur. The standard algorithm provides better results in terms of accuracy and computational time than the DL approach. Nevertheless, DL reaches accuracy higher than 99% when performance is evaluated targeting the identification of damaged teeth. The possibility of extending the methods and the results to other circularly symmetric components is analyzed and discussed.
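The pseudo-signal idea can be sketched in a few lines. All parameters here are illustrative, not the paper's: grey values sampled along a concentric annulus turn the washer's periodic teeth into a 1-D signal, and a damaged tooth appears as a sector that no longer matches the repeating pattern:

```python
# Pseudo-signal sketch for a circularly symmetric component.
import math

def annulus_signal(image, cx, cy, r, n_angles=360):
    """Sample grey levels along a circle of radius r -> 1-D pseudo-signal."""
    return [image[round(cy + r * math.sin(2 * math.pi * k / n_angles))]
                 [round(cx + r * math.cos(2 * math.pi * k / n_angles))]
            for k in range(n_angles)]

def damaged_sectors(signal, period, threshold=0.5):
    """Flag each period of the signal whose mean absolute difference from
    the first period exceeds the threshold."""
    template = signal[:period]
    flags = []
    for start in range(0, len(signal) - period + 1, period):
        chunk = signal[start:start + period]
        diff = sum(abs(a - b) for a, b in zip(chunk, template)) / period
        flags.append(diff > threshold)
    return flags
```

Sampling several annuli at different radii and comparing sectors against the expected period is the kind of lightweight computation that explains the standard algorithm's edge in computational time.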