
    Reducing energy usage in resource-intensive Java-based scientific applications via micro-benchmark based code refactorings

    In-silico research has grown considerably. Today's scientific code involves long-running computer simulations, and hence powerful computing infrastructures are needed. Traditionally, research in high-performance computing has focused on executing code as fast as possible, while energy has only recently been recognized as another goal to consider. Yet energy-driven research has mostly focused on the hardware and middleware layers; few efforts target the application level, where many energy-aware optimizations are possible. We revisit a catalog of Java primitives commonly used in OO scientific programming, or micro-benchmarks, to identify energy-friendly versions of each primitive. We then apply the micro-benchmarks to classical scientific application kernels and machine learning algorithms, for both single-thread and multi-thread implementations, on a server. Energy usage reductions at the micro-benchmark level are substantial, while reductions obtained for applications range from 3.90% to 99.18%.
    Authors: Longo, Mathias (Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Instituto Superior de Ingeniería del Software, Universidad Nacional del Centro de la Provincia de Buenos Aires, Argentina; University of Southern California, United States); Rodriguez, Ana Virginia; Mateos Diaz, Cristian Maximiliano; Zunino Suarez, Alejandro Octavio (all CONICET, Instituto Superior de Ingeniería del Software, Universidad Nacional del Centro de la Provincia de Buenos Aires, Argentina).
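    The abstract does not reproduce its catalog of Java primitives, but the micro-benchmark methodology is language-agnostic. A minimal sketch of the idea, using a hypothetical primitive pair (naive string building versus a single-allocation refactoring) timed in Python rather than Java:

```python
import timeit

# Hypothetical micro-benchmark pair: the same primitive (building a large
# string) written two ways. The refactored version avoids repeated buffer
# copying, the kind of saving an energy/performance catalog targets.
def concat_naive(n):
    s = ""
    for _ in range(n):
        s += "x"          # repeated reallocation and copying
    return s

def concat_refactored(n):
    return "".join("x" for _ in range(n))  # single joining pass

# Time both variants; lower running time is used here as a rough proxy
# for energy usage, which the paper measures directly on a server.
naive_t = timeit.timeit(lambda: concat_naive(10_000), number=20)
refactored_t = timeit.timeit(lambda: concat_refactored(10_000), number=20)
print(f"naive: {naive_t:.4f}s  refactored: {refactored_t:.4f}s")
```

    Both variants must produce identical output; only their resource profiles differ, which is what makes such refactorings safe to apply mechanically.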

    Riemannian Optimization for Convex and Non-Convex Signal Processing and Machine Learning Applications

    The performance of most algorithms for signal processing and machine learning applications depends heavily on the underlying optimization algorithms. Multiple techniques have been proposed for solving convex and non-convex problems, such as interior-point methods and semidefinite programming. However, it is well known that these algorithms are not ideally suited for large-scale optimization with a high number of variables and/or constraints. This thesis exploits a novel optimization method, known as Riemannian optimization, for efficiently solving convex and non-convex problems with signal processing and machine learning applications. Unlike most optimization techniques, whose complexities increase with the number of constraints, Riemannian methods exploit the structure of the search space, i.e., the set of feasible solutions, to reduce the embedded dimension and solve optimization problems efficiently in a reasonable time. However, such efficiency comes at the expense of universality, as the geometry of each manifold needs to be investigated individually. This thesis explains the steps of designing first- and second-order Riemannian optimization methods for smooth matrix manifolds through the study and design of optimization algorithms for various applications. In particular, it focuses on contemporary applications in signal processing and machine learning, such as community detection, graph-based clustering, phase retrieval, and indoor and outdoor location determination. Simulation results attest to the efficiency of the proposed methods against popular generic and specialized solvers for each of the above applications.
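    A minimal sketch of the first-order Riemannian recipe described above (gradient, tangent-space projection, retraction), on the simplest matrix manifold, the unit sphere. The problem choice (maximizing a Rayleigh quotient) and the step size are illustrative assumptions, not taken from the thesis:

```python
import numpy as np

# Riemannian gradient ascent on the unit sphere S^{n-1}:
# maximize x^T A x subject to ||x|| = 1, whose maximizer is the
# leading eigenvector of the symmetric matrix A.
def riemannian_ascent(A, steps=500, lr=0.1):
    n = A.shape[0]
    x = np.random.default_rng(0).normal(size=n)
    x /= np.linalg.norm(x)                 # start on the manifold
    for _ in range(steps):
        g = 2 * A @ x                      # Euclidean gradient
        rg = g - (x @ g) * x               # project onto tangent space at x
        x = x + lr * rg                    # step along the tangent direction
        x /= np.linalg.norm(x)             # retract back onto the sphere
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
x = riemannian_ascent(A)
print(x @ A @ x)   # approaches the largest eigenvalue of A
```

    The constraint ||x|| = 1 never appears as an explicit constraint; it is absorbed into the geometry, which is exactly the dimension-reduction benefit the abstract describes.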

    Frequency Domain Independent Component Analysis Applied To Wireless Communications Over Frequency-selective Channels

    Frequency-selective fading is a major source of impairment in wireless communications. In this research, a novel Frequency-Domain Independent Component Analysis (ICA-F) approach is proposed to blindly separate and deconvolve signals traveling through frequency-selective, slow-fading channels. Compared with existing time-domain approaches, the ICA-F is computationally efficient and possesses fast convergence properties. Simulation results confirm the effectiveness of the proposed ICA-F. Orthogonal Frequency Division Multiplexing (OFDM) systems are widely used in wireless communications today. However, OFDM systems are very sensitive to Carrier Frequency Offset (CFO), so an accurate CFO compensation technique is required to achieve acceptable performance. In this dissertation, two novel blind approaches are proposed to estimate and compensate for CFO within the range of half the subcarrier spacing: a Maximum Likelihood CFO Correction approach (ML-CFOC) and a high-performance, low-computation Blind CFO Estimator (BCFOE). The Bit Error Rate (BER) improvement of the ML-CFOC is achieved at the expense of a modest increase in computational requirements, without sacrificing system bandwidth or increasing hardware complexity. The BCFOE outperforms the existing blind CFO estimator [25, 128], referred to as the YG-CFO estimator, in terms of BER and Mean Square Error (MSE), again without increasing computational complexity, sacrificing system bandwidth, or increasing hardware complexity. While both proposed techniques outperform the YG-CFO estimator, the BCFOE performs better than the ML-CFOC. Extensive simulation results illustrate the performance of the ML-CFOC and BCFOE approaches.
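    A toy model of the CFO impairment that the proposed estimators target (the ML-CFOC and BCFOE algorithms themselves are not reproduced here). A normalised offset eps rotates each time-domain sample of an OFDM symbol; compensation multiplies by the conjugate rotation:

```python
import numpy as np

# One OFDM symbol with N subcarriers carrying QPSK data.
N = 64
rng = np.random.default_rng(1)
symbols = rng.choice([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j], size=N)
tx = np.fft.ifft(symbols)                    # time-domain transmit signal

# A carrier frequency offset of eps subcarrier spacings applies a
# progressive phase rotation to the received samples, destroying
# subcarrier orthogonality.
eps = 0.23
n = np.arange(N)
rx = tx * np.exp(2j * np.pi * eps * n / N)

# Ideal compensation with the (here, known) offset: multiply by the
# conjugate rotation, then demodulate with the FFT.
rx_fixed = rx * np.exp(-2j * np.pi * eps * n / N)
err = np.max(np.abs(np.fft.fft(rx_fixed) - symbols))
print(err)
```

    In practice eps is unknown, which is precisely what blind estimators such as the proposed BCFOE recover from the received signal alone.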

    From 3D Point Clouds to Pose-Normalised Depth Maps

    We consider the problem of generating either pairwise-aligned or pose-normalised depth maps from noisy 3D point clouds in relatively unrestricted poses. Our system is deployed in a 3D face alignment application and consists of four stages: (i) data filtering, (ii) nose tip identification and sub-vertex localisation, (iii) computation of the (relative) face orientation, and (iv) generation of either a pose-aligned or a pose-normalised depth map. We generate an implicit radial basis function (RBF) model of the facial surface, which is employed within all four stages of the process. For example, in stage (ii), construction of novel invariant features is based on sampling this RBF over a set of concentric spheres to give a spherically-sampled RBF (SSR) shape histogram. In stage (iii), a second novel descriptor, called an isoradius contour curvature signal, is defined, which allows rotational alignment to be determined using a simple process of 1D correlation. We test our system on both the University of York (UoY) 3D face dataset and the Face Recognition Grand Challenge (FRGC) 3D data. For the more challenging UoY data, our SSR descriptors significantly outperform three variants of spin images, successfully identifying nose vertices at a rate of 99.6%. Nose localisation performance on the higher-quality FRGC data, which has only small pose variations, is 99.9%. Our best system successfully normalises the pose of 3D faces at rates of 99.1% (UoY data) and 99.6% (FRGC data).
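    The SSR idea of sampling a surface description over concentric spheres can be sketched in simplified form. The actual descriptor samples an RBF implicit surface; the stand-in below merely histograms point-cloud density per spherical shell around a candidate vertex, so it is an assumption-laden simplification, not the paper's feature:

```python
import numpy as np

# Simplified concentric-shell descriptor: for a candidate vertex (e.g. a
# nose-tip hypothesis), measure the fraction of cloud points falling in
# each spherical shell around it. Rotation about the centre does not
# change the histogram, giving the rotation invariance SSR features need.
def shell_histogram(points, centre, radii):
    d = np.linalg.norm(points - centre, axis=1)   # distance to centre
    hist, _ = np.histogram(d, bins=radii)         # counts per shell
    return hist / max(len(points), 1)             # occupancy fractions

rng = np.random.default_rng(0)
cloud = rng.normal(size=(1000, 3))                # stand-in point cloud
desc = shell_histogram(cloud, np.zeros(3), radii=np.linspace(0, 3, 7))
print(desc)
```

    Descriptors of this family are compared across candidate vertices to single out distinctive landmarks such as the nose tip.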

    Photonic Crystals: Modeling and Simulation

    Photonic crystals are periodic electromagnetic media at optical wavelength scales. They possess photonic band gaps (PBGs) that inhibit the existence of light within the crystals in certain wavelength ranges. Such band gaps produce many interesting optical phenomena. In this dissertation, frequency-domain (plane wave method, PWM) and time-domain (finite-difference time-domain, FDTD) methods are developed for their modeling and simulation. The theory and algorithm of the plane wave method are studied in detail and implemented in a unique and efficient approach. PWM is used to obtain the gap and mode information of ideal and defective photonic crystals. Several material and structural parameters are shown to affect the band gap. Examples of devices studied include high-Q micro-cavities, linear waveguides, highly efficient sharp bends, and channel drop filters. Effects of defects in photonic crystals are studied in detail. Results show that point defects can form resonator centers of very high quality factor, whereas line defects can form linear waveguides in low/high index material. Highly efficient energy transfer occurs between defect modes. A numerical analysis of the interaction mechanisms between them is carried out, and the results serve as a theoretical guide for device design. Photonic crystal fiber (PCF) with periodic air holes in the cladding is analyzed using a modified PWM method. PCF is able to guide light in a single mode over a very broad wavelength region, or it can guide light in an air core, offering superior optical properties. By tailoring the microstructure of the cladding, mode shape and group velocity dispersion can be controlled. Finally, a simulation tool using FDTD is developed to study and simulate device designs. The Order-N method using FDTD and periodic boundary conditions is also presented to reduce the heavy computation of the PWM method.
    Light dynamics in PBG devices are simulated and analyzed using FDTD with Perfectly Matched Layer boundary conditions. Excitation sources, mode symmetry, and detection techniques are described to obtain complete and accurate information from the simulations. The combination of the time-domain and frequency-domain methods provides a powerful tool for the analysis and design of high-performance PBG devices.
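    The time-domain half of such a toolset reduces, at its core, to a leapfrog update of electric and magnetic fields on a staggered grid. A minimal 1D sketch in vacuum with normalised units follows; the PML absorbing boundaries, the photonic-crystal permittivity profile, and the 2D/3D generality of the dissertation's tool are all omitted:

```python
import numpy as np

# 1D FDTD (Yee leapfrog) in vacuum, normalised so the update
# coefficients are unity. Walls act as perfect electric conductors.
nx, nt = 200, 300
ez = np.zeros(nx)            # electric field at integer grid points
hy = np.zeros(nx - 1)        # magnetic field at half grid points

for t in range(nt):
    hy += ez[1:] - ez[:-1]                        # H update from curl E
    ez[1:-1] += hy[1:] - hy[:-1]                  # E update from curl H
    ez[nx // 2] += np.exp(-((t - 30) / 10) ** 2)  # soft Gaussian source

print(float(np.max(np.abs(ez))))
```

    Adding a spatially periodic permittivity array into the E-field update is what turns this skeleton into a photonic-crystal simulation; frequencies inside the band gap then fail to propagate.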

    Biologically inspired feature extraction for rotation and scale tolerant pattern analysis

    Biologically motivated information processing has been an important area of scientific research for decades. The central topic addressed in this dissertation is the utilization of lateral inhibition and, more generally, linear networks with recurrent connectivity, along with complex-log conformal mapping, in machine-based implementations of information encoding, feature extraction, and pattern recognition. The reasoning behind, and method for, a spatially uniform implementation of the inhibitory/excitatory network model in the framework of the non-uniform log-polar transform is presented. For the space-invariant connectivity model characterized by a Toeplitz-Block-Toeplitz matrix, the overall network response is obtained without matrix inverse operations, provided the connection-matrix generating function is bounded by unity. It is shown that for a network with an inter-neuron connection function expandable in a Fourier series in polar angle, the overall network response is steerable. The decorrelating/whitening characteristics of networks with lateral inhibition are used to develop space-invariant pre-whitening kernels specialized for specific categories of input signals. These filters have an extremely small memory footprint and are successfully utilized to improve the performance of adaptive neural whitening algorithms. Finally, a method for feature extraction based on a localized Independent Component Analysis (ICA) transform in the log-polar domain, aided by the previously developed pre-whitening filters, is implemented. Since the output codes produced by ICA are very sparse, a small number of non-zero coefficients is sufficient to encode the input data and obtain reliable pattern recognition performance.
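    The complex-log (log-polar) mapping underlying the framework above can be sketched directly: resampling an image on log-spaced radii and uniform angles turns rotation and scaling about the fixation point into translations of the resampled grid. Grid sizes and the nearest-neighbour sampling here are arbitrary illustrative choices:

```python
import numpy as np

# Nearest-neighbour log-polar resampling of a square image about its
# centre: row index ~ log radius, column index ~ polar angle.
def to_log_polar(img, n_r=32, n_theta=64):
    h, w = img.shape
    cy, cx = (h - 1) / 2, (w - 1) / 2
    r_max = min(cy, cx)
    rs = np.exp(np.linspace(0, np.log(r_max), n_r))       # log-spaced radii
    ts = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
    ys = (cy + rs[:, None] * np.sin(ts)).round().astype(int)
    xs = (cx + rs[:, None] * np.cos(ts)).round().astype(int)
    return img[np.clip(ys, 0, h - 1), np.clip(xs, 0, w - 1)]

img = np.zeros((65, 65))
img[20:45, 20:45] = 1.0            # a bright square around the centre
lp = to_log_polar(img)
print(lp.shape)   # (32, 64)
```

    Rotating the input shifts the log-polar image along the angle axis and scaling shifts it along the log-radius axis, which is why space-invariant (Toeplitz-structured) network connectivity becomes natural in this domain.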

    Pattern matching of footwear impressions

    One of the most frequently secured types of evidence at crime scenes is footwear impressions. Identifying the brand and model of the footwear can be crucial to narrowing the search for suspects. Forensic experts do this by comparing the evidence found at the crime scene with a huge list of reference impressions. To support the forensic experts, an automatic retrieval of the most likely matches is desired. In this thesis, different techniques are evaluated to recognize and match footwear impressions, using reference and real crime scene shoeprint images. Due to the conditions in which the shoeprints are found (partial occlusions, variation in shape), a translation-, rotation-, and scale-invariant system is needed. A VLAD (Vector of Locally Aggregated Descriptors) encoder is used to cluster descriptors obtained using different approaches, such as SIFT (Scale-Invariant Feature Transform), Dense SIFT, and a Triplet CNN (Convolutional Neural Network). The last two approaches provide the best performance when their parameters are correctly adjusted, using the Cumulative Matching Characteristic (CMC) curve for evaluation.
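    The VLAD encoding step named above can be sketched compactly. The visual vocabulary would normally come from k-means over training descriptors; here the centroids and the stand-in descriptors are random, purely for illustration:

```python
import numpy as np

# Minimal VLAD encoder: each local descriptor contributes its residual
# to its nearest vocabulary centroid; the concatenated, L2-normalised
# residuals form a single global code for the whole image.
def vlad(descriptors, centroids):
    k, d = centroids.shape
    # nearest-centroid assignment for every descriptor
    assign = np.argmin(
        np.linalg.norm(descriptors[:, None, :] - centroids[None], axis=2),
        axis=1)
    v = np.zeros((k, d))
    for i, c in enumerate(assign):
        v[c] += descriptors[i] - centroids[c]   # accumulate residuals
    v = v.ravel()
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

rng = np.random.default_rng(0)
desc = rng.normal(size=(50, 8))    # stand-in for SIFT / CNN descriptors
cent = rng.normal(size=(4, 8))     # stand-in vocabulary, k = 4
code = vlad(desc, cent)
print(code.shape)   # (4 * 8,) = (32,)
```

    Because the code is a fixed-length vector regardless of how many local descriptors a print yields, partial occlusions degrade the match score gracefully rather than breaking the comparison.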

    Number theoretic techniques applied to algorithms and architectures for digital signal processing

    Many of the techniques for the computation of a two-dimensional convolution of a small fixed window with a picture are reviewed. It is demonstrated that Winograd's cyclic convolution and Fourier transform algorithms, together with Nussbaumer's two-dimensional cyclic convolution algorithms, have a common general form. Many of these algorithms use the theoretical minimum number of general multiplications. A novel implementation of these algorithms is proposed, based upon one-bit systolic arrays. These systolic arrays are networks of identical cells, with each cell sharing a common control and timing function; each cell is connected only to its nearest neighbours. These are all attractive features for implementation using Very Large Scale Integration (VLSI). The throughput rate is limited only by the time to perform a one-bit full addition. In order to assess the usefulness of these systolic arrays, a 'cost function' is developed to compare them with more conventional techniques, such as the Cooley-Tukey radix-2 Fast Fourier Transform (FFT). The cost function shows that these systolic arrays offer a good way of implementing the Discrete Fourier Transform for transforms up to about 30 points in length. The cost function is a general tool and allows comparisons to be made between different implementations of the same algorithm and between dissimilar algorithms. Finally, a technique is developed for the derivation of Discrete Cosine Transform (DCT) algorithms from the Winograd Fourier Transform Algorithm. These DCT algorithms may be implemented by modified versions of the systolic arrays proposed earlier, but require half the number of cells.
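    All the algorithm families reviewed above compute the same object, a cyclic convolution. As a correctness reference (not the Winograd or systolic formulations themselves), the direct N-squared definition can be checked against the DFT route, the pointwise product of spectra, which any fast algorithm must reproduce:

```python
import numpy as np

# Direct cyclic convolution from its definition:
# y[i] = sum_k x[k] * h[(i - k) mod N]
def cyclic_direct(x, h):
    n = len(x)
    return np.array([sum(x[k] * h[(i - k) % n] for k in range(n))
                     for i in range(n)])

# The same result via the convolution theorem: transform, multiply
# spectra pointwise, transform back.
def cyclic_fft(x, h):
    return np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)).real

x = np.array([1.0, 2.0, 3.0, 4.0])
h = np.array([1.0, 0.0, -1.0, 0.5])
print(cyclic_direct(x, h))
print(cyclic_fft(x, h))   # identical up to floating-point rounding
```

    Winograd-style algorithms rearrange this computation to minimise general multiplications, which is what makes them attractive for the one-bit systolic-array cells described above.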