2,342 research outputs found
Acceleration by Inline Cache for Memory-Intensive Algorithms on FPGA via High-Level Synthesis
Using FPGA-based acceleration of high-performance computing (HPC) applications to reduce energy and power consumption is becoming an attractive option, thanks to the availability of high-level synthesis (HLS) tools that enable fast design cycles. However, obtaining good performance for memory-intensive algorithms, which often exchange large data arrays with external DRAM, still requires time-consuming optimization and good knowledge of hardware design. This article proposes a new design methodology based on dedicated application- and data-array-specific caches. These caches provide most of the benefits that can be achieved by hand-coding optimized DMA-like transfer strategies into the HPC application code, yet require only limited manual tuning (essentially the selection of architecture and size), are neutral to the target HLS tool and technology (FPGA or ASIC), and do not require changes to the application code. We show experimental results obtained on five common memory-intensive algorithms from diverse domains, namely machine learning, data sorting, and computer vision. We test the cost and performance of our caches against both out-of-the-box code originally optimized for a GPU and implementations manually optimized for FPGAs via HLS. The implementation using our caches achieved an 8X speedup and 2X energy reduction on average with respect to the out-of-the-box versions using only simple directive-based optimizations (e.g., pipelining). It also achieved comparable performance with much less design effort when compared with the versions that were manually optimized for efficient memory transfers on an FPGA.
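The core idea above, keeping an on-chip, array-specific cache between the kernel and external DRAM so that most accesses never reach DRAM, can be illustrated with a minimal software model. This is a hypothetical sketch, not the paper's implementation: the class name, sizes, and access pattern are all assumptions for illustration.

```python
# Illustrative software model of a direct-mapped, array-specific cache of the
# kind the article describes: the kernel reads through the cache, which
# fetches whole lines from "DRAM" and counts hits/misses.
class DirectMappedCache:
    def __init__(self, num_lines, words_per_line, backing):
        self.words_per_line = words_per_line
        self.backing = backing                 # the external DRAM array
        self.tags = [None] * num_lines         # tag stored per cache line
        self.lines = [None] * num_lines        # cached data per line
        self.hits = 0
        self.misses = 0

    def read(self, addr):
        line_addr, offset = divmod(addr, self.words_per_line)
        index = line_addr % len(self.tags)
        if self.tags[index] == line_addr:      # hit: data already on chip
            self.hits += 1
        else:                                  # miss: fetch the whole line
            self.misses += 1
            base = line_addr * self.words_per_line
            self.lines[index] = self.backing[base:base + self.words_per_line]
            self.tags[index] = line_addr
        return self.lines[index][offset]

# Sequential scan of a 4096-word array: one DRAM fetch per line, the rest hit.
dram = list(range(4096))
cache = DirectMappedCache(num_lines=16, words_per_line=32, backing=dram)
total = sum(cache.read(a) for a in range(len(dram)))
print(cache.misses, cache.hits)   # 128 misses, 3968 hits for this pattern
```

In an actual HLS flow the same wrapper idea is expressed in C/C++ so the tool can synthesize the tag/data arrays into on-chip BRAM, which is what makes the approach tool- and technology-neutral.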
Group-Level Neural Responses to Service-to-Service Brand Extension
Brand extension is a marketing strategy that leverages a well-established brand to promote new offerings provided as goods or services. Previous neurophysiological studies on goods-to-goods brand extension have proposed that categorization and semantic memory processes are involved in brand extension evaluation. However, it is unknown whether these same processes also underlie service-to-service brand extension. The present study therefore aims to investigate the neural processes in consumers underlying their judgment of service-to-service brand extension. Specifically, we investigated human electroencephalographic responses to extended services that consumers commonly considered to fit well or badly with the parent brand. For this purpose, we propose a new stimulus grouping method to find commonly acceptable or unacceptable service extensions. In the experiment, participants reported the acceptability of 56 brand extension pairs, each consisting of a parent brand name (S1) and an extended service name (S2). From the individual acceptability responses, we assigned each pair to one of three fit levels: high (i.e., highly acceptable), mid, and low. Next, we selected stimuli that received high- or low-fit evaluations from a large majority of participants (i.e., >85%) and assigned them to a high or low population-fit group. A comparison of event-related potentials (ERPs) between population-fit groups through a paired t-test showed significant differences in the fronto-central N2 and fronto-parietal P300 amplitudes. We further evaluated the inter-subject variability of these ERP components with a decoding analysis that classified N2 and/or P300 amplitudes into a high or low population-fit class using a support vector machine. Leave-one-subject-out validation revealed classification accuracies of 60.35% with N2 amplitudes, 78.95% with P300, and 73.68% with both, indicating relatively high inter-subject variability for N2 but low for P300.
This validation showed that the fronto-parietal P300 reflected neural processes that were more consistent across subjects in service-to-service brand extension. We further observed that the left frontal P300 amplitude increased as the fit level increased across stimuli, indicating a semantic retrieval process that evaluates the semantic link between S1 and S2. The parietal P300 showed a higher amplitude in the high population-fit group, reflecting a similarity-based categorization process. In sum, our results suggest that service-to-service brand extension evaluation may share similar neural processes with goods-to-goods brand extension.
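The leave-one-subject-out decoding step described above can be sketched as follows. This is a toy illustration under stated assumptions: the data are synthetic, and a dependency-free nearest-centroid rule stands in for the study's support vector machine; subject count and feature values are made up.

```python
# Leave-one-subject-out decoding sketch: per-subject (N2, P300) amplitudes are
# classified into a high vs. low population-fit class; one subject is held out
# per fold and a nearest-centroid classifier (SVM stand-in) predicts its labels.
import random

random.seed(0)
# One synthetic (N2, P300) feature pair per subject and per class.
subjects = []
for _ in range(19):
    subjects.append({
        "high": (random.gauss(2.0, 0.8), random.gauss(5.0, 0.8)),
        "low":  (random.gauss(1.5, 0.8), random.gauss(3.0, 0.8)),
    })

def centroid(samples):
    n = len(samples)
    return tuple(sum(s[i] for s in samples) / n for i in range(2))

correct = 0
for held_out in range(len(subjects)):
    train = [subjects[s] for s in range(len(subjects)) if s != held_out]
    c_high = centroid([d["high"] for d in train])
    c_low = centroid([d["low"] for d in train])
    for label, feat in subjects[held_out].items():
        d_high = sum((a - b) ** 2 for a, b in zip(feat, c_high))
        d_low = sum((a - b) ** 2 for a, b in zip(feat, c_low))
        pred = "high" if d_high < d_low else "low"
        correct += (pred == label)

accuracy = correct / (2 * len(subjects))
print(f"leave-one-subject-out accuracy: {accuracy:.2%}")
```

The per-fold structure (train on all subjects but one, test on the held-out subject) is what makes the reported accuracy a measure of inter-subject consistency rather than within-subject fit.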
Complementary use of TEM and APT for the investigation of steels nanostructured by severe plastic deformation
The properties of bulk nanostructured materials are often controlled by atomic-scale features such as segregation along defects or composition gradients. Here we discuss the complementary use of TEM and APT to obtain a full description of nanostructures. The advantages and limitations of both techniques are highlighted on the basis of experimental data collected in severely deformed steels, with a special emphasis on the spatial distribution of carbon.
Characteristics of Human Brain Activity during the Evaluation of Service-to-Service Brand Extension
Brand extension is a marketing strategy that applies a previously established brand name to new goods or services. A number of studies have reported the characteristics of human event-related potentials (ERPs) in response to the evaluation of goods-to-goods brand extension. In contrast, human brain responses to the evaluation of service extension are relatively unexplored. The aim of this study was to investigate the cognitive processes underlying the evaluation of service-to-service brand extension with electroencephalography (EEG). A total of 56 text stimuli, each composed of a service brand name (S1) followed by an extended service name (S2), were presented to participants. EEG was recorded while participants evaluated whether a given brand extension was acceptable or not. The behavioral results revealed that participants could evaluate brand extension even though they had little knowledge about the extended services, indicating the role of the brand in the evaluation of the services. Additionally, we developed a method of grouping brand extension stimuli according to the fit levels obtained from behavioral responses, instead of grouping stimuli a priori. The ERP analysis identified three components during the evaluation of brand extension: N2, P300, and N400. No difference in the N2 amplitude was found among the different levels of fit between S1 and S2. The P300 amplitude for the low level of fit was greater than those for the higher levels (p < 0.05). The N400 amplitude was more negative for the mid- and high-level fits than for the low level. The P300 and N400 results indicate that the early stage of brand extension evaluation might first detect low-fit brand extension as an improbable target, followed by a late stage that integrates S2 into S1. Along with previous findings, our results demonstrate that the cognitive evaluation of service-to-service brand extension differs from that of goods-to-goods extension.
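The behavioral grouping step described above, assigning each S1-S2 pair a fit level from participants' responses rather than a priori, can be sketched in a few lines. The brand names, response counts, and the high/mid/low thresholds here are hypothetical; only the >85% consensus criterion comes from the abstract.

```python
# Group brand-extension pairs by fit level derived from per-subject
# acceptability responses, keeping only pairs with a >85% majority.
def fit_level(acceptance_rate):
    """Map a pair's acceptance rate to a fit level (thresholds assumed)."""
    if acceptance_rate >= 2 / 3:
        return "high"
    if acceptance_rate <= 1 / 3:
        return "low"
    return "mid"

def population_fit_groups(responses, consensus=0.85):
    """responses: {pair: [True/False per subject]} -> high/low fit groups."""
    groups = {"high": [], "low": []}
    for pair, answers in responses.items():
        rate = sum(answers) / len(answers)
        level = fit_level(rate)
        # Keep only pairs where a large majority agrees on accept/reject.
        if level == "high" and rate > consensus:
            groups["high"].append(pair)
        elif level == "low" and (1 - rate) > consensus:
            groups["low"].append(pair)
    return groups

# Toy data for 19 subjects; brand/service names are invented placeholders.
responses = {
    ("BankCo", "insurance"): [True] * 18 + [False],         # ~94.7% accept
    ("BankCo", "car wash"):  [False] * 17 + [True, True],   # ~89.5% reject
    ("BankCo", "telecom"):   [True] * 10 + [False] * 9,     # no consensus
}
groups = population_fit_groups(responses)
print(groups)
```

Pairs without a clear consensus (like the third one) fall into the mid-fit level and are excluded from the population-fit ERP contrast, which is what makes the resulting high/low groups comparable across subjects.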
Performance and energy-efficient implementation of a smart city application on FPGAs
The continuous growth of modern cities and the demand for a better quality of life, coupled with the increased availability of computing resources, have led to increased attention to smart city services. Smart cities promise to deliver a better life to their inhabitants while simultaneously reducing resource requirements and pollution, and are thus perceived as a key enabler of sustainable growth. Among many other issues, one of the major concerns for most cities in the world is traffic, which leads to a huge waste of time and energy and to increased pollution. To optimize traffic in cities, one of the first steps is to obtain accurate real-time information about the traffic flows in the city. This can be achieved by applying automated video analytics to the video streams provided by a set of cameras distributed throughout the city. Image sequence processing can be performed either peripherally or centrally. In this paper, we argue that, since centralized processing has several advantages in terms of availability, maintainability, and cost, it is a very promising strategy to enable effective traffic management even in large cities. However, the computational costs are enormous and thus require an energy-efficient high-performance computing approach. Field Programmable Gate Arrays (FPGAs) provide computational resources comparable to CPUs and GPUs, yet require much less energy per operation (around 6× and 10× less for the application considered in this case study). They are thus the preferred resource to reduce both energy supply and cooling costs in the huge datacenters that will be needed by smart cities. In this paper, we describe efficient implementations of high-performance algorithms that can process traffic camera image sequences to provide traffic flow information in real time at a low energy and power cost.
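To put the 6×-10× energy-per-operation advantage in perspective, a back-of-the-envelope calculation shows what it means at datacenter scale. Only the 6× and 10× ratios come from the abstract; the per-stream power figure and camera count below are invented placeholders.

```python
# Hypothetical city-scale energy saving from a 6x-10x per-operation advantage.
baseline_power_per_stream_w = 40.0   # assumed sustained power per camera stream
num_streams = 1000                   # assumed city-wide camera count
hours_per_year = 24 * 365

saved = {}
for ratio in (6, 10):
    fpga_power_w = baseline_power_per_stream_w / ratio
    saved_mwh = ((baseline_power_per_stream_w - fpga_power_w)
                 * num_streams * hours_per_year / 1e6)
    saved[ratio] = saved_mwh
    print(f"{ratio}x advantage: ~{saved_mwh:.0f} MWh/year saved "
          f"for {num_streams} streams")
```

Even under these made-up workload assumptions, savings in the hundreds of MWh per year illustrate why per-operation energy, not just raw throughput, drives the choice of platform for always-on video analytics.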
Development of Al-B-C Master Alloy under External Fields
This study investigates the application of external fields in the development of an Al-B-C alloy, with the aim of synthesizing in situ Al3BC particles. A combination of ultrasonic cavitation and distributive mixing was applied to disperse insoluble graphite particles uniformly in the Al melt, improving their wettability and subsequent incorporation into the Al matrix. Lower operating temperatures reduced the amount of large clusters of reaction phases, with Al3BC identified as the main phase by XRD analysis. The distribution of Al3BC particles was quantitatively evaluated. Grain refinement experiments reveal that the Al-B-C alloy can act as a master alloy for Al-4Cu and AZ91D alloys, with an average grain size reduction of around 50% for each at 1 wt% Al-1.5B-2C additions.
Synthesis of graft copolymers based on hyaluronan and poly(3-hydroxyalkanoates)
This work reports the synthesis and characterisation of new amphiphilic conjugates of hyaluronan (HA) grafted with poly(3-hydroxyalkanoates) (PHAs). Hydrolytic depolymerisation of PHAs was used to synthesize defined oligo(3-hydroxyalkanoates) containing carboxylic terminal moieties. A kinetic study of the depolymerisation was carried out to prepare oligomers of the required molecular weight. The PHAs were coupled with the hydroxyl groups of HA, mediated by N,N'-carbonyldiimidazole (CDI) or HSTU (tetramethyl-O-(N-succinimidyl)uronium hexafluorophosphate). For the first time, the covalent bonding of oligo derivatives of P(3-hydroxybutyrate), P(3-hydroxyoctanoate), P(3-hydroxyoctanoate-co-3-hydroxydecanoate), and P(3-hydroxyoctanoate-co-3-hydroxydecanoate-co-3-hydroxydodecanoate) to HA was achieved by a “grafting to” strategy. The achieved grafting degree was a function of the hydrophobicity of the PHA, its Mw, and the polarity of the solvent. The most suitable reaction conditions were observed for oligo(3-hydroxybutyrate) grafted to HA (grafting degree of 14%). The graft copolymers were characterized by FT-IR, NMR, DSC, and SEC-MALLS. They can be physically loaded with hydrophobic drugs and may serve as a drug delivery system.
Systematic Parameterization, Storage, and Representation of Volumetric DICOM Data
Tomographic medical imaging systems produce hundreds to thousands of slices, enabling three-dimensional (3D) analysis. Radiologists process these images with various tools and techniques to generate 3D renderings for applications such as surgical planning, medical education, and volumetric measurements. To save and store these visualizations, current systems use snapshots or video exports, which prevent further optimization and require the storage of significant additional data. The Grayscale Softcopy Presentation State extension of the Digital Imaging and Communications in Medicine (DICOM) standard resolves this issue for two-dimensional (2D) data by introducing an extensive set of parameters, namely 2D Presentation States (2DPR), that describe how an image should be displayed. 2DPR allows these parameters to be stored instead of the parameter-applied images, which would cause unnecessary duplication of the image data. Since there is currently no corresponding extension for 3D data, this study proposes a DICOM-compliant object called 3D Presentation States (3DPR) for the parameterization and storage of 3D medical volumes. To accomplish this, the 3D medical visualization process is divided into four tasks, namely pre-processing, segmentation, post-processing, and rendering, and the important parameters of each task are determined. Special focus is given to the compression of segmented data, the parameterization of the rendering process, and the DICOM-compliant implementation of the 3DPR object. The use of 3DPR was tested in a radiology department on three clinical cases that require multiple segmentations and visualizations in the radiologists' workflow. The results show that 3DPR can effectively simplify the workload of physicians by directly regenerating 3D renderings without repeating intermediate tasks, increase efficiency by preserving all user interactions, and provide efficient storage as well as transfer of visualized data.
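The key idea, storing the parameters of the four visualization tasks instead of rendered pixels so any viewer can regenerate the view, can be sketched as follows. This is a non-normative illustration: the field names and values are hypothetical and do not follow the actual DICOM attribute encoding proposed in the paper.

```python
# Illustrative "presentation state" for a 3D pipeline: one record per task
# (pre-processing, segmentation, post-processing, rendering), serialized once
# and replayed on demand instead of storing a rendered snapshot.
import json

pipeline_state = {
    "pre_processing": {"window_center": 40, "window_width": 400,
                       "filter": "median3"},
    "segmentation":   {"method": "threshold", "lower_hu": 130,
                       "upper_hu": 3000},
    "post_processing": {"smoothing_iterations": 2},
    "rendering":      {"mode": "volume",
                       "opacity_curve": [[0, 0.0], [255, 0.9]],
                       "camera": {"azimuth": 30, "elevation": 15, "zoom": 1.2}},
}

# Round-trip: a viewer loads the parameters and replays the four tasks.
blob = json.dumps(pipeline_state, sort_keys=True)
restored = json.loads(blob)
assert restored == pipeline_state
print(f"{len(blob)} bytes of parameters instead of a rendered snapshot")
```

A few hundred bytes of parameters replace megabytes of exported images, which is the storage and transfer advantage the study reports; the real 3DPR object encodes the same information as DICOM attributes rather than JSON.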
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s40846-015-0097-5) contains supplementary material, which is available to authorized users