745 research outputs found

    Multimedia big data computing for in-depth event analysis

    Get PDF
    While the most part of ”big data” systems target text-based analytics, multimedia data, which makes up about 2/3 of internet traffic, provide unprecedented opportunities for understanding and responding to real world situations and challenges. Multimedia Big Data Computing is the new topic that focus on all aspects of distributed computing systems that enable massive scale image and video analytics. During the course of this paper we describe BPEM (Big Picture Event Monitor), a Multimedia Big Data Computing framework that operates over streams of digital photos generated by online communities, and enables monitoring the relationship between real world events and social media user reaction in real-time. As a case example, the paper examines publicly available social media data that relate to the Mobile World Congress 2014 that has been harvested and analyzed using the described system.Peer ReviewedPostprint (author's final draft

    Access to vectors in multi-module memories

    Get PDF
    The poor bandwidth obtained from memory when conflicts arise in the modules or in the interconnection network degrades the performance of computers. Address transformation schemes, such as interleaving, skewing and linear transformations, have been proposed to achieve conflict-free access for streams with constant stride. However, this is achieved only for some strides. In this paper, we summarize a mechanism to request the elements in an out-of-order way which allows to achieve conflict-free access for a larger number of strides. We study the cases of a single vector processor and of a vector multiprocessor system. For this latter case, we propose a synchronous mode of accessing memory that can be applied in SIMD machines or in MIMD systems with decoupled access and execution.Peer ReviewedPostprint (published version

    Analysis of the overheads incurred due to speculation in a task based programming model

    Get PDF
    In order to efficiently utilize the ever increasing processing power of multi-cores, a programmer must extract as much parallelism as possible from a given application. However with every such attempt there is an associated overhead of its implementation. A parallelization technique is beneficial only if its respective overhead is less than the performance gains realized. In this paper we analyze the overhead of one such endeavor where, in SMPSs, speculation is used to execute tasks ahead in time. Speculation is used to overcome the synchronization pragmas in SMPSs which block the generation of work and lead to the underutilization of the available resources. TinySTM, a Software Transactional Memory library is used to maintain correctness in case of mis-speculation. In this paper, we analyze the affect of TinySTM on a set of SMPSs applications which employ speculation to improve the performance. We show that for the chosen set of benchmarks, no performance gains are achieved if the application spends more than 1% of its execution time in TinySTM.Peer ReviewedPostprint (published version

    TaskPoint: sampled simulation of task-based programs

    Get PDF
    Sampled simulation is a mature technique for reducing simulation time of single-threaded programs, but it is not directly applicable to simulation of multi-threaded architectures. Recent multi-threaded sampling techniques assume that the workload assigned to each thread does not change across multiple executions of a program. This assumption does not hold for dynamically scheduled task-based programming models. Task-based programming models allow the programmer to specify program segments as tasks which are instantiated many times and scheduled dynamically to available threads. Due to system noise and variation in scheduling decisions, two consecutive executions on the same machine typically result in different instruction streams processed by each thread. In this paper, we propose TaskPoint, a sampled simulation technique for dynamically scheduled task-based programs. We leverage task instances as sampling units and simulate only a fraction of all task instances in detail. Between detailed simulation intervals we employ a novel fast-forward mechanism for dynamically scheduled programs. We evaluate the proposed technique on a set of 19 task-based parallel benchmarks and two different architectures. Compared to detailed simulation, TaskPoint accelerates architectural simulation with 64 simulated threads by an average factor of 19.1 at an average error of 1.8% and a maximum error of 15.0%.This work has been supported by the Spanish Government (Severo Ochoa grants SEV2015-0493, SEV-2011-00067), the Spanish Ministry of Science and Innovation (contract TIN2015-65316-P), Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272), the RoMoL ERC Advanced Grant (GA 321253), the European HiPEAC Network of Excellence and the Mont-Blanc project (EU-FP7-610402 and EU-H2020-671697). M. Moreto has been partially supported by the Ministry of Economy and Competitiveness under Juan de la Cierva postdoctoral fellowship JCI-2012-15047. M. Casas is supported by the Ministry of Economy and Knowledge of the Government of Catalonia and the Cofund programme of the Marie Curie Actions of the EUFP7 (contract 2013BP B 00243). T.Grass has been partially supported by the AGAUR of the Generalitat de Catalunya (grant 2013FI B 0058).Peer ReviewedPostprint (author's final draft

    Hierarchical clustered register file organization for VLIW processors

    Get PDF
    Technology projections indicate that wire delays will become one of the biggest constraints in future microprocessor designs. To avoid long wire delays and therefore long cycle times, processor cores must be partitioned into components so that most of the communication is done locally. In this paper, we propose a novel register file organization for VLIW cores that combines clustering with a hierarchical register file organization. Functional units are organized in clusters, each one with a local first level register file. The local register files are connected to a global second level register file, which provides access to memory. All intercluster communications are done through the second level register file. This paper also proposes MIRS-HC, a novel modulo scheduling technique that simultaneously performs instruction scheduling, cluster selection, inserts communication operations, performs register allocation and spill insertion for the proposed organization. The results show that although more cycles are required to execute applications, the execution time is reduced due to a shorter cycle time. In addition, the combination of clustering and hierarchy provides a larger design exploration space that trades-off performance and technology requirements.Peer ReviewedPostprint (published version

    Hypernode reduction modulo scheduling

    Get PDF
    Software pipelining is a loop scheduling technique that extracts parallelism from loops by overlapping the execution of several consecutive iterations. Most prior scheduling research has focused on achieving minimum execution time, without regarding register requirements. Most strategies tend to stretch operand lifetimes because they schedule some operations too early or too late. The paper presents a novel strategy that simultaneously schedules some operations late and other operations early, minimizing all the stretchable dependencies and therefore reducing the registers required by the loop. The key of this strategy is a pre-ordering that selects the order in which the operations will be scheduled. The results show that the method described in this paper performs better than other heuristic methods and almost as well as a linear programming method but requiring much less time to produce the schedules.Peer ReviewedPostprint (published version

    Modulo scheduling with integrated register spilling for clustered VLIW architectures

    Get PDF
    Clustering is a technique to decentralize the design of future wide issue VLIW cores and enable them to meet the technology constraints in terms of cycle time, area and power dissipation. In a clustered design, registers and functional units are grouped in clusters so that new instructions are needed to move data between them. New aggressive instruction scheduling techniques are required to minimize the negative effect of resource clustering and delays in moving data around. In this paper we present a novel software pipelining technique that performs instruction scheduling with reduced register requirements, register allocation, register spilling and inter-cluster communication in a single step. The algorithm uses limited backtracking to reconsider previously taken decisions. This backtracking provides the algorithm with additional possibilities for obtaining high throughput schedules with low spill code requirements for clustered architectures. We show that the proposed approach outperforms previously proposed techniques and that it is very scalable independently of the number of clusters, the number of communication buses and communication latency. The paper also includes an exploration of some parameters in the design of future clustered VLIW cores.Peer ReviewedPostprint (published version

    Swing modulo scheduling: a lifetime-sensitive approach

    Get PDF
    This paper presents a novel software pipelining approach, which is called Swing Modulo Scheduling (SMS). It generates schedules that are near optimal in terms of initiation interval, register requirements and stage count. Swing Modulo Scheduling is an heuristic approach that has a low computational cost. The paper describes the technique and evaluates it for the Perfect Club benchmark suite. SMS is compared with other heuristic methods showing that it outperforms them in terms of the quality of the obtained schedules and compilation time. SMS is also compared with an integer linear programming approach that generates optimum schedules but with a huge computational cost, which makes it feasible only for very small loops. For a set of small loops, SMS obtained the optimum initiation interval in all the cases and its schedules required only 5% more registers and a 1% higher stage count than the optimumPeer ReviewedPostprint (published version

    Modulo scheduling with reduced register pressure

    Get PDF
    Software pipelining is a scheduling technique that is used by some product compilers in order to expose more instruction level parallelism out of innermost loops. Module scheduling refers to a class of algorithms for software pipelining. Most previous research on module scheduling has focused on reducing the number of cycles between the initiation of consecutive iterations (which is termed II) but has not considered the effect of the register pressure of the produced schedules. The register pressure increases as the instruction level parallelism increases. When the register requirements of a schedule are higher than the available number of registers, the loop must be rescheduled perhaps with a higher II. Therefore, the register pressure has an important impact on the performance of a schedule. This paper presents a novel heuristic module scheduling strategy that tries to generate schedules with the lowest II, and, from all the possible schedules with such II, it tries to select that with the lowest register requirements. The proposed method has been implemented in an experimental compiler and has been tested for the Perfect Club benchmarks. The results show that the proposed method achieves an optimal II for at least 97.5 percent of the loops and its compilation time is comparable to a conventional top-down approach, whereas the register requirements are lower. In addition, the proposed method is compared with some other existing methods. The results indicate that the proposed method performs better than other heuristic methods and almost as well as linear programming methods, which obtain optimal solutions but are impractical for product compilers because their computing cost grows exponentially with the number of operations in the loop body.Peer ReviewedPostprint (published version
    corecore