201 research outputs found
Generalized parallelization methodology for video coding
This paper describes a generalized parallelization methodology for mapping video coding algorithms onto a multiprocessing architecture, through systematic task decomposition, scheduling and performance analysis. It exploits data parallelism inherent in the coding process and performs task scheduling base on task data size and access locality with the aim to hide as much communication overhead as possible. Utilizing Petri-nets and task graphs for representation and analysis, the method enables parallel video frame capturing, buffering and encoding without extra communication overhead. The theoretical speedup analysis indicates that this method offers excellent communication hiding, resulting in system efficiency well above 90%. A H.261 video encoder has been implemented on a TMS320C80 system using this method, and its performance was measured. The theoretical and measured performances are similar in that the measured speedup of the H.261 is 3.67 and 3.76 on four PP for QCIF and 352Ă—240 video, respectively. They correspond to frame rates of 30.7 frame per second (fps) and 9.25 fps, and system efficiency of 91.8% and 94% respectively. As it is, this method is particularly efficient for platforms with small number of parallel processors.published_or_final_versio
Parallelization of the H.261 video coding algorithm on the IBM SP2(R) multiprocessor system
In this paper, the parallelization of the H.261 video coding algorithm on the IBM SP2 multiprocessor system is described. Based on domain decomposition as a framework, data partitioning, data dependencies and communication issues are carefully assessed. From these, two parallel algorithms were developed with the first one maximizes on processor utilization and the second one minimizes on communications. Our analysiis shows that the first algorithm exhibits poor scalability and high communication overhead; and the second algorithm exhibits good scalability and low communication overhead. A best median speed up of 13.72 or 11 frameskec was achieved on 24 processors.published_or_final_versio
Generalized parallelization methodology for video coding
This paper describes a generalized parallelization methodology for mapping video coding algorithms onto a multiprocessing architecture, through systematic task decomposition, scheduling and performance analysis. It exploits data parallelism inherent in the coding process and performs task scheduling base on task data size and access locality with the aim to hide as much communication overhead as possible. Utilizing Petri-nets and task graphs for representation and analysis, the method enables parallel video frame capturing, buffering and encoding without extra communication overhead. The theoretical speedup analysis indicates that this method offers excellent communication hiding, resulting in system efficiency well above 90%. A H.261 video encoder has been implemented on a TMS320C80 system using this method, and its performance was measured. The theoretical and measured performances are similar in that the measured speedup of the H.261 is 3.67 and 3.76 on four PP for QCIF and 352Ă—240 video, respectively. They correspond to frame rates of 30.7 frame per second (fps) and 9.25 fps, and system efficiency of 91.8% and 94% respectively. As it is, this method is particularly efficient for platforms with small number of parallel processors.published_or_final_versio
Spatial and temporal data parallelization of the H.261 video coding algorithm
In this paper, the parallelization of the H.261 video coding algorithm on the IBM SP2 multiprocessor system is described. The effect of parallelizing computations and communications in the spatial, temporal, and both spatial-temporal domains are considered through the study of frame rate, speedup, and implementation efficiency, which are modeled and measured with respect to the number of nodes (n) and parallel methods used. Four parallel algorithms were developed, of which the first two exploited the spatial parallelism in each frame, and the last two exploited both the temporal and spatial parallelism over a sequence of frames. The two spatial algorithms differ in that one utilizes a single communication master, while the other attempts to distribute communications across three masters. On the other hand, the spatial-temporal algorithms use a pipeline structure for exploiting the temporal parallelism together with either a single master or multiple masters. The best median speedup (frame rate) achieved was close to 15[15 frames per second (fps)] for 352 Ă— 240 video on 24 nodes, and 13 (37 fps) for QCIF video, by the spatial algorithm with distributed communications. For n 10, with efficiency up to 70%. The spatial-temporal algorithms achieved average speedup performance, but are most scalable for large n.published_or_final_versio
Novel neighborhood search for multiprocessor scheduling with pipelining
Presents a neighborhood search algorithm for heterogeneous multiprocessor scheduling in which loop pipelining is used to exploit parallelism between iterations. The method adopts a realistic model for interprocessor communication where resource contention is taken into consideration. The schedule representation scheme is flexible so that communication scheduling can be performed in a generic manner. Based on a general time formulation of the schedule performance, the algorithm improves an initial schedule in an efficient way. Experimental results show that significant improvement over existing methods can be obtained. Using the scheduling results, a parallel software video encoder was implemented and real-time performance was achieved.published_or_final_versio
Endogenous repair by the activation of cell survival signalling cascades during the early stages of rat parkinsonism.
published_or_final_versio
Parallelization methodology for video coding - an implementation on the TMS320C80
This paper presents a parallelization methodology for video coding based on the philosophy of hiding as much communications by computation as possible. It models the task/data size, processor cache capacity, and communication contention, through a systematic decomposition and scheduling approach. With the aid of Petri-nets and task graphs for representation and analysis, it employs a triple buffering scheme to enable the functions of frame capture, management, and coding to be performed in parallel. The theoretical speedup analysis indicates that this method offers excellent communication hiding, resulting in system efficiency well above 90%. To prove its practicality, a H.261 video encoder has been implemented on a TMS320C80 system using the method. Its performance was measured, from which the speedup and efficiency figures were calculated. The only difference detected between the theoretical and measured data is the program control overhead that has not been accounted for in the theoretical model. Even with this, the measured speedup of the H.261 is 3.67 and 3.76 on four parallel processors (PPs) for QCIF and 352 Ă— 240 video, respectively, which correspond to frame rate of 30.7 and 9.25 frames per second, and system efficiency of 91.8% and 94%, respectively. This method is particularly efficient for platforms with small number of parallel processors.published_or_final_versio
Adaptive parallel video-coding algorithm
Parallel encoding of video inevitably frame rate gives varying rate performance due to dynamically changing video content and motion field since the encoding process of each macro-block, especially motion estimation, is data dependent. A multiprocessor schedule optimized for a particular frame with certain macro-block encoding time may not be optimized towards another frame with different encoding time, which causes performance degradation to the parallelization. To tackle this problem, we propose a method based on a batch of near-optimal schedules generated at compile-time and a run-time mechanism to select the schedule giving the shortest predicted critical path length. This method has the advantage of being near-optimal using compile-time schedules while involving only run-time selection rather than re-scheduling. Implementation on the IBM SP2 multiprocessor system using 24 processors gives an average speedup of about 13.5 (frame rate of 38.5 frames per second) for a CIF sequence consisting of segments of 6 different scenes. This is equivalent to an average improvement of about 16.9% over the single schedule scheme with schedule adapted to each of the scenes. Using an open test sequence consisting of 8 video segments, the average improvement achieved is 13.2%, i.e. an average speedup of 13.3 (35.6 frames per second).published_or_final_versio
Following Mitochondrial Footprints through a Long Mucosal Path to Lung Cancer
BACKGROUND:Mitochondrial DNA (mtDNA) mutations are reported in different tumors. However, there is no information on the temporal development of the mtDNA mutations/content alteration and their extent in normal and abnormal mucosa continuously exposed to tobacco smoke in lung cancer patients. METHODOLOGY:We examined the pattern of mtDNA alteration (mtDNA mutation and content index) in 25 airway mucosal biopsies, corresponding tumors and normal lymph nodes obtained from three patients with primary lung cancers. In addition, we examined the pattern of mtDNA mutation in corresponding tumors and normal lymph nodes obtained from eight other patients with primary lung cancers. The entire 16.5 kb mitochondrial genome was sequenced on Affymetrix Mitochip v2.0 sequencing platform in every sample. To examine mtDNA content index, we performed real-time PCR analysis. PRINCIPAL FINDINGS:The airway mucosal biopsies obtained from three lung cancer patients were histopathologically negative but exhibited multiple clonal mtDNA mutations detectable in the corresponding tumors. One of the patients was operated twice for the removal of tumor from the right upper and left lower lobe respectively within a span of two years. Both of these tumors exhibited twenty identical mtDNA mutations. MtDNA content increased significantly (P<0.001) in the lung cancer and all the histologically negative mucosal biopsies except one compared to the control lymph node. CONCLUSIONS/SIGNIFICANCE:Our results document the extent of massive clonal patches that develop in lifetime smokers and ultimately give rise to clinically significant cancers. These observations shed light on the extent of disease in the airway of smokers traceable through mtDNA mutation. MtDNA mutation could be a reliable tool for molecular assessment of respiratory epithelium exposed to continuous smoke as well as disease detection and monitoring. Functional analysis of the pathogenic mtDNA mutations may be useful to understand their role in lung tumorigenesis
- …