
    Methods to Improve Applicability and Efficiency of Distributed Data-Centric Compute Frameworks

    The success of modern applications depends on the insights they collect from their data repositories. Data repositories for such applications currently exceed exabytes in size and are growing rapidly, as they collect data from varied sources: web applications, mobile phones, sensors, and other connected devices. Distributed storage and data-centric compute frameworks have been invented to store and analyze these large datasets. This dissertation focuses on extending the applicability and improving the efficiency of distributed data-centric compute frameworks.

    Improving Efficiency in Deep Learning for Large Scale Visual Recognition

    Recent large-scale visual recognition methods, in particular deep Convolutional Neural Networks (CNNs), promise to revolutionize many computer-vision-based artificial intelligence applications, such as autonomous driving and online image retrieval systems. One of the main challenges in large-scale visual recognition is the complexity of the corresponding algorithms. This is further exacerbated by the fact that, in most real-world scenarios, they need to run in real time and on platforms with limited computational resources. This dissertation focuses on improving the efficiency of such large-scale visual recognition algorithms from several perspectives. First, to reduce the complexity of large-scale classification to sub-linear in the number of classes, a probabilistic label tree framework is proposed. A test sample is classified by traversing the label tree from the root node. Each node in the tree is associated with a probabilistic estimation of all the labels. The tree is learned recursively with iterative maximum likelihood optimization. Compared to the previously proposed hard label partition, the probabilistic framework classifies more accurately with similar efficiency. Second, we explore the redundancy of parameters in CNNs and employ sparse decomposition to significantly reduce both the number of parameters and the computational complexity. Both inter-channel and intra-channel redundancy are exploited to achieve more than 90% sparsity with an approximately 1% drop in classification accuracy. We also propose an efficient CPU-based sparse matrix multiplication algorithm to reduce the actual running time of CNN models with sparse convolutional kernels. Third, we propose a multi-stage framework based on CNNs to achieve better efficiency than a single traditional CNN model. By combining a cascade model with the label tree framework, the proposed method divides the input images in both the image space and the label space, and processes each image with the CNN models that are most suitable and efficient for it. The average complexity of the framework is significantly reduced, while the overall accuracy remains the same as that of the single complex model.
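
    The second contribution relies on multiplying sparse convolutional kernels by dense input data. As a minimal illustration (not the dissertation's algorithm, whose details the abstract does not give), the sketch below stores a pruned kernel matrix in the standard compressed sparse row (CSR) format and multiplies it by a dense matrix of unrolled input patches (an im2col layout is assumed here), so the work scales with the number of nonzero weights rather than with the full kernel size. The names `to_csr` and `csr_matmul` are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def to_csr(dense: Matrix) -> Tuple[List[float], List[int], List[int]]:
    """Convert a dense row-major matrix to CSR: (values, column indices, row pointers)."""
    values: List[float] = []
    col_idx: List[int] = []
    row_ptr: List[int] = [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0.0:  # keep only nonzero (unpruned) weights
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # row i spans values[row_ptr[i]:row_ptr[i + 1]]
    return values, col_idx, row_ptr


def csr_matmul(values: List[float], col_idx: List[int], row_ptr: List[int], b: Matrix) -> Matrix:
    """Compute A @ B with A in CSR form and B dense.

    Each nonzero A[i, k] scales row k of B into output row i, so the cost is
    O(nnz(A) * cols(B)) instead of O(rows(A) * cols(A) * cols(B)).
    """
    n_rows = len(row_ptr) - 1
    n_cols = len(b[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        out_row = out[i]
        for k in range(row_ptr[i], row_ptr[i + 1]):
            a, b_row = values[k], b[col_idx[k]]
            for j in range(n_cols):
                out_row[j] += a * b_row[j]
    return out


# Two flattened 4-weight filters, 5 of the 8 weights pruned to zero.
W = [[0.0, 2.0, 0.0, 0.0],
     [1.0, 0.0, 0.0, -1.0]]
# Four rows (one per kernel position) by three output locations (im2col columns).
X = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0],
     [10.0, 11.0, 12.0]]
print(csr_matmul(*to_csr(W), X))  # [[8.0, 10.0, 12.0], [-9.0, -9.0, -9.0]]
```

    At 90% sparsity only one weight in ten survives, so the inner loop runs roughly a tenth as often as a dense product would; a production CPU kernel would additionally vectorize the innermost loop, which pure Python cannot show.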

    Improving Multibank Memory Access Parallelism on SIMT Architectures

    Memory mapping has traditionally been an important optimization problem for high-performance parallel systems. Today, these issues increasingly affect a much wider range of platforms. Several techniques have been presented to resolve bank conflicts and reduce memory access latency, but none of them turns out to be generally applicable across application contexts. One of the ambitious goals of this thesis is to contribute to modelling the memory mapping problem in order to find an approach that generalizes existing conflict-avoiding techniques and supports a systematic exploration of feasible mapping schemes.
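
    To make "bank conflict" and "mapping scheme" concrete, the sketch below computes how many serialized passes one warp access needs under a given word-to-bank mapping, and compares two well-known conflict-avoiding techniques (row padding and XOR swizzling) on a column read of a 32 x 32 tile. It assumes 32 banks of 4-byte words and 32-thread warps, as on many NVIDIA GPUs; the function names are hypothetical, and this is not the thesis's model.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set

NUM_BANKS = 32  # assumption: 32 banks, one 4-byte word per bank per cycle


def modulo_bank(word: int) -> int:
    """Conventional interleaved mapping: consecutive words go to consecutive banks."""
    return word % NUM_BANKS


def xor_bank(word: int) -> int:
    """XOR swizzle: mix the row index (word // NUM_BANKS) into the bank bits."""
    return (word % NUM_BANKS) ^ ((word // NUM_BANKS) % NUM_BANKS)


def conflict_degree(words: List[int], bank_of: Callable[[int], int] = modulo_bank) -> int:
    """Number of serialized passes a warp access needs: the maximum number of
    distinct words requested from one bank (identical words are broadcast)."""
    per_bank: Dict[int, Set[int]] = defaultdict(set)
    for w in words:
        per_bank[bank_of(w)].add(w)
    return max(len(s) for s in per_bank.values())


def column_access(col: int, row_stride: int, warp_size: int = 32) -> List[int]:
    """Word addresses when thread t reads element [t][col] of a row-major tile."""
    return [t * row_stride + col for t in range(warp_size)]


print(conflict_degree(column_access(0, 32)))            # 32: every thread hits bank 0
print(conflict_degree(column_access(0, 33)))            # 1: padding each row by one word
print(conflict_degree(column_access(0, 32), xor_bank))  # 1: XOR-swizzled mapping
```

    Padding wastes one word per row, while the XOR mapping keeps the tile dense and is still a bijection within each row, so row-wise reads stay conflict-free too. Passing a different `bank_of` function is one simple way to explore alternative mapping schemes systematically.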

    Techniques of data prefetching, replication, and consistency in the Internet

    The Internet has become a major infrastructure for information sharing in our daily lives and is indispensable to critical, large-scale applications in industry, government, business, and education. Internet bandwidth (the network speed at which data are transferred) has increased dramatically; however, latency (the delay in physically accessing data) has decreased at a much slower pace. Internet systems can cope effectively with abundant bandwidth and lagging latency through three data management techniques: caching, replication, and prefetching. This dissertation addresses the latency problem in the Internet by exploiting the abundant bandwidth and large storage capacity to prefetch data efficiently, which significantly improves Web content caching performance, and by proposing and implementing scalable data consistency maintenance methods to handle Web address caching in the Domain Name System (DNS) and massive data replication in peer-to-peer (P2P) systems. DNS is a critical Internet service, and P2P data sharing is increasingly accepted as an important Internet activity.
    We have made three contributions to prefetching techniques. First, we have proposed an efficient data structure for maintaining Web access information, called popularity-based Prediction by Partial Matching (PB-PPM), in which data are placed and replaced according to the popularity of Web accesses, so that only important and useful information is stored. PB-PPM greatly reduces the required storage space and improves prediction accuracy. Second, a major weakness of existing Web servers is that prefetching activities are scheduled independently of dynamically changing server workloads. Without proper control and coordination between the two kinds of activity, prefetching can interfere with Web services and degrade Web access performance. To address this problem, we have developed a queuing model to characterize these interactions. Guided by the model, we have designed a coordination scheme that dynamically adjusts the prefetching aggressiveness of Web servers. This scheme not only prevents Web servers from being overloaded but can also minimize the average server response time. Finally, we have proposed a scheme that effectively coordinates the sharing of access information between proxy and Web servers. With the support of this scheme, the accuracy of prefetching decisions is significantly improved.
    Regarding data consistency support for Internet caching and data replication, we have conducted three significant studies. First, we have developed a technique to maintain data consistency among replicas in structured P2P networks. We have implemented this scheme on Pastry, an existing and popular P2P system, and show that it can effectively maintain consistency while preventing hot-spot and node-failure problems. Second, we have designed and implemented a DNS cache update protocol, called DNScup, to provide strong consistency for domain/IP mappings. Finally, we have developed a dynamic lease scheme to update replicas in the Internet in a timely manner.
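
    The abstract does not specify its queuing model, so the sketch below assumes the simplest one: a single M/M/1 server (Poisson arrivals, exponential service) on which a prefetch request costs as much as a demand request. Under that assumption, the mean response time 1 / (mu - lambda) yields a closed-form prefetch budget, and "aggressiveness" becomes the fraction of predicted prefetches the server can afford to issue. All function names are hypothetical.

```python
def mm1_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean response time of an M/M/1 queue, 1 / (mu - lambda); infinite if lambda >= mu."""
    if arrival_rate >= service_rate:
        return float("inf")  # unstable: the queue grows without bound
    return 1.0 / (service_rate - arrival_rate)


def prefetch_budget(demand_rate: float, service_rate: float, target_response: float) -> float:
    """Largest prefetch request rate that keeps the mean response time <= target.

    1 / (mu - lambda_d - lambda_p) <= R  <=>  lambda_p <= mu - lambda_d - 1 / R
    """
    return max(0.0, service_rate - demand_rate - 1.0 / target_response)


def prefetch_fraction(demand_rate: float, service_rate: float,
                      target_response: float, candidate_rate: float) -> float:
    """Share of predicted prefetches (arriving at candidate_rate) the server should issue."""
    if candidate_rate <= 0.0:
        return 0.0
    budget = prefetch_budget(demand_rate, service_rate, target_response)
    return min(1.0, budget / candidate_rate)


# Server handles mu = 100 req/s; target mean response time is 50 ms.
print(prefetch_fraction(60.0, 100.0, 0.05, 40.0))  # 0.5: budget of 20 prefetches/s
print(prefetch_fraction(85.0, 100.0, 0.05, 40.0))  # 0.0: demand alone uses the headroom
```

    Re-evaluating `prefetch_fraction` as the measured demand rate changes gives a feedback loop in the spirit of the coordination scheme: prefetching backs off automatically as the server approaches its response-time target.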

    VRCC-3D+: Qualitative spatial and temporal reasoning in 3 dimensions

    Qualitative Spatial Reasoning (QSR) has applications in Geographic Information Systems (GIS), visual programming language semantics, and digital image analysis. Systems for spatial reasoning over a set of objects have evolved in both expressive power and complexity, but implementations and uses of these systems are uncommon. This is partly due to the computational complexity of the operations the reasoner requires to make informed decisions about its surroundings. These theoretical systems are designed to focus on certain criteria, including computational efficiency, ease of human comprehension, and expressive power. Sadly, the implementation of these systems is frequently left as an exercise for the reader. Herein, a new QSR system, VRCC-3D+, is proposed that strives to maximize expressive power while minimizing both the complexity of reasoning and the computational cost of using the system. This system is an evolution of RCC-3D; the system and its implementation are continually refined to handle the complexities of the reasoning being performed. The refinements contribute to the accuracy, correctness, and speed of the implementation. To improve accuracy and correctness, a method is designed for dynamically changing the system's error tolerance so that it more accurately reflects what the user sees. A method that speeds up the determination of spatial relationships between objects by using composition tables and decision trees is introduced, and improvements to the system itself are recommended, namely streamlining the relation set and enforcing strict rules for the precision of the predicates that determine the relationships between objects. A potential use case and a prototype implementation are introduced to further motivate the need for implementations of QSR systems and to show that their use is not precluded by computational complexity. --Abstract, page iv
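
    A composition table lets a reasoner infer the relation between two objects from known relations to a third, replacing a geometric computation with a lookup. VRCC-3D+'s own relation set and tables are not given in the abstract, so the sketch below uses the standard 2D RCC-8 relation names as a stand-in, with only a small, hand-checked subset of table entries; missing pairs conservatively return all relations. The decision-tree part is not shown, and all names are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet, Iterable, Tuple

Rel = FrozenSet[str]
ALL: Rel = frozenset({"DC", "EC", "PO", "EQ", "TPP", "NTPP", "TPPi", "NTPPi"})

# Illustrative subset of the RCC-8 composition table: if r1(a, b) and r2(b, c),
# then the relation between a and c is one of COMPOSE[(r1, r2)].
COMPOSE: Dict[Tuple[str, str], Rel] = {
    ("TPP", "TPP"): frozenset({"TPP", "NTPP"}),
    ("TPP", "NTPP"): frozenset({"NTPP"}),
    ("NTPP", "TPP"): frozenset({"NTPP"}),
    ("NTPP", "NTPP"): frozenset({"NTPP"}),
    ("TPP", "DC"): frozenset({"DC"}),
    ("NTPP", "DC"): frozenset({"DC"}),
}


def compose_base(r1: str, r2: str) -> Rel:
    """Compose two base relations; EQ acts as the identity."""
    if r1 == "EQ":
        return frozenset({r2})
    if r2 == "EQ":
        return frozenset({r1})
    # Pairs missing from this partial table yield ALL: sound, but no information gained.
    return COMPOSE.get((r1, r2), ALL)


def compose(s1: Iterable[str], s2: Iterable[str]) -> Rel:
    """Compose disjunctive relations by taking the union over all base-relation pairs."""
    result: set = set()
    for r1, r2 in product(s1, s2):
        result |= compose_base(r1, r2)
    return frozenset(result)


def refine(ac: Rel, ab: Iterable[str], bc: Iterable[str]) -> Rel:
    """One path-consistency step: R(a, c) := R(a, c) ∩ (R(a, b) ∘ R(b, c))."""
    return ac & compose(ab, bc)


# a lies strictly inside b, and b is a tangential proper part of c:
# a table lookup, not a geometric test, shows that a lies strictly inside c.
print(sorted(refine(ALL, {"NTPP"}, {"TPP"})))  # ['NTPP']
print(sorted(refine(ALL, {"TPP"}, {"TPP"})))   # ['NTPP', 'TPP']
```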

    Identical parallel machine scheduling problems: structural patterns, bounding techniques and solution procedures

    This work addresses fundamental parallel machine scheduling problems that occur in manufacturing systems, where a set of jobs with individual processing times has to be assigned to a set of machines with respect to workload objective functions such as makespan minimization, machine covering, or workload balancing. The first chapter gives an up-to-date survey of the most relevant literature on these problems, since the last review of them was published almost 20 years ago. We also give insight into the relevant literature contributed by the Artificial Intelligence community, where the problem is known as number partitioning. The core of the work is a universally valid characterization of optimal makespan and machine-covering solutions in which schedules are evaluated independently of the jobs' processing times. Based on these novel structural insights, we derive several strong dominance criteria. Implemented in a branch-and-bound algorithm, these criteria have proved effective in limiting the solution space, particularly when the ratio of the number of jobs to the number of machines is small. Further, we provide a counter-example to a central result by Ho et al. (2009), who claimed to prove that a schedule minimizing the normalized sum of squared workload deviations is necessarily makespan-optimal. We explain why their proof is incorrect and present computational results revealing the difference between workload balancing and makespan minimization. The last chapter addresses the minimum cardinality bin covering problem, which is a dual problem of machine covering with respect to bounding techniques. We discuss reduction criteria, derive several lower-bound arguments, and propose construction heuristics as well as a subset-sum-based improvement algorithm. Moreover, we present a tailored branch-and-bound method that is able to solve instances with up to 20 bins.
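
    For readers unfamiliar with the setting, the sketch below shows a textbook branch-and-bound for makespan minimization on identical parallel machines: the classic lower bound max(ceil(sum / m), longest job), the LPT (Longest Processing Time first) heuristic as the initial incumbent, and a simple symmetry rule that skips machines with equal load. It does not reproduce the dissertation's dominance criteria; the function names are hypothetical.

```python
from typing import List


def lower_bound(jobs: List[int], m: int) -> int:
    """Classic makespan lower bound: max(ceil(total / m), longest job)."""
    return max(-(-sum(jobs) // m), max(jobs))


def lpt_makespan(jobs: List[int], m: int) -> int:
    """LPT: give each job, longest first, to the currently least-loaded machine."""
    loads = [0] * m
    for p in sorted(jobs, reverse=True):
        i = loads.index(min(loads))
        loads[i] += p
    return max(loads)


def optimal_makespan(jobs: List[int], m: int) -> int:
    """Depth-first branch-and-bound over job-to-machine assignments."""
    jobs = sorted(jobs, reverse=True)  # place long jobs first to prune early
    lb = lower_bound(jobs, m)
    best = lpt_makespan(jobs, m)       # incumbent (upper bound)
    loads = [0] * m

    def dfs(k: int) -> None:
        nonlocal best
        if best == lb:                 # incumbent is provably optimal
            return
        if k == len(jobs):
            best = min(best, max(loads))  # keep the better schedule
            return
        tried = set()
        for i in range(m):
            if loads[i] in tried:      # machines with equal load are symmetric
                continue
            tried.add(loads[i])
            if loads[i] + jobs[k] >= best:  # this branch cannot beat the incumbent
                continue
            loads[i] += jobs[k]
            dfs(k + 1)
            loads[i] -= jobs[k]

    dfs(0)
    return best


jobs = [3, 3, 2, 2, 2]
print(lower_bound(jobs, 2), lpt_makespan(jobs, 2), optimal_makespan(jobs, 2))  # 6 7 6
```

    The example is a standard case where LPT is not optimal: it schedules 3+2+2 on one machine, while the optimum pairs {3, 3} against {2, 2, 2}. Stronger dominance criteria, such as those derived in the dissertation, would slot in as extra `continue` conditions inside the loop.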

    Dynamic re-optimization techniques for stream processing engines and object stores

    Large-scale data storage and processing systems are strongly motivated by the need to store and analyze massive datasets. The complexity of a large class of these systems is rooted in their distributed nature, extreme scale, need for real-time response, and streaming nature. Using these systems in multi-tenant cloud environments with potential resource interference necessitates fine-grained monitoring and control. In this dissertation, we present efficient, dynamic techniques for re-optimizing stream-processing systems and transactional object-storage systems.
    In the context of stream-processing systems, we present VAYU, a per-topology controller. VAYU uses novel methods and protocols for dynamic, network-aware tuple routing in the dataflow. We show that the feedback-driven controller in VAYU helps achieve high pipeline throughput over long execution periods, as it dynamically detects and diagnoses pipeline bottlenecks. We also present novel heuristics to optimize overlays for group communication operations in the streaming model.
    In the context of object-storage systems, we present M-Lock, a novel lock-localization service for distributed transaction protocols on scale-out object stores that increases transaction throughput. Lock localization refers to the dynamic migration and partitioning of locks across nodes in the scale-out store to reduce cross-partition lock acquisition. The service leverages observed object-access patterns to achieve lock clustering and deliver high performance. We also present TransMR, a framework that uses distributed, transactional object stores to orchestrate and execute asynchronous components in amorphous data-parallel applications on scale-out architectures.
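
    The idea behind lock localization can be illustrated with a deliberately simple, hypothetical policy: from an observed access log, move each object's lock to the node that acquires it most often, then count how many acquisitions still cross node boundaries. M-Lock's actual placement and migration policy is not described in the abstract; this offline greedy heuristic, and all names in it, are assumptions for illustration only.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# One observed transaction: (node executing it, objects whose locks it acquires).
Txn = Tuple[str, List[str]]


def remote_acquisitions(log: List[Txn], lock_home: Dict[str, str]) -> int:
    """Count lock acquisitions served by a node other than the transaction's own."""
    return sum(1 for node, objs in log for obj in objs if lock_home[obj] != node)


def localize_locks(log: List[Txn]) -> Dict[str, str]:
    """Greedy placement: move each object's lock to the node that acquires it most often."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for node, objs in log:
        for obj in objs:
            counts[obj][node] += 1
    return {obj: c.most_common(1)[0][0] for obj, c in counts.items()}


log: List[Txn] = [
    ("n1", ["a", "b"]),
    ("n1", ["a", "c"]),
    ("n2", ["c"]),
    ("n2", ["c", "d"]),
    ("n1", ["b"]),
]
hashed = {"a": "n2", "b": "n2", "c": "n1", "d": "n1"}  # e.g. an initial hash partitioning
localized = localize_locks(log)
print(remote_acquisitions(log, hashed), remote_acquisitions(log, localized))  # 7 1
```

    A dynamic service would also have to weigh migration cost and per-node load, and re-run placement as access patterns drift; the sketch ignores both to keep the core idea visible.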

    Proceedings of the Workshop on Change of Representation and Problem Reformulation

    The proceedings of the third Workshop on Change of Representation and Problem Reformulation are presented. In contrast to the first two workshops, this workshop focused on analytic or knowledge-based approaches, as opposed to the statistical or empirical approaches called 'constructive induction'. The organizing committee believes that there is potential for combining analytic and inductive approaches in the future. However, it became apparent at the previous two workshops that the communities pursuing these approaches are currently interested in largely non-overlapping issues. The constructive induction community has been holding its own workshops, principally in conjunction with the machine learning conference. While this workshop is more focused on analytic approaches, the organizing committee has made an effort to include more application domains, and the workshop has expanded greatly beyond its origins in the machine learning community. Participants come from the full spectrum of AI application domains, including planning, qualitative physics, software engineering, knowledge representation, and machine learning.