
    MADServer: An Architecture for Opportunistic Mobile Advanced Delivery

    Rapid increases in cellular data traffic demand creative alternative delivery vectors for data. Despite the conceptual attractiveness of mobile data offloading, no concrete web server architectures integrate intelligent offloading in a production-ready and easily deployable manner without relying on vast infrastructural changes to carriers’ networks. Delay-tolerant networking technology offers the means to do just this. We introduce MADServer, a novel DTN-based architecture for mobile data offloading that splits web content among multiple independent delivery vectors based on user and data context. It enables intelligent data offloading, caching, and querying solutions which can be incorporated in a manner that still satisfies user expectations for timely delivery. At the same time, it allows users who have poor or expensive connections to the cellular network to leverage multi-hop opportunistic routing to send and receive data. We also present a preliminary implementation of MADServer and provide real-world performance evaluations.
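
    To make the context-based splitting concrete, here is a minimal Python sketch of how a server might pick a delivery vector per content item; the Context fields, thresholds, and vector names are illustrative assumptions, not MADServer's actual interface.

    from dataclasses import dataclass

    @dataclass
    class Context:
        latency_sensitive: bool   # must the user see this content immediately?
        size_bytes: int           # payload size
        wifi_available: bool      # direct WiFi offload currently possible
        dtn_peer_nearby: bool     # opportunistic multi-hop route exists
        cellular_cost_high: bool  # user on a poor or expensive cellular plan

    def choose_vector(ctx: Context) -> str:
        """Pick a delivery vector for one piece of split web content (hypothetical policy)."""
        if ctx.latency_sensitive and not ctx.wifi_available:
            return "cellular"     # satisfy timeliness expectations
        if ctx.wifi_available:
            return "wifi"         # cheapest direct offload
        if ctx.dtn_peer_nearby and (ctx.size_bytes > 1_000_000 or ctx.cellular_cost_high):
            return "dtn"          # defer bulky content to opportunistic multi-hop routing
        return "cellular"         # fallback

    print(choose_vector(Context(False, 5_000_000, False, True, True)))  # -> dtn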

    Investment Technology for Trading Business: Delineating Requirements, Processes, and Design Decisions for Order-Management Systems

    The requirements and processes for building a robust order management system (OMS) for trading investments within financial services firms are investigated and enumerated. Requirements and process documentation are not readily available to the general public because they are considered a source of competitive advantage in a highly profitable industry. This paper provides single-source documentation of those requirements and processes in the context of the Vested OMS application, which was constructed specifically to meet industry needs in this area. This paper describes in detail the core functionality investment businesses currently demand and the software development techniques used to construct a core system to meet those demands.
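
    One core piece of functionality every OMS must enforce is a valid order lifecycle. The following Python sketch is hypothetical; the states and transition table are illustrative and are not drawn from the Vested OMS design.

    from enum import Enum, auto

    class OrderState(Enum):
        NEW = auto()
        SENT = auto()
        PARTIALLY_FILLED = auto()
        FILLED = auto()
        CANCELLED = auto()

    # Allowed transitions: an OMS rejects anything outside this map.
    TRANSITIONS = {
        OrderState.NEW: {OrderState.SENT, OrderState.CANCELLED},
        OrderState.SENT: {OrderState.PARTIALLY_FILLED, OrderState.FILLED, OrderState.CANCELLED},
        OrderState.PARTIALLY_FILLED: {OrderState.PARTIALLY_FILLED, OrderState.FILLED, OrderState.CANCELLED},
    }

    def transition(current: OrderState, nxt: OrderState) -> OrderState:
        """Apply a state change only if the transition table allows it."""
        if nxt not in TRANSITIONS.get(current, set()):
            raise ValueError(f"illegal transition {current.name} -> {nxt.name}")
        return nxt

    state = OrderState.NEW
    state = transition(state, OrderState.SENT)
    state = transition(state, OrderState.FILLED)
    print(state.name)  # FILLED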

    Understanding and Optimizing Flash-based Key-value Systems in Data Centers

    Flash-based key-value systems are widely deployed in today’s data centers for providing high-speed data processing services. These systems deploy flash-friendly data structures, such as slabs and Log-Structured Merge (LSM) trees, on flash-based Solid State Drives (SSDs) and provide efficient solutions in caching and storage scenarios. With the rapid evolution of data centers, many challenges and opportunities for future optimization arise. In this dissertation, we focus on understanding and optimizing flash-based key-value systems from the perspective of workloads, software, and hardware as data centers evolve. We first propose an on-line compression scheme, called SlimCache, that considers the unique characteristics of key-value workloads to virtually enlarge the cache space, increase the hit ratio, and improve cache performance. Furthermore, to appropriately configure increasingly complex modern key-value data systems, which can have more than 50 parameters plus additional hardware and system settings, we quantitatively study and compare five multi-objective optimization methods for auto-tuning the performance of an LSM-tree based key-value store in terms of throughput, 99th-percentile tail latency, convergence time, real-time system throughput, and the iteration process. Last but not least, we conduct an in-depth, comprehensive measurement study of flash-optimized key-value stores on recently emerging 3D XPoint SSDs. We reveal several unexpected bottlenecks in the current key-value store design and present three exemplary case studies to showcase the efficacy of removing these bottlenecks with simple methods on 3D XPoint SSDs. Our experimental results show that our proposed solutions significantly outperform traditional methods. Our study also contributes system implications for auto-tuning key-value systems on flash-based SSDs and optimizing them on revolutionary 3D XPoint based SSDs.
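
    As a rough illustration of the on-line compression idea, the following Python sketch compresses cached values only when the space saving exceeds a threshold; zlib and the 80% threshold are stand-in assumptions, not SlimCache's actual algorithm.

    import zlib

    COMPRESS_THRESHOLD = 0.8   # keep compressed form only if <= 80% of original size

    class CompressingCache:
        def __init__(self):
            self._store = {}   # key -> (compressed_flag, bytes)

        def put(self, key: str, value: bytes) -> None:
            packed = zlib.compress(value)
            if len(packed) <= COMPRESS_THRESHOLD * len(value):
                self._store[key] = (True, packed)    # compression virtually enlarges the cache
            else:
                self._store[key] = (False, value)    # not worth compressing, keep the fast path

        def get(self, key: str) -> bytes:
            compressed, data = self._store[key]
            return zlib.decompress(data) if compressed else data

    cache = CompressingCache()
    cache.put("user:42", b"x" * 1000)    # highly compressible value
    print(len(cache.get("user:42")))     # 1000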

    Using Textual Emotion Extraction in Context-Aware Computing

    In 2016, the number of global smartphone users will surpass 2 billion. The average owner uses about 27 apps monthly. On average, users of SwiftKey, an alternative Android software keyboard, type approximately 1,800 characters a day. Still, the user-generated data of these apps is, for the most part, unused by the owners themselves. To change this, we conducted research in Context-Aware Computing, Natural Language Processing, and Affective Computing. The goal was to create an environment for recording this unused contextual data without losing its historical context and to create an algorithm that is able to extract emotions from text. Therefore, we introduce Emotext, a textual emotion extraction algorithm that uses ConceptNet5's real-world knowledge for word interpretation, as well as Cofra, a framework for recording contextual data with time-based versioning.
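
    A minimal lexicon-based sketch of textual emotion extraction is shown below; the tiny word-to-emotion table is purely illustrative and does not reflect Emotext's use of ConceptNet5 for word interpretation.

    from collections import Counter

    EMOTION_LEXICON = {
        "happy": "joy", "great": "joy", "love": "joy",
        "sad": "sadness", "miss": "sadness",
        "angry": "anger", "hate": "anger",
        "afraid": "fear", "worried": "fear",
    }

    def extract_emotions(text: str) -> Counter:
        """Count emotion labels triggered by words in the text."""
        tokens = [t.strip(".,!?").lower() for t in text.split()]
        return Counter(EMOTION_LEXICON[t] for t in tokens if t in EMOTION_LEXICON)

    print(extract_emotions("I love this, but I'm worried I will miss it"))
    # e.g. Counter({'joy': 1, 'fear': 1, 'sadness': 1})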

    Smart Community Wireless Platforms: Costs, Benefits, Drawbacks, Risks

    A wireless network covering most of the city is a key component of a smart city. Although such a network offers many benefits, a key issue is the cost of laying out the infrastructure and services, making the bandwidth available, and maintaining the services. We believe community involvement is important in building city-wide wireless networks; indeed, many community wireless networks have been successful. Could the city inspire and assist communities in building their own wireless networks? Could it then unite them into a city-wide wireless network? We address the first question by presenting a model in which the municipality, communities, and smart utility providers jointly create a smart community wireless platform, whose sides work together toward achieving smart community objectives. One challenge is to estimate the total cost, benefits, and drawbacks of such platforms. Another challenge is to model the risks and mitigation plans for their success. We examine the relevant dynamics in measuring the total cost, benefits, drawbacks, and risks of smart community wireless platforms and develop models for estimating their success under various scenarios. To develop these models, we use an intelligence framework that combines systems dynamics modelling with statistical, economic, and machine learning methods.
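
    As a toy illustration of weighing platform costs against benefits across scenarios, the Python sketch below discounts yearly net benefits over a planning horizon; the figures and scenario names are invented and far simpler than the systems dynamics models developed in the paper.

    def net_benefit(capex, annual_opex, annual_benefit, years, discount_rate):
        """Discounted benefits minus discounted costs over the planning horizon."""
        total = -capex
        for year in range(1, years + 1):
            total += (annual_benefit - annual_opex) / (1 + discount_rate) ** year
        return total

    scenarios = {
        "community-led": net_benefit(capex=500_000, annual_opex=80_000,
                                     annual_benefit=200_000, years=10, discount_rate=0.05),
        "city-led":      net_benefit(capex=2_000_000, annual_opex=150_000,
                                     annual_benefit=450_000, years=10, discount_rate=0.05),
    }
    for name, value in scenarios.items():
        print(f"{name}: {value:,.0f}")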

    Recurring Query Processing on Big Data

    The advances in hardware, software, and networks have enabled applications from business enterprises, scientific and engineering disciplines, to social networks, to generate data at unprecedented volume, variety, velocity, and veracity not possible before. Innovation in these domains is now limited by the ability to analyze and discover knowledge from the collected data in a timely and scalable fashion. To facilitate such large-scale big data analytics, the MapReduce computing paradigm and its open-source implementation Hadoop are among the most popular and widely used technologies. Hadoop’s success as a competitor to traditional parallel database systems lies in its simplicity, ease of use, flexibility, automatic fault tolerance, superior scalability, and cost effectiveness due to its use of inexpensive commodity hardware that can scale to petabytes of data across thousands of machines. Recurring queries, repeatedly executed for long periods of time over rapidly evolving high-volume data, have become a bedrock component of most of these analytic applications. Efficient execution and optimization techniques must be designed to assure the responsiveness and scalability of these recurring queries. In this dissertation, we thoroughly investigate topics in the area of recurring query processing on big data. We first propose a novel scalable infrastructure called Redoop that treats recurring queries over big evolving data as first-class citizens during query processing. This is in contrast to state-of-the-art MapReduce/Hadoop systems, which face significant challenges with recurring queries, including redundant computations, significant latencies, and huge application development efforts. Redoop offers innovative window-aware optimization techniques for recurring query execution, including adaptive window-aware data partitioning, window-aware task scheduling, and inter-window caching mechanisms. Redoop also retains the fault tolerance of MapReduce via automatic cache recovery and task re-execution support. Second, we address the crucial need to accommodate hundreds or even thousands of recurring analytics queries that periodically execute over frequently updated data sets, e.g., the latest stock transactions, new log files, or recent news feeds. For many applications, such recurring queries come with user-specified service-level agreements (SLAs), commonly expressed as the maximum allowed latency for producing results before their merits decay. On top of Redoop, we built a scalable multi-query sharing engine tailored for recurring workloads in the MapReduce infrastructure, called Helix. Helix deploys new sliced window-alignment techniques to create sharing opportunities among recurring queries without introducing additional I/O overheads or unnecessary data scans. Furthermore, Helix introduces a cost/benefit model for creating a sharing plan among the recurring queries, and a scheduling strategy for executing them to maximize SLA satisfaction. Third, recurring analytics queries tend to be expensive, especially when query processing consumes data sets in the hundreds of terabytes or more. Time-sensitive recurring queries, such as fraud detection, often come with tight response time constraints as query deadlines. Data sampling is a popular technique for computing approximate results with an acceptable error bound while reducing high-demand resource consumption and thus improving query turnaround times.
We therefore propose the first fast approximate query engine for recurring workloads in the MapReduce infrastructure, called Faro. Faro introduces two key innovations: (1) a deadline-aware sampling strategy that builds samples from the original data with reduced sample sizes compared to uniform sampling, and (2) adaptive resource allocation strategies that maximally improve the approximate results while still meeting the response time requirements specified in recurring queries. In a comprehensive experimental study of each part of this dissertation, we demonstrate the superiority of the proposed strategies over state-of-the-art techniques in scalability, effectiveness, and robustness.
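
    To illustrate the deadline-driven sizing idea (though not Faro's actual sampling strategy, which improves on uniform sampling), the following sketch picks a uniform sample just small enough for a measured processing rate to fit within the query deadline; the rate and deadline are assumed values.

    import random

    def deadline_aware_sample(records, deadline_s, records_per_second):
        """Return a uniform sample sized so that processing fits within the deadline."""
        budget = int(deadline_s * records_per_second)    # records we can afford to process
        if budget >= len(records):
            return list(records)                         # no approximation needed
        return random.sample(records, budget)            # reduced sample size

    data = list(range(1_000_000))
    sample = deadline_aware_sample(data, deadline_s=2.0, records_per_second=100_000)
    print(len(sample))  # 200000 records processed instead of 1,000,000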

    A technology and policy analysis for global E-business

    Thesis (S.M.)--Massachusetts Institute of Technology, Engineering Systems Division, Technology and Policy Program, 2002. Includes bibliographical references (p. 49-51). By Hongwei Zhu.
    We introduce an e-business analytical framework that focuses on transaction flows, including information, physical goods, and services. Within this framework, global e-business involves transaction flows that cross both organizational and national boundaries. Many challenging technology and policy issues arise from this trans-boundary characteristic of global e-business. These issues are analyzed using web aggregation as an example global e-business application. We start the analysis by introducing web aggregation services and their enabling technologies. Our survey of the current status of web aggregation indicates that most services are still operated regionally despite their global presence. Although the benefits of web aggregation have been realized in regions with extensive use of information aggregation, little is done at the global level. Our case study on the worldwide price distribution of a nearly homogeneous consumer electronics product indicates great potential for global aggregation to bring information and efficiency to the global market. In addition to the lack of global integration, we identified other deficiencies of web aggregation. Technological challenges and possible solutions to overcoming these deficiencies are discussed. However, having the technological capability for trans-boundary information flow does not solve all problems in global aggregation. National policies often prohibit such flow into nations that have different policies, especially in database and privacy protection areas. We analyze these policy issues and propose future research on international policy harmonization.

    Scripts in a Frame: A Framework for Archiving Deferred Representations

    Web archives provide a view of the Web as seen by Web crawlers. Because of rapid advancements and adoption of client-side technologies like JavaScript and Ajax, coupled with the inability of crawlers to execute these technologies effectively, Web resources become harder to archive as they become more interactive. At Web scale, we cannot capture client-side representations using the current state-of-the-art toolsets because of the migration from Web pages to Web applications. Web applications increasingly rely on JavaScript and other client-side programming languages to load embedded resources and change client-side state. We demonstrate that Web crawlers and other automatic archival tools are unable to archive the resulting JavaScript-dependent representations (what we term deferred representations), resulting in missing or incorrect content in the archives and the general inability to replay the archived resource as it existed at the time of capture. Building on prior studies on Web archiving, client-side monitoring of events and embedded resources, and studies of the Web, we establish an understanding of the trends contributing to the increasing unarchivability of deferred representations. We show that JavaScript leads to lower-quality mementos (archived Web resources) due to the archival difficulties it introduces. We measure the historical impact of JavaScript on mementos, demonstrating that the increased adoption of JavaScript and Ajax correlates with the increase in missing embedded resources. To measure memento and archive quality, we propose and evaluate a metric to assess memento quality closer to Web users’ perception. We propose a two-tiered crawling approach that enables crawlers to capture embedded resources dependent upon JavaScript. Measuring the performance benefits between crawl approaches, we propose a classification method that mitigates the performance impacts of the two-tiered crawling approach, and we measure the frontier size improvements observed with the two-tiered approach. Using the two-tiered crawling approach, we measure the number of client-side states associated with each URI-R and propose a mechanism for storing the mementos of deferred representations. In short, this dissertation details a body of work that explores the following: why JavaScript and deferred representations are difficult to archive (establishing the term deferred representation to describe JavaScript-dependent representations); the extent to which JavaScript impacts archivability along with its impact on current archival tools; a metric for measuring the quality of mementos, which we use to describe the impact of JavaScript on archival quality; the performance trade-offs between traditional archival tools and technologies that better archive JavaScript; and a two-tiered crawling approach for discovering and archiving currently unarchivable descendants (representations generated by client-side user events) of deferred representations to mitigate the impact of JavaScript on our archives. In summary, what we archive is increasingly different from what we as interactive users experience. Using the approaches detailed in this dissertation, archives can create mementos closer to what users experience rather than archiving the crawlers’ experiences on the Web.
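
    As a simple illustration of the two-tiered idea, the sketch below uses a cheap heuristic to decide whether a page can go to a conventional crawler or should be handed to a slower client-side-executing crawler such as a headless browser; the heuristic and its threshold are assumptions, not the dissertation's classification method.

    import re

    SCRIPT_TAG = re.compile(r"<script\b", re.IGNORECASE)
    AJAX_HINTS = re.compile(r"XMLHttpRequest|fetch\(|\$\.ajax", re.IGNORECASE)

    def needs_client_side_crawl(html: str) -> bool:
        """Heuristic: many scripts or Ajax calls suggest a deferred representation."""
        script_count = len(SCRIPT_TAG.findall(html))
        return script_count > 5 or bool(AJAX_HINTS.search(html))

    page = "<html><head><script src='app.js'></script></head><body>fetch('/api')</body></html>"
    tier = "headless-browser" if needs_client_side_crawl(page) else "standard-crawler"
    print(tier)  # headless-browser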