1,228 research outputs found
Probability of breaking waves in random seas
SIGLE. Available from British Library Document Supply Centre - DSC:D94941 / BLDSC - British Library Document Supply Centre. GB. United Kingdom
Output and Labor Input in Manufacturing
macroeconomics, labor, manufacturing
Content-aware partial compression for textual big data analysis in Hadoop
A substantial amount of information in companies and on the Internet is present in the form of text. The value of this semi-structured and unstructured data has been widely acknowledged, with consequent scientific and commercial exploitation. The ever-increasing data production, however, pushes data analytic platforms to their limit. Compression, as an effective means of reducing data size, has been employed by many emerging data analytic platforms, for which the main purpose of data compression is to save storage space and reduce data transmission cost over the network. Since general-purpose compression methods endeavour to achieve higher compression ratios by leveraging data transformation techniques and contextual data, this context-dependency forces access to the compressed data to be sequential. Processing such compressed data in parallel, as is desirable in a distributed environment, is extremely challenging. This work proposes techniques for more efficient textual big data analysis, with an emphasis on content-aware compression schemes suitable for the Hadoop analytic platform. The compression schemes have been evaluated for a number of standard MapReduce analysis tasks using a collection of public and private real-world datasets. In comparison with existing solutions, they have shown substantial improvement in performance and significant reduction in system resource requirements.
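The abstract above notes that context-dependent compression forces sequential access, which makes parallel processing hard. A minimal sketch of one common workaround, assuming independently compressed blocks, is shown below. It is not the content-aware scheme proposed in the work, and the function names (`compress_in_blocks`, `count_words_in_block`, `parallel_word_count`) are hypothetical. Each block is a separate zlib stream, so any worker can decompress its block without reading the others.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_in_blocks(lines, lines_per_block=2):
    """Compress each group of lines as an independent zlib stream.

    Illustrative assumption: blocks are split on line boundaries so that
    no record straddles two blocks.
    """
    blocks = []
    for start in range(0, len(lines), lines_per_block):
        chunk = "\n".join(lines[start:start + lines_per_block]).encode("utf-8")
        blocks.append(zlib.compress(chunk))
    return blocks


def count_words_in_block(block):
    """Decompress a single block on its own and count its words."""
    text = zlib.decompress(block).decode("utf-8")
    return len(text.split())


def parallel_word_count(blocks, workers=4):
    """Process the blocks concurrently, in the spirit of map tasks."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_words_in_block, blocks))


records = [
    "big data is big",
    "fast data is fast",
    "text is everywhere",
    "compress it",
]
blocks = compress_in_blocks(records)
total = parallel_word_count(blocks)
```

Smaller blocks allow more parallelism but give the compressor less context, which usually lowers the compression ratio; that trade-off is the tension the abstract describes.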
A Platform for Scalable Low-Latency Analytics using MapReduce
Today, the ability to process big data has become crucial to the information needs of many enterprise businesses, scientific applications, and governments. Recently, there has been an increasing need to process data that is not only big but also fast. Here, fast data refers to high-speed real-time and near real-time data streams, such as Twitter feeds, search query streams, click streams, impressions, and system logs. To handle both historical data and real-time data, many companies have to maintain multiple systems. However, recent real-world case studies show that maintaining multiple systems causes not only code duplication, but also intensive manual work to partition the analytics workloads and determine which data is processed by which system. These issues point to the need for a general, unified data processing framework to support analytical queries with different latency requirements.
This thesis takes a further step towards building a general, unified system for big and fast data analytics. In order to build such a system, I propose to build on existing solutions for data parallelism and extend them with two new features: incremental processing and stream processing with latency constraints. This thesis starts with Hadoop, the most popular open-source MapReduce implementation, which provides proven scalability based on data parallelism. I answer the following questions: (1) Is Hadoop able to support incremental processing? (2) What are the necessary architecture changes in order to support incremental processing? (3) What are the additional design features needed to support stream processing with latency constraints? The thesis has three parts, each answering one of these questions.
The first part of the thesis validates whether the existing MapReduce implementations can support incremental processing. Incremental processing means that computation is performed as soon as the relevant data becomes available. My extensive benchmark study of Hadoop-based MapReduce systems shows that the widely-used sort-merge implementation for partitioning and parallel processing poses a fundamental barrier to incremental computation. I further propose a cost model, and optimize the Hadoop system configuration based on the model. The benchmark results over the optimized system verify that the barrier to incremental computation is intrinsic, and cannot be removed by tuning system parameters.
In the second part of the thesis, I employ various purely hash-based techniques to enable fast in-memory incremental processing in MapReduce, and frequent-key-based techniques to extend such processing to workloads that require more memory than is available. I evaluate my Hadoop-based prototype equipped with all proposed techniques. The results show that the hash techniques allow the reduce progress to keep up with the map progress, with up to 3 orders of magnitude reduction in internal disk spills, and enable results to be returned early.
The third part of the thesis aims to support stream processing with latency constraints based on the incremental processing platform resulting from the second part. I perform a benchmark study to understand the sources of latency. I then propose a number of necessary architecture changes to support stream processing, and augment the platform with new latency-aware model-driven resource planning and latency-aware runtime scheduling techniques to meet user-specified latency constraints while maximizing throughput. Experiments using real-world workloads show that the techniques reduce the latency from tens or hundreds of seconds to sub-second, with a 2x-5x increase in throughput. The new platform offers 1-2 orders of magnitude improvements over Storm, a commercial-grade distributed stream system, and Spark Streaming, a state-of-the-art academic prototype, when considering both latency and throughput.
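The contrast drawn in the thesis abstract between sort-merge grouping and hash-based incremental processing can be illustrated with a toy per-key count. This is a minimal sketch and not the thesis prototype. The function names and the `clicks` sample are hypothetical, and it assumes an in-memory stream with no disk spills, frequent-key handling, or latency scheduling.

```python
from collections import defaultdict


def sort_merge_count(records):
    """Sort-merge style: grouping needs the full, sorted input first."""
    ordered = sorted(records)  # cannot start until every record has arrived
    counts = {}
    for key in ordered:
        counts[key] = counts.get(key, 0) + 1
    return counts


def hash_incremental_count(stream):
    """Hash style: update per-key state as each record arrives.

    Yields (key, running_count) after every record, so partial
    results are available before the stream ends.
    """
    state = defaultdict(int)
    for key in stream:
        state[key] += 1
        yield key, state[key]


clicks = ["a", "b", "a", "c", "a"]
early = list(hash_incremental_count(clicks))
final = sort_merge_count(clicks)
```

Both functions reach the same final counts, but only the hash-based generator exposes a running count after each record, which is the sense in which results can be returned early.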
A RE-EXAMINATION OF THE ARCHITECTURE OF THE INTERNATIONAL ECONOMIC SYSTEM IN A GLOBAL SETTING: ISSUES AND PROPOSALS
The globalization of the world economy poses major challenges to the prevailing international economic system. The recent trade-investment system raises the issue of the marginalization of countries, firms, and agents that are unable to compete with large successful entities. The system engenders conflicts of interest in its interfacing with sovereign domains. In numerous cases, such as employment and mutual trade benefits, it can produce zero-sum outcomes. Consequently, significant segments of public opinion in many countries have mobilized against it. In the monetary and financial area, the system has evolved since 1945 on a piecemeal and ad hoc basis. In recent years, it has not been able to predict, prevent or effectively deal with financial crises. It demonstrates a lacuna in global financial governance, especially with respect to enforcing its rules on the major countries and bringing the private sector therein. The central institution, the IMF, is shown to be in need of basic reforms involving forging a global vision, reconsidering and updating conditionality, further democratization of political governance, and revamping the exchange rates and surveillance functions.
Life's a Balancing Act: How Men and Women Experience the Work-Home Interface across the Life Course.
As women, and particularly mothers, have increased their labor force participation in the last half-century, and as the expectation for men to spend more time in childcare and housework has increased, more men and women now must combine role responsibilities in the work and home domains than ever before. Using quantitative and qualitative data, this dissertation takes a sociological approach to understanding the variation in experiences at the work-home interface that have arisen in this time of substantial social change. In assessing how taking on work-home roles can influence individual outcomes, I show that it is important to consider both within- and between-gender differences at the work-home interface, and highlight the utility of applying a life course perspective to work-family research. The first empirical chapter uses two waves of a national sample of working adults to document how transitions in family roles are related to perceptions of work spilling over into home, and home spilling over into work, and how these associations differ by gender. This chapter provides evidence that taking on dual work-home role responsibilities can produce both role strain and role enhancement, laying the groundwork for future studies of how strain and enhancement might combine rather than compete. The second empirical chapter uses data from qualitative interviews with medical trainees to show how competing devotions to work, family, and personal lives shape the way important career decisions are made. The third empirical chapter uses nationally-representative and longitudinal data to document the association between working, parenting, and long-term health trajectories. Taken together, these studies suggest that social policy agendas should consider the ways in which work and home domains are linked, as well as focus on important turning points across the life course. 
Policy efforts to mitigate gender inequality in the labor force would do well to continue to push for gender-neutral work-family policies, as it is clear that conflict at the work-home interface is not a solely female experience, and that both men and women navigate and seek solutions for complex work-home dilemmas across the life course.
PhD, Sociology, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/113444/1/linkathy_1.pd