
    BagMinHash - Minwise Hashing Algorithm for Weighted Sets

    Minwise hashing has become a standard tool to calculate signatures which allow direct estimation of Jaccard similarities. While very efficient algorithms already exist for the unweighted case, the calculation of signatures for weighted sets is still a time consuming task. BagMinHash is a new algorithm that can be orders of magnitude faster than current state of the art without any particular restrictions or assumptions on weights or data dimensionality. Applied to the special case of unweighted sets, it represents the first efficient algorithm producing independent signature components. A series of tests finally verifies the new algorithm and also reveals limitations of other approaches published in the recent past. Comment: 10 pages, KDD 201
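The abstract relies on the standard MinHash property that the fraction of agreeing signature components estimates the Jaccard similarity. The Python below is a minimal sketch of that property for the unweighted case only. It is classic MinHash with k hash functions, not BagMinHash, which targets weighted sets and is not reproduced here. Simulating independent hash functions by prefixing the function index to the input of blake2b, the function names, and the default of 256 components are all illustrative assumptions.

```python
import hashlib


def _hash(seed: int, item: str) -> int:
    """64-bit hash of `item` under hash function number `seed`.

    Assumption: a family of independent hash functions is simulated by
    prefixing the seed to the input; this is not taken from the paper.
    """
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items: set[str], num_hashes: int = 256) -> list[int]:
    """Classic (unweighted) MinHash: component i is the minimum hash under function i."""
    if not items:
        raise ValueError("MinHash is undefined for an empty set")
    return [min(_hash(i, x) for x in items) for i in range(num_hashes)]


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of equal components estimates |A & B| / |A | B|.

    Two minima agree exactly when the element of A | B with the smallest
    hash lies in A & B (ignoring 64-bit collisions).
    """
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(a: set[str], b: set[str]) -> float:
    """Reference value computed directly from the sets."""
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    a = {str(i) for i in range(100)}
    b = {str(i) for i in range(50, 150)}
    print(exact_jaccard(a, b))  # exact value: 50 / 150
    print(estimate_jaccard(minhash_signature(a), minhash_signature(b)))
```

With 256 components the estimator is binomial, so its standard error is roughly sqrt(J(1 - J)/256), about 0.03 for J = 1/3. Computing k separate minima per element costs O(k·n) hash evaluations, which is why faster signature algorithms matter in practice.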

    Multicloud Resource Allocation: Cooperation, Optimization and Sharing

    Nowadays our daily life is powered not only by water, electricity, gas and telephony but also by the "cloud". Big cloud vendors such as Amazon, Microsoft and Google have built large-scale centralized data centers to achieve economies of scale, on-demand resource provisioning, high resource availability and elasticity. However, these massive data centers also bring many other problems, e.g., bandwidth bottlenecks, privacy, security, huge energy consumption, and legal and physical vulnerabilities. One possible solution to these problems is to employ multicloud architectures. In this thesis, we contribute to multicloud resource allocation from three perspectives: cooperation, optimization and data sharing. We address the following problems in the multicloud: how resource providers cooperate in a multicloud, how to reduce information leakage in a multicloud storage system, and how to share big data cost-effectively. More specifically, we make the following contributions:
    Cooperation in the decentralized cloud. We propose a decentralized cloud model in which a group of small data centers (SDCs) can cooperate with each other to improve performance. Moreover, we design a general strategy function for SDCs to evaluate the performance of cooperation based on different dimensions of resource sharing. Through extensive simulations using a realistic data center model, we show that reciprocity-based strategies are more effective than other strategies, e.g., those using prediction based on historical data. Our results show that the reciprocity-based strategy can thrive in a heterogeneous environment with competing strategies.
    Multicloud optimization on information leakage. In this work, we first study an important information leakage problem caused by unplanned data distribution in multicloud storage services. Then, we present StoreSim, an information-leakage-aware storage system for the multicloud. StoreSim aims to store syntactically similar data on the same cloud, thereby minimizing the user's information leakage across multiple clouds. We design an approximate algorithm to efficiently generate similarity-preserving signatures for data chunks based on MinHash and Bloom filters, and also design a function to compute the information leakage based on these signatures. Next, we present an effective clustering-based storage plan generation algorithm for distributing data chunks across multiple clouds with minimal information leakage. Finally, we evaluate our scheme using two real datasets from Wikipedia and GitHub. We show that our scheme can reduce information leakage by up to 60% compared to unplanned placement. Furthermore, our analysis of system attackability demonstrates that our scheme makes attacks on information much more complex.
    Smart data sharing. Moving large amounts of distributed data into the cloud or from one cloud to another can incur high costs in both time and bandwidth. Data sharing in the multicloud can be optimized from two different angles: inter-cloud scheduling and intra-cloud optimization. We first present CoShare, a P2P-inspired, decentralized, cost-effective sharing system for data replication that optimizes network transfer among small data centers. Then we propose a data summarization method to reduce the total size of the dataset, thereby reducing network transfer

    An Investigation on Benefit-Cost Analysis of Greenhouse Structures in Antalya

    Significant population increase across the world, loss of cultivable land and increasing demand for food put pressure on agriculture. To meet this demand, greenhouses are built: light structures with transparent cladding material that provide a controlled microclimate suitable for plant production. Conceptually, greenhouses are similar to manufacturing buildings, which provide a controlled environment for manufacturing and production along with suitable spaces for standardized production processes. In line with global trends, greenhouse structures have increasingly been constructed and operated in Turkey, particularly in its southern regions. A significant number of these greenhouses are located in Antalya. Satellite images show that over the last three decades greenhouses have continuously spread across all cultivable land. There have been various studies and attempts to improve greenhouse design and to increase food production while decreasing the required energy consumption. However, the majority of greenhouses in Turkey are very rudimentary structures whose investment capital requirements are low but whose maintenance requirements are high compared with new-generation greenhouse structures. In this research paper, the lifelong capital requirements for the construction and operation of greenhouse buildings in Antalya have been investigated using a benefit-cost analysis.
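The abstract does not state which benefit-cost formulation the paper uses. For orientation only, a common textbook form of the discounted benefit-cost ratio over a structure's service life is shown below. B_t and C_t are the benefits and costs (construction, operation, maintenance) in year t, r is the discount rate and T is the assumed service life. These symbols are assumptions for illustration, not the paper's notation.

```latex
\mathrm{BCR} = \frac{\sum_{t=0}^{T} B_t \,(1+r)^{-t}}{\sum_{t=0}^{T} C_t \,(1+r)^{-t}}
```

A ratio above 1 means discounted lifetime benefits exceed discounted lifetime costs. Under this form, a structure with low initial capital but high recurring maintenance can still compare unfavourably once the maintenance terms C_t for later years are included.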

    Knowledge Capturing in Design Briefing Process for Requirement Elicitation and Validation

    Knowledge capturing and reusing are major processes of knowledge management that deal with eliciting valuable knowledge through various techniques and methods for use in current and future studies, projects, services, or products. The construction industry has also adopted some of these concepts to improve various construction processes and stages. From pre-design to building delivery, knowledge management principles and briefing frameworks have been implemented across project stakeholders: client, design teams, construction teams, consultants, and facility management teams. At the pre-design and design stages, understanding the client’s needs and the users’ knowledge is crucial for identifying and articulating the expected requirements and objectives. Because of underperforming results and missed goals and objectives, many projects finish with highly dissatisfied clients, and some organizations lose contracts. Through its principles and methods, knowledge capturing benefits requirement elicitation and validation at the briefing stage among user, client and designer. This paper presents the importance and usage of knowledge capturing and reusing in the briefing process at the pre-design and design stages, especially the involvement of the client and user, and explores the techniques and technologies usable in the briefing process for requirement elicitation

    Safety and Reliability - Safe Societies in a Changing World

    The contributions cover a wide range of methodologies and application areas for safety and reliability that contribute to safe societies in a changing world. These methodologies and applications include:
    - foundations of risk and reliability assessment and management
    - mathematical methods in reliability and safety
    - risk assessment
    - risk management
    - system reliability
    - uncertainty analysis
    - digitalization and big data
    - prognostics and system health management
    - occupational safety
    - accident and incident modeling
    - maintenance modeling and applications
    - simulation for safety and reliability analysis
    - dynamic risk and barrier management
    - organizational factors and safety culture
    - human factors and human reliability
    - resilience engineering
    - structural reliability
    - natural hazards
    - security
    - economic analysis in risk management

    Maritime expressions: a corpus-based exploration of maritime metaphors

    This study uses a purpose-built corpus to explore the linguistic legacy of Britain’s maritime history, found in the form of hundreds of specialised ‘Maritime Expressions’ (MEs), such as TAKEN ABACK, ANCHOR and ALOOF, that permeate modern English. Selecting only those expressions beginning with ‘A’, it analyses 61 MEs in detail and describes the processes by which these technical expressions, from a highly specialised occupational discourse community, have made their way into modern English. The Maritime Text Corpus (MTC) comprises 8.8 million words, encompassing a range of text types and registers, selected to provide a cross-section of ‘maritime’ writing. It is analysed using WordSmith analytical software (Scott, 2010), with the 100-million-word British National Corpus (BNC) as a reference corpus. Using the MTC, a list of keywords of specific salience within the maritime discourse has been compiled; using frequency data, concordances and collocations, these MEs are described in detail, and their use and form in the MTC and the BNC are compared. The study examines the transformation from ME to figurative use in general discourse, in terms of form and metaphoricity. MEs are classified according to their metaphorical strength and their transference from maritime usage into new registers and domains, such as business, politics, sports and reportage. A revised model of metaphoricity is developed, and a new category of figurative expression, the ‘resonator’, is proposed. Additionally, developing the work of Lakoff and Johnson, Kövecses and others on Conceptual Metaphor Theory (CMT), a number of Maritime Conceptual Metaphors are identified and their cultural significance is discussed.