
    An Evaluation of Popular Copy-Move Forgery Detection Approaches

    A copy-move forgery is created by copying and pasting content within the same image, and potentially post-processing it. In recent years, the detection of copy-move forgeries has become one of the most actively researched topics in blind image forensics. A considerable number of different algorithms have been proposed, focusing on different types of post-processed copies. In this paper, we aim to answer which copy-move forgery detection algorithms and processing steps (e.g., matching, filtering, outlier detection, affine transformation estimation) perform best in various post-processing scenarios. The focus of our analysis is to evaluate the performance of previously proposed feature sets. We achieve this by casting existing algorithms in a common pipeline. In this paper, we examined the 15 most prominent feature sets. We analyzed the detection performance on a per-image basis and on a per-pixel basis. We created a challenging real-world copy-move dataset, and a software framework for systematic image manipulation. Experiments show that the keypoint-based features SIFT and SURF, as well as the block-based DCT, DWT, KPCA, PCA and Zernike features, perform very well. These feature sets exhibit the best robustness against various noise sources and downsampling, while reliably identifying the copied regions.
    Comment: Main paper: 14 pages, supplemental material: 12 pages; main paper appeared in IEEE Transactions on Information Forensics and Security
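The common pipeline the paper evaluates (block feature extraction, matching, shift-vector filtering) can be illustrated with a minimal sketch. The coarse quantized-intensity feature below is a deliberate simplification standing in for the richer features the paper compares (DCT, PCA, Zernike, etc.); the function name and parameters are illustrative, not from the paper.

```python
# Minimal copy-move detection sketch: hash identical block features,
# then tally the shift vectors between matching block positions.
from collections import defaultdict

def detect_copy_move(image, block=4, min_votes=3):
    """image: 2D list of grayscale values. Returns shift vectors with enough votes."""
    h, w = len(image), len(image[0])
    features = defaultdict(list)              # feature -> list of block origins
    for y in range(h - block + 1):
        for x in range(w - block + 1):
            vals = [image[y + dy][x + dx] for dy in range(block) for dx in range(block)]
            feat = tuple(v // 16 for v in vals)   # coarse quantization for robustness
            features[feat].append((y, x))
    votes = defaultdict(int)                  # shift vector -> match count
    for positions in features.values():
        for (y1, x1) in positions:
            for (y2, x2) in positions:
                dy, dx = y2 - y1, x2 - x1
                if abs(dy) + abs(dx) >= block:    # ignore self/near-overlap matches
                    votes[(dy, dx)] += 1
    return {shift: n for shift, n in votes.items() if n >= min_votes}
```

A copied region produces many block pairs that agree on one shift vector, which is why the voting step separates true copies from accidental feature collisions.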

    Draft genomes of two Artocarpus plants, jackfruit (A. heterophyllus) and breadfruit (A. altilis)

    Two of the most economically important plants in the Artocarpus genus are jackfruit (A. heterophyllus Lam.) and breadfruit (A. altilis (Parkinson) Fosberg). Both species are long-lived trees that have been cultivated for thousands of years in their native regions. Today they are grown throughout tropical to subtropical areas as an important source of starch and other valuable nutrients. There are hundreds of breadfruit varieties native to Oceania, of which the most commonly distributed types are seedless triploids. Jackfruit is likely native to the Western Ghats of India and produces one of the largest tree-borne fruit structures (reaching up to 45 kg). To date, there is limited genomic information for these two economically important species. Here, we generated 273 Gb and 227 Gb of raw data from jackfruit and breadfruit, respectively. The high-quality reads from jackfruit were assembled into 162,440 scaffolds totaling 982 Mb with 35,858 genes. Similarly, the breadfruit reads were assembled into 180,971 scaffolds totaling 833 Mb with 34,010 genes. A total of 2822 and 2034 expanded gene families were found in jackfruit and breadfruit, respectively, enriched in pathways including starch and sucrose metabolism, photosynthesis, and others. The copy number of several starch synthesis-related genes was found to be increased in jackfruit and breadfruit compared to closely related species, and their tissue-specific expression may underlie the sugar-rich and starch-rich characteristics of these fruits. Overall, the publication of high-quality genomes for jackfruit and breadfruit provides information about their specific composition and the underlying genes involved in sugar and starch metabolism.
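Scaffold counts and total lengths like those reported above are usually accompanied by contiguity statistics such as N50. As a reminder of how that standard metric is defined (this is generic bioinformatics practice, not code from the paper):

```python
def n50(scaffold_lengths):
    """N50: the length L such that scaffolds of length >= L cover
    at least half of the total assembly size."""
    total = sum(scaffold_lengths)
    running = 0
    for length in sorted(scaffold_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0   # empty assembly
```

For example, an assembly of scaffolds [100, 50, 30, 20] has N50 = 100, since the single largest scaffold already covers half of the 200 bp total.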

    Optimizing the memory subsystem for efficient system resource utilization by data-intensive applications

    Doctoral dissertation -- Seoul National University Graduate School: College of Engineering, Dept. of Electrical and Computer Engineering, August 2020. Advisor: Heon Y. Yeom. With explosive data growth, data-intensive applications, such as relational databases and key-value storage, have become increasingly popular in a variety of domains in recent years. To meet the growing performance demands of data-intensive applications, it is crucial to efficiently and fully utilize memory resources for the best possible performance. However, general-purpose operating systems (OSs) are designed to provide system resources fairly, at system level, to all applications running on a system. A single application may find it difficult to fully exploit the system's best performance due to this system-level fairness. For performance reasons, many data-intensive applications reimplement mechanisms that OSs already provide, under the assumption that they know the data better than the OS does. They can be greedily optimized for performance, but this may result in inefficient use of system resources. In this dissertation, we claim that simple OS support with minor application modifications can yield even higher application performance without sacrificing system-level resource utilization. We optimize and extend the OS memory subsystem to better support applications while addressing three memory-related issues in data-intensive applications. First, we introduce a memory-efficient cooperative caching approach between application and kernel buffers to address the double caching problem, where the same data resides in multiple layers. Second, we present a memory-efficient, transparent zero-copy read I/O scheme to avoid the performance interference problem caused by memory copying during I/O.
Third, we propose a memory-efficient fork-based checkpointing mechanism for in-memory database systems to mitigate the memory footprint problem of the existing fork-based checkpointing scheme, in which memory usage can grow incrementally (up to 2x) during checkpointing for update-intensive workloads. To show the effectiveness of our approach, we implement and evaluate our schemes on real multi-core systems. The experimental results demonstrate that our cooperative approach addresses the above issues in data-intensive applications more effectively than existing non-cooperative approaches, while delivering better performance (in terms of transaction processing speed, I/O throughput, or memory footprint).
Table of contents:
Chapter 1 Introduction
    1.1 Motivation
        1.1.1 Importance of Memory Resources
        1.1.2 Problems
    1.2 Contributions
    1.3 Outline
Chapter 2 Background
    2.1 Linux Kernel Memory Management
        2.1.1 Page Cache
        2.1.2 Page Reclamation
        2.1.3 Page Table and TLB Shootdown
        2.1.4 Copy-on-Write
    2.2 Linux Support for Applications
        2.2.1 fork
        2.2.2 madvise
        2.2.3 Direct I/O
        2.2.4 mmap
Chapter 3 Memory Efficient Cooperative Caching
    3.1 Motivation
        3.1.1 Problems of Existing Datastore Architecture
        3.1.2 Proposed Architecture
    3.2 Related Work
    3.3 Design and Implementation
        3.3.1 Overview
        3.3.2 Kernel Support
        3.3.3 Migration to DBIO
    3.4 Evaluation
        3.4.1 System Configuration
        3.4.2 Methodology
        3.4.3 TPC-C Benchmarks
        3.4.4 YCSB Benchmarks
    3.5 Summary
Chapter 4 Memory Efficient Zero-copy I/O
    4.1 Motivation
        4.1.1 The Problems of Copy-Based I/O
    4.2 Related Work
        4.2.1 Zero Copy I/O
        4.2.2 TLB Shootdown
        4.2.3 Copy-on-Write
    4.3 Design and Implementation
        4.3.1 Prerequisites for z-READ
        4.3.2 Overview of z-READ
        4.3.3 TLB Shootdown Optimization
        4.3.4 Copy-on-Write Optimization
        4.3.5 Implementation
    4.4 Evaluation
        4.4.1 System Configurations
        4.4.2 Effectiveness of the TLB Shootdown Optimization
        4.4.3 Effectiveness of CoW Optimization
        4.4.4 Analysis of the Performance Improvement
        4.4.5 Performance Interference Intensity
        4.4.6 Effectiveness of z-READ in Macrobenchmarks
    4.5 Summary
Chapter 5 Memory Efficient Fork-based Checkpointing
    5.1 Motivation
        5.1.1 Fork-based Checkpointing
        5.1.2 Approach
    5.2 Related Work
    5.3 Design and Implementation
        5.3.1 Overview
        5.3.2 OS Support
        5.3.3 Implementation
    5.4 Evaluation
        5.4.1 Experimental Setup
        5.4.2 Performance
    5.5 Summary
Chapter 6 Conclusion
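The third contribution builds on the baseline fork-based checkpointing idea: the checkpointer forks, the child inherits a copy-on-write snapshot of the address space and serializes it, and the parent keeps serving updates, at the cost of CoW page duplication under write-heavy load. A minimal sketch of that baseline scheme (not the dissertation's optimized mechanism; names and the JSON format are illustrative, POSIX-only):

```python
# Baseline fork-based checkpoint: the child sees the state as of fork()
# regardless of later parent updates, because writes trigger copy-on-write.
import json
import os

def checkpoint(state, path):
    """Fork a child that persists a snapshot of `state`; parent returns at once."""
    pid = os.fork()
    if pid == 0:                     # child: address space frozen at fork time
        with open(path, "w") as f:
            json.dump(state, f)
        os._exit(0)                  # skip parent's cleanup handlers
    return pid                       # parent: continue mutating `state`
```

After calling `checkpoint(db, path)`, the parent may immediately overwrite entries in `db`; the file still contains the pre-update values, which is exactly the consistency property (and the memory-footprint cost) the dissertation targets.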

    Engineering Crowdsourced Stream Processing Systems

    A crowdsourced stream processing (CSP) system is a system that incorporates crowdsourced tasks in the processing of a data stream. This can be seen as enabling crowdsourcing work to be applied on a sample of large-scale data at high speed, or equivalently, enabling stream processing to employ human intelligence. It also leads to a substantial expansion of the capabilities of data processing systems. Engineering a CSP system requires the combination of human and machine computation elements. From a general systems theory perspective, this means taking into account inherited as well as emerging properties from both these elements. In this paper, we position CSP systems within a broader taxonomy, outline a series of design principles and evaluation metrics, present an extensible framework for their design, and describe several design patterns. We showcase the capabilities of CSP systems by performing a case study that applies our proposed framework to the design and analysis of a real system (AIDR) that classifies social media messages during time-critical crisis events. Results show that, compared to a pure stream processing system, AIDR can achieve higher data classification accuracy, while compared to a pure crowdsourcing solution, the system makes better use of human workers by requiring much less manual effort.
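The core hybrid pattern can be sketched in a few lines: an automatic classifier handles the bulk of the stream, and only low-confidence items are routed to a human task queue. This is a generic illustration of the CSP idea, not AIDR's actual architecture; all names and the threshold are assumptions.

```python
# Hybrid crowd/machine stream element: route uncertain items to humans.
def process_stream(items, classify, ask_crowd, threshold=0.8):
    """classify(item) -> (label, confidence); ask_crowd(item) -> label."""
    results = []
    crowd_queue = []
    for item in items:
        label, confidence = classify(item)
        if confidence >= threshold:
            results.append((item, label, "machine"))
        else:
            crowd_queue.append(item)      # human task, handled off the hot path
    for item in crowd_queue:              # crowd answers merge back into the output
        results.append((item, ask_crowd(item), "crowd"))
    return results
```

The design choice this makes visible is the one the paper evaluates: accuracy improves because humans see only the hard cases, and human effort shrinks because the machine filters the easy ones.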

    Big data analytics for large-scale wireless networks: Challenges and opportunities

    © 2019 Association for Computing Machinery. The wide proliferation of various wireless communication systems and wireless devices has led to the arrival of the big data era in large-scale wireless networks. Big data of large-scale wireless networks has the key features of wide variety, high volume, real-time velocity, and huge value, leading to unique research challenges that differ from those of existing computing systems. In this article, we present a survey of state-of-the-art big data analytics (BDA) approaches for large-scale wireless networks. In particular, we categorize the life cycle of BDA into four consecutive stages: Data Acquisition, Data Preprocessing, Data Storage, and Data Analytics. We then present a detailed survey of the technical solutions to the challenges in BDA for large-scale wireless networks according to each stage in the life cycle of BDA. Moreover, we discuss the open research issues and outline future directions in this promising area.
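The four-stage life cycle can be made concrete as a composable pipeline. Only the Acquisition -> Preprocessing -> Storage -> Analytics ordering comes from the survey; the stage internals below (a wireless-signal toy example with made-up field names like `ap` and `rssi`) are placeholder assumptions.

```python
# The BDA life cycle sketched as four composable stages.
def acquire(raw_records):
    return list(raw_records)              # stand-in for sensing/collection

def preprocess(records):
    # stand-in for cleaning/integration: drop malformed readings
    return [r for r in records if r.get("rssi") is not None]

def store(records, database):
    database.extend(records)              # stand-in for a storage backend
    return database

def analyze(database):
    # stand-in analytic: mean signal strength per access point
    totals = {}
    for r in database:
        count, total = totals.get(r["ap"], (0, 0))
        totals[r["ap"]] = (count + 1, total + r["rssi"])
    return {ap: total / count for ap, (count, total) in totals.items()}

def bda_pipeline(raw_records):
    return analyze(store(preprocess(acquire(raw_records)), []))
```

Keeping the stages as separate functions mirrors the survey's point that each stage has its own challenges and can be swapped out independently.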

    Shuttle Ground Operations Efficiencies/Technologies (SGOE/T) study. Volume 2: Ground Operations evaluation

    The Ground Operations Evaluation describes the breadth and depth of the various study elements selected as a result of an operational analysis conducted during the early part of the study. The analysis techniques used for the evaluation are described in detail. Elements selected for further evaluation are identified, the results of the analysis are documented, and a follow-on course of action is recommended. The background and rationale for developing recommendations for the current Shuttle or for future programs are presented.

    End-to-End Entity Resolution for Big Data: A Survey

    One of the most important tasks for improving data quality and the reliability of data analytics results is Entity Resolution (ER). ER aims to identify different descriptions that refer to the same real-world entity, and it remains a challenging problem. While previous works have studied specific aspects of ER (and mostly in traditional settings), in this survey we provide, for the first time, an end-to-end view of modern ER workflows and of the novel entity indexing and matching methods that cope with more than one of the Big Data characteristics simultaneously. We present the basic concepts, processing steps, and execution strategies that have been proposed by different communities, i.e., database, semantic Web, and machine learning, in order to cope with the loose structuredness, extreme diversity, high speed, and large scale of the entity descriptions used by real-world applications. Finally, we provide a synthetic discussion of the existing approaches, and conclude with a detailed presentation of open research directions.
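The indexing-then-matching structure of an ER workflow can be illustrated with the simplest classic instance: token blocking to limit pairwise comparisons, followed by Jaccard similarity over tokens to decide matches. This is a textbook sketch, not one of the survey's specific methods, and the 0.5 threshold is an arbitrary assumption.

```python
# Minimal blocking-and-matching entity resolution sketch.
def tokens(description):
    return set(description.lower().split())

def token_blocking(descriptions):
    """Index each token to the record ids containing it (the blocks)."""
    blocks = {}
    for rid, text in descriptions.items():
        for tok in tokens(text):
            blocks.setdefault(tok, set()).add(rid)
    return blocks

def resolve(descriptions, threshold=0.5):
    blocks = token_blocking(descriptions)
    candidates = set()
    for ids in blocks.values():           # compare only records sharing a token
        ids = sorted(ids)
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                candidates.add((ids[i], ids[j]))
    matches = set()
    for a, b in candidates:               # Jaccard similarity over token sets
        ta, tb = tokens(descriptions[a]), tokens(descriptions[b])
        if len(ta & tb) / len(ta | tb) >= threshold:
            matches.add((a, b))
    return matches
```

Blocking is what makes ER feasible at Big Data scale: records that share no token are never compared, so the quadratic matching step runs only within blocks.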