4,160 research outputs found

    New Algorithms and Lower Bounds for Sequential-Access Data Compression

    This thesis concerns sequential-access data compression, i.e., compression by algorithms that read the input one or more times from beginning to end. In one chapter we consider adaptive prefix coding, for which we must read the input character by character, outputting each character's self-delimiting codeword before reading the next one. We show how to encode and decode each character in constant worst-case time while producing an encoding whose length is worst-case optimal. In another chapter we consider one-pass compression with memory bounded in terms of the alphabet size and context length, and prove a nearly tight tradeoff between the amount of memory we can use and the quality of the compression we can achieve. In a third chapter we consider compression in the read/write streams model, which allows a number of passes and an amount of memory that are both polylogarithmic in the size of the input. We first show how to achieve universal compression using only one pass over one stream. We then show that one stream is not sufficient for achieving good grammar-based compression. Finally, we show that two streams are necessary and sufficient for achieving entropy-only bounds. Comment: draft of PhD thesis.
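    As a hedged illustration of the one-pass, character-by-character regime this abstract describes (not the thesis's constant-time algorithm), the sketch below encodes each symbol as a self-delimiting Elias gamma codeword of its current frequency rank and updates the model only after emitting the codeword, so a decoder can stay in lockstep. The fixed alphabet, the class name AdaptiveCoder, and the helper functions are invented for the example.

```python
# Toy adaptive prefix coder: emit a self-delimiting codeword for each symbol
# before reading the next one; encoder and decoder update identical models.

def gamma_encode(n: int) -> str:
    """Elias gamma code of n >= 1: (len(bin(n)) - 1) zeros, then bin(n)."""
    b = bin(n)[2:]
    return "0" * (len(b) - 1) + b

def gamma_decode(bits: str, pos: int) -> tuple[int, int]:
    """Decode one gamma codeword starting at pos; return (value, new_pos)."""
    zeros = 0
    while bits[pos] == "0":
        zeros += 1
        pos += 1
    value = int(bits[pos:pos + zeros + 1], 2)
    return value, pos + zeros + 1

class AdaptiveCoder:
    def __init__(self, alphabet: str):
        self.freq = {c: 0 for c in alphabet}
        self.order = sorted(self.freq)          # rank 1, 2, 3, ... by frequency

    def _reorder(self):
        self.order.sort(key=lambda c: (-self.freq[c], c))

    def encode(self, text: str) -> str:
        out = []
        for c in text:
            rank = self.order.index(c) + 1      # current rank of the symbol
            out.append(gamma_encode(rank))      # self-delimiting codeword
            self.freq[c] += 1                   # update model after emitting
            self._reorder()
        return "".join(out)

    def decode(self, bits: str) -> str:
        out, pos = [], 0
        while pos < len(bits):
            rank, pos = gamma_decode(bits, pos)
            c = self.order[rank - 1]
            out.append(c)
            self.freq[c] += 1                   # mirror the encoder's update
            self._reorder()
        return "".join(out)

alphabet = "abcdefghijklmnopqrstuvwxyz "
msg = "abracadabra abracadabra"
enc = AdaptiveCoder(alphabet).encode(msg)
assert AdaptiveCoder(alphabet).decode(enc) == msg
```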

    Data structures

    We discuss data structures and their methods of analysis. In particular, we treat the unweighted and weighted dictionary problem, self-organizing data structures, persistent data structures, the union-find-split problem, priority queues, the nearest common ancestor problem, the selection and merging problem, and dynamization techniques. The methods of analysis are worst-case, average-case and amortized analysis.
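    As one concrete instance of the structures surveyed above, here is a minimal union-find (disjoint-set) sketch with path halving and union by rank, whose operations run in near-constant amortized time; the class name and interface are chosen for the example and are not taken from the survey.

```python
# Minimal disjoint-set forest: near-constant amortized time per operation.

class UnionFind:
    def __init__(self, n: int):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x: int) -> int:
        # Path halving: point every other node on the path at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Union by rank: attach the shallower tree under the deeper one.
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1

uf = UnionFind(5)
uf.union(0, 1); uf.union(1, 2)
assert uf.find(0) == uf.find(2) and uf.find(3) != uf.find(0)
```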

    Learning, Categorization, Rule Formation, and Prediction by Fuzzy Neural Networks

    National Science Foundation (IRI 94-01659); Office of Naval Research (N00014-91-J-4100, N00014-92-J-4015); Air Force Office of Scientific Research (90-0083, N00014-92-J-4015)

    The 4th Conference of PhD Students in Computer Science


    An overview of decision table literature 1982-1995.

    This report gives an overview of the literature on decision tables over the past 15 years. As much as possible, for each reference an author-supplied abstract, a number of keywords and a classification are provided. In some cases our own comments are added. The purpose of these comments is to show where, how and why decision tables are used. The literature is classified according to application area, theoretical versus practical character, year of publication, country of origin (not necessarily the country of publication) and the language of the document. After a description of the scope of the review, the classification results and the classification by topic are presented. The main body of the paper is the ordered list of publications with abstract, classification and comments.

    A review of clustering techniques and developments

    © 2017 Elsevier B.V. This paper presents a comprehensive study of clustering: existing methods and the developments made at various times. Clustering is defined as unsupervised learning in which objects are grouped on the basis of some similarity inherent among them. There are different methods for clustering objects, such as hierarchical, partitional, grid-based, density-based and model-based methods. The approaches used in these methods are discussed along with their respective states of the art and applicability. The measures of similarity, as well as the evaluation criteria, which are the central components of clustering, are also presented in the paper. The applications of clustering in fields such as image segmentation, object and character recognition and data mining are highlighted.
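    As a small, hedged illustration of the partitional family mentioned above, the sketch below implements plain k-means with squared Euclidean distance as the similarity measure; the function name kmeans and the toy data are invented for the example and are not from the paper.

```python
# Plain k-means: alternate assignment and centroid-update steps until stable.

import random

def kmeans(points, k, iters=100, seed=0):
    random.seed(seed)
    centers = random.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[d.index(min(d))].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = []
        for ci, cl in zip(centers, clusters):
            if cl:
                new_centers.append(tuple(sum(x) / len(cl) for x in zip(*cl)))
            else:
                new_centers.append(ci)          # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

pts = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9)]
centers, clusters = kmeans(pts, k=2)
```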

    DCT Implementation on GPU

    There has been great progress in the field of graphics processors. Since the speed of conventional CPU processors is no longer rising, designers are turning to multi-core, parallel processors. Because of their strength in parallel processing, GPUs are becoming more and more attractive for many applications. With the increasing demand for utilizing GPUs, there is a great need to develop operating systems that can drive the GPU to its full capacity. GPUs offer a very efficient environment for many image processing applications. This thesis explores the processing power of GPUs for digital image compression using the Discrete Cosine Transform (DCT).
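    To make the transform concrete, here is a hedged sketch of a naive O(N^2) DCT-II on a 1-D signal, the kernel that such GPU implementations accelerate (typically as an 8x8 block transform in image compression); this illustrates the mathematics only and is not the thesis's GPU code.

```python
# Naive DCT-II with the common orthonormal scaling.

import math

def dct_ii(x):
    N = len(x)
    out = []
    for k in range(N):
        s = sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N) for n in range(N))
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        out.append(scale * s)
    return out

# Example: a constant block concentrates all energy in the DC coefficient.
coeffs = dct_ii([1.0] * 8)
assert abs(coeffs[0] - math.sqrt(8)) < 1e-9
assert all(abs(c) < 1e-9 for c in coeffs[1:])
```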

    Space-Efficient Representations of Semi-structured Document Formats Using Succinct Data Structures

    Thesis (Ph.D.) -- Seoul National University Graduate School : College of Engineering, Department of Electrical and Computer Engineering, 2021. 2. Srinivasa Rao Satti.
    Numerous big data are generated from a plethora of sources. Most of the data stored as files contain a non-fixed type of schema, so the files are suitable to be maintained in semi-structured document formats. A number of those formats, such as XML (eXtensible Markup Language), JSON (JavaScript Object Notation), and YAML (YAML Ain't Markup Language), have been suggested to sustain the hierarchy of the original corpora of data. Several data models structuring the gathered data, including RDF (Resource Description Framework), depend on the semi-structured document formats to be serialized and transferred for future processing. Since the semi-structured document formats focus on readability and verbosity, redundant space is required to organize and maintain the documents. Even though general-purpose compression schemes are widely used to compact the documents, applying those algorithms hinders future handling of the corpora owing to the loss of internal structure. The area of succinct data structures is widely investigated and researched in theory, to provide answers to queries while the encoded data occupy space close to the information-theoretic lower bound. Bit vectors and trees are the notable succinct data structures. Nevertheless, there have been few attempts to apply the idea of succinct data structures to represent semi-structured documents in a space-efficient manner. In this dissertation we propose a unified, space-efficient representation of various semi-structured document formats. The core functionality of this representation is its compactness and query-ability, derived from the enriched functions of succinct data structures. The compact representation is engineered by incorporating (a) bit-indexed arrays, (b) succinct ordinal trees, and (c) compression techniques. We implement this representation in practice, and show by experiments that constructing it decreases disk usage by up to 60% while occupying 90% less RAM. We also allow a document to be processed in a partial manner, so that larger corpora of big data can be handled even in constrained environments.
    In parallel to establishing the aforementioned compact semi-structured document representation, we provide and reinforce some existing compression schemes in this dissertation. We first suggest an idea for encoding an array of integers that is not necessarily sorted. This compaction scheme improves upon existing universal code systems with the assistance of a succinct bit vector structure. We show that our suggested algorithm reduces space usage by up to 44% while consuming 15% less time than the original code system, and it additionally supports random access to elements of the encoded array. We also reinforce the SBH bitmap index compression algorithm. The main strength of this scheme is the use of an intermediate super-bucket during operations, which gives better performance when querying a combination of compressed bitmap indexes. Inspired by the splits performed during the intermediate process of the SBH algorithm, we give an improved compression mechanism supporting parallelism that can be utilized on both CPUs and GPUs. We show by experiments that the CPU parallel-processing optimization reduces compression and decompression times by up to 38% on a 4-core machine without modifying the compressed bitmap form. For GPUs, the new algorithm gives 48% faster query processing time in the experiments, compared to previously existing bitmap index compression schemes.
    Contents:
    Chapter 1 Introduction (1.1 Contribution; 1.2 Organization)
    Chapter 2 Background (2.1 Model of Computation; 2.2 Succinct Data Structures)
    Chapter 3 Space-efficient Representation of Integer Arrays (3.1 Introduction; 3.2 Preliminaries: 3.2.1 Universal Code System, 3.2.2 Bit Vector; 3.3 Algorithm Description: 3.3.1 Main Principle, 3.3.2 Optimization in the Implementation; 3.4 Experimental Results)
    Chapter 4 Space-efficient Parallel Compressed Bitmap Index Processing (4.1 Introduction; 4.2 Related Work: 4.2.1 Byte-aligned Bitmap Code (BBC), 4.2.2 Word-Aligned Hybrid (WAH), 4.2.3 WAH-derived Algorithms, 4.2.4 GPU-based WAH Algorithms, 4.2.5 Super Byte-aligned Hybrid (SBH); 4.3 Parallelizing SBH: 4.3.1 CPU Parallelism, 4.3.2 GPU Parallelism; 4.4 Experimental Results: 4.4.1 Plain Version, 4.4.2 Parallelized Version, 4.4.3 Summary)
    Chapter 5 Space-efficient Representation of Semi-structured Document Formats (5.1 Preliminaries: 5.1.1 Semi-structured Document Formats, 5.1.2 Resource Description Framework, 5.1.3 Succinct Ordinal Tree Representations, 5.1.4 String Compression Schemes; 5.2 Representation: 5.2.1 Bit String Indexed Array, 5.2.2 Main Structure, 5.2.3 Single Document as a Collection of Chunks, 5.2.4 Supporting Queries; 5.3 Experimental Results: 5.3.1 Datasets, 5.3.2 Construction Time, 5.3.3 RAM Usage during Construction, 5.3.4 Disk Usage and Serialization Time, 5.3.5 Chunk Division, 5.3.6 String Compression, 5.3.7 Query Time)
    Chapter 6 Conclusion
    Bibliography; Abstract (in Korean); Acknowledgements
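    In the spirit of the integer-array chapter summarized above, the following toy sketch pairs Elias gamma codes with a separate bit vector marking codeword starts, so individual elements can be read without decoding the whole array; a real succinct design would answer the select query in constant time with o(n) extra bits, whereas this sketch simply scans. The class and method names are invented for the example and are not the dissertation's scheme.

```python
# Gamma-coded integer array with a start-marker bit vector for random access.

def gamma(n: int) -> str:                       # Elias gamma code, n >= 1
    b = bin(n)[2:]
    return "0" * (len(b) - 1) + b

class GammaArray:
    def __init__(self, values):
        codes = [gamma(v) for v in values]
        self.bits = "".join(codes)
        # Mark the first bit of every codeword in a separate bit vector.
        self.starts = "".join("1" + "0" * (len(c) - 1) for c in codes)
        self.n = len(values)

    def _select(self, i: int) -> int:
        """Position of the (i+1)-th 1 in `starts` (linear scan in this sketch)."""
        seen = -1
        for pos, bit in enumerate(self.starts):
            if bit == "1":
                seen += 1
                if seen == i:
                    return pos
        raise IndexError(i)

    def __getitem__(self, i: int) -> int:
        start = self._select(i)
        end = self._select(i + 1) if i + 1 < self.n else len(self.bits)
        code = self.bits[start:end]
        return int(code.lstrip("0"), 2)         # drop unary prefix, read binary part

    def __len__(self) -> int:
        return self.n

arr = GammaArray([5, 1, 9, 300, 2])
assert [arr[i] for i in range(len(arr))] == [5, 1, 9, 300, 2]
```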

    Intelligent Sensor Networks

    In the last decade, wireless and wired sensor networks have attracted much attention. However, most designs target general sensor network issues, including the protocol stack (routing, MAC, etc.) and security. This book focuses on the close integration of sensing, networking, and smart signal processing via machine learning. Based on their world-class research, the authors present the fundamentals of intelligent sensor networks. They cover sensing and sampling, distributed signal processing, and intelligent signal learning. In addition, they present cutting-edge research results from leading experts.