
    Efficient Large-scale Distance-Based Join Queries in SpatialHadoop

    Efficient processing of Distance-Based Join Queries (DBJQs) in spatial databases is of paramount importance in many application domains. The best-known and most representative DBJQs are the K Closest Pairs Query (KCPQ) and the ε Distance Join Query (εDJQ). Over two spatial datasets, these join queries are characterized by either a desired number of pairs (K) or a distance threshold (ε) between the components of each pair in the final result. Both are expensive operations, since two spatial datasets are combined under additional constraints. Given the increasing volume of spatial data originating from multiple sources and stored on distributed servers, it is not always efficient to perform DBJQs on a centralized server. For this reason, this paper addresses the problem of computing DBJQs on big spatial datasets in SpatialHadoop, an extension of Hadoop that supports efficient processing of spatial queries in a cloud-based setting. We propose novel algorithms, based on plane-sweep, to perform efficient parallel DBJQs on large-scale spatial datasets in SpatialHadoop. We evaluate the performance of the proposed algorithms in several situations with large real-world as well as synthetic datasets. The experiments demonstrate the efficiency and scalability of our proposed methodologies.
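To make the plane-sweep idea concrete, here is a minimal single-machine sketch of both DBJQs on 2-D points. It is not the SpatialHadoop algorithm from the paper: it omits partitioning, MapReduce jobs, and any distributed pruning, and the function names `eps_distance_join` and `k_closest_pairs` are hypothetical. Both functions sort each dataset by x and only compare points whose x-gap cannot exceed the current distance bound (ε for εDJQ, the distance of the current K-th closest pair for KCPQ).

```python
import heapq
import math
from typing import List, Tuple

Point = Tuple[float, float]
Pair = Tuple[Point, Point]


def eps_distance_join(P: List[Point], Q: List[Point], eps: float) -> List[Pair]:
    """εDJQ: all pairs (p, q), p in P, q in Q, with dist(p, q) <= eps."""
    P, Q = sorted(P), sorted(Q)  # sort both sets by x (ties broken by y)
    result: List[Pair] = []
    start = 0  # leftmost Q index that can still lie within eps of the sweep line
    for p in P:
        # Q points left of p.x - eps cannot match p or any later p, so drop them for good.
        while start < len(Q) and Q[start][0] < p[0] - eps:
            start += 1
        j = start
        # Only scan the vertical strip [p.x - eps, p.x + eps].
        while j < len(Q) and Q[j][0] <= p[0] + eps:
            if math.dist(p, Q[j]) <= eps:
                result.append((p, Q[j]))
            j += 1
    return result


def k_closest_pairs(P: List[Point], Q: List[Point], k: int) -> List[Tuple[float, Pair]]:
    """KCPQ: the k pairs (p, q) with the smallest distances, sorted ascending."""
    if k <= 0:
        return []
    P, Q = sorted(P), sorted(Q)
    heap: List[Tuple[float, Pair]] = []  # max-heap of the best k pairs, keyed by -distance
    i = j = 0
    while i < len(P) and j < len(Q):
        # The leftmost unprocessed point becomes the pivot; it is paired with the
        # not-yet-pivoted points of the other set, which all lie at or to its right.
        if P[i][0] <= Q[j][0]:
            pivot, others, start, pivot_in_p = P[i], Q, j, True
            i += 1
        else:
            pivot, others, start, pivot_in_p = Q[j], P, i, False
            j += 1
        for idx in range(start, len(others)):
            o = others[idx]
            # Once the x-gap alone reaches the current k-th distance, no later point can help.
            if len(heap) == k and o[0] - pivot[0] >= -heap[0][0]:
                break
            d = math.dist(pivot, o)
            pair = (pivot, o) if pivot_in_p else (o, pivot)
            if len(heap) < k:
                heapq.heappush(heap, (-d, pair))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, pair))
    return sorted((-neg_d, pair) for neg_d, pair in heap)
```

Sorting costs O(n log n); afterwards each point is compared only with points inside its strip. In a distributed setting, a routine like this would presumably run per pair of partitions, but how the paper partitions data and prunes partition pairs is not shown here.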

    Efficient and Scalable Evaluation of Continuous, Spatio-Temporal Queries in Mobile Computing Environments

    A variety of research exists on processing continuous queries in large, mobile environments. Each method tries, in its own way, to address the computational bottleneck of constantly processing so many queries. In this research, we present a two-pronged approach to this problem. First, we introduce an efficient and scalable system for monitoring traditional continuous queries by leveraging the parallel processing capability of the Graphics Processing Unit (GPU). We examine a naive CPU-based solution for continuous range-monitoring queries, and then extend this system using the GPU. Additionally, as mobile communication devices become commodities, location-based services will become ubiquitous. To cope with the very high intensity of location-based queries, we propose a view-oriented approach to the location database, which reduces computation costs by sharing computation among queries that require the same view. Our studies show that by exploiting the parallel processing power of the GPU, we can scale the number of mobile objects significantly while maintaining an acceptable level of performance. Our second approach views this research problem as one belonging to the domain of data streams. Several works have convincingly argued that the two research fields of spatio-temporal data streams and the management of moving objects can naturally come together [IlMI10, ChFr03, MoXA04]. For example, the output of a GPS receiver monitoring the position of a mobile object is viewed as a data stream of location updates. This stream, along with those from plausibly many other mobile objects, is received at a centralized server, which processes the streams upon arrival, updating the answers to the currently active queries in real time.
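The naive CPU-based range monitoring mentioned above can be pictured as follows: every incoming location update is tested against every registered range query, and each query's answer set is patched incrementally. This is a minimal sketch under our own assumptions (rectangular ranges, one update at a time); `Rect` and `NaiveRangeMonitor` are hypothetical names, not code from the thesis. The inner loop over queries is the data-parallel part that a GPU version would spread across threads.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangular range query (inclusive bounds)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains(self, x: float, y: float) -> bool:
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax


class NaiveRangeMonitor:
    """Keeps the answer set of every continuous range query up to date."""

    def __init__(self, queries: Dict[str, Rect]) -> None:
        self.queries = queries
        self.answers: Dict[str, Set[int]] = {qid: set() for qid in queries}

    def update(self, obj_id: int, x: float, y: float) -> Dict[str, str]:
        """Apply one location update; report '+' (entered) or '-' (left) per query."""
        changes: Dict[str, str] = {}
        for qid, rect in self.queries.items():  # naive: O(#queries) work per update
            inside = rect.contains(x, y)
            was_inside = obj_id in self.answers[qid]
            if inside and not was_inside:
                self.answers[qid].add(obj_id)
                changes[qid] = "+"
            elif was_inside and not inside:
                self.answers[qid].discard(obj_id)
                changes[qid] = "-"
        return changes
```

For example, with queries `q1 = Rect(0, 0, 10, 10)` and `q2 = Rect(5, 5, 15, 15)`, moving object 1 to (7, 7) reports that it entered both queries, and then moving it to (12, 12) reports only that it left `q1`.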
For this second approach, we present GEDS, a scalable, Graphics Processing Unit (GPU)-based framework for evaluating continuous spatio-temporal queries over spatio-temporal data streams. Specifically, GEDS employs the computation-sharing and parallel processing paradigms to deliver scalability in the evaluation of continuous spatio-temporal range queries and continuous spatio-temporal kNN queries. The GEDS framework uses the parallel processing capability of the GPU, a stream processor by design, to handle the computation required in this application. Experimental evaluation shows promising performance and demonstrates the scalability and efficacy of GEDS in spatio-temporal data streaming environments. Additional performance studies demonstrate that, even accounting for the cost of memory transfers, the parallel processing power provided by GEDS clearly outweighs that cost. Finally, to move beyond the analysis of specific algorithms over the GEDS framework, we take a broader approach to analyzing GPU computing. Which algorithms are appropriate for the GPU? What types of applications can benefit from the parallel and stream processing power of the GPU? Can we identify a class of algorithms that is best suited to GPU computing? To answer these questions, we develop an abstract performance model detailing the relationship between the CPU and the GPU. From this model, we derive a list of attributes common to successful GPU-based applications, providing insight into which algorithms and applications are best suited to the GPU, along with an estimated theoretical speedup for a given GPU-based application.
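The abstract does not state the performance model itself, so the following is only an illustrative stand-in of our own: it compares CPU time against GPU time that includes the host-to-device transfer, which is the trade-off the abstract refers to. Every parameter name (`cpu_s_per_item`, `lanes`, `bandwidth_bytes_per_s`, and so on) is a hypothetical assumption, and the formula ignores effects such as kernel launch latency, memory coalescing, and overlap of transfer with compute.

```python
import math


def estimated_gpu_speedup(
    n_items: int,
    cpu_s_per_item: float,
    gpu_s_per_item: float,
    lanes: int,
    bytes_per_item: float,
    bandwidth_bytes_per_s: float,
) -> float:
    """Hypothetical speedup model: T_cpu / (T_transfer + T_kernel)."""
    t_cpu = n_items * cpu_s_per_item  # sequential CPU work
    t_transfer = n_items * bytes_per_item / bandwidth_bytes_per_s  # copy inputs to the GPU
    t_kernel = math.ceil(n_items / lanes) * gpu_s_per_item  # items processed in waves of `lanes`
    return t_cpu / (t_transfer + t_kernel)


# Compute-heavy work: the transfer cost is small relative to the parallel gain.
print(estimated_gpu_speedup(1_000_000, 1e-6, 1e-5, 1000, 16, 8e9))
# Transfer-bound work: too little computation per byte, so the GPU loses.
print(estimated_gpu_speedup(1_000_000, 1e-9, 1e-8, 1000, 16, 8e9))
```

With these made-up numbers the first call yields a speedup of roughly 83 and the second roughly 0.5, which illustrates why the amount of computation per transferred byte matters when judging whether an algorithm suits the GPU.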

    Design of Progressive Visualization Systems for Large-Scale Data Exploration

    Thesis (Ph.D.), Seoul National University Graduate School, College of Engineering, Department of Computer Science and Engineering, February 2020. Advisor: Jinwook Seo.

    Understanding data through interactive visualization, also known as visual analytics, is a common and necessary practice in modern data science. However, as data sizes have increased at unprecedented rates, the computation latency of visualization systems has become a significant hurdle to visual analytics. The goal of this dissertation is to address this scalability hurdle by designing a series of systems for progressive visual analytics (PVA), a visual analytics paradigm that provides intermediate results during computation and allows visual exploration of these results. To support interactive exploration of data with billions of records, we first introduce SwiftTuna, an interactive visualization system with scalable visualization and computation components. Our performance benchmark demonstrates that it can handle data with four billion records, giving responsive feedback every few seconds without precomputation. Second, we present PANENE, a progressive algorithm for the Approximate k-Nearest Neighbor (AKNN) problem. PANENE brings useful machine learning methods into visual analytics, which has been challenging because of the long initial latency of AKNN computation. In particular, we accelerate t-Distributed Stochastic Neighbor Embedding (t-SNE), a popular non-linear dimensionality reduction technique, enabling responsive visualization of data with a few hundred columns. These two contributions address the scalability issues stemming from a large number of rows and a large number of columns, respectively. Third, from the users' perspective, we focus on improving the trustworthiness of intermediate knowledge gained from uncertain results in PVA. We propose a novel PVA concept, Progressive Visual Analytics with Safeguards, and introduce PVA-Guards, safeguards people can leave on uncertain intermediate knowledge that needs to be verified.
We also present a proof-of-concept system, ProReveal, designed and developed to integrate seven safeguards into progressive data exploration. Our user study demonstrates that people not only successfully created PVA-Guards in ProReveal but also voluntarily used PVA-Guards to manage the uncertainty of their knowledge. Finally, summarizing the three studies, we discuss design challenges for progressive systems as well as future research agendas for PVA.

In modern data science, understanding data through interactive visualization is an essential analysis method. However, as data sizes have recently grown explosively, the latency caused by data size has become a major obstacle to interactive visual analysis. To solve this scalability problem, this research designs and develops a series of systems that support Progressive Visual Analytics. Such systems can mitigate the latency caused by data size by providing users with intermediate analysis results before data processing has fully completed. First, we propose SwiftTuna, a system for visually exploring data with billions of rows. Developed with the scalability of both data processing and visual representation in mind, the system was shown to update visualizations of data with about four billion rows every few seconds without preprocessing. Second, we propose PANENE, an algorithm that progressively computes the Approximate k-Nearest Neighbor problem. Although approximate k-nearest neighbor computation is used by many machine learning techniques, its long initial computation time has made it difficult to apply in interactive systems. PANENE dramatically reduces this initial computation time, allowing a variety of machine learning techniques to be used in visual analytics.
In particular, by accelerating t-Distributed Stochastic Neighbor Embedding, a useful non-linear dimensionality reduction technique, it can quickly project data with hundreds of dimensions. Whereas the first two contributions address scalability problems caused by the number of rows or columns in the data, the third system aims to improve the trustworthiness of progressive visual analytics. Intermediate results presented to users in progressive visual analytics are approximations of the final result and therefore carry uncertainty. This research proposes a new concept, Progressive Visual Analytics with Safeguards. It lets users leave safeguards on uncertain intermediate knowledge encountered during progressive exploration, so that the accuracy of the knowledge gained during exploration can be verified later. We also introduce ProReveal, a system that implements this concept. In a user study with ProReveal, users were not only able to create safeguards successfully but also voluntarily used them to manage the uncertainty of intermediate knowledge. Finally, synthesizing the results of the three studies, we explore design challenges in building progressive visual analytics systems and directions for future research.

CHAPTER 1. Introduction
  1.1 Background and Motivation
  1.2 Thesis Statement and Research Questions
  1.3 Thesis Contributions
    1.3.1 Responsive and Incremental Visual Exploration of Large-scale Multidimensional Data
    1.3.2 Progressive Computation of Approximate k-Nearest Neighbors and Responsive t-SNE
    1.3.3 Progressive Visual Analytics with Safeguards
  1.4 Structure of Dissertation
CHAPTER 2. Related Work
  2.1 Progressive Visual Analytics
    2.1.1 Definitions
    2.1.2 System Latency and Human Factors
    2.1.3 Users, Tasks, and Models
    2.1.4 Techniques, Algorithms, and Systems
    2.1.5 Uncertainty Visualization
  2.2 Approaches for Scalable Visualization Systems
  2.3 The k-Nearest Neighbor (KNN) Problem
  2.4 t-Distributed Stochastic Neighbor Embedding
CHAPTER 3. SwiftTuna: Responsive and Incremental Visual Exploration of Large-scale Multidimensional Data
  3.1 The SwiftTuna Design
    3.1.1 Design Considerations
    3.1.2 System Overview
    3.1.3 Scalable Visualization Components
    3.1.4 Visualization Cards
    3.1.5 User Interface and Interaction
  3.2 Responsive Querying
    3.2.1 Querying Pipeline
    3.2.2 Prompt Responses
    3.2.3 Incremental Processing
  3.3 Evaluation: Performance Benchmark
    3.3.1 Study Design
    3.3.2 Results and Discussion
  3.4 Implementation
  3.5 Summary
CHAPTER 4. PANENE: A Progressive Algorithm for Indexing and Querying Approximate k-Nearest Neighbors
  4.1 Approximate k-Nearest Neighbor
    4.1.1 A Sequential Algorithm
    4.1.2 An Online Algorithm
    4.1.3 A Progressive Algorithm
    4.1.4 Filtered AKNN Search
  4.2 k-Nearest Neighbor Lookup Table
  4.3 Benchmark
    4.3.1 Online and Progressive k-d Trees
    4.3.2 k-Nearest Neighbor Lookup Tables
  4.4 Applications
    4.4.1 Progressive Regression and Density Estimation
    4.4.2 Responsive t-SNE
  4.5 Implementation
  4.6 Discussion
  4.7 Summary
CHAPTER 5. ProReveal: Progressive Visual Analytics with Safeguards
  5.1 Progressive Visual Analytics with Safeguards
    5.1.1 Definition
    5.1.2 Examples
    5.1.3 Design Considerations
  5.2 ProReveal
  5.3 Evaluation
  5.4 Discussion
  5.5 Summary
CHAPTER 6. Discussion
  6.1 Lessons Learned
  6.2 Limitations
CHAPTER 7. Conclusion
  7.1 Thesis Contributions Revisited
  7.2 Future Research Agenda
  7.3 Final Remarks
Abstract (Korean)
Acknowledgments (Korean)
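Progressive visual analytics, as described in the abstract of this dissertation, replaces one long computation with a stream of intermediate results, and PVA-Guards let a user attach a claim to an uncertain intermediate result and check it later. The sketch below illustrates both ideas with a running mean over randomly ordered chunks plus an approximate 95% confidence interval (normal approximation with a finite-population correction). It is an assumption-laden illustration, not the implementation of SwiftTuna or ProReveal: `progressive_mean`, `Estimate`, and `MeanAboveGuard` are hypothetical, and ProReveal's seven safeguard types are not reproduced here.

```python
import math
import random
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Estimate:
    processed: int        # rows processed so far
    mean: float           # running mean of the processed rows
    ci_half_width: float  # half-width of an approximate 95% confidence interval


def progressive_mean(data: List[float], chunk_size: int, seed: int = 0) -> Iterator[Estimate]:
    """Yield an intermediate estimate of the mean after every chunk."""
    order = list(range(len(data)))
    random.Random(seed).shuffle(order)  # random order: each prefix is a uniform sample
    total_rows = len(data)
    count, mean, m2 = 0, 0.0, 0.0  # Welford's running statistics
    for start in range(0, total_rows, chunk_size):
        for idx in order[start:start + chunk_size]:
            x = data[idx]
            count += 1
            delta = x - mean
            mean += delta / count
            m2 += delta * (x - mean)
        variance = m2 / (count - 1) if count > 1 else 0.0
        # Finite-population correction: the interval shrinks to 0 once all rows are seen.
        fpc = (total_rows - count) / max(total_rows - 1, 1)
        yield Estimate(count, mean, 1.96 * math.sqrt(variance / count * fpc))


@dataclass
class MeanAboveGuard:
    """Hypothetical safeguard: the user claims 'the mean is above threshold'."""
    threshold: float

    def status(self, est: Estimate) -> str:
        if est.mean - est.ci_half_width > self.threshold:
            return "likely true"
        if est.mean + est.ci_half_width < self.threshold:
            return "likely false"
        return "uncertain"
```

Iterating over `progressive_mean(data, 100)` and calling `guard.status` on each estimate lets a user see whether an early claim is still uncertain; once every row has been processed the interval has zero width, so the guard is resolved against the exact mean.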