Growing a Tree in the Forest: Constructing Folksonomies by Integrating Structured Metadata
Many social Web sites allow users to annotate the content with descriptive
metadata, such as tags, and more recently to organize content hierarchically.
These types of structured metadata provide valuable evidence for learning how a
community organizes knowledge. For instance, we can aggregate many personal
hierarchies into a common taxonomy, also known as a folksonomy, that will aid
users in visualizing and browsing social content and help them organize
their own content. However, learning from social metadata presents
several challenges, since it is sparse, shallow, ambiguous, noisy, and
inconsistent. We describe an approach to folksonomy learning based on
relational clustering, which exploits structured metadata contained in personal
hierarchies. Our approach clusters similar hierarchies using their structure
and tag statistics, then incrementally weaves them into a deeper, bushier tree.
We study folksonomy learning using social metadata extracted from the
photo-sharing site Flickr, and demonstrate that the proposed approach addresses
the challenges. Moreover, compared to previous work, the approach produces
larger, more accurate folksonomies and scales better.
Comment: 10 pages, to appear in the Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) 201
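As a rough illustration of the relational-clustering idea (not the authors' algorithm), one can group personal hierarchies ("saplings") whose roots share a tag and whose tag statistics are similar, then union their children into a bushier tree. The function names, data layout, and threshold below are hypothetical:

```python
from math import sqrt

def tag_similarity(stats_a, stats_b):
    """Cosine similarity between two tag-frequency vectors (dicts)."""
    dot = sum(stats_a[t] * stats_b.get(t, 0) for t in stats_a)
    norm_a = sqrt(sum(v * v for v in stats_a.values()))
    norm_b = sqrt(sum(v * v for v in stats_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def merge_saplings(saplings, threshold=0.5):
    """Greedily cluster saplings (root_tag, tag_stats, children) whose root
    tag statistics are similar, unioning their child sets into one tree."""
    clusters = []
    for root_tag, stats, children in saplings:
        for cluster in clusters:
            if cluster["root"] == root_tag and \
               tag_similarity(cluster["stats"], stats) >= threshold:
                cluster["children"] |= set(children)
                for t, c in stats.items():
                    cluster["stats"][t] = cluster["stats"].get(t, 0) + c
                break
        else:
            clusters.append({"root": root_tag,
                             "stats": dict(stats),
                             "children": set(children)})
    return clusters
```

Clustering before merging is what lets sparse, noisy individual hierarchies reinforce each other instead of producing conflicting parent-child links.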
The horse before the cart: improving the accuracy of taxonomic directions when building tag hierarchies
Content on the Web is huge and constantly growing, and building taxonomies for such content can help with navigation and organisation, but building taxonomies manually is costly and time-consuming. An alternative is to allow users to construct folksonomies: collective social classifications. Yet folksonomies are inconsistent, and their use for searching and browsing is limited. Approaches have been suggested for acquiring implicit hierarchical structures from folksonomies; however, these approaches suffer from the "popularity-generality" problem, in that popularity is assumed to be a proxy for generality, i.e. high-level taxonomic terms will occur more often than low-level ones. To tackle this problem, we propose in this paper an improved approach. It is based on the Heymann-Benz algorithm and works by checking the taxonomic directions against a corpus of text. Our results show that popularity works as a proxy for generality in at most 90.91% of cases, but this can be improved to 95.45% using our approach, which should translate to higher-quality tag hierarchy structures.
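The core idea of checking taxonomic direction against a corpus can be sketched as follows: popularity (tag counts) proposes a parent-child direction, and corpus document frequency, as a proxy for generality, can veto and flip it. This is an illustrative sketch, not the paper's implementation; the function names and the simple document-frequency measure are assumptions:

```python
def corpus_generality(term, corpus):
    """Approximate a term's generality by its document frequency in a
    reference corpus, given as a list of token sets."""
    return sum(1 for doc in corpus if term in doc)

def orient_edge(a, b, tag_counts, corpus):
    """Return (parent, child). Tag popularity proposes a direction; the
    corpus generality check flips it when the 'child' is more general."""
    parent, child = (a, b) if tag_counts[a] >= tag_counts[b] else (b, a)
    if corpus_generality(child, corpus) > corpus_generality(parent, corpus):
        parent, child = child, parent
    return parent, child
```

The flip is exactly the case the popularity-generality assumption gets wrong: a tag can be popular on a social site yet taxonomically specific.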
Multi-Target Prediction: A Unifying View on Problems and Methods
Multi-target prediction (MTP) is concerned with the simultaneous prediction
of multiple target variables of diverse type. Due to its enormous application
potential, it has developed into an active and rapidly expanding research field
that combines several subfields of machine learning, including multivariate
regression, multi-label classification, multi-task learning, dyadic prediction,
zero-shot learning, network inference, and matrix completion. In this paper, we
present a unifying view on MTP problems and methods. First, we formally discuss
commonalities and differences between existing MTP problems. To this end, we
introduce a general framework that covers the above subfields as special cases.
As a second contribution, we provide a structured overview of MTP methods. This
is accomplished by identifying a number of key properties, which distinguish
such methods and determine their suitability for different types of problems.
Finally, we also discuss a few challenges for future research.
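One way to see the unifying view is that many MTP settings score an (instance, target) pair with a single bilinear form f(x, t) = xᵀWt: one-hot target vectors recover ordinary multi-output prediction (each column of W is one target's weight vector), while informative target features enable dyadic and zero-shot prediction. The sketch below is illustrative, not the paper's formal framework:

```python
def mtp_score(x, W, t):
    """Bilinear multi-target score f(x, t) = x^T W t, with x an instance
    feature vector, t a target feature vector, and W a len(x) x len(t)
    weight matrix. One-hot t selects a column of W (multi-output case);
    a dense t scores a previously unseen target (zero-shot case)."""
    Wt = [sum(W[i][j] * t[j] for j in range(len(t))) for i in range(len(x))]
    return sum(x[i] * Wt[i] for i in range(len(x)))
```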
Energy-Efficient Cache Design Techniques Using STT-RAM
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, February 2019. Advisor: Kiyoung Choi.
Over the last decade, the capacity of on-chip caches has continuously increased to mitigate the memory wall problem. However, SRAM, the dominant memory technology for caches, is not suitable for such large caches because of its low density and large static power. One way to mitigate these downsides of the SRAM cache is to replace SRAM with a more efficient memory technology. Spin-Transfer Torque RAM (STT-RAM), one of the emerging memory technologies, is a promising candidate as an alternative to SRAM: it can compensate for the drawbacks of SRAM with its non-volatility and small cell size. However, STT-RAM has poor write characteristics, such as high write energy and long write latency, so simply replacing SRAM with STT-RAM increases cache energy. To overcome these poor write characteristics, this dissertation explores three design techniques for energy-efficient caches using STT-RAM.
The first part of the dissertation focuses on combining STT-RAM with an exclusive cache hierarchy. Exclusive caches are known to provide higher effective cache capacity than inclusive caches by removing duplicated copies of cache blocks across hierarchy levels. However, in exclusive cache hierarchies, every block evicted from the upper-level cache is written back to the last-level cache regardless of its dirtiness, thereby incurring extra write overhead. This makes it challenging to use STT-RAM for exclusive last-level caches, given its high write energy and long write latency. To mitigate this problem, we design an SRAM/STT-RAM hybrid cache architecture based on reuse distance prediction.
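A minimal sketch of a reuse-distance-driven placement policy: blocks predicted to be reused soon go to the small SRAM partition, and the rest go to the dense STT-RAM partition. A simple per-PC table stands in for the dissertation's predictor; the class name, threshold, and table layout are all assumptions for illustration:

```python
class HybridCachePlacer:
    """Illustrative SRAM/STT-RAM placement policy keyed on predicted
    reuse distance. Short predicted reuse -> SRAM; otherwise STT-RAM,
    so that write-hot blocks avoid the expensive STT-RAM writes."""

    def __init__(self, sram_threshold):
        self.sram_threshold = sram_threshold
        self.table = {}  # instruction PC -> last observed reuse distance

    def observe(self, pc, reuse_distance):
        """Train the predictor with an observed reuse distance."""
        self.table[pc] = reuse_distance

    def place(self, pc):
        """Pick a partition for a block installed by instruction `pc`;
        unseen PCs default to STT-RAM (predicted distance = infinity)."""
        predicted = self.table.get(pc, float("inf"))
        return "SRAM" if predicted <= self.sram_threshold else "STT-RAM"
```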
The second part of the dissertation explores trade-offs in the design of volatile STT-RAM caches. Various solutions have been proposed to tackle the inefficient write operation of STT-RAM. One of them is redesigning the STT-RAM cell for better write characteristics at the cost of shortened retention time (i.e., volatile STT-RAM). Since the retention failure of STT-RAM is stochastic, the extra overhead of periodic scrubbing with an error correcting code (ECC) is required to tolerate the failures. With an analysis based on an analytic STT-RAM model, we have conducted extensive experiments on various volatile STT-RAM cache design parameters, including scrubbing period, ECC strength, and target failure rate. The experimental results show the impact of the parameter variations on last-level cache energy and performance and provide a guideline for designing a volatile STT-RAM cache with ECC and scrubbing.
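The retention trade-off can be illustrated with the standard thermal model of STT-RAM retention: a cell's flip time is exponentially distributed with mean retention time tau = tau0 * exp(Delta), where Delta is the thermal stability factor and tau0 is on the order of 1 ns. This is a textbook-style sketch, not the dissertation's exact model or parameter values:

```python
from math import exp

def retention_failure_prob(t_seconds, delta, tau0=1e-9):
    """Probability that a volatile STT-RAM cell loses its value within
    t_seconds, assuming exponentially distributed retention failures
    with mean retention time tau = tau0 * exp(delta)."""
    tau = tau0 * exp(delta)
    return 1.0 - exp(-t_seconds / tau)
```

Lowering Delta improves writes but shrinks tau, so the scrubbing period must shrink (and/or ECC strength grow) to keep the failure probability at the target rate; that is the design space the chapter explores.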
The last part of the dissertation proposes Benzene, an energy-efficient distributed SRAM/STT-RAM hybrid cache architecture for manycore systems running multiple applications. It is based on the observation that a naive application of hybrid cache techniques to distributed caches in a manycore architecture suffers from limited energy reduction due to uneven utilization of scarce SRAM. We propose two-level optimization techniques: intra-bank and inter-bank. Intra-bank optimization leverages a highly-associative cache design, achieving a more uniform distribution of writes within a bank. Inter-bank optimization evenly balances the amount of write-intensive data across the banks.
Abstract i
Contents iii
List of Figures vii
List of Tables xi
Chapter 1 Introduction 1
1.1 Exclusive Last-Level Hybrid Cache 2
1.2 Designing Volatile STT-RAM Cache 4
1.3 Distributed Hybrid Cache 5
Chapter 2 Background 9
2.1 STT-RAM 9
2.1.1 Thermal Stability 10
2.1.2 Read and Write Operation of STT-RAM 11
2.1.3 Failures of STT-RAM 11
2.1.4 Volatile STT-RAM 13
2.1.5 Related Work 14
2.2 Exclusive Last-Level Hybrid Cache 18
2.2.1 Cache Hierarchies 18
2.2.2 Related Work 19
2.3 Distributed Hybrid Cache 21
2.3.1 Prediction Hybrid Cache 21
2.3.2 Distributed Cache Partitioning 22
2.3.3 Related Work 23
Chapter 3 Exclusive Last-Level Hybrid Cache 27
3.1 Motivation 27
3.1.1 Exclusive Cache Hierarchy 27
3.1.2 Reuse Distance 29
3.2 Architecture 30
3.2.1 Reuse Distance Predictor 30
3.2.2 Hybrid Cache Architecture 32
3.3 Evaluation 34
3.3.1 Methodology 34
3.3.2 LLC Energy Consumption 35
3.3.3 Main Memory Energy Consumption 38
3.3.4 Performance 39
3.3.5 Area Overhead 39
3.4 Summary 39
Chapter 4 Designing Volatile STT-RAM Cache 41
4.1 Analysis 41
4.1.1 Retention Failure of a Volatile STT-RAM Cell 41
4.1.2 Memory Array Design 43
4.2 Evaluation 45
4.2.1 Methodology 45
4.2.2 Last-Level Cache Energy 46
4.2.3 Performance 51
4.3 Summary 52
Chapter 5 Distributed Hybrid Cache 55
5.1 Motivation 55
5.2 Architecture 58
5.2.1 Intra-Bank Optimization 59
5.2.2 Inter-Bank Optimization 63
5.2.3 Other Optimizations 67
5.3 Evaluation Methodology 69
5.4 Evaluation Results 73
5.4.1 Energy Consumption and Performance 73
5.4.2 Analysis of Intra-bank Optimization 76
5.4.3 Analysis of Inter-bank Optimization 78
5.4.4 Impact of Inter-Bank Optimization on Network Energy 79
5.4.5 Sensitivity Analysis 80
5.4.6 Implementation Overhead 81
5.5 Summary 82
Chapter 6 Conclusion 85
Bibliography 88
Abstract (in Korean) 101
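The inter-bank optimization summarized in the abstract (Chapter 5 in the outline above) can be sketched as greedy load balancing: assign pages, heaviest writers first, to the bank with the least accumulated write traffic. The function name, data shapes, and greedy policy are illustrative assumptions, not Benzene's actual mechanism:

```python
import heapq

def balance_writes(page_write_counts, n_banks):
    """Assign each page (dict: page -> write count) to a cache bank so
    that write-intensive data is spread evenly. Heaviest writers are
    placed first, each on the currently least-loaded bank."""
    banks = [(0, b) for b in range(n_banks)]  # (accumulated writes, bank id)
    heapq.heapify(banks)
    assignment = {}
    for page, writes in sorted(page_write_counts.items(),
                               key=lambda kv: -kv[1]):
        load, bank = heapq.heappop(banks)
        assignment[page] = bank
        heapq.heappush(banks, (load + writes, bank))
    return assignment
```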
A Study on Performance and Power Efficiency of Dense Non-Volatile Caches in Multi-Core Systems
In this paper, we present a novel cache design based on Multi-Level Cell
Spin-Transfer Torque RAM (MLC STTRAM) that can dynamically adapt the set
capacity and associativity to use efficiently the full potential of MLC STTRAM.
We exploit the asymmetric nature of the MLC storage scheme to build cache lines
featuring heterogeneous performances, that is, half of the cache lines are
read-friendly, while the other is write-friendly. Furthermore, we propose to
opportunistically deactivate ways in underutilized sets to convert MLC to
Single-Level Cell (SLC) mode, which features overall better performance and
lifetime. Our ultimate goal is to build a cache architecture that combines the
capacity advantages of MLC and performance/energy advantages of SLC. Our
experiments show an improvement of 43% in the total number of conflict misses, 27%
in memory access latency, 12% in system performance, and 26% in LLC access
energy, with a slight degradation in cache lifetime (about 7%) compared to an
SLC cache.
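The opportunistic mode-switching idea can be reduced to a small policy sketch: when a set is underutilized, its MLC ways are converted to SLC mode, halving that set's capacity but improving latency, energy, and lifetime. The threshold and function name here are illustrative assumptions:

```python
def reconfigure_set(active_lines, total_ways, util_threshold=0.5):
    """Pick the storage mode for one cache set: convert to SLC when the
    set holds few live lines (capacity is not needed there), stay in MLC
    when the set is close to full (capacity matters more)."""
    utilization = active_lines / total_ways
    return "SLC" if utilization < util_threshold else "MLC"
```

Applied per set, this yields the heterogeneous cache the abstract describes: MLC capacity where sets are hot and full, SLC speed and endurance where they are not.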
Low Cost Quality of Service Multicast Routing in High Speed Networks
Many of the services envisaged for high speed networks, such as B-ISDN/ATM, will support real-time applications with large numbers of users. Examples of these types of application range from those used by closed groups, such as private video meetings or conferences, where all participants must be known to the sender, to applications used by open groups, such as video lectures, where participants need not be known by the sender. These types of application will require high volumes of network resources in addition to the real-time delay constraints on data delivery. For these reasons, several multicast routing heuristics have been proposed to support both interactive and distribution multimedia services in high speed networks. The objective of such heuristics is to minimise the multicast tree cost while maintaining a real-time bound on delay. Previous evaluation work has compared the relative average performance of some of these heuristics and concludes that they are generally efficient, although some perform better for small multicast groups and others perform better for larger groups. Firstly, we present a detailed analysis and evaluation of some of these heuristics which illustrates that in some situations their average performance is reversed; a heuristic that in general produces efficient solutions for small multicasts may sometimes produce a more efficient solution for a particular large multicast, in a specific network. Also, in a limited number of cases, using Dijkstra's algorithm produces the best result. We conclude that the efficiency of a heuristic solution depends on the topology of both the network and the multicast, and that it is difficult to predict. Because of this unpredictability we propose the integration of two heuristics with Dijkstra's shortest path tree algorithm to produce a hybrid that consistently generates efficient multicast solutions for all possible multicast groups in any network.
These heuristics are based on Dijkstra's algorithm which maintains acceptable time complexity for the hybrid, and they rarely produce inefficient solutions for the same network/multicast. The resulting performance attained is generally good and in the rare worst cases is that of the shortest path tree. The performance of our hybrid is supported by our evaluation results. Secondly, we examine the stability of multicast trees where multicast group membership is dynamic. We conclude that, in general, the more efficient the solution of a heuristic is, the less stable the multicast tree will be as multicast group membership changes. For this reason, while the hybrid solution we propose might be suitable for use with closed user group multicasts, which are likely to be stable, we need a different approach for open user group multicasting, where group membership may be highly volatile. We propose an extension to an existing heuristic that ensures multicast tree stability where multicast group membership is dynamic. Although this extension decreases the efficiency of the heuristics solutions, its performance is significantly better than that of the worst case, a shortest path tree. Finally, we consider how we might apply the hybrid and the extended heuristic in current and future multicast routing protocols for the Internet and for ATM Networks.
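The shortest-path-tree fallback that anchors the hybrid is plain Dijkstra: compute shortest paths from the multicast source and keep the resulting parent map as the tree. This is a generic sketch of that building block, not the thesis's hybrid itself:

```python
import heapq

def shortest_path_tree(graph, source):
    """Dijkstra's algorithm over an adjacency map
    {node: [(neighbor, weight), ...]}. Returns (parent, dist), where the
    parent map encodes the shortest-path tree rooted at `source` that a
    hybrid multicast router can fall back to."""
    dist = {source: 0}
    parent = {source: None}
    pq = [(0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(pq, (nd, v))
    return parent, dist
```

Pruning the parent map down to the paths reaching group members yields the multicast tree; its delay to each member is optimal, which is why it serves as the worst-case bound on the hybrid's performance.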
Tiramisu: A Polyhedral Compiler for Expressing Fast and Portable Code
This paper introduces Tiramisu, a polyhedral framework designed to generate
high performance code for multiple platforms including multicores, GPUs, and
distributed machines. Tiramisu introduces a scheduling language with novel
extensions to explicitly manage the complexities that arise when targeting
these systems. The framework is designed for the areas of image processing,
stencils, linear algebra and deep learning. Tiramisu has two main features: it
relies on a flexible representation based on the polyhedral model and it has a
rich scheduling language allowing fine-grained control of optimizations.
Tiramisu uses a four-level intermediate representation that allows full
separation between the algorithms, loop transformations, data layouts, and
communication. This separation simplifies targeting multiple hardware
architectures with the same algorithm. We evaluate Tiramisu by writing a set of
image processing, deep learning, and linear algebra benchmarks and compare them
with state-of-the-art compilers and hand-tuned libraries. We show that Tiramisu
matches or outperforms existing compilers and libraries on different hardware
architectures, including multicore CPUs, GPUs, and distributed machines.Comment: arXiv admin note: substantial text overlap with arXiv:1803.0041