Sensemaking for Broad Topics via Automated Extraction and Recursive Search
The availability of vast amounts of diverse information related to a broad topic makes it difficult and time-consuming for users to find and digest the right information regarding the various low-level topics within the broader space. Current approaches to addressing these challenges include providing curated topical pages, relevant query refinement suggestions, lists of subtopics, etc. However, these approaches do not scale and offer inadequate support for sensemaking. This disclosure describes automated techniques that extract information from online information sources by using a query related to a high-level topic to recursively formulate additional queries for subtopics, constructing a hierarchical set of topics related to the broad query. The results can be used to provide a user interface organized by the hierarchical topic levels, making it faster and easier for users to understand and navigate information regarding a high-level topic.
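A minimal sketch of the recursive construction described above (not the disclosure's actual implementation); `search` and `extract_subtopics` are hypothetical stand-ins for a search backend and a subtopic miner:

```python
def search(query):
    # Placeholder: a real system would issue `query` to a search engine
    # and return result snippets.
    return []

def extract_subtopics(results):
    # Placeholder: a real system would mine candidate subtopic phrases
    # from the result snippets.
    return []

def build_topic_tree(topic, depth=2, max_children=5):
    """Recursively expand a broad topic into a hierarchy of subtopics."""
    node = {"topic": topic, "children": []}
    if depth == 0:
        return node
    for sub in extract_subtopics(search(topic))[:max_children]:
        # Formulate a narrower query for the subtopic and recurse one level down.
        node["children"].append(build_topic_tree(f"{topic} {sub}", depth - 1, max_children))
    return node
```

The resulting tree can then drive a drill-down interface, with each level exposing the subtopic queries beneath it.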
Automated Extraction of Pivot Topics for Sideways Expansion of Search Scope
Users benefit from mechanisms that help them refine their queries to facilitate searching for information connected to their underlying intent. Apart from refinements that narrow the scope of a query, users can benefit from suggestions that help them pivot their information seeking by expanding their search sideways to related topics. This disclosure describes computational techniques for automated determination of suitable topics and/or queries for helping users expand the scope of their information search by pivoting to topics related to their query. The techniques involve selecting a meta-query, performing query expansion, and identifying, aggregating, and deduplicating related entities. The identified entities are clustered and ranked to enable selection of particular entities that can be shown to users as pivot topics.
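A rough sketch of that pipeline; the helper functions are assumed parameters, since the disclosure does not prescribe specific algorithms for expansion, entity extraction, or clustering:

```python
from collections import Counter

def pivot_topics(query, expand, extract_entities, cluster, top_k=5):
    """Return candidate pivot topics for sideways expansion of `query`."""
    entity_counts = Counter()
    for expanded in expand(query):                      # meta-query -> expanded queries
        for entity in extract_entities(expanded):       # related entities per query
            entity_counts[entity.strip().lower()] += 1  # aggregate and deduplicate
    clusters = cluster(list(entity_counts))             # group near-duplicate entities
    # Rank clusters by aggregate frequency and keep one representative per cluster.
    ranked = sorted(clusters, key=lambda c: sum(entity_counts[e] for e in c), reverse=True)
    return [max(c, key=entity_counts.get) for c in ranked[:top_k]]
```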
The effect of deceptive idleness on disk schedulers
Disk schedulers in operating systems are generally work-conserving; they schedule a request immediately after the previous request has finished. Such schedulers need multiple outstanding requests to make good decisions. Unfortunately, many applications issue synchronous, almost-continuous streams of read requests. This forces the scheduler into making decisions too early, falsely assuming that the process has become momentarily idle. This phenomenon of deceptive idleness causes significant degradation in performance and quality of service objectives on current systems. We solve deceptive idleness by designing and implementing a transparent, non-work-conserving scheduling framework for various scheduling policies. We evaluate this solution on microbenchmarks and real workloads, and observe large benefits. The Apache webserver delivers 56% and 16% more throughput for two configurations. The Andrew Benchmark runs faster by 8% (54% for the read-intensive phase). Variants of the TPC-B database benchmark exhibit improvements between 4% and 60%. Proportional-share schedulers become empowered to efficiently deliver application-desired proportions.
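A toy illustration of the problem (not the paper's framework): two processes read sequentially from distant regions of the disk, but each issues its next request only after the previous one completes, so a work-conserving scheduler alternates between the streams and pays a long seek on every request:

```python
def work_conserving_seeks(stream_a, stream_b):
    """Total head movement when the scheduler alternates between two
    synchronous streams (only one request is ever outstanding)."""
    head, total = 0, 0
    for a, b in zip(stream_a, stream_b):
        for block in (a, b):        # forced to serve the only pending request
            total += abs(block - head)
            head = block
    return total

near = range(0, 100)                # sequential reads near the start of the disk
far = range(100_000, 100_100)       # sequential reads far away
print(work_conserving_seeks(near, far))  # roughly 20 million blocks of seeking
```

A non-work-conserving scheduler that waits briefly for the next nearby request would instead serve each stream's run contiguously.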
Advanced memory management and disk scheduling techniques for general-purpose operating systems
Operating systems have evolved into sophisticated, high-performance virtualizing platforms that support, and strive to be fair towards, concurrently running applications. However, since applications usually run oblivious of each other and prefer narrow system interfaces, they inadvertently contend for resources, resulting in inappropriate allocations and significant performance degradation. This dissertation identifies and eliminates two such problems: one we call rigidity in physical memory management, which we solve using adaptive memory management, and a second we call deceptive idleness in disk schedulers, which we solve through anticipatory disk scheduling.
Many applications, their libraries, and runtimes can trade memory consumption for performance by maintaining caches, triggering garbage collection, etc. However, because they are unaware of the memory pressure in the system, they are forced to be conservative about memory usage. Adaptive memory management is a technique that informs applications of the severity of memory pressure via a metric that quantifies the cost of using memory. This enables applications to allocate memory liberally when it is available (with performance benefits of 20% to 300%) and to release it under contention. The system thus reaches an equilibrium that balances the impact of memory pressure on each application; adapts to avoid paging during load bursts, improving stability and responsiveness; and reduces the need for manual configuration of memory footprints. It also provides finer control over memory usage by adapting in proportion to application priorities.
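One way an application could consume such a pressure metric is sketched below; the metric's name, scale, and the cache-sizing policy are illustrative assumptions, not the dissertation's actual interface:

```python
class AdaptiveCache:
    """Application-side cache that grows when memory is cheap and shrinks
    when the system reports that memory is contended."""

    def __init__(self, min_items=1_000, max_items=1_000_000):
        self.min_items, self.max_items = min_items, max_items
        self.limit = min_items

    def resize(self, pressure):
        # pressure in [0, 1]: 0 = memory is plentiful, 1 = severe contention.
        span = self.max_items - self.min_items
        self.limit = int(self.min_items + (1.0 - pressure) * span)
        return self.limit

cache = AdaptiveCache()
cache.resize(0.1)   # plentiful memory: limit grows toward max_items
cache.resize(0.9)   # heavy contention: limit shrinks toward min_items
```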
Disk schedulers generally schedule a request as soon as the previous request has finished. Unfortunately, many applications perform synchronous I/O by issuing a request only after the previous request has been served. This causes the scheduler to suffer from deceptive idleness, a condition where it incorrectly assumes that the process has no further requests and seeks away to serve a request from another process. Anticipatory disk scheduling transparently solves this problem by sometimes injecting a small, controlled delay into the disk scheduler before it makes a scheduling decision, whenever it expects the current request to be quickly followed by another nearby request. This improves performance by up to 70% and enables proportional schedulers to achieve their contracts. Anticipatory scheduling has been ported to Linux, where it is now the default disk scheduler.
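A simplified sketch of the anticipation idea; the wait interval and the heuristic for expecting a nearby follow-up request are placeholders, not the Linux implementation:

```python
import time

def pick_next_request(queue, last_proc, expect_nearby, wait_ms=3):
    """Non-work-conserving dispatch: if the process whose request just
    completed is expected to issue another nearby request soon, keep the
    disk idle briefly rather than seeking away to another process."""
    if expect_nearby(last_proc):
        deadline = time.monotonic() + wait_ms / 1000.0
        while time.monotonic() < deadline:
            req = next((r for r in queue if r["proc"] == last_proc), None)
            if req is not None:          # the anticipated request arrived in time
                queue.remove(req)
                return req
            time.sleep(0.0005)           # poll until the anticipation window expires
    return queue.pop(0) if queue else None  # fall back to normal dispatch
```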
Squirrel: A decentralized peer-to-peer web cache (to appear in the 21st ACM Symposium on Principles of Distributed Computing, PODC 2002)
This paper presents a decentralized, peer-to-peer web cache called Squirrel. The key idea is to enable web browsers on desktop machines to share their local caches, to form an efficient and scalable web cache, without the need for dedicated hardware and the associated administrative cost. We propose and evaluate decentralized web caching algorithms for Squirrel, and discover that it exhibits performance comparable to a centralized web cache in terms of hit ratio, bandwidth usage and latency. It also achieves the benefits of decentralization, such as being scalable, self-organizing and resilient to node failures, while imposing low overhead on the participating nodes.
1. INTRODUCTION Web caching is a widely deployed technique to reduce the latency observed by web browsers, decrease the aggregate bandwidth consumption of an organization's network, and reduce the load incident on web servers on the Internet [5, 11, 22]. Web caches are often deployed on dedicated machines at the boundary of corporate networks, and at Internet service providers. This paper presents an alternative for the former case, in which client desktop machines themselves cooperate in a peer-to-peer fashion to provide the functionality of a web cache. This paper proposes decentralized algorithms for the web caching problem, and evaluates their performance against each other and against a traditional centralized web cache. The key idea in Squirrel is to facilitate mutual sharing of web objects among client nodes. Currently, web browsers on every node maintain a local cache of web objects recently accessed by the browser. Squirrel enables these nodes to export their local caches to other nodes in the corporate network, thus synthesizing a large shared virtual web cache. Each node then performs both web browsing and web caching.
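One way to realize such a shared virtual cache is to hash each URL to a responsible peer; the sketch below uses a static peer list for illustration, whereas Squirrel itself routes requests through the Pastry peer-to-peer overlay:

```python
import hashlib

def home_node(url, peers):
    """Pick the peer whose id is numerically closest to the hash of the URL."""
    key = int(hashlib.sha1(url.encode()).hexdigest(), 16)
    return min(peers, key=lambda p: abs(p["id"] - key))

peers = [{"name": name, "id": int(hashlib.sha1(name.encode()).hexdigest(), 16)}
         for name in ("desk-01", "desk-02", "desk-03")]
print(home_node("http://example.com/index.html", peers)["name"])
```

A local browser miss is then forwarded to the URL's home node, which serves the object from its cache or fetches it from the origin web server on behalf of the requester.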