13 research outputs found
Rack-Scale Memory Pooling for Datacenters
The rise of web-scale services has led to a staggering growth in user data on the Internet. To transform such vast raw data into valuable information for the user and provide quality assurances, it is important to minimize access latency and enable in-memory processing. For more than a decade, the only practical way to accommodate ever-growing data in memory has been to scale out server resources, which has led to the emergence of large-scale datacenters and distributed non-relational databases (NoSQL). Such horizontal scaling of resources translates to an increasing number of servers that participate in processing individual user requests. Typically, each user request results in hundreds of independent queries targeting different NoSQL nodes (servers), and the larger the number of servers involved, the higher the fan-out. To complete a single user request, all of the queries associated with that request have to complete first, and thus the slowest query determines the completion time. Because of skewed popularity distributions and resource contention, the more servers we have, the harder it is to achieve high throughput and high server utilization without violating service level objectives. This thesis proposes rack-scale memory pooling (RSMP), a new scaling technique for future datacenters that reduces networking overheads and improves the performance of core datacenter software. RSMP is an approach to building larger, rack-scale capacity units for datacenters through specialized fabric interconnects with support for one-sided operations, and using them, in lieu of conventional servers (e.g. 1U), to scale out. We define an RSMP unit to be a server rack connecting 10s to 100s of servers to a secondary network enabling direct, low-latency access to the global memory of the rack.
We then propose a new RSMP design, Scale-Out NUMA, that leverages integration and a NUMA fabric to narrow the gap between local and remote memory to only a 5× difference in access latency. Finally, we show how RSMP impacts NoSQL data serving, a key datacenter service used by most web-scale applications today. We show that using fewer, larger data shards leads to less load imbalance and higher effective throughput, without violating applications' service level objectives. For example, by using Scale-Out NUMA, RSMP improves the throughput of a key-value store by up to 8.2× over a traditional scale-out deployment.
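The fan-out argument in the abstract above (the slowest query determines the completion time) can be illustrated with a minimal sketch. It assumes, purely for illustration, that each queried server is independently slow with a fixed probability; the names `slow_prob`, `fast_ms`, and `slow_ms` and their default values are hypothetical and do not come from the thesis.

```python
import random


def prob_slow_request(fan_out, slow_prob=0.01):
    """Probability that at least one of `fan_out` independent queries is slow."""
    return 1.0 - (1.0 - slow_prob) ** fan_out


def request_latency(fan_out, rng, slow_prob=0.01, fast_ms=1.0, slow_ms=10.0):
    """Latency of one request: the maximum over the latencies of its queries.

    Each query is slow with probability `slow_prob` (an illustrative assumption).
    """
    return max(
        slow_ms if rng.random() < slow_prob else fast_ms
        for _ in range(fan_out)
    )


def simulated_slow_fraction(fan_out, trials=10_000, seed=0, slow_prob=0.01):
    """Fraction of simulated requests delayed by at least one slow query."""
    rng = random.Random(seed)
    slow = sum(
        request_latency(fan_out, rng, slow_prob=slow_prob) > 1.0
        for _ in range(trials)
    )
    return slow / trials
```

Under these assumptions, `prob_slow_request(100)` is about 0.63, compared with 0.01 for a single query, which is why a smaller fan-out (fewer, larger shards) helps tail latency.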
Logic, parallelism and semantic networks : the binary predicate execution model
This thesis develops the Binary Predicate Execution Model, a distributed, massively parallel system for semantic networks and knowledge bases that is built on a subset of first-order predicate logic. The use of logic gives the model an easily understood programming paradigm and a well-defined semantics of execution. When programs are expressed in binary predicates, a simple graphical interpretation can be used. All program facts are represented in an assertion graph. Each vertex is associated with a term appearing in a fact, and the edges are labeled with the predicate names. Similar graphs are also associated with each rule body and the query. Finding all possible solutions corresponds to finding all possible matches between the query graph and the assertion graph. Invoking a rule corresponds to substituting the graph of its body, constrained by the dependencies between its arguments. This can be implemented in a parallel, message-passing fashion in which the assertion graph vertices are active processing elements that asynchronously exchange messages. Each message identifies the parts of the query that remain to be matched and carries any binding information from previous matching required to accomplish this. The model is data-driven, since every message can be processed immediately without the need for any centralized control or centralized memory. By restricting how functional terms can occur, distributed data structures and remote data look-ups for unification are eliminated. Thus, in most cases, the model's performance scales up on increasingly large problems given increasingly large machines. Architectural support for the model is investigated, and simulation results of a relatively simple software implementation are reported. These results suggest performance on the order of 10^5 logical inferences per second for 256 processing elements in an n-cube configuration. Further research directions, including that of increasing efficiency, are discussed.
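The graphical interpretation described in the abstract above can be sketched as follows. This is a sequential, backtracking illustration only, not the thesis's parallel message-passing implementation, and it leaves out rules. The `(subject, predicate, object)` triple encoding, the `?`-prefixed variable convention, and the function names are all assumptions made for this example.

```python
def is_var(term):
    """Variables are written with a leading '?' (a convention assumed here)."""
    return isinstance(term, str) and term.startswith("?")


def unify(term, value, binding):
    """Match a query term against a constant, extending `binding` in place."""
    if is_var(term):
        if term in binding:
            return binding[term] == value
        binding[term] = value
        return True
    return term == value


def match_query(facts, query, binding=None):
    """Yield every binding under which all query edges occur in the facts.

    `facts` is the assertion graph as (subject, predicate, object) edges;
    `query` is a list of edges whose endpoints may be variables.
    """
    binding = binding or {}
    if not query:
        yield dict(binding)
        return
    (subj, pred, obj), rest = query[0], query[1:]
    for f_subj, f_pred, f_obj in facts:
        if f_pred != pred:
            continue
        candidate = dict(binding)  # copy so failed matches leave no trace
        if unify(subj, f_subj, candidate) and unify(obj, f_obj, candidate):
            yield from match_query(facts, rest, candidate)


facts = [
    ("tweety", "isa", "canary"),
    ("canary", "subclass_of", "bird"),
    ("polly", "isa", "parrot"),
    ("parrot", "subclass_of", "bird"),
    ("bird", "can", "fly"),
]
query = [("?x", "isa", "?c"), ("?c", "subclass_of", "bird")]
solutions = list(match_query(facts, query))
```

Each solution is one match of the query graph within the assertion graph. In the model described above, the partially matched query and its bindings would instead travel between vertices as messages.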
Models and algorithms for parallel text retrieval
In the last decade, search engines became an integral part of our lives. The current
state-of-the-art in search engine technology relies on parallel text retrieval.
Basically, a parallel text retrieval system is composed of three components: a
crawler, an indexer, and a query processor. The crawler component aims to locate,
fetch, and store the Web pages in a local document repository. The indexer
component converts the stored, unstructured text into a queryable form, most
often an inverted index. Finally, the query processing component performs the
search over the indexed content. In this thesis, we present models and algorithms
for efficient Web crawling and query processing. First, for parallel Web
crawling, we propose a hybrid model that aims to minimize the communication
overhead among the processors while balancing the number of page download requests
and storage loads of processors. Second, we propose models for document- and
term-based inverted index partitioning. In the document-based partitioning
model, the number of disk accesses incurred during query processing is minimized
while the posting storage is balanced. In the term-based partitioning model, the
total amount of communication is minimized while, again, the posting storage
is balanced. Finally, we develop and evaluate a large number of algorithms for
query processing in ranking-based text retrieval systems. We test the proposed
algorithms over our experimental parallel text retrieval system, Skynet, currently
running on a 48-node PC cluster. In the thesis, we also discuss the design and
implementation details of another, somewhat unconventional, grid-enabled search
engine, SE4SEE. Among our practical work, we present the Harbinger text classification
system, used in SE4SEE for Web page classification, and the K-PaToH
hypergraph partitioning toolkit, to be used in the proposed models.
Cambazoğlu, Berkant Barla. Ph.D.
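The difference between the document-based and term-based inverted index partitioning mentioned in the abstract above can be sketched as follows. The round-robin assignment is a placeholder chosen only to keep the example short; the thesis proposes partitioning models for this step, which are not reproduced here. All function names and the toy documents are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def partition_by_document(docs, k):
    """Document-based: each processor indexes only its own documents."""
    shards = [dict() for _ in range(k)]
    for i, (doc_id, text) in enumerate(sorted(docs.items())):
        shards[i % k][doc_id] = text  # round-robin placeholder assignment
    return [build_inverted_index(shard) for shard in shards]


def partition_by_term(docs, k):
    """Term-based: each processor owns the full posting list of its terms."""
    full_index = build_inverted_index(docs)
    shards = [dict() for _ in range(k)]
    for i, term in enumerate(sorted(full_index)):
        shards[i % k][term] = full_index[term]  # round-robin placeholder
    return shards


docs = {1: "parallel text retrieval", 2: "text index", 3: "parallel index"}
doc_shards = partition_by_document(docs, 2)
term_shards = partition_by_term(docs, 2)
```

In this toy example the term `text` has postings in both document-based shards, so a query for it must visit every processor, while under term-based partitioning its whole posting list lives on exactly one processor.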
Third International Symposium on Artificial Intelligence, Robotics, and Automation for Space 1994
The Third International Symposium on Artificial Intelligence, Robotics, and Automation for Space (i-SAIRAS 94), held October 18-20, 1994, in Pasadena, California, was jointly sponsored by NASA, ESA, and Japan's National Space Development Agency, and was hosted by the Jet Propulsion Laboratory (JPL) of the California Institute of Technology. i-SAIRAS 94 featured presentations covering a variety of technical and programmatic topics, ranging from underlying basic technology to specific applications of artificial intelligence and robotics to space missions. The symposium also featured a special workshop on planning and scheduling and provided scientists, engineers, and managers with the opportunity to exchange theoretical ideas, practical results, and program plans in such areas as space mission control, space vehicle processing, data analysis, autonomous spacecraft, space robots and rovers, satellite servicing, and intelligent instruments.
Structural Diversity of Biological Ligands and their Binding Sites in Proteins
The phenomenon of molecular recognition, which underpins almost all biological processes, is dynamic,
complex and subtle. Establishing an interaction between a pair of molecules involves mutual structural
rearrangements guided by a highly convoluted energy landscape, the accurate mapping of which continues
to elude us. The analysis of interactions between proteins and small molecules has been a focus of intense
interest for many years, offering as it does the promise of increased insight into many areas of biology, and
the potential for greatly improved drug design methodologies. Computational methods for predicting which
types of ligand a given protein may bind, and what conformation two molecules will adopt once paired, are
particularly sought after.
The work presented in this thesis aims to quantify the amount of structural variability observed in the ways
in which proteins interact with ligands. This diversity is considered from two perspectives: to what extent
ligands bind to different proteins in distinct conformations, and the degree to which binding sites specific for
the same ligand have different atomic structures.
The first study could be of value to approaches which aim to predict the bound pose of a ligand, since
by cataloguing the range of conformations previously observed, it may be possible to better judge the
biological likelihood of a newly predicted molecular arrangement. The findings show that several common
biological ligands exhibit considerable conformational diversity when bound to proteins. Although they bind
predominantly in extended conformations, the analysis presented here highlights several cases in which the
biological requirements of a given protein force its ligand to adopt a highly compact form. Comparing the
conformational diversity observed within several protein families, the hypothesis that homologous proteins
tend to bind ligands in a similar arrangement is generally upheld, but several families are identified in which
this is demonstrably not the case.
Consideration of diversity in the binding site itself, on the other hand, may be useful in guiding methods
which search for binding sites in uncharacterised protein structures: identifying those regions of known sites
which are less variable could help to focus the search only on the most important features. Analysis of the
diversity of a non-redundant dataset of adenine binding sites shows that a small number of key interactions are
conserved, with the majority of the fragment environment being highly variable. Just as ligand conformation
varies between protein families, so the degree of binding site diversity is observed to be significantly higher
in some families than others.
Taken together, the results of this work suggest that the repertoire of strategies produced by nature for the
purposes of molecular recognition is extremely extensive. Moreover, the importance of a given ligand
conformation or pattern of interaction appears to vary greatly depending on the function of the particular
group of proteins studied. As such, it is proposed that diversity analysis may form a significant part of future
large-scale studies of ligand-protein interactions.
Critical Connections: Communication for the Future
The U.S. communication infrastructure is changing rapidly as a result of technological advances, deregulation, and an increasingly competitive economic climate. This change is affecting the way in which information is created, processed, transmitted, and provided to individuals and institutions. The report analyzes the implications of new communication technologies for business, politics, culture, and individuals, and suggests possible strategies and options for congressional consideration.