Hypergraph-based data partitioning
Ankara: The Department of Computer Engineering and the Graduate School of
Engineering and Science of Bilkent University, 2013. Thesis (Ph.D.) -- Bilkent
University, 2013. Includes bibliographical references (leaves 96-103).
A hypergraph is a generalization of a graph in which an edge may connect any
number of vertices. This flexibility gives hypergraphs greater modeling power,
allowing accurate formulation of many problems in combinatorial scientific
computing. This thesis discusses the use of hypergraph-based approaches to
solve problems that require data partitioning. The thesis is composed of three
parts. In the first part, we show how to implement hypergraph partitioning
efficiently using recursive graph bipartitioning. The remaining two parts show
how to formulate two important data partitioning problems in parallel
computing as hypergraph partitioning. The first problem is global inverted
index partitioning for parallel query processing, and the second is
row-columnwise sparse matrix partitioning for parallel matrix-vector
multiplication, where both the multiplication and the sparse matrix
partitioning schemes are novel. In this thesis, we show that hypergraph models
achieve partitions of better quality.
Kayaaslan, Enver. Ph.D.
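The recursive-bisection scheme described in the first part can be sketched in miniature. Everything below is illustrative: the ordering-based `bisect` heuristic, the helper names, and the tiny example hypergraph are assumptions for demonstration, not the thesis's actual algorithm, which builds on recursive graph bipartitioning with refinement.

```python
# Sketch of k-way hypergraph partitioning by recursive bisection.
# A hypergraph is a vertex set plus "nets" (edges that may span any
# number of vertices). The bisection heuristic here is a placeholder.

def bisect(vertices, nets):
    """Split vertices into two halves; real tools use FM/KL-style refinement."""
    order = sorted(vertices)
    half = len(order) // 2
    return set(order[:half]), set(order[half:])

def partition(vertices, nets, k):
    """Recursively bisect until k parts remain (k assumed a power of two)."""
    if k == 1:
        return [set(vertices)]
    left, right = bisect(vertices, nets)
    def restrict(part):
        # keep only the portion of each net inside this part
        return [n & part for n in nets if len(n & part) > 1]
    return (partition(left, restrict(left), k // 2) +
            partition(right, restrict(right), k // 2))

def cutsize(parts, nets):
    """Connectivity-1 metric: sum over nets of (parts touched - 1)."""
    return sum(sum(1 for p in parts if n & p) - 1 for n in nets)

nets = [{0, 1, 2}, {2, 3}, {4, 5, 6, 7}, {1, 5}]
parts = partition(set(range(8)), nets, 4)
print(len(parts), cutsize(parts, nets))  # → 4 3
```

The connectivity-1 metric shown is the standard quality measure for hypergraph partitioning: a net spanning λ parts contributes λ - 1 to the cut.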
Document replication strategies for geographically distributed web search engines
Cataloged from PDF version of article. Large-scale web search engines are composed of multiple data centers that are geographically distant from each other. Typically, a user query is processed in a data center that is geographically close to the origin of the query, over a replica of the entire web index. Compared to a centralized, single-center search engine, this architecture offers lower query response times, as the network latencies between the users and data centers are reduced. However, it does not scale well with increasing index sizes and query traffic volumes because queries are evaluated on the entire web index, which has to be replicated and maintained in all data centers. As a remedy to this scalability problem, we propose a document replication framework in which documents are selectively replicated on data centers based on regional user interests. Within this framework, we propose three different document replication strategies, each optimizing a different objective: reducing the potential search quality loss, the average query response time, or the total query workload of the search system. For all three strategies, we consider two alternative types of capacity constraints on the index sizes of data centers. Moreover, we investigate the performance impact of query forwarding and result caching. We evaluate our strategies via detailed simulations, using a large query log and a document collection obtained from the Yahoo! web search engine. (C) 2012 Elsevier Ltd. All rights reserved.
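The selective-replication idea can be sketched as a capacity-constrained choice per data center. This is only an illustration: the greedy popularity rule, the names, and the numbers below are assumptions, not any of the paper's three strategies (which optimize quality loss, response time, or workload).

```python
# Sketch: replicate documents on each data center by regional interest,
# subject to a per-center index capacity. Greedy rule is a placeholder.

def select_replicas(doc_sizes, interest, capacity):
    """interest[center][doc] = regional access count; returns docs per center."""
    plan = {}
    for center, scores in interest.items():
        chosen, used = [], 0
        # replicate locally popular documents first, within capacity
        for doc in sorted(scores, key=scores.get, reverse=True):
            if used + doc_sizes[doc] <= capacity:
                chosen.append(doc)
                used += doc_sizes[doc]
        plan[center] = chosen
    return plan

doc_sizes = {"d1": 4, "d2": 3, "d3": 2}
interest = {"eu": {"d1": 9, "d2": 1, "d3": 5}, "us": {"d2": 8, "d3": 2}}
plan = select_replicas(doc_sizes, interest, capacity=6)
print(plan)
```

Each center ends up with a partial index tailored to its region, which is the scalability win over full replication described in the abstract.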
Exploiting the Bipartite Structure of Entity Grids for Document Coherence and Retrieval
International audience. Document coherence describes how much sense a text makes in terms of its logical organisation and discourse flow. Even though coherence is a relatively difficult notion to quantify precisely, it can be approximated automatically. This type of coherence modelling is not only interesting in itself, but also useful for a number of other text processing tasks, including Information Retrieval (IR), where adjusting the ranking of documents according to both their relevance and their coherence has been shown to increase retrieval effectiveness. The state of the art in unsupervised coherence modelling represents documents as bipartite graphs of sentences and discourse entities, and then projects these bipartite graphs into one-mode undirected graphs. However, one-mode projections may incur significant loss of the information present in the original bipartite structure. To address this, we present three novel graph metrics that compute document coherence on the original bipartite graph of sentences and entities. Evaluation on standard settings shows that: (i) one of our coherence metrics beats the state of the art in terms of coherence accuracy; and (ii) all three of our coherence metrics improve retrieval effectiveness because, as closer analysis reveals, they capture aspects of document quality that go undetected by both keyword-based standard ranking and by spam filtering. This work contributes document coherence metrics that are theoretically principled, parameter-free, and useful to IR.
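The bipartite representation can be sketched concretely. The toy metric below (entity overlap of adjacent sentences, computed directly on the sentence-entity sets rather than on a one-mode projection) is a simple stand-in for the paper's three metrics; the entity lists are invented.

```python
# Sketch: a bipartite sentence-entity graph as {sentence index -> entity set},
# scored for coherence without projecting to a one-mode graph.

def entity_grid(sentences):
    """Map each sentence index to the set of entities it mentions."""
    return {i: set(ents) for i, ents in enumerate(sentences)}

def coherence(grid):
    """Mean Jaccard overlap of entity sets for consecutive sentence pairs."""
    pairs = [(grid[i], grid[i + 1]) for i in range(len(grid) - 1)]
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)

doc = [["court", "microsoft"], ["microsoft", "evidence"], ["evidence"]]
score = coherence(entity_grid(doc))
print(round(score, 3))  # → 0.417
```

Because the score is read off the original bipartite structure, it keeps information (which entity links which sentences) that a one-mode projection would collapse into a single weighted edge.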
Analyzing and enhancing OSKI for sparse matrix-vector multiplication
Sparse matrix-vector multiplication (SpMxV) is a kernel operation widely used
in iterative linear solvers. The same sparse matrix is multiplied by a dense
vector repeatedly in these solvers. Matrices with irregular sparsity patterns
make it difficult to utilize cache locality effectively in SpMxV computations.
In this work, we investigate single- and multiple-SpMxV frameworks for
exploiting cache locality in SpMxV computations. For the single-SpMxV
framework, we propose two cache-size-aware top-down row/column-reordering
methods based on 1D and 2D sparse matrix partitioning by utilizing the
column-net and enhancing the row-column-net hypergraph models of sparse
matrices. The multiple-SpMxV framework depends on splitting a given matrix into
a sum of multiple nonzero-disjoint matrices so that the SpMxV operation is
performed as a sequence of multiple input- and output-dependent SpMxV
operations. For an effective matrix splitting required in this framework, we
propose a cache-size-aware top-down approach based on 2D sparse matrix
partitioning by utilizing the row-column-net hypergraph model. The primary
objective in all three methods is to maximize the exploitation of temporal
locality. We evaluate the validity of our models and methods on a wide range
of sparse matrices by performing actual runs using OSKI. Experimental results
show that the proposed methods and models outperform state-of-the-art schemes.
Comment: arXiv admin note: substantial text overlap with arXiv:1202.385
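The multiple-SpMxV framework can be sketched as follows. The CSR storage layout is standard; the particular split of the example matrix into pieces is invented for illustration, whereas the paper derives the nonzero-disjoint split from a row-column-net hypergraph model.

```python
# Sketch: split A into nonzero-disjoint pieces A = A_1 + A_2 and compute
# y = A x as a sequence of accumulating SpMxV operations, y += A_i x.

def spmv_csr(indptr, indices, data, x, y):
    """y += A x for a matrix stored in CSR (compressed sparse row) form."""
    for i in range(len(indptr) - 1):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]

# A = [[2, 0], [1, 3]] stored as two nonzero-disjoint CSR pieces
pieces = [
    ([0, 1, 1], [0], [2.0]),          # piece 1: entry (0,0) = 2
    ([0, 0, 2], [0, 1], [1.0, 3.0]),  # piece 2: entries (1,0) = 1, (1,1) = 3
]
x = [1.0, 2.0]
y = [0.0, 0.0]
for indptr, indices, data in pieces:
    spmv_csr(indptr, indices, data, x, y)
print(y)  # → [2.0, 7.0], the full product A x accumulated across pieces
```

The cache-locality benefit comes from choosing the pieces so that each individual SpMxV touches a working set of x and y that fits in cache, which is what the hypergraph-driven splitting optimizes.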
Interview with Marjorie Harkins Buchanan Kiewit
In this interview with Julia Stringfellow, Marjorie Harkins Buchanan Kiewit, LU class of 1943, discusses her time as a student as well as her time on the Board of Trustees.
Co-infections and superinfections complicating COVID-19 in cancer patients: A multicentre, international study
Background: We aimed to describe the epidemiology, risk factors, and clinical outcomes of co-infections and superinfections in onco-hematological patients with COVID-19. Methods: International, multicentre cohort study of cancer patients with COVID-19. All patients were included in the analysis of co-infections at diagnosis, while only patients admitted for at least 48 h were included in the analysis of superinfections. Results: 684 patients were included (384 with solid tumors and 300 with hematological malignancies). Co-infections and superinfections were documented in 7.8% (54/684) and 19.1% (113/590) of patients, respectively. Lower respiratory tract infections were the most frequent infectious complications, most often caused by Streptococcus pneumoniae and Pseudomonas aeruginosa. Only seven patients developed opportunistic infections. Compared to patients without infectious complications, those with infections had worse outcomes, with higher rates of acute respiratory distress syndrome, intensive care unit (ICU) admission, and case fatality. Neutropenia, ICU admission, and high levels of C-reactive protein (CRP) were independent risk factors for infections. Conclusions: Infectious complications in cancer patients with COVID-19 were less frequent than expected, affecting mainly neutropenic patients with high levels of CRP and/or ICU admission. The rate of opportunistic infections was unexpectedly low. The use of empiric antimicrobials in cancer patients with COVID-19 needs to be optimized.
Different epidemiology of bloodstream infections in COVID-19 compared to non-COVID-19 critically ill patients: A descriptive analysis of the Eurobact II study
Background: The study aimed to describe the epidemiology and outcomes of hospital-acquired bloodstream infections (HABSIs) in COVID-19 and non-COVID-19 critically ill patients. Methods: We used data from the Eurobact II study, a prospective observational multicontinental cohort study on HABSI treated in the ICU. For the current analysis, we selected centers that included both COVID-19 and non-COVID-19 critically ill patients. We performed descriptive statistics comparing COVID-19 and non-COVID-19 patients in terms of patient characteristics, source of infection, and microorganism distribution. We studied the association between COVID-19 status and mortality using multivariable frailty Cox models. Results: A total of 53 centers from 19 countries across 5 continents were eligible. Overall, 829 patients (median age 65 years [IQR 55; 74]; male, n = 538 [64.9%]) were treated for a HABSI. Included patients comprised 252 (30.4%) COVID-19 and 577 (69.6%) non-COVID-19 patients. The time interval between hospital admission and HABSI was similar between the two groups. Respiratory sources (40.1% vs. 26.0%, p < 0.0001) and primary HABSI (25.4% vs. 17.2%, p = 0.006) were more frequent in COVID-19 patients. COVID-19 patients more often had enterococcal (20.5% vs. 9%) and Acinetobacter spp. (18.8% vs. 13.6%) HABSIs. Bacteremic COVID-19 patients had an increased mortality hazard ratio (HR) versus non-COVID-19 patients (HR 1.91, 95% CI 1.49–2.45). Conclusions: We showed that the epidemiology of HABSI differed between COVID-19 and non-COVID-19 patients. Enterococcal HABSI predominated in COVID-19 patients. COVID-19 patients with HABSI had an elevated risk of mortality. Trial registration: ClinicalTrials.gov number NCT03937245. Registered 3 May 2019.
RED-BL: Evaluating dynamic workload relocation for data center networks
In this paper, we present RED-BL (Relocate Energy Demand to Better Locations), a framework to minimize the electricity cost for operating data center networks over consecutive intervals of fixed duration. Within each interval, RED-BL provides a mapping of workload to a set of geographically distributed data centers. To this end, RED-BL uses the geographical and temporal variations in electricity prices as exhibited by electrical energy markets. In addition, we incorporate the transition costs associated with a change in workload mapping from one interval to the next, over a planning window comprising multiple such intervals. This results in a sequence of workload mappings that is optimal over the entire planning window, even though the workload mapping in a given interval may not be locally optimal. Our evaluation of RED-BL uses electricity prices from the US markets and workload traces from live Internet applications with millions of users. We find that RED-BL can reduce the electric bill by as much as 45% compared to the case when the workload is uniformly distributed. When compared to existing workload relocation solutions, for a wide range of data center deployment sizes, RED-BL achieves electricity cost savings that are 8.28% higher, on average. This seemingly modest reduction can save millions of dollars for the operators. The cost of this saving is an inexpensive computation at the start of each planning window. © 2014 Elsevier B.V. All rights reserved
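The planning-window optimization described above, electricity cost per interval plus transition cost between consecutive mappings, can be sketched as a small dynamic program. The prices, transition costs, and center names below are made-up toy data, not RED-BL's actual formulation.

```python
# Sketch: choose one workload mapping per interval so that total
# (electricity + transition) cost over the planning window is minimized,
# via dynamic programming over candidate mappings.

def plan(mappings, price, transition, intervals):
    """price[m][t]: cost of mapping m in interval t;
    transition[a][b]: cost of switching from mapping a to b."""
    best = {m: price[m][0] for m in mappings}  # cheapest plan ending in m at t=0
    for t in range(1, intervals):
        best = {m: price[m][t] + min(best[p] + transition[p][m] for p in mappings)
                for m in mappings}
    return min(best.values())

mappings = ["east", "west"]
price = {"east": [3, 9], "west": [5, 2]}
transition = {"east": {"east": 0, "west": 4}, "west": {"east": 4, "west": 0}}
print(plan(mappings, price, transition, intervals=2))  # → 7
```

The toy data reproduces the abstract's point about local versus global optimality: "east" is the cheapest mapping in interval 0 (3 vs. 5), yet the optimal window-wide plan stays on "west" throughout (5 + 0 + 2 = 7), because switching away from "east" would cost 4.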