
    A grid-based infrastructure for distributed retrieval

    In large-scale distributed retrieval, challenges of latency, heterogeneity, and dynamicity emphasise the importance of infrastructural support in reducing the development costs of state-of-the-art solutions. We present a service-based infrastructure for distributed retrieval which blends middleware facilities and a design framework to 'lift' the resource sharing approach and the computational services of a European Grid platform into the domain of e-Science applications. In this paper, we give an overview of the DILIGENT Search Framework and illustrate its exploitation in the field of Earth Science.

    Phylogenetic surveillance of viral genetic diversity and the evolving molecular epidemiology of human immunodeficiency virus type 1

    With ongoing generation of viral genetic diversity and increasing levels of migration, the global human immunodeficiency virus type 1 (HIV-1) epidemic is becoming increasingly heterogeneous. In this study, we investigate the epidemiological characteristics of 5,675 HIV-1 pol gene sequences sampled from distinct infections in the United Kingdom. These sequences were phylogenetically analyzed in conjunction with 976 complete-genome and 3,201 pol gene reference sequences sampled globally and representing the broad range of HIV-1 genetic diversity, allowing us to estimate the probable geographic origins of the various strains present in the United Kingdom. A statistical analysis of phylogenetic clustering in this data set identified several independent transmission chains within the United Kingdom involving recently introduced strains and indicated that strains more commonly associated with infections acquired heterosexually in East Africa are spreading among men who have sex with men. Coalescent approaches were also used and indicated that the transmission chains that we identify originated in the late 1980s to early 1990s. Similar changes in the epidemiological structuring of HIV epidemics are likely taking place in other industrialized nations with large immigrant populations. The framework implemented here takes advantage of the vast amount of routinely generated HIV-1 sequence data and can provide epidemiological insights not readily obtainable through standard surveillance methods.
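    The clustering step can be illustrated with a deliberately simplified stand-in. The study builds maximum-likelihood phylogenies, but a common lightweight proxy for identifying putative transmission chains is single-linkage clustering on pairwise genetic distance; everything below (the 1.5% threshold, the input format) is an assumption for illustration, not the paper's pipeline.

```python
# Illustrative sketch only: group aligned pol sequences into putative
# transmission clusters by single-linkage on pairwise p-distance.
from itertools import combinations

def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences,
    ignoring positions where either sequence has a gap."""
    sites = [(x, y) for x, y in zip(a, b) if x != '-' and y != '-']
    if not sites:
        return 1.0
    return sum(x != y for x, y in sites) / len(sites)

def transmission_clusters(seqs: dict[str, str],
                          threshold: float = 0.015) -> list[set[str]]:
    """Single-linkage: sequences closer than `threshold` join the same
    putative transmission chain (union-find bookkeeping)."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) < threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return [g for g in groups.values() if len(g) > 1]
```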

    Object Distribution Networks for World-wide Document Circulation

    This paper presents an Object Distribution System (ODS), a distributed system inspired by the ultra-large-scale distribution models used in everyday life (e.g. food or newspaper distribution chains). Beyond traditional mechanisms for bringing information closer to readers (e.g. caching and mirroring), this system enables the publication, classification and subscription to volumes of objects (e.g. documents, events). Authors submit their contents to publication agents. Classification authorities provide classification schemes to classify objects. Readers subscribe to topics or authors, and retrieve contents from their local delivery agent (like a kiosk or library, with local copies of objects). Object distribution is an independent process in which objects circulate asynchronously among distribution agents. ODS is designed to perform especially well in an increasingly populated, widespread and complex Internet jungle, using weak consistency replication by object distribution, asynchronous replication, and local access to objects by clients. ODS is based on two independent virtual networks, one dedicated to the distribution (replication) of objects and the other to calculating optimised distribution chains to be applied by the first network.
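    As a minimal sketch of how these roles might fit together, the snippet below wires authors, publication agents, and local delivery agents into the asynchronous flow the abstract describes. All class and method names are invented for illustration; this is not the ODS API, and the distribution network is flattened to a plain list of agents.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class DistObject:
    object_id: str
    topic: str       # drawn from a classification authority's scheme
    author: str
    payload: bytes

class DeliveryAgent:
    """Local delivery agent (a 'kiosk' or library): keeps weakly
    consistent local copies and serves readers from them."""
    def __init__(self) -> None:
        self.store: dict[str, DistObject] = {}
        self.subscriptions: dict[str, set[str]] = defaultdict(set)

    def subscribe(self, reader: str, topic: str) -> None:
        self.subscriptions[topic].add(reader)

    def receive(self, obj: DistObject) -> None:
        # Objects arrive asynchronously along the distribution chain;
        # readers later retrieve them from the local store.
        self.store[obj.object_id] = obj

    def read(self, reader: str, topic: str) -> list[DistObject]:
        if reader not in self.subscriptions[topic]:
            return []
        return [o for o in self.store.values() if o.topic == topic]

class PublicationAgent:
    """Accepts author submissions and pushes objects into the network;
    ODS's second virtual network would compute the optimal chain."""
    def __init__(self, chain: list[DeliveryAgent]) -> None:
        self.chain = chain

    def publish(self, obj: DistObject) -> None:
        for agent in self.chain:
            agent.receive(obj)
```

    Note how reads never leave the local agent: weak consistency is the price paid for purely local access, exactly the trade-off the abstract motivates.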

    Islands of linkage in an ocean of pervasive recombination reveals two-speed evolution of human cytomegalovirus genomes

    Human cytomegalovirus (HCMV) infects most of the population worldwide, persisting throughout the host's life in a latent state with periodic episodes of reactivation. While typically asymptomatic, HCMV can cause fatal disease among congenitally infected infants and immunocompromised patients. These clinical issues are compounded by the emergence of antiviral resistance and the absence of an effective vaccine, the development of which is likely complicated by the numerous immune evasins encoded by HCMV to counter the host's adaptive immune responses, a feature that facilitates frequent super-infections. Understanding the evolutionary dynamics of HCMV is essential for the development of effective new drugs and vaccines. By comparing viral genomes from uncultivated or low-passaged clinical samples of diverse origins, we observe evidence of frequent homologous recombination events, both recent and ancient, and no structure of HCMV genetic diversity at the whole-genome scale. Analysis of individual gene-scale loci reveals a striking dichotomy: while most of the genome is highly conserved, recombines essentially freely and has evolved under purifying selection, 21 genes display extreme diversity, structured into distinct genotypes that do not recombine with each other. Most of these hyper-variable genes encode glycoproteins involved in cell entry or escape of host immunity. Evidence that half of them have diverged through episodes of intense positive selection suggests that rapid evolution of hyper-variable loci is likely driven by interactions with host immunity. It appears that this process is enabled by recombination unlinking hyper-variable loci from strongly constrained neighboring sites. It is conceivable that viral mechanisms facilitating super-infection have evolved to promote recombination between diverged genotypes, allowing the virus to continuously diversify at key loci to escape immune detection, while maintaining a genome optimally adapted to its asymptomatic infectious lifecycle.
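    As a toy illustration of how hyper-variable loci can be flagged (the study's actual analysis also involves recombination detection and selection tests, which this omits), one can rank genes by nucleotide diversity, the average pairwise proportion of differing sites in each gene's alignment. The cutoff below is a placeholder.

```python
# Illustrative only: flag hyper-variable genes by per-gene diversity.
from itertools import combinations

def nucleotide_diversity(alignment: list[str]) -> float:
    """Mean pairwise proportion of differing, ungapped sites."""
    dists = []
    for a, b in combinations(alignment, 2):
        sites = [(x, y) for x, y in zip(a, b) if x != '-' and y != '-']
        if sites:
            dists.append(sum(x != y for x, y in sites) / len(sites))
    return sum(dists) / len(dists) if dists else 0.0

def hypervariable(gene_alignments: dict[str, list[str]],
                  cutoff: float = 0.05) -> list[str]:
    """Genes whose diversity exceeds `cutoff` (placeholder threshold)."""
    return [gene for gene, aln in gene_alignments.items()
            if nucleotide_diversity(aln) > cutoff]
```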

    Document replication strategies for geographically distributed web search engines

    Large-scale web search engines are composed of multiple data centers that are geographically distant from each other. Typically, a user query is processed in a data center that is geographically close to the origin of the query, over a replica of the entire web index. Compared to a centralized, single-center search engine, this architecture offers lower query response times, as the network latencies between users and data centers are reduced. However, it does not scale well with increasing index sizes and query traffic volumes, because queries are evaluated on the entire web index, which has to be replicated and maintained in all data centers. As a remedy to this scalability problem, we propose a document replication framework in which documents are selectively replicated in data centers based on regional user interests. Within this framework, we propose three different document replication strategies, each optimizing a different objective: reducing the potential search quality loss, the average query response time, or the total query workload of the search system. For all three strategies, we consider two alternative types of capacity constraints on the index sizes of data centers. Moreover, we investigate the performance impact of query forwarding and result caching. We evaluate our strategies via detailed simulations, using a large query log and a document collection obtained from the Yahoo! web search engine.
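    The interest-based selective replication idea can be sketched as a knapsack-style heuristic: each data center keeps, within its index-size budget, the documents its regional users hit most often. This is a simplified illustration, not one of the paper's three strategies; the function name, inputs, and hits-per-byte scoring are assumptions.

```python
def select_replicas(doc_size: dict[str, int],
                    regional_hits: dict[str, dict[str, int]],
                    capacity: dict[str, int]) -> dict[str, set[str]]:
    """regional_hits[dc][doc]: access count of doc by dc's regional
    users; capacity[dc]: index-size budget of data center dc.
    Every doc in regional_hits must have an entry in doc_size."""
    placement: dict[str, set[str]] = {}
    for dc, hits in regional_hits.items():
        budget, chosen = capacity[dc], set()
        # Greedy knapsack heuristic: best hits-per-byte first.
        for doc in sorted(hits, key=lambda d: hits[d] / doc_size[d],
                          reverse=True):
            if doc_size[doc] <= budget:
                chosen.add(doc)
                budget -= doc_size[doc]
        placement[dc] = chosen
    return placement
```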

    A proposal for a coordinated effort for the determination of brainwide neuroanatomical connectivity in model organisms at a mesoscopic scale

    In this era of complete genomes, our knowledge of neuroanatomical circuitry remains surprisingly sparse. Such knowledge is, however, critical for both basic and clinical research into brain function. Here we advocate for a concerted effort to fill this gap, through systematic, experimental mapping of neural circuits at a mesoscopic scale of resolution suitable for comprehensive, brain-wide coverage, using injections of tracers or viral vectors. We detail the scientific and medical rationale and briefly review existing knowledge and experimental techniques. We define a set of desiderata, including brain-wide coverage; validated and extensible experimental techniques suitable for standardization and automation; a centralized, open-access data repository; compatibility with existing resources; and tractability with current informatics technology. We discuss a hypothetical but tractable plan for mouse, additional efforts for the macaque, and technique development for humans. We estimate that the mouse connectivity project could be completed within five years with a comparatively modest budget.

    Deconvolving mutational patterns of poliovirus outbreaks reveals its intrinsic fitness landscape.

    Vaccination has essentially eradicated poliovirus. Yet, its mutation rate is higher than that of viruses like HIV, for which no effective vaccine exists. To investigate this, we infer a fitness model for the poliovirus viral protein 1 (vp1), which successfully predicts in vitro fitness measurements. This is achieved by first developing a probabilistic model for the prevalence of vp1 sequences that enables us to isolate and remove data that are subject to strong vaccine-derived biases. The intrinsic fitness constraints derived for vp1, a capsid protein subject to antibody responses, are compared with those of analogous HIV proteins. We find that vp1 evolution is subject to tighter constraints, limiting its ability to evade vaccine-induced immune responses. Our analysis also indicates that circulating poliovirus strains in unimmunized populations serve as a reservoir that can seed outbreaks in spatio-temporally localized, sub-optimally immunized populations.
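    A toy version of the prevalence-as-fitness idea: under an independent-site assumption, the log-frequency of an amino acid at a position, relative to the consensus residue there, proxies its fitness cost. The paper's model is more sophisticated and explicitly removes vaccine-derived bias, which this sketch does not; the pseudocount is an assumption.

```python
# Toy independent-site fitness proxy; not the paper's actual model.
import math
from collections import Counter

def site_fitness(sequences: list[str],
                 pseudocount: float = 0.5) -> list[dict[str, float]]:
    """fitness[i][aa]: log-frequency of amino acid `aa` at site i,
    relative to the most common (consensus) residue at that site."""
    length = len(sequences[0])
    fitness = []
    for i in range(length):
        counts = Counter(seq[i] for seq in sequences)
        top = counts.most_common(1)[0][1]  # consensus residue count
        fitness.append({aa: math.log((n + pseudocount) / (top + pseudocount))
                        for aa, n in counts.items()})
    return fitness
```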

    An Efficient Holistic Data Distribution and Storage Solution for Online Social Networks

    In the past few years, Online Social Networks (OSNs) have dramatically spread over the world. Facebook [4], one of the largest worldwide OSNs, has 1.35 billion users, 82.2% of whom are outside the US [36]. The browsing and posting interactions (text content) between OSN users lead to user data reads (visits) and writes (updates) in OSN datacenters, and Facebook now serves a billion reads and tens of millions of writes per second [37]. Besides that, Facebook has become one of the top Internet traffic sources [36] by sharing a tremendous number of large multimedia files, including photos and videos. The servers in datacenters have limited resources (e.g., bandwidth) to supply latency-efficient service for multimedia file sharing among the rapidly growing user population worldwide. Most online applications operate under soft real-time constraints (e.g., ≤ 300 ms latency) for good user experience, and higher service latency translates directly into lost income. Thus, service latency is a very important Quality of Service (QoS) requirement for the OSN as a web service, since it affects the OSN's revenue and user experience. Also, to increase OSN revenue, OSN service providers need to constrain capital investment, operation costs, and resource (bandwidth) usage costs. Therefore, it is critical for the OSN to supply a guaranteed QoS for both text and multimedia contents to users while minimizing its costs. To achieve this goal, in this dissertation, we address three problems: i) data distribution among datacenters: how to allocate data (text contents) among data servers with low service latency and minimized inter-datacenter network load; ii) efficient multimedia file sharing: how to facilitate the servers in datacenters to efficiently share multimedia files among users; iii) cost-minimized data allocation among cloud storages: how to save the infrastructure (datacenter) capital investment and operation costs by leveraging commercial cloud storage services.

    Data distribution among datacenters. To serve the text content, the new OSN model, which deploys datacenters globally, helps reduce service latency to worldwide distributed users and relieve the load on existing datacenters. However, it causes higher inter-datacenter communication load. In the OSN, each datacenter has a full copy of all data, and the master datacenter updates all other datacenters, generating tremendous load in this new model. Distributed data storage, which stores a user's data only in his/her geographically closest datacenters, mitigates the problem somewhat. However, frequent interactions between distant users lead to frequent inter-datacenter communication and hence long service latencies. Therefore, the OSNs need a data allocation algorithm among datacenters with minimized network load and low service latency.

    Efficient multimedia file sharing. To serve multimedia file sharing with a rapidly growing user population, the file distribution method should be scalable and cost-efficient, e.g., minimizing bandwidth usage of the centralized servers. P2P networks have been widely used for file sharing among large numbers of users [58, 131], and meet both the scalability and cost-efficiency requirements. However, without fully utilizing the altruism and trust among friends in the OSNs, current P2P-assisted file sharing systems depend on strangers or anonymous users to distribute files, which degrades their performance due to user selfishness and malicious behaviors. Therefore, the OSNs need a cost-efficient and trustworthy P2P-assisted file sharing system to serve multimedia content distribution.

    Cost-minimized data allocation among cloud storages. The new trend of OSNs needs to build worldwide datacenters, which introduces a large amount of capital investment and maintenance costs. In order to save the capital expenditures of building and maintaining hardware infrastructure, the OSNs can leverage the storage services of multiple Cloud Service Providers (CSPs) with existing worldwide distributed datacenters [30, 125, 126]. These datacenters provide different Get/Put latencies and unit prices for resource utilization and reservation. Thus, when selecting different CSPs' datacenters, an OSN as a cloud customer of a globally distributed application faces two challenges: i) how to allocate data to worldwide datacenters to satisfy application SLA (service level agreement) requirements, including both data retrieval latency and availability, and ii) how to allocate data and reserve resources in datacenters belonging to different CSPs to minimize the payment cost. Therefore, the OSNs need a data allocation system distributing data among CSPs' datacenters with cost minimization and an SLA guarantee.

    In all, the OSN needs an efficient holistic data distribution and storage solution to minimize its network load and cost while supplying guaranteed QoS for both text and multimedia contents. In this dissertation, we propose methods to solve each of the aforementioned challenges. Firstly, we verify the benefits of the new trend of OSNs and present typical OSN properties that lay the basis of our design. We then propose a Selective Data replication mechanism in Distributed Datacenters (SD3) to allocate user data among geographically distributed datacenters. In SD3, a datacenter jointly considers update rate and visit rate to select user data for replication, and further atomizes a user's different types of data (e.g., status updates, friend posts) for replication, making sure that a replica always reduces inter-datacenter communication. Secondly, we analyze a BitTorrent file sharing trace, which proves the necessity of proximity- and interest-aware clustering. Based on the trace study and OSN properties, to address the second problem, we propose a SoCial Network integrated P2P file sharing system for enhanced Efficiency and Trustworthiness (SOCNET) to fully and cooperatively leverage the common-interest, geographical-closeness, and trust properties of OSN friends. SOCNET uses a hierarchical distributed hash table (DHT) to cluster common-interest nodes, then further clusters geographically close nodes into a subcluster and connects the nodes in a subcluster with social links. Thus, when queries travel along trustable social links, they also gain a higher probability of being successfully resolved by proximity-close nodes, simultaneously enhancing efficiency and trustworthiness. Thirdly, to handle the third problem, we model the cost minimization problem under the SLA constraints using integer programming. According to the system model, we propose an Economical and SLA-guaranteed cloud Storage Service (ES3), which finds a data allocation and resource reservation schedule with cost minimization and SLA guarantee. ES3 incorporates (1) a data allocation and reservation algorithm, which allocates each data item to a datacenter and determines the reservation amount on datacenters by leveraging all the pricing policies; (2) a genetic algorithm based data allocation adjustment approach, which makes data Get/Put rates stable in each datacenter to maximize the reservation benefit; and (3) a dynamic request redirection algorithm, which dynamically redirects a data request from an over-utilized datacenter to an under-utilized datacenter with sufficient reserved resources when the request rate varies greatly, to further reduce the payment. Finally, we conducted trace-driven experiments on a distributed testbed, PlanetLab, and real commercial cloud storage (Amazon S3, Windows Azure Storage and Google Cloud Storage) to demonstrate the efficiency and effectiveness of our proposed systems in comparison with other systems. The results show that our systems outperform others in network savings and data distribution efficiency.
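    As a rough illustration of the selective-replication idea behind SD3 (the dissertation's exact cost model is not reproduced here), the replicate-or-not decision can be framed as comparing the inter-datacenter traffic a replica saves on reads against the traffic it adds on update propagation. The rates and unit costs below are hypothetical placeholders.

```python
# Hedged sketch of a replication rule in the spirit of SD3:
# replicate a user's data at a remote datacenter only when the traffic
# saved by serving reads locally outweighs the update-push traffic.
def should_replicate(visit_rate: float, update_rate: float,
                     read_cost: float = 1.0,
                     update_cost: float = 1.0) -> bool:
    """visit_rate: reads/s from this datacenter's region for the data;
    update_rate: writes/s applied at the data's master datacenter."""
    saved = visit_rate * read_cost      # remote reads no longer needed
    added = update_rate * update_cost   # updates now pushed to replica
    return saved > added
```

    Under such a rule, frequently visited but rarely updated data (say, an old photo album) gets replicated, while rapidly changing data keeps being served from its master datacenter, which matches the abstract's requirement that a replica always reduce inter-datacenter communication.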