161 research outputs found

    Dynamic Trees with Almost-Optimal Access Cost

    An optimal binary search tree for an access sequence on elements is a static tree that minimizes the total search cost. Constructing perfectly optimal binary search trees is expensive so the most efficient algorithms construct almost optimal search trees. There exists a long literature of constructing almost optimal search trees dynamically, i.e., when the access pattern is not known in advance. All of these trees, e.g., splay trees and treaps, provide a multiplicative approximation to the optimal search cost. In this paper we show how to maintain an almost optimal weighted binary search tree under access operations and insertions of new elements where the approximation is an additive constant. More technically, we maintain a tree in which the depth of the leaf holding an element e_i does not exceed min(log(W/w_i),log n)+O(1) where w_i is the number of times e_i was accessed and W is the total length of the access sequence. Our techniques can also be used to encode a sequence of m symbols with a dynamic alphabetic code in O(m) time so that the encoding length is bounded by m(H+O(1)), where H is the entropy of the sequence. This is the first efficient algorithm for adaptive alphabetic coding that runs in constant time per symbol
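The depth bound quoted in this abstract can be made concrete with a small calculation. The Python sketch below only evaluates min(log(W/w_i), log n) for a set of hypothetical access counts and leaves out the additive O(1) term; it does not implement the paper's tree. The function name, the sample counts, and the choice of base-2 logarithms are all assumptions made for illustration.

```python
import math

def depth_bounds(access_counts):
    """Return min(log2(W / w_i), log2(n)) per element (additive O(1) term omitted)."""
    total = sum(access_counts.values())   # W: total length of the access sequence
    n = len(access_counts)                # n: number of distinct elements
    return {
        elem: min(math.log2(total / w), math.log2(n))
        for elem, w in access_counts.items()
    }

# Hypothetical access counts, chosen only for illustration.
counts = {"a": 8, "b": 4, "c": 2, "d": 2}
bounds = depth_bounds(counts)
```

Frequently accessed elements get a bound below log n, while rarely accessed ones are capped at log n, which is the role of the min in the stated bound.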

    Cache-oblivious dynamic dictionaries with update/query tradeoffs

    Several existing cache-oblivious dynamic dictionaries achieve O(log_B N) (or slightly better O(log_B (N/M))) memory transfers per operation, where N is the number of items stored, M is the memory size, and B is the block size, which matches the classic B-tree data structure. One recent structure achieves the same query bound and a sometimes-better amortized update bound of O(...) memory transfers. This paper presents a new data structure, the xDict, implementing predecessor queries in O(...) worst-case memory transfers and insertions and deletions in O(...) amortized memory transfers, for any constant ε with 0 < ε < 1. For example, the xDict achieves subconstant amortized update cost when N = ..., whereas the B-tree’s ... is subconstant only when ... is subconstant only when N = .... The xDict attains the optimal tradeoff between insertions and queries, even in the broader external-memory model, for the range where inserts cost between (...) and O(1/lg^3 N) memory transfers. Funding: Danish National Research Foundation (MADALGO, Center for Massive Data Algorithmics); National Science Foundation (U.S.) (NSF Grant CCF-0541209); Computing Innovation Fellow
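To give a feel for the B-tree baseline mentioned in this abstract, the Python sketch below evaluates log_B N for hypothetical values of N and B. It ignores the constant factors hidden in the O-notation and says nothing about the xDict's own bounds, which are elided in the listing; the function name and the parameter values are assumptions for illustration only.

```python
import math

def btree_transfers(num_items, block_size):
    """Evaluate log_B N, the order of memory transfers per B-tree operation."""
    return math.log(num_items) / math.log(block_size)

# Hypothetical parameters: one billion items, blocks holding 1024 items each.
transfers = btree_transfers(10**9, 1024)
```

With these assumed values the quantity is just under 3, which is why a bound of this order is often described as a small number of block reads per operation.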

    Entropy, Triangulation, and Point Location in Planar Subdivisions

    A data structure is presented for point location in connected planar subdivisions when the distribution of queries is known in advance. The data structure has an expected query time that is within a constant factor of optimal. More specifically, an algorithm is presented that preprocesses a connected planar subdivision G of size n and a query distribution D to produce a point location data structure for G. The expected number of point-line comparisons performed by this data structure, when the queries are distributed according to D, is H + O(H^{2/3}+1) where H=H(G,D) is a lower bound on the expected number of point-line comparisons performed by any linear decision tree for point location in G under the query distribution D. The preprocessing algorithm runs in O(n log n) time and produces a data structure of size O(n). These results are obtained by creating a Steiner triangulation of G that has near-minimum entropy. Comment: 19 pages, 4 figures, lots of formulas

    A transcriptional sketch of a primary human breast cancer by 454 deep sequencing

    Background: The cancer transcriptome is difficult to explore due to the heterogeneity of quantitative and qualitative changes in gene expression linked to the disease status. An increasing number of "unconventional" transcripts, such as novel isoforms, non-coding RNAs, somatic gene fusions and deletions have been associated with the tumoral state. Massively parallel sequencing techniques provide a framework for exploring the transcriptional complexity inherent to cancer with a limited laboratory and financial effort. We developed a deep sequencing and bioinformatics analysis protocol to investigate the molecular composition of a breast cancer poly(A)+ transcriptome. This method utilizes a cDNA library normalization step to diminish the representation of highly expressed transcripts and biology-oriented bioinformatic analyses to facilitate detection of rare and novel transcripts. Results: We analyzed over 132,000 Roche 454 high-confidence deep sequencing reads from a primary human lobular breast cancer tissue specimen, and detected a range of unusual transcriptional events that were subsequently validated by RT-PCR in eight additional primary human breast cancer samples. We identified and validated one deletion, two novel ncRNAs (one intergenic and one intragenic), ten previously unknown or rare transcript isoforms and a novel gene fusion specific to a single primary tissue sample. We also explored the non-protein-coding portion of the breast cancer transcriptome, identifying thousands of novel non-coding transcripts and more than three hundred reads corresponding to the non-coding RNA MALAT1, which is highly expressed in many human carcinomas.
Conclusion: Our results demonstrate that combining 454 deep sequencing with a normalization step and careful bioinformatic analysis facilitates the discovery and quantification of rare transcripts or ncRNAs, and can be used as a qualitative tool to characterize transcriptome complexity, revealing many hitherto unknown transcripts, splice isoforms, gene fusion events and ncRNAs, even at a relatively low sequence sampling

    A knowledge-based framework for service management

    The purpose of this paper is to investigate how information and communication technologies are used for service standardisation, customisation, and modularisation by knowledge-intensive service firms through the development and empirical validation of a knowledge-based framework. This paper uses 59 in-depth interviews, observational data, and document analysis from case studies of three service-related departments in high-technology, multinational knowledge-intensive business services (KIBSs). Prior research does not conceptualise the relationships between service customisation, standardisation and modularisation. This paper seeks to overcome this gap by integrating insights from research on the role played by both knowledge and information and communication technologies (ICTs) to construct and validate a framework to deal with this gap. It outlines the implications for service firms' use of ICT to deal with increasing knowledge intensity as well as indicating the circumstances under which service knowledge is best customised, standardised and modularised. Further testing in other industries would prove useful in extending the usefulness and applicability of the findings. The originality of the paper lies in developing and validating the first framework to outline the relationship between how service knowledge is customised, standardised or modularised and indicating the associated issues and challenges. It emphasises the role of knowledge and technology. The value of this framework increases as more firms deal with increasing knowledge intensity in the services they provide and in their use of ICTs to reap the benefits of appropriate knowledge reuse.

    Genetic and environmental variation in continuous phenotypes in the ABCD Study®

    Twin studies yield valuable insights into the sources of variation, covariation and causation in human traits. The ABCD Study® (abcdstudy.org) was designed to take advantage of four universities known for their twin research, neuroimaging, population-based sampling, and expertise in genetic epidemiology so that representative twin studies could be performed. In this paper we use the twin data to: (i) provide initial estimates of heritability for the wide range of phenotypes assessed in the ABCD Study using a consistent direct variance estimation approach, assuring that both data and methodology are sound; and (ii) provide an online resource for researchers that can serve as a reference point for future behavior genetic studies of this publicly available dataset. Data were analyzed from 772 pairs of twins aged 9-10 years at study inception, with zygosity determined using genotypic data, recruited and assessed at four twin hub sites. The online tool provides twin correlations and both standardized and unstandardized estimates of additive genetic, and environmental variation for 14,500 continuously distributed phenotypic features, including: structural and functional neuroimaging, neurocognition, personality, psychopathology, substance use propensity, physical, and environmental trait variables. The estimates were obtained using an unconstrained variance approach, so they can be incorporated directly into meta-analyses without upwardly biasing aggregate estimates. The results indicated broad consistency with prior literature where available and provided novel estimates for phenotypes without prior twin studies or those assessed at different ages. Effects of site, self-identified race/ethnicity, age and sex were statistically controlled. 
Results from genetic modeling of all 53,172 continuous variables, including 38,672 functional MRI variables, will be accessible via the user-friendly open-access web interface we have established, and will be updated as new data are released from the ABCD Study. This paper provides an overview of the initial results from the twin study embedded within the ABCD Study, an introduction to the primary research domains in the ABCD study and twin methodology, and an evaluation of the initial findings with a focus on data quality and suitability for future behavior genetic studies using the ABCD dataset. The broad introductory material is provided in recognition of the multidisciplinary appeal of the ABCD Study. While this paper focuses on univariate analyses, we emphasize the opportunities for multivariate, developmental and causal analyses, as well as those evaluating heterogeneity by key moderators such as sex, demographic factors and genetic background

    A Genetic Epidemiological Mega Analysis of Smoking Initiation in Adolescents

    Introduction. Previous studies in adolescents were not adequately powered to accurately disentangle genetic and environmental influences on smoking initiation across adolescence. Methods. Mega-analysis of pooled genetically informative data on smoking initiation was performed, with structural equation modeling, to test equality of prevalence and correlations across cultural backgrounds, and to estimate the significance and effect size of genetic and environmental effects according to the classical twin study, in adolescent male and female twins from same-sex and opposite-sex twin pairs (N=19 313 pairs) between age 10 and 19, with 76 358 longitudinal assessments between 1983 and 2007, from 11 population-based twin samples from the US, Europe and Australia. Results. Although prevalences differed between samples, twin correlations did not, suggesting similar etiology of smoking initiation across developed countries. The estimate of additive genetic contributions to liability of smoking initiation increased from approximately 15% to 45% from age 13 to 19. Correspondingly, shared environmental factors accounted for a substantial proportion of variance in liability to smoking initiation at age 13 (70%) and gradually less by age 19 (40%). Conclusions. Both additive genetic and shared environmental factors significantly contribute to variance in smoking initiation throughout adolescence. The present study, the largest genetic epidemiological study on smoking initiation to date, found consistent results across 11 studies for the etiology of smoking initiation. Environmental factors, especially those shared by siblings in a family, primarily influence smoking initiation variance in early adolescence, while an increasing role of genetic factors is seen at later ages, which has important implications for prevention strategies. IMPLICATIONS: This is the first study to find evidence of genetic factors in liability to smoking initiation at ages as young as 12. 
It also shows the strongest evidence to date for decay of effects of the shared environment from early adolescence to young adulthood. We found remarkable consistency of twin correlations across studies reflecting similar etiology of liability to initiate smoking across different cultures and time periods. Thus familial factors strongly contribute to individual differences in who starts to smoke with a gradual increase in the impact of genetic factors and a corresponding decrease in that of the shared environment

    Genome-wide association study of lifetime cannabis use based on a large meta-analytic sample of 32330 subjects from the International Cannabis Consortium

    Cannabis is the most widely produced and consumed illicit psychoactive substance worldwide. Occasional cannabis use can progress to frequent use, abuse and dependence with all known adverse physical, psychological and social consequences. Individual differences in cannabis initiation are heritable (40-48%). The International Cannabis Consortium was established with the aim to identify genetic risk variants of cannabis use. We conducted a meta-analysis of genome-wide association data of 13 cohorts (N=32 330) and four replication samples (N=5627). In addition, we performed a gene-based test of association, estimated single-nucleotide polymorphism (SNP)-based heritability and explored the genetic correlation between lifetime cannabis use and cigarette use using LD score regression. No individual SNPs reached genome-wide significance. Nonetheless, gene-based tests identified four genes significantly associated with lifetime cannabis use: NCAM1, CADM2, SCOC and KCNT2. Previous studies reported associations of NCAM1 with cigarette smoking and other substance use, and those of CADM2 with body mass index, processing speed and autism disorders, which are phenotypes previously reported to be associated with cannabis use. Furthermore, we showed that, combined across the genome, all common SNPs explained 13-20% (P<0.001) of the liability of lifetime cannabis use. Finally, there was a strong genetic correlation (rg=0.83; P=1.85 × 10^(-8)) between lifetime cannabis use and lifetime cigarette smoking implying that the SNP effect sizes of the two traits are highly correlated. This is the largest meta-analysis of cannabis GWA studies to date, revealing important new insights into the genetic pathways of lifetime cannabis use. Future functional studies should explore the impact of the identified genes on the biological mechanisms of cannabis use

    Antimicrobial resistance among migrants in Europe: a systematic review and meta-analysis

    BACKGROUND: Rates of antimicrobial resistance (AMR) are rising globally and there is concern that increased migration is contributing to the burden of antibiotic resistance in Europe. However, the effect of migration on the burden of AMR in Europe has not yet been comprehensively examined. Therefore, we did a systematic review and meta-analysis to identify and synthesise data for AMR carriage or infection in migrants to Europe to examine differences in patterns of AMR across migrant groups and in different settings. METHODS: For this systematic review and meta-analysis, we searched MEDLINE, Embase, PubMed, and Scopus with no language restrictions from Jan 1, 2000, to Jan 18, 2017, for primary data from observational studies reporting antibacterial resistance in common bacterial pathogens among migrants to 21 European Union-15 and European Economic Area countries. To be eligible for inclusion, studies had to report data on carriage or infection with laboratory-confirmed antibiotic-resistant organisms in migrant populations. We extracted data from eligible studies and assessed quality using piloted, standardised forms. We did not examine drug resistance in tuberculosis and excluded articles solely reporting on this parameter. We also excluded articles in which migrant status was determined by ethnicity, country of birth of participants' parents, or was not defined, and articles in which data were not disaggregated by migrant status. Outcomes were carriage of or infection with antibiotic-resistant organisms. We used random-effects models to calculate the pooled prevalence of each outcome. The study protocol is registered with PROSPERO, number CRD42016043681. FINDINGS: We identified 2274 articles, of which 23 observational studies reporting on antibiotic resistance in 2319 migrants were included. 
The pooled prevalence of any AMR carriage or AMR infection in migrants was 25·4% (95% CI 19·1-31·8; I² = 98%), including meticillin-resistant Staphylococcus aureus (7·8%, 4·8-10·7; I² = 92%) and antibiotic-resistant Gram-negative bacteria (27·2%, 17·6-36·8; I² = 94%). The pooled prevalence of any AMR carriage or infection was higher in refugees and asylum seekers (33·0%, 18·3-47·6; I² = 98%) than in other migrant groups (6·6%, 1·8-11·3; I² = 92%). The pooled prevalence of antibiotic-resistant organisms was slightly higher in high-migrant community settings (33·1%, 11·1-55·1; I² = 96%) than in migrants in hospitals (24·3%, 16·1-32·6; I² = 98%). We did not find evidence of high rates of transmission of AMR from migrant to host populations. INTERPRETATION: Migrants are exposed to conditions favouring the emergence of drug resistance during transit and in host countries in Europe. Increased antibiotic resistance among refugees and asylum seekers and in high-migrant community settings (such as refugee camps and detention facilities) highlights the need for improved living conditions, access to health care, and initiatives to facilitate detection of and appropriate high-quality treatment for antibiotic-resistant infections during transit and in host countries. Protocols for the prevention and control of infection and for antibiotic surveillance need to be integrated in all aspects of health care, which should be accessible for all migrant groups, and should target determinants of AMR before, during, and after migration. FUNDING: UK National Institute for Health Research Imperial Biomedical Research Centre, Imperial College Healthcare Charity, the Wellcome Trust, and UK National Institute for Health Research Health Protection Research Unit in Healthcare-associated Infections and Antimicrobial Resistance at Imperial College London
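The abstract states that random-effects models were used to pool prevalence but does not say which estimator was chosen. As one possible illustration, the Python sketch below applies the DerSimonian-Laird estimator to untransformed proportions. Both the choice of estimator and the lack of a transformation are assumptions, and the study counts are hypothetical, not taken from the review.

```python
def pooled_prevalence(events, totals):
    """DerSimonian-Laird random-effects pooling of raw proportions (illustrative)."""
    props = [e / n for e, n in zip(events, totals)]
    # Within-study variance of a proportion, p(1 - p)/n; requires 0 < p < 1.
    variances = [p * (1 - p) / n for p, n in zip(props, totals)]
    w = [1 / v for v in variances]                      # fixed-effect weights
    fixed = sum(wi * pi for wi, pi in zip(w, props)) / sum(w)
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (pi - fixed) ** 2 for wi, pi in zip(w, props))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (len(props) - 1)) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in variances]          # random-effects weights
    pooled = sum(wi * pi for wi, pi in zip(w_re, props)) / sum(w_re)
    se = (1 / sum(w_re)) ** 0.5
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2

# Hypothetical study data: resistant isolates / migrants tested per study.
events = [12, 30, 8]
totals = [100, 120, 90]
pooled, ci, tau2 = pooled_prevalence(events, totals)
```

Published meta-analyses of prevalence often transform proportions (for example with a logit or double-arcsine transform) before pooling, so this untransformed version should be read as a simplified sketch only.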