22 research outputs found

    Scholarly Needs for Text Analysis Resources: A User Assessment Study for the HathiTrust Research Center

    The HathiTrust Research Center (HTRC) is undertaking a study to better understand the needs of current and potential users of the center’s tools and services for computational text analysis. In this paper, we report on the results of the first phase of the study, which consisted of interviews with scholars, administrators, and librarians whose work involves text data mining. Our study reveals that text analysis workflows are specific to the individual research project and are often nonlinear. In spite of, and in some cases because of, the wealth of textual data available, scholars find it most difficult to locate, access, and curate textual data for their research. While the goals of the study directly relate to research and development for the HTRC, our results are useful for other large-scale data providers developing solutions for allowing computational access to their content.

    HathiTrust Research Center User Requirements Study White Paper

    This paper presents findings from an investigation into trends and practices in humanities and social sciences research that incorporates text data mining. As affiliates of the HathiTrust Research Center (HTRC), we undertook the study to illuminate researcher needs and expectations for text data, tools, and training for text mining in order to better understand our current and potential user community. Results of our study have informed and will continue to inform development of HTRC tools and services for computational text analysis.

    From karyotypes to precision genomics in 9p deletion and duplication syndromes

    While 9p deletion and duplication syndromes have been studied for several years, small sample sizes and minimal high-resolution data have limited a comprehensive delineation of genotypic and phenotypic characteristics. In this study, we examined genetic data from 719 individuals in the worldwide 9p Network Cohort: a cohort seven to nine times larger than any previous study of 9p. Most breakpoints occur in bands 9p22 and 9p24, accounting for 35% and 38% of all breakpoints, respectively. Bands 9p11 and 9p12 have the fewest breakpoints, with each accounting for 0.6% of all breakpoints. The most common phenotype in 9p deletion and duplication syndromes is developmental delay, and we identified eight known neurodevelopmental disorder genes in 9p22 and 9p24. Since it has been previously reported that some individuals have a secondary structural variant related to the 9p variant, we examined our cohort for these variants and found 97 events. The top secondary variant involved 9q in 14 individuals (1.9%), including ring chromosomes and inversions. We identified a gender bias with significant enrichment for females (p = 0.0006) that may arise from a sex reversal in some individuals with 9p deletions. Genes on 9p were characterized regarding function, constraint metrics, and protein-protein interactions, resulting in a prioritized set of genes for further study. Finally, we achieved precision genomics in one child with a complex 9p structural variation using modern genomic technologies, demonstrating that long-read sequencing will be integral for some cases. Our study is the largest ever on 9p-related syndromes and provides key insights into genetic factors involved in these syndromes.

    Hundreds of variants clustered in genomic loci and biological pathways affect human height

    Most common human traits and diseases have a polygenic pattern of inheritance: DNA sequence variants at many genetic loci influence the phenotype. Genome-wide association (GWA) studies have identified more than 600 variants associated with human traits, but these typically explain small fractions of phenotypic variation, raising questions about the use of further studies. Here, using 183,727 individuals, we show that hundreds of genetic variants, in at least 180 loci, influence adult height, a highly heritable and classic polygenic trait. The large number of loci reveals patterns with important implications for genetic studies of common human diseases and traits. First, the 180 loci are not random, but instead are enriched for genes that are connected in biological pathways (P = 0.016) and that underlie skeletal growth defects (P < 0.001). Second, the likely causal gene is often located near the most strongly associated variant: in 13 of 21 loci containing a known skeletal growth gene, that gene was closest to the associated variant. Third, at least 19 loci have multiple independently associated variants, suggesting that allelic heterogeneity is a frequent feature of polygenic traits, that comprehensive explorations of already-discovered loci should discover additional variants and that an appreciable fraction of associated loci may have been identified. Fourth, associated variants are enriched for likely functional effects on genes, being over-represented among variants that alter amino-acid structure of proteins and expression levels of nearby genes. Our data explain approximately 10% of the phenotypic variation in height, and we estimate that unidentified common variants of similar effect sizes would increase this figure to approximately 16% of phenotypic variation (approximately 20% of heritable variation). 
    Although additional approaches are needed to dissect the genetic architecture of polygenic human traits fully, our findings indicate that GWA studies can identify large numbers of loci that implicate biologically relevant genes and pathways.

    International genome-wide meta-analysis identifies new primary biliary cirrhosis risk loci and targetable pathogenic pathways.

    Primary biliary cirrhosis (PBC) is a classical autoimmune liver disease for which effective immunomodulatory therapy is lacking. Here we perform meta-analyses of discovery data sets from genome-wide association studies of European subjects (n=2,764 cases and 10,475 controls) followed by validation genotyping in an independent cohort (n=3,716 cases and 4,261 controls). We discover and validate six previously unknown risk loci for PBC (P_combined < 5 × 10^-8) and use pathway analysis to identify JAK-STAT/IL12/IL27 signalling and cytokine-cytokine pathways, for which relevant therapies exist.

    The development and validation of a scoring tool to predict the operative duration of elective laparoscopic cholecystectomy

    Background: The ability to accurately predict operative duration has the potential to optimise theatre efficiency and utilisation, thus reducing costs and increasing staff and patient satisfaction. With laparoscopic cholecystectomy being one of the most commonly performed procedures worldwide, a tool to predict operative duration could be extremely beneficial to healthcare organisations. Methods: Data collected from the CholeS study on patients undergoing cholecystectomy in UK and Irish hospitals between 04/2014 and 05/2014 were used to study operative duration. A multivariable binary logistic regression model was produced in order to identify significant independent predictors of long (> 90 min) operations. The resulting model was converted to a risk score, which was subsequently validated on a second cohort of patients using ROC curves. Results: After exclusions, data were available for 7227 patients in the derivation (CholeS) cohort. The median operative duration was 60 min (interquartile range 45–85), with 17.7% of operations lasting longer than 90 min. Ten factors were found to be significant independent predictors of operative durations > 90 min, including ASA, age, previous surgical admissions, BMI, gallbladder wall thickness and CBD diameter. A risk score was then produced from these factors, and applied to a cohort of 2405 patients from a tertiary centre for external validation. This returned an area under the ROC curve of 0.708 (SE = 0.013, p < […]), with the proportion of operations lasting > 90 min increasing more than eightfold from 5.1 to 41.8% in the extremes of the score. Conclusion: The scoring tool produced in this study was found to be significantly predictive of long operative durations on validation in an external cohort. As such, the tool may have the potential to enable organisations to better organise theatre lists and deliver greater efficiencies in care.

    Global and regional molecular epidemiology of HIV-1, 1990-2015: a systematic review, global survey, and trend analysis

    BACKGROUND: Global genetic diversity of HIV-1 is a major challenge to the development of HIV vaccines. We aimed to estimate the regional and global distribution of HIV-1 subtypes and recombinants during 1990-2015. METHODS: We searched PubMed, EMBASE (Ovid), CINAHL (Ebscohost), and Global Health (Ovid) for HIV-1 subtyping studies published between Jan 1, 1990, and Dec 31, 2015. We collected additional unpublished HIV-1 subtyping data through a global survey. We included prevalence studies with HIV-1 subtyping data collected during 1990-2015. We grouped countries into 14 regions and analysed data for four time periods (1990-99, 2000-04, 2005-09, and 2010-15). The distribution of HIV-1 subtypes, circulating recombinant forms (CRFs), and unique recombinant forms (URFs) in individual countries was weighted according to the UNAIDS estimates of the number of people living with HIV (PLHIV) in each country to generate regional and global estimates of HIV-1 diversity in each time period. The primary outcome was the number of samples designated as HIV-1 subtypes A, B, C, D, F, G, H, J, K, CRFs, and URFs. The systematic review is registered with PROSPERO, number CRD42017067164. FINDINGS: This systematic review and global survey yielded 2203 datasets with 383 519 samples from 116 countries in 1990-2015. Globally, subtype C accounted for 46·6% (16 280 897/34 921 639 of PLHIV) of all HIV-1 infections in 2010-15. Subtype B was responsible for 12·1% (4 235 299/34 921 639) of infections, followed by subtype A (10·3%; 3 587 003/34 921 639), CRF02_AG (7·7%; 2 705 110/34 921 639), CRF01_AE (5·3%; 1 840 982/34 921 639), subtype G (4·6%; 1 591 276/34 921 639), and subtype D (2·7%; 926 255/34 921 639). Subtypes F, H, J, and K combined accounted for 0·9% (311 332/34 921 639) of infections. Other CRFs accounted for 3·7% (1 309 082/34 921 639), bringing the proportion of all CRFs to 16·7% (5 844 113/34 921 639). 
    URFs constituted 6·1% (2 134 405/34 921 639), resulting in recombinants accounting for 22·8% (7 978 517/34 921 639) of all global HIV-1 infections. The distribution of HIV-1 subtypes and recombinants changed over time in countries, regions, and globally. At a global level during 2005-15, subtype B increased, subtypes A and D were stable, and subtypes C and G and CRF02_AG decreased. CRF01_AE, other CRFs, and URFs increased, leading to a consistent increase in the global proportion of recombinants over time. INTERPRETATION: Global and regional HIV diversity is complex and evolving, and is a major challenge to HIV vaccine development. Surveillance of the global molecular epidemiology of HIV-1 remains crucial for the design, testing, and implementation of HIV vaccines. FUNDING: None.

    Pretreatment prediction of response to ursodeoxycholic acid in primary biliary cholangitis: development and validation of the UDCA Response Score

    Background: Treatment guidelines recommend a stepwise approach to primary biliary cholangitis: all patients begin treatment with ursodeoxycholic acid (UDCA) monotherapy and those with an inadequate biochemical response after 12 months are subsequently considered for second-line therapies. However, as a result, patients at the highest risk can wait the longest for effective treatment. We determined whether UDCA response can be accurately predicted using pretreatment clinical parameters. Methods: We did logistic regression analysis of pretreatment variables in a discovery cohort of patients in the UK with primary biliary cholangitis to derive the best-fitting model of UDCA response, defined as alkaline phosphatase less than 1·67 times the upper limit of normal (ULN), measured after 12 months of treatment with UDCA. We validated the model in an external cohort of patients with primary biliary cholangitis and treated with UDCA in Italy. Additionally, we assessed correlations between model predictions and key histological features, such as biliary injury and fibrosis, on liver biopsy samples. Findings: 2703 participants diagnosed with primary biliary cholangitis between Jan 1, 1998, and May 31, 2015, were included in the UK-PBC cohort for derivation of the model. The following pretreatment parameters were associated with lower probability of UDCA response: higher alkaline phosphatase concentration (p<0·0001), higher total bilirubin concentration (p=0·0003), lower aminotransferase concentration (p=0·0012), younger age (p<0·0001), longer interval from diagnosis to the start of UDCA treatment (treatment time lag, p<0·0001), and worsening of alkaline phosphatase concentration from diagnosis (p<0·0001). Based on these variables, we derived a predictive score of UDCA response. In the external validation cohort, 460 patients diagnosed with primary biliary cholangitis were treated with UDCA, with follow-up data until May 31, 2016.
    In this validation cohort, the area under the receiver operating characteristic curve for the score was 0·83 (95% CI 0·79–0·87). In 20 liver biopsy samples from patients with primary biliary cholangitis, the UDCA response score was associated with ductular reaction (r=–0·556, p=0·0130) and intermediate hepatocytes (probability of response was 0·90 if intermediate hepatocytes were absent vs 0·51 if present). Interpretation: We have derived and externally validated a model based on pretreatment variables that accurately predicts UDCA response. Association with histological features provides face validity. This model provides a basis to explore alternative approaches to treatment stratification in patients with primary biliary cholangitis. Funding: UK Medical Research Council and University of Milan-Bicocca.