
    Seqenv: linking sequences to environments through text mining

    Understanding the distribution of taxa and associated traits across different environments is one of the central questions in microbial ecology. High-throughput sequencing (HTS) studies are presently generating huge volumes of data to address this biogeographical topic. However, these studies are often focused on specific environment types or processes, leading to the production of individual, unconnected datasets. The large amounts of legacy sequence data with associated metadata that exist can be harnessed to better place the genetic information found in these surveys into a wider environmental context. Here we introduce a software program, seqenv, to carry out precisely such a task. It automatically performs similarity searches of short sequences against the "nt" nucleotide database provided by NCBI and, from every hit, extracts the textual metadata field if it is available. After collecting all the isolation sources from all the search results, we run a text mining algorithm to identify and parse words that are associated with the Environmental Ontology (EnvO) controlled vocabulary. This, in turn, enables us to determine both in which environments individual sequences or taxa have previously been observed and, by weighted summation of those results, to summarize complete samples. We present two demonstrative applications of seqenv to a survey of ammonia oxidizing archaea as well as to a plankton paleome dataset from the Black Sea. These demonstrate the ability of the tool to reveal novel patterns in HTS and its utility in the fields of environmental source tracking, paleontology, and studies of microbial biogeography. How to cite this article: Sinclair et al. (2016), Seqenv: linking sequences to environments through text mining. PeerJ 4:e2690; DOI 10.7717/peerj.2690.

    Development of an interactive genome browser to visualize and analyse large scale genomic data

    Genomic bioinformatics is a growing and developing field. Indeed, data analysis is becoming an integral and essential part of any quantitative biological experiment as the technologies evolve and the wet lab methods used generate larger and larger quantities of data. Yet few standards have emerged and a plethora of analytical tools exist, none of which are established as a standard. The difficulties arise early on, even before processing any genomic data, as one first needs to visualize it. Several visualization methods exist, such as the UCSC genome browser, IGB or Argo, but none offer a satisfying interface or set of tools. Stemming from a pre-existing project at the bioinformatics and biostatistics core facility, this study presents a new solution to the multiple difficulties that at present beleaguer the field. A novel genome visualization tool is proposed where the user interface remains simple and incorporates a set of common statistical analysis functions. The software produced, entitled gFeatMiner, is capable of processing large scale genomic datasets, computing descriptive statistics and manipulating them in several ways. The program makes use of modern technologies and infrastructure, paving the way for its development into an advanced data mining tool. In the second part of this study, a practical application is worked out. Examining the genes coding for ribosomal proteins in the model organism yeast (Saccharomyces cerevisiae) and using several available sets of data including multiple transcription factor binding profiles in vivo and in vitro, RNA polymerase activity and nucleosome enrichment, we attempt to better understand and reveal cellular mechanisms by clustering the numerous genes together using different criteria and machine learning strategies.

    Does Ability Affect Alignment in Second Language Tutorial Dialogue?

    The role of alignment between interlocutors in second language learning is different to that in fluent conversational dialogue. Learners gain linguistic skill through increased alignment, yet the extent to which they can align will be constrained by their ability. Tutors may use alignment to teach and encourage the student, yet still must push the student and correct their errors, decreasing alignment. To understand how learner ability interacts with alignment, we measure the influence of ability on lexical priming, an indicator of alignment. We find that lexical priming in learner-tutor dialogues differs from that in conversational and task-based dialogues, and we find evidence that alignment increases with ability and with word complexity.

    The cytotoxic T cell proteome and its shaping by the kinase mTOR

    High-resolution mass spectrometry maps the cytotoxic T lymphocyte (CTL) proteome and the impact of mammalian target of rapamycin complex 1 (mTORC1) on CTLs. The CTL proteome was dominated by metabolic regulators and granzymes, and mTORC1 selectively repressed and promoted expression of a subset of CTL proteins (~10%). These included key CTL effector molecules, signaling proteins and a subset of metabolic enzymes. Proteomic data highlighted the potential for mTORC1 negative control of phosphatidylinositol (3,4,5)-trisphosphate (PtdIns(3,4,5)P3) production in CTLs. mTORC1 was shown to repress PtdIns(3,4,5)P3 production and to determine the mTORC2 requirement for activation of the kinase Akt. Unbiased proteomic analysis thus provides a comprehensive understanding of CTL identity and mTORC1 control of CTL function.

    Phosphoproteomic differences in major depressive disorder postmortem brains indicate effects on synaptic function

    The molecular understanding of major depressive disorder (MDD) remains limited, although this condition affects approximately 10% of the world population. Protein phosphorylation is a posttranslational modification that regulates approximately one-third of the human proteins involved in a range of cellular and biological processes such as cellular signaling. Whereas phosphoproteome studies have been carried out extensively in cancer research, few such investigations have been carried out in studies of psychiatric disorders. Here, we present a comparative phosphoproteome analysis of postmortem dorsolateral prefrontal cortex tissues from 24 MDD patients and 12 control donors. Tissue extracts were analyzed using liquid chromatography mass spectrometry in a data-independent manner (LC-MSE). Our analyses resulted in the identification of 5,195 phosphopeptides, corresponding to 802 non-redundant proteins. Ninety of these proteins showed differential levels of phosphorylation in tissues from MDD subjects compared to controls, 20 of which were differentially phosphorylated in at least 2 peptides. The majority of these phosphorylated proteins were associated with synaptic transmission and cellular architecture, not only pointing to potential biomarker candidates but, more importantly, shedding light on MDD pathobiology.

    The JRC Forest Carbon Model: description of EU-CBM-HAT

    The forest carbon model EU-CBM-HAT enables the assessment of forest CO2 emissions and removals under scenarios of forest management, natural disturbances, forest-related conversions and roundwood destinations (industrial roundwood and fuelwood). The model provides a rule-based harvest distribution based on the standing availability in each simulated time step (i.e. the status of the forest) and on applicable silvicultural practices (e.g. eligible age range, periodicity, intervention intensity). The eu_cbm_hat core package integrates three packages: libcbm (a C++ rewrite of CBM-CFS3 Version 1.2) as the forest growth and disturbances simulator (developed by the Forest Carbon Accounting team of the Canadian Forest Service), "COMBO" as the tool for combining scenarios, and "HAT" as the harvest allocation tool (both in Python, developed by the JRC). The eu_cbm_hat package is open-source (released and maintained by the JRC), with a dependency on the open-source libcbm (released and maintained by CFS). The development incorporated into EU-CBM-HAT increases the transparency of the modelling chain for forest-related applications associated with GHG reporting and mitigation strategies. The model was designed to support policy formulation, implementation and evaluation as well as scientific investigations. This report provides both the scientific background behind the development and user guidance (building on the CBM-CFS3 user's guide). JRC.D.1 - Bio-economy

    PI3Kδ and primary immunodeficiencies.

    Primary immunodeficiencies are inherited disorders of the immune system, often caused by the mutation of genes required for lymphocyte development and activation. Recently, several studies have identified gain-of-function mutations in the phosphoinositide 3-kinase (PI3K) genes PIK3CD (which encodes p110δ) and PIK3R1 (which encodes p85α) that cause a combined immunodeficiency syndrome, referred to as activated PI3Kδ syndrome (APDS; also known as p110δ-activating mutation causing senescent T cells, lymphadenopathy and immunodeficiency (PASLI)). Paradoxically, both loss-of-function and gain-of-function mutations that affect these genes lead to immunosuppression, albeit via different mechanisms. Here, we review the roles of PI3Kδ in adaptive immunity, describe the clinical manifestations and mechanisms of disease in APDS and highlight new insights into PI3Kδ gleaned from these patients, as well as implications of these findings for clinical therapy.

    Relationship between blood lead concentration and nutritional status among Malay primary school children in Kuala Lumpur, Malaysia.

    A cross-sectional study was conducted to identify the relationship between blood lead concentration and nutritional status among primary school children in Kuala Lumpur. A total of 225 Malay students, 113 male and 112 female, aged 6.3 to 9.8 years, were selected through a stratified random sampling method. Random blood samples were collected and blood lead concentration was measured by a Graphite Furnace Atomic Absorption Spectrophotometer. Nutrient intake was determined by the 24-hour Dietary Recall method and a Food Frequency Questionnaire. An anthropometric assessment was reported according to growth indices (z-scores of weight-for-age, height-for-age, and weight-for-height). The mean blood lead concentration was low (3.4 ± 1.91 µg/dL) and differed significantly between genders. Only 14.7% of the respondents fulfilled the daily energy requirement. The protein and iron intakes were adequate for a majority of the children. However, 34.7% of the children showed inadequate intake of calcium. The energy, protein, fat and carbohydrate intakes differed significantly by gender, with males having higher intakes than females. The majority of respondents had normal mean z-scores of growth indices. Ten percent of the respondents were underweight, 2.8% wasted and 5.4% stunted. Multiple linear regression showed significant inverse relationships between blood lead concentration and children's age (β = -0.647, p < 0.001) and per capita income (β = -0.001, p = 0.018). Among those with inadequate calcium intake, there were significant inverse relationships between blood lead concentration and children's age (β = -0.877, p = 0.001) and calcium intake (β = -0.011, p = 0.014), and a significant positive relationship with weight-for-height (β = 0.326, p = 0.041). Among children with inadequate energy intake, children's age (β = -0.621, p < 0.001), per capita income (β = -0.001, p = 0.025) and protein intake (β = -0.019, p = 0.027) were inversely and significantly related to blood lead concentration. In conclusion, nutritional status might affect children's absorption of lead, and further investigation is required for confirmation.