64 research outputs found

    A practical, bioinformatic workflow system for large data sets generated by next-generation sequencing

    Transcriptomics (at the level of single cells, tissues and/or whole organisms) underpins many fields of biomedical science, from understanding the basic cellular function in model organisms, to the elucidation of the biological events that govern the development and progression of human diseases, and the exploration of the mechanisms of survival, drug-resistance and virulence of pathogens. Next-generation sequencing (NGS) technologies are contributing to a massive expansion of transcriptomics in all fields and are reducing the cost, time and performance barriers presented by conventional approaches. However, bioinformatic tools for the analysis of the sequence data sets produced by these technologies can be daunting to researchers with limited or no expertise in bioinformatics. Here, we constructed a semi-automated, bioinformatic workflow system, and critically evaluated it for the analysis and annotation of large-scale sequence data sets generated by NGS. We demonstrated its utility for the exploration of differences in the transcriptomes among various stages and both sexes of an economically important parasitic worm (Oesophagostomum dentatum) as well as the prediction and prioritization of essential molecules (including GTPases, protein kinases and phosphatases) as novel drug target candidates. This workflow system provides a practical tool for the assembly, annotation and analysis of NGS data sets, even for researchers with limited bioinformatic expertise. The custom-written Perl, Python and Unix shell scripts used can be readily modified or adapted to suit many different applications. This system is now utilized routinely for the analysis of data sets from pathogens of major socio-economic importance and can, in principle, be applied to transcriptomics data sets from any organism.

    Galaxy And Mass Assembly (GAMA): stellar mass estimates

    This paper describes the first catalogue of photometrically derived stellar mass estimates for intermediate-redshift (z < 0.65; median z = 0.2) galaxies in the Galaxy And Mass Assembly (GAMA) spectroscopic redshift survey. These masses, as well as the full set of ancillary stellar population parameters, will be made public as part of GAMA data release 2. Although the GAMA database does include near-infrared (NIR) photometry, we show that the quality of our stellar population synthesis fits is significantly poorer when these NIR data are included. Further, for a large fraction of galaxies, the stellar population parameters inferred from the optical-plus-NIR photometry are formally inconsistent with those inferred from the optical data alone. This may indicate problems in our stellar population library, or NIR data issues, or both; these issues will be addressed for future versions of the catalogue. For now, we have chosen to base our stellar mass estimates on optical photometry only. In light of our decision to ignore the available NIR data, we examine how well stellar mass can be constrained based on optical data alone. We use generic properties of stellar population synthesis models to demonstrate that restframe colour alone is in principle a very good estimator of stellar mass-to-light ratio, M*/Li. Further, we use the observed relation between restframe (g−i) and M*/Li for real GAMA galaxies to argue that, modulo uncertainties in the stellar evolution models themselves, (g−i) colour can in practice be used to estimate M*/Li to an accuracy of ≲0.1 dex (1σ). This ‘empirically calibrated’ (g−i)-M*/Li relation offers a simple and transparent means for estimating galaxies' stellar masses based on minimal data, and so provides a solid basis for other surveys to compare their results to z≲0.4 measurements from GAMA.

    Galaxy and Mass Assembly: FUV, NUV, ugrizYJHK Petrosian, Kron and Sérsic photometry

    In order to generate credible 0.1-2 μm spectral energy distributions, the Galaxy and Mass Assembly (GAMA) project requires many gigabytes of imaging data from a number of instruments to be reprocessed into a standard format. In this paper, we discuss the software infrastructure we use, and create self-consistent ugrizYJHK photometry for all sources within the GAMA sample. Using UKIDSS and SDSS archive data, we outline the pre-processing necessary to standardize all images to a common zero-point, the steps taken to correct for the seeing bias across the data set and the creation of gigapixel-scale mosaics of the three 4 × 12 deg² GAMA regions in each filter. From these mosaics, we extract source catalogues for the GAMA regions using elliptical Kron and Petrosian matched apertures. We also calculate Sérsic magnitudes for all galaxies within the GAMA sample using sigma, a galaxy component modelling wrapper for galfit 3. We compare the resultant photometry directly and also calculate the r-band galaxy luminosity function for all photometric data sets to highlight the uncertainty introduced by the photometric method. We find that (1) changing the object detection threshold has a minor effect on the best-fitting Schechter parameters of the overall population (M* ± 0.055 mag, α ± 0.014, ϕ* ± 0.0005 h³ Mpc⁻³); (2) there is an offset between data sets that use Kron or Petrosian photometry, regardless of the filter; (3) the decision to use circular or elliptical apertures causes an offset in M* of 0.20 mag; (4) the best-fitting Schechter parameters from total-magnitude photometric systems (such as SDSS modelmag or Sérsic magnitudes) have a steeper faint-end slope than photometric systems based upon Kron or Petrosian measurements; and (5) our Universe's total luminosity density, when calculated using Kron or Petrosian r-band photometry, is underestimated by at least 15 per cent.

    Finding Our Way through Phenotypes

    Despite a large and multifaceted effort to understand the vast landscape of phenotypic data, their current form inhibits productive data analysis. The lack of a community-wide, consensus-based, human- and machine-interpretable language for describing phenotypes and their genomic and environmental contexts is perhaps the most pressing scientific bottleneck to integration across many key fields in biology, including genomics, systems biology, development, medicine, evolution, ecology, and systematics. Here we survey the current phenomics landscape, including data resources and handling, and the progress that has been made to accurately capture relevant data descriptions for phenotypes. We present an example of the kind of integration across domains that computable phenotypes would enable, and we call upon the broader biology community, publishers, and relevant funding agencies to support efforts to surmount today's data barriers and facilitate analytical reproducibility.

    Massively Parallel Sequencing and Analysis of the Necator americanus Transcriptome

    The blood-feeding hookworm Necator americanus infects hundreds of millions of people. To elucidate fundamental molecular biological aspects of this hookworm, the transcriptome of adult Necator americanus was studied using next-generation sequencing and in silico analyses. Contigs (n = 19,997) were assembled from the sequence data; 6,771 of them had known orthologues in the free-living nematode Caenorhabditis elegans, and most encoded proteins with WD40 repeats (10.6%), proteinase inhibitors (7.8%) or calcium-binding EF-hand proteins (6.7%). Bioinformatic analyses inferred that C. elegans homologues are involved mainly in biological pathways linked to ribosome biogenesis (70%), oxidative phosphorylation (63%) and/or proteases (60%). Comparative analyses of the transcriptomes of N. americanus and the canine hookworm, Ancylostoma caninum, revealed qualitative and quantitative differences. Essential molecules were predicted using a combination of orthology mapping and functional data available for C. elegans. Further analyses allowed the prioritization of 18 predicted drug targets which did not have human homologues. These candidate targets were inferred to be linked to mitochondrial metabolism or amino acid synthesis. This investigation provides detailed insights into the transcriptome of the adult stage of N. americanus.

    The endocrine tumor summit 2008: appraising therapeutic approaches for acromegaly and carcinoid syndrome

    The Endocrine Tumor Summit convened in December 2008 to address 6 statements prepared by panel members that reflect important questions in the treatment of acromegaly and carcinoid syndrome. Data pertinent to each of the statements were identified through a literature review by one member of the 9-member panel, enabling a critical evaluation of the statements and the evidence supporting or refuting them. Three statements addressed the validity of serum growth hormone (GH) and insulin-like growth factor-I (IGF-I) concentrations as indicators or predictors of disease in acromegaly. Statements regarding the effects of preoperative somatostatin analog use on pituitary surgical outcomes, their effects on hormone and symptom control in carcinoid syndrome, and the efficacy of extended dosing intervals were reviewed. Panel opinions, based on the level of available scientific evidence, were polled. Finally, their views were compared with those of surveyed community-based endocrinologists and neurosurgeons.

    The DESI survey validation: results from visual inspection of bright galaxies, luminous red galaxies, and emission line galaxies

    Funding: TWL was supported by the Ministry of Science and Technology (MOST 111-2112-M-002-015-MY3), the Ministry of Education, Taiwan (MOE Yushan Young Scholar grant NTU-110VV007), National Taiwan University research grants (NTU CC-111L894806, NTU-111L7318), and NSF grant AST-1911140. DMA acknowledges the Science and Technology Facilities Council (STFC) for support through grant code ST/T000244/1. This research is supported by the Director, Office of Science, Office of High Energy Physics of the U.S. Department of Energy under Contract No. DE–AC02–05CH11231, and by the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility under the same contract; additional support for DESI is provided by the U.S. National Science Foundation, Division of Astronomical Sciences under Contract No. AST-0950945 to the NSF’s National Optical-Infrared Astronomy Research Laboratory; the Science and Technology Facilities Council of the United Kingdom; the Gordon and Betty Moore Foundation; the Heising-Simons Foundation; the French Alternative Energies and Atomic Energy Commission (CEA); the National Council of Science and Technology of Mexico (CONACYT); the Ministry of Science and Innovation of Spain (MICINN), and by the DESI Member Institutions: https://www.desi.lbl.gov/collaborating-institutions.

    The Dark Energy Spectroscopic Instrument (DESI) Survey has obtained a set of spectroscopic measurements of galaxies for validating the final survey design and target selections. To assist these tasks, we visually inspect (VI) DESI spectra of approximately 2,500 bright galaxies, 3,500 luminous red galaxies, and 10,000 emission line galaxies, to obtain robust redshift identifications. We then utilize the VI redshift information to characterize the performance of the DESI operation. Based on the VI catalogs, our results show that the final survey design yields samples of bright galaxies, luminous red galaxies, and emission line galaxies with purity greater than 99%. Moreover, we demonstrate that the precision of the redshift measurements is approximately 10 km/s for bright galaxies and emission line galaxies and approximately 40 km/s for luminous red galaxies. The average redshift accuracy is within 10 km/s for the three types of galaxies. The VI process also helps to improve the quality of the DESI data by identifying spurious spectral features introduced by the pipeline. Finally, we show examples of unexpected real astronomical objects, such as Lyman α emitters and strong lensing candidates, identified by VI. These results demonstrate the importance and utility of visually inspecting data from incoming and upcoming surveys, especially during their early operation phases.