Curriculum Guidelines for Undergraduate Programs in Data Science
The Park City Math Institute (PCMI) 2016 Summer Undergraduate Faculty Program
met for the purpose of composing guidelines for undergraduate programs in Data
Science. The group consisted of 25 undergraduate faculty from a variety of
institutions in the U.S., primarily from the disciplines of mathematics,
statistics and computer science. These guidelines are meant to provide some
structure for institutions planning for or revising a major in Data Science.
Recovering complete and draft population genomes from metagenome datasets.
Assembly of metagenomic sequence data into microbial genomes is of fundamental value to improving our understanding of microbial ecology and metabolism by elucidating the functional potential of hard-to-culture microorganisms. Here, we provide a synthesis of available methods to bin metagenomic contigs into species-level groups and highlight how genetic diversity, sequencing depth, and coverage influence binning success. Despite the computational cost on application to deeply sequenced complex metagenomes (e.g., soil), covarying patterns of contig coverage across multiple datasets significantly improve the binning process. We also discuss and compare current genome validation methods and reveal how these methods tackle the problem of chimeric genome bins, i.e., sequences from multiple species. Finally, we explore how population genome assembly can be used to uncover biogeographic trends and to characterize the effect of in situ functional constraints on the genome-wide evolution.
Causality, Information and Biological Computation: An algorithmic software approach to life, disease and the immune system
Biology has taken strong steps towards becoming a computer science aiming at
reprogramming nature after the realisation that nature herself has reprogrammed
organisms by harnessing the power of natural selection and the digital
prescriptive nature of replicating DNA. Here we further unpack ideas related to
computability, algorithmic information theory and software engineering, in the
context of the extent to which biology can be (re)programmed, and how we
may go about doing so in a more systematic way with all the tools and concepts
offered by theoretical computer science in a translation exercise from
computing to molecular biology and back. These concepts provide a means to a
hierarchical organization thereby blurring previously clear-cut lines between
concepts like matter and life, or between tumour types that are otherwise taken
as different yet may not have a different cause. This does not diminish
the properties of life or make its components and functions less interesting.
On the contrary, this approach makes for a more encompassing and integrated
view of nature, one that subsumes observer and observed within the same system,
and can generate new perspectives and tools with which to view complex diseases
like cancer, approaching them afresh from a software-engineering viewpoint that
casts evolution in the role of programmer, cells as computing machines, DNA and
genes as instructions and computer programs, viruses as hacking devices, the
immune system as a software debugging tool, and diseases as an
information-theoretic battlefield where all these forces deploy. We show how
information theory and algorithmic programming may explain fundamental
mechanisms of life and death.
Comment: 30 pages, 8 figures. Invited chapter contribution to Information and
Causality: From Matter to Life. Sara I. Walker, Paul C.W. Davies and George
Ellis (eds.), Cambridge University Press
Exploring the relationship between the Engineering and Physical Sciences and the Health and Life Sciences by advanced bibliometric methods
We investigate the extent to which advances in the health and life sciences
(HLS) are dependent on research in the engineering and physical sciences (EPS),
particularly physics, chemistry, mathematics, and engineering. The analysis
combines two different bibliometric approaches. The first approach to analyze
the 'EPS-HLS interface' is based on term map visualizations of HLS research
fields. We consider 16 clinical fields and five life science fields. On the
basis of expert judgment, EPS research in these fields is studied by
identifying EPS-related terms in the term maps. In the second approach, a
large-scale citation-based network analysis is applied to publications from all
fields of science. We work with about 22,000 clusters of publications, each
representing a topic in the scientific literature. Citation relations are used
to identify topics at the EPS-HLS interface. The two approaches complement each
other. The advantages of working with textual data compensate for the
limitations of working with citation relations and the other way around. An
important advantage of working with textual data is in the in-depth qualitative
insights it provides. Working with citation relations, on the other hand,
yields many relevant quantitative statistics. We find that EPS research
contributes to HLS developments mainly in the following five ways: new
materials and their properties; chemical methods for analysis and molecular
synthesis; imaging of parts of the body as well as of biomaterial surfaces;
medical engineering mainly related to imaging, radiation therapy, signal
processing technology, and other medical instrumentation; mathematical and
statistical methods for data analysis. In our analysis, about 10% of all EPS
and HLS publications are classified as being at the EPS-HLS interface. This
percentage has remained more or less constant during the past decade.