Curriculum Guidelines for Undergraduate Programs in Data Science
The Park City Math Institute (PCMI) 2016 Summer Undergraduate Faculty Program
met for the purpose of composing guidelines for undergraduate programs in Data
Science. The group consisted of 25 undergraduate faculty from a variety of
institutions in the U.S., primarily from the disciplines of mathematics,
statistics and computer science. These guidelines are meant to provide some
structure for institutions planning for or revising a major in Data Science.
Incorporating molecular data in fungal systematics: a guide for aspiring researchers
The last twenty years have witnessed molecular data emerge as a primary
research instrument in most branches of mycology. Fungal systematics, taxonomy,
and ecology have all seen tremendous progress and have undergone rapid,
far-reaching changes as disciplines in the wake of continual improvement in DNA
sequencing technology. A taxonomic study that draws from molecular data
involves a long series of steps, ranging from taxon sampling through the
various laboratory procedures and data analysis to the publication process. All
steps are important and influence the results and the way they are perceived by
the scientific community. The present paper provides a reflective overview of
all major steps in such a project, with the aim of assisting research students
about to begin their first study using DNA-based methods. We also take the
opportunity to discuss the role of taxonomy in biology and the life sciences in
general in the light of molecular data. While the best way to learn molecular
methods is to work side by side with someone experienced, we hope that the
present paper will serve to lower the learning threshold for the reader.
Comment: Submitted to Current Research in Environmental and Applied Mycology -
comments most welcome
Creating a Relational Distributed Object Store
In and of itself, data storage has apparent business utility. But when we can
convert data to information, the utility of stored data increases dramatically.
It is the layering of relation atop the data mass that is the engine for such
conversion. Frank relation amongst discrete objects sporadically ingested is
rare, making the process of synthesizing such relation all the more
challenging, but the challenge must be met if we are ever to see an equivalent
business value for unstructured data as we already have with structured data.
This paper describes a novel construct, referred to as a relational distributed
object store (RDOS), that seeks to solve the twin problems of how to
persistently and reliably store petabytes of unstructured data while
simultaneously creating and persisting relations amongst billions of objects.
Comment: 12 pages, 5 figures
Privacy and Confidentiality in an e-Commerce World: Data Mining, Data Warehousing, Matching and Disclosure Limitation
The growing expanse of e-commerce and the widespread availability of online
databases raise many fears regarding loss of privacy and many statistical
challenges. Even with encryption and other nominal forms of protection for
individual databases, we still need to protect against the violation of privacy
through linkages across multiple databases. These issues parallel those that
have arisen and received some attention in the context of homeland security.
Following the events of September 11, 2001, there has been heightened attention
in the United States and elsewhere to the use of multiple government and
private databases for the identification of possible perpetrators of future
attacks, as well as an unprecedented expansion of federal government data
mining activities, many involving databases containing personal information. We
present an overview of some proposals that have surfaced for the search of
multiple databases which supposedly do not compromise possible pledges of
confidentiality to the individuals whose data are included. We also explore
their link to the related literature on privacy-preserving data mining. In
particular, we focus on the matching problem across databases and the concept
of "selective revelation" and their confidentiality implications.
Comment: Published at http://dx.doi.org/10.1214/088342306000000240 in the
Statistical Science (http://www.imstat.org/sts/) by the Institute of
Mathematical Statistics (http://www.imstat.org)