Search CORE

128,127 research outputs found

Curriculum Guidelines for Undergraduate Programs in Data Science

Author: Agarwal Mahesh
Averett Maia
Baumer Benjamin
Bray Andrew
Bressoud Thomas
Bryant Lance
Cheng Lei
De Veaux Richard
Francis Amanda
Gould Robert
Kim Albert Y.
Kretchmar Matt
Lu Qin
Moskol Ann
Nolan Deborah
Pelayo Roberto
Raleigh Sean
Sethi Ricky J.
Sondjaja Mutiara
Tiruviluamala Neelesh
Uhlig Paul
Washington Talitha
Wesley Curtis
White David
Ye Ping
Publication venue: 'Annual Reviews'
Publication date: 01/01/2017
Field of study

The Park City Math Institute (PCMI) 2016 Summer Undergraduate Faculty Program met for the purpose of composing guidelines for undergraduate programs in Data Science. The group consisted of 25 undergraduate faculty from a variety of institutions in the U.S., primarily from the disciplines of mathematics, statistics and computer science. These guidelines are meant to provide some structure for institutions planning for or revising a major in Data Science

arXiv.org e-Print Archive

Smith College: Smith ScholarWorks

Mathematics Is Biology's Next Microscope, Only Better; Biology Is Mathematics' Next Physics, Only Better

Author: Anderson
Anderson
Benzer
Cohen
Delwiche
Delwiche
Delwiche
Erdös
Euler
Ewens
Fisher
Hardy
Hastings
Johnson
Jungck
Kendall
Kendall
Kingman
Kingman
Kolmogorov
Li
Liu
Luria
Markov
Pearson
Rai
Scherf
Turing
Verhulst
Wein
Weinberg
Weinstein
Yule
Publication venue: Public Library of Science
Publication date: 01/12/2004
Field of study

Joel Cohen offers a historical and prospective analysis of the relationship between mathematics and biolog

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Data exploration systems for databases

Author: Greene Richard J.
Hield Christopher
Publication venue
Publication date
Field of study

Data exploration systems apply machine learning techniques, multivariate statistical methods, information theory, and database theory to databases to identify significant relationships among the data and summarize information. The result of applying data exploration systems should be a better understanding of the structure of the data and a perspective of the data enabling an analyst to form hypotheses for interpreting the data. This paper argues that data exploration systems need a minimum amount of domain knowledge to guide both the statistical strategy and the interpretation of the resulting patterns discovered by these systems

NASA Technical Reports Server

Statistical and fuzzy approach for database security

Author: Lu G
Lü K
Yi J
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2007
Field of study

A new type of database anomaly is described by addressing the concept of Cumulated Anomaly in this paper. Dubiety-Determining Model (DDM), which is a detection model basing on statistical and fuzzy set theories for Cumulated Anomaly, is proposed. DDM can measure the dubiety degree of each database transaction quantitatively. Software system architecture to support the DDM for monitoring database transactions is designed. We also implemented the system and tested it. Our experimental results show that the DDM method is feasible and effective

Crossref

Brunel University Research Archive

Privacy and Confidentiality in an e-Commerce World: Data Mining, Data Warehousing, Matching and Disclosure Limitation

Author: Fienberg Stephen E.
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2006
Field of study

The growing expanse of e-commerce and the widespread availability of online databases raise many fears regarding loss of privacy and many statistical challenges. Even with encryption and other nominal forms of protection for individual databases, we still need to protect against the violation of privacy through linkages across multiple databases. These issues parallel those that have arisen and received some attention in the context of homeland security. Following the events of September 11, 2001, there has been heightened attention in the United States and elsewhere to the use of multiple government and private databases for the identification of possible perpetrators of future attacks, as well as an unprecedented expansion of federal government data mining activities, many involving databases containing personal information. We present an overview of some proposals that have surfaced for the search of multiple databases which supposedly do not compromise possible pledges of confidentiality to the individuals whose data are included. We also explore their link to the related literature on privacy-preserving data mining. In particular, we focus on the matching problem across databases and the concept of ``selective revelation'' and their confidentiality implications.Comment: Published at http://dx.doi.org/10.1214/088342306000000240 in the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

CiteSeerX

Crossref

Spectral analysis of gene expression profiles using gene networks

Author: Barillot Emmanuel
Dutreix Marie
Rapaport Franck
Vert Jean-Philippe
Zinovyev Andrei
Publication venue
Publication date: 26/03/2006
Field of study

Microarrays have become extremely useful for analysing genetic phenomena, but establishing a relation between microarray analysis results (typically a list of genes) and their biological significance is often difficult. Currently, the standard approach is to map a posteriori the results onto gene networks to elucidate the functions perturbed at the level of pathways. However, integrating a priori knowledge of the gene networks could help in the statistical analysis of gene expression data and in their biological interpretation. Here we propose a method to integrate a priori the knowledge of a gene network in the analysis of gene expression data. The approach is based on the spectral decomposition of gene expression profiles with respect to the eigenfunctions of the graph, resulting in an attenuation of the high-frequency components of the expression profiles with respect to the topology of the graph. We show how to derive unsupervised and supervised classification algorithms of expression profiles, resulting in classifiers with biological relevance. We applied the method to the analysis of a set of expression profiles from irradiated and non-irradiated yeast strains. It performed at least as well as the usual classification but provides much more biologically relevant results and allows a direct biological interpretation

arXiv.org e-Print Archive

HAL-MINES ParisTech