Data pre-processing for the preterm prediction study MFMU dataset
Preterm birth is a major public health problem with profound implications for society. Being able to identify women at risk of preterm birth during the course of their pregnancy would be extremely valuable. Previous research has largely focused on individual risk factors correlated with preterm birth (e.g. prior preterm birth, race, and infection) and less on combining these factors to understand the complex etiologies of preterm birth. We attempt to address this gap by conducting a deeper analysis of the preterm prediction study data collected by the NICHD Maternal Fetal Medicine Units (MFMU) Network, a high-quality dataset covering over 3,000 singleton pregnancies with detailed study visits and biospecimen collection at 24, 26, 28 and 30 weeks gestation. Reports from this dataset used relatively straightforward biostatistical methodologies, such as relative risk assessments, to measure associations between risk factors and preterm birth (PTB) (Maternal Fetal Medicine Units Network. Biostatistical Coordinating Center NICHD Networks, 1995). These methods include descriptive statistics, Pearson correlation, Fisher’s exact tests and linear/logistic regression, in which risk factors are studied independently of each other. In order to perform detailed experiments on this data using non-linear Support Vector Machines and other machine learning (ML) methodologies, it is necessary to complete several pre-processing steps, which we describe in this report.
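The relative risk assessment mentioned in the abstract above can be illustrated with a minimal sketch. The function name and the counts below are hypothetical and are not taken from the MFMU dataset; the sketch only shows the standard definition of relative risk from a 2x2 table of exposure versus outcome.

```python
def relative_risk(exposed_cases: int, exposed_noncases: int,
                  unexposed_cases: int, unexposed_noncases: int) -> float:
    """Relative risk = risk in the exposed group / risk in the unexposed group."""
    exposed_total = exposed_cases + exposed_noncases
    unexposed_total = unexposed_cases + unexposed_noncases
    if exposed_total == 0 or unexposed_total == 0:
        raise ValueError("each group needs at least one subject")
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    if risk_unexposed == 0:
        raise ValueError("relative risk is undefined when the unexposed risk is zero")
    return risk_exposed / risk_unexposed


# Hypothetical counts (illustration only, not study data):
# 20 of 100 women with a given risk factor deliver preterm,
# versus 10 of 100 women without it.
rr = relative_risk(20, 80, 10, 90)
print(f"relative risk = {rr:.2f}")
```

A measure like this treats one risk factor at a time, which is the limitation the abstract contrasts with multivariate machine learning methods.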
Analytics for Power Grid Distribution Reliability in New York City
We summarize the first major effort to use analytics for preemptive maintenance and repair of an electrical distribution network. This is a large-scale multiyear effort between scientists and students at Columbia University and the Massachusetts Institute of Technology and engineers from the Consolidated Edison Company of New York (Con Edison), which operates the world’s oldest and largest underground electrical system. Con Edison’s preemptive maintenance programs are less than a decade old and are made more effective by the analytics being developed alongside them. Some of the data we used for our projects are historical records dating as far back as the 1880s, and some of the data are free-text documents typed by Con Edison dispatchers. The operational goals of this work are to assist with Con Edison’s preemptive inspection and repair program and its vented-cover replacement program. This has a continuing impact on the public safety, operating costs, and reliability of electrical service in New York City.
Machine Learning for the New York City Power Grid
Power companies can benefit from the use of knowledge discovery methods and statistical machine learning for preventive maintenance. We introduce a general process for transforming historical electrical grid data into models that aim to predict the risk of failures for components and systems. These models can be used directly by power companies to assist with prioritization of maintenance and repair work. Specialized versions of this process are used to produce (1) feeder failure rankings, (2) cable, joint, terminator, and transformer rankings, (3) feeder Mean Time Between Failure (MTBF) estimates, and (4) manhole event vulnerability rankings. The process in its most general form can handle diverse, noisy sources that are historical (static), semi-real-time, or real-time; it incorporates state-of-the-art machine learning algorithms for prioritization (supervised ranking or MTBF) and includes an evaluation of results via cross-validation and blind test. Above and beyond the ranked lists and MTBF estimates are business management interfaces that allow the prediction capability to be integrated directly into corporate planning and decision support; such interfaces rely on several important properties of our general modeling approach: that machine learning features are meaningful to domain experts, that the processing of data is transparent, and that prediction results are accurate enough to support sound decision making. We discuss the challenges in working with historical electrical grid data that were not designed for predictive purposes. The “rawness” of these data contrasts with the accuracy of the statistical models that can be obtained from the process; these models are sufficiently accurate to assist in maintaining New York City's electrical grid.
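To make the idea of MTBF-based prioritization concrete, here is a minimal sketch. It uses a naive empirical estimate (operating hours divided by failure count), not the machine learning models described in the abstract above; the feeder names, hours, and failure counts are hypothetical.

```python
from typing import Dict, List, Tuple


def empirical_mtbf(operating_hours: float, failure_count: int) -> float:
    """Naive MTBF estimate: total operating time divided by number of failures."""
    if failure_count == 0:
        # No observed failures: treat the MTBF as unbounded for ranking purposes.
        return float("inf")
    return operating_hours / failure_count


def rank_by_vulnerability(
    history: Dict[str, Tuple[float, int]]
) -> List[Tuple[str, float]]:
    """Order components from most to least vulnerable (lowest MTBF first)."""
    scored = [(name, empirical_mtbf(hours, failures))
              for name, (hours, failures) in history.items()]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical one-year records: (operating hours, observed failures).
history = {
    "feeder_A": (8760.0, 4),
    "feeder_B": (8760.0, 1),
    "feeder_C": (8760.0, 0),
}
ranking = rank_by_vulnerability(history)
for name, mtbf in ranking:
    print(name, mtbf)
```

A ranked list of this kind is the sort of output a maintenance planner could act on; a learned model would replace the naive estimate with predictions built from many features.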
Report cards for manholes: Eliciting expert feedback for a learning task
We present a manhole profiling tool, developed as part of the Columbia/Con Edison machine learning project on manhole event prediction, and discuss its role in evaluating our machine learning model in three important ways: elimination of outliers, elimination of falsely predictive features, and assessment of the quality of the model. The model produces a ranked list of tens of thousands of manholes in Manhattan, where the ranking criterion is vulnerability to serious events such as fires, explosions and smoking manholes. Con Edison set two goals for the model, namely accuracy and intuitiveness, and this tool made it possible for us to address both of these goals. The tool automatically assembles a "report card" or "profile" highlighting data associated with a given manhole. Prior to the processing work that underlies the profiling tool, a case study of a single manhole took several days and resulted in an incomplete study; locating manholes such as those we present in this work would have been extremely difficult. The model is currently assisting Con Edison in determining repair priorities for the secondary electrical grid.