31 research outputs found

Leveraging "The Wisdom of the Crowds" for Efficient Tagging and Retrieval of documents from the Historic Newspaper Archive

Author: Haimonti Dutta
Manoj Pooleery
Megha Gupta
William Chan
Publication venue: 'Modern Language Association'
Publication date: 01/01/2013

Computers may have defeated humans in chess and arithmetic, but there are many areas where the human mind still excels, such as visual cognition and language processing (Comm. of ACM, Vol 52, No 3, March 2009). If one mind is good, it has been argued that several minds are likely to outperform individuals, and even experts, on certain tasks. This project aims to leverage the wisdom of the crowds (von Ahn, 2008) to collaboratively tag historical newspaper articles in the holdings of the New York Public Library (NYPL). Patrons and scholars will be encouraged to generate custom tags for articles they read and use often; these tags will be integrated into a metadata library and evaluated for their contribution to retrieval performance. The text of the newspaper articles, along with the user-generated tags, will be subjected to statistical analysis and machine learning for automatic categorization.

Humanities Commons

Recommended from our members

Susceptibility Ranking of Electrical Feeders: A Case Study

Author: Boulanger Albert G.
Dutta Haimonti
Gross Philip N.
Salleb-Aouissi Ansaf
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2008

Ranking problems arise in a wide range of real-world applications where an ordering on a set of examples is preferred to a classification model. These applications include collaborative filtering, information retrieval, and ranking the components of a system by susceptibility to failure. In this paper, we present an ongoing project to rank the feeder cables of a major metropolitan area's electrical grid according to their susceptibility to outages. We describe our framework and the application of machine learning ranking methods, using scores from Support Vector Machines (SVM), RankBoost and Martingale Boosting. Finally, we present our experimental results and the lessons learned from this challenging real-world application.

Columbia University Academic Commons

NILMTK: An Open Source Toolkit for Non-intrusive Load Monitoring

Author: Batra Nipun
Dutta Haimonti
Kelly Jack
Knottenbelt William
Parson Oliver
Rogers Alex
Singh Amarjeet
Srivastava Mani
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 14/04/2014

Non-intrusive load monitoring, or energy disaggregation, aims to separate household energy consumption data collected from a single point of measurement into appliance-level consumption data. In recent years, the field has expanded rapidly as national deployments of smart meters have begun in many countries. However, empirically comparing disaggregation algorithms is currently virtually impossible. This is due to the different data sets used, the lack of reference implementations of these algorithms and the variety of accuracy metrics employed. To address this challenge, we present the Non-intrusive Load Monitoring Toolkit (NILMTK), an open source toolkit designed specifically to enable the comparison of energy disaggregation algorithms in a reproducible manner. This work is the first research to compare multiple disaggregation approaches across multiple publicly available data sets. Our toolkit includes parsers for a range of existing data sets, a collection of preprocessing algorithms, a set of statistics for describing data sets, two reference benchmark disaggregation algorithms and a suite of accuracy metrics. We demonstrate the range of reproducible analyses which are made possible by our toolkit, including the analysis of six publicly available data sets and the evaluation of both benchmark disaggregation algorithms across such data sets.
Comment: To appear in the fifth International Conference on Future Energy Systems (ACM e-Energy), Cambridge, UK. 201

arXiv.org e-Print Archive

CiteSeerX

Crossref

Spiral - Imperial College Digital Repository

Demo Abstract: NILMTK v0.2: A Non-intrusive Load Monitoring Toolkit for Large Scale Data Sets

Author: Batra Nipun
Dutta Haimonti
Kelly Jack
Knottenbelt William
Parson Oliver
Rogers Alex
Singh Amarjeet
Srivastava Mani
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 09/11/2014

In this demonstration, we present an open source toolkit for evaluating non-intrusive load monitoring research, a field which aims to disaggregate a household's total electricity consumption into individual appliances. The toolkit contains a number of importers for existing public data sets, a set of preprocessing and statistics functions, a benchmark disaggregation algorithm and a set of metrics to evaluate the performance of such algorithms. Specifically, this release of the toolkit has been designed to enable the use of large data sets by loading only individual chunks of the whole data set into memory at once for processing, before combining the results of each chunk.
Comment: 1st ACM International Conference on Embedded Systems For Energy-Efficient Buildings, 201

arXiv.org e-Print Archive

Crossref
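
The chunk-wise processing mentioned in the NILMTK v0.2 abstract above (load one chunk, compute a partial result, then combine the partial results) can be illustrated with a minimal Python sketch. This is not NILMTK code and does not use its API; the names `iter_chunks`, `chunk_stats` and `mean_power` are hypothetical, and the mean of a stream of power readings is chosen only as a simple example of a statistic that can be combined across chunks.

```python
from typing import Iterable, Iterator, List, Tuple


def iter_chunks(samples: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield successive lists of at most chunk_size samples (hypothetical helper)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    chunk: List[float] = []
    for value in samples:
        chunk.append(value)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # final, possibly shorter, chunk
        yield chunk


def chunk_stats(chunk: List[float]) -> Tuple[float, int]:
    """Partial result for one chunk: (sum of readings, number of readings)."""
    return sum(chunk), len(chunk)


def mean_power(samples: Iterable[float], chunk_size: int) -> float:
    """Mean of all readings, holding only one chunk in memory at a time."""
    total = 0.0
    count = 0
    for chunk in iter_chunks(samples, chunk_size):
        partial_sum, partial_count = chunk_stats(chunk)
        # Combine step: partial sums and counts simply add up.
        total += partial_sum
        count += partial_count
    if count == 0:
        raise ValueError("no samples provided")
    return total / count


if __name__ == "__main__":
    # Illustrative readings in watts; a real data set would be streamed from disk.
    readings = iter([100.0, 200.0, 300.0, 400.0, 500.0])
    print(mean_power(readings, chunk_size=2))
```

The design point is that each chunk yields a small summary (here a sum and a count) that can be merged without revisiting earlier data; statistics that cannot be merged this way would need a different combine step.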