Leveraging "The Wisdom of the Crowds" for Efficient Tagging and Retrieval of documents from the Historic Newspaper Archive
Computers may have defeated humans in chess and arithmetic, but there are many areas where the human mind still excels, such as visual cognition and language processing (Comm. of ACM, Vol 52, No 3, March 2009). If one mind is good, it has been argued that several minds are likely to outperform individuals, and even experts, on certain tasks. This project aims to leverage the wisdom of the crowds (von Ahn, 2008) to collaboratively tag historical newspaper articles in the holdings of the New York Public Library (NYPL). Patrons and scholars will be encouraged to generate custom tags for articles they read and use often; these will be integrated into a meta-data library and evaluated for their contribution to improving retrieval performance. The text in the newspaper articles, along with user-generated tags, will be subjected to statistical analysis and machine learning for automatic categorization.
Susceptibility Ranking of Electrical Feeders: A Case Study
Ranking problems arise in a wide range of real-world applications where an ordering on a set of examples is preferred to a classification model. These applications include collaborative filtering, information retrieval, and ranking components of a system by susceptibility to failure. In this paper, we present an ongoing project to rank the feeder cables of a major metropolitan area's electrical grid according to their susceptibility to outages. We describe our framework and the application of machine learning ranking methods, using scores from Support Vector Machines (SVM), RankBoost and Martingale Boosting. Finally, we present our experimental results and the lessons learned from this challenging real-world application.
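As a rough illustration of score-based ranking in general (not the authors' framework), the sketch below orders items by a real-valued score, highest first. The feeder names and score values are hypothetical placeholders; in practice the scores would come from a trained model such as an SVM or a boosting method.

```python
from typing import Dict, List


def rank_by_score(scores: Dict[str, float]) -> List[str]:
    """Return item names ordered from highest to lowest score.

    Ties are broken alphabetically so the ordering is deterministic.
    """
    return sorted(scores, key=lambda name: (-scores[name], name))


# Hypothetical susceptibility scores; the names and values are made up.
example_scores = {"feeder_a": 0.2, "feeder_b": 0.9, "feeder_c": 0.5}
ranking = rank_by_score(example_scores)
print(ranking)
```

The point of the sketch is that a ranking model only needs scores whose relative order is meaningful; no class threshold is required.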
NILMTK: An Open Source Toolkit for Non-intrusive Load Monitoring
Non-intrusive load monitoring, or energy disaggregation, aims to separate
household energy consumption data collected from a single point of measurement
into appliance-level consumption data. In recent years, the field has rapidly
expanded due to increased interest as national deployments of smart meters have
begun in many countries. However, empirically comparing disaggregation
algorithms is currently virtually impossible. This is due to the different data
sets used, the lack of reference implementations of these algorithms and the
variety of accuracy metrics employed. To address this challenge, we present the
Non-intrusive Load Monitoring Toolkit (NILMTK), an open source toolkit designed
specifically to enable the comparison of energy disaggregation algorithms in a
reproducible manner. This work is the first research to compare multiple
disaggregation approaches across multiple publicly available data sets. Our
toolkit includes parsers for a range of existing data sets, a collection of
preprocessing algorithms, a set of statistics for describing data sets, two
reference benchmark disaggregation algorithms and a suite of accuracy metrics.
We demonstrate the range of reproducible analyses which are made possible by
our toolkit, including the analysis of six publicly available data sets and the
evaluation of both benchmark disaggregation algorithms across such data sets.
Comment: To appear in the fifth International Conference on Future Energy
Systems (ACM e-Energy), Cambridge, UK. 201
Demo Abstract: NILMTK v0.2: A Non-intrusive Load Monitoring Toolkit for Large Scale Data Sets
In this demonstration, we present an open source toolkit for evaluating
non-intrusive load monitoring research, a field which aims to disaggregate a
household's total electricity consumption into individual appliances. The
toolkit contains: a number of importers for existing public data sets, a set of
preprocessing and statistics functions, a benchmark disaggregation algorithm
and a set of metrics to evaluate the performance of such algorithms.
Specifically, this release of the toolkit has been designed to enable the use
of large data sets by only loading individual chunks of the whole data set into
memory at once for processing, before combining the results of each chunk.
Comment: 1st ACM International Conference on Embedded Systems For
Energy-Efficient Buildings, 201
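The chunk-wise processing described in this abstract can be sketched generically as follows. This is a minimal illustration and does not use the NILMTK API: the function names and the sample readings are assumptions made for the example. Each chunk yields a partial result (a sum and a count), and the partial results are combined at the end, so only one chunk is held in memory at a time.

```python
from typing import Iterable, Iterator, List, Tuple


def iter_chunks(samples: List[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield consecutive slices of at most chunk_size samples.

    Stands in for reading one chunk at a time from disk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]


def chunk_stats(chunk: List[float]) -> Tuple[float, int]:
    """Per-chunk partial result: sum of readings and number of readings."""
    return sum(chunk), len(chunk)


def mean_power(chunks: Iterable[List[float]]) -> float:
    """Combine per-chunk partial results into an overall mean."""
    total, count = 0.0, 0
    for chunk in chunks:
        partial_sum, partial_count = chunk_stats(chunk)
        total += partial_sum
        count += partial_count
    if count == 0:
        raise ValueError("no samples provided")
    return total / count


# Hypothetical power readings in watts; the values are made up.
readings = [100.0, 120.0, 80.0, 300.0, 310.0, 90.0, 100.0]
print(mean_power(iter_chunks(readings, chunk_size=3)))
```

Statistics that decompose into sums and counts combine this way directly; statistics such as medians would need a different combining step.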