Improving aircraft maintenance, repair, and overhaul: A novel text mining approach
Aircraft Maintenance, Repair and Overhaul (MRO) feedback commonly includes an engineer’s complex text-based inspection report. Capturing and normalizing the content of these textual descriptions is vital to cost and quality benchmarking, and provides information to facilitate continuous improvement of MRO processes and analytics. Because data analysis and mining tools require highly normalized data, raw textual data is inadequate. This paper offers a text-mining solution to efficiently analyse bulk textual feedback data.
Even when the same parts and/or sub-parts are replaced, the actual service cost of a repair often differs markedly from that of similar previous jobs. Regular expression algorithms were combined with an aircraft MRO glossary dictionary to provide additional information about the reasons for cost variation. Professional terms and conventions were included in the dictionary to avoid ambiguity and improve the results. Testing shows that most descriptive inspection reports can be appropriately interpreted, allowing extraction of highly normalized data. This additional normalized data strongly supports data analysis and data mining, whilst also increasing the accuracy of future quotation costing. This solution has been effectively used by a large aircraft MRO agency with positive results.
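The combination of regular expressions with a glossary dictionary described above can be illustrated with a minimal Python sketch. The glossary entries, the abbreviations, and the function name `extract_terms` are hypothetical and chosen only for illustration; the abstract does not list the dictionary or the patterns the authors actually used.

```python
import re
from typing import List

# Hypothetical glossary (an assumption, not the paper's dictionary):
# each regex covers shorthand and spelling variants of one normalized term.
GLOSSARY = {
    r"\b(?:brg|bearing)s?\b": "bearing",
    r"\b(?:crk|crack)(?:e?d|s)?\b": "crack",
    r"\bcorr(?:oded|osion)?\b": "corrosion",
    r"\brepl(?:ace(?:d|ment)?)?\b": "replace",
}

# Compile once; inspection reports mix upper and lower case freely.
COMPILED = [(re.compile(p, re.IGNORECASE), term) for p, term in GLOSSARY.items()]


def extract_terms(report: str) -> List[str]:
    """Return normalized glossary terms found in a report, ordered by first appearance."""
    hits = []
    for pattern, term in COMPILED:
        match = pattern.search(report)
        if match:
            hits.append((match.start(), term))
    return [term for _, term in sorted(hits)]


if __name__ == "__main__":
    print(extract_terms("Brg CRKD, corr found on housing; repl brg."))
```

Mapping every variant to a single normalized term is what makes the output usable by downstream analysis tools; a real dictionary would need domain experts to resolve ambiguous abbreviations.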
PMLB: A Large Benchmark Suite for Machine Learning Evaluation and Comparison
The selection, development, or comparison of machine learning methods in data
mining can be a difficult task based on the target problem and goals of a
particular study. Numerous publicly available real-world and simulated
benchmark datasets have emerged from different sources, but their organization
and adoption as standards have been inconsistent. As such, selecting and
curating specific benchmarks remains an unnecessary burden on machine learning
practitioners and data scientists. The present study introduces an accessible,
curated, and developing public benchmark resource to facilitate identification
of the strengths and weaknesses of different machine learning methodologies. We
compare meta-features among the current set of benchmark datasets in this
resource to characterize the diversity of available data. Finally, we apply a
number of established machine learning methods to the entire benchmark suite
and analyze how datasets and algorithms cluster in terms of performance. This
work is an important first step towards understanding the limitations of
popular benchmarking suites and developing a resource that connects existing
benchmarking standards to more diverse and efficient standards in the future.
Comment: 14 pages, 5 figures, submitted for review to JML
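As a rough illustration of what comparing meta-features among datasets can involve, the Python sketch below computes a few simple descriptors for a labeled dataset. The function name `meta_features` and the imbalance formula are assumptions made for this example; the abstract does not specify which meta-features the authors compute or how they define them.

```python
from collections import Counter
from typing import Dict, Sequence


def meta_features(X: Sequence[Sequence[float]], y: Sequence[int]) -> Dict[str, float]:
    """Compute simple dataset descriptors (illustrative choice of meta-features)."""
    n = len(y)
    counts = Counter(y)
    k = len(counts)
    if k > 1:
        # Assumed imbalance score: 0 when classes are perfectly balanced,
        # growing toward 1 as one observed class dominates.
        imbalance = (k / (k - 1)) * sum((c / n - 1 / k) ** 2 for c in counts.values())
    else:
        # A single observed class is treated as maximally imbalanced.
        imbalance = 1.0
    return {
        "n_instances": n,
        "n_features": len(X[0]) if X else 0,
        "n_classes": k,
        "imbalance": imbalance,
    }


if __name__ == "__main__":
    X = [[0, 1], [1, 0], [1, 1], [0, 0]]
    y = [0, 0, 0, 1]
    print(meta_features(X, y))
```

Computing such descriptors for every dataset in a suite gives one vector per dataset, which can then be compared or clustered to judge how diverse the suite is.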
OpenML Benchmarking Suites
Machine learning research depends on objectively interpretable, comparable,
and reproducible algorithm benchmarks. Therefore, we advocate the use of
curated, comprehensive suites of machine learning tasks to standardize the
setup, execution, and reporting of benchmarks. We enable this through software
tools that help to create and leverage these benchmarking suites. These are
seamlessly integrated into the OpenML platform, and accessible through
interfaces in Python, Java, and R. OpenML benchmarking suites are (a) easy to
use through standardized data formats, APIs, and client libraries; (b)
machine-readable, with extensive meta-information on the included datasets; and
(c) shareable and reusable in future studies. We also present
a first, carefully curated and practical benchmarking suite for classification:
the OpenML Curated Classification benchmarking suite 2018 (OpenML-CC18).
- …