Data sets and data quality in software engineering
OBJECTIVE - to assess the extent and types of techniques used to manage quality within software engineering data sets. We consider this a particularly interesting question in the context of initiatives to promote sharing and secondary analysis of data sets.
METHOD - we perform a systematic review of available empirical software engineering studies.
RESULTS - only 23 of the many hundreds of studies assessed explicitly considered data quality.
CONCLUSIONS - first, the community needs to consider the quality and appropriateness of the data set being utilised; not all data sets are equal. Second, we need more research into means of identifying, and ideally repairing, noisy cases. Third, it should become routine to use sensitivity analysis to assess conclusion stability with respect to the assumptions that must be made concerning noise levels.
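The third conclusion can be illustrated with a small sketch. It is not the method of the study above; it is one simple way to probe conclusion stability, under loudly stated assumptions: the data are hypothetical binary defect labels for two groups of modules, noise is modelled as independent label flips at an assumed rate, and the "conclusion" is that group A has a higher defect rate than group B. All names (`conclusion_stability`, `group_a`, `group_b`) are illustrative.

```python
import random


def defect_rate(labels):
    """Fraction of items labelled defective (1)."""
    return sum(labels) / len(labels)


def flip_labels(labels, noise_rate, rng):
    """Flip each binary label independently with probability noise_rate."""
    return [1 - y if rng.random() < noise_rate else y for y in labels]


def conclusion_stability(group_a, group_b, noise_rates, trials=200, seed=0):
    """For each assumed noise rate, return the fraction of simulated trials
    in which the conclusion 'A has a higher defect rate than B' still holds."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    stability = {}
    for rate in noise_rates:
        held = 0
        for _ in range(trials):
            noisy_a = flip_labels(group_a, rate, rng)
            noisy_b = flip_labels(group_b, rate, rng)
            if defect_rate(noisy_a) > defect_rate(noisy_b):
                held += 1
        stability[rate] = held / trials
    return stability


# Hypothetical data: 30% defective in group A, 15% in group B.
group_a = [1] * 12 + [0] * 28
group_b = [1] * 6 + [0] * 34

result = conclusion_stability(group_a, group_b, [0.0, 0.1, 0.2, 0.3])
for rate, share in result.items():
    print(f"assumed noise {rate:.1f}: conclusion held in {share:.0%} of trials")
```

If the share of trials in which the conclusion holds drops quickly as the assumed noise rate rises, the conclusion depends heavily on the noise assumption and should be reported with that caveat.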
Data Engineering for the Analysis of Semiconductor Manufacturing Data
We have analyzed manufacturing data from several different semiconductor
manufacturing plants, using decision tree induction software called
Q-YIELD. The software generates rules for predicting when a given product
should be rejected. The rules are intended to help the process engineers
improve the yield of the product, by helping them to discover the causes
of rejection. Experience with Q-YIELD has taught us the importance of
data engineering -- preprocessing the data to enable or facilitate
decision tree induction. This paper discusses some of the data engineering
problems we have encountered with semiconductor manufacturing data.
The paper deals with two broad classes of problems: engineering the features
in a feature vector representation and engineering the definition of the
target concept (the classes). Manufacturing process data present special
problems for feature engineering, since the data have multiple levels of
granularity (detail, resolution). Engineering the target concept is important,
due to our focus on understanding the past, as opposed to the more common
focus in machine learning on predicting the future.
Data Mining to Support Engineering Design Decision
The design and maintenance of an aero-engine generates a significant amount of documentation. When designing new engines, engineers must obtain knowledge gained from maintenance of existing engines to identify possible areas of concern. Firstly, this paper investigates the use of advanced business intelligence techniques to solve the problem of knowledge transfer from maintenance to design of aero-engines. Based on data availability and quality, various models were deployed. An association model was used to uncover hidden trends among parts involved in maintenance events. Classification techniques comprising various algorithms were employed to determine the severity of events. Causes of high-severity events that lead to major financial loss were traced with the help of summarization techniques. Secondly, this paper compares and evaluates the business intelligence approach to solving the problem of knowledge transfer with solutions available from the Semantic Web. The results obtained indicate a compelling need for data mining support on RDF/OWL-based warehoused data.
Knowledge Engineering from Data Perspective: Granular Computing Approach
Rough set theory is a mathematical approach to uncertainty and vagueness in data analysis, introduced by Zdzislaw Pawlak in the 1980s. Rough set theory assumes the underlying structure of knowledge is a partition. We have extended Pawlak’s concept of knowledge to coverings. We have taken a soft approach, regarding any generalized subset as basic knowledge. We regard a covering as the basic knowledge from which the theory of knowledge approximations and learning, knowledge dependency, and reducts is developed.
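The difference between a partition and a covering can be made concrete with a short sketch. It assumes one common definition of the approximations (lower: union of the blocks contained in the target set; upper: union of the blocks that intersect it); the paper above may define them differently for coverings. The function name `approximations` and the example sets are illustrative only.

```python
def approximations(blocks, target):
    """Return (lower, upper) approximations of `target` with respect to
    a family of blocks, which may be a partition or an overlapping covering."""
    target = set(target)
    lower, upper = set(), set()
    for block in blocks:
        block = set(block)
        if block <= target:   # block lies entirely inside the target
            lower |= block
        if block & target:    # block shares at least one element with the target
            upper |= block
    return lower, upper


# Pawlak's setting: blocks are disjoint equivalence classes.
partition = [{1, 2}, {3, 4}, {5, 6}]
# Generalized setting: blocks may overlap (here 2 and 3 each lie in two blocks).
covering = [{1, 2}, {2, 3}, {3, 4}, {5, 6}]

target = {1, 2, 3}

print(approximations(partition, target))  # ({1, 2}, {1, 2, 3, 4})
print(approximations(covering, target))   # ({1, 2, 3}, {1, 2, 3, 4})
```

In this example the overlapping block {2, 3} lets the covering describe the target exactly from below, whereas the partition can only certify {1, 2}.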
Engineering Workflow: The Process in Product Data Technology
The prevailing paradigm for enterprises in the new decade is undoubtedly speed. This enterprise view is driven by the availability of e-business technology that enables new forms of collaboration between companies. The rapid developments in e-business also have an impact on the future of engineering organizations. This paper focuses on the early phases of a product’s life cycle, i.e. between initial concept and release to manufacturing. It presents new engineering workflow capabilities that have been tailored to speed up the engineering of new products.
