890,923 research outputs found

Recommended from our members

A classification of data quality assessment and improvement methods

Author: Borek A
Oberhofer M
Woodall P
Publication venue: International Journal of Information Quality
Publication date: 01/01/2014

Data quality (DQ) assessment and improvement in larger information systems would often not be feasible without suitable “DQ methods”: algorithms that computer systems can execute automatically to detect and/or correct problems in datasets. These methods are already essential, and they will become even more important as the quantity of data in organisational systems grows. This paper reviews existing methods for both DQ assessment and improvement and classifies them according to the DQ problem and problem context. The classification reveals six gaps where no current DQ methods exist; these show where new methods are required and serve as a guide for future research and DQ tool development. This is the accepted manuscript. It is currently embargoed pending publication by Inderscience.

Nottingham Trent Institutional Repository (IRep)

Apollo (Cambridge)

Towards Sweetness Classification of Orange Cultivars Using Short‑Wave NIR Spectroscopy

Author: Alanazi Eisa
Ghafoor Abdul
Imran Muhammad
Islam Tiwana Mohsin
Malik Amanullah
Mirza Alina
Qureshi Waqar Shahid
Zeb Ayesha
Publication venue: Technological University Dublin
Publication date: 01/01/2023

The global orange industry constantly faces new technical challenges in meeting consumer demand for quality fruit. Interest in the horticulture industry has shifted from traditional subjective fruit quality assessment towards objective, quantitative, and non-destructive methods. Oranges have a thick peel, which makes non-destructive quality assessment challenging. This paper evaluates the potential of short-wave NIR spectroscopy and a direct sweetness classification approach for Pakistani orange cultivars, i.e., Red-Blood, Mosambi, and Succari. The correlation between quality indices, i.e., Brix, titratable acidity (TA), Brix:TA, and BrimA (Brix minus acids), sensory assessment of the fruit, and short-wave NIR spectra is analysed. Mixed-cultivar oranges are classified as sweet, mixed, or acidic based on short-wave NIR spectra. Short-wave NIR spectral data were obtained using the industry-standard F-750 fruit quality meter (310–1100 nm). Reference Brix and TA measurements were taken using standard destructive testing methods. Reference taste labels, i.e., sweet, mixed, and acidic, were acquired through sensory evaluation of samples. For indirect fruit classification, partial least squares regression models were developed for Brix, TA, Brix:TA, and BrimA estimation, with correlation coefficients of 0.57, 0.73, 0.66, and 0.55, respectively, on independent test data. For direct fruit classification, the ensemble classifier achieved 81.03% accuracy on the three classes (sweet, mixed, and acidic) on independent test data. NIR spectra correlate better with sensory assessment than with the quality indices. A direct classification approach is more suitable for machine-learning-based orange sweetness classification using NIR spectroscopy than the estimation of quality indices.
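The two derived quality indices named in this abstract can be sketched in a few lines of Python. This is only an illustration: the abstract describes BrimA as "Brix minus acids" without giving a formula, so the acid-weighting factor `k` below is an assumption (a plain difference corresponds to `k = 1`), and the sample values are hypothetical.

```python
def brix_ta_ratio(brix: float, ta: float) -> float:
    """Sugar-to-acid ratio (Brix:TA) from reference measurements."""
    if ta <= 0:
        raise ValueError("titratable acidity must be positive")
    return brix / ta


def brima(brix: float, ta: float, k: float = 1.0) -> float:
    """BrimA (Brix minus acids).

    k is an ASSUMED acid-weighting factor; the abstract does not state
    which value, if any, the authors used.
    """
    return brix - k * ta


# Hypothetical reference measurements for one fruit.
sample_brix = 12.0  # degrees Brix
sample_ta = 0.8     # titratable acidity, percent

print(brix_ta_ratio(sample_brix, sample_ta))
print(brima(sample_brix, sample_ta, k=4.0))
```

Indices like these are the regression targets in the indirect approach; the direct approach skips them and predicts the taste label from the spectrum.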

Arrow@TUDublin

PubMed Central

Recent development in electronic nose data processing for beef quality assessment

Author: Sarno Riyanarto
Wijaya Dedy Rahman
Publication venue: 'Universitas Ahmad Dahlan'
Publication date: 01/02/2019

Beef is a perishable food that decays easily. Hence, a rapid system for beef quality assessment is needed to guarantee its quality. In the last few years, the electronic nose (e-nose) has been developed for beef spoilage detection. In this paper, we discuss the challenges of applying the e-nose to beef quality assessment, especially in e-nose data processing. We also summarise our previous studies, which describe several methods for dealing with gas sensor noise, sensor array optimization, beef quality classification, and prediction of the microbial population in beef samples. This paper may help researchers and practitioners understand the challenges and methods of e-nose data processing for beef quality assessment.

Journal of Education and Learning (EduLearn)

TELKOMNIKA (Telecommunication Computing Electronics and Control)

UAD Journal Management System

Large-scale nonlinear dimensionality reduction for network intrusion detection

Author: Hamid Yasir
Journaux Ludovic
Lee John Aldo
Nabi Bushra
Sautot Lucile
Sugumaran M
Publication venue: HAL CCSD
Publication date: 24/04/2017

Network intrusion detection (NID) is a complex classification problem. In this paper, we combine classification with recent and scalable nonlinear dimensionality reduction (NLDR) methods. Classification and DR are not necessarily adversarial, provided adequate cluster magnification occurs, as in NLDR methods like t-SNE: DR mitigates the curse of dimensionality, while cluster magnification can maintain class separability. We demonstrate experimentally the effectiveness of the approach by analyzing and comparing results on the big KDD99 dataset, using both NLDR quality assessment and classification rate for SVMs and random forests. Since the data involve features of mixed types (numerical and categorical), using Gower's similarity coefficient as the metric further improves the results over the classical similarity metric.
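Gower's similarity coefficient handles mixed feature types by scoring each feature separately and averaging. A minimal sketch follows, assuming equal feature weights and no missing values; the record layout and the range values are illustrative and not taken from the paper.

```python
from typing import Optional, Sequence, Union

Value = Union[float, str]


def gower_similarity(
    a: Sequence[Value],
    b: Sequence[Value],
    ranges: Sequence[Optional[float]],
) -> float:
    """Gower similarity between two records with mixed feature types.

    ranges[i] is the observed range (max - min) of numerical feature i
    over the dataset, or None when feature i is categorical.
    """
    if not (len(a) == len(b) == len(ranges)) or len(a) == 0:
        raise ValueError("records and ranges must be non-empty and equal length")
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if r is None:
            # Categorical feature: simple match / mismatch.
            total += 1.0 if x == y else 0.0
        elif r == 0:
            # Constant numerical feature: all records agree on it.
            total += 1.0
        else:
            # Numerical feature: range-normalised absolute difference.
            total += 1.0 - abs(x - y) / r
    return total / len(a)


# Illustrative records: (duration, protocol, source bytes).
rec_a = (0.0, "tcp", 200.0)
rec_b = (10.0, "udp", 200.0)
feature_ranges = (100.0, None, 1000.0)

print(gower_similarity(rec_a, rec_b, feature_ranges))
```

A dissimilarity suitable for distance-based methods is then `1 - gower_similarity(...)`.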

HAL-uB

HAL Descartes

HAL-CIRAD

Multisource and temporal variability in Portuguese hospital administrative datasets: Data quality implications

Author: Caballero Ismael
Freitas Alberto
Lobo Mariana
Lopes Fernando
Pinto Andreia
Souza Júlio
Sáez Silvestre Carlos
Vasco Santos Joao
Viana Joao
Publication venue: Elsevier
Publication date: 01/12/2022

[EN] Background: Unexpected variability across healthcare datasets may indicate data quality issues and thereby affect the credibility of these data for reutilization. No gold-standard reference dataset or methods for variability assessment are usually available for these datasets. In this study, we aim to describe the process of discovering data quality implications by applying a set of methods for assessing variability between sources and over time in a large hospital database. Methods: We described and applied a set of multisource and temporal variability assessment methods in a large Portuguese hospitalization database, in which variation in condition-specific hospitalization ratios derived from clinically coded data were assessed between hospitals (sources) and over time. We identified condition-specific admissions using the Clinical Classification Software (CCS), developed by the Agency of Health Care Research and Quality. A Statistical Process Control (SPC) approach based on funnel plots of condition-specific standardized hospitalization ratios (SHR) was used to assess multisource variability, whereas temporal heat maps and Information-Geometric Temporal (IGT) plots were used to assess temporal variability by displaying temporal abrupt changes in data distributions. Results were presented for the 15 most common inpatient conditions (CCS) in Portugal. Main findings: Funnel plot assessment allowed the detection of several outlying hospitals whose SHRs were much lower or higher than expected. Adjusting SHR for hospital characteristics, beyond age and sex, considerably affected the degree of multisource variability for most diseases. Overall, probability distributions changed over time for most diseases, although heterogeneously. 
Abrupt temporal changes in data distributions for acute myocardial infarction and congestive heart failure coincided with the periods comprising the transition to the International Classification of Diseases, 10th revision, Clinical Modification, whereas changes in the Diagnosis-Related Groups software seem to have driven changes in data distributions for both acute myocardial infarction and liveborn admissions. The analysis of heat maps also allowed the detection of several discontinuities at hospital level over time, in some cases also coinciding with the aforementioned factors. Conclusions: This paper described the successful application of a set of reproducible, generalizable and systematic methods for variability assessment, including visualization tools that can be useful for detecting abnormal patterns in healthcare data, also addressing some limitations of common approaches. The presented method for multisource variability assessment is based on SPC, which is an advantage considering the lack of a gold standard for such a process. Properly controlling for hospital characteristics and differences in case-mix when estimating SHR is critical for isolating data quality-related variability among data sources. The use of IGT plots provides an advantage over common methods for temporal variability assessment due to its suitability for multitype and multimodal data, which are common characteristics of healthcare data. The novelty of this work is the use of a set of methods to discover new data quality insights in healthcare data. The authors would like to thank the Central Authority for Health Services, I.P. (ACSS) for providing access to the data.
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was financed by FEDER-Fundo Europeu de Desenvolvimento Regional funds through the COMPETE 2020-Operacional Programme for Competitiveness and Internationalisation (POCI) and by Portuguese funds through FCT-Fundacao para a Ciencia e a Tecnologia in the framework of the project POCI-01-0145-FEDER-030766 ("1st.IndiQare-Quality indicators in primary health care: validation and implementation of quality indicators as an assessment and comparison tool"). In addition, we would like to thank the projects GEMA (SBPLY/17/180501/000293) - Generation and Evaluation of Models for Data Quality, and ADAGIO (SBPLY/21/180501/000061) - Alarcos Data Governance framework and systems generation, both funded by the Department of Education, Culture and Sports of the JCCM and FEDER; and AETHER-UCLM: A smart data holistic approach for context-aware data analytics focused on Quality and Security project (Ministerio de Ciencia e Innovacion, PID2020-112540RB-C42). CSS thanks the Universitat Politecnica de Valencia contract no. UPV-SUB.2-1302 and FONDO SUPERA COVID-19 by CRUE-Santander Bank grant "Severity Subgroup Discovery and Classification on COVID-19 Real World Data through Machine Learning and Data Quality assessment (SUBCOVERWD-19)".
Souza, J.; Caballero, I.; Vasco Santos, J.; Lobo, M.; Pinto, A.; Viana, J.; Sáez Silvestre, C.... (2022). Multisource and temporal variability in Portuguese hospital administrative datasets: Data quality implications. Journal of Biomedical Informatics. 136:1-11. https://doi.org/10.1016/j.jbi.2022.104242
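The funnel-plot idea behind the multisource assessment can be illustrated with a short Python sketch. It assumes the SHR is the ratio of observed to expected admissions and uses a normal approximation to Poisson variation for the control limits; the abstract does not say how the authors computed their limits, so treat the formula, the function names, and the counts below as assumptions.

```python
import math
from typing import Tuple


def shr(observed: float, expected: float) -> float:
    """Standardized hospitalization ratio: observed / expected admissions."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


def funnel_limits(expected: float, z: float = 1.96) -> Tuple[float, float]:
    """Approximate control limits around SHR = 1.

    ASSUMPTION: Poisson variation with a normal approximation, so the
    standard error of the ratio is 1 / sqrt(expected).
    """
    half_width = z / math.sqrt(expected)
    return max(0.0, 1.0 - half_width), 1.0 + half_width


def flag_hospital(observed: float, expected: float, z: float = 1.96) -> str:
    """Label a hospital as 'low', 'high', or 'in control'."""
    ratio = shr(observed, expected)
    lower, upper = funnel_limits(expected, z)
    if ratio < lower:
        return "low"
    if ratio > upper:
        return "high"
    return "in control"


# Hypothetical hospitals, each with 100 expected admissions for a condition.
for observed_count in (70, 105, 130):
    print(observed_count, flag_hospital(observed_count, 100.0))
```

Because the limits narrow as the expected count grows, large hospitals are flagged for smaller relative deviations than small ones, which gives the plot its funnel shape.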

RiuNet

COST 733 - WG4: Applications of weather type classification

Author: Bardossy A.
Bertalanic R.
Bogucka M.
Cahyanova M.
Caian M.
Casado M.J.
Demuzere M.
Fleig A.
Frei C.
Georgescu F.
Godlowska J.
Kassomenos P.
Latinov L.
Pastor M.A.
Pianko-Kluczynska K.
Pongracz R.
Prudhomme C.
Schiemann R.
Sepp M.
Stefan S.
Tallaksen L
Tomaszewska A.M.
Ustrnul Z.
Publication date: 01/01/2008

The main objective of the COST Action 733 is to achieve a general numerical method for assessing, comparing and classifying typical weather situations in the European regions. To accomplish this goal, different workgroups are established, each with its specific aims:
• WG1: Existing methods and applications (finished)
• WG2: Implementation and development of weather types classification methods
• WG3: Comparison of selected weather types classifications
• WG4: Testing methods for various applications

The main task of Workgroup 4 (WG4) in COST 733 involves the testing of the selected weather type methods for various classifications. In more detail, WG4 focuses on the following topics:
• Selection of dedicated applications (using results from WG1)
• Performance of the selected applications using available weather types provided by WG2
• Intercomparison of the application results as a result of different methods
• Final assessment of the results and uncertainties
• Presentation and release of results to the other WGs and external interested
• Recommend specifications for a new (common) method WG2

Introduction

In order to address these specific aims, various applications are selected and WG4 is divided into subgroups accordingly:
1. Air quality
2. Hydrology (& Climatological mapping)
3. Forest fires
4. Climate change and variability
5. Risks and hazards

Simultaneously, special attention is paid to several broad topics concerning other COST Actions, such as phenology (COST725), biometeorology (COST730), agriculture (COST 734) and mesoscale modelling and air pollution (COST728). Sub-groups are established to find advantages and disadvantages of different classification methods for different applications. Focus is given to data requirements, spatial and temporal scale, domain area, specifi

NERC Open Research Archive