2 research outputs found

    Apoio à avaliação da qualidade de dados em eScience: uma abordagem baseada em proveniência (Supporting data quality assessment in eScience: a provenance-based approach)

    Advisor: Claudia Maria Bauzer Medeiros. Thesis (doctorate), Universidade Estadual de Campinas, Instituto de Computação.
    Abstract: Data quality is a recurrent concern in all scientific domains. Experiments analyze and manipulate several kinds of datasets, and generate data to be (re)used by other experiments. Good scientific results depend heavily on the quality of such datasets. However, the data involved in experiments are manipulated by a wide range of users with distinct research interests, each using their own vocabularies, work methodologies, models, and sampling needs. Given this scenario, a challenge in computer science is to come up with solutions that help scientists assess the quality of their data. Several efforts have addressed the estimation of quality. Some of them point out that data provenance attributes should be used to evaluate quality. However, most of these initiatives address the evaluation of a single quality attribute, frequently focusing on atomic data values, which reduces their applicability. There is therefore a need for new solutions that scientists can adopt to assess how good their data are. In this PhD research, we present an approach to this problem based on the notion of data provenance. Unlike other similar approaches, our proposal combines quality attributes specified within a context by specialists with metadata on the provenance of a data set.
    The main contributions of this work are: (i) the specification of a framework that takes advantage of data provenance to derive quality information; (ii) a methodology associated with this framework that outlines the procedures to support the assessment of quality; (iii) the proposal of two different provenance models to capture provenance information, for fixed and extensible scenarios; and (iv) validation of items (i) through (iii), with their discussion via case studies in agriculture and biodiversity.
    Degree: Doctorate. Area: Computer Science. Title awarded: Doctor in Computer Science.

    Data Quality In Agriculture Applications

    Data quality is a common concern in a wide range of domains. Since agriculture plays an important role in the Brazilian economy, it is crucial that the data be useful and of adequate quality for decision making, planning, and related activities. Nevertheless, this requirement is often not taken into account when systems and databases are modeled. This work presents a review of data quality issues, covering some efforts in agriculture and geospatial science to tackle them. The goal is to help researchers and practitioners design better applications. In particular, we focus on the different dimensions of quality and the approaches used to measure them.
    Pages: 128-139