    A New Estimator of Intrinsic Dimension Based on the Multipoint Morisita Index

    The size of datasets has been increasing rapidly, both in the number of variables and in the number of events. As a result, the empty space phenomenon and the curse of dimensionality complicate the extraction of useful information. In general, however, data lie on non-linear manifolds of much lower dimension than that of the spaces in which they are embedded. In many pattern recognition tasks, learning these manifolds is a key issue, and it requires knowledge of their true intrinsic dimension. This paper introduces a new estimator of intrinsic dimension based on the multipoint Morisita index. The estimator is applied to both synthetic and real datasets of varying complexity and compared with other existing estimators. The proposed estimator turns out to be fairly robust to sample size and noise, unaffected by edge effects, able to handle large datasets, and computationally efficient.
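
    The abstract does not spell out how the estimator works. The sketch below is a minimal illustration that assumes the standard definition of the m-point Morisita index on a grid of Q cells of side delta, I_{m,delta} = Q^(m-1) * sum_i n_i(n_i - 1)...(n_i - m + 1) / [N(N - 1)...(N - m + 1)], where n_i is the number of points in cell i and N is the total. It also assumes the relation M_m = E - S_m / (m - 1), where S_m is the slope of log I_{m,delta} against log(1/delta) and E is the embedding dimension. The function names, the requirement that data be pre-scaled to the unit hypercube, and the default grid resolutions are illustrative assumptions, not the paper's settings.

```python
import math
import random
from collections import Counter


def falling_factorial(n, m):
    """Return n (n - 1) ... (n - m + 1); this is zero whenever 0 <= n < m."""
    prod = 1
    for j in range(m):
        prod *= n - j
    return prod


def multipoint_morisita(points, m, k):
    """m-Morisita index for points already scaled to [0, 1]^E, on a grid with
    k cells per axis (cell side delta = 1 / k, Q = k**E cells in total)."""
    E = len(points[0])
    Q = k ** E
    # Occupancy of every non-empty cell; empty cells would contribute 0 anyway.
    counts = Counter(tuple(min(int(x * k), k - 1) for x in p) for p in points)
    numerator = sum(falling_factorial(n, m) for n in counts.values())
    return Q ** (m - 1) * numerator / falling_factorial(len(points), m)


def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def morisita_id(points, m=2, cells_per_axis=(2, 4, 8, 16)):
    """Assumed estimator M_m = E - S_m / (m - 1), with S_m the slope of
    log I_{m,delta} versus log(1 / delta). Requires m >= 2."""
    if m < 2:
        raise ValueError("m must be at least 2")
    E = len(points[0])
    xs = [math.log(k) for k in cells_per_axis]  # log(1 / delta)
    ys = [math.log(multipoint_morisita(points, m, k)) for k in cells_per_axis]
    return E - slope(xs, ys) / (m - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    # A straight line in the plane (intrinsic dimension 1) and a filled square (2).
    line = [(t, t) for t in (rng.random() for _ in range(5000))]
    square = [(rng.random(), rng.random()) for _ in range(5000)]
    print(round(morisita_id(line), 2), round(morisita_id(square), 2))
```

    The falling factorials make the index unbiased for uniformly scattered points, so a line should yield an estimate near 1 and a filled square an estimate near 2. The grid resolutions should be chosen so that even the finest cells still hold several points each.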

    AMIC: An Adaptive Information Theoretic Method to Identify Multi-Scale Temporal Correlations in Big Time Series Data

    Analysis Of Large Scale Climate Data: How Well Climate Change Models And Data From Real Sensor Networks Agree?

    Research on global warming and climate change has attracted great attention from the scientific community and the media in general, mainly because of the social and economic impacts these changes have on the entire planet. Climate change simulation models have been developed and improved to provide reliable data, which are used to forecast the effects of increasing greenhouse gas emissions on the future global climate. Each model simulation generates terabytes of data, which demand fast and scalable processing methods. In this context, we propose a new analysis process aimed at discriminating between the temporal behavior of the data generated by climate models and that of real climate observations gathered from ground-based meteorological station networks. Our approach combines fractal data analysis with the monitoring of real and model-generated data streams to detect deviations in the intrinsic correlation among the time series defined by different climate variables. Our measurements used series from a regional climate model and the corresponding real data from a sensor network of meteorological stations in the analyzed region. The results show that our approach can correctly classify the data as real or simulated, even when statistical tests fail. These results suggest that there is still room for improving state-of-the-art climate change models, and that fractal-based concepts, which offer a fast, parallelizable, and scalable approach, may contribute to that improvement.

    pp. 517-526. Sponsors: Comite Gestor da Internet no Brasil (CGI.BR), Nucleo de Informacao e Coordenacao do Ponto BR (NIC.BR), BR PETROBRAS, Banco do Brasil, Microsoft.
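
    The abstract names fractal analysis of the intrinsic correlation among climate variables but gives no algorithm. As a hedged illustration only, the sketch below assumes one common fractal measure, the correlation fractal dimension D2. It estimates D2 by box counting as the slope of log(number of point pairs sharing a grid cell) against log(cell side r), treating each time step as a point whose coordinates are the climate variables. It then computes D2 over aligned, non-overlapping windows of a real and a simulated series and flags windows whose values differ by more than a tolerance. The function names, window length, grid resolutions, and tolerance are hypothetical choices, not taken from the paper.

```python
import math
import random
from collections import Counter


def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def correlation_dimension(points, cells_per_axis=(2, 4, 8, 16)):
    """Box-counting estimate of D2: slope of log(pairs sharing a cell) vs log(r)."""
    E = len(points[0])
    lo = [min(p[j] for p in points) for j in range(E)]
    hi = [max(p[j] for p in points) for j in range(E)]
    # Min-max rescale each attribute to [0, 1]; constant attributes map to 0.
    scaled = [
        [(p[j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0 for j in range(E)]
        for p in points
    ]
    xs, ys = [], []
    for k in cells_per_axis:
        counts = Counter(tuple(min(int(x * k), k - 1) for x in p) for p in scaled)
        pairs = sum(c * (c - 1) for c in counts.values())  # ordered pairs per cell
        xs.append(math.log(1.0 / k))  # log of cell side r
        ys.append(math.log(pairs))
    return slope(xs, ys)


def flag_deviations(real, simulated, window, tolerance=0.3):
    """Hypothetical monitor: True marks an aligned window whose D2 values for the
    real and simulated series differ by more than the (illustrative) tolerance."""
    n = min(len(real), len(simulated))
    flags = []
    for start in range(0, n - window + 1, window):
        d_real = correlation_dimension(real[start:start + window])
        d_sim = correlation_dimension(simulated[start:start + window])
        flags.append(abs(d_real - d_sim) > tolerance)
    return flags
```

    Strongly correlated variables (for example, one a linear function of the other) give D2 near 1, while independent variables give D2 near 2, so a change in the correlation structure between the two series shows up as a flagged window. Each window must be dense enough that, at every grid resolution, at least one cell holds two or more points; otherwise the logarithm is undefined.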