
    Considering Currency in Decision Trees in the Context of Big Data

    In the current age of big data, decision trees are one of the most commonly applied data mining methods. However, for reliable results they require up-to-date input data, which is not always available in practice. We present a two-phase approach based on probability theory for considering the currency of stored data in decision trees. Our approach is efficient and thus suitable for big data applications. Moreover, it is independent of the particular decision tree classifier. Finally, it is context-specific, since the decision tree structure and supplemental data are taken into account. We demonstrate the benefits of the novel approach by applying it to three datasets. The results show a substantial increase in the classification success rate compared with not considering currency. Thus, applying our approach prevents wrong classifications and consequently wrong decisions.

    An evolutionary model to mine high expected utility patterns from uncertain databases

    In recent decades, the number of mobile and Internet of Things (IoT) devices has increased dramatically across many domains and applications, generating massive amounts of data. The collected data carry several kinds of information (e.g., interestingness, weight, frequency, or uncertainty), yet most existing generic pattern mining algorithms consider only a single objective and precise data. Moreover, because the volume of collected data is huge, meaningful and up-to-date information must be discovered within a limited time. In this paper, we treat utility and uncertainty as the two main objectives and efficiently mine high expected utility patterns (HEUPs) within a limited time using a multi-objective evolutionary framework. The designed model, called MOEA-HEUPM, can discover valuable HEUPs in an uncertain environment without pre-defined threshold values (i.e., minimum utility and minimum uncertainty). Two encoding methodologies are also considered in MOEA-HEUPM to show its effectiveness. With MOEA-HEUPM, the set of non-dominated HEUPs can be discovered within a limited time for decision-making. Experiments show the effectiveness and efficiency of MOEA-HEUPM in terms of convergence, hypervolume, and number of discovered patterns compared with generic approaches.
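
    The idea of a non-dominated set over two objectives can be illustrated with a minimal Python sketch. It is not the MOEA-HEUPM algorithm itself: the `Pattern` type, the function names, and the candidate values are hypothetical, and the sketch assumes both utility and probability are to be maximized. It only shows how a Pareto filter lets patterns be selected without minimum-utility or minimum-uncertainty thresholds.

```python
from typing import List, NamedTuple


class Pattern(NamedTuple):
    """Hypothetical candidate pattern with its two objective values."""
    items: frozenset
    utility: float      # assumed: higher is better
    probability: float  # assumed: higher is better


def dominates(a: Pattern, b: Pattern) -> bool:
    """True if a is at least as good as b on both objectives and strictly better on one."""
    at_least_as_good = a.utility >= b.utility and a.probability >= b.probability
    strictly_better = a.utility > b.utility or a.probability > b.probability
    return at_least_as_good and strictly_better


def non_dominated(patterns: List[Pattern]) -> List[Pattern]:
    """Keep only patterns that no other pattern dominates (the Pareto front)."""
    return [p for p in patterns if not any(dominates(q, p) for q in patterns)]


# Made-up candidates, for illustration only.
candidates = [
    Pattern(frozenset("ab"), 40.0, 0.9),
    Pattern(frozenset("ac"), 55.0, 0.6),
    Pattern(frozenset("bc"), 30.0, 0.5),   # dominated by {a, b}
    Pattern(frozenset("abc"), 55.0, 0.4),  # dominated by {a, c}
]
front = non_dominated(candidates)
```

    Here `front` keeps the patterns {a, b} and {a, c}: neither is better than the other on both objectives, so both remain as trade-off options for the decision maker.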

    A systematic review of data quality issues in knowledge discovery tasks

    The volume of data is growing rapidly because organizations continuously capture data to support better decision-making. The most fundamental challenge is to explore these large volumes of data and extract useful knowledge for future actions through knowledge discovery tasks; however, much of the data is of poor quality. We present a systematic review of data quality issues in knowledge discovery tasks and a case study applied to the agricultural disease known as coffee rust.

    Application of the Real Options in Engineering Design and Decision Making: Focus on Mine Design and Planning at Operational Level

    Flexibility and adaptability are essential for long-term corporate success, and real options (RO) is the preferred tool for analysis. This research argues that uncertainty is a source of value, as the opportunities it presents can be leveraged by having a flexible system. As a contribution to knowledge, a relationship between the beta and the flexibility index was derived, an RO identification framework for mine operational decision-making was proposed, and predictive data analytics was utilised to create managerial flexibility.

    Sequence Classification Based on Delta-Free Sequential Pattern

    Sequential pattern mining is one of the most studied and challenging tasks in data mining. However, extending well-known methods from other classical pattern types to sequences is not a trivial task. In this paper we study the notion of δ-freeness for sequences. While this notion has been discussed extensively for itemsets, this work is the first to extend it to sequences. We define an efficient algorithm devoted to the extraction of δ-free sequential patterns. Furthermore, we show the advantage of δ-free sequences and highlight their importance when building sequence classifiers. We show how they can be used to address the feature selection problem in statistical classifiers, as well as to build symbolic classifiers that optimize both accuracy and earliness of predictions.
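
    For readers unfamiliar with δ-freeness, the following minimal Python sketch checks the property for itemsets, the setting the abstract cites as background. It is not the paper's sequence algorithm. It assumes the usual itemset formulation, in which an itemset X is δ-free when removing any single item raises the support by more than δ; the function names and the toy transactions are hypothetical.

```python
from typing import FrozenSet, List, Set


def support(itemset: FrozenSet[str], transactions: List[Set[str]]) -> int:
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def is_delta_free(itemset: FrozenSet[str], transactions: List[Set[str]], delta: int) -> bool:
    """Assumed itemset definition: X is delta-free if, for every item a in X,
    support(X minus {a}) - support(X) > delta. Checking the largest proper
    subsets is enough, because smaller subsets have a support gap at least as large."""
    sup_x = support(itemset, transactions)
    return all(
        support(itemset - {a}, transactions) - sup_x > delta
        for a in itemset
    )


# Toy transaction database, for illustration only.
transactions = [set("abc"), set("ab"), set("ac"), set("bc"), set("abc")]
ab = frozenset("ab")

free_at_0 = is_delta_free(ab, transactions, delta=0)
free_at_1 = is_delta_free(ab, transactions, delta=1)
```

    In this toy database {a, b} has support 3 while {a} and {b} each have support 4, so the gap is 1. The itemset is therefore δ-free for δ = 0 but not for δ = 1.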

    Post-drought decline of the Amazon carbon sink

    Amazon forests have experienced frequent and severe droughts in the past two decades. However, little is known about the large-scale legacy of droughts on the carbon stocks and dynamics of forests. Using systematic sampling of forest structure measured by LiDAR waveforms from 2003 to 2008, here we show a significant loss of carbon over the entire Amazon basin at a rate of 0.3 ± 0.2 (95% CI) PgC yr⁻¹ after the 2005 mega-drought, which continued persistently over the next 3 years (2005–2008). The changes in forest structure, captured by average LiDAR forest height and converted to above-ground biomass carbon density, show an average loss of 2.35 ± 1.80 MgC ha⁻¹ a year after (2006) in the epicenter of the drought. With more frequent droughts expected in the future, Amazon forests may lose their role as a robust carbon sink, leading to a significant positive climate feedback and exacerbating warming trends. The research was partially supported by a NASA Terrestrial Ecology grant at the Jet Propulsion Laboratory, California Institute of Technology, and by partial funding to the UCLA Institute of Environment and Sustainability from previous National Aeronautics and Space Administration and National Science Foundation grants. The authors thank NSIDC, BYU, USGS, and the NASA Land Processes Distributed Active Archive Center (LP DAAC) for making their data available.