916 research outputs found

    Tree-Mining: Understanding Applications and Challenges

    Tree-mining is an essential system of techniques and software technologies for multi-level and multi-angled operations in databases. Within the scope of this manuscript, several applications of the various sub-techniques of tree mining are explored. The write-up investigates the major applications and challenges of the different types and techniques of tree mining, because investigations in this context have so far been patchy and scant. To that end, the author has reviewed some of the latest and most pertinent research articles of the last two decades.

    A Predictive Model Of Avian Influenza Among Poultry In Egypt

    Background: Avian Influenza (H5N1) has become entrenched in Egypt since its emergence in 2006. Control measures have failed and surveillance systems remain inadequate. A relatively new method for regression called Random Forests is presented here with the goal of providing accurate and timely predictions of the weekly number of outbreaks in each of the Egyptian governorates. Methods: Predictions were generated from Random Forests models using outbreak data from the FAO EMPRES-i database and local weather data from Weather Underground. These data were lagged by one and two weeks so that the current week's data could be used to make prospective predictions. Model performance was assessed using a variety of methods. Results: The percent of the variance in observed outbreaks explained by the model varied greatly among the governorates, ranging between 20 and 60 percent in governorates with high and medium outbreak activity. The models typically predicted poorly in governorates with low activity. Linear regression of the observed outbreaks on the predicted values provided evidence that, while outbreaks were consistently underpredicted across all governorates, predictions in some models tracked observed outbreaks quite accurately. Discussion: The varying levels of model performance across the governorates raise many questions about the underlying causes. While we cannot deduce these reasons from the models themselves, public health officials can use the lessons learned here as a guide to focus future research to better understand what is occurring. Predictive models can be used to evaluate local surveillance systems and to find additional covariates for the model to determine the spatio-temporal risk of avian influenza. As a result of better surveillance data and more complete models, control and prevention measures may be more effectively put in place where and when they are needed most.
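
The one- and two-week lagging described in the abstract above can be illustrated with a minimal sketch. It assumes a plain list of weekly counts for a single governorate; the function name `add_lags`, the field names, and the sample numbers are hypothetical and are not taken from the paper. The resulting rows are the kind of input a regression model such as a Random Forest could be trained on.

```python
def add_lags(series, lags=(1, 2)):
    """Pair each week's value with the values observed `lags` weeks earlier.

    Weeks without a full lag history (the first max(lags) weeks) are skipped.
    """
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(series)):
        row = {"target": series[t]}
        for k in lags:
            # Predictor: the value observed k weeks before week t.
            row[f"lag_{k}"] = series[t - k]
        rows.append(row)
    return rows


# Made-up weekly outbreak counts, for illustration only.
weekly_outbreaks = [3, 5, 2, 0, 4]
rows = add_lags(weekly_outbreaks)
for row in rows:
    print(row)
```

Each row uses only values from earlier weeks as predictors, which is what makes a prediction prospective.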

    Human dynamics in the age of big data: a theory-data-driven approach

    The revolution of information and communication technology (ICT) in the past two decades has transformed the world and people’s lives, along with the ways that knowledge is produced. With the advancements in location-aware technologies, a large volume of data, so-called “big data”, is now available through various sources to explore the world. This dissertation examines the potential use of such data in understanding human dynamics by focusing on both theory- and data-driven approaches. Specifically, human dynamics represented by communication and activities is linked to geographic concepts of space and place through social media data to set a research platform for effective use of social media as an information system. Three case studies covering these conceptual linkages are presented to (1) identify communication patterns on social media; (2) identify spatial patterns of activities in urban areas and detect events; and (3) explore urban mobility patterns. The first case study examines the use of and communication dynamics on Twitter during Hurricane Sandy utilizing survey and data analytics techniques. Twitter was identified as a valuable source of disaster-related information. Additionally, the results shed light on the most significant information that can be derived from Twitter during disasters and on the need for establishing bi-directional communications during such events to achieve effective communication. The second case study examines the potential of Twitter in identifying activities and events and exploring movements during Hurricane Sandy utilizing both time-geographic information and qualitative social media text data. The study provides insights for enhancing situational awareness during natural disasters. The third case study examines the potential of Twitter in modeling commuting trip distribution in New York City. By integrating both traditional and social media data and utilizing machine learning techniques, the study identified Twitter as a valuable source for transportation modeling. Despite the limitations of social media, such as the accuracy issue, there is tremendous opportunity for geographers to enrich their understanding of human dynamics in the world. However, we will need new research frameworks that integrate geographic concepts with information systems theories to theorize the process. Furthermore, integrating various data sources is the key to future research and will need new computational approaches. Addressing these computational challenges, therefore, will be a crucial step to extend the frontier of big data knowledge from a geographic perspective. KEYWORDS: Big data, social media, Twitter, human dynamics, VGI, natural disasters, Hurricane Sandy, transportation modeling, machine learning, situational awareness, NYC, GI

    Using random forests to diagnose aviation turbulence.

    Atmospheric turbulence poses a significant hazard to aviation, with severe encounters costing airlines millions of dollars per year in compensation, aircraft damage, and delays due to required post-event inspections and repairs. Moreover, attempts to avoid turbulent airspace cause flight delays and en route deviations that increase air traffic controller workload, disrupt schedules of air crews and passengers and use extra fuel. For these reasons, the Federal Aviation Administration and the National Aeronautics and Space Administration have funded the development of automated turbulence detection, diagnosis and forecasting products. This paper describes a methodology for fusing data from diverse sources and producing a real-time diagnosis of turbulence associated with thunderstorms, a significant cause of weather delays and turbulence encounters that is not well-addressed by current turbulence forecasts. The data fusion algorithm is trained using a retrospective dataset that includes objective turbulence reports from commercial aircraft and collocated predictor data. It is evaluated on an independent test set using several performance metrics including receiver operating characteristic curves, which are used for FAA turbulence product evaluations prior to their deployment. A prototype implementation fuses data from Doppler radar, geostationary satellites, a lightning detection network and a numerical weather prediction model to produce deterministic and probabilistic turbulence assessments suitable for use by air traffic managers, dispatchers and pilots. The algorithm is scheduled to be operationally implemented at the National Weather Service's Aviation Weather Center in 2014. Document type: Articl

    Data mining as a tool for environmental scientists

    Over recent years a huge library of data mining algorithms has been developed to tackle a variety of problems in fields such as medical imaging and network traffic analysis. Many of these techniques are far more flexible than more classical modelling approaches and could be usefully applied to data-rich environmental problems. Certain techniques such as Artificial Neural Networks, Clustering, Case-Based Reasoning and more recently Bayesian Decision Networks have found application in environmental modelling, while other methods, for example classification and association rule extraction, have not yet been taken up on any wide scale. We propose that these and other data mining techniques could be usefully applied to difficult problems in the field. This paper introduces several data mining concepts and briefly discusses their application to environmental modelling, where data may be sparse, incomplete, or heterogeneous.

    Rainfall in the urban area and its impact on climatology and population growth

    Due to the scarcity of studies linking the variability of rainfall and population growth in the capital cities of Northeastern Brazil (NEB), the purpose of this study is to evaluate the variability and multiscale interaction (annual and seasonal), and in addition, to detect their trends and the impact of urban growth. For this, monthly rainfall data between 1960 and 2020 were used. In addition, the detection of rainfall trends on annual and seasonal scales was performed using the Mann–Kendall (MK) test and compared with the phases of El Niño-Southern Oscillation (ENSO) and Pacific Decadal Oscillation (PDO). The relationship between population growth data and rainfall data for different decades was established. Results indicate that the variability of multiscale urban rainfall is directly associated with the ENSO and PDO phases, followed by the performance of rain-producing meteorological systems in the NEB. In addition, the anthropic influence is shown in the relational pattern between population growth and the variability of decennial rainfall in the capitals of the NEB. However, no capital showed a significant trend of increasing annual rainfall (as in the case of Aracaju, Maceió, and Salvador). The observed population increase in the last decades in the capitals of the NEB and the notable decreasing trend of rainfall could compromise the region’s water security. Moreover, if there is no strategic planning about water bodies, these changes in the rainfall pattern could be compromising
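
The Mann–Kendall (MK) trend test named in the abstract above can be sketched as follows. This is a minimal illustration of the standard test statistic with a normal approximation, not the authors' implementation: it assumes no tied values (the tie correction to the variance is omitted), and the function name `mann_kendall` and the sample series are hypothetical.

```python
import math


def mann_kendall(values):
    """Return (S, Z, two-sided p-value) for the Mann-Kendall trend test.

    Assumption: no tie correction is applied to the variance of S.
    """
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            # Add the sign of each later-minus-earlier difference.
            s += (diff > 0) - (diff < 0)
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)  # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value under the standard normal approximation.
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p


# Made-up annual totals, for illustration only.
s_up, z_up, p_up = mann_kendall([1, 2, 3, 4, 5])
s_down, z_down, p_down = mann_kendall([5, 4, 3, 2, 1])
print(s_up, z_up, p_up)
```

A positive S indicates a tendency for later values to exceed earlier ones, and a negative S the reverse. The normal approximation is rough for very short series, so real analyses would typically use longer records and a tie-corrected variance.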