380,112 research outputs found

    Classification in Geographical Information Systems

    Modeling Taxi Drivers' Behaviour for the Next Destination Prediction

    In this paper, we study how to model taxi drivers' behaviour and geographical information for an interesting and challenging task: predicting the next destination in a taxi journey. Predicting the next location is a well-studied problem in human mobility, with several applications in real-world scenarios, from optimizing the efficiency of electronic dispatching systems to predicting and reducing traffic jams. This task is normally modeled as a multiclass classification problem, where the goal is to select the next taxi destination from a set of already known locations. We present a Recurrent Neural Network (RNN) approach that models the taxi drivers' behaviour and encodes the semantics of visited locations by using geographical information from Location-Based Social Networks (LBSNs). In particular, RNNs are trained to predict the exact coordinates of the next destination, which overcomes the limitation of producing as output only the limited set of locations seen during the training phase. The proposed approach was tested on the ECML/PKDD Discovery Challenge 2015 dataset (based on the city of Porto), obtaining better results than the competition winner while using less information, and on Manhattan and San Francisco datasets.
    Comment: preprint version of a paper submitted to IEEE Transactions on Intelligent Transportation System

    Application of Text Summarization techniques to the Geographical Information Retrieval task

    Automatic Text Summarization has been shown to be useful for Natural Language Processing tasks such as Question Answering or Text Classification, and for related fields of computer science such as Information Retrieval. Since Geographical Information Retrieval can be considered an extension of the Information Retrieval field, summary generation could be integrated into these systems as an intermediate stage whose purpose is to reduce document length. In this manner, the access time for information searching would be improved while relevant documents would still be retrieved. Therefore, in this paper we propose the generation of two types of summaries (generic and geographical), applying several compression rates in order to evaluate their effectiveness in the Geographical Information Retrieval task. The evaluation has been carried out using GeoCLEF as the evaluation framework and following an Information Retrieval perspective, without considering the geo-reranking phase commonly used in these systems. Although single-document summarization has not performed well in general, the slight improvements obtained for some types of the proposed summaries, particularly those based on geographical information, lead us to believe that the integration of Text Summarization with Geographical Information Retrieval may be beneficial. Consequently, the experimental set-up developed in this research work serves as a basis for further investigations in this field.
    This work has been partially funded by the European Commission under the Seventh (FP7-2007-2013) Framework Programme for Research and Technological Development through the FIRST project (FP7-287607). It has also been partially supported by a grant from the Fondo Europeo de Desarrollo Regional (FEDER), projects TEXT-MESS 2.0 (TIN2009-13391-C04-01) and TEXT-COOL 2.0 (TIN2009-13391-C04-02) from the Spanish Government, a Grant from the Valencian Government, project "Desarrollo de Técnicas Inteligentes e Interactivas de Minería de Textos" (PROMETEO/2009/119), and a Grant No. ACOMP/2011/001

    The impact of location of the uptake of telephone based healthcare

    Telephone healthcare systems have been put forward as a key strategy to overcome geographical disadvantage; however, evidence has suggested that usage decreases with increasing rurality. This research aimed to identify geographical areas of high and low usage of NHS Direct, a leading telephone healthcare provider worldwide, to determine whether usage is influenced by rurality. National call data was collected (January, 2011) from the NHS Direct Clinical Assessment System for all 0845 4647 calls in England, UK (N=360,137). Data extracted for analysis included: unit postcode of patient, type of call, date of call, time of call and final disposition. Calls were mapped in GIS mapping software using the full postcode and aggregated by population estimate by local authority to determine confidence intervals across two thresholds by call rate. Uptake rates by Output Area Classification (OAC) group profile were analysed using the chi-square goodness-of-fit test. The majority of calls were ‘symptomatic’ (N=280,055; 74.8%), i.e. calls that were triaged by an expert nurse, with the remaining 25.2% of calls being for health/medicine information only (N=94,430). NHS Direct was able to manage 43.5% of all calls made (N=99,367) through self-care advice and health information, with no onward referral needed. Call rates were highest in more urbanised areas, with significantly higher usage found in larger cities. Lower observed usage was found in more rural areas, which were characterised by above-average older populations. This was supported by geo-segmentation, which highlighted that rural and older communities had the lowest expected uptake rate. Usage of NHS Direct varies with rurality, which suggests that this type of service has not been successful in reducing barriers to access. However, geographical variations are likely to be influenced by age. There is a need for exploratory research to determine the underlying factors that contribute to variation in uptake of these services, particularly among older people who reside in rural communities. This will have worldwide implications for how telephone-based healthcare is introduced

    Newsmap: semi-supervised approach to geographical news classification

    This paper presents the results of an evaluation of three different types of geographical news classification methods: (1) simple keyword matching, a popular method in media and communications research; (2) geographical information extraction systems equipped with named-entity recognition and place name disambiguation mechanisms (Open Calais and Geoparser.io); (3) a semi-supervised machine learning classifier developed by the author (Newsmap). Newsmap substitutes dictionary-based labelling for manual coding of news stories in the creation of large training sets, so that it extracts a large number of geographical words without human involvement, and it also identifies multi-word names to reduce the ambiguity of the geographical traits fully automatically. The evaluation of the classification accuracy of the three types of methods against 5,000 human-coded news summaries reveals that Newsmap outperforms the geographical information extraction systems in overall accuracy, while simple keyword matching suffers from place-name ambiguity in countries with ambiguous place names
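
To make the baseline in item (1) concrete, the following is a minimal sketch of simple keyword matching only; it is not Newsmap and not either of the extraction systems named above. The tiny `GAZETTEER` dictionary and the `classify_by_keywords` function are hypothetical names introduced here for illustration.

```python
import re
from collections import Counter

# Hypothetical toy dictionary: country code -> place-name keywords.
# A real study would use a far larger, curated list.
GAZETTEER = {
    "FR": ["france", "paris"],
    "US": ["united states", "washington", "georgia"],
    "GE": ["georgia", "tbilisi"],
}


def classify_by_keywords(text, gazetteer=GAZETTEER):
    """Return the country codes with the most keyword hits (ties kept)."""
    lowered = text.lower()
    scores = Counter()
    for code, keywords in gazetteer.items():
        for keyword in keywords:
            # Whole-word match so that "paris" does not match "comparison".
            pattern = r"\b" + re.escape(keyword) + r"\b"
            scores[code] += len(re.findall(pattern, lowered))
    best = max(scores.values(), default=0)
    if best == 0:
        return []  # no place name found
    return sorted(code for code, hits in scores.items() if hits == best)
```

In this sketch a story that mentions only "Georgia" is returned with both `GE` and `US`, which illustrates the kind of place-name ambiguity the abstract attributes to keyword matching.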

    Analytical modelling of positional and thematic uncertainties in the integration of remote sensing and geographical information systems

    This paper describes three aspects of uncertainty in geographical information systems (GIS) and remote sensing. First, the positional uncertainty of an area object in a GIS is discussed as a function of positional uncertainties of line segments and boundary line features. Second, the thematic uncertainty of a classified remote sensing image is described using the probability vectors from a maximum likelihood classification. Third, the 'S-band' model is used to quantify uncertainties after combining GIS and remote sensing data.
    Department of Land Surveying and Geo-Informatic

    A Fuzzy Entropy-Based Thematic Classification Method Aimed at Improving the Reliability of Thematic Maps in GIS Environments

    Thematic maps of spatial data are constructed by using standard thematic classification methods that do not allow management of the uncertainty of classification and, consequently, evaluation of the reliability of the resulting thematic map. We propose a novel fuzzy-based thematic classification method applied to construct thematic maps in Geographical Information Systems. An initial fuzzy partition of the domain of the features of the spatial dataset is constructed using triangular fuzzy numbers; our method finds an optimal fuzzy partition by evaluating the fuzziness of the fuzzy sets with a fuzzy entropy measure. An assessment of the reliability of the final thematic map is performed according to the fuzziness of the fuzzy sets. We implement our method on a GIS framework, testing it on various vector and image spatial datasets. The results of these tests confirm that our thematic classification method provides thematic maps with a higher reliability than that obtained through fuzzy partitions constructed by expert users
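
As a rough illustration of the two ingredients named in the abstract, the sketch below evaluates a triangular membership function and a fuzzy entropy score. The abstract does not say which entropy measure the authors use, so the normalised De Luca-Termini entropy here is an assumption, and the function names are hypothetical.

```python
import math


def triangular(x, a, b, c):
    """Membership of x in the triangular fuzzy number (a, b, c), with a < b < c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)  # rising edge, reaches 1.0 at x == b
    return (c - x) / (c - b)      # falling edge


def fuzzy_entropy(memberships):
    """Normalised De Luca-Termini entropy (assumed measure), in [0, 1].

    0 means the set is crisp (all memberships 0 or 1);
    1 means it is maximally fuzzy (all memberships 0.5).
    """
    def term(mu):
        if mu <= 0.0 or mu >= 1.0:
            return 0.0
        return -(mu * math.log(mu) + (1.0 - mu) * math.log(1.0 - mu))

    if not memberships:
        return 0.0
    return sum(term(mu) for mu in memberships) / (len(memberships) * math.log(2))
```

Under this assumed measure, a partition whose fuzzy sets score closer to 0 would be read as less fuzzy and therefore more reliable.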

    Decision Tree Classification of Spatial Data Streams Using Peano Trees of classification

    Many organizations have large quantities of spatial data collected in various application areas, including remote sensing, geographical information systems (GIS), astronomy, computer cartography, environmental assessment and planning, etc.  These data collections are growing rapidly and can therefore be considered as spatial data streams.  For data stream classification, time is a major issue.  However, these spatial data sets are too large to be classified effectively in a reasonable amount of time using existing methods.  In this paper, we developed a new method for decision tree classification on spatial data streams using a data structure called Peano Count Tree (P-tree).  The Peano Count Tree is a spatial data organization that provides a lossless compressed representation of a spatial data set and facilitates efficient classification and other data mining techniques.  Using P-tree structure, fast calculation of measurements, such as information gain, can be achieved.  We compare P-tree based decision tree induction classification and a classical decision tree induction method with respect to the speed at which the classifier can be built (and rebuilt when substantial amounts of new data arrive).  Experimental results show that the P-tree method is significantly faster than existing classification methods, making it the preferred method for mining on spatial data streams
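
The information gain mentioned above is the quantity a decision tree learner evaluates when choosing a split. The sketch below computes it directly from label lists using the standard entropy-based definition; it does not implement the P-tree structure, which the abstract describes as a way to obtain the underlying counts quickly. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(labels, attribute_values):
    """Entropy reduction from splitting `labels` by `attribute_values`."""
    n = len(labels)
    parent = entropy(list(Counter(labels).values()))

    # Group the class labels by the value of the candidate attribute.
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)

    # Weighted average entropy of the child nodes after the split.
    children = sum(
        (len(group) / n) * entropy(list(Counter(group).values()))
        for group in groups.values()
    )
    return parent - children
```

An attribute that separates the classes perfectly yields a gain equal to the parent entropy, while an attribute unrelated to the class yields a gain of zero.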

    Classification of Aquifers

    This dissertation contains three papers describing an approach to classifying aquifers and groundwater systems. The three papers bring together the development of a basin-scale groundwater classification system that integrates the literature, data gathering, and data analysis and testing. The classification system is a comprehensive method designed to improve interdisciplinary communication and standardize how groundwater systems are compared in watersheds across the west and potentially beyond. Aquifers and groundwater systems can be classified using a variety of independent methods to characterize geologic and hydraulic properties, the degree of connection with surface water, and geochemical conditions. In light of a growing global demand for water associated with population growth, land development, and the expected effects of climate change, a standardized approach for classifying groundwater systems at the watershed scale is needed. To this end, a comprehensive classification system is developed that combines recognized methods and new approaches into one system. The purpose of this approach is to provide groundwater professionals, policy makers, and watershed managers with a widely applicable classification system that reduces complex and sometimes cumbersome groundwater databases and analyses to straightforward graphical representations. The proposed classification system uses basin geology, aquifer productivity, threats and impacts posed by humans, water quality, and the degree of groundwater/surface water exchange as classification criteria. The approach is based on literature values, reference databases, and basic hydrologic and hydrogeologic principles. The proposed classification system treats data set completeness as a variable and includes a tiered assessment protocol that depends on the quality and quantity of data. In addition, it assembles and catalogs groundwater information using a consistent set of nomenclature. It is designed to analyze and display results using Geographical Information System (GIS) mapping tools, while standardizing descriptions of groundwater conditions, and to support resource managers as they make land use decisions at the watershed scale. Together, the three papers describe a method for comparing and contrasting aquifer properties and systems needed by watershed managers. It is argued that the proposed methodology is needed to assist managers and planners in understanding the role of aquifers in watersheds, as well as for the broad multi-basin comparison of aquifer data. The classification method does not replace current standard practices traditionally used to assess or characterize aquifers and groundwater systems. However, it does provide a standard methodology by which existing and new hydrogeologic data can be organized, easily communicated, and broadly compared on a watershed scale of 1:100,000 to 1:250,000. It is believed this classification system will promote an improved technical understanding between groundwater professionals and natural resource managers. Three appendices are included in this dissertation. The appendices provide supporting information for the three papers and results for four case studies

    An approach to cork oak forest management planning: a case study in southwestern Portugal

    This paper presents results of research aiming at the development of tools that may enhance cork oak (Quercus suber L.) forest management planning. Specifically, it proposes a hierarchical approach that encompasses the spatial classification of a cork oak forest and the temporal scheduling of cork harvests. The use of both geographical information systems and operations research techniques is addressed. Emphasis is on the achievement of cork even flow objectives. Results from an application to a case study in the Charneca Pliocénica of Ribatejo in southern Portugal, encompassing a cork oak forest extending over 4.8 thousand ha, are discussed. They suggest that the proposed approach is capable of effective spatial classification of cork oak management units. They further suggest that it may be used to select optimal cork even flow scheduling strategies. Results also show that the proposed approach may lead to a substantial increase in net present value when compared to traditional approaches to cork oak forest management planning