1,668 research outputs found

    Context based detection of urban land use zones

    This dissertation proposes an automated land-use zoning system based on the context of an urban scene. Automated zoning is an important step toward improving object extraction in urban scenes.

    Incorporating plant community structure in species distribution modelling: a species co-occurrence based composite approach

    Species distribution models (SDMs) built from remotely sensed (RS) imagery are widely used in ecological studies and conservation planning, but their performance is frequently limited by factors including small plant size, small numbers of observations, and scattered distribution patterns. The focus of my thesis was to develop and evaluate alternative SDM methodologies that deal with such challenges. I used occurrence records of nine endemic species from the Athabasca Sand Dunes in northern Saskatchewan to assess five modelling algorithms, including modern regression and machine learning techniques, and to understand how species distribution characteristics influence model prediction accuracy. All modelling algorithms showed robust performance (AUC > 0.5), with the best performance in most cases from generalized linear models (GLMs). For presence-absence analysis, actively selecting the optimal threshold proved better than the standard high-threshold approach, which can deliver predictions inconsistent with observed patterns of occurrence frequency. The development of the composite-SDM framework used small-scale plant occurrence data and UAV imagery from Kernen Prairie, a remnant Fescue prairie in Saskatoon, Saskatchewan. Evaluating the effectiveness of the five algorithms showed that each method could handle a wide range of low- to high-frequency species, with strong GLM performance irrespective of the species distribution pattern. Notably, although GLM is computationally efficient, it does not trade accuracy for simplicity. Incorporating plant community structure using image clustering methods produced similar accuracy patterns, indicating limited advantage from high-resolution imagery.
The study found that for high-frequency species, prediction accuracy can decline to the levels expected for low-frequency species. Higher prediction confidence was often observed for low-frequency species when they occupied habitat that was visually and spectrally distinct from the surroundings, in contrast to species widespread across different grassland habitats, where distinct spectral signatures were lacking. The study provides substantial evidence that optimal algorithmic performance is tied to a balanced number of presences and absences in the data. The co-occurrence analysis also revealed that significant co-occurrence patterns are most common at moderate occurrence frequencies. The research does not indicate any consistent accuracy difference between baseline direct-reflectance models and the composite-SDM framework; although accuracy changes were marginal, the composite-SDM framework can still influence the associated Type I and Type II error rates of the classification.
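The GLM fitting, AUC screening, and active threshold selection that the abstract describes can be sketched roughly as follows. This is a minimal illustration on synthetic data; the predictors, sample sizes, and the TSS criterion used for threshold selection are assumptions for the example, not details taken from the thesis.

```python
# Sketch of a GLM-based presence/absence workflow: fit a binomial GLM
# (logistic regression), check AUC, then actively pick a threshold instead
# of defaulting to a fixed high cut-off. All data here is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic "reflectance" predictors and species presence/absence labels.
X = rng.normal(size=(500, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

glm = LogisticRegression().fit(X, y)   # binomial GLM with a logit link
probs = glm.predict_proba(X)[:, 1]

auc = roc_auc_score(y, probs)          # "robust" in the abstract means AUC > 0.5

def tss(th):
    """True Skill Statistic = sensitivity + specificity - 1 at threshold th."""
    pred = probs >= th
    sens = (pred & (y == 1)).sum() / (y == 1).sum()
    spec = (~pred & (y == 0)).sum() / (y == 0).sum()
    return sens + spec - 1

# Actively select the threshold that maximises TSS over a candidate grid.
best_threshold = max(np.linspace(0.05, 0.95, 19), key=tss)
```

On this synthetic signal the learned threshold lands near the true class balance rather than at an arbitrary high value, which is the point the abstract makes about threshold selection.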
    Variational methods and its applications to computer vision

    Many computer vision applications, such as image segmentation, can be formulated in a "variational" way as energy minimization problems. Unfortunately, minimizing these energies is usually difficult: it generally involves non-convex functions in a space with thousands of dimensions, and the associated combinatorial problems are often NP-hard. Furthermore, these are ill-posed inverse problems and therefore extremely sensitive to perturbations (e.g. noise). For this reason, computing a physically reliable approximation from noisy data requires incorporating appropriate regularizations into the mathematical model, which in turn demand complex computations. The main aim of this work is to describe variational segmentation methods that are particularly effective for curvilinear structures. Because of their complex geometry, classical regularization techniques cannot be adopted, as they lead to the loss of most low-contrast details. In contrast, the proposed method not only better preserves curvilinear structures, but also reconnects parts that may have been disconnected by noise. Moreover, it can easily be extended to graphs and successfully applied to different types of data, such as medical imagery (e.g. vessels, heart coronaries), material samples (e.g. concrete) and satellite signals (e.g. streets, rivers). In particular, we show results and performance figures for an implementation targeting a new generation of High Performance Computing (HPC) architectures in which different types of coprocessors cooperate. The dataset consists of approximately 200 images of cracks, captured in three different tunnels by a robotic machine designed for the European ROBO-SPECT project.
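The energy-minimization-plus-regularization idea above can be illustrated with a toy one-dimensional total-variation (ROF-style) denoiser. This is only a sketch of the general variational principle, not the thesis's curvature-aware regularizer for curvilinear structures; the energy, step size, and smoothing parameter are all illustrative choices.

```python
# Minimise E(u) = sum (u - f)^2 + lam * sum |u[i+1] - u[i]|  by gradient
# descent on a smoothed version of the absolute value. The TV term is the
# "regularization" that makes the ill-posed denoising problem stable.
import numpy as np

def tv_denoise_1d(f, lam=1.0, step=0.05, iters=500, eps=1e-6):
    """Gradient descent on a smoothed total-variation energy."""
    u = f.copy()
    for _ in range(iters):
        d = np.diff(u)
        w = d / np.sqrt(d * d + eps)   # derivative of the smoothed |d|
        tv_grad = np.zeros_like(u)
        tv_grad[:-1] -= w              # contribution through u[i]
        tv_grad[1:] += w               # contribution through u[i+1]
        u -= step * (2 * (u - f) + lam * tv_grad)
    return u

rng = np.random.default_rng(1)
signal = np.repeat([0.0, 1.0, 0.0], 50)                 # piecewise-constant "structure"
noisy = signal + rng.normal(scale=0.2, size=signal.size)
clean = tv_denoise_1d(noisy, lam=2.0)
```

The TV penalty suppresses noise while keeping the two jumps in the signal, a simple analogue of preserving low-contrast structure instead of blurring it away.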

    A geographic knowledge discovery approach to property valuation

    This thesis investigates how knowledge discovery can be applied in the area of Geographic Information Science. In particular, it explores its application to property valuation, in order to reveal how different spatial entities and their interactions affect property prices. The approach is entirely data driven and does not require prior knowledge of the area to which it is applied. To demonstrate the process, a prototype system has been designed and implemented. It employs association rule mining and associative classification algorithms to uncover existing inter-relationships and perform the valuation. Various algorithms for these tasks have been proposed in the literature. The algorithm developed in this work is based on the Apriori algorithm, extended with an implementation of a ‘Best Rule’ classification scheme based on the Classification Based on Associations (CBA) algorithm. For modelling geographic relationships, a graph-theoretic approach has been employed. Graphs have been widely used as modelling tools within the geography domain, primarily for the investigation of network-type systems. In the current context, the graph reflects topological and metric relationships between the spatial entities, depicting general spatial arrangements. An efficient graph search algorithm has been developed, based on Dijkstra's shortest path algorithm, that enables the investigation of relationships between spatial entities beyond first-degree connectivity. A case study with data from three central London boroughs has been performed to validate the methodology and algorithms, and to demonstrate their effectiveness for computer-aided property valuation. In addition, the case study examined the influence of location on the value of properties in those boroughs.
The results are encouraging, as they demonstrate the effectiveness of the proposed methodology and algorithms, provided that the data is appropriately pre-processed and of high quality.
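The Dijkstra-based search that lets the system look beyond first-degree connectivity can be sketched as a bounded shortest-path expansion. The entity names, edge weights, and distance bound below are invented for illustration; only the algorithmic idea comes from the abstract.

```python
# Dijkstra expansion over a graph of spatial entities, returning every
# entity reachable within a cost bound, so relationships beyond direct
# neighbours can be mined. The toy graph below is purely illustrative.
import heapq

def entities_within(graph, source, max_cost):
    """Return {entity: shortest cost} for entities reachable within max_cost."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist[node]:
            continue                          # stale queue entry, skip
        for nbr, w in graph.get(node, []):
            new = cost + w
            if new <= max_cost and new < dist.get(nbr, float("inf")):
                dist[nbr] = new
                heapq.heappush(heap, (new, nbr))
    return dist

# Toy spatial graph: entity -> [(neighbouring entity, separation), ...]
g = {
    "property": [("park", 1.0), ("school", 2.5)],
    "park": [("station", 1.5)],
    "school": [("station", 0.5)],
}
near = entities_within(g, "property", 3.0)   # includes second-degree entities
```

Here "station" is found at cost 2.5 through "park", a second-degree relationship that a first-degree adjacency check would miss.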

    Exploring the topical structure of short text through probability models : from tasks to fundamentals

    Recent technological advances have radically changed the way we communicate. Today's communication is ubiquitous, and it has fostered the need for information that is easier to create, spread and consume. As a consequence, we have experienced the shortening of text messages in mediums ranging from electronic mail and instant messaging to microblogging. Moreover, the ubiquity and fast-paced nature of these mediums have promoted their use for previously unthinkable tasks. For instance, reporting real-world events was classically carried out by news reporters, but nowadays most interesting events are first disclosed on social networks like Twitter by eyewitnesses through short text messages. As a result, the exploitation of the thematic content in short text has captured the interest of both research and industry. Topic models are a type of probability model traditionally used to explore this thematic content, a.k.a. topics, in regular text. Most popular topic models fall into the sub-class of LVMs (Latent Variable Models), which include several latent variables at the corpus, document and word levels to summarise the topics at each level. However, classical LVM-based topic models struggle to learn semantically meaningful topics in short text because the lack of co-occurring words within a document hampers the estimation of the local latent variables at the document level. To overcome this limitation, pooling and hierarchical Bayesian strategies that leverage contextual information have been essential to improve the quality of topics in short text. In this thesis, we study the problem of learning semantically meaningful and predictive representations of text in two distinct phases:
    • In the first phase, Part I, we investigate the use of LVM-based topic models for the specific task of event detection in Twitter. In this situation, the use of contextual information to pool tweets together comes naturally.
Thus, we first extend an existing clustering algorithm for event detection to use the topics learned from pooled tweets. Then, we propose a probability model that integrates topic modelling and clustering to enable the flow of information between both components.
    • In the second phase, Part II and Part III, we challenge the use of local latent variables in LVMs, especially when the context of short messages is not available. First of all, we study the evaluation of the generalization capabilities of LVMs like PFA (Poisson Factor Analysis), whose likelihood is computationally intractable, and propose unbiased estimation methods to approximate it. With the most accurate method, we compare the generalization of chordal models without latent variables to that of PFA topic models in short and regular text collections.
In summary, we demonstrate that by integrating clustering and topic modelling, the performance of event detection techniques in Twitter is improved thanks to the interaction between both components. Moreover, we develop several unbiased likelihood estimation methods for assessing the generalization of PFA, and we empirically validate their accuracy on different document collections. Finally, we show that we can learn chordal models without latent variables in text through Chordalysis, and that they can be a competitive alternative to classical topic models, especially in short text.
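The tweet-pooling step from Part I, where short messages sharing a contextual key are merged into one pseudo-document before topic modelling so the model sees enough word co-occurrence, can be sketched as below. LDA here stands in for the thesis's own models, and the tweets, hashtags, and pooling key are invented examples.

```python
# Pool tweets by a shared context key (here, a hashtag) into pseudo-documents,
# then fit a topic model on the pooled corpus. Short texts alone give the
# model too little co-occurrence to estimate per-document topic mixtures.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

tweets = [
    ("#quake", "buildings shaking downtown right now"),
    ("#quake", "strong earthquake felt across the city"),
    ("#final", "what a goal in the final minute"),
    ("#final", "the match final was incredible tonight"),
]

# Pooling: concatenate every tweet that shares a hashtag.
pools = {}
for tag, text in tweets:
    pools[tag] = pools.get(tag, "") + " " + text
docs = list(pools.values())

vec = CountVectorizer(stop_words="english")
counts = vec.fit_transform(docs)          # one row per pseudo-document
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
topic_mix = lda.transform(counts)         # topic mixture per pooled document
```

Event-detection clustering can then operate on these pooled topic mixtures rather than on the sparse individual tweets, which is the design choice the abstract motivates.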