Aerodynamic roughness variation with vegetation: analysis in a suburban neighbourhood and a city park
Local aerodynamic roughness parameters (the zero-plane displacement, zd, and the aerodynamic roughness length, z0) are determined for an urban park and a suburban neighbourhood with a new morphometric parameterisation that includes vegetation. Inter-seasonal analysis at the urban park demonstrates that zd determined with two anemometric methods is responsive to vegetation state and is 1–4 m greater during leaf-on periods. The seasonal change and directional variability in the magnitude of zd are reproduced by the morphometric methods, which also indicate that z0 can be more than halved during leaf-on periods. In the suburban neighbourhood during leaf-on, the anemometric and morphometric methods show similar directional variability for both zd and z0. Wind speeds at approximately three times the average roughness-element height are estimated most accurately when using a morphometric method that considers roughness-element height variability. Inclusion of vegetation in the morphometric parameterisation improves wind-speed estimation in all cases. Results indicate that the influence of both vegetation and roughness-element height variability is important for accurate determination of local aerodynamic parameters and the associated wind-speed estimation.
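The wind-speed estimation the abstract refers to is typically done with the neutral logarithmic wind profile, in which zd and z0 enter directly. A minimal sketch follows; the friction velocity and the leaf-on/leaf-off roughness values below are illustrative placeholders, not values from the study:

```python
import math

def log_law_wind_speed(z, u_star, z_d, z_0, kappa=0.4):
    """Neutral-stability logarithmic wind profile above a rough surface.

    z      : measurement height above ground (m), must exceed z_d + z_0
    u_star : friction velocity (m/s)
    z_d    : zero-plane displacement (m)
    z_0    : aerodynamic roughness length (m)
    kappa  : von Karman constant
    """
    if z <= z_d + z_0:
        raise ValueError("height must lie above the roughness sublayer")
    return (u_star / kappa) * math.log((z - z_d) / z_0)

# Illustrative only: leaf-on raises z_d and roughly halves z_0 at the same site,
# which changes the estimated wind speed at a fixed height.
leaf_off = log_law_wind_speed(z=30.0, u_star=0.5, z_d=6.0, z_0=1.0)
leaf_on  = log_law_wind_speed(z=30.0, u_star=0.5, z_d=8.0, z_0=0.5)
```

This illustrates why getting zd and z0 right matters: with the same friction velocity, the estimated speed at 30 m differs appreciably between the two vegetation states.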
Context based detection of urban land use zones
This dissertation proposes an automated land-use zoning system based on the context of an urban scene. Automated zoning is an important step toward improving object extraction in urban scenes.
Incorporating plant community structure in species distribution modelling: a species co-occurrence based composite approach
Species distribution models (SDMs) built with remotely sensed (RS) imagery are widely used in ecological studies and conservation planning, but their performance is frequently limited by factors including small plant size, small numbers of observations, and scattered distribution patterns. The focus of my thesis was to develop and evaluate alternative SDM methodologies to deal with such challenges. I used a record of nine endemic species occurrences from the Athabasca Sand Dunes in northern Saskatchewan to assess five modelling algorithms, including modern regression and machine learning techniques, to understand how species distribution characteristics influence model prediction accuracy. All modelling algorithms showed robust performance (AUC > 0.5), with the best performance in most cases from generalized linear models (GLMs). The threshold-selection analysis for presence-absence prediction highlights that actively selecting the optimum threshold is preferable to the standard high-threshold approach, as the latter can deliver predictions inconsistent with observed patterns of occurrence frequency. The development of the composite-SDM framework used small-scale plant occurrence data and UAV imagery from Kernen Prairie, a remnant Fescue prairie in Saskatoon, Saskatchewan. The evaluation of the five algorithms showed that each method was capable of handling a wide range of low- to high-frequency species, with strong GLM performance irrespective of the species distribution pattern. It is worth highlighting that, although the GLM is computationally efficient, the method does not trade accuracy for simplicity. The inclusion of plant community structure using image-clustering methods produced similar accuracy patterns, indicating limited advantages of using high-resolution images. The study also found that, for high-frequency species, prediction accuracy can decline to levels expected for low-frequency species.
Higher prediction confidence was often observed for low-frequency species when the species occurred in a habitat that was visually and spectrally distinct from its surroundings. This contrasts with species widespread across different grassland habitats, for which distinct spectral signatures were lacking. The study provides substantial evidence that optimal algorithmic performance is tied to a balanced number of presences and absences in the data. The co-occurrence analysis also revealed that significant co-occurrence patterns are most common at moderate species occurrence frequencies. The research does not indicate any consistent accuracy changes between baseline direct-reflectance models and the composite-SDM framework. Although accuracy changes were marginal with the composite-SDM framework, the method is capable of influencing the associated type I and type II error rates of the classification.
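The threshold-selection point above can be illustrated with a small sketch: instead of a fixed high cut-off, choose the presence/absence threshold that maximises the true skill statistic (TSS = sensitivity + specificity − 1). The predicted probabilities and labels below are invented toy values, not data from the thesis:

```python
def rates(probs, labels, thr):
    """Sensitivity and specificity of presence/absence predictions at a threshold."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= thr and y == 1)
    fn = sum(1 for p, y in zip(probs, labels) if p < thr and y == 1)
    tn = sum(1 for p, y in zip(probs, labels) if p < thr and y == 0)
    fp = sum(1 for p, y in zip(probs, labels) if p >= thr and y == 0)
    return tp / (tp + fn), tn / (tn + fp)

def best_threshold(probs, labels, candidates):
    """Pick the candidate threshold maximising TSS = sensitivity + specificity - 1."""
    return max(candidates, key=lambda t: sum(rates(probs, labels, t)) - 1)

# Toy model output: balanced presences (1) and absences (0).
probs  = [0.9, 0.7, 0.4, 0.35, 0.3, 0.2, 0.1, 0.05]
labels = [1,   1,   1,   1,    0,   0,   0,   0]
thr = best_threshold(probs, labels, [0.3, 0.5, 0.7])
```

On this toy data the optimum threshold is below the conventional 0.5 cut-off, which would miss half the presences.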
Variational methods and their applications to computer vision
Many computer vision applications, such as image segmentation, can be formulated in a ''variational'' way as energy minimization problems. Unfortunately, the computational task of minimizing these energies is usually difficult, as it generally involves non-convex functions in a space with thousands of dimensions, and the associated combinatorial problems are often NP-hard. Furthermore, they are ill-posed inverse problems and therefore extremely sensitive to perturbations (e.g. noise). For this reason, in order to compute a physically reliable approximation from given noisy data, it is necessary to incorporate into the mathematical model appropriate regularizations that require complex computations.
The main aim of this work is to describe variational segmentation methods that are particularly effective for curvilinear structures. Due to their complex geometry, classical regularization techniques cannot be adopted because they lead to the loss of most low-contrast details. In contrast, the proposed method not only better preserves curvilinear structures, but also reconnects parts that may have been disconnected by noise. Moreover, it can easily be extended to graphs and successfully applied to different types of data, such as medical imagery (e.g. vessels, heart coronaries), material samples (e.g. concrete) and satellite signals (e.g. streets, rivers). In particular, we will show results and performance for an implementation targeting a new generation of High Performance Computing (HPC) architectures in which different types of coprocessors cooperate. The dataset involved consists of approximately 200 images of cracks, captured in three different tunnels by a robotic machine designed for the European ROBO-SPECT project.
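The kind of regularised energy minimisation described above can be sketched on a toy problem: denoising a 1-D signal with a smoothed total-variation penalty, minimised by plain gradient descent. This is a generic TV example under invented parameter values, not the thesis's curvilinear-structure method:

```python
import math

def tv_denoise_1d(f, lam=0.5, eps=1e-2, step=0.05, iters=2000):
    """Minimise E(u) = sum_i (u_i - f_i)^2 + lam * sum_i sqrt((u_{i+1}-u_i)^2 + eps)
    by plain gradient descent; eps smooths the non-differentiable TV term."""
    u = list(f)
    for _ in range(iters):
        # gradient of the data-fidelity term
        g = [2.0 * (u[i] - f[i]) for i in range(len(u))]
        # gradient of the smoothed total-variation term
        for i in range(len(u) - 1):
            d = u[i + 1] - u[i]
            t = lam * d / math.sqrt(d * d + eps)
            g[i] -= t       # pulls u[i] toward u[i+1]
            g[i + 1] += t   # pulls u[i+1] toward u[i]
        u = [ui - step * gi for ui, gi in zip(u, g)]
    return u

noisy = [0.0, 0.1, -0.1, 1.1, 0.9, 1.0]   # a noisy step edge
clean = tv_denoise_1d(noisy)
```

The TV penalty flattens the noise within each plateau while largely preserving the jump, which is the edge-preserving behaviour that plain quadratic regularisation lacks.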
A geographic knowledge discovery approach to property valuation
This thesis investigates how knowledge discovery can be applied in the area of Geographic Information Science. In particular, its application to property valuation is explored, in order to reveal how different spatial entities and their interactions affect the price of properties. The approach is entirely data-driven and does not require prior knowledge of the area to which it is applied.
To demonstrate this process, a prototype system has been designed and implemented. It employs association rule mining and associative classification algorithms to uncover any existing inter-relationships and perform the valuation. Various algorithms that perform these tasks have been proposed in the literature. The algorithm developed in this work is based on the Apriori algorithm. It has, however, been extended with an implementation of a 'Best Rule' classification scheme based on the Classification Based on Associations (CBA) algorithm.
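A minimal sketch of the level-wise Apriori idea follows (it omits the classic subset-pruning step, so it is a simplification, not the thesis's extended algorithm). The property-attribute transactions and the rule whose confidence is computed are invented for illustration:

```python
def apriori(transactions, min_support):
    """Minimal Apriori: frequent itemsets by level-wise candidate generation.
    Note: omits the subset-infrequency prune of the full algorithm."""
    items = sorted({i for t in transactions for i in t})
    freq, k = {}, 1
    current = [frozenset([i]) for i in items]
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        freq.update(level)
        k += 1
        # join step: merge frequent (k-1)-itemsets into size-k candidates
        keys = list(level)
        current = list({a | b for a in keys for b in keys if len(a | b) == k})
    return freq

# Hypothetical property records as attribute sets.
tx = [frozenset(t) for t in [
    {"detached", "garden", "high_price"},
    {"detached", "garden", "high_price"},
    {"detached", "high_price"},
    {"terraced", "garden"},
    {"terraced"},
]]
freq = apriori(tx, min_support=2)
# Confidence of the rule {detached} -> {high_price}, the kind of score a
# 'Best Rule' classifier would rank rules by.
conf = freq[frozenset({"detached", "high_price"})] / freq[frozenset({"detached"})]
```

Here every detached property in the toy data is high-priced, so the rule has confidence 1.0 and would rank highly in a CBA-style scheme.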
For the modelling of geographic relationships, a graph-theoretic approach has been employed. Graphs have been widely used as modelling tools within the geography domain, primarily for the investigation of network-type systems. In the current context, the graph reflects topological and metric relationships between the spatial entities, depicting general spatial arrangements. An efficient graph search algorithm has been developed, based on the Dijkstra shortest-path algorithm, that enables the investigation of relationships between spatial entities beyond first-degree connectivity.
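The shortest-path search underlying this approach can be sketched with a standard textbook Dijkstra over a weighted adjacency dictionary (this is the base algorithm, not the thesis's extended version; the spatial entities and edge weights below are hypothetical):

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path distances from source over a weighted adjacency dict,
    letting us examine entities beyond first-degree connectivity."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical spatial entities: a property, its street, and amenities
# reachable only at second degree.
g = {
    "property": {"street": 1.0},
    "street": {"property": 1.0, "park": 2.0, "station": 3.0},
    "park": {"street": 2.0},
    "station": {"street": 3.0},
}
d = dijkstra(g, "property")
```

From the property, the park and station are second-degree neighbours, yet the search still yields their distances, which is exactly what lets the valuation consider spatial entities beyond immediate adjacency.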
A case study with data from three central London boroughs has been performed to validate the methodology and algorithms, and to demonstrate their effectiveness for computer-aided property valuation. In addition, through the case study, the influence of location on the value of properties in those boroughs has been examined. The results are encouraging, as they demonstrate the effectiveness of the proposed methodology and algorithms, provided that the data is appropriately pre-processed and of high quality.
Exploring the topical structure of short text through probability models: from tasks to fundamentals
Recent technological advances have radically changed the way we communicate. Today's communication has become ubiquitous, and it has fostered the need for information that is easier to create, spread and consume. As a consequence, we have experienced the shortening of text messages in media ranging from electronic mail and instant messaging to microblogging. Moreover, the ubiquity and fast-paced nature of these media have promoted their use for previously unthinkable tasks. For instance, reporting real-world events was classically carried out by news reporters, but nowadays most interesting events are first disclosed on social networks like Twitter by eyewitnesses through short text messages. As a result, the exploitation of the thematic content in short text has captured the interest of both research and industry.
Topic models are a type of probability model that has traditionally been used to explore this thematic content, a.k.a. topics, in regular text. The most popular topic models fall into the sub-class of LVMs (Latent Variable Models), which include several latent variables at the corpus, document and word levels to summarise the topics at each level. However, classical LVM-based topic models struggle to learn semantically meaningful topics in short text because the lack of co-occurring words within a document hampers the estimation of the local latent variables at the document level. To overcome this limitation, pooling and hierarchical Bayesian strategies that leverage contextual information have been essential to improve the quality of topics in short text.
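The pooling strategy mentioned above can be sketched very simply: aggregate tweets that share a piece of context (here, a hashtag) into one pseudo-document, so that word co-occurrence statistics become dense enough for topic modelling. This is a generic illustration with made-up tweets, not the thesis's exact pipeline:

```python
from collections import defaultdict

def pool_by_hashtag(tweets):
    """Aggregate short tweets into longer pseudo-documents, one per hashtag,
    making word co-occurrence statistics usable by a topic model."""
    pools = defaultdict(list)
    for text in tweets:
        tokens = text.lower().split()
        tags = [t for t in tokens if t.startswith("#")] or ["#none"]
        words = [t for t in tokens if not t.startswith("#")]
        for tag in tags:  # a tweet with several hashtags joins several pools
            pools[tag].extend(words)
    return dict(pools)

tweets = [
    "earthquake felt downtown #quake",
    "strong shaking reported #quake",
    "match kicks off tonight #cup",
]
pools = pool_by_hashtag(tweets)
```

Each pooled pseudo-document now mixes the vocabulary of several tweets about the same context, which is what restores the within-document co-occurrence that short messages individually lack.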
In this thesis, we study the problem of learning semantically meaningful and predictive representations of text in two distinct phases:
• In the first phase, Part I, we investigate the use of LVM-based topic models for the specific task of event detection in Twitter. In this situation, the use of contextual information to pool tweets together comes naturally. Thus, we first extend an existing clustering algorithm for event detection to use the topics learned from pooled tweets. Then, we propose a probability model that integrates topic modelling and clustering to enable the flow of information between both components.
• In the second phase, Part II and Part III, we challenge the use of local latent variables in LVMs, especially when the context of short messages is not available. First of all, we study the evaluation of the generalization capabilities of LVMs like PFA (Poisson Factor Analysis) and propose unbiased estimation methods to approximate it. With the most accurate method, we compare the generalization of chordal models without latent variables to that of PFA topic models in short and regular text collections.
In summary, we demonstrate that by integrating clustering and topic modelling, the performance of event detection techniques in Twitter is improved due to the interaction between both components. Moreover, we develop several unbiased likelihood estimation methods for assessing the generalization of PFA, and we empirically validate their accuracy in different document collections. Finally, we show that we can learn chordal models without latent variables in text through Chordalysis, and that they can be a competitive alternative to classical topic models, especially in short text.