The Detection of Forest Structures in the Monongahela National Forest Using LiDAR
The mapping of structural elements of a forest is important for forestry management, providing a baseline for old- and new-growth trees and height strata for a stand. These activities are important for the overall monitoring process, which aids in the understanding of anthropogenic and natural disturbances. Height information recorded for each discrete point is key to the creation of canopy height, canopy surface, and canopy cover models. The aim of this study is to assess whether LiDAR can be used to determine forest structures. Small-footprint, leaf-off LiDAR data were obtained for the Monongahela National Forest, West Virginia. This dataset was compared to Landsat imagery acquired for the same area. Each dataset underwent supervised classification and object-oriented segmentation with random forest classification. These approaches took into account derived variables, such as percentages of canopy height, canopy cover, stem density, and the normalized difference vegetation index, which were computed from the original datasets. Evaluation showed that classification of the Landsat data produced accuracies ranging between 31.3 and 50.2%, whilst the LiDAR dataset produced accuracies ranging from 54.7 to 80.1%. The results of this study support the potential of LiDAR to be used regularly as a forestry management technique and warrant future research.
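The classification step described above can be sketched in general terms: a random forest classifier is fit to per-segment derived variables (canopy height, canopy cover, stem density, NDVI) and evaluated on held-out data. This is a minimal illustrative sketch on synthetic data, assuming scikit-learn; the feature names, the toy labelling rule, and all values are assumptions, not the study's actual processing chain or results.

```python
# Hedged sketch: random forest classification of forest-structure classes
# from LiDAR-derived variables. All data here are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 600
# Synthetic per-segment predictors standing in for the derived variables.
X = np.column_stack([
    rng.uniform(0, 35, n),     # canopy height metric (m)        -- assumed
    rng.uniform(0, 1, n),      # canopy cover fraction           -- assumed
    rng.uniform(0, 800, n),    # stem density (stems/ha)         -- assumed
    rng.uniform(-0.1, 0.9, n), # NDVI                            -- assumed
])
# Toy rule for illustration only: call a segment "old growth" (1)
# when it is both tall and closed-canopy.
y = ((X[:, 0] > 20) & (X[:, 1] > 0.6)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
print(f"held-out accuracy: {acc:.2f}")
```

In practice the accuracy figures reported in the abstract would come from such a held-out (or cross-validated) evaluation, with one model per dataset (Landsat vs. LiDAR-derived predictors).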
Bayesian nonparametric models for name disambiguation and supervised learning
This thesis presents new Bayesian nonparametric models and approaches for their development,
for the problems of name disambiguation and supervised learning. Bayesian
nonparametric methods form an increasingly popular approach for solving problems
that demand a high amount of model flexibility. However, this field is relatively new,
and there are many areas that need further investigation. Previous work on Bayesian
nonparametrics has fully explored neither the problems of entity disambiguation and supervised learning nor the advantages of nested hierarchical models. Entity disambiguation
is a widely encountered problem where different references need to be linked
to a real underlying entity. This problem is often unsupervised as there is no previously
known information about the entities. Further to this, effective use of Bayesian
nonparametrics offers a new approach to tackling supervised problems, which are frequently encountered.
The main original contribution of this thesis is a set of new structured Dirichlet process
mixture models for name disambiguation and supervised learning that can also
have a wide range of applications. These models use techniques from Bayesian statistics,
including hierarchical and nested Dirichlet processes, generalised linear models,
Markov chain Monte Carlo methods and optimisation techniques such as BFGS. The
new models have tangible advantages over existing methods in the field as shown with
experiments on real-world datasets including citation databases and classification and
regression datasets.
I develop the unsupervised author-topic space model for author disambiguation, which, unlike traditional author disambiguation approaches, uses free text to perform disambiguation. The model incorporates a name variant model that is based on a nonparametric Dirichlet language model. It handles novel, unseen name variants and can model the unknown authors of the text of the documents. Through this, the model can disambiguate authors with no prior knowledge of the number of true authors in the dataset. In addition, it can do this when the authors have identical names.
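The property that makes this possible — not fixing the number of authors in advance — comes from the Dirichlet process prior. A minimal sketch of the mechanism, assuming the standard Chinese restaurant process view of a DP (this is a generic illustration, not the thesis model):

```python
# Illustrative sketch: a Chinese restaurant process draw, showing how a
# Dirichlet process prior lets the number of clusters (here, candidate
# authors) grow with the data rather than being fixed up front.
import random

def crp_partition(n_points, alpha, seed=0):
    """Sample a partition of n_points items from a CRP with concentration alpha."""
    rng = random.Random(seed)
    counts = []        # counts[k] = items already assigned to cluster k
    assignment = []
    for i in range(n_points):
        # Existing cluster k is chosen with probability counts[k]/(i+alpha);
        # a brand-new cluster with probability alpha/(i+alpha).
        weights = counts + [alpha]
        r = rng.uniform(0, sum(weights))
        acc, k = 0.0, 0
        for k, w in enumerate(weights):
            acc += w
            if r <= acc:
                break
        if k == len(counts):
            counts.append(1)   # open a new cluster
        else:
            counts[k] += 1
        assignment.append(k)
    return assignment

labels = crp_partition(100, alpha=2.0)
print("clusters used:", len(set(labels)))
```

Larger `alpha` yields more clusters on average; in a disambiguation setting each cluster corresponds to a hypothesised real author, so the model can introduce new authors as the data demand.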
I use a model for nesting Dirichlet processes named the hybrid NDP-HDP. This
model allows Dirichlet processes to be clustered together and adds an additional level of
structure to the hierarchical Dirichlet process. I also develop a new hierarchical extension
to the hybrid NDP-HDP. I develop this model into the grouped author-topic model
for the entity disambiguation task. The grouped author-topic model uses clusters to model the co-occurrence of entities in documents, which can be interpreted as research
groups. Since this model does not require entities to be linked to specific words in a
document, it overcomes the problems of some existing author-topic models. The model
incorporates a new method for modelling name variants, so that domain-specific name
variant models can be used.
Lastly, I develop extensions to supervised latent Dirichlet allocation, a type of supervised
topic model. The keyword-supervised LDA model predicts document responses
more accurately by modelling the effect of individual words and their contexts directly.
The supervised HDP model has more model flexibility by using Bayesian nonparametrics
for supervised learning. These models are evaluated on a number of classification
and regression problems, and the results show that they outperform existing supervised
topic modelling approaches. The models can also be extended to use similar information
to the previous models, incorporating additional information such as entities and
document titles to improve prediction.
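The response step of supervised LDA can be illustrated compactly: given per-word topic assignments for a document, the standard sLDA formulation predicts a real-valued response linearly from the empirical topic proportions. The sketch below assumes that standard formulation with made-up coefficients; it is not the thesis's keyword-supervised or HDP variant.

```python
# Hedged sketch of the sLDA response model: y_hat = eta . z_bar, where
# z_bar is the empirical distribution of topic assignments in a document.
import numpy as np

K = 4                                   # number of topics (assumed)
eta = np.array([2.0, -1.0, 0.5, 0.0])   # regression coefficients (assumed)

z = np.array([0, 0, 1, 2, 2, 2, 3])     # topic assignment of each word
z_bar = np.bincount(z, minlength=K) / len(z)  # empirical topic proportions
y_hat = eta @ z_bar
print(f"predicted response: {y_hat:.3f}")
```

The keyword-supervised extension described above goes further by letting individual words and their contexts contribute to the response directly, rather than only through the document-level topic proportions.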
Community data portraiture: perceiving events, people, & ideas within a research community
Thesis (S.M.), Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2010. Cataloged from the PDF version of the thesis. Includes bibliographical references (p. 72-73). By Doug Fritz.

As a research community grows, it becomes increasingly difficult to understand its dynamics, its history, and the varying perspectives with which that history is interpreted and remembered. This thesis focuses on three major components of research communities: events, people, and ideas. Within each of those components, it explores how to construct and answer questions that improve connectivity and elucidate relationships for community members. Assuming the artifacts of a community (its publications, projects, etc.) model a representation of its nature, we apply a variety of visualization and natural language processing techniques to those artifacts to produce a community data portrait. The goal of the portrait is to provide a compressed representation viable for consumption by a new researcher learning about the community they are entering, or by a current member reflecting on the community's behavior to help construct future goals. Rather than evaluating a general technique, the tools and methods were developed specifically for the MIT Media Lab community; general principles can then be abstracted from this initial practical application.