
    Some Salient Issues in the Unsupervised Learning of Igbo Morphology

    The automatic learning of the morphology of natural language is an important topic in computational linguistics, because morphology is foundational to the study of linguistics. In addition, the emerging information society demands the application of Information and Communication Technologies (ICT) to languages in ways that require human-like analysis of language, and this depends to a large extent on the ability to undertake computational analysis of morphology. Even though rule-based and supervised learning approaches to the modeling of morphology have been found to be productive, they have also proved to be costly, cumbersome and susceptible to human error. In contrast, unsupervised learning methods do not require expensive human intervention but, like all statistical methods, they demand large volumes of linguistic data. This poses a challenge to resource-scarce languages such as Igbo. Furthermore, being a highly agglutinative language, Igbo features certain morphological processes that may not be easily accommodated by most of the frequency-driven unsupervised learning models available. This paper takes a critical look at some of the identified challenges of inducing Igbo morphology as a first step in devising methods by which they can be addressed.

    Foreground and background text in retrieval

    Our hypothesis is that certain clauses have foreground functions in text, while other clauses have background functions, and that these functions are expressed or reflected in the syntactic structure of the clause. Presumably these clauses will have differing utility for automatic approaches to text understanding; a summarization system might want to utilize background clauses to capture commonalities between numbers of documents, while an indexing system might use foreground clauses in order to capture specific characteristics of a certain document.

    TopicViz: Semantic Navigation of Document Collections

    When people explore and manage information, they think in terms of topics and themes. However, the software that supports information exploration sees text at only the surface level. In this paper we show how topic modeling -- a technique for identifying latent themes across large collections of documents -- can support semantic exploration. We present TopicViz, an interactive environment for information exploration. TopicViz combines traditional search and citation-graph functionality with a range of novel interactive visualizations, centered around a force-directed layout that links documents to the latent themes discovered by the topic model. We describe several use scenarios in which TopicViz supports rapid sensemaking on large document collections.

    Simple assessment of spatio-temporal evolution of salt marshes ecological services

    A number of previous research studies have addressed the enormous role played by biodiversity and ecosystems in human well-being and have placed particular emphasis on the consequences of the reduction or loss of these services. A handful of studies have implemented practical methodologies to quantify the variability of limiting factors leading to reductions in these ecological services. The aim of this article is to document the limited number of studies that have analyzed coastal ecosystem services and acknowledge the impacts of physical changes in habitat provision. In one example, it is clear that the maintenance of salt marshes depends on sedimentary supply and consequent morphological variability in spite of the fact that there is usually no recurrent integration of habitat time-space dynamics (sediment availability) during the quantification and monetization of marsh services (i.e., monetary valuation of salt marsh services). This means that one key challenge facing the analysis of salt marsh (or other ecosystem) services in a global climate context is to predict future value, based on past trends, while at the same time guaranteeing conservation. Research in this field has been very broad and so the use of long-term evolutionary datasets is proposed here to explain future habitat provision. An empirical approximation is also presented here that accounts for service provision and enables time-space analysis. Although improvements will be required, the equation presented here represents a key first step to enable managers to cope with the constraints of resource limitations and is also applicable to other habitats.

    Producing power-law distributions and damping word frequencies with two-stage language models

    Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that can generically produce power laws, breaking generative models into two stages. The first stage, the generator, can be any standard probabilistic model, while the second stage, the adaptor, transforms the word frequencies of this model to provide a closer match to natural language. We show that two commonly used Bayesian models, the Dirichlet-multinomial model and the Dirichlet process, can be viewed as special cases of our framework. We discuss two stochastic processes (the Chinese restaurant process and its two-parameter generalization based on the Pitman-Yor process) that can be used as adaptors in our framework to produce power-law distributions over word frequencies. We show that these adaptors justify common estimation procedures based on logarithmic or inverse-power transformations of empirical frequencies. In addition, taking the Pitman-Yor Chinese restaurant process as an adaptor justifies the appearance of type frequencies in formal analyses of natural language and improves the performance of a model for unsupervised learning of morphology.
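
    The generator/adaptor split described in this abstract can be illustrated with a small simulation. The sketch below is not the authors' code: it assumes the standard seating rule of the two-parameter (Pitman-Yor) Chinese restaurant process, and the names `two_stage_sample`, `generator`, `discount` and `concentration` are hypothetical choices made for illustration. Each new table asks the generator for a word; the adaptor then decides how many tokens reuse that word.

```python
import random
from collections import Counter


def two_stage_sample(n_tokens, generator, discount=0.5, concentration=1.0, seed=0):
    """Draw tokens from a generator passed through a Pitman-Yor CRP adaptor.

    generator: callable taking a random.Random and returning one word
               (hypothetical interface, assumed for this sketch).
    Returns (tokens, table_sizes).
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    if concentration <= 0.0:
        raise ValueError("concentration must be positive in this sketch")

    rng = random.Random(seed)
    table_sizes = []   # number of tokens seated at each table
    table_words = []   # word the generator assigned to each table
    tokens = []

    for n in range(n_tokens):
        k = len(table_sizes)
        # Probability of opening a new table after n tokens and k tables.
        p_new = (concentration + discount * k) / (n + concentration)
        if rng.random() < p_new:
            # Stage 1 (generator): a new table gets a freshly generated word.
            table_sizes.append(1)
            table_words.append(generator(rng))
            tokens.append(table_words[-1])
        else:
            # Stage 2 (adaptor): reuse a table with weight (size - discount).
            weights = [size - discount for size in table_sizes]
            idx = rng.choices(range(k), weights=weights)[0]
            table_sizes[idx] += 1
            tokens.append(table_words[idx])
    return tokens, table_sizes


if __name__ == "__main__":
    vocab = ["walk", "walks", "walked", "walking", "talk", "talks"]
    tokens, sizes = two_stage_sample(2000, lambda rng: rng.choice(vocab))
    print(Counter(tokens).most_common(3))
    print("tables:", len(sizes), "largest:", max(sizes))
```

    Setting `discount=0` reduces the seating rule to the one-parameter Chinese restaurant process; the uniform generator over a toy vocabulary is only a stand-in for whatever probabilistic model plays the generator role.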

    Evaluating the Differences of Gridding Techniques for Digital Elevation Models Generation and Their Influence on the Modeling of Stony Debris Flows Routing: A Case Study From Rovina di Cancia Basin (North-Eastern Italian Alps)

    Debris flows are among the most hazardous phenomena in mountain areas. To cope with debris flow hazard, it is common to delineate the risk-prone areas through routing models. The most important input to debris flow routing models is the topographic data, usually in the form of Digital Elevation Models (DEMs). The quality of DEMs depends on the accuracy, density, and spatial distribution of the sampled points; on the characteristics of the surface; and on the applied gridding methodology. Therefore, the choice of the interpolation method affects the realistic representation of the channel and fan morphology, and thus potentially the debris flow routing modeling outcomes. In this paper, we initially investigate the performance of common interpolation methods (i.e., linear triangulation, natural neighbor, nearest neighbor, Inverse Distance to a Power, ANUDEM, Radial Basis Functions, and ordinary kriging) in building DEMs with the complex topography of a debris flow channel located in the Venetian Dolomites (North-eastern Italian Alps), by using small footprint full-waveform Light Detection And Ranging (LiDAR) data. The investigation is carried out through a combination of statistical analysis of vertical accuracy, algorithm robustness, and spatial clustering of vertical errors, and multi-criteria shape reliability assessment. After that, we examine the influence of the tested interpolation algorithms on the performance of a Geographic Information System (GIS)-based cell model for simulating stony debris flows routing. In detail, we investigate both the correlation between the DEM height uncertainty resulting from the gridding procedure and that of the corresponding simulated erosion/deposition depths, and the effect of interpolation algorithms on simulated areas, erosion and deposition volumes, solid-liquid discharges, and channel morphology after the event.
    The comparison among the tested interpolation methods highlights that the ANUDEM and ordinary kriging algorithms are not suitable for building DEMs with complex topography. Conversely, the linear triangulation, the natural neighbor algorithm, and the thin-plate spline plus tension and completely regularized spline functions ensure the best trade-off between accuracy and shape reliability. However, the evaluation of the effects of gridding techniques on debris flow routing modeling reveals that the choice of the interpolation algorithm does not significantly affect the model outcomes.
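
    To make the notion of gridding concrete, the following minimal sketch shows one of the interpolation methods named in this abstract, Inverse Distance to a Power, applied to scattered elevation points. It is not the study's implementation: the function names, the cell-centre convention and the default `power=2.0` are assumptions for illustration, and production tools typically add a neighbourhood search and smoothing options that are omitted here.

```python
import math


def idw_interpolate(points, x, y, power=2.0):
    """Estimate elevation at (x, y) from scattered (px, py, pz) samples.

    Each sample is weighted by 1 / distance**power; a query that coincides
    with a sample returns that sample's elevation directly.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for px, py, pz in points:
        distance = math.hypot(x - px, y - py)
        if distance == 0.0:
            return pz
        weight = 1.0 / distance ** power
        weighted_sum += weight * pz
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("at least one sample point is required")
    return weighted_sum / weight_total


def grid_dem(points, x_min, y_min, cell_size, n_cols, n_rows, power=2.0):
    """Build a DEM as a list of rows, interpolating at each cell centre."""
    return [
        [
            idw_interpolate(
                points,
                x_min + (col + 0.5) * cell_size,
                y_min + (row + 0.5) * cell_size,
                power,
            )
            for col in range(n_cols)
        ]
        for row in range(n_rows)
    ]


if __name__ == "__main__":
    # Toy samples standing in for LiDAR ground returns (made-up values).
    samples = [(0.0, 0.0, 100.0), (10.0, 0.0, 110.0), (0.0, 10.0, 95.0), (10.0, 10.0, 105.0)]
    for row in grid_dem(samples, 0.0, 0.0, 5.0, 2, 2):
        print([round(z, 2) for z in row])
```

    Raising `power` makes the surface follow the nearest samples more closely, which is one reason different gridding choices can represent channel and fan morphology differently.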

    Natural Language Processing at the School of Information Studies for Africa

    The lack of people trained in computational linguistic methods is a severe obstacle to making the Internet and computers accessible to people all over the world in their own languages. The paper discusses the experiences of designing and teaching an introductory course in Natural Language Processing to graduate computer science students at Addis Ababa University, Ethiopia, in order to initiate the education of computational linguists in the Horn of Africa region.