
    Producing power-law distributions and damping word frequencies with two-stage language models

    Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that can generically produce power laws, breaking generative models into two stages. The first stage, the generator, can be any standard probabilistic model, while the second stage, the adaptor, transforms the word frequencies of this model to provide a closer match to natural language. We show that two commonly used Bayesian models, the Dirichlet-multinomial model and the Dirichlet process, can be viewed as special cases of our framework. We discuss two stochastic processes (the Chinese restaurant process and its two-parameter generalization based on the Pitman-Yor process) that can be used as adaptors in our framework to produce power-law distributions over word frequencies. We show that these adaptors justify common estimation procedures based on logarithmic or inverse-power transformations of empirical frequencies. In addition, taking the Pitman-Yor Chinese restaurant process as an adaptor justifies the appearance of type frequencies in formal analyses of natural language and improves the performance of a model for unsupervised learning of morphology.

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Based on information provided by European projects and national initiatives related to multimedia search, as well as by domain experts who participated in the CHORUS Think-Tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and a socio-economic perspective. The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives to measure the performance of multimedia search engines. From a socio-economic perspective, we take inventory of the impact and legal consequences of these technical advances and point out future directions of research.

    Elective Identities, (Culture, Identization and Integration)

    Most contemporary individual and social identities (constructed with societal, cultural and technological resources) are radically autonomous, nomadic and virtual, i.e. they are de-traditionalized, open to negotiation and not based on a single interpretation of a tradition. Identizations can be recycled: elements of former identities are re-used in constructing later ones, or identities emerging in one context can be implanted in another or hybridised. A nation state as a model for socio-political identity is a case in point (and so is its recent crisis). Values and political, cultural and social identities (the elective identities of "nomads of the present", often emerging out of new social movements or informal networks) play an important role in determining choices of information codes, images and identities. Theories of clashes of civilizations and of fundamentalists versus modernists should be seen against the background of increasingly complex and successful attempts at global governance and increasing criticism of the ideologies of the status quo. They may testify to the success of globalization instead of demonstrating its failure. The rise of religious fundamentalism and the emergence of network types of organization contribute to further acceleration of identization processes. "Girotondi della liberta" in Berlusconi's Italy and radical re-evaluation of cosmopolitanism as a family of images of representation are cases of emergent identizations with unclear but potentially critical political implications. Keywords: clash of civilizations; globalism; processual; recycled and virtual identities; fundamentalism

    Abstract syntax as interlingua: Scaling up the grammatical framework from controlled languages to robust pipelines

    Abstract syntax is an interlingual representation used in compilers. Grammatical Framework (GF) applies the abstract syntax idea to natural languages. The development of GF started in 1998, first as a tool for controlled language implementations, where it has gained an established position in both academic and commercial projects. GF provides grammar resources for over 40 languages, enabling accurate generation and translation, as well as grammar engineering tools and components for mobile and Web applications. On the research side, the focus in the last ten years has been on scaling up GF to wide-coverage language processing. The concept of abstract syntax offers a unified view on many other approaches: Universal Dependencies, WordNets, FrameNets, Construction Grammars, and Abstract Meaning Representations. This makes it possible for GF to utilize data from the other approaches and to build robust pipelines. In return, GF can contribute to data-driven approaches by methods to transfer resources from one language to others, to augment data by rule-based generation, to check the consistency of hand-annotated corpora, and to pipe analyses into high-precision semantic back ends. This article gives an overview of the use of abstract syntax as interlingua through both established and emerging NLP applications involving GF.

    Natural language software registry (second edition)
