    Expectations in Incremental Discourse Processing

    The way in which discourse features express connections back to the previous discourse has been described in the literature in terms of adjoining at the right frontier of discourse structure. But this does not allow for discourse features that express expectations about what is to come in the subsequent discourse. After characterizing these expectations and their distribution in text, we show how an approach that makes use of substitution as well as adjoining on a suitably defined right frontier can be used both to process expectations and to constrain discourse processing in general.

    Mining Social Media to Extract Structured Knowledge through Semantic Roles

    Semantics is a well-kept secret in texts, accessible only to humans. Artificial Intelligence struggles to enrich machines with human-like features; accessing this treasure and sharing it with computers is therefore one of the main challenges that computational linguistics faces today. In order to teach computers to understand humans, language models need to be specified and created from human knowledge. While still far from completely decoding hidden messages in political speeches, computer scientists and linguists have joined efforts to make language easier for machines to understand. This paper introduces the VoxPopuli platform, an instrument that collects user-generated content, analyzes it, and generates a map of semantically related concepts by capturing crowd intelligence.

    Pathogen Variability. A Genomic Signal Approach

    The conversion of genomic symbolic sequences into digital signals has been applied to the analysis of pathogen variability. Results are given on the variability of Human Immunodeficiency Virus, type 1, subtype F, isolated in Romania, and of the type A avian influenza virus H5N1, for which sequences have been downloaded from GenBank [1]. Nucleotide sequence analysis is corroborated with techniques based on the genomic signal approach to detect pathogen resistance to antiretroviral treatment. In the case of protease (PR) inhibitors, it is found that the treatment induces single nucleotide polymorphisms (SNPs) at specific sites. For moderate resistance, the changes affect the PR enzyme only at the level of the protein, whereas for multiple drug resistance, the secondary structure of the RNA gene also changes.
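
    The idea of turning a symbolic nucleotide sequence into a digital signal can be illustrated with a minimal sketch. The abstract does not state which mapping is used, so the complex-plane mapping below, the cumulated-phase signal, and the helper names (`to_signal`, `cumulated_phase`, `snp_positions`) are assumptions made for illustration only. The last helper simply lists the positions where two aligned sequences of equal length differ, which is the simplest way to locate candidate SNPs.

```python
import cmath

# Assumed, illustrative mapping of nucleotides to points in the complex plane.
# The source abstract does not specify the mapping it uses.
NUCLEOTIDE_TO_COMPLEX = {
    "A": complex(1, 1),
    "C": complex(-1, -1),
    "G": complex(-1, 1),
    "T": complex(1, -1),
}


def to_signal(sequence):
    """Convert a nucleotide string into a list of complex samples."""
    return [NUCLEOTIDE_TO_COMPLEX[n] for n in sequence.upper()]


def cumulated_phase(sequence):
    """Running sum of the phase of each sample, one value per position."""
    total = 0.0
    phases = []
    for sample in to_signal(sequence):
        total += cmath.phase(sample)
        phases.append(total)
    return phases


def snp_positions(reference, variant):
    """Zero-based positions where two aligned, equal-length sequences differ."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned and of equal length")
    return [i for i, (a, b) in enumerate(zip(reference, variant)) if a != b]
```

    With this mapping, A and T contribute opposite phases, as do C and G, so the cumulated phase reflects the imbalance in nucleotide composition along the sequence.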

    Public Discourse Semantics. A Method of Anticipating Economic Crisis

    This paper provides a proof that anticipating an economic crisis by analysing public discourses (in particular, speeches on economic issues) is feasible. It proposes a method of text classification and semantic interpretation, based on natural language processing techniques, that could be used to trace print press discourses over a period of time, with the aim of evaluating the likelihood that a crisis will occur. Classification is the task of assigning tags (words, expressions) to the texts that make up a corpus. In our case, we were interested in identifying, among the texts under scrutiny, those belonging to classes such as financial, economic, nationalism, etc. This approach is supported by the fact that public discourses can be characterized from a rhetorical perspective, depending on the specific strategies their authors have chosen: orientation towards changing opinions or towards determining action, the ratio between rational (logos) and emotional (pathos) appeals, etc. We suggest an automatic analysis of the content of public language using quantitative measures. Our purpose was to develop a computational tool that offers researchers in the economic, social or political sciences, and equally the public at large, the possibility to measure the acuity of different accents of a written public discourse (financial, emotional, etc.), as a means to anticipate the threat of financial waves. Such a tool could support decision making in the analysis of crises. Although our analysis used data from the journalistic and economic environments of Romania, it could easily be extrapolated to other languages and countries.
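
    One simple way to obtain a quantitative measure per class is to count how many words of a text belong to a class lexicon and normalise by text length. The sketch below illustrates this under stated assumptions: the lexicons, the threshold, and the function names (`class_scores`, `assign_tags`) are hypothetical and are not taken from the paper, and the tokenizer handles only unaccented lowercase letters, so Romanian text would need a tokenizer that is aware of diacritics.

```python
import re
from collections import Counter

# Hypothetical, tiny class lexicons used only for illustration.
CLASS_LEXICON = {
    "financial": {"bank", "credit", "loan", "debt"},
    "economic": {"inflation", "unemployment", "growth", "recession"},
    "emotional": {"fear", "panic", "hope", "anger"},
}


def class_scores(text):
    """Share of tokens in the text that belong to each class lexicon."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return {label: 0.0 for label in CLASS_LEXICON}
    counts = Counter(tokens)
    return {
        label: sum(counts[word] for word in words) / len(tokens)
        for label, words in CLASS_LEXICON.items()
    }


def assign_tags(text, threshold=0.05):
    """Return, in alphabetical order, the classes scoring at or above the threshold."""
    scores = class_scores(text)
    return sorted(label for label, score in scores.items() if score >= threshold)
```

    Tracking such scores for a newspaper over successive months would give one time series per class, which is the kind of trace the method relies on.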

    Linguistic Resources and Technologies for Romanian Language

    This paper reviews notions related to Language Resources and Technologies (LRT), including a brief overview of some resources developed worldwide, with a special focus on the Romanian language. It then describes a joint Romanian, Moldavian and English initiative aimed at developing electronically coded resources for the Romanian language, tools for their maintenance and usage, and applications based on these resources.

    Consumer behavior in the economy

    This paper analyses the disparities between the urban and rural economic environments in Romania, which have negative effects over time for the entire country. The key imbalances between the two areas are identified, and a chronological analysis of figures obtained from the two areas over the years is presented. A comparative analysis of urban and rural household consumption behavior in the period 2005-2012 is included. The rural population is still heavily dependent on agriculture, while its consumption characteristics are specific to relatively poor populations. Reducing disparities between urban and rural incomes and improving the quality of household consumption are priorities in regional development policy.

    Physical simulation of muscles and bones

    This paper presents a model used to animate a 3D facial mesh. The animation is based entirely on a physical system, so no artistic skills are required to animate a face through morphing or rigging. The facial expressions are the result of the physical interaction between the different components of the system. The focus is on creating structures that accurately represent the muscles and the bones. The physical engine is essentially a mass-spring system, but it also supports some other types of structures, namely pressure cells and rigid bodies.
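
    A mass-spring system of the kind mentioned above can be sketched in a few lines. The example below is a one-dimensional illustration only: the class names, the damping term, and the semi-implicit Euler integrator are assumptions chosen for brevity, not details taken from the paper, whose engine works on a 3D mesh.

```python
from dataclasses import dataclass


@dataclass
class Particle:
    position: float
    velocity: float = 0.0
    mass: float = 1.0
    fixed: bool = False  # fixed particles act as anchors (e.g. bone attachments)


@dataclass
class Spring:
    a: int               # index of the first particle
    b: int               # index of the second particle
    rest_length: float
    stiffness: float
    damping: float = 0.0


def step(particles, springs, dt):
    """Advance the system by one time step using semi-implicit Euler."""
    forces = [0.0] * len(particles)
    for s in springs:
        pa, pb = particles[s.a], particles[s.b]
        delta = pb.position - pa.position
        direction = 1.0 if delta >= 0 else -1.0
        stretch = abs(delta) - s.rest_length
        relative_velocity = (pb.velocity - pa.velocity) * direction
        # Hooke's law plus a damping term along the spring axis.
        magnitude = s.stiffness * stretch + s.damping * relative_velocity
        forces[s.a] += magnitude * direction
        forces[s.b] -= magnitude * direction
    for p, f in zip(particles, forces):
        if p.fixed:
            continue
        p.velocity += (f / p.mass) * dt   # update velocity first
        p.position += p.velocity * dt     # then position, using the new velocity
```

    For instance, a free particle attached by a damped spring to a fixed anchor settles at the spring's rest length after enough steps, which is the behavior a muscle fibre model would build on.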

    CoRoLa Starts Blooming – An update on the Reference Corpus of Contemporary Romanian Language

    This article reports on the ongoing CoRoLa project, which aims to create a reference corpus of contemporary Romanian (from 1945 onwards), open for free online exploitation by researchers in linguistics and language processing, teachers of Romanian, and students. We invest serious effort in persuading large publishing houses and other owners of IPR on relevant language data to join us and contribute to the project with selections from their text and speech repositories. The CoRoLa project is coordinated by two Computer Science institutes of the Romanian Academy, and benefits from the cooperation and advice of professional linguists from other institutes of the Romanian Academy. We foresee a written component of more than 500 million word forms and a speech component of about 300 hours of recordings. The entire collection of texts (covering all functional styles of the language) will be pre-processed and annotated at several levels, and documented with standardized metadata. The pre-processing includes cleaning the data, harmonising the diacritics, sentence splitting and tokenization. Annotation will include morpho-lexical tagging and lemmatization in the first stage, followed by syntactic, semantic and discourse annotation in a later stage.
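
    The pre-processing steps listed above can be illustrated with a deliberately naive sketch. It assumes that harmonising the diacritics means replacing the cedilla forms of s and t with the comma-below forms; the abstract does not say this explicitly. The function names and the regular expressions are likewise hypothetical, and a production pipeline would need to handle abbreviations, numbers and clitics more carefully.

```python
import re

# Assumed harmonisation: cedilla variants are mapped to comma-below variants.
CEDILLA_TO_COMMA = str.maketrans({
    "\u015f": "\u0219",  # s with cedilla -> s with comma below
    "\u0163": "\u021b",  # t with cedilla -> t with comma below
    "\u015e": "\u0218",  # S with cedilla -> S with comma below
    "\u0162": "\u021a",  # T with cedilla -> T with comma below
})


def harmonise_diacritics(text):
    """Replace cedilla-based letters with their comma-below counterparts."""
    return text.translate(CEDILLA_TO_COMMA)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def tokenize(sentence):
    """Words (hyphenated forms kept whole) and single punctuation marks."""
    return re.findall(r"\w+(?:-\w+)*|[^\w\s]", sentence)
```

    Running the three functions in this order (harmonise, split, tokenize) mirrors the order in which the steps are named in the abstract.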