6 research outputs found

    A Methodology for Identifying Terms and Patterns Specific to Requirements as a Textual Genre Using Automated Tools

    As a step in a project whose final goal is to propose a Controlled Natural Language for requirements writing at CNES (Centre National d'Études Spatiales), we intend to build the grammar of the textual genre of requirements. One of the main issues faced when analyzing our corpus is the (sometimes subtle) difference between the terms and syntactic structures pertaining to the genre and those linked to the domain (in our case, the development of space systems), a difference that automated tools generally do not take into account. In this paper, we present a methodology aimed at detecting candidate terms and textual patterns specific to the genre by combining results obtained from a terminology extractor and a data mining tool with a validated resource in use for indexing documents at CNES. The results are then illustrated by a selection of examples from our corpus.

    Analyse d'un corpus d'exigences pour améliorer la rédaction des spécifications de systèmes spatiaux au CNES

    The aim of our work is to improve the clarity and precision of the technical specifications written by engineers at CNES (Centre National d'Études Spatiales) before space systems are built. The importance of specifications (and in particular of the requirements they contain) to the success of large-scale projects is now very widely recognized; likewise, the main risks associated with the use of natural language (ambiguity, vagueness, incompleteness) are relatively well identified. In this context, we are working to develop a solution that CNES engineers, who are not currently required to follow any writing rules, will actually adopt. It must therefore be both effective (that is, it must substantially reduce language-related risk) and easy to put in place (that is, it must not disrupt their working habits too deeply, which would make it counterproductive). A controlled language, that is, a set of linguistic rules governing vocabulary, syntax, and semantics, seems to us an ideal answer to this twofold need, provided that it stays sufficiently close to natural language (and particularly to the way natural language is used when writing requirements). However, the controlled languages for technical writing that we have considered do not always seem relevant to us from a linguistic point of view. We would therefore like to define our own controlled language for writing requirements in French at CNES. The originality of our approach lies in assuming the existence of a sublanguage and in systematically testing our hypotheses on corpora of authentic requirements using natural language processing techniques and tools.

    Atribuição de autoria em micro-mensagens

    Advisors: Ariadne Maria Brito Rizzoni Carvalho, Anderson de Rezende Rocha. Dissertation (master's), Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica. Abstract: With the ever-growing use of social media, authorship attribution plays an important role in preventing cybercrime and in analysing the online trails left behind by cyber pranksters, stalkers, bullies, identity thieves and the like. In this dissertation, we propose a method for authorship attribution in micro-blogs that is one hundred to a thousand times faster than state-of-the-art counterparts. We also achieved an accuracy of 65% when classifying texts from 50 authors. The method relies on a powerful and scalable feature representation that takes advantage of user patterns in micro-blog messages, and on a custom-tailored pattern classifier adapted to large volumes of high-dimensional data. Finally, we discuss search space reduction when analysing hundreds of online suspects and millions of online micro-messages, which makes this approach invaluable for digital forensics and law enforcement. Degree: Master's in Computer Science (Mestrado, Ciência da Computação).

    Towards the creation of a CNL adapted to requirements writing by combining writing recommendations and spontaneous regularities : example in a Space Project

    The Quality Department of the French National Space Agency (CNES, Centre National d’Études Spatiales) wishes to design a writing guide based on the real and regular writing of requirements. As a first step in this project, the present article proposes a linguistic analysis of requirements written in French by CNES engineers. One of our goals is to determine to what extent they conform to several rules laid down in two existing Controlled Natural Languages (CNLs), namely the Simplified Technical English developed by the AeroSpace and Defense Industries Association of Europe and the Guide for Writing Requirements proposed by the International Council on Systems Engineering. Indeed, although CNES engineers are not obliged to follow any controlled language in their writing of requirements, we believe that language regularities are likely to emerge from this task, mainly due to the writers’ experience. We are seeking to identify these regularities in order to use them as a basis for a new CNL for the writing of requirements. The issue is approached using natural language processing tools to identify sentences that do not comply with the rules or contain specific linguistic phenomena. We further review these sentences to understand why the recommendations cannot (or should not) always be applied when specifying large-scale projects.

    An Attempt to Use Weighted Cusums to Identify Sublanguages

    This paper explores the use of weighted cusums, a technique found in authorship attribution studies, for the purpose of identifying sublanguages. The technique, and its relation to standard cusums (cumulative sum charts), is first described, and the formulae for calculations are given in detail. The technique compares texts by testing for the incidence of linguistic 'features' of a superficial nature, e.g. proportion of 2- and 3-letter words, words beginning with a vowel, and so on, and measures whether two texts differ significantly in respect of these features. The paper describes an experiment in which 14 groups of three texts each, representing different sublanguages, are compared with each other using the technique. The texts are first compared within each group to establish that the technique can identify the groups as being homogeneous. The texts are then compared with each other, and the results analysed. Taking the average of seven different tests, the technique is able to distinguish the sublanguages in only 43% of the cases. But if the best score is taken, 79% of pairings can be distinguished. This is a better result, and the test seems able to quantify the difference between sublanguages.
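
    The superficial features named in this abstract can be illustrated with a minimal sketch. It only computes the two feature proportions mentioned (2- and 3-letter words, vowel-initial words). It does not implement the weighted cusum test itself, whose formulae are not reproduced in the abstract. The function name `surface_features`, the tokenisation rule, and the treatment of "y" as a consonant are assumptions made for illustration.

```python
import re

VOWELS = set("aeiou")  # assumption: "y" is not counted as a vowel


def surface_features(text):
    """Return the proportion of short and vowel-initial words in `text`."""
    # Assumed tokenisation: runs of ASCII letters, case-folded.
    words = re.findall(r"[A-Za-z]+", text.lower())
    if not words:
        return {"short_words": 0.0, "vowel_initial": 0.0}
    short = sum(1 for w in words if len(w) in (2, 3))
    vowel = sum(1 for w in words if w[0] in VOWELS)
    return {
        "short_words": short / len(words),
        "vowel_initial": vowel / len(words),
    }
```

    Two texts could then be compared feature by feature, for example `surface_features(text_a)` against `surface_features(text_b)`; deciding whether a difference is significant is the part the paper's weighted cusum formulae address.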

    The anonymous 1821 translation of Goethe's Faust: a cluster analytic approach

    PhD Thesis. This study tests the hypothesis proposed by Frederick Burwick and James McKusick in 2007 that Samuel Taylor Coleridge was the author of the anonymous translation of Goethe's Faust published by Thomas Boosey in 1821. The approach to hypothesis testing is stylometric. Specifically, function word usage is selected as the stylometric criterion, and 80 function words are used to define an 80-dimensional function word frequency profile vector for each text in the corpus of Coleridge's literary works and for a selection of works by a range of contemporary English authors. Each profile vector is a point in 80-dimensional vector space, and cluster analytic methods are used to determine the distribution of profile vectors in the space. If the hypothesis being tested is valid, then the profile for the 1821 translation should be closer in the space to works known to be by Coleridge than to works by the other authors. The cluster analytic results show, however, that this is not the case, and the conclusion is that the Burwick and McKusick hypothesis is falsified relative to the stylometric criterion and analytic methodology used.
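
    The idea of a function word frequency profile can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's procedure: the five-word `FUNCTION_WORDS` list is a hypothetical stand-in for the 80 words the thesis uses (which the abstract does not list), frequencies are normalised by text length as an assumption, and plain Euclidean distance stands in for the cluster analytic methods, which the abstract does not specify.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in list; the thesis uses 80 function words.
FUNCTION_WORDS = ["the", "of", "and", "to", "in"]


def profile(text, vocab=FUNCTION_WORDS):
    """Map a text to a vector of relative function word frequencies."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [counts[w] / total for w in vocab]


def euclidean(u, v):
    """Distance between two profile vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

    Under these assumptions, a disputed text whose profile lies nearer to one author's works than to another's, as measured by `euclidean(profile(a), profile(b))`, would be weak evidence in favour of that author; a real study would work with many texts per author and a proper clustering method.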