Mining User-Generated Repair Instructions from Automotive Web Communities
The objective of this research was to automatically extract user-generated repair instructions from large amounts of web data. An artifact was created that classifies a web post as either containing a repair instruction or not. Natural Language Processing methods transform the unstructured text of a web post into a set of numerical features that can be processed by different Machine Learning algorithms. The main contribution of this research lies in the design and prototypical implementation of these features. The evaluation shows that the artifact can accurately distinguish posts containing repair instructions from other posts, e.g. those containing problem reports. With such a solution, a company can save the time and money previously needed to perform this classification task manually.
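To make the idea of turning a post into numerical features concrete, here is a minimal sketch. The cue-word lists, the four features, and the function names are assumptions chosen for illustration; the abstract does not say which features the authors actually designed.

```python
import re

# Hypothetical cue lists for illustration only; not taken from the paper.
IMPERATIVE_VERBS = {"remove", "replace", "unscrew", "disconnect",
                    "install", "tighten", "check"}
SEQUENCE_WORDS = {"first", "then", "next", "finally", "afterwards"}


def starts_with_imperative(sentence: str) -> bool:
    """True if the sentence opens with a cue verb, ignoring leading sequence words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    while words and words[0] in SEQUENCE_WORDS:
        words = words[1:]
    return bool(words) and words[0] in IMPERATIVE_VERBS


def extract_features(post: str) -> list[float]:
    """Map a raw post to a fixed-length numeric vector.

    Order: [share of imperative sentences, share of sequence words,
            number of numbered steps, question marks per sentence].
    """
    tokens = re.findall(r"[a-z']+", post.lower())
    sentences = [s for s in re.split(r"[.!?\n]+", post) if s.strip()]
    n_sentences = max(len(sentences), 1)  # avoid division by zero
    n_tokens = max(len(tokens), 1)
    return [
        sum(starts_with_imperative(s) for s in sentences) / n_sentences,
        sum(t in SEQUENCE_WORDS for t in tokens) / n_tokens,
        float(len(re.findall(r"(?m)^\s*\d+[.)]\s", post))),
        post.count("?") / n_sentences,
    ]


instruction = "First disconnect the battery. Then remove the cover. Replace the fuse."
report = "My car will not start. Does anyone know why?"
print(extract_features(instruction))
print(extract_features(report))
```

Vectors of this kind could then be passed to any standard classifier. Which learning algorithms the authors compared is not stated in the abstract.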
Studying discourse structures: practical and methodological concerns
This paper deals with problems of discourse analysis within the framework of corpus linguistics, through a linguistic study of procedurality in discourse. Because the study does not concern a specific lexical item, it is difficult to collect data without any preconceived idea, in other words without introducing a bias into the study. The paper proposes a method that partly solves these problems: several annotators annotate the same texts, and their proposals are progressively merged to obtain an objective, unified annotation. We show that this step is an integral part of the overall linguistic analysis.
Adapting the Naive Bayes classifier to rank procedural texts
This paper presents a machine-learning approach for ranking web documents according to the proportion of procedural text they contain. By 'procedural text' we refer to ordered lists of steps, which are very common in some instructional genres such as online manuals. Our initial training corpus is built by applying some simple heuristics to select documents from a large collection, and it contains only a few documents with a large proportion of procedural text. We adapt the Naive Bayes classifier to better fit this less-than-ideal training corpus. This adapted model is compared with several other classifiers in ranking procedural texts using different sets of features, and is shown to perform well when only highly distinctive features are used.
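As a point of reference for the ranking idea, the sketch below uses a standard multinomial Naive Bayes model with add-one smoothing and orders documents by their log-odds of being procedural. It is not the adapted model from the paper, whose modifications the abstract does not describe. The toy training data and all function names are assumptions made for illustration.

```python
import math
from collections import Counter


def train(docs, labels, alpha=1.0):
    """Fit a two-class multinomial Naive Bayes model.

    docs: list of token lists; labels: list of bools (True = procedural).
    Both classes must occur in labels, otherwise a prior would be zero.
    """
    counts = {True: Counter(), False: Counter()}
    for tokens, label in zip(docs, labels):
        counts[label].update(tokens)
    n_docs = Counter(labels)
    return {
        "alpha": alpha,
        "counts": counts,
        "vocab": set(counts[True]) | set(counts[False]),
        "totals": {c: sum(counts[c].values()) for c in counts},
        "priors": {c: n_docs[c] / len(labels) for c in counts},
    }


def log_odds(model, tokens):
    """log P(procedural | tokens) - log P(other | tokens), up to a shared constant."""
    a, v = model["alpha"], len(model["vocab"])
    score = math.log(model["priors"][True]) - math.log(model["priors"][False])
    for t in tokens:
        if t not in model["vocab"]:
            continue  # tokens unseen in training carry no evidence here
        p_pos = (model["counts"][True][t] + a) / (model["totals"][True] + a * v)
        p_neg = (model["counts"][False][t] + a) / (model["totals"][False] + a * v)
        score += math.log(p_pos) - math.log(p_neg)
    return score


def rank(model, docs):
    """Return docs sorted from most to least procedural."""
    return sorted(docs, key=lambda d: log_odds(model, d), reverse=True)


# Toy corpus, invented for illustration.
train_docs = [
    ["first", "open", "then", "press"],
    ["click", "then", "select", "next"],
    ["the", "company", "was", "founded"],
    ["this", "report", "describes", "results"],
]
train_labels = [True, True, False, False]
model = train(train_docs, train_labels)

doc_a = ["then", "press", "click"]
doc_b = ["the", "company", "report"]
print(rank(model, [doc_b, doc_a]))
```

Ranking by log-odds rather than by a hard class label matches the stated goal of ordering documents by how procedural they are. Restricting the vocabulary to a small set of highly distinctive tokens would be one simple way to explore the effect of feature choice mentioned in the abstract.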