2,640 research outputs found

    A Large-Scale Community Questions Classification Accounting for Category Similarity: An Exploratory?

    The paper reports on a large-scale topical categorization of questions from a Russian community question answering (CQA) service, [email protected]. We used a data set containing all the questions (more than 11 million) asked by [email protected] users in 2012. This is the first study on question categorization dealing with non-English data of this size. The study focuses on adjusting category structure in order to get more robust classification results. We investigate several approaches to measure similarity between categories: the share of identical questions, language models, and user activity. The results show that the proposed approach is promising. Funding: Russian Foundation for Basic Research (RFBR), grant 14-07-00589.
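One of the similarity measures named in this abstract, the share of identical questions, can be illustrated with a small sketch. The abstract does not give the exact formula, so the Jaccard overlap used here, the `normalize` helper, and the toy categories are all assumptions made for illustration only.

```python
from itertools import combinations


def normalize(text):
    """Lowercase and collapse whitespace so trivially different copies match."""
    return " ".join(text.lower().split())


def shared_question_similarity(questions_by_category):
    """Return {(cat_a, cat_b): Jaccard overlap of normalized question texts}.

    Assumption: "share of identical questions" is modeled as
    |A ∩ B| / |A ∪ B|; the paper may define it differently.
    """
    normalized = {
        cat: {normalize(q) for q in questions}
        for cat, questions in questions_by_category.items()
    }
    scores = {}
    for a, b in combinations(sorted(normalized), 2):
        union = normalized[a] | normalized[b]
        shared = normalized[a] & normalized[b]
        scores[(a, b)] = len(shared) / len(union) if union else 0.0
    return scores


# Hypothetical toy data, not taken from the paper's data set.
toy = {
    "cars": ["How to change oil?", "Best winter tires"],
    "auto repair": ["how to change  oil?", "Brake pads squeak"],
    "cooking": ["How to bake bread?"],
}
scores = shared_question_similarity(toy)
```

Category pairs with a high score would be candidates for merging before training a classifier, which is one plausible way to "adjust category structure" as the abstract describes.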

    Analyzing Cognitive Presence in Online Courses Using an Artificial Neural Network

    This work outlines the theoretical underpinnings, method, results, and implications for constructing a discussion list analysis tool that categorizes online, educational discussion list messages into levels of cognitive effort.
    Purpose: The purpose of such a tool is to provide evaluative feedback to instructors who facilitate online learning, to researchers studying computer-supported collaborative learning, and to administrators interested in correlating objective measures of students’ cognitive effort with other measures of student success. This work connects computer-supported collaborative learning, content analysis, and artificial intelligence.
    Method: Broadly, the method employed is a content analysis in which the data from the analysis is modeled using artificial neural network (ANN) software. A group of human coders categorized online discussion list messages, and inter-rater reliability was calculated among them. That reliability figure serves as a measuring stick for determining how well the ANN categorizes the same messages that the group of human coders categorized. Reliability between the ANN model and the group of human coders is compared to the reliability among the group of human coders to determine how well the ANN performs compared to humans.
    Findings: Two experiments were conducted in which ANN models were constructed to model the decisions of human coders, and the experiments revealed that the ANN, under noisy, real-life circumstances, codes messages with near-human accuracy. From experiment one, the reliability between the ANN model and the group of human coders, using Cohen’s kappa, is 0.519, while the human reliability values range from 0.494 to 0.742 (M=0.6). Improvements were made to the human content analysis with the goal of improving the reliability among coders. After these improvements were made, the humans coded messages with a kappa agreement ranging from 0.816 to 0.879 (M=0.848), and the kappa agreement between the ANN model and the group of human coders is 0.70
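The reliability figures in this abstract are Cohen's kappa values. As a minimal sketch of how that statistic is computed for two coders, the function below uses the standard definition, kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is agreement expected by chance. The label names and the toy sequences are hypothetical and are not the study's data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each coder's marginal label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes for eight messages (illustrative only).
human = ["low", "low", "high", "high", "low", "high", "low", "high"]
model = ["low", "high", "high", "high", "low", "high", "low", "low"]
kappa = cohens_kappa(human, model)
```

Here the coders agree on 6 of 8 messages (p_o = 0.75) and chance agreement is 0.5, so kappa is 0.5. Comparing a model-versus-human kappa against human-versus-human kappa, as the abstract does, puts the model's agreement on the same chance-corrected scale as the coders' own.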

    Case studies of academic writing in the sciences: a focus on the development of writing skills

    The aim of the present thesis is to make a longitudinal study of changes affecting sentence-initial elements in articles published over time by a sample of researchers in international journals of physics. The linguistic framework adopted for such a study is a systemic-functional one. The general research methodology is established around two main axes, one linguistic and the other statistical. To conduct a longitudinal survey focusing on thematic changes, it was necessary on the one hand to set up clear and unambiguous linguistic categories to capture these changes and, on the other, to present and interpret the findings in manageable and reliable ways with the assistance of statistics. A pilot study was initially set up to explore possible changes in two articles published within a two-year interval by the American Physical Society. The articles were the first and the last of a series of five articles written by the same researcher on the same problem in physics. The method of analysis of the texts used a formulation of Theme that included Subject as an obligatory component, and Contextual Frame (i.e. pre-Subject elements) as an optional one. The analysis, using taxonomies proposed by Davies (1988, 1997) and Gosden (1993, 1996), suggested differences in thematic elements, especially regarding a certain type of complex Subject. On the basis of coding difficulties and the findings of the pilot study, taxonomies were modified to include in particular new Conventional and Instantial classes for Subject and Contextual Frame. Conventional wordings, both in Subject and in Contextual Frame position, are identified as being expressions which are readily available to novice writers of articles, because they are commonly used terms in the fields of research concerned. In contrast, Instantial wordings are identified as being expressions which have been especially contrived by the writer to fit a given stretch of discourse. As writers develop and make their own the matter with which they are working, they become increasingly capable of crafting these more complex wordings, which involve multiple strands of meaning. In the case of this latter class, particular reference is made to post-modification and clause-type elements which allow meanings to be combined in specific ways

    Automatic maintenance of category hierarchy

    Category hierarchy is an abstraction mechanism for efficiently managing large-scale resources. In an open environment, a category hierarchy will inevitably become inappropriate for managing resources that constantly change in unpredictable patterns. An inappropriate category hierarchy will mislead the management of resources. The increasing dynamicity and scale of online resources increase the need for automatic maintenance of the category hierarchy. Previous studies of category hierarchies mainly focus on either the generation of a category hierarchy or the classification of resources under a pre-defined category hierarchy. The automatic maintenance of category hierarchy has been neglected. Making abstractions among categories and measuring the similarity between categories are two basic behaviours in generating a category hierarchy. Humans are good at abstraction but limited in their ability to calculate the similarities between large-scale resources. Computing models are good at calculating the similarities between large-scale resources but limited in their ability to make abstractions. To take advantage of both human insight and computing ability, this paper proposes a two-phase approach to automatically maintaining a category hierarchy at two scales by detecting the internal pattern change of categories. The global phase clusters resources to generate a reference category hierarchy and computes the similarity between categories to detect inappropriate categories in the initial category hierarchy. The accuracy of the clustering approaches in generating the category hierarchy determines the rationality of the global maintenance. The local phase detects topical changes and then adjusts inappropriate categories with three local operations. The global phase can quickly target inappropriate categories top-down and carry out cross-branch adjustment, which can also accelerate the local-phase adjustments. The local phase detects and adjusts the local-range inappropriate categories that are not adjusted in the global phase. By incorporating the two complementary phase adjustments, the approach can significantly improve the topical cohesion and accuracy of the category hierarchy. A new measure is proposed for evaluating a category hierarchy, considering not only the balance of the hierarchical structure but also the accuracy of classification. Experiments show that the proposed approach is feasible and effective for adjusting an inappropriate category hierarchy. The proposed approach can be used to maintain the category hierarchy for managing various resources in dynamic application environments. It also provides a way to specialize existing online category hierarchies so that resources are organized under more specific categories

    Structuring Wikipedia Articles with Section Recommendations

    Sections are the building blocks of Wikipedia articles. They enhance readability and can be used as a structured entry point for creating and expanding articles. Structuring a new or already existing Wikipedia article with sections is a hard task for humans, especially for newcomers or less experienced editors, as it requires significant knowledge about how a well-written article looks for each possible topic. Inspired by this need, the present paper defines the problem of section recommendation for Wikipedia articles and proposes several approaches for tackling it. Our systems can help editors by recommending what sections to add to already existing or newly created Wikipedia articles. Our basic paradigm is to generate recommendations by sourcing sections from articles that are similar to the input article. We explore several ways of defining similarity for this purpose (based on topic modeling, collaborative filtering, and Wikipedia's category system). We use both automatic and human evaluation approaches for assessing the performance of our recommendation system, concluding that the category-based approach works best, achieving precision@10 of about 80% in the human evaluation. Comment: SIGIR '18 camera-ready
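The precision@10 figure quoted in this abstract measures what fraction of the top ten recommendations were judged relevant. A minimal sketch of the metric follows; the section names and relevance judgments are hypothetical examples, not data from the paper, and the paper's exact evaluation protocol may differ.

```python
def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k recommended items that appear in `relevant`.

    Divides by k (not by the number of items returned), so a list shorter
    than k is penalized for the missing slots.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


# Hypothetical ranked section recommendations for one article.
recommended_sections = [
    "History", "Geography", "Demographics", "Economy", "Culture",
    "Education", "Transport", "Sports", "Notable people", "Gallery",
]
# Hypothetical human judgments: eight of the ten were marked relevant.
judged_relevant = {
    "History", "Geography", "Demographics", "Economy", "Culture",
    "Education", "Transport", "Notable people",
}
p_at_10 = precision_at_k(recommended_sections, judged_relevant, k=10)
```

With eight relevant items among the top ten, this toy example yields 0.8, the same order of magnitude as the roughly 80% reported for the category-based approach.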