13,719 research outputs found

    Multiple Texts as a Limiting Factor in Online Learning: Quantifying (Dis-)similarities of Knowledge Networks across Languages

    We test the hypothesis that the extent to which one obtains information on a given topic through Wikipedia depends on the language in which it is consulted. Controlling for size, we investigate this hypothesis for 25 subject areas. Since Wikipedia is a central part of the web-based information landscape, such a dependence would indicate a language-related, linguistic bias. The article therefore deals with the question of whether Wikipedia exhibits this kind of linguistic relativity or not. From the perspective of educational science, the article develops a computational model of the information landscape from which multiple texts are drawn as typical input of web-based reading. For this purpose, it develops a hybrid model of intra- and intertextual similarity of different parts of the information landscape and tests this model on the example of 35 languages and corresponding Wikipedias. In this way the article builds a bridge between reading research, educational science, Wikipedia research and computational linguistics. Comment: 40 pages, 13 figures, 5 tables

    A fuzzy approach for measuring development of topics in patents using Latent Dirichlet Allocation

    © 2015 IEEE. Technological progress has brought rapid growth in patent publications, which makes it harder for domain experts to measure the development of various topics, handle the linguistic terms used in evaluation, and understand massive technological content. To overcome the limitations of keyword-ranking text mining results in existing research, and at the same time deal with the vagueness of linguistic terms to assist thematic evaluation, this research proposes a fuzzy set-based topic development measurement (FTDM) approach to estimate and evaluate the topics hidden in a large volume of patent claims using Latent Dirichlet Allocation. In this study, latent semantic topics are first discovered from a patent corpus and measured by a temporal-weight matrix to reveal the importance of all topics in different years. For each topic, we then calculate a temporal-weight coefficient based on the matrix, which is associated with a set of linguistic terms to describe its development state over time. After choosing a suitable linguistic term set, fuzzy membership functions are created for each term. The temporal-weight coefficients are then transformed to membership vectors related to the linguistic terms, which can be used to measure the development states of all topics directly and effectively. A case study using solar cell related patents is given to show the effectiveness of the proposed FTDM approach and its applicability for estimating hidden topics and measuring their corresponding development states efficiently.
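
    The pipeline described in this abstract (per-year topic weights, one coefficient per topic, then fuzzy memberships over linguistic terms) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' FTDM implementation: the abstract does not define the temporal-weight coefficient or the term set, so the least-squares slope, the three terms ("declining", "stable", "growing"), the triangular membership functions, and every function name below are hypothetical stand-ins. The document-topic proportions are assumed to come from an already fitted LDA model.

```python
from typing import Dict, List, Sequence


def yearly_topic_weights(doc_topics: Sequence[Sequence[float]],
                         doc_years: Sequence[int]) -> Dict[int, List[float]]:
    """Average the topic proportions of all documents published in each year.

    The result plays the role of a temporal-weight matrix: one row per year,
    one column per topic. (Averaging is an assumption made for this sketch.)
    """
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for props, year in zip(doc_topics, doc_years):
        acc = sums.setdefault(year, [0.0] * len(props))
        for k, p in enumerate(props):
            acc[k] += p
        counts[year] = counts.get(year, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def trend_coefficient(years: Sequence[int], weights: Sequence[float]) -> float:
    """Least-squares slope of weight against year.

    Hypothetical stand-in for the temporal-weight coefficient.
    """
    n = len(years)
    mean_y = sum(years) / n
    mean_w = sum(weights) / n
    var = sum((y - mean_y) ** 2 for y in years)
    if var == 0:
        return 0.0
    cov = sum((y - mean_y) * (w - mean_w) for y, w in zip(years, weights))
    return cov / var


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Assumed linguistic term set; the breakpoints are arbitrary illustration values.
TERMS = {
    "declining": (-0.2, -0.1, 0.0),
    "stable": (-0.1, 0.0, 0.1),
    "growing": (0.0, 0.1, 0.2),
}


def membership_vector(coef: float) -> Dict[str, float]:
    """Map a coefficient to its degree of membership in each linguistic term."""
    # Clamp so that extreme coefficients saturate at the outermost terms.
    x = max(-0.1, min(0.1, coef))
    return {term: triangular(x, *abc) for term, abc in TERMS.items()}


def topic_development(doc_topics: Sequence[Sequence[float]],
                      doc_years: Sequence[int]) -> List[Dict[str, float]]:
    """Return one membership vector per topic."""
    matrix = yearly_topic_weights(doc_topics, doc_years)
    years = sorted(matrix)
    n_topics = len(matrix[years[0]])
    return [
        membership_vector(trend_coefficient(years, [matrix[y][k] for y in years]))
        for k in range(n_topics)
    ]
```

    Calling `topic_development(doc_topics, doc_years)` with one row of topic proportions per patent returns, for each topic, a dictionary of membership degrees that can be read as "mostly growing", "partly stable", and so on. The clamping step is a design choice of this sketch so that the three terms always sum to one.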

    Case studies of academic writing in the sciences: a focus on the development of writing skills

    The aim of the present thesis is to make a longitudinal study of changes affecting sentence-initial elements in articles published over time by a sample of researchers in international journals of physics. The linguistic framework adopted for such a study is a systemic-functional one. The general research methodology is established around two main axes, one linguistic and the other statistical. To conduct a longitudinal survey focusing on thematic changes, it was necessary on the one hand to set up clear and unambiguous linguistic categories to capture these changes and, on the other, to present and interpret the findings in manageable and reliable ways with the assistance of statistics. A pilot study was initially set up to explore possible changes in two articles published within a two-year interval by the American Physical Society. The articles were the first and the last of a series of five articles written by the same researcher on the same problem in physics. The method of analysis of the texts used a formulation of Theme that included Subject as an obligatory component, and Contextual Frame - i.e. pre-Subject elements - as an optional one. The analysis, using taxonomies proposed by Davies (1988, 1997) and Gosden (1993, 1996), suggested differences in thematic elements, especially regarding a certain type of complex Subject. On the basis of coding difficulties and the findings of the pilot study, the taxonomies were modified to include, in particular, new Conventional and Instantial classes for Subject and Contextual Frame. Conventional wordings, both in Subject and in Contextual Frame position, are identified as expressions which are readily available to novice writers of articles, because they are commonly used terms in the fields of research concerned. In contrast, Instantial wordings are identified as expressions which have been especially contrived by the writer to fit a given stretch of discourse. As writers develop and make their own the matter with which they are working, they become increasingly capable of crafting these more complex wordings, which involve multiple strands of meaning. In the case of this latter class, particular reference is made to post-modification and clause-type elements which allow meanings to be combined in specific ways.

    What does semantic tiling of the cortex tell us about semantics?

    Recent use of voxel-wise modeling in cognitive neuroscience suggests that semantic maps tile the cortex. Although this impressive research establishes distributed cortical areas active during the conceptual processing that underlies semantics, it tells us little about the nature of this processing. While mapping concepts between Marr's computational and implementation levels to support neural encoding and decoding, this approach ignores Marr's algorithmic level, central for understanding the mechanisms that implement cognition, in general, and conceptual processing, in particular. Following decades of research in cognitive science and neuroscience, what do we know so far about the representation and processing mechanisms that implement conceptual abilities? Most basically, much is known about the mechanisms associated with: (1) features and frame representations, (2) grounded, abstract, and linguistic representations, (3) knowledge-based inference, (4) concept composition, and (5) conceptual flexibility. Rather than explaining these fundamental representation and processing mechanisms, semantic tiles simply provide a trace of their activity over a relatively short time period within a specific learning context. Establishing the mechanisms that implement conceptual processing in the brain will require more than mapping it to cortical (and sub-cortical) activity, with process models from cognitive science likely to play central roles in specifying the intervening mechanisms. More generally, neuroscience will not achieve its basic goals until it establishes algorithmic-level mechanisms that contribute essential explanations to how the brain works, going beyond simply establishing the brain areas that respond to various task conditions

    Information Science in the web era: a term-based approach to domain mapping.

    We propose a methodology for mapping research in the field of Information Science (IS) based on a combined use of symbolic (linguistic) and numeric information. Using the same list of 12 IS journals as in earlier studies on this topic (White & McCain 1998; Zhao & Strotmann 2008a, 2008b), we mapped the structure of research in IS for two consecutive periods: 1996-2005 and 2006-2008. We focused on mapping the content of scientific publications from their title and abstract fields. The cluster labels were automatically derived from these titles and abstracts based on linguistic criteria. The results showed that while Information Retrieval (IR) and Citation studies continued to be the two structuring poles of research in IS, other prominent poles have emerged: webometrics in the first period (1996-2005) evolved into general web studies in the second period, integrating more aspects of IR research. Hence web studies and IR are more interwoven. User studies persist in IS but are now dispersed across the web studies and IR poles. Our method also highlighted some recent trends in IR research, such as automatic summarization and the use of language models. Theoretical research on "information science" continues to occupy a smaller but persistent place. Citation studies, on the other hand, remain a monolithic block, isolated from the two other poles (IR and web studies) save for a tenuous link through user studies. Citation studies have also recently evolved internally to accommodate newcomers like "h-index, Google scholar and the open access model". All these results were automatically generated by our method without resorting to manual labeling of specialties or reading the publication titles. Our results show that mapping domain knowledge structures at the term level offers a more detailed and intuitive picture of the field and captures emerging trends.