
    Automatic Scaling of Text for Training Second Language Reading Comprehension

    For children learning their first language, reading is one of the most effective ways to acquire new vocabulary: studies associate reading more with larger and more complex vocabularies. For second language learners, however, reading poses a substantial barrier. Even books written for early first language readers assume a base vocabulary of nearly 7,000 word families and a nuanced understanding of grammar. This project will examine how technology can help second language learners overcome this high barrier to entry, and how effective learning through reading is for adults acquiring a foreign language. Through the implementation of Dokusha, an automatic graded reader generator for Japanese, this project will explore how advances in natural language processing can be used to simplify text automatically for extensive reading in Japanese as a foreign language.
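
    One quantity a graded reader generator could plausibly rely on is lexical coverage: the share of running words in a text that the learner already knows. The following is a minimal sketch under stated assumptions, not Dokusha's implementation. It assumes the Japanese text has already been segmented into dictionary-form lemmas (for example by a morphological analyser such as MeCab), and the helper names and the 0.98 threshold in the hypothetical `is_readable` gate are illustrative choices.

```python
from collections import Counter


def lexical_coverage(lemmas, known_vocab):
    """Fraction of running lemmas (tokens, not types) the learner already knows."""
    if not lemmas:
        return 0.0
    known = sum(1 for lemma in lemmas if lemma in known_vocab)
    return known / len(lemmas)


def unknown_lemmas(lemmas, known_vocab):
    """Unknown lemmas ranked by frequency: candidates for simplification or glossing."""
    return Counter(lemma for lemma in lemmas if lemma not in known_vocab).most_common()


def is_readable(lemmas, known_vocab, threshold=0.98):
    """Hypothetical readability gate; the threshold is an illustrative assumption."""
    return lexical_coverage(lemmas, known_vocab) >= threshold


# Pre-segmented lemmas for 猫が魚を食べる ("the cat eats fish")
lemmas = ["猫", "が", "魚", "を", "食べる"]
known = {"猫", "が", "を", "食べる"}
print(lexical_coverage(lemmas, known))  # 0.8
print(unknown_lemmas(lemmas, known))    # [('魚', 1)]
print(is_readable(lemmas, known))       # False
```

    A simplifier could then target the most frequent unknown lemmas first, since replacing or glossing them raises coverage fastest.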

    Programmable Insight: A Computational Methodology to Explore Online News Use of Frames

    The Internet is a major source of online news content. Online news is a form of large-scale narrative text with rich, complex content that embeds deep meanings (facts, strategic communication frames, and biases) that shape and shift the standards, values, attitudes, and beliefs of the masses. Currently, this body of narrative text remains largely untapped because of human limitations: humans comprehend rich text and extract hidden meanings far better than known computational algorithms, but human analysis does not scale. In this research, online news framing is treated computationally to expose a deeper level of expressivity, termed “double subjectivity”, which is characterized by its cumulative amplification effects. A visual language is offered for extracting the spatial and temporal dynamics of double subjectivity, which may give insight into social influence on critical issues such as environmental, economic, or political discourse. This research offers two benefits: 1) scalability in processing hidden meanings in big data and 2) visibility of the entire network dynamics over time and space, giving users insight into the current status and future trends of mass communication. Dissertation/Thesis: Doctoral Dissertation, Computer Science 201

    Ontology based data warehousing for mining of heterogeneous and multidimensional data sources

    Heterogeneous and multidimensional big-data sources are prevalent in virtually all business environments, yet system and data analysts are unable to access such sources quickly. A robust and versatile data warehousing system is developed that integrates domain ontologies from multidimensional data sources. For example, petroleum digital ecosystems and digital oil field solutions, derived from big-data petroleum (information) systems, are in increasing demand in multibillion-dollar resource businesses worldwide. This work is recognized by the IEEE Industrial Electronics Society and has appeared in more than 50 international conference proceedings and journals.
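
    The core integration idea, mapping each source's local schema onto shared ontology concepts before loading facts into a warehouse, can be sketched as follows. This is a deliberately simplified illustration, not the system described above: a flat dictionary stands in for a real domain ontology (which would normally be expressed in a language such as OWL), and the source names, field names, wells and volumes are all hypothetical.

```python
from collections import defaultdict

# Hypothetical ontology mapping: each source's local field names -> shared concepts.
ONTOLOGY = {
    "well_db": {"WELL_ID": "well", "OIL_BBL": "oil_volume_bbl", "RPT_MONTH": "month"},
    "field_csv": {"wellName": "well", "oilProduced": "oil_volume_bbl", "period": "month"},
}


def to_canonical(source, record):
    """Rename a source record's fields to ontology concepts, dropping unmapped fields."""
    mapping = ONTOLOGY[source]
    return {mapping[field]: value for field, value in record.items() if field in mapping}


def load_facts(sources):
    """Integrate records from all sources into one list of canonical fact rows."""
    facts = []
    for source, records in sources.items():
        facts.extend(to_canonical(source, record) for record in records)
    return facts


def roll_up(facts, dimension, measure):
    """Aggregate a measure along one dimension, as a warehouse roll-up would."""
    totals = defaultdict(float)
    for fact in facts:
        totals[fact[dimension]] += fact[measure]
    return dict(totals)


sources = {
    "well_db": [{"WELL_ID": "W1", "OIL_BBL": 120.0, "RPT_MONTH": "2020-01", "OPERATOR": "X"}],
    "field_csv": [{"wellName": "W2", "oilProduced": 80.0, "period": "2020-01"}],
}
facts = load_facts(sources)
print(roll_up(facts, "month", "oil_volume_bbl"))  # {'2020-01': 200.0}
```

    Once both sources speak the same vocabulary, queries along any shared dimension (month, well) no longer need to know where each record came from.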

    Code Clone Detection Using Wavelets

    Developers may produce cloned code for various reasons. Code clones are one of the most frequent problems in a software project and have a negative impact on code quality: they make code harder to maintain, which costs time and money. In this thesis we propose a solution for code clone detection based on wavelet analysis. Wavelet analysis has proved extremely useful for clone detection in image processing and in financial market analysis. Wavelets allow comparisons that span different scales and strengths, and wavelet analysis also benefits greatly from parallelisation, which has become more affordable thanks to advances in GPU and cloud computing. It therefore makes sense to evaluate wavelet analysis for solving problems in software engineering as well. The code clone detection algorithm developed in this thesis will be language independent; its usefulness will be evaluated on finding different types of clones and compared against existing solutions.
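
    The abstract does not specify how code is turned into a signal or how wavelet coefficients are compared, so the following is only a minimal sketch of the general idea under stated assumptions. A hypothetical encoding maps each token to a number (every word token shares one value, so snippets that differ only in identifier names, Type-2 clones, yield the same signal); signals are resampled to a common length, mean-centred, decomposed with the Haar wavelet, and compared scale by scale with cosine similarity. `token_signal`, `wavelet_similarity` and all parameter values are illustrative, not the thesis's algorithm.

```python
import math
import re

# Crude, language-agnostic tokenizer: words, integers, or single non-space symbols.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def token_signal(code):
    """Encode source text as a numeric signal, one sample per token (hypothetical encoding)."""
    signal = []
    for tok in TOKEN_RE.findall(code):
        if tok[0].isalpha() or tok[0] == "_":
            signal.append(1.0)  # every word token looks alike, so renamed identifiers match
        elif tok.isdigit():
            signal.append(2.0)  # numeric literal
        else:
            signal.append(3.0 + ord(tok) % 7)  # punctuation/operators, bucketed by code point
    return signal


def resample(signal, n):
    """Nearest-sample resampling to length n (assumes a non-empty signal)."""
    return [signal[i * len(signal) // n] for i in range(n)]


def haar_levels(signal, levels):
    """Haar approximation coefficients at successively coarser scales."""
    out, current = [], list(signal)
    for _ in range(levels):
        if len(current) % 2:
            current.append(current[-1])  # pad odd lengths by repeating the last sample
        current = [(current[i] + current[i + 1]) / math.sqrt(2)
                   for i in range(0, len(current), 2)]
        out.append(current)
    return out


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def wavelet_similarity(code_a, code_b, n=32, levels=3):
    """Average cosine similarity of mean-centred Haar approximations across scales."""
    coeffs = []
    for code in (code_a, code_b):
        s = resample(token_signal(code), n)
        mean = sum(s) / n
        coeffs.append(haar_levels([x - mean for x in s], levels))
    scores = [cosine(a, b) for a, b in zip(*coeffs)]
    return sum(scores) / len(scores)


original = "def add(x, y):\n    return x + y"
renamed = "def plus(p, q):\n    return p + q"
other = "for i in range(10): print(i * 2)"
print(wavelet_similarity(original, renamed))  # 1.0 up to rounding: a Type-2 clone
print(wavelet_similarity(original, other))    # noticeably lower
```

    Each level's transform is an independent pairwise operation over the signal, which is what makes this family of methods a natural fit for the GPU and cloud parallelism the abstract mentions.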

    Mapping Acoustic and Semantic Dimensions of Auditory Perception

    Auditory categorisation is a function of sensory perception that allows humans to generalise across the many different sounds present in the environment and classify them into behaviourally relevant categories. These categories cover not only the variance in the acoustic properties of the signal but also a wide variety of sound sources. However, it is unclear to what extent the acoustic structure of a sound is associated with, and conveys, different facets of semantic category information. It also remains unknown whether people use such information, and what drives their decisions, when both acoustic and semantic information about a sound is available. To answer these questions, we used established methods widely practised in linguistics, acoustics and cognitive science, and bridged these domains by delineating their shared space. First, we took a model-free exploratory approach to examine the underlying structure and inherent patterns in our dataset, running principal component, clustering and multidimensional scaling analyses. In parallel, we mapped the topography of the sound labels' semantic space using corpus-based word-embedding vectors. We then built an LDA model predicting class membership and compared both the model-free approach and the model predictions with the actual taxonomy. Finally, in a series of web-based behavioural experiments, we investigated whether the acoustic and semantic topographies relate to perceptual judgements. This analysis pipeline showed that natural sound categories could be predicted successfully from acoustic information alone and that the perception of natural sound categories has some acoustic grounding. These results help clarify the role of physical sound characteristics, and of their meaning, in sound perception, and offer insight into the mechanisms governing machine-based and human classification.
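
    The semantic-space step can be illustrated with a minimal sketch, assuming each sound label already has a word-embedding vector: pairwise cosine distances between the label vectors form the distance matrix that multidimensional scaling or clustering would take as input. The labels and three-dimensional vectors below are made up for illustration; real corpus-based embeddings typically have hundreds of dimensions, and the study's exact distance measure is not stated in the abstract.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity (assumes non-zero vectors); 0 for vectors pointing the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def distance_matrix(embeddings):
    """Pairwise cosine distances, with rows and columns in sorted label order."""
    labels = sorted(embeddings)
    matrix = [[cosine_distance(embeddings[a], embeddings[b]) for b in labels] for a in labels]
    return labels, matrix


# Toy, made-up embeddings for three sound labels
toy = {
    "bark": [0.9, 0.1, 0.0],
    "meow": [0.8, 0.2, 0.1],
    "thunder": [0.0, 0.1, 0.9],
}
labels, d = distance_matrix(toy)
for label, row in zip(labels, d):
    print(label, [round(x, 3) for x in row])
```

    With these toy vectors the two animal vocalisations sit close together and far from "thunder", which is the kind of topography a scaling or clustering analysis would then make visible.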

    Towards Comprehensive Foundations of Computational Intelligence

    Although computational intelligence (CI) covers a vast variety of methods, it still lacks an integrative theory. Several proposals for CI foundations are discussed: computing and cognition as compression, meta-learning as search in the space of data models, (dis)similarity-based methods providing a framework for such meta-learning, and a more general approach based on chains of transformations. Many useful transformations that extract information from features are discussed. Heterogeneous adaptive systems are presented as a particular example of transformation-based systems, and the goal of learning is redefined to facilitate the creation of simpler data models. The need to understand data structures leads to techniques for logical and prototype-based rule extraction and to the generation of multiple alternative models, while the need to increase the predictive power of adaptive models leads to committees of competent models. Learning from partial observations is a natural extension towards reasoning based on perceptions, and an approach to intuitive solving of such problems is presented. Throughout the paper, neurocognitive inspirations are used frequently and are especially important in modeling higher cognitive functions. Promising directions such as liquid and laminar computing are identified, and many open problems are presented.
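
    Among the (dis)similarity-based methods mentioned, a nearest-prototype classifier is one of the simplest, and it also shows how a prototype-based rule reads: assign x to class c if x is closer to prototype P_c than to any other prototype. The sketch below is a generic illustration, not the paper's formulation; using class means as prototypes and Euclidean distance are assumptions, and a similarity-based framework would let both choices vary.

```python
import math
from collections import defaultdict


def fit_prototypes(points, classes):
    """One prototype per class: the mean of that class's feature vectors."""
    sums, counts = {}, defaultdict(int)
    for x, c in zip(points, classes):
        if c not in sums:
            sums[c] = [0.0] * len(x)
        sums[c] = [s + xi for s, xi in zip(sums[c], x)]
        counts[c] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(prototypes, x):
    """Assign x to the class whose prototype is nearest in Euclidean distance."""
    return min(prototypes, key=lambda c: math.dist(prototypes[c], x))


protos = fit_prototypes([[0, 0], [0, 1], [5, 5], [6, 5]], ["a", "a", "b", "b"])
print(protos)                  # {'a': [0.0, 0.5], 'b': [5.5, 5.0]}
print(predict(protos, [1, 1]))  # a
```

    Swapping `math.dist` for another dissimilarity, or class means for selected training examples, yields different members of the same family, which is the kind of search space the meta-learning view operates over.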

    Computer-assisted text analysis methodology in the social sciences

    "This report presents an account of methods of research in computer-assisted text analysis in the social sciences. Rather than to provide a comprehensive enumeration of all computer-assisted text analysis investigations either directly or indirectly related to the social sciences using a quantitative and computer-assisted methodology as their text analytical tool, the aim of this report is to describe the current methodological standpoint of computer-assisted text analysis in the social sciences. This report provides, thus, a description and a discussion of the operations carried out in computer-assisted text analysis investigations. The report examines both past and well-established as well as some of the current approaches in the field and describes the techniques and the procedures involved. By this means, a first attempt is made toward cataloguing the kinds of supplementary information as well as computational support which are further required to expand the suitability and applicability of the method for the variety of text analysis goals." (author's abstract)
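
    One long-established family of computer-assisted text analysis in the social sciences is dictionary-based coding, in which words in a text are counted against predefined content categories. The report surveys methods rather than prescribing one, so the sketch below is only a generic illustration of that operation; the two categories and their word lists are hypothetical.

```python
import re
from collections import Counter

# Hypothetical coding dictionary: category -> indicator words
DICTIONARY = {
    "economy": {"tax", "jobs", "inflation"},
    "environment": {"climate", "pollution", "emissions"},
}


def code_text(text, dictionary):
    """Count dictionary hits per category; also return the total word count for normalising."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        for category, terms in dictionary.items():
            if word in terms:
                counts[category] += 1
    return counts, len(words)


counts, total = code_text("Tax cuts create jobs, but climate change drives emissions.", DICTIONARY)
print(dict(counts), total)  # {'economy': 2, 'environment': 2} 9
```

    Dividing each category count by the total word count gives relative frequencies that can be compared across texts of different lengths.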