2,665 research outputs found

    The dynamics of reading development in L2 English for academic purposes

    In a mixed-methods approach, this study investigates the complex and dynamic developmental trajectories of 27 Chinese Chemistry major undergraduates’ English academic reading ability. Twelve parallel tests were designed, validated, and used weekly during one semester. The analyses included a group pre-post design to measure academic reading gains, a regression analysis to predict beginning reading score with English proficiency and Chemistry knowledge as predictors, individual longitudinal case studies to measure variability and phase shifts, and a cluster analysis to discover (un)common developmental patterns. Finally, a qualitative study used interviews to discover difficulties in reading and strategies to overcome them. English proficiency predicted the initial reading score and the group gained significantly in academic reading. Each learner showed different non-linear patterns, and a cluster analysis revealed few similar patterns among learners. The high gainers showed relatively more variability over time and used more, more varied, and more sophisticated learning and reading strategies to improve.

    Storia: Summarizing Social Media Content based on Narrative Theory using Crowdsourcing

    People from all over the world use social media to share thoughts and opinions about events, and understanding what people say through these channels has been of increasing interest to researchers, journalists, and marketers alike. However, while automatically generated summaries enable people to consume large amounts of data efficiently, they do not provide the context needed for a viewer to fully understand an event. Narrative structure can provide templates for the order and manner in which this data is presented to create stories that are oriented around narrative elements rather than summaries made up of facts. In this paper, we use narrative theory as a framework for identifying the links between social media content. To do this, we designed crowdsourcing tasks to generate summaries of events based on commonly used narrative templates. In a controlled study, for certain types of events, people were more emotionally engaged with stories created with narrative structure and were also more likely to recommend them to others compared to summaries created without narrative structure.

    A Corpus-Based Approach for Building Semantic Lexicons

    Semantic knowledge can be a great asset to natural language processing systems, but it is usually hand-coded for each application. Although some semantic information is available in general-purpose knowledge bases such as WordNet and Cyc, many applications require domain-specific lexicons that represent words and categories for a particular topic. In this paper, we present a corpus-based method that can be used to build semantic lexicons for specific categories. The input to the system is a small set of seed words for a category and a representative text corpus. The output is a ranked list of words that are associated with the category. A user then reviews the top-ranked words and decides which ones should be entered in the semantic lexicon. In experiments with five categories, users typically found about 60 words per category in 10-15 minutes to build a core semantic lexicon. Comment: 8 pages - to appear in Proceedings of EMNLP-

    The logic of random regular graphs


    Machine Learning at Microsoft with ML .NET

    Machine Learning is transitioning from an art and science into a technology available to every developer. In the near future, every application on every platform will incorporate trained models to encode data-based decisions that would be impossible for developers to author. This presents a significant engineering challenge, since currently data science and modeling are largely decoupled from standard software development processes. This separation makes incorporating machine learning capabilities inside applications unnecessarily costly and difficult, and furthermore discourages developers from embracing ML in the first place. In this paper we present ML .NET, a framework developed at Microsoft over the last decade in response to the challenge of making it easy to ship machine learning models in large software applications. We present its architecture and illuminate the application demands that shaped it. Specifically, we introduce DataView, the core data abstraction of ML .NET, which allows it to capture full predictive pipelines efficiently and consistently across training and inference lifecycles. We close the paper with a surprisingly favorable performance study of ML .NET compared to more recent entrants, and a discussion of some lessons learned.

    Information extraction and data mining from Chinese financial news.

    Ng Anny.
    Thesis (M.Phil.)--Chinese University of Hong Kong, 2002.
    Includes bibliographical references (leaves 139-142).
    Abstracts in English and Chinese.

    Chapter 1 --- Introduction
    Chapter 1.1 --- Problem Definition
    Chapter 1.2 --- Thesis Organization
    Chapter 2 --- Chinese Text Summarization Using Genetic Algorithm
    Chapter 2.1 --- Introduction
    Chapter 2.2 --- Related Work
    Chapter 2.3 --- Genetic Algorithm Approach
    Chapter 2.3.1 --- Fitness Function
    Chapter 2.3.2 --- Genetic operators
    Chapter 2.4 --- Implementation Details
    Chapter 2.5 --- Experimental results
    Chapter 2.6 --- Limitations and Future Work
    Chapter 2.7 --- Conclusion
    Chapter 3 --- Event Extraction from Chinese Financial News
    Chapter 3.1 --- Introduction
    Chapter 3.2 --- Method
    Chapter 3.2.1 --- Data Set Preparation
    Chapter 3.2.2 --- Positive Word
    Chapter 3.2.3 --- Negative Word
    Chapter 3.2.4 --- Window
    Chapter 3.2.5 --- Event Extraction
    Chapter 3.3 --- System Overview
    Chapter 3.4 --- Implementation
    Chapter 3.4.1 --- Event Type and Positive Word
    Chapter 3.4.2 --- Company Name
    Chapter 3.4.3 --- Negative Word
    Chapter 3.4.4 --- Event Extraction
    Chapter 3.5 --- Stock Database
    Chapter 3.5.1 --- Stock Movements
    Chapter 3.5.2 --- Implementation
    Chapter 3.5.3 --- Stock Database Transformation
    Chapter 3.6 --- Performance Evaluation
    Chapter 3.6.1 --- Performance measures
    Chapter 3.6.2 --- Evaluation
    Chapter 3.7 --- Conclusion
    Chapter 4 --- Mining Frequent Episodes
    Chapter 4.1 --- Introduction
    Chapter 4.1.1 --- Definitions
    Chapter 4.2 --- Related Work
    Chapter 4.3 --- Double-Part Event Tree for the database
    Chapter 4.3.1 --- Complexity of tree construction
    Chapter 4.4 --- Mining Frequent Episodes with the DE-tree
    Chapter 4.4.1 --- Conditional Event Trees
    Chapter 4.4.2 --- Single Path Conditional Event Tree
    Chapter 4.4.3 --- Complexity of Mining Frequent Episodes with DE-Tree
    Chapter 4.4.4 --- An Example
    Chapter 4.4.5 --- Completeness of finding frequent episodes
    Chapter 4.5 --- Implementation of DE-Tree
    Chapter 4.6 --- Method 2: Node-List Event Tree
    Chapter 4.6.1 --- Tree construction
    Chapter 4.6.2 --- Order of Position Bits
    Chapter 4.7 --- Implementation of NE-tree construction
    Chapter 4.7.1 --- Complexity of NE-Tree Construction
    Chapter 4.8 --- Mining Frequent Episodes with NE-tree
    Chapter 4.8.1 --- Conditional NE-Tree
    Chapter 4.8.2 --- Single Path Conditional NE-Tree
    Chapter 4.8.3 --- Complexity of Mining Frequent Episodes with NE-Tree
    Chapter 4.8.4 --- An Example
    Chapter 4.9 --- Performance evaluation
    Chapter 4.9.1 --- Synthetic data
    Chapter 4.9.2 --- Real data
    Chapter 4.10 --- Conclusion
    Chapter 5 --- Mining N-most Interesting Episodes
    Chapter 5.1 --- Introduction
    Chapter 5.2 --- Method
    Chapter 5.2.1 --- Threshold Improvement
    Chapter 5.2.2 --- Pseudocode
    Chapter 5.3 --- Experimental Results
    Chapter 5.3.1 --- Synthetic Data
    Chapter 5.3.2 --- Real Data
    Chapter 5.4 --- Conclusion
    Chapter 6 --- Mining Frequent Episodes with Event Constraints
    Chapter 6.1 --- Introduction
    Chapter 6.2 --- Method
    Chapter 6.3 --- Experimental Results
    Chapter 6.3.1 --- Synthetic Data
    Chapter 6.3.2 --- Real Data
    Chapter 6.4 --- Conclusion
    Chapter 7 --- Conclusion
    Chapter A --- Test Cases
    Chapter A.1 --- Text 1
    Chapter A.2 --- Text 2
    Bibliography

    Knowledge-Enhanced Personalized Review Generation with Capsule Graph Neural Network

    Personalized review generation (PRG) aims to automatically produce review text reflecting user preference, which is a challenging natural language generation task. Most previous studies do not explicitly model factual descriptions of products and tend to generate uninformative content. Moreover, they mainly focus on word-level generation and cannot accurately reflect more abstract user preferences across multiple aspects. To address these issues, we propose a novel knowledge-enhanced PRG model based on a capsule graph neural network (Caps-GNN). We first construct a heterogeneous knowledge graph (HKG) to utilize rich item attributes. We adopt Caps-GNN to learn graph capsules that encode underlying characteristics from the HKG. Our generation process contains two major steps, namely aspect sequence generation and sentence generation. First, based on graph capsules, we adaptively learn aspect capsules for inferring the aspect sequence. Then, conditioned on the inferred aspect label, we design a graph-based copy mechanism to generate sentences by incorporating related entities or words from the HKG. To our knowledge, we are the first to utilize a knowledge graph for the PRG task. The incorporated KG information is able to enhance user preference at both the aspect and word levels. Extensive experiments on three real-world datasets have demonstrated the effectiveness of our model on the PRG task. Comment: Accepted by CIKM 2020 (Long Paper)