25,159 research outputs found

    Memory-Based Shallow Parsing

    We present memory-based learning approaches to shallow parsing and apply them to five tasks: base noun phrase identification, arbitrary base phrase recognition, clause detection, noun phrase parsing and full parsing. We use feature selection techniques and system combination methods to improve the performance of the memory-based learner. Our approach is evaluated on standard data sets and the results are compared with those of other systems. The comparison reveals that our approach works well for base phrase identification, while its application to recognizing embedded structures leaves some room for improvement.

    Nominal And Verbal In Dialect Sasak Kuto-kute Bayan, West Lombok Regency: Description And Analysis

    Life and language are inseparable. The relationship between them is so close that every nation and ethnic group has its own language, used by its speakers. In Lombok, West Nusa Tenggara, the Sasak language is spoken by the Sasak people. Sasak has received special attention from linguists, and many studies have been conducted to describe its characteristics. One study reports a potential for inter-ethnic conflict among speakers of different languages arising from language use. It states that miscommunication causes misunderstanding, and that misunderstanding is caused by a lack of linguistic knowledge. A good understanding of a language is a necessity because it can help avoid conflict. A study that describes and analyzes the nominals and verbals of the Sasak Kuto-Kute dialect is therefore much needed. This research is an attempt to describe and analyze the verbals and nominals of the Sasak Kuto-Kute dialect in Bayan, West Lombok Regency. It is a descriptive and exploratory study. Four Sasak students who are native speakers of the Kuto-Kute dialect were interviewed to obtain data. They spoke the language well, had normal speech organs, had completed primary school, were about 20-40 years old, and lived outside the island of Lombok. Data were collected through observation, interviews and library research, and the collected data were analyzed descriptively. The study identified morphological processes in nominalization and verbalization that involve prefixes (7), an infix (1), a suffix (1) and 7 simulfixes. Morphemic processes can change the form and meaning of free morphemes. Morphemes were identified as free or bound.

    Refining the use of the web (and web search) as a language teaching and learning resource

    The web is a potentially useful corpus for language study because it provides examples of language that are contextualized and authentic, and because it is large and easily searchable. However, web contents are heterogeneous in the extreme, uncontrolled and hence 'dirty,' and exhibit features different from the written and spoken texts in other linguistic corpora. This article explores the use of the web and web search as a resource for language teaching and learning. We describe how a particular derived corpus containing a trillion word tokens in the form of n-grams has been filtered by word lists and syntactic constraints and used to create three digital library collections, linked with other corpora and the live web, that exploit the affordances of web text and mitigate some of its constraints.

    Research on speech understanding and related areas at SRI

    Research capabilities in speech understanding, speech recognition, and voice control are described. Research activities and the activities which involve text input rather than speech are discussed.

    Concept-based Interactive Query Expansion Support Tool (CIQUEST)

    This report describes a three-year project (2000-03) undertaken in the Information Studies Department at The University of Sheffield and funded by Resource, The Council for Museums, Archives and Libraries. The overall aim of the research was to provide user support for query formulation and reformulation in searching large-scale textual resources, including those of the World Wide Web. More specifically, the objectives were: to investigate and evaluate methods for the automatic generation and organisation of concepts derived from retrieved document sets, based on statistical methods for term weighting; and to conduct user-based evaluations on the understanding, presentation and retrieval effectiveness of concept structures in selecting candidate terms for interactive query expansion. The TREC test collection formed the basis for the seven evaluative experiments conducted in the course of the project. These formed four distinct phases in the project plan. In the first phase, a series of experiments was conducted to investigate further techniques for concept derivation and hierarchical organisation and structure. The second phase was concerned with user-based validation of the concept structures. Results of phases 1 and 2 informed the design of the test system, and the user interface was developed in phase 3. The final phase entailed a user-based summative evaluation of the CiQuest system. The main findings demonstrate that concept hierarchies can effectively be generated from sets of retrieved documents and displayed to searchers in a meaningful way. The approach provides the searcher with an overview of the contents of the retrieved documents, which in turn facilitates the viewing of documents and selection of the most relevant ones. Concept hierarchies are a good source of terms for query expansion and can improve precision. The extraction of descriptive phrases as an alternative source of terms was also effective. With respect to presentation, cascading menus were easy to browse for selecting terms and for viewing documents. In conclusion, the project dissemination programme and future work are outlined.