Memory-Based Shallow Parsing
We present memory-based learning approaches to shallow parsing and apply
these to five tasks: base noun phrase identification, arbitrary base phrase
recognition, clause detection, noun phrase parsing and full parsing. We use
feature selection techniques and system combination methods for improving the
performance of the memory-based learner. Our approach is evaluated on standard
data sets and the results are compared with those of other systems. This reveals
that our approach works well for base phrase identification while its
application towards recognizing embedded structures leaves some room for
improvement.
Nominal And Verbal In Dialect Sasak Kuto-kute Bayan, West Lombok Regency: Description And Analysis
Life and language are inseparable. The relationship between them is so close that every nation and ethnic group has its own language, used by its speakers. In Lombok, West Nusa Tenggara, the Sasak language is used by the Sasak people. The Sasak language has received special attention from linguists, and many studies have been carried out to describe its characteristics. One study reports a potential for interethnic conflict among speakers of different languages arising from language use. It states that miscommunication causes misunderstanding, and that misunderstanding results from a lack of linguistic knowledge. A good understanding of a language is a necessity because it can prevent conflict. A study describing and analyzing the nominals and verbals of the Kuto-Kute dialect of Sasak is therefore much needed. This research is an attempt to describe and analyze the verbals and nominals of the Sasak Kuto-Kute dialect in Bayan, West Lombok Regency. It is a descriptive and exploratory study. Four Sasak students, native speakers of the Kuto-Kute dialect, were interviewed to obtain data. They spoke the language well, had normal speech organs, had completed elementary school, were about 20 to 40 years old, and lived outside the island of Lombok. Data were collected through observation, interviews, and library research, and the collected data were analyzed descriptively. The study identified morphological processes in nominalization and verbalization involving prefixes (7), an infix (1), a suffix (1), and 7 simulfixes. Morphemic processes can change the form and meaning of free morphemes. Morphemes were identified as free or bound.
Refining the use of the web (and web search) as a language teaching and learning resource
The web is a potentially useful corpus for language study because it provides examples of language that are contextualized and authentic, and is large and easily searchable. However, web contents are heterogeneous in the extreme, uncontrolled and hence 'dirty,' and exhibit features different from the written and spoken texts in other linguistic corpora. This article explores the use of the web and web search as a resource for language teaching and learning. We describe how a particular derived corpus containing a trillion word tokens in the form of n-grams has been filtered by word lists and syntactic constraints and used to create three digital library collections, linked with other corpora and the live web, that exploit the affordances of web text and mitigate some of its constraints.
Research on speech understanding and related areas at SRI
Research capabilities in speech understanding, speech recognition, and voice control are described. Research activities, as well as activities that involve text input rather than speech, are discussed.
Concept-based Interactive Query Expansion Support Tool (CIQUEST)
This report describes a three-year project (2000-03) undertaken in the Information Studies
Department at The University of Sheffield and funded by Resource, The Council for
Museums, Archives and Libraries. The overall aim of the research was to provide user
support for query formulation and reformulation in searching large-scale textual resources
including those of the World Wide Web. More specifically the objectives were: to investigate
and evaluate methods for the automatic generation and organisation of concepts derived from
retrieved document sets, based on statistical methods for term weighting; and to conduct
user-based evaluations on the understanding, presentation and retrieval effectiveness of
concept structures in selecting candidate terms for interactive query expansion.
The TREC test collection formed the basis for the seven evaluative experiments conducted in
the course of the project. These formed four distinct phases in the project plan. In the first
phase, a series of experiments was conducted to investigate further techniques for concept
derivation and hierarchical organisation and structure. The second phase was concerned with
user-based validation of the concept structures. Results of phases 1 and 2 informed the
design of the test system, and the user interface was developed in phase 3. The final phase
entailed a user-based summative evaluation of the CiQuest system.
The main findings demonstrate that concept hierarchies can effectively be generated from
sets of retrieved documents and displayed to searchers in a meaningful way. The approach
provides the searcher with an overview of the contents of the retrieved documents, which in
turn facilitates the viewing of documents and selection of the most relevant ones. Concept
hierarchies are a good source of terms for query expansion and can improve precision. The
extraction of descriptive phrases as an alternative source of terms was also effective. With
respect to presentation, cascading menus were easy to browse for selecting terms and for
viewing documents. In conclusion, the project dissemination programme and future work are
outlined.