167,677 research outputs found

    The effect of high- and low-frequency previews and sentential fit on word skipping during reading

    In a previous gaze-contingent boundary experiment, Angele and Rayner (2013) found that readers are likely to skip a word that appears to be the definite article "the" even when syntactic constraints do not allow for articles to occur in that position. In the present study, we investigated whether the word frequency of the preview of a 3-letter target word influences a reader’s decision to fixate or skip that word. We found that the word frequency rather than the felicitousness (syntactic fit) of the preview affected how often the upcoming word was skipped. These results indicate that visual information about the upcoming word trumps information from the sentence context when it comes to making a skipping decision. Skipping parafoveal instances of "the" may therefore simply be an extreme case of skipping high-frequency words.

    Comparing predictors of sentence self-paced reading times: Syntactic complexity versus transitional probability metrics

    When estimating the influence of sentence complexity on reading, researchers typically opt for one of two main approaches: measuring syntactic complexity (SC) or transitional probability (TP). Comparisons of the predictive power of the two approaches have yielded mixed results. To address this inconsistency, we conducted a self-paced reading experiment in which participants read sentences of varying syntactic complexity. For each approach, we selected from two alternative sets of measures the set that provided the best fit to the self-paced reading data. We then compared the contributions of the SC and TP measures to reading times when entered into the same model. Our results showed that both measures explained significant portions of variance in self-paced reading times. Thus, researchers aiming to measure sentence complexity should take both SC and TP into account. All of the analyses were conducted with and without control variables known to influence reading times (word/sentence length, word frequency, and word position) to showcase how the effects of SC and TP change in the presence of the control variables.

    Sentence Selection Strategy in Multi-Document Summarization, Satrio Verdianto, NRP 5111 100 183

    A news summary is defined as a text made up of one or more sentences that convey the important information in a news item. One important phase of summarization is sentence scoring (sentence weighting). In news summarization, most scoring methods use features of the news itself. According to Ferreira et al. (2014), for documents with short, structured text such as news, the best sentence scoring technique combines four features: word frequency, TF-IDF, sentence position, and resemblance to the title. In this study, the combination of all four features is compared with combinations of three features and of two features, and each combination is evaluated by its ROUGE-N score and its execution time. The tests showed that the best of these feature combinations uses two features, sentence position and word frequency, with a ROUGE-N score of 0.679 and an execution time of 28.458 seconds.
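The two-feature scoring described in this abstract (sentence position plus word frequency) can be illustrated with a minimal sketch. It is not the author's implementation: the function names, the tokenizer, the scaling of each feature to [0, 1], and the equal weighting of the two features are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def score_sentences(sentences):
    """Score each sentence by averaging a word-frequency and a position feature.

    Both features are scaled to [0, 1]; the equal weighting is an assumption.
    """
    tokens_per_sentence = [tokenize(s) for s in sentences]
    counts = Counter(t for tokens in tokens_per_sentence for t in tokens)
    max_count = max(counts.values()) if counts else 1
    n = len(sentences)

    scores = []
    for index, tokens in enumerate(tokens_per_sentence):
        # Word-frequency feature: mean document frequency of the sentence's
        # words, normalized by the count of the most frequent word.
        if tokens:
            wf = sum(counts[t] for t in tokens) / (len(tokens) * max_count)
        else:
            wf = 0.0
        # Position feature: earlier sentences score higher (1.0 for the first).
        position = (n - index) / n
        scores.append(0.5 * wf + 0.5 * position)
    return scores


def summarize(sentences, k=1):
    """Return the k highest-scoring sentences in their original order."""
    scores = score_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(sentences, k=2)` on a list of sentence strings returns the two top-scoring sentences in document order. A real system would add stop-word removal and stemming for the target language before counting words.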

    Exploring the Use of Adverb ‘Literally’ in Corpus of Contemporary American English

    This research aims to describe how native speakers use the adverb "literally". It is qualitative descriptive research. The main data source is one of the online corpora, the Corpus of Contemporary American English (COCA). The research follows three steps: data collection, data analysis, and presentation of the results. Based on the COCA data, it describes how frequently the adverb "literally" is used and how it is used in a sentence, by identifying the particle that follows it. The theories used are the theory of adverbs by Pichler (2016), supported by the theory of Murphy (1993), and the types of adverbs by Frank (1972). The results show that the word "literally" occurs 39,109 times in COCA in the period 1990 to 2019. The adverb is used most often in spoken language, with 8,339 occurrences. The collocations and concordance lines in COCA are used to identify the particle that follows the adverb "literally". The collocations in this research are divided into three word classes: verbs, adjectives, and adverbs. The concordance lines in COCA show that the adverb does not occupy a fixed position in the sentence; its position can change depending on the context of the sentence.

    Cross-regional word duration patterns in Mandarin

    Duration contrasts can convey many types of information, including language background, word structure, word frequency, speech genre, intention, and emotions. An understanding of duration lays the foundation for many aspects of speech technology, since duration plays a major role in speech production and perception. This dissertation explores the duration patterns of Mandarin words in three Mandarin dialectal regions: Beijing, Taiwan, and Malaysia. It brings together diverse methodologies for speech data collection, annotation, and corpus construction to investigate linguistic patterns. Three speech production studies are conducted to explore the duration patterns of words with different lengths and internal structures. These studies reveal the general duration patterns of Mandarin words. First, all multi-syllabic words demonstrate the disyllabic long-short metrical form. Second, linguistic factors, namely syllable structure, position (syllable position, word position, and sentence position), word frequency, word category, word internal structure, particle attachment, and the speech rate of the sentence, have significant effects on syllable duration. Third, a social factor, region, interacts with multiple linguistic factors (word structure, syllable position, and particle attachment) and plays an important role in duration prediction. Quantitative data from these studies reveal regional differences in rhythmic contrast among Mandarin-speaking regions. Beijing Mandarin speakers are more sensitive to changes in the length of a linguistic unit and show stronger rhythmic contrast than speakers of Taiwan and Malaysia Mandarin. The results also show that Malaysia Mandarin speakers have a rhythmic pattern similar to that of Beijing Mandarin speakers. The investigation of duration patterns in this dissertation provides a detailed description of word duration in Mandarin. The dissertation also provides a foundation for further research on suprasegmental features related to duration patterns. A comprehensive understanding of duration patterns, together with linguistic and social factors, helps to improve the quality of the durational models used in speech technology.

    Effects of reading proficiency and of base and whole-word frequency on reading noun- and verb-derived words: an eye-tracking study in Italian primary school children

    The aim of this study is to assess the role of readers' proficiency and of the base-word distributional properties on eye-movement behavior. Sixty-two typically developing children, attending 3rd, 4th, and 5th grade, were asked to read derived words in a sentence context. Target words were nouns derived from noun bases (e.g., umorista, 'humorist'), which in Italian are shared by few derived words, and nouns derived from verb bases (e.g., punizione, 'punishment'), which are shared by about 50 different inflected forms and several derived words. The data show that base and word frequency affected first-fixation duration for nouns derived from noun bases, but in opposite ways: base frequency had a facilitative effect on first fixation, whereas word frequency exerted an inhibitory effect. These results were interpreted as a competition between early accessed base words (e.g., camino, 'chimney') and target words (e.g., caminetto, 'fireplace'). For nouns derived from verb bases, an inhibitory base frequency effect but no word frequency effect was observed. These results suggest that the syntactic context, calling for a noun in the target position, led to an inhibitory effect when a verb base was detected, and made it difficult for readers to access the corresponding base+suffix combination (whole word) in the very early processing phases. Gaze duration was mainly affected by word frequency and length: for nouns derived from noun bases, this interaction was modulated by proficiency, as the length effect was stronger for less proficient readers while they were processing low-frequency words. For nouns derived from verb bases, though, all children, irrespective of their reading ability, showed sensitivity to the interaction between the frequency of the base+suffix combination (word frequency) and target length. The results of this study are consistent with those of other Italian studies that contrasted noun and verb processing, and confirm that distributional properties of morphemic constituents have a significant impact on the strategies used for processing morphologically complex words.

    Gender dependent word-level emotion detection using global spectral speech features

    In this study, global spectral features extracted at the word and sentence levels are studied for speech emotion recognition. MFCCs (Mel Frequency Cepstral Coefficients) were used as the spectral information for recognition. Global spectral features representing gross statistics, such as the mean of the MFCCs, are used. The study also examines words at different positions in a sentence (initial, middle, and end) separately. Word-level feature extraction is used to analyze the emotion recognition performance of words at different positions. Word boundaries are identified manually. Gender-dependent and gender-independent models are also studied to analyze the impact of gender on emotion recognition performance. Berlin’s Emo-DB (Emotional Database) was used as the emotional speech dataset. The performance of different classifiers was also studied: NN (Neural Network), KNN (K-Nearest Neighbor), and LDA (Linear Discriminant Analysis). Anger and neutral emotions were studied. Results showed that using all 13 MFCC coefficients provides better classification results than other combinations of MFCC coefficients for these emotions. Words at initial and ending positions provide more emotion-specific information than words at the middle position. Gender-dependent models are more efficient than gender-independent models. Moreover, the female model is more efficient than the male model, and female speakers exhibit emotions better than male speakers. In general, NN performs the worst compared with KNN and LDA in classifying anger and neutral. LDA outperforms KNN by almost 15% for the gender-independent model and by almost 25% for the gender-dependent model.
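As a rough illustration of what a "global" spectral feature means here, the sketch below averages frame-level coefficient vectors into a single vector per word and classifies it with a 1-nearest-neighbor rule (KNN with K = 1). It assumes the MFCC frames have already been extracted by some front end; the function names and the toy data format are hypothetical and are not taken from the study.

```python
import math


def global_mean(frames):
    """Average frame-level coefficient vectors into one word-level vector.

    `frames` is a list of equal-length lists, one per analysis frame
    (for example 13 MFCC values per frame).
    """
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    return [sum(frame[c] for frame in frames) / n_frames for c in range(n_coeffs)]


def nearest_neighbor(query, labeled_vectors):
    """Return the label of the training vector closest to `query`.

    `labeled_vectors` is a list of (vector, label) pairs; distance is Euclidean.
    """
    best_label, best_dist = None, math.inf
    for vector, label in labeled_vectors:
        dist = math.dist(query, vector)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```

A gender-dependent setup in this sketch would simply keep one list of labeled vectors per gender and query the list matching the speaker.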

    Sentence Weighting Based on News Features, Grammatical Information, and Sentence Relevance to the Title for Multi-Document News Summarization

    Sentence weighting is a stage frequently used in document summarization, including the summarization of news documents. In news summarization, sentence weighting methods for selecting representative sentences mostly use features of the news itself, such as word frequency, Term Frequency-Inverse Document Frequency (TF-IDF), sentence position, and resemblance to the title. These methods are able to select representative sentences for a summary. However, weighting based on news features alone is not sufficient, because it ignores the informative words in a sentence and measures the relevance of a sentence to the title only by word overlap. This research aims to summarize multiple news documents using a sentence weighting method based on important news features combined with grammatical information and the relevance of each sentence to the title. Grammatical information is used to indicate the informative words in a sentence. The relevance of a sentence to the title measures how closely the sentence is connected to the title, in terms of both shared words and similarity of word meaning. Weighting sentences by a combination of news features, grammatical information, and relevance to the title is expected to select representative sentences better and to improve the quality of the resulting summary. The research has four stages for producing a multi-document news summary: news selection, text preprocessing, sentence scoring, and summary construction. The summaries are evaluated with Recall-Oriented Understudy for Gisting Evaluation (ROUGE), using four variants: ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-SU4. In experiments on 11 groups of Indonesian news documents, the proposed method is compared with a weighting method that uses a trending issue approach (NeFTIS). The proposed method achieves better results than NeFTIS, with increases in ROUGE-1, ROUGE-2, ROUGE-L, and ROUGE-SU4 of 58%, 99.32%, 13.53%, and 82.65%, respectively.
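For readers unfamiliar with the evaluation metric, the following is a minimal sketch of ROUGE-N recall (the share of reference n-grams that also appear in the candidate summary, with clipped counts). It covers only ROUGE-1 and ROUGE-2 style scores against a single reference; ROUGE-L and ROUGE-SU4 are not shown. Whitespace tokenization and the function names are assumptions, and this is not the official ROUGE toolkit.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Return the fraction of reference n-grams matched by the candidate.

    Each reference n-gram is matched at most as many times as it occurs
    in the candidate (clipped counting).
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total
```

For the reference "the flood damaged the roads" and the candidate "the flood damaged roads", four of the five reference unigrams and two of the four reference bigrams are matched, so the function returns 0.8 for `n=1` and 0.5 for `n=2`.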