
    Automatic Genre Classification in Web Pages Applied to Web Comments

    Automatic Web comment detection could significantly facilitate information retrieval systems, e.g., a focused Web crawler. In this paper, we propose a text genre classifier for Web text segments as an intermediate step for Web comment detection in Web pages. Different feature types and classifiers are analyzed for this purpose. We compare the two-level approach to state-of-the-art techniques operating on the whole Web page text and show that accuracy can be improved significantly. Finally, we illustrate the applicability for information retrieval systems by evaluating our approach on Web pages obtained by a Web crawler.

    Experiment on Style-Dependent Document Ranking

    The paper reports on experiments aimed at incorporating style-dependent parameters into ranking schemes for information retrieval tasks. We use the ROMIP Web collection and ROMIP-2003 ad-hoc track results in the analysis. Factor analysis techniques are used to extract factors that reflect stylistic properties of documents. The obtained style-dependent parameters and the ranks derived from them are compared. A simple scheme for rank aggregation is proposed. Evaluation of the results shows only moderate improvement in relevance ranking.

    Arabic Documents classification method a Step towards Efficient Documents Summarization

    The massive growth of online information has made thorough research on automatic text summarization necessary within the Natural Language Processing (NLP) community. To reach this goal, different approaches should be integrated and combined. One of these approaches is the classification of documents. Therefore, the aim of this paper is to propose a successful framework for agricultural document classification as a step toward a language-independent automatic summarization approach. The main target of our research series is to propose a complete novel framework which not only responds to the question, but also gives the user an opportunity to find additional information related to the question. We implemented the proposed method. As a case study, the implemented method is applied to Arabic text in the agriculture field. The implemented approach succeeded in classifying the documents submitted by the user. The results have been evaluated using Recall, Precision and F-score measures. DOI: 10.17762/ijritcc2321-8169.15017