5 research outputs found

    Arabic Rule-Based Named Entity Recognition Systems Progress and Challenges

    Rule-based approaches use hand-crafted rules to extract Named Entities (NEs) and, alongside Machine Learning, are among the best-known ways to do so. Named Entity Recognition (NER) is the task of identifying personal names, locations, organizations, and many other entity types. For Arabic, Big Data challenges are driving rapid development of NER as a way to extract useful information from texts. This paper sheds light on research progress in rule-based systems through a diagnostic comparison of linguistic resources, entity types, domains, and performance. We also highlight the challenges of processing Arabic NEs with rule-based systems. Good NER performance is expected to benefit other modern fields such as semantic web search, question answering, machine translation, information retrieval, and abstracting systems.
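    As a rough illustration of what "hand-crafted rules" can mean in practice, the sketch below labels a token as an entity when it follows a trigger word or appears in a gazetteer. The trigger words, the gazetteer entry, and the function name `extract_entities` are assumptions made for this example; they are not taken from the surveyed systems, which use far richer linguistic resources.

```python
# Hypothetical trigger words and gazetteer, chosen only for illustration.
TRIGGERS = {
    "السيد": "PERSON",        # "Mr."
    "مدينة": "LOCATION",      # "city (of)"
    "شركة": "ORGANIZATION",   # "company"
}
GAZETTEER = {"القاهرة": "LOCATION"}  # "Cairo"


def extract_entities(text):
    """Return (token, label) pairs found by two simple rules."""
    tokens = text.split()  # naive whitespace tokenization
    entities = []
    for i, token in enumerate(tokens):
        # Rule 1: the token that follows a trigger word is an entity.
        if i > 0 and tokens[i - 1] in TRIGGERS:
            entities.append((token, TRIGGERS[tokens[i - 1]]))
        # Rule 2: a token listed in the gazetteer is an entity.
        elif token in GAZETTEER:
            entities.append((token, GAZETTEER[token]))
    return entities


sample = "سافر السيد أحمد إلى القاهرة"  # "Mr. Ahmed travelled to Cairo"
print(extract_entities(sample))
```

    Real systems would also need to handle multi-word names, attached clitics, and spelling variants, which this sketch ignores.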

    Analysing the content of Web 2.0 documents by using a hybrid approach

    User involvement in Web 2.0 has made a significant contribution to the increase in the amount of multimedia content on the Web. Images are one of the most used media, shared across the network to mark user experience in daily life. Interactive applications have allowed users to participate in describing these images, usually in the form of free text, thus gradually enriching the images' descriptions. Nevertheless, these images are often left with crude descriptions or none at all. Web search engines such as Google and Yahoo provide text-based searching to find images by mapping query concepts to the text description of the image, thus limiting information discovery to material with good text descriptions. The text-based search provided by Web 2.0 applications faces a similar issue. Images with sparse descriptions might not contain adequate information, while images with no description are useless because a text-based search cannot find them. Therefore, there is an urgent need to investigate ways to produce high-quality information that provides insight into the document content. The aim of this research is to investigate a means to improve the capability of information retrieval by utilizing Web 2.0 content, the Semantic Web and other emerging technologies. A hybrid approach is proposed which analyses two main aspects of Web 2.0 content, namely text and images. The text analysis uses Natural Language Processing and ontologies. Its aim is to translate free text descriptions into a semantic information model tailored to Semantic Web standards. The image analysis is developed using machine learning tools and is assessed using ROC analysis. Its aim is to develop an image classifier exemplar to identify information in images based on their visual features. The hybrid approach is evaluated using the standard information retrieval performance metrics, precision and recall.
    The example semantic information model has structured and enriched the textual content, thus providing better retrieval results compared to conventional tag-based search. The image classifier is shown to be useful for providing additional information about image content. Each of the approaches has its own strengths and they complement each other in different scenarios. The thesis demonstrates that the hybrid approach has improved information retrieval performance compared to either of the contributing techniques used separately.
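    For readers unfamiliar with the two evaluation metrics named in this abstract, the following minimal sketch computes precision and recall for a single query. The image identifiers and the function name `precision_recall` are made up for the example; they do not come from the thesis.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one query.

    precision = relevant items retrieved / all items retrieved
    recall    = relevant items retrieved / all relevant items
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Guard against empty sets to avoid division by zero.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search result and ground truth for one query.
retrieved = ["img1", "img2", "img3", "img4"]
relevant = ["img2", "img4", "img5"]
precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

    Here two of the four retrieved images are relevant (precision 0.5), and two of the three relevant images were found (recall of about 0.67).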

    Ontology domain-oriented extraction and semantic representation of web documents (Pengekstrakan dan perwakilan semantik dokumen web berorientasikan domain ontologi)

    The Internet has become the preferred basic infrastructure for obtaining digital information on a wide range of topics from around the world. However, most web documents on the Internet are unstructured and carry no semantic information about the document. Existing information extraction systems focus on extracting the important concepts that represent a document's content without taking semantic aspects into account. Representing information content in a semantically rich form is one of the visions of the Semantic Web. This paper discusses the application of an ontology approach and natural language processing to support the extraction and representation of semantic information from web documents. Because manually annotating semantic information from web documents is impractical, and fully automatic systems are still too immature to be implemented, a semi-automatic approach is proposed. Here, the system guides the user in the semantic modelling of web documents, which in turn produces semantically richer content for a web document or set of web documents. The generated semantic model is represented in XML format.