17 research outputs found

    Concept drift learning and its application to adaptive information filtering

    Tracking the evolution of user interests is an instance of concept drift learning. Keeping track of multiple interest categories is both a natural phenomenon and an interesting tracking problem, because interests can emerge and diminish over different time frames. The first part of this dissertation presents the Multiple Three-Descriptor Representation (MTDR) algorithm, a novel concept drift learning algorithm built specifically for tracking the dynamics of multiple target concepts in the information filtering domain. Its learning process combines long-term and short-term interest (concept) models in an attempt to benefit from the strengths of both. The MTDR algorithm improves over existing concept drift learning algorithms in the domain. Tracking multiple target concepts from only a few examples poses an even more important and challenging problem, because casual users tend to be reluctant to provide the examples needed and learning from few labeled data is generally difficult. The second part presents a computational Framework for Extending Incomplete Labeled Data Stream (FEILDS). The system modularly extends the capability of an existing concept drift learner to deal with an incompletely labeled data stream. It expands the learner's original input stream with relevant unlabeled data, generating a new stream with improved learnability. FEILDS employs a concept formation system to organize its input stream into a concept (cluster) hierarchy. The system uses this hierarchy to identify each instance's concept and the unlabeled data relevant to a concept. It also adopts the persistence assumption from temporal reasoning to infer the relevance of concepts. Empirical evaluation indicates that FEILDS can improve the performance of existing learners, particularly when learning from a stream with few labeled data.
    Lastly, a new concept formation algorithm, one of the key components of the FEILDS architecture, is presented. The main idea is to discover intrinsic hierarchical structures regardless of the class distribution and the shape of the input stream. Experimental evaluation shows that the algorithm is relatively robust to input ordering, consistently producing a high-quality hierarchy.

    Word Embedding for Rhetorical Sentence Categorization on Scientific Articles

    A common approach to summarizing scientific articles is to exploit the rhetorical structure of sentences. Determining the rhetorical category of a sentence is itself a text categorization task. To obtain good performance, some text categorization work has employed word embedding. This paper presents rhetorical sentence categorization of scientific articles using word embedding to capture semantically similar words. A comparison of Word2Vec and GloVe is shown. First, two experiments are evaluated using five classifiers, namely Naïve Bayes, Linear SVM, IBK, J48, and Maximum Entropy. Then, the best classifier from the first two experiments was employed. This research showed that Word2Vec CBOW performed better than Skip-Gram and GloVe. The best experimental result was from Word2Vec CBOW for 20,155 resource papers from ACL-ARC, with features from Teufel and the previous-label feature. In this experiment, Linear SVM produced the highest F-measure, 43.44%.

    Efficient Utilization of Dependency Pattern and Sequential Covering for Aspect Extraction Rule Learning

    The use of dependency rules for aspect extraction in aspect-based sentiment analysis is a promising approach. One problem with this approach is incomplete rules. This paper presents an aspect extraction rule learning method that combines dependency rules with the Sequential Covering algorithm. Sequential Covering is known for constructing rules that increase the number of positive examples covered and decrease the number of negative ones. This property is vital for ensuring that the rule set has high performance, though not necessarily high coverage, which is characteristic of the aspect extraction task. To test the new method, four datasets from four product domains were used, along with three baselines: Double Propagation, Aspectator, and previous work by the authors. The results show that the proposed approach outperformed the three baselines on the F-measure metric, with a highest F-measure of 0.633.
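
    The general idea of Sequential Covering can be sketched as follows. This is a simplified illustration and not the authors' method: it assumes a fixed pool of candidate rules (here, hypothetical one-relation dependency rules) instead of growing rules, and the names `sequential_covering` and `min_precision` are invented for the example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy representation: each example is (dependency relation, is_aspect).
Example = Tuple[str, bool]
Rule = Callable[[str], bool]


def sequential_covering(
    examples: List[Example],
    candidates: Dict[str, Rule],
    min_precision: float = 0.6,
) -> List[str]:
    """Greedily pick rules that cover many positives and few negatives."""
    remaining = list(examples)
    pool = dict(candidates)
    learned: List[str] = []
    # Continue while candidate rules and uncovered positive examples remain.
    while pool and any(label for _, label in remaining):
        best_name, best_prec, best_pos = None, 0.0, 0
        for name, rule in pool.items():
            covered = [label for feat, label in remaining if rule(feat)]
            pos = sum(covered)  # number of positive examples covered
            if pos == 0:
                continue
            prec = pos / len(covered)
            # Prefer higher precision, then more positives covered.
            if (prec, pos) > (best_prec, best_pos):
                best_name, best_prec, best_pos = name, prec, pos
        if best_name is None or best_prec < min_precision:
            break  # no remaining rule is good enough
        rule = pool.pop(best_name)
        learned.append(best_name)
        # Remove every example the accepted rule covers, then repeat.
        remaining = [(f, l) for f, l in remaining if not rule(f)]
    return learned


toy_examples = [
    ("amod", True), ("amod", True), ("amod", False),
    ("nsubj", True), ("nsubj", True),
    ("det", False), ("det", False),
    ("dobj", True), ("dobj", False), ("dobj", False),
]
toy_rules = {rel: (lambda f, rel=rel: f == rel) for rel in ["amod", "nsubj", "det", "dobj"]}
print(sequential_covering(toy_examples, toy_rules))
```

    The precision threshold is what lets the learned rule set favor performance over coverage: the low-precision `dobj` rule is rejected even though one positive example stays uncovered.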

    Shared-hidden-layer Deep Neural Network for Under-resourced Language

    Training a speech recognizer with under-resourced language data remains difficult. Indonesian is considered under-resourced because of the lack of a standard speech corpus, text corpus, and dictionary. This research analyzed the efficacy of augmenting limited Indonesian speech training data with training data from a highly resourced language, such as English, to train an Indonesian speech recognizer. The training took the form of shared-hidden-layer deep neural network (SHL-DNN) training. An SHL-DNN has language-independent hidden layers and can be pre-trained and trained on multilingual data in the same way as a monolingual deep neural network. The SHL-DNN trained on Indonesian and English speech data proved effective at decreasing word error rate (WER) in decoding Indonesian dictated speech, achieving a 3.82% absolute decrease compared with a monolingual Indonesian hidden Markov model with Gaussian mixture model emissions (GMM-HMM). This was confirmed when the SHL-DNN was also used to decode Indonesian spontaneous speech, achieving a 4.19% absolute WER decrease.
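
    For readers unfamiliar with the metric, the sketch below shows the standard definition of WER (word-level edit distance divided by reference length) and the difference between an absolute and a relative decrease. The function names and the sample sentences are illustrative assumptions; the abstract does not report the underlying baseline WER values.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Word-level Levenshtein distance, keeping only the previous DP row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)


def absolute_decrease(baseline_wer: float, new_wer: float) -> float:
    """Difference in WER, i.e. a change in percentage points."""
    return baseline_wer - new_wer


def relative_decrease(baseline_wer: float, new_wer: float) -> float:
    """Difference in WER as a fraction of the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer


# Toy example: one deleted word out of four reference words.
print(word_error_rate("saya pergi ke pasar", "saya pergi pasar"))
```

    An absolute decrease such as the 3.82% quoted above is a difference between two WER values, not a percentage of the baseline.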

    The Strategies for Quorum Satisfaction in Host-to-Host Meeting Scheduling Negotiation

    This paper proposes two strategies for handling a schedule conflict between two meetings that invite the same member of personnel at the same time, using a host-to-host negotiation scheme. The first, the release strategy, lets the member attend the other meeting on the condition that the group decision regarding the schedule is unchanged and the meeting quorum is still fulfilled. The second, the substitute strategy, substitutes the absent personnel in order to keep the number of attendees above the quorum. This paper adapts a mechanism design approach, the Clarke Tax Mechanism, to satisfy the incentive compatibility and individual rationality principles in meeting scheduling. With the release and substitute strategies, colliding meetings can still be held on schedule without rescheduling. This paper presents simulation results of using the strategies in several scenarios. They demonstrate that the number of meeting failures can be reduced with negotiation.
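
    The Clarke Tax (pivotal) mechanism mentioned above can be illustrated with a minimal sketch. It shows only the textbook mechanism applied to choosing a time slot, not the paper's adaptation; the attendee names, slot names, and valuations are hypothetical.

```python
from typing import Dict, Optional, Tuple


def clarke_tax(
    valuations: Dict[str, Dict[str, float]],
) -> Tuple[str, Dict[str, float]]:
    """Pick the option with the highest total reported value and compute
    each participant's Clarke tax (the loss they impose on the others)."""
    options = list(next(iter(valuations.values())))

    def total(option: str, exclude: Optional[str] = None) -> float:
        return sum(v[option] for name, v in valuations.items() if name != exclude)

    chosen = max(options, key=total)
    taxes = {}
    for name in valuations:
        # Best total the others could get if this participant were absent...
        best_without = max(total(o, exclude=name) for o in options)
        # ...minus what the others actually get under the chosen option.
        taxes[name] = best_without - total(chosen, exclude=name)
    return chosen, taxes


# Hypothetical reported values of three attendees for two time slots.
reported = {
    "ani": {"mon": 3, "tue": 0},
    "budi": {"mon": 0, "tue": 2},
    "citra": {"mon": 0, "tue": 2},
}
slot, taxes = clarke_tax(reported)
print(slot, taxes)
```

    Only participants whose report changes the outcome pay a tax, which is why truthful reporting is a dominant strategy under this mechanism.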

    Towards host-to-host meeting scheduling negotiation

    This paper presents a different scheme for meeting scheduling negotiation among a large number of personnel in a heterogeneous community. This scheme, named Host-to-Host Negotiation, attempts to produce a stable schedule under uncertain personnel preferences. By collecting information from hosts' inter-organizational meetings, this study intends to guarantee personnel availability. As a consequence, personnel and meeting profiles in this scheme are stored in a centralized manner. This study considers personnel preferences by adapting the Clarke Tax Mechanism, which is categorized as a non-manipulable mechanism design. Finally, this paper introduces negotiation strategies based on the conflict handling mode. A host-to-host scheme can give notification if any conflict exists and lead to a negotiation process with an acceptable level of disclosed information. A complete negotiation process will be elaborated in future work.

    Conversational Recommender System: Berbasis pada Kebutuhan Fungsional Produk [Conversational Recommender System: Based on Product Functional Requirements]

    Stating needs in terms of a product's technical features is often difficult for many prospective buyers, especially for multifunctional products with many features, such as cars, notebooks, smartphones, servers, cameras, and so on. This is because not everyone is familiar with the technical features of these products. Asking users about their needs in terms of how the product to be bought will be used (functional requirements) is a more natural way to elicit user needs. This book therefore presents how to build a conversational recommender system (CRS) that takes the functional requirements of products into account. An ontology is chosen as the system's knowledge because the nature of an ontology's structure allows a more flexible mapping between product functional requirements, specifications, and products. In addition, an ontology allows each concept (entity) to be arranged hierarchically, and such a structure is very advantageous, especially for supporting the development of a question generation model. The ontology structure has three main classes: FuncReq (representing functional requirements), Specification (representing gradations of technical feature quality), and Product (representing product classification). The ontology is the system's knowledge base. Interaction takes place through question-and-answer dialogue, product recommendation, and explanation of why a product is recommended, much like the interaction between a prospective buyer and professional sales support. The computational model for generating the interaction is developed by exploring semantic relations in the ontology.
    With this model and ontology structure, the CRS development presented in this book is expected to be applicable to various other domains as well, especially product domains that are multifunctional and have many features (notebooks, servers, PCs, cars, cameras, smartphones, and so on). Evaluation of the CRS covers both efficiency and effectiveness. The evaluation results show that the interaction model in the functional-requirement-based CRS performs requirement querying efficiently, based on a significant reduction in the number of remaining records within 4 interactions. Within 4 interactions, the number of recommended products is fewer than 20 of the 288 available products (< 6.9%). For effectiveness, a user study was conducted involving users who are familiar (expert users) and unfamiliar (novice users) with the technical features of the products. The test results show that the functional-requirement-based CRS is fairly effective in guiding users. This is shown by both expert and novice users preferring the interaction model of the functional-requirement-based CRS over the interaction model of a product search application based on technical features (expert users: 86.67%, novice users: 90%). A further user study shows that interaction in the functional-requirement-based CRS improves users' positive perception compared with interaction based on technical features, in terms of perceived ease of use, perceived enjoyment, trust, and perceived usefulness. In addition, the interaction model is also effective in influencing users to be interested in adopting the system, although the factors that influence this differ.
    For expert users, perceived enjoyment is the factor that directly influences system adoption, whereas for novice users, perceived usefulness is the factor that directly influences system adoption.

    Analisis Pembangunan Korpus Berpasangan Untuk Pembangkitan Parafrasa Pada Makalah Ilmiah [Analysis of Building a Paired Corpus for Paraphrase Generation from Scientific Papers]

    Building a machine that can generate new sentences with a high level of semantic similarity but different wording (paraphrases) requires a language resource in the form of a parallel corpus. Building the corpus requires a preliminary analysis suited to the domain of the machine to be built. This study analyzes the construction of a paired corpus from scientific papers. Sentences in scientific papers have characteristics that differ from those in other domains such as news or social media. The initial extraction process yielded 590,402 body sentences and 23,584 abstract sentences. The results of this study can serve as a candidate corpus produced by a computerized process.

    Dynamic modeling and learning user profile in personalized news agent

    Includes bibliographical references (leaves 85-87). Issued also on microfiche from Lange Micrographics. Finding relevant information effectively on the Internet is a challenging task. Although the information is widely available, exploring Web sites and finding information relevant to a user's interest can be time-consuming and tedious. As a result, many software agents have been employed to perform autonomous information gathering and altering on behalf of the user. One of the critical issues in such an agent is its capability to model its users and adapt itself over time to changing user interests. In this thesis, a novel scheme is proposed to learn a user profile. The proposed scheme is designed to handle multiple domains of long-term and short-term user interests simultaneously, which are learned through positive and negative user feedback. A 3-descriptor interest category representation approach is developed to achieve this objective. Using such a representation, the learning algorithm is derived by imitating human personal assistants doing the same task. Based on experimental evaluation, the scheme performs very well and adapts quickly to significant changes in user interest.