    Integrating data and text mining processes for digital library applications

    A new strategy for case-based reasoning retrieval using classification based on association

    This paper proposes a novel strategy, Case-Based Reasoning Using Association Rules (CBRAR), which uses the classed frequent pattern tree algorithm (FP-CAR) to improve the performance of Similarity-Based Retrieval (SBR) and to disambiguate wrongly retrieved cases in Case-Based Reasoning (CBR). CBRAR uses class association rules (CARs) to generate an optimal FP-tree that holds a value for each node. The potential advantage is that more efficient results can be obtained when SBR returns uncertain answers. We compare the CBR query, treated as a pattern, with the FP-CAR patterns to identify the voted class with the longest matching length. If the patterns match, the proposed strategy can select not just the most similar case but the correct one. Our experimental evaluation on real data from the UCI repository indicates that CBRAR achieves better accuracy than the CBR systems used in our experiments.
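
    The matching step can be pictured with a small sketch. It assumes the class association rules are already mined and stored as a flat list of (antecedent, class) pairs, with items written as hypothetical "attribute=value" strings; the paper's FP-tree storage and per-node values are not reproduced, and the names `cbrar_class` and `pick_case` are illustrative only. The sketch keeps the rules whose antecedent the query contains, takes a majority vote among the longest of them, and uses that class to choose among SBR's top candidates.

```python
from collections import Counter
from typing import FrozenSet, List, Optional, Tuple

# Hypothetical rule format: (antecedent items, class label).
Rule = Tuple[FrozenSet[str], str]


def cbrar_class(query: FrozenSet[str], rules: List[Rule]) -> Optional[str]:
    """Majority vote among the longest rules whose antecedent is contained in the query."""
    matching = [(ant, label) for ant, label in rules if ant <= query]
    if not matching:
        return None  # no rule fires; the caller falls back to plain SBR
    longest = max(len(ant) for ant, _ in matching)
    votes = Counter(label for ant, label in matching if len(ant) == longest)
    return votes.most_common(1)[0][0]


def pick_case(candidates: List[Tuple[str, str]],
              query: FrozenSet[str],
              rules: List[Rule]) -> str:
    """candidates: SBR's (case_id, label) list, most similar first."""
    voted = cbrar_class(query, rules)
    for case_id, label in candidates:
        if label == voted:
            return case_id  # most similar case that agrees with the rule vote
    return candidates[0][0]  # fall back to the most similar case
```

    For example, if SBR's two nearest cases disagree on the class, `pick_case` returns the one whose label agrees with the voted class, and falls back to the most similar case when no rule matches the query.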

    Improved Methods for Extracting Frequent Itemsets from Interim-Support Trees

    Mining association rules in relational databases is a significant computational task with many applications. A fundamental ingredient of this task is the discovery of sets of attributes (itemsets) whose frequency in the data exceeds some threshold value. In previous work [9] we introduced an approach to this problem that begins by carrying out an efficient partial computation of the necessary totals, storing these interim results in a set-enumeration tree. This work demonstrated that making

    ∗ Aris Pagourtzis and Dora Souliou were partially supported for this research by “Pythagoras
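
    One way to picture a partial computation of the necessary totals is a prefix-count tree: a single pass over the data increments a counter on every node along each record's sorted item path, and full supports are finished later from these interim counts. This is a minimal sketch under that assumption, not necessarily the structure used in [9]; items are assumed to be comparable integers, and the names `Node`, `build_tree` and `support` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


class Node:
    """Node of a prefix-count tree over sorted itemsets."""

    def __init__(self) -> None:
        self.count = 0  # records whose sorted itemset has this node's path as a prefix
        self.children: Dict[int, "Node"] = {}


def build_tree(records: Iterable[Iterable[int]]) -> Node:
    """One pass over the data: store interim (prefix) counts only."""
    root = Node()
    for record in records:
        root.count += 1
        node = root
        for item in sorted(set(record)):
            node = node.children.setdefault(item, Node())
            node.count += 1
    return root


def support(root: Node, itemset: Iterable[int]) -> int:
    """Finish the total support of `itemset` from the interim counts."""
    target: List[int] = sorted(set(itemset))
    if not target:
        return root.count
    total = 0
    stack: List[Tuple[Node, int]] = [(root, 0)]  # (node, index of next target item)
    while stack:
        node, k = stack.pop()
        for item, child in node.children.items():
            if item == target[k]:
                if k + 1 == len(target):
                    # Shortest prefix containing the itemset: count once, stop here.
                    total += child.count
                else:
                    stack.append((child, k + 1))
            elif item < target[k]:
                stack.append((child, k))
            # item > target[k]: this path skipped target[k], so it is pruned.
    return total
```

    The total is exact because every record containing an itemset has exactly one prefix that ends at the itemset's largest item, so each such record is counted once, at the node where the search stops. For records [[1, 2, 3], [1, 3], [2, 3], [1, 2]], `support(build_tree(records), [1, 3])` returns 2.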

    Mining for the antibody-antigen interacting associations that predict the B cell epitopes

    Background. Predicting B-cell epitopes is very important for designing vaccines and drugs to fight infectious agents. However, because of the high complexity of this problem, previous methods for both linear and conformational epitope prediction remain unsatisfactory. In addition, the interaction of an antigen with an antibody is context-dependent, and a coarse binary classification of antigen residues into epitope and non-epitope, made without reference to the corresponding antibody, may not reflect biological reality. We therefore take a novel approach and identify epitopes using associations between antibodies and antigens. Results. Given a pair of antibody and antigen sequences, the epitope residues can be identified by two types of associations: paratope-epitope interacting bicliques and co-occurrent patterns of interacting residue pairs. Because the associations themselves do not include neighborhood information from the primary sequence, residue cooperativity and relative composition are then used to enhance the method. Evaluation on a benchmark data set shows that the proposed method achieves very good accuracy. Compared with two other structure-based B-cell epitope prediction methods, the proposed method is competitive with, and sometimes better than, these structure-based methods, which have a much narrower scope of applicability. Conclusions. The proposed method offers a new way of identifying B-cell epitopes. Moreover, this antibody-specific epitope prediction can provide more precise and useful information for wet-lab experiments. © 2010 Li and Zhao; licensee BioMed Central Ltd
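
    As a toy illustration of the co-occurrent pattern idea, the sketch below counts residue-type pairs observed in known antibody-antigen contacts and scores each antigen residue by how often its type pairs with the residue types present in a given antibody. The sliding-window average is only a simple stand-in for neighborhood information and is not the authors' cooperativity or composition measure; the biclique association is omitted, and all function names, the window size and the top-k cutoff are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def learn_pair_counts(contacts: Iterable[Tuple[str, str]]) -> Counter:
    """Count (paratope residue, epitope residue) type pairs from known contacts."""
    return Counter(contacts)


def score_antigen(antibody: str, antigen: str, counts: Counter) -> List[int]:
    """Score each antigen residue by its pair counts with the antibody's residue types."""
    ab_types = Counter(antibody)  # residue type -> occurrences in the antibody
    return [sum(counts[(a, g)] * n for a, n in ab_types.items()) for g in antigen]


def smooth(scores: List[float], window: int = 1) -> List[float]:
    """Average each score with its sequence neighbors (stand-in for neighborhood info)."""
    out = []
    for i in range(len(scores)):
        lo, hi = max(0, i - window), min(len(scores), i + window + 1)
        out.append(sum(scores[lo:hi]) / (hi - lo))
    return out


def predict_epitope(antibody: str, antigen: str, counts: Counter,
                    k: int = 2, window: int = 1) -> List[int]:
    """Return sorted positions of the k highest-scoring antigen residues."""
    s = smooth(score_antigen(antibody, antigen, counts), window)
    ranked = sorted(range(len(s)), key=lambda i: s[i], reverse=True)[:k]
    return sorted(ranked)
```

    The window size and k are free parameters in this sketch; a real predictor would tune them, and the pair statistics, on benchmark data.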

    Mining Demand Chain Knowledge for Collaboration Design and New Product Development

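    Project No.: NSC94-2416-H032-001. Research period: August 2005 to July 2006. Research funding: 396,000.
    In general, the flows of information, money and physical goods throughout manufacturing and business operations largely follow the Supply Chain Management model, and upstream manufacturers facing end-customer demand must, because of gaps in the flow of information, rely on their own experience with the product to make forecasts. Correspondingly, the phenomenon in which variability grows the further one moves upstream in the supply chain is known as the Bullwhip Effect (Dejonckheere et al., 2004). However, as living standards rise and manufacturing capabilities advance, the former "few varieties, large volumes" production model is being replaced by a "small volumes, many varieties" business model that pursues novelty and change, which means the supply chain system can no longer fully meet customers' needs in this respect. Consequently, the Demand Chain Management model, which orients production, manufacturing, sales and product/design development toward the demand side, has emerged (Willem et al., 2002). Taiwan's bicycle industry had already reached a modest scale by the 1920s, and with deliberate and strong government guidance and assistance, its export volume surpassed Japan's for the first time in the 1980s, establishing Taiwan's bicycle industry as a major player worldwide. For example, Giant, the flagship brand of Giant Manufacturing, has become a household name around the world and one of the bywords for bicycles. Giant Manufacturing allocates substantial funding to its R&D team every year, works intensively on product materials, and actively builds out its marketing channels. However, does the customer and product knowledge held in the sales channels fully reflect market demand? Can product design and product-line planning integrate customer and channel knowledge? And during design and development, can customer and channel knowledge be turned into corporate knowledge assets and applied, in New Product Development, to Collaborative Design between the company and the demand side? This study therefore applies Data Mining techniques to discover knowledge about bicycle users (including different users in the same household), products (including different products in the same household), channels (including repair points) and the case company's product development, and combines it with the concept of collaborative design to turn user needs and product design into products and services. At the same time, knowledge of customer characteristics, geographic factors, consumer preferences and market segmentation is used to design electronic catalogs and printed catalogs for channel marketing, and new product development knowledge is applied to Product Line Design and Product Innovation.
    Sponsor: National Science Council, Executive Yuan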

    A New Rymon Tree Based Procedure for Mining Statistically Significant Frequent Itemsets

    In this paper we propose a new method for frequent itemset mining that is more efficient than the well-known Apriori algorithm. The method is based on a special structure called the Rymon tree. For its implementation, we propose a modified sort-merge-join algorithm. Finally, we explain how the support measure used in the Apriori algorithm yields statistically significant frequent itemsets.
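
    A Rymon tree enumerates itemsets so that each node's children extend it only with items ranked after its last item, which visits every itemset exactly once. The sketch below combines that enumeration with a vertical data layout, where each item maps to a sorted list of transaction ids and a child's support comes from a plain two-pointer sort-merge join of its parent's list with the new item's list. It is an illustration under these assumptions, not the paper's modified sort-merge-join; `min_support` is an absolute count, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def merge_intersect(a: List[int], b: List[int]) -> List[int]:
    """Sort-merge join of two ascending tid lists: keep ids present in both."""
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def rymon_frequent_itemsets(transactions: Iterable[Iterable[int]],
                            min_support: int) -> Dict[Tuple[int, ...], int]:
    # Vertical layout: item -> ascending list of transaction ids.
    tidlists: Dict[int, List[int]] = {}
    for tid, t in enumerate(transactions):
        for item in set(t):
            tidlists.setdefault(item, []).append(tid)
    items = sorted(i for i, tids in tidlists.items() if len(tids) >= min_support)
    result: Dict[Tuple[int, ...], int] = {}

    def expand(prefix: Tuple[int, ...], prefix_tids: List[int], start: int) -> None:
        # Rymon order: children only append items ranked after the prefix's last item.
        for k in range(start, len(items)):
            item = items[k]
            tids = tidlists[item] if not prefix else merge_intersect(prefix_tids, tidlists[item])
            if len(tids) >= min_support:
                itemset = prefix + (item,)
                result[itemset] = len(tids)
                expand(itemset, tids, k + 1)
            # Infrequent node: its whole subtree is skipped.

    expand((), [], 0)
    return result
```

    Pruning an infrequent node's subtree is safe because support never increases as items are added, which is the same anti-monotonicity that Apriori relies on.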

    An adaptive calendar assistant using pattern mining for user preference modelling

