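Domain-Driven Data Mining (DDDM) is a new approach to data mining that aims to discover knowledge that is both interesting to users and actionable. Compared with the traditional data mining process CRISP-DM, DDDM is a constraint-based, human-machine cooperative, iterative knowledge discovery process that approaches its target progressively and works at a deeper level. Building on an analysis of the complexity of the DDDM process, whose difficulty is self-proliferating, this paper proposes a systematic mining method based on the spiral-advance (xuanjin) principle, in which mining advances along three dimensions: domain knowledge, data, and technology, so that the mined knowledge better meets users' needs for knowledge in real-world activities. Finally, the paper presents an empirical study on mining the academic thought of renowned veteran practitioners of traditional Chinese medicine: a semantics-based Apriori algorithm is developed, and domain knowledge is represented in structured form using abstract semantics, classification semantics, and combination semantics. The mining results show that the DDDM method based on the spiral-advance principle is feasible and advantageous.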
Traditional association rule mining fails to eliminate suspicious patterns and spurious association rules. To overcome this limitation, we propose a novel method for mining item-item and between-set correlated association rules. First, we present three measurements: the association, correlation, and item-set correlation measurements. In the association measurement, the all-confidence measure is used to filter out suspicious cross-support patterns, while in the correlation measurement the all-item-confidence measure is applied to eliminate spurious association rules that contain negatively correlated items. We then define the item-set correlation measurement and show its properties. Using this measurement, spurious association rules whose antecedent and consequent item-sets are negatively correlated can be eliminated. Finally, we propose item-item and between-set correlated association rules together with two mining algorithms, I&ISCoMine_AP and I&ISCoMine_CT. Experimental results on synthetic and real retail datasets show that the proposed method is effective and valid.
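
To make the role of the all-confidence measure concrete, the following minimal Python sketch assumes the commonly used definition, all-confidence(X) = sup(X) / max over items i in X of sup({i}); the abstract itself does not spell the formula out. The function names, the toy transactions, and the 0.5 threshold are hypothetical and chosen only for illustration. This is not the I&ISCoMine_AP or I&ISCoMine_CT algorithm, and the all-item-confidence and item-set correlation measurements are not reproduced here.

```python
def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)


def all_confidence(itemset, transactions):
    """Assumed definition: sup(X) / max_{i in X} sup({i}).

    Counts are used instead of relative supports; the ratio is the same
    because both are divided by the number of transactions.
    """
    max_item_count = max(count({item}, transactions) for item in itemset)
    if max_item_count == 0:
        return 0.0
    return count(itemset, transactions) / max_item_count


def filter_cross_support(patterns, transactions, min_all_conf):
    """Keep only patterns whose all-confidence reaches the threshold."""
    return [p for p in patterns
            if all_confidence(p, transactions) >= min_all_conf]


# Hypothetical toy data: "milk" is very frequent, "caviar" is rare.
transactions = [frozenset(t) for t in (
    [{"milk", "bread"}] * 5
    + [{"milk", "bread", "caviar"}]
    + [{"milk"}] * 2
    + [{"eggs"}] * 2
)]

patterns = [frozenset({"milk", "bread"}), frozenset({"milk", "caviar"})]
kept = filter_cross_support(patterns, transactions, min_all_conf=0.5)
```

On this toy data, {milk, caviar} pairs a very frequent item with a rare one, so its all-confidence is low and it is dropped as a cross-support pattern, while {milk, bread} is kept.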