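In question classification research for question answering systems, combining question features helps to build efficient question classifiers. To address the feature combination problem in current question classification, a Diversity and Importance based Feature Combination (DIFC) method is proposed. The method computes the misclassification diversity and the correct-classification diversity between a candidate feature and the current feature combination, together with the importance of the candidate feature itself, and uses these measures to dynamically select an optimized feature combination from the candidate feature set. Experiments on combining bag-of-words bound features on the Harbin Institute of Technology (HIT) Chinese question set show that, compared with other feature combination methods, the DIFC method is flexible and efficient and achieves higher accuracy.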
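The selection loop described above can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration: importance is taken to be single-feature accuracy, misclassification diversity is the share of the current combination's errors that the candidate classifies correctly, correct-classification diversity is the share of the combination's correct answers that the candidate gets wrong, and the three are combined by a simple sum and difference. The names `greedy_combine`, `error_diversity`, `correct_diversity`, and the `evaluate` callback are hypothetical and are not taken from the DIFC method itself.

```python
from typing import Callable, Dict, List, Sequence

# Per-question flags: True if the question was classified correctly.
Correct = Sequence[bool]


def accuracy(correct: Correct) -> float:
    """Assumed importance measure: fraction of questions classified correctly."""
    return sum(correct) / len(correct)


def error_diversity(cand: Correct, combo: Correct) -> float:
    """Assumed definition: share of the combination's errors the candidate gets right."""
    on_errors = [c for c, k in zip(cand, combo) if not k]
    return sum(on_errors) / len(on_errors) if on_errors else 0.0


def correct_diversity(cand: Correct, combo: Correct) -> float:
    """Assumed definition: share of the combination's hits the candidate gets wrong."""
    on_hits = [c for c, k in zip(cand, combo) if k]
    return sum(not c for c in on_hits) / len(on_hits) if on_hits else 0.0


def greedy_combine(
    single: Dict[str, Correct],
    evaluate: Callable[[List[str]], Correct],
) -> List[str]:
    """Grow a feature combination one candidate at a time.

    single   -- correctness flags of a classifier trained on each feature alone
    evaluate -- hypothetical callback that trains and tests a classifier on a
                feature combination and returns its correctness flags
    """
    remaining = dict(single)
    # Start from the most important single feature.
    first = max(remaining, key=lambda f: accuracy(remaining[f]))
    combo = [first]
    del remaining[first]
    combo_correct = evaluate(combo)

    while remaining:
        def score(f: str) -> float:
            c = remaining[f]
            # Assumed scoring rule: reward importance and error coverage,
            # penalise disagreement on questions already answered correctly.
            return (accuracy(c)
                    + error_diversity(c, combo_correct)
                    - correct_diversity(c, combo_correct))

        best = max(remaining, key=score)
        del remaining[best]
        trial = evaluate(combo + [best])
        if accuracy(trial) <= accuracy(combo_correct):
            break  # stop as soon as the top-scored candidate does not help
        combo.append(best)
        combo_correct = trial
    return combo
```

In this sketch the diversity scores only rank the candidates; whether a candidate is kept is decided by re-evaluating the enlarged combination, which is one possible reading of "dynamically" selecting the combination.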
A new method for combining features via importance-inhibition analysis (IIA) is described, with the aim of obtaining more effective feature combinations when learning question classification. Features are combined based on the inhibition among features as well as the importance of individual features. Experimental results on a Chinese question set show that the IIA method yields a gradual increase in both average and maximum accuracy across all feature combinations and, on the whole, achieves a large improvement over the importance analysis (IA) method. Moreover, the IIA method reaches the same highest accuracy as the exhaustive method and further improves the performance of question classification.