To improve question answering (QA) performance based on real-world web data sets,a new set of question classes and a general answer re-ranking model are defined.With pre-defined dictionary and grammatical analysis,the question classifier draws both semantic and grammatical information into information retrieval and machine learning methods in the form of various training features,including the question word,the main verb of the question,the dependency structure,the position of the main auxiliary verb,the main noun of the question,the top hypernym of the main noun,etc.Then the QA query results are re-ranked by question class information.Experiments show that the questions in real-world web data sets can be accurately classified by the classifier,and the QA results after re-ranking can be obviously improved.It is proved that with both semantic and grammatical information,applications such as QA, built upon real-world web data sets, can be improved,thus showing better performance.
For an extract description of threads information in question and answer (QnA) web forums, it is proposed to construct a QnA knowledge presentation model in the English language, and then an entire solution for the QnA knowledge system is presented, including data gathering, platform building and applications design. With pre-defined dictionary and grammatical analysis, the model draws semantic information, grammatical information and knowledge confidence into IR methods, in the form of statement sets and term sets with semantic links. Theoretical analysis shows that the statement model can provide an exact presentation for QnA knowledge, breaking through any limits from original QnA patterns and being adaptable to various query demands; the semantic links between terms can assist the statement model, in terms of deducing new from existing knowledge. The model makes use of both information retrieval (IR) and natural language processing (NLP) features, strengthening the knowledge presentation ability. Many knowledge-based applications built upon this model can be improved, providing better performance.