Abstract
Naive Bayes is an efficient, easily trained classification method that is widely used and performs well in text classification. This paper addresses the problem of automatic commodity classification: it applies a multinomial naive Bayes model to study how new commodities on an e-commerce website can be classified automatically. Classification tests are carried out on a given dataset of 500,000 commodity records, and the classification procedure is implemented in Python with supporting libraries. In 5-fold cross-validation, the automatic classification of new commodities performs well, reaching a classification precision of 73.1%.
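To make the approach concrete, the following is a minimal sketch of multinomial naive Bayes classification of commodity titles in plain Python. It is not the authors' implementation: the class name, the whitespace tokenization, the add-one smoothing parameter alpha, and the toy titles and categories are all assumptions made for illustration, and the 5-fold cross-validation step is not shown.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Minimal multinomial naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        # Smoothing strength; an assumed value, not taken from the paper.
        self.alpha = alpha

    def fit(self, docs, labels):
        # Count documents per class and token occurrences per class.
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
        self.vocab = {t for cnt in self.token_counts.values() for t in cnt}
        self.total_tokens = {
            label: sum(cnt.values()) for label, cnt in self.token_counts.items()
        }
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_c in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_c / n_docs)
            denom = self.total_tokens[label] + self.alpha * vocab_size
            for t in tokens:
                if t in self.vocab:  # tokens never seen in training are skipped
                    count = self.token_counts[label][t]
                    score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up commodity titles and categories (illustrative only).
train_docs = [
    "wireless bluetooth headphones".split(),
    "usb charging cable".split(),
    "cotton t-shirt men".split(),
    "women summer dress".split(),
]
train_labels = ["electronics", "electronics", "clothing", "clothing"]

model = MultinomialNB().fit(train_docs, train_labels)
prediction = model.predict("bluetooth usb speaker".split())
print(prediction)
```

Working in log space avoids numerical underflow when titles contain many tokens, and the smoothing term keeps a token that is absent from one class from driving that class's probability to zero.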