Tea selection method based on morphological feature parameters
  • English title: Tea selection method based on morphology feature parameters
  • Authors: Wu Zhengmin (吴正敏); Cao Chengmao (曹成茂); Wang Errui (王二锐); Luo Kun (罗坤); Zhang Jinyan (张金炎); Sun Yan (孙燕)
  • Affiliation: College of Engineering, Anhui Agricultural University (安徽农业大学工学院)
  • Keywords: morphological features; decision tree; support vector machine; logistic regression; random forest; tea
  • Journal code: NYGU
  • Journal: Transactions of the Chinese Society of Agricultural Engineering (农业工程学报)
  • Publication date: 2019-06-08
  • Publisher: Transactions of the Chinese Society of Agricultural Engineering (农业工程学报)
  • Year: 2019
  • Volume/issue: v.35; No.363
  • Funding: jointly supported by the Anhui Provincial Major Science and Technology Project (18030701195) and the Natural Science Research Project of Anhui Higher Education Institutions (KJ2016A233)
  • Language: Chinese
  • Record ID: NYGU201911036
  • Number of pages: 7
  • Issue (within volume): 11
  • CN: 11-2047/S
  • Pages: 323-329
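Abstract
In summer and autumn, tea stalks and leaves differ little in color, so traditional color sorters struggle to separate them. This paper proposes a multi-feature-vector sorting method based on the morphological features of tea, aiming at rapid modeling of the tea selection algorithm and higher sorting accuracy. Images of tea were captured during dynamic free fall, and an image-processing-based feature extraction program was developed to automatically extract morphological feature parameters from multiple groups of tea samples. The random forest algorithm was used to determine feature weights and perform feature selection, and three classification algorithms (logistic regression, decision tree and support vector machine) were built to classify the samples, verify the separability of the features, and analyze how different classifiers affect the classification of complex tea samples. The test results showed that: 1) circularity E had the largest importance weight, 0.467; the importance threshold was finally set to 0.05, and five morphological features (circularity E, rectangularity R, linearity Len, perimeter C and compactness J) were selected to build the dataset; 2) on the test dataset, the average accuracy of the three classification algorithms, logistic regression (LR), decision tree (DT) and support vector machine (SVM), was 0.924, indicating that the selected features are clearly separable; 3) according to the output confusion matrices, the support vector machine performed best of the three, with an accuracy of 93.8% and an F1 score (harmonic mean) of 94.7%. The method can be applied quickly to the selection of other types of tea and to actual tea production, effectively improving tea quality.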
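To make the shape descriptors concrete, the sketch below computes circularity and rectangularity for a polygonal outline. It rests on assumptions: the formulas used (circularity = 4πA/P², rectangularity = A/(L·W)) are common textbook definitions, and the abstract does not state the paper's exact definitions of E and R. An axis-aligned bounding box stands in for the minimum bounding rectangle, and the function name `shape_descriptors` is hypothetical.

```python
import math


def shape_descriptors(points):
    """Return area, perimeter, circularity and rectangularity of a simple polygon.

    `points` lists the (x, y) vertices in order around the outline.
    Formulas are assumed textbook definitions, not taken from the paper.
    """
    n = len(points)
    twice_area = 0.0
    perimeter = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0            # shoelace formula term
        perimeter += math.hypot(x1 - x0, y1 - y0)  # edge length
    area = abs(twice_area) / 2.0

    # Axis-aligned bounding box (a simplification of the minimum bounding rectangle).
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    box_length = max(xs) - min(xs)
    box_width = max(ys) - min(ys)

    return {
        "area": area,
        "perimeter": perimeter,
        "circularity": 4.0 * math.pi * area / perimeter ** 2,
        "rectangularity": area / (box_length * box_width),
    }


# Two made-up outlines: a compact square and an elongated strip.
compact = shape_descriptors([(0, 0), (2, 0), (2, 2), (0, 2)])
elongated = shape_descriptors([(0, 0), (10, 0), (10, 1), (0, 1)])
```

Under these assumed definitions the elongated outline scores a much lower circularity than the compact one while both have the same rectangularity, which illustrates why a single descriptor may not be enough and several are combined into a feature vector.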
In summer and autumn, the stalks and leaves of tea are similar in color, which makes it difficult for a traditional color sorter to separate them by optical characteristics. To enable rapid modeling of the tea selection algorithm and improve sorting accuracy, this paper introduces a method that sorts good from inferior tea using multi-feature vectors based on morphological characteristics. First, Wuyishan Dahongpao tea was selected as the test sample, and images were collected during the dynamic dropping process. The blue channel of the image was extracted, and the binary image and edge of each individual sample were obtained by analyzing the connected regions of the whole image. Then, a feature extraction program based on image processing algorithms was developed to extract the morphological feature parameters of the tea samples automatically. Four simple shape descriptors were extracted: the sample perimeter, the area, and the length and width of the minimum bounding rectangle. On this basis, eight complex shape descriptors were calculated: circularity, rectangularity, linearity, slightness, diameter, diagonal of the minimum bounding rectangle, compactness and centroid. Next, the random forest algorithm was used to determine the weights of these features, and features were selected according to a weight threshold. Finally, three classification algorithms, logistic regression (LR), decision tree (DT) and support vector machine (SVM), were established to classify the samples, verify the validity of the features and analyze the effects of different classification algorithms on tea classification. The original data were normalized and randomly split, with 80% used for training and 20% for testing. 10-fold cross-validation was used to select the optimal parameters of each classification model: the training dataset was randomly divided into 10 parts, of which 9 were used for training and the remaining 1 for validation. Following this parameter optimization process, the optimal logistic regression, decision tree and support vector machine models were obtained, and the final evaluation results were computed on the test dataset. The test results showed that: 1) circularity had the highest weight, 0.467, and with a weight threshold of 0.05 five features were finally selected: circularity, rectangularity, linearity, perimeter and compactness; 2) on the test dataset, the average accuracy of the three classification algorithms was 0.924, suggesting that the established morphological feature descriptors are separable and effective; 3) on the test dataset, logistic regression (LR) had an accuracy of 91.7% and an F1 score of 92.9%, and the support vector machine (SVM) had an accuracy of 91.7% and an F1 score of 94.7%, so SVM gave the best recognition among the three classification algorithms; 4) judging from the deviation between the evaluation scores of the three algorithms, logistic regression generalized better, whereas the decision tree carried a greater risk of overfitting. Logistic regression gave the lowest accuracy and F1 score and the support vector machine the highest, so when evaluating the separability of feature vectors, several algorithms can be applied and the average of their results used as the final basis for evaluation. The images in the experiment were acquired dynamically, in line with the actual working conditions of tea selection, so the method can be extended to actual tea processing and production.
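The data partitioning and scoring described above can be sketched in plain Python. This is an illustrative sketch only: it assumes a binary task (for instance stalk versus leaf, with label 1 as the positive class, which is an assumption), the helper names `holdout_split`, `k_fold_indices` and `accuracy_and_f1` are hypothetical, and no classifier is trained. It does not reproduce the authors' program.

```python
import random


def holdout_split(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and split them into (train, test) lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(train_indices, k=10):
    """Yield (fit, validation) index lists; each fold validates exactly once."""
    folds = [train_indices[i::k] for i in range(k)]
    for i in range(k):
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, folds[i]


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Compute accuracy and F1 from binary confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1


# 80/20 hold-out split of 100 made-up samples, then 10 folds over the training part.
train_idx, test_idx = holdout_split(100)
splits = list(k_fold_indices(train_idx, k=10))

# Scoring on made-up labels and predictions.
acc, f1 = accuracy_and_f1([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])
```

In such a scheme the test indices are set aside until the end; model parameters are chosen using only the fit/validation pairs drawn from the training indices.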