Abstract
Using Web text data collected with a web crawler from The American Economic Review and Economic Research (《经济研究》), this paper compares the preprocessed data from the two journals from two perspectives: descriptive analysis and topic model analysis. The results show that the two journals are similar in thematic content, in that the topics of both include national economics, investment, and international trade, but that they also differ in other respects. These findings offer an empirical basis for scholars' research and a reasonable reference for the future development of Economic Research.