admin posted on 2015-11-19 18:38:46

Statistical text analysis with wordscores

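【Question】
Lately, Fang Zhouzi and Han Han have been all over the news.
Would it be possible to run a word-frequency analysis on Han Han's articles?
I haven't followed this closely, but I'd guess someone has already run the statistics on Han Han's articles.
A long time ago, Professor Li Xianping ran word-frequency statistics on Dream of the Red Chamber (《红楼梦》):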
http://www.docin.com/p-277121750.html
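So, can Stata do text analysis? Is there a command for it?
【Method】
I'm not sure whether wordscores, written by Benoit, would do the job.
I don't know much about this area. If you're interested, give it a try. To download the command: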
net install http://www.tcd.ie/Political_Science/wordscores/wordscores
General introduction:
Text analysis using Stata: the wordscoring approach to content analysis using words as data
Kenneth Benoit, Political Science, TCD
Abstract
The "word-scoring" approach to content analysis developed by Laver, Benoit, and Garry (American Political Science Review, June 2003) has been used to summarize content from political texts based on a statistical analysis of word frequencies. Unlike nearly all other methods of computerized content analysis, "wordscores" does not rely on predefined coding schemes or dictionaries, but instead compares texts based on relative word frequencies, mapping patterns from texts whose content is known or assumed onto texts whose content the researcher wishes to estimate. Furthermore, because Wordscores makes no attempt to assess the meaning or linguistic structure of words, it works in any language. To implement this method, we have written the Wordscores suite of software implemented as .ado extensions in Stata 7.0. Available from http://www.politics.tcd.ie/wordscores/, this software draws heavily from Stata's built-in word-parsing capabilities and data merging capabilities based on matching words. Not only is Stata capable of quickly generating and analyzing huge matrices of word frequencies, but also Stata's basic orientation as a statistical program makes it perfectly suited to statistical analysis of the word frequency information. Stata's capability for providing user-written help files, and for installing and updating .ado packages over the Internet, also makes it an ideal platform for distributing our software for noncommercial, scientific use. To our knowledge, Wordscores is the first Stata application to perform content analysis of texts.
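
To make the "relative word frequencies" idea in the abstract more concrete, here is a rough Python sketch of the basic scoring step as described in Laver, Benoit, and Garry (2003). This is not the Stata package and does not reproduce its commands. The function names, the toy texts, and the positions +1 and -1 are all made up for illustration, and the sketch leaves out the rescaling and uncertainty estimates discussed in the paper. It also assumes the words are already separated by spaces, so Chinese text would have to go through a word-segmentation step first.

```python
from collections import Counter


def relative_freqs(text, vocab=None):
    """Relative frequency of each word in a text (whitespace tokenization).

    If vocab is given, only words in vocab are counted, so the
    frequencies are relative to the scored words only.
    """
    words = text.lower().split()
    if vocab is not None:
        words = [w for w in words if w in vocab]
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def word_scores(reference_texts, positions):
    """Score each word from reference texts with known positions.

    For word w: P(r|w) = F_wr / sum_r F_wr, and S_w = sum_r P(r|w) * A_r,
    where F_wr is the relative frequency of w in reference text r
    and A_r is the known (or assumed) position of r.
    """
    freqs = [relative_freqs(t) for t in reference_texts]
    vocab = set().union(*freqs)
    scores = {}
    for w in vocab:
        column = [f.get(w, 0.0) for f in freqs]
        total = sum(column)
        scores[w] = sum(fw / total * a for fw, a in zip(column, positions))
    return scores


def text_score(text, scores):
    """Score an unseen text as the frequency-weighted mean of its word scores.

    Words that never appeared in the reference texts are ignored.
    Returns None if the text shares no words with the reference texts.
    """
    freqs = relative_freqs(text, vocab=scores)
    if not freqs:
        return None
    return sum(f * scores[w] for w, f in freqs.items())


# Hypothetical toy data: two reference texts with assumed positions +1 and -1.
refs = ["tax cut tax cut market", "welfare spending welfare public market"]
positions = [1.0, -1.0]

scores = word_scores(refs, positions)
virgin_score = text_score("tax cut welfare unknown", scores)
print(scores)
print(virgin_score)
```

In this toy example a word used only in the first reference text gets that text's position, while a word used equally often in both lands halfway between them. The unseen text is then placed by averaging the scores of the words it shares with the reference texts.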

【References】
Michael Laver, Kenneth Benoit, and John Garry (2003) Extracting policy positions from political texts using words as data. American Political Science Review 97(2).
Replication materials (requires Stata and Wordscores suite.)
Original manifesto texts
You may also wish to see our application of Wordscores to the first and second rounds of referee and editors' letters from the APSR review process.
Kenneth Benoit and Michael Laver (2002) Estimating Irish party positions using computer wordscoring: The 2002 elections. Irish Political Studies 17(2 Winter).
Michael Laver and Kenneth Benoit (2002) Locating TDs in policy spaces: Wordscoring Dáil speeches. Irish Political Studies 17(1, Summer): 59-73.
