An Investigation of Term Weighting and Feature Selection Methods for Sentiment Analysis
Abstract
Sentiment analysis automatically classifies the opinions expressed in a document, usually as positive or negative. A review document generally reflects its author's opinion about the objects mentioned in the text, so sentiment analysis has many useful applications, such as opinionated web search and automatic analysis of reviews. Although sentiment analysis is a kind of text classification problem, the structure of review documents differs from that of texts such as news articles or web pages, so techniques applied to text classification need to be re-evaluated for sentiment analysis. Assigning appropriate weights to features is important for the performance of sentiment analysis, so that important features receive higher weights in the feature vectors. Feature selection reduces the feature vector size by eliminating redundant or irrelevant features, which can improve classification accuracy. In this study, our aim is to examine the effects of term weighting methods on the newly proposed Query Expansion Ranking (QER) feature selection method and to compare the classification results with those of a well-known feature selection method, the Chi-square statistic. We use three popular term weighting methods (term presence, term frequency, and term frequency-inverse document frequency, tf*idf) and perform experiments with a multinomial Naïve Bayes classifier. The experimental results show that when the QER feature selection method is used with tf*idf term weighting, classification performance improves in terms of F-score.
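As a minimal illustration of the three term weighting schemes named above, the sketch below computes term presence, term frequency, and tf*idf weights for a hypothetical two-document toy corpus (the documents, and the simple idf = log(N/df) formulation, are assumptions for illustration; they are not the paper's dataset or exact weighting variant, and the feature selection and Naïve Bayes steps are not shown):

```python
import math
from collections import Counter

# Hypothetical toy corpus: two tokenized "reviews".
docs = [["good", "plot", "good", "acting"],
        ["bad", "plot", "dull", "acting"]]

N = len(docs)
# Document frequency: number of documents containing each term.
df = Counter(term for doc in docs for term in set(doc))

def term_presence(doc):
    # Binary weighting: 1 if the term occurs in the document, else 0.
    return {t: 1 for t in set(doc)}

def term_frequency(doc):
    # Raw count of each term in the document.
    return dict(Counter(doc))

def tf_idf(doc):
    # tf * idf with idf = log(N / df); terms in every document get weight 0.
    tf = Counter(doc)
    return {t: tf[t] * math.log(N / df[t]) for t in tf}

# "good" occurs twice in doc 0 but only in that document, so its tf*idf
# weight is high; "plot" occurs in both documents, so its weight is 0.
weights = tf_idf(docs[0])
```

A term that appears in every document carries no discriminative information under tf*idf, which is one reason it can outperform raw frequency as input to a classifier.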