Natural Language Processing and Sentiment Analysis

Why is there no “Boring” button for posts on Facebook, Twitter, LinkedIn, and most other social media sites? It would make sentiment analysis so much easier. In my opinion, it is to maintain some semblance of decorum and civility, as well as to spare people’s feelings. It’s bad enough when you get 1,000 views and only 10 “likes”, but can you imagine getting 10 “likes”, 90 “So-So”, and 900 “Boring!” responses? At least with 10 “likes” and 1,000 views, you’re not really sure what those other 990 people thought. Like it or not, you must accept the fact that those readers didn’t care enough about your post to lift a finger and make the extra effort to click the “like” button. However, you can’t really discern much more than that. Your only solace comes from attributing their inaction to their being either very lazy or inexplicably frugal with their “likes”.

That said, there is still the “comments” section. This is reserved for people who really have an opinion about what you’ve posted. They either really like it or really hate it, but there is seldom a comment where someone says, “What you wrote is ok.” People who comment have generally read at least a portion of your post, and something in that post caused them to sit up in their chair, scroll down to the comments section, click on the text box, position their hands over the keyboard, and begin typing what they hope is an intelligent thought.

I have been guilty of this as well. Unfortunately, what commenters don’t realize is that sometimes they are telling us more than they wanted us to know. In their moment of passionate appreciation, disbelief, compassion, righteous indignation, or anger, they have provided us with personal insights that they may or may not have intended to share. Frequently they share their strong feelings of devotion to, or disdain for, a product, service, or political candidate, which in turn can expose something about their own personal beliefs, character, values, personal preferences, shopping habits, or possibly impulsive nature. By expressing their strong feelings for one thing, are they not indicating what their feelings might be for another? If someone retweets a positive article about a product or service, this probably indicates a favorable sentiment, unless of course they preface the tweet with the phrase “Can you believe this s#^t!?”, in which case a competitor might have identified their demographic. This is what sentiment analysis is all about.

There are generally considered to be three types of sentiment analysis in use today: the knowledge-based method, the statistical method, and the hybrid method. Before any of these can be applied, you obviously have to acquire the data, and then the data must be prepared for analysis, which is no trivial task. Usually, this consists of grouping words and removing meaningless words (e.g., the, in, that, off, once, here, there) from what is called the Corpus. In the text mining package in R (tm), there are 174 of these so-called “stop words”. The remaining words in the Corpus are then converted to all lowercase characters. Punctuation, numbers, and special characters are removed, followed by grouping words that only retain their meaning in the context of one another (e.g., no fees, dogs allowed, no pets, oracle database, data architect). Finally, extra whitespace is removed, and the Corpus is put through a process called stemming, where common suffixes like “ing”, “ly”, “es”, “s”, and others are removed. The idea is to standardize the data before the analysis begins.
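To make the preparation steps above a little more concrete, here is a minimal sketch in Python rather than R. It is not the code from my posts: the stop word list is a tiny made-up sample (not the 174-word list), the phrase table is hypothetical, and the stem function is a naive suffix stripper standing in for a real stemmer. The order of the steps is also adapted slightly, with lowercasing done first so that stop word matching works.

```python
import re

# Tiny illustrative stop word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "in", "that", "off", "once", "here", "there", "a", "and", "is"}

# Hypothetical phrases whose words only keep their meaning together.
PHRASES = {"no fees": "no_fees", "no pets": "no_pets", "dogs allowed": "dogs_allowed"}

# Naive suffix list, checked in this order; a stand-in for a real stemmer.
SUFFIXES = ("ing", "ly", "es", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Turn raw text into a list of cleaned, stemmed tokens."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)     # drop punctuation, numbers, special characters
    text = re.sub(r"\s+", " ", text).strip()  # collapse extra whitespace
    for phrase, token in PHRASES.items():     # group words that belong together
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", token, text)
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    # Grouped phrases are left as they are; everything else is stemmed.
    return [t if "_" in t else stem(t) for t in tokens]


print(preprocess("No fees here! The dogs allowed, walking quickly in 2 parks."))
```

A suffix stripper this crude will mangle plenty of words (“likes” becomes “lik”), which is one reason real projects lean on an established stemmer instead.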

Obviously, there is more to this process, especially when it comes to the analysis. I have started a series of posts on the subject if you care to check them out. The first post provides the code to accomplish all of the above, then creates a frequency histogram of the top words and a word cloud, and finally applies a k-means clustering algorithm to the data.
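For a rough feel of what the counting and clustering steps involve, here is a small Python sketch using only the standard library. Again, this is not the code from the series: the function names are my own for illustration, the histogram and word cloud plotting are left out, and the k-means shown is the plain textbook version with starting centroids supplied by the caller.

```python
from collections import Counter


def top_words(documents, n=10):
    """Count tokens across all documents and return the n most frequent."""
    counts = Counter(token for doc in documents for token in doc)
    return counts.most_common(n)


def vectorize(documents):
    """Turn each tokenized document into a vector of term counts."""
    vocab = sorted({token for doc in documents for token in doc})
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors


def nearest(point, centroids):
    """Index of the centroid with the smallest squared Euclidean distance."""
    distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return distances.index(min(distances))


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign points to centroids, then recompute the centroids."""
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        centroids = new_centroids
    return [nearest(p, centroids) for p in points]
```

Given a list of tokenized documents, you would call vectorize on them, pick a couple of the resulting vectors as starting centroids, and pass both to kmeans to get a cluster label for each document. The results depend on the starting centroids, so in practice the algorithm is usually run several times with different starts.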

Hopefully, the bit of technical stuff in the middle was not too boring. I have already written a post on ontologies, which play a critical role in accurately assessing sentiment. I have developed a strong affinity for ontologies in the last few years, because not only are they applicable to quality sentiment analysis, but they also enable applications in artificial intelligence, natural language processing, web semantics, data integration, and knowledge management. There are some really fascinating technologies that are enabled by ontologies, and I am hoping to generate some interest in their study and further development. And by the way, if you are not too tired, please “like” this post, and if you are feeling exceptionally spunky, I would love to hear your comments on the subject. Feel free to open up! 🙂

About Randall Shane