IBM Watson and The Greater Fool Blog Sentiment Analysis
You may remember IBM's amazing technology 'Watson', which defeated the all-time human champions of Jeopardy back in 2011. For many, it represented a significant leap for Artificial Intelligence (AI): a computer competing with humans on what was considered human-only terrain, understanding language and answering questions about culture, history and politics, and doing it faster and better than the humans.
I have combined my love of learning about AI technologies with my love of reading Garth Turner's finance blog, greaterfool.ca, where he posts daily. In this blog post, I present the results of a sentiment and emotional analysis of one year of Garth's blog posts and their comments.
IBM has made this AI available to individuals and businesses at low cost. While Watson has a wide variety of capabilities, I used its sentiment and emotion analysis feature: you pass text to the service, and it returns the overall sentiment (positive or negative) along with scores for key emotions (joy, anger, sadness, fear and disgust).
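For the technically inclined (everyone else can skip ahead): Watson's sentiment and emotion analysis is offered through its Natural Language Understanding (NLU) service, via an `analyze` endpoint that takes some text plus a list of requested "features". The sketch below only *builds* such a request with Python's standard library and does not send it. The service URL, the API key and the `2022-04-07` version date are placeholder assumptions, not values from our project; IBM Cloud services accept HTTP basic authentication using the literal username `apikey`.

```python
import base64
import json
import urllib.request

# Placeholder values (assumptions): substitute your own instance's URL and key.
SERVICE_URL = "https://api.us-south.natural-language-understanding.watson.cloud.ibm.com/instances/YOUR_INSTANCE_ID"
API_KEY = "YOUR_API_KEY"
VERSION = "2022-04-07"  # assumed API version date; check IBM's docs for current values


def build_analyze_request(text: str) -> urllib.request.Request:
    """Build (but do not send) an NLU analyze request asking for
    document-level sentiment and emotion for `text`."""
    body = {
        "text": text,
        "features": {"sentiment": {}, "emotion": {}},
    }
    # Basic auth with the literal username "apikey" and the key as password.
    token = base64.b64encode(f"apikey:{API_KEY}".encode()).decode()
    return urllib.request.Request(
        url=f"{SERVICE_URL}/v1/analyze?version={VERSION}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


req = build_analyze_request("Happy Housing Crash Everyone!")
# Sending it would be urllib.request.urlopen(req), then json.load() on the response.
```

IBM also publishes an official Python SDK that wraps this call; the raw-HTTP version is shown only because it makes the request shape visible.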
Getting computers and robots to understand 'sentiment' and 'emotion' of human interactions is definitely a hot topic in technology these days, and something industry is starting to adopt at scale. As the amount of information and communication continues to grow exponentially, interpreting this content is something many businesses must get a handle on.
Garth's blog has a wide following and an active community of commenters who add their daily take on a plethora of topics. We have included them in our analysis as well.
So let's see what Watson has to say about GreaterFool.ca! Because this article is aimed at both technical and non-technical readers, I've organized the following sections so you can read or safely skip the technical parts.
About the Blog
Greaterfool.ca is a daily blog written by Garth Turner, a financial advisor, entrepreneur and former Minister of National Revenue in the Canadian federal government. From time to time, one of Garth's work colleagues guest-authors a post.
The blog focuses on the financial risk Canadians take on (often unwittingly) when they buy too much real estate instead of following a more balanced investment strategy.
Garth is an incredibly entertaining writer, particularly given the subject matter, and a single post can draw close to 400 comments from his site visitors. Some posters are regulars (more on that later) whose pseudonyms are familiar to anyone who follows the comments section.
Now for the fun stuff! We pulled in 12 months' worth (August 1, 2018 to July 31, 2019) of publicly available blog posts from the site, 365 posts and 50,251 comments, and ran each one individually through IBM Watson. For each post and comment we received scores for overall sentiment (positive or negative), joy, anger, fear, disgust and sadness. We found that posts with a 'negative' sentiment generally drew the most comments, which we take as a rough proxy for engagement.
However, it does seem that being 'too negative' drives comments down - see farthest left bubble.
It should also be noted that guest bloggers were generally more positive than average, and they post on weekends, which might account for some of the lower comment counts at the positive end of the scale. For example, guest bloggers wrote 19 of the 50 most positive posts, but only 6 of the 50 most negative. The 10 most positive posts averaged 97 comments, while the 10 most negative averaged 130.
The most commented-on post ranked 20th of all 365 posts for fear, with a fear score of 0.59797.
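The comparisons above boil down to sorting posts by their Watson scores and looking at either end of the list. Here is a minimal sketch, assuming each post has already been reduced to a record holding its sentiment score, fear score, comment count and a guest-author flag. The `Post` type, the helper names and the sample values are all hypothetical; they are not our data.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Post:
    title: str
    sentiment: float  # Watson document sentiment score, -1 (negative) to 1 (positive)
    fear: float       # Watson document-level fear score
    comments: int     # number of reader comments on the post
    guest: bool       # True if a guest blogger wrote the post


def extremes(posts, n):
    """Return (n most positive posts, n most negative posts) by sentiment."""
    ranked = sorted(posts, key=lambda p: p.sentiment, reverse=True)
    return ranked[:n], ranked[-n:]


def avg_comments(posts):
    """Mean comment count across a group of posts."""
    return mean(p.comments for p in posts)


def guest_count(posts):
    """How many posts in the group were written by guest bloggers."""
    return sum(p.guest for p in posts)


def rank_by(posts, post, key):
    """1-based rank of `post` when all posts are sorted by `key`, highest first."""
    ranked = sorted(posts, key=key, reverse=True)
    return ranked.index(post) + 1


# Hypothetical sample data for illustration only.
posts = [
    Post("A", 0.8, 0.10, 90, True),
    Post("B", 0.4, 0.20, 110, False),
    Post("C", -0.3, 0.60, 140, False),
    Post("D", -0.7, 0.45, 250, False),
    Post("E", -0.9, 0.30, 60, True),
]

most_positive, most_negative = extremes(posts, 2)
print(avg_comments(most_positive), avg_comments(most_negative))
print(guest_count(most_positive), guest_count(most_negative))

busiest = max(posts, key=lambda p: p.comments)
print(busiest.title, rank_by(posts, busiest, key=lambda p: p.fear))
```

On the full year of posts, `extremes(posts, 10)` would give the groups behind the comment averages and `extremes(posts, 50)` the groups behind the guest-blogger counts.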
The Real Insanity - The Comments Section
Greaterfool.ca has a lively comments section, filled with a wide variety of characters who opine on and off topic, often without decorum, taste or manners. That makes it a compelling challenge for Watson, and luckily the insanity it had to endure did not break it! Sadly, the most prolific commenter uses a pseudonym I will not repeat on my blog; it refers to something that can occupy a crowded elevator you do not want to be in (how's that for a Jeopardy question?).
Below is a chart of the top 10 most prolific commenters.
If you can believe it, "Mr. Elevator" posted 1,577 comments in 12 months, an average of 4.3 comments every day. That is just over double the count of the third most prolific commenter.
The regular contributor 'For those about to Flop' would have made the top 10 had he posted consistently under one name, but he seems to flop back and forth between his long formal name (463 comments) and 'flop' (183 comments), 646 comments in total. So he is aptly named.
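Counting comments per pseudonym is a simple tally once each scraped comment carries its author name. Below is a sketch using Python's `collections.Counter`. Watson plays no part in this step, and the `ALIASES` table that folds 'flop' into its longer form is a hypothetical manual mapping, not something we actually applied; the author list is made up.

```python
from collections import Counter

# Hypothetical manual mapping from alternate pseudonyms to a canonical one.
ALIASES = {"flop": "For those about to Flop"}


def top_commenters(authors, n=10, merge_aliases=False):
    """Tally comments per author name and return the n most prolific."""
    if merge_aliases:
        authors = (ALIASES.get(a, a) for a in authors)
    return Counter(authors).most_common(n)


def per_day(count, days=365):
    """Average comments per day over the collection window, to one decimal."""
    return round(count / days, 1)


# Made-up input: one author name per scraped comment.
authors = ["For those about to Flop"] * 4 + ["flop"] * 2 + ["regular_a"] * 5
print(top_commenters(authors, n=2))
print(top_commenters(authors, n=2, merge_aliases=True))
print(per_day(1577))  # the daily average quoted for "Mr. Elevator"
```

Merging aliases changes the ranking: split across two names, the 'Flop' counts fall behind `regular_a`; combined, they move ahead of it.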
Speaking of Flop, who regularly posts about 'how much', let's look at the top-ranked comments in each category as evaluated by IBM's Watson.
A brief note about the scores: Watson returns a sentiment score between -1 (very negative) and 1 (very positive). Emotion scores range from 0 to 1, so an anger score of 0.9 means the comment reads as really angry, while a score near 0 means it is not angry AT ALL.
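Finding the 'winner' in each category below is just a matter of taking the highest score for each metric, plus the lowest sentiment score for the most negative comment. A minimal sketch, assuming every comment has already been flattened into a dictionary of Watson scores; the field names and sample values are hypothetical.

```python
METRICS = ["sentiment", "joy", "anger", "disgust", "fear", "sadness"]


def top_per_metric(comments):
    """Return {metric: the comment with the highest score for that metric}."""
    return {m: max(comments, key=lambda c: c[m]) for m in METRICS}


def most_negative(comments):
    """Sentiment runs from -1 to 1, so the most negative comment is the minimum."""
    return min(comments, key=lambda c: c["sentiment"])


# Hypothetical scored comments for illustration only.
comments = [
    {"author": "a", "sentiment": 0.95, "joy": 0.90, "anger": 0.02,
     "disgust": 0.01, "fear": 0.03, "sadness": 0.05},
    {"author": "b", "sentiment": -0.92, "joy": 0.05, "anger": 0.70,
     "disgust": 0.80, "fear": 0.10, "sadness": 0.20},
    {"author": "c", "sentiment": -0.40, "joy": 0.10, "anger": 0.20,
     "disgust": 0.10, "fear": 0.85, "sadness": 0.60},
]

winners = top_per_metric(comments)
for metric, c in winners.items():
    print(f"{metric}: {c['author']} ({c[metric]})")
print("most negative:", most_negative(comments)["author"])
```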
"Thank you so much for your excellent answer to my question! I really enjoy your writing and I look forward to every post. Kind of you to share your knowledge with us. (The pictures are cool too!)" - Ian "Blog Dog", Nov 26, 2018. Sentiment rank: 0.999947
"Yet another pic of a stupid canine failing the IQ test. (Too bad dog owners are equally of lower intelligence, or they could save their poor dumb puppies from their own stupidity.)" - Felix, Sep 23 2018. Sentiment rank: -0.998314
"Thank you, Garth. Such wonderful stories. Happy New Year to you and your family." - Chris, Dec 30 2018. Joy Rank: 0.991983.
"Enjoy life don't, worry about the small shit. Gamble bitches." - Smoking Man, Apr 4 2019. Anger rank: 0.935145
We knew SM would appear here somewhere. Frankly, this does not strike me as all that angry; I would pick the #10-ranked comment as the angriest:
"If the lazy parasites communist CONservatives want t No B-20 that is fine but they should DEMAND the shut down of CMHC. They always hate government interference but since it helps those useless communists they dont say boo about CMHC. Proving CONservatives are nothing but CON artists and SHYSTERS who hate the free market but love communism for the rich." - Communist CONservatives, May 14, 2019. Anger rank: 0.853132
"Progressives and their "ideals" sure are revolting and disgusting." - Classical Liberal Millennial, May 5, 2019. Disgust rank: 0.91043
"oh boy I fear for the comments section today. I'm out." - yorkville renter, Feb 28 2019. Fear rank: 0.947706
"My Dad did the same as Nance:( He won't tell me but at 88 I'm sure he's broke now. Just CPP and OAS to live on. Sad AF!" - Honey Dripper, Sep 13 2019. Sadness rank: 0.965764
How We Did It
IBM Watson is "IBM’s suite of enterprise-ready AI services, applications, and tooling." The area of AI we are using is Natural Language Processing (NLP), specifically sentiment analysis. Basically, you give a computer system some text in a human language like English, and it returns its interpretation of certain aspects of that text, such as:
Overall sentiment - is it positive or negative?
Emotions - what emotions are involved, and how strong are they?
Entities - which people, organizations, places and other things are being discussed?
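Those three kinds of output come back together as a single JSON document. The sketch below flattens the relevant parts of a Watson NLU `analyze` response into one record. The field layout (`sentiment.document`, `emotion.document.emotion`, `entities`) follows IBM's NLU response format; entities only appear if that feature was requested, and the sample response values are invented.

```python
import json


def flatten(response: dict) -> dict:
    """Pull document-level sentiment, emotions and entity names out of an
    NLU analyze response. Features missing from the response are skipped."""
    record = {}
    doc_sentiment = response.get("sentiment", {}).get("document")
    if doc_sentiment:
        record["sentiment"] = doc_sentiment["score"]  # -1 .. 1
        record["label"] = doc_sentiment["label"]      # positive / neutral / negative
    emotions = response.get("emotion", {}).get("document", {}).get("emotion", {})
    record.update(emotions)                           # joy, anger, etc., each 0 .. 1
    record["entities"] = [e["text"] for e in response.get("entities", [])]
    return record


# Invented sample response, trimmed to the fields used above.
sample = json.loads("""
{
  "sentiment": {"document": {"score": -0.62, "label": "negative"}},
  "emotion": {"document": {"emotion": {
      "sadness": 0.41, "joy": 0.08, "fear": 0.55, "disgust": 0.12, "anger": 0.20}}},
  "entities": [{"type": "Organization", "text": "CMHC", "relevance": 0.9, "count": 1}]
}
""")
print(flatten(sample))
```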
You can try this directly with your web browser here. Write or paste some text in the window and click 'Analyze'. Several more buttons will appear underneath it; clicking them shows Watson's feedback on the text you entered.
Organizations use this to process large volumes of data and look for specific trends. For example, a bank's call centre could transcribe calls and monitor in real time whether customers are becoming emotional in one direction or another, and companies can analyze huge volumes of customer survey results for meaningful insights into how their customers or market are feeling.
To get all this data, we built a web scraping tool in Python to retrieve publicly available content from the Greater Fool blog. We then passed each item to Watson's API, recorded the results in a database, and did further analysis with SQL, Python and Excel.
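The scraper itself depends on the blog's page layout, so it isn't sketched here, but the storage and SQL side can be illustrated with Python's built-in `sqlite3` module. SQLite is a stand-in chosen for the example; the table layout, column names, sentiment bucket edges and rows below are all assumptions, not our actual schema or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database for the example
conn.execute("""
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        title TEXT,
        sentiment REAL,   -- Watson document sentiment, -1 .. 1
        fear REAL,        -- Watson fear score, 0 .. 1
        comments INTEGER  -- number of reader comments
    )
""")

# Made-up rows standing in for stored Watson results.
rows = [
    ("A", 0.6, 0.1, 80),
    ("B", 0.1, 0.2, 120),
    ("C", -0.2, 0.5, 200),
    ("D", -0.5, 0.4, 180),
    ("E", -0.9, 0.3, 70),
]
conn.executemany(
    "INSERT INTO posts (title, sentiment, fear, comments) VALUES (?, ?, ?, ?)", rows
)

# Average comment count per sentiment bucket (bucket edges are arbitrary).
query = """
    SELECT CASE
             WHEN sentiment < -0.6 THEN 'very negative'
             WHEN sentiment < 0    THEN 'negative'
             ELSE 'positive'
           END AS bucket,
           COUNT(*) AS n_posts,
           AVG(comments) AS avg_comments
    FROM posts
    GROUP BY bucket
    ORDER BY MIN(sentiment)
"""
for bucket, n_posts, avg_comments in conn.execute(query):
    print(bucket, n_posts, avg_comments)
```

A grouped query like this is one way to look for the pattern described earlier, where negative posts draw more comments but the most negative ones fall off again.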
We ran our approximately 56,000 records through Watson without tweaking or tailoring it to understand the culture, sarcasm or irony prevalent on the Greater Fool blog. For example, it deemed this comment overwhelmingly happy and positive!
"Happy Housing Crash Everyone! The days of SHYSTER lies are over. I can’t wait for a uber type realtor. Could we see $1000-$5000 to sell a home? Happy Housing Crash Everyone!:-)"
It reads like a biblical pestilence is being lifted... how could that not be happy? To be fair, Happy Housingcrash Everyone's average sentiment across 21 comments was negative.
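Averaging a commenter's sentiment across all of their comments, as with Happy Housingcrash Everyone, is a simple group-by. A sketch with made-up scores and hypothetical commenter names:

```python
from collections import defaultdict
from statistics import mean


def average_sentiment_by_author(comments):
    """Group (author, sentiment) pairs by author and average each group."""
    scores = defaultdict(list)
    for author, sentiment in comments:
        scores[author].append(sentiment)
    return {author: mean(values) for author, values in scores.items()}


# Made-up scores: one cheerful-sounding comment outweighed by two negative ones.
comments = [
    ("commenter_x", 0.9),
    ("commenter_x", -0.6),
    ("commenter_x", -0.6),
    ("commenter_y", 0.2),
]
averages = average_sentiment_by_author(comments)
print(averages)
```

A single glowing score can hide a negative overall tone, which is why the per-author average tells a different story from the one comment quoted above.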
Watson allows us to build a custom model in which terms like 'Happy Housing Crash' are treated as negative (angry? disgusted? bonkers?), so we can see whether that improves the rankings.
I will also be doing a similar analysis with heartbeatai.com, a competitor to Watson based right here in Canada that offers a far wider range of emotions in its analysis.
More on that in a future post!
Keith Stoute is a serial tech entrepreneur and inventor whose products have been used by tens of thousands of users all over the globe. He has invented and developed IP that has been sold for millions of dollars to global software firms. He is currently founder and president of Visual Antidote, a boutique IT consulting firm and software manufacturer with customers globally. He speaks regularly at international conferences on new and emerging technologies. https://www.linkedin.com/in/keith-stoute-va/