I've been doing a lot of sentiment analysis projects lately. It all started with my Bee Movie Sentiment Analysis project. With Donald Trump's address to Congress tonight, I decided to play with the speech and see what interesting trends I could pick up on with sentiment analysis.
I found a full transcript of the speech online and put it into a text file. If anyone would like to try running their own analysis, the file is available here in a gist! Once I had the speech with lines and sections separated by newlines, the actual analysis was simple.
Using the same script from my Bee Movie analysis, I ended up with the following results.
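For anyone who wants to try this on their own transcript, here is a rough sketch of scoring a text file line by line. It is not necessarily the exact script behind these graphs. It assumes the `google-cloud-language` Python client with credentials already configured, and the names `google_sentiment` and `score_lines` are made up for illustration.

```python
from typing import Callable, List, Tuple

Sentiment = Tuple[float, float]  # (score, magnitude)


def google_sentiment(text: str) -> Sentiment:
    """Score one piece of text with the Cloud Natural Language API."""
    # Imported lazily so score_lines() can be used without the client installed.
    from google.cloud import language_v1

    # A real script would probably create the client once and reuse it.
    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=text, type_=language_v1.Document.Type.PLAIN_TEXT
    )
    result = client.analyze_sentiment(request={"document": document})
    sentiment = result.document_sentiment
    return sentiment.score, sentiment.magnitude


def score_lines(text: str, analyze: Callable[[str], Sentiment]) -> List[Sentiment]:
    """Run `analyze` on every non-blank line of the transcript, in order."""
    return [analyze(line) for line in text.splitlines() if line.strip()]
```

With a transcript saved locally (the filename is up to you), `score_lines(open("speech.txt").read(), google_sentiment)` would return one `(score, magnitude)` pair per line. Passing the analyzer in as an argument also makes it easy to swap in a fake one while testing, so you do not burn API quota.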
The first graph is the running total of the speech's sentiment without factoring in magnitude. Google's magnitude metric measures the overall strength of emotion in a piece of text, regardless of whether that emotion is positive or negative. This means the first graph is just the raw sentiment score, without taking into account how strongly each line is expressed.
This second graph takes the magnitude into account as well, so each line's sentiment is weighted by how much emotional content Google detects in it.
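As a minimal sketch of how the two running totals could be computed from per-line results: this assumes that "taking magnitude into account" means multiplying each line's score by its magnitude, which is one plausible weighting and not necessarily the one used for the graphs. The function name `running_totals` is hypothetical.

```python
from itertools import accumulate
from typing import Iterable, List, Tuple


def running_totals(
    sentiments: Iterable[Tuple[float, float]],
) -> Tuple[List[float], List[float]]:
    """Return (raw, weighted) cumulative sums for (score, magnitude) pairs."""
    pairs = list(sentiments)
    # Raw: add up the scores as they come.
    raw = list(accumulate(score for score, _ in pairs))
    # Weighted (assumption): scale each score by its magnitude before summing.
    weighted = list(accumulate(score * magnitude for score, magnitude in pairs))
    return raw, weighted
```

Each list has one entry per line of the speech, so either one can be plotted directly against the line number.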
As you can see, the graphs are very similar in the end. This is because the magnitude scores are generally very similar across all of the lines, so the weighting does not change the graph very much.
One very interesting thing I noticed with this graph is the sudden change around line 100 of the speech. Looking at the data, this starts around the time Trump transitions from describing the former administration and issues with the country to Trumping up (sorry for the terrible pun) his own administration. The sudden change was not something I expected, but given the nature of the speech it does not really surprise me either.
The last graph I created from the data also shows how the transition happens. The lack of very negative lines after the 100-line mark reflects the switch in content and sentiment of the speech. Note that this graph also uses the sentiment weighted by the magnitude score.
From this data I found that the average of the first 100 lines was ~0.05, and the average of the last 80 lines was ~0.30. This is a sharp difference and very interesting to me.
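A comparison like this is easy to reproduce once you have one sentiment value per line. The sketch below is illustrative only: `segment_means` is a made-up helper, and the split point is whatever line index you pick (100 in this post).

```python
from statistics import mean
from typing import Sequence, Tuple


def segment_means(values: Sequence[float], split: int) -> Tuple[float, float]:
    """Average the values before and after the `split` index."""
    if not 0 < split < len(values):
        raise ValueError("split must leave at least one value on each side")
    return mean(values[:split]), mean(values[split:])
```

Calling `segment_means(per_line_sentiment, 100)` on a list of per-line values returns the average of the first 100 lines followed by the average of everything after them.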
I'm starting to love using sentiment analysis, folks. Look for more from me using the Natural Language API!
PS: this is the speech's sentiment in emoji: