Movie reviews help users decide whether a movie is worth watching. A summary of a movie's reviews lets a user decide quickly instead of reading many individual reviews. Sentiment analysis rates how positive or negative a review is, so the task of deciding whether a review is positive or negative can be automated with techniques from Natural Language Processing (NLP).
The dataset contains 10,000 movie reviews. The objective is to perform sentiment analysis (positive/negative) on these reviews using supervised and unsupervised learning methods, and to compare which approach gives the most accurate results.
**Bag of Words**
**TF-IDF** (**T**erm **F**requency - **I**nverse **D**ocument **F**requency)
**TextBlob**
**VADER Sentiment**
Dataset source:
# Importing the required libraries
# To read and manipulate the data
import pandas as pd
pd.set_option('max_colwidth', None)
# To visualise the graphs
import matplotlib.pyplot as plt
import seaborn as sns
# Helps to display the images
from PIL import Image
# Helps to extract the data using regular expressions
import re
# Helps to remove the punctuation
import string
# It helps to remove the accented characters
import unidecode
# Importing the NLTK library
import nltk
nltk.download('stopwords') # Loading the stopwords
nltk.download('punkt') # Loading the punkt module, used in Tokenization
nltk.download('omw-1.4') # Open Multilingual WordNet data, a dependency of the WordNet lemmatizer
nltk.download('wordnet') # Loading the wordnet module, used in lemmatization
# downloading vader lexicon
nltk.download('vader_lexicon')
from nltk.corpus import stopwords
# Helps to visualize the wordcloud
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
# Used in Stemming
from nltk.stem.porter import PorterStemmer
# Used in Lemmatization
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import CountVectorizer #For Bag of words
from sklearn.feature_extraction.text import TfidfVectorizer #For TF-IDF
# Helps to create train and test data
from sklearn.model_selection import train_test_split
# Importing the Random Forest model
from sklearn.ensemble import RandomForestClassifier
# Metrics to evaluate the model
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
# Unsupervised learning models
# Install vader sentiment package
!pip install vaderSentiment
# Install textblob package
!pip install textblob
# Loading data into pandas dataframe
reviews = pd.read_csv("imdb_10K_sentimnets_reviews.csv")
# Creating the copy of the data frame
data = reviews.copy()
# View the first and last 5 rows of the dataset
data.head(5)
| | review | sentiment |
---|---|---|
0 | Okay, I know this does’nt project India in a good light. But the overall theme of the movie is not India, it’s Shakti. The power of a warlord, and the power of a mother. The relationship between Nandini and her husband and son swallow you up in their warmth. Then things go terribly wrong. The interaction between Nandini and her father in law – the power of their dysfunctional relationship – and the lives changed by it are the strengths of this movie. Shah Rukh Khan’s performance seems to be a mere cameo compared to the believable desperation of Karisma Kapoor. It is easy to get caught up in the love, violence and redemption of lives in this film, and find yourself heaving a sigh of relief and sadness at the climax. The musical interludes are strengths, believable and well done. | 1 |
1 | Despite John Travolta’s statements in interviews that this was his favorite role of his career, “Be Cool” proves to be a disappointing sequel to 1995’s witty and clever “Get Shorty.”<br /><br />Travolta delivers a pleasant enough performance in this mildly entertaining film, but ultimately the movie falls flat due to an underdeveloped plot, unlikeable characters, and a surprising lack of chemistry between leads Travolta and Uma Thurman. Although there are some laughs, this unfunny dialog example (which appeared frequently in the trailers) kind of says it all: Thurman: Do you dance? Travolta: Hey, I’m from Brooklyn.<br /><br />The film suggests that everyone in the entertainment business is a gangster or aspires to be one, likening it to organized crime. In “Get Shorty,” the premise of a gangster “going legitimate” by getting into movies was a clever fish-out-of water idea, but in “Be Cool,” it seems the biz has entirely gone crooked since then.<br /><br />The film is interestingly casted and the absolute highlight is a “monolgue” delivered by The Rock, whose character is an aspiring actor as well as a goon, where he reenacts a scene between Gabrielle Union and Kirsten Dunst from “Bring It On.” Vince Vaughan’s character thinks he’s black and he’s often seen dressed as a pimp– this was quite funny in the first scene that introduces him and gets tired and embarrassing almost immediately afterward.<br /><br />Overall, “Be Cool” may be worth a rental for John Travolta die-hards (of which I am one), but you may want to keep your finger close to the fast forward button to get through it without feeling that you wasted too much time. Fans of “Get Shorty” may actually wish to avoid this, as the sequel is devoid of most things that made that one a winner. I rate this movie an admittedly harsh 4/10. | 0 |
2 | I am a kung fu fan, but not a Woo fan. I have no interest in gangster movies filled with over-the-top gun-play. Now, martial arts; *that’s* beautiful! And John Woo surprised me here by producing a highly entertaining kung fu movie, which almost has *too much* fighting, if such a thing is possible! This is good stuff.<br /><br />Many of the fight scenes are very good (and some of them are less good), and the main characters are amusing and likable. The bad guys are a bit too unbelievably evil, but entertaining none the less. You gotta see the Sleeping Wizard!! He can only fight when he’s asleep – it’s hysterical!<br /><br />Upon repeated viewings, however, Last Hurrah For Chivalry can tend to get a little boring and long-winded, also especially because many of the fight scenes are actually not that good. Hence, I rate it “only” a 7 out of 10. But it really is almost an “8”.<br /><br />All in all one of the better kung fu movies, made smack-dab in the heart of kung fu cinema’s prime. All the really good kung fu movies are from the mid- to late 1970ies, with some notable exceptions from the late ’60ies and early ’70ies (and early ’80ies, to be fair). | 1 |
3 | He seems to be a control freak. I have heard him comment on “losing control of the show” and tell another guest who brought live animals that he had one rule-“no snakes.” He needs to hire a comedy writer because his jokes are lame. The only reason I watch him is because he some some great guests and bands. <br /><br />I watched the Craig Ferguson show for a while but his show is even worse. He likes to bull sh** to burn time.I don’t think either man has much of a future in late night talk shows.<br /><br />Daily also has the annoying habit of sticking his tongue out to lick his lips. He must do this at least 10 times a show. I do like the Joe Firstman band. Carson Daily needs to lighten up before it is too late. | 0 |
4 | Admittedly, there are some scenes in this movie that seem a little unrealistic. The ravishing woman first panics and then, only a few minutes later, she starts kissing the young lad while the old guy is right next to her. But as the film goes along we learn that she is a little volatile girl (or slut) and that partly explains her behavior. The cinematography of this movie is well done. We get to see the elevator from almost every angle and perspective, and some of those images and scenes really raise the tension. Götz George plays his character well, a wannabe hot-shot getting old and being overpowered by young men like the Jaennicke character. Wolfgang Kieling who I admired in Hitchcock’s THE TORN CURTAIN delivers a great performance that, although he doesn’t say much, he is by far the best actor in this play. One critic complained about how unrealistic the film was and that in a real case of emergency nothing would really happen. But then again, how realistic are films such as Mission impossible or Phone Booth. Given the fact that we are talking about a movie here, and that in a movie you always have to deal with some scenes that aren’t very likely to occur in real life, you can still enjoy this movie. It’s a lot better than many things that I see on German TV these days and I think that the vintage 80’s style added something to this film. | 1 |
data.tail()
| | review | sentiment |
---|---|---|
9995 | A masterpiece.<br /><br />Thus it is, possibly, not for everyone.<br /><br />The camera work, acting, directing and everything else is unique, original, superb in every way – and very different from the trash we are sadly used to getting.<br /><br />Summer Phoenix creates a deep, believable and intriguing Esther Kahn. As everything else in this film, her acting is unique – it is completely her own – neither “British” nor “American” nor anything else I have ever seen. There is something mesmerizing about it.<br /><br />The lengthy, unbroken, natural shots are wonderful, reminding us that we have become too accustomed to a few restricted ways of shooting and editing. | 1 |
9996 | Great movie about a great man. Thomas Kretschmann is first rate as in all of his other movies.I would never have envisioned him as Pope John Paul. It speaks volumes for the casting director. Why do they keep casting him as German officer in the movies? And he only came to universal attention after “the pianist”? Of course he looks so hot in the uniforms. I know a lot of girls drool over his handsome face. But this guy is a great actor and has such great potentials. If you don’t believe me, go watch “Stalingrad”. I hope he will get a lot of excellent roles in the future with more diversity. Otherwise, what a heartbreaking waste of great talent. | 1 |
9997 | Before we start, may I say I hope you’ve already eaten when you’re reading this. Why? Because, after I’d seen this film for the first time, the bird’s look and sound made me want to eat chicken after the words ‘The End’ had appeared on the screen. So don’t say you weren’t warned.<br /><br />Fred Sears might have directed “Earth vs. the Flying Saucers” (an okay film and one of the bigger examples for Tim Burton’s “Mars Attacks”), but “The Giant Claw” is not that giant a film. Yes, it’s a prehistoric monster that flies in the air, attacks planes and cities and occasionally treats itself to a man on a parachute. The beast is giant except in the scenes where it’s considerably smaller, but who needs consistent proportions in a movie? Scary? It could have been, but not if the plot is hopelessly silly and the monster looks like like a puppet that ran away from Sesame Street. | 0 |
9998 | I was so disappointed by this show. After hearing and reading all the hoopla about it, how it was a “ground breaking show” and all sorts of wild promises if quality, I tried to watch it.<br /><br />What a letdown!! The acting was way forced and exaggerated. The story made very little sense. As for any hint of the vaunted “look into teenagers’ lives”, I could only see a paltry attempt that had as much reality to it as a reality show.<br /><br />Some are wondering why there are so many negative comments about this show. The reason is that it’s really not all that good and beating the drums over quality on this show only serves to attract attention to how poorly made it is. | 0 |
9999 | The 3-D featured in “The Man Who Wasn’t There” stands for DUMB, DUMB, DUMB! This inept comedy features lousy 3-D effects that makes the 3-D effects in “Jaws 3”, “Amityville 3”, and “Friday the 13th Part 3” look better by comparison. Not to mention the movie is asinine to the extreme. This was one of many 1983 movies to feature the pop-off-the-screen effects. Steve Guttenberg and Jeffrey Tambor got trapped in this mess, but at least it didn’t kill their careers. Tambor would go on to star on HBO’s “The Larry Sanders Show” and Ron Howard’s box office smash “How the Grinch Stole Christmas”, while Guttenberg followed this flop with “Police Academy” and “Cocoon”. What them in those projects instead of them here in “The Man Who Wasn’t There”. If you do, you’ll regret it.<br /><br />1/2* (out of four) | 0 |
# Understand the shape of the dataset
data.shape
(10000, 2)
# Check the data types of the columns for the dataset
data.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 10000 entries, 0 to 9999 Data columns (total 2 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 review 10000 non-null object 1 sentiment 10000 non-null int64 dtypes: int64(1), object(1) memory usage: 156.4+ KB
# checking for duplicate values
data.duplicated().sum()
18
# dropping the duplicates
data = data.drop_duplicates(keep = 'first')
# checking for duplicate values
data.duplicated().sum()
0
# resetting the index of the dataframe
data = data.reset_index(drop = True)
# Creating word cloud for negative reviews
negative_reviews = data[data['sentiment'] == 0]
words = ' '.join(negative_reviews['review'])
cleaned_word = " ".join([word for word in words.split()])
wordcloud = WordCloud(stopwords = STOPWORDS,
                      background_color = 'black',
                      width = 3000,
                      height = 2500
                      ).generate(cleaned_word)
plt.figure(1, figsize = (12, 12))
plt.imshow(wordcloud)
plt.axis('off')
plt.show()
# Creating word cloud for positive reviews
positive_reviews = data[data['sentiment'] == 1]
words = ' '.join(positive_reviews['review'])
cleaned_word = " ".join([word for word in words.split()])
wordcloud = WordCloud(stopwords = STOPWORDS,
                      background_color = 'black',
                      width = 3000,
                      height = 2500
                      ).generate(cleaned_word)
plt.figure(1, figsize = (12, 12))
plt.imshow(wordcloud)
plt.axis('off')
plt.show()
# Check the labels
data['sentiment'].unique()
array([1, 0], dtype=int64)
# check the count of each label
data['sentiment'].value_counts()
1 5033 0 4949 Name: sentiment, dtype: int64
# Plot the distribution of the class label
def bar_plot(data, feature):
    # Creating the countplot
    plot = sns.countplot(x = feature, data = data)
    # Finding the total number of rows in the data
    total = len(data)
    # Annotating each bar with its percentage of the data
    for p in plot.patches:
        percentage = '{:.1f}%'.format(100 * p.get_height() / total)
        x = p.get_x() + p.get_width() / 2 - 0.05
        y = p.get_y() + p.get_height()
        plot.annotate(percentage, (x, y), ha = "center",
                      va = "center",
                      size = 12,
                      xytext = (0, 5),
                      textcoords = "offset points")
    plt.show()

bar_plot(data,'sentiment')
Before building the models, we need to clean the text to get better model performance.
# Creating the lemmatizer
lm = WordNetLemmatizer()
# Building the stopword set once, so that it is not rebuilt for every word
stop_words = set(stopwords.words('english'))
final_corpus = []
for i in range(data.shape[0]):
    # replacing special characters and numbers with spaces
    review = re.sub('[^a-zA-Z]', ' ', data['review'][i])
    # lowercasing the text
    review = review.lower()
    review = review.split()
    # transliterating accented characters to ASCII (this has no effect here,
    # because the regex above has already replaced them with spaces)
    review = [unidecode.unidecode(word) for word in review]
    # removing the stopwords and lemmatizing each remaining word
    review = [lm.lemmatize(word) for word in review if word not in stop_words]
    # joining the words back into a single string
    review = ' '.join(review)
    # appending the result to the list final_corpus
    final_corpus.append(review)
# let's have look at the cleaned text
final_corpus[0]
'okay know nt project india good light overall theme movie india shakti power warlord power mother relationship nandini husband son swallow warmth thing go terribly wrong interaction nandini father law power dysfunctional relationship life changed strength movie shah rukh khan performance seems mere cameo compared believable desperation karisma kapoor easy get caught love violence redemption life film find heaving sigh relief sadness climax musical interlude strength believable well done'
final_corpus[1]
'despite john travolta statement interview favorite role career cool prof disappointing sequel witty clever get shorty br br travolta delivers pleasant enough performance mildly entertaining film ultimately movie fall flat due underdeveloped plot unlikeable character surprising lack chemistry lead travolta uma thurman although laugh unfunny dialog example appeared frequently trailer kind say thurman dance travolta hey brooklyn br br film suggests everyone entertainment business gangster aspires one likening organized crime get shorty premise gangster going legitimate getting movie clever fish water idea cool seems biz entirely gone crooked since br br film interestingly casted absolute highlight monolgue delivered rock whose character aspiring actor well goon reenacts scene gabrielle union kirsten dunst bring vince vaughan character think black often seen dressed pimp quite funny first scene introduces get tired embarrassing almost immediately afterward br br overall cool may worth rental john travolta die hards one may want keep finger close fast forward button get without feeling wasted much time fan get shorty may actually wish avoid sequel devoid thing made one winner rate movie admittedly harsh'
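The cleaned text above still contains `br` tokens, which are remnants of the `<br />` HTML tags in the raw reviews. One way to avoid this is sketched below, assuming the tags are stripped before the non-letter filter and accented letters are transliterated first. The function name `basic_clean` is hypothetical, and the standard-library `unicodedata` module stands in for `unidecode` so that the sketch has no extra dependencies. Stopword removal and lemmatization are left out.

```python
import re
import unicodedata

def basic_clean(text):
    """Hypothetical helper: strip HTML tags and accents, keep lowercase letters only."""
    # Replace HTML tags such as <br /> with a space
    text = re.sub(r'<[^>]+>', ' ', text)
    # Decompose accented characters and drop the combining marks (e.g. "ö" becomes "o")
    text = unicodedata.normalize('NFKD', text).encode('ascii', 'ignore').decode('ascii')
    # Replace everything that is not a letter with a space
    text = re.sub('[^a-zA-Z]', ' ', text)
    # Lowercase and collapse repeated whitespace
    return ' '.join(text.lower().split())

print(basic_clean("Götz George<br /><br />plays his character well."))
# gotz george plays his character well
```

Applying a step like this before the cleaning loop would change the cleaned corpus, so the outputs shown in this notebook would differ slightly.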
# saving the cleaned text back to review column
data['review'] = final_corpus
data.head(5)
| | review | sentiment |
---|---|---|
0 | okay know nt project india good light overall theme movie india shakti power warlord power mother relationship nandini husband son swallow warmth thing go terribly wrong interaction nandini father law power dysfunctional relationship life changed strength movie shah rukh khan performance seems mere cameo compared believable desperation karisma kapoor easy get caught love violence redemption life film find heaving sigh relief sadness climax musical interlude strength believable well done | 1 |
1 | despite john travolta statement interview favorite role career cool prof disappointing sequel witty clever get shorty br br travolta delivers pleasant enough performance mildly entertaining film ultimately movie fall flat due underdeveloped plot unlikeable character surprising lack chemistry lead travolta uma thurman although laugh unfunny dialog example appeared frequently trailer kind say thurman dance travolta hey brooklyn br br film suggests everyone entertainment business gangster aspires one likening organized crime get shorty premise gangster going legitimate getting movie clever fish water idea cool seems biz entirely gone crooked since br br film interestingly casted absolute highlight monolgue delivered rock whose character aspiring actor well goon reenacts scene gabrielle union kirsten dunst bring vince vaughan character think black often seen dressed pimp quite funny first scene introduces get tired embarrassing almost immediately afterward br br overall cool may worth rental john travolta die hards one may want keep finger close fast forward button get without feeling wasted much time fan get shorty may actually wish avoid sequel devoid thing made one winner rate movie admittedly harsh | 0 |
2 | kung fu fan woo fan interest gangster movie filled top gun play martial art beautiful john woo surprised producing highly entertaining kung fu movie almost much fighting thing possible good stuff br br many fight scene good le good main character amusing likable bad guy bit unbelievably evil entertaining none le gotta see sleeping wizard fight asleep hysterical br br upon repeated viewing however last hurrah chivalry tend get little boring long winded also especially many fight scene actually good hence rate really almost br br one better kung fu movie made smack dab heart kung fu cinema prime really good kung fu movie mid late y notable exception late y early y early y fair | 1 |
3 | seems control freak heard comment losing control show tell another guest brought live animal one rule snake need hire comedy writer joke lame reason watch great guest band br br watched craig ferguson show show even worse like bull sh burn time think either man much future late night talk show br br daily also annoying habit sticking tongue lick lip must least time show like joe firstman band carson daily need lighten late | 0 |
4 | admittedly scene movie seem little unrealistic ravishing woman first panic minute later start kissing young lad old guy right next film go along learn little volatile girl slut partly explains behavior cinematography movie well done get see elevator almost every angle perspective image scene really raise tension g tz george play character well wannabe hot shot getting old overpowered young men like jaennicke character wolfgang kieling admired hitchcock torn curtain delivers great performance although say much far best actor play one critic complained unrealistic film real case emergency nothing would really happen realistic film mission impossible phone booth given fact talking movie movie always deal scene likely occur real life still enjoy movie lot better many thing see german tv day think vintage style added something film | 1 |
We successfully cleaned the raw text and saved it back to the review column. Now let's build the models.
In Bag of Words (BoW), we construct a dictionary containing all unique words in the review dataset and count how often each word occurs in each review. If the dictionary has d unique words, every review is represented by a vector of length d, where each position stores the count of the corresponding word in that review. Because most reviews use only a small fraction of the dictionary, these vectors are highly sparse.
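The idea can be illustrated on a toy corpus. The following is a minimal pure-Python sketch, not the implementation used by `CountVectorizer`; the three toy reviews and the helper name `bow_vector` are made up for illustration.

```python
from collections import Counter

toy_reviews = [
    "good movie good plot",
    "bad movie",
    "good acting bad plot",
]

# Dictionary of all unique words, sorted so that each word has a fixed position
vocabulary = sorted({word for review in toy_reviews for word in review.split()})

def bow_vector(review, vocabulary):
    """Return the count of every vocabulary word in one review."""
    counts = Counter(review.split())
    return [counts[word] for word in vocabulary]

bow_matrix = [bow_vector(review, vocabulary) for review in toy_reviews]

print(vocabulary)  # ['acting', 'bad', 'good', 'movie', 'plot']
for row in bow_matrix:
    print(row)
# [0, 0, 2, 1, 1]
# [0, 1, 0, 1, 0]
# [1, 1, 1, 0, 1]
```

Here d = 5, and each row already contains zeros. With a real vocabulary of thousands of words, almost every entry in a row is zero.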
# Vectorization (Convert text data to numbers).
from sklearn.feature_extraction.text import CountVectorizer
# Keep only the 1,000 most frequent terms, since more features increase the processing time.
Count_vec = CountVectorizer(max_features = 1000)
data_features = Count_vec.fit_transform(data['review'])
# Convert the data features to array
data_features = data_features.toarray()
# Shape of the feature vector
data_features.shape
(9982, 1000)
X = data_features
y = data.sentiment
# Function to print the classification report and plot the confusion matrix as a heatmap
def metrics_score(actual, predicted):
    print(classification_report(actual, predicted))
    cm = confusion_matrix(actual, predicted)
    plt.figure(figsize = (8, 5))
    # fmt = 'd' because the confusion matrix holds integer counts
    sns.heatmap(cm, annot = True, fmt = 'd', xticklabels = ['negative', 'positive'], yticklabels = ['negative', 'positive'])
    plt.ylabel('Actual')
    plt.xlabel('Predicted')
    plt.show()
# Split data into training and testing set.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, shuffle = False)
# Training a Random Forest classifier and evaluating it on the test data
clf = RandomForestClassifier(n_estimators = 100)
clf.fit(X_train, y_train)
y_pred_test = clf.predict(X_test)
metrics_score(y_test, y_pred_test)
precision recall f1-score support 0 0.81 0.83 0.82 1218 1 0.83 0.82 0.83 1278 accuracy 0.82 2496 macro avg 0.82 0.82 0.82 2496 weighted avg 0.82 0.82 0.82 2496
import numpy as np  # needed for np.argsort below

def get_top40_words(model, all_features):
    # Collecting the 40 most important features of the trained model into a comma-separated string
    top_features = ''
    feat = model.feature_importances_
    # Feature indices sorted by decreasing importance
    features = np.argsort(feat)[::-1]
    for i in features[0:40]:
        top_features += all_features[i]
        top_features += ','
    from wordcloud import WordCloud
    wordcloud = WordCloud(background_color = "white", colormap = 'viridis', width = 2000,
                          height = 1000).generate(top_features)
    # Creating the figure before drawing, so that the size applies to the displayed image
    plt.figure(1, figsize = (14, 11))
    # Display the generated image:
    plt.imshow(wordcloud, interpolation = 'bilinear')
    plt.title('Top 40 features WordCloud', fontsize = 10)
    plt.axis("off")
    plt.show()
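# Get the feature names from the vectorizer
# (get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() is available from 1.0)
features = Count_vec.get_feature_names_out()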
get_top40_words(clf,features)
Term Frequency - Inverse Document Frequency (TF-IDF) weights each word by how often it occurs in a review and down-weights words that occur in many reviews. Very common words therefore get less importance, while rarer, more distinctive words get more.
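The weighting can be sketched on a toy corpus. The pure-Python example below is an illustration, not the library's implementation. It assumes the smoothed form `idf(t) = ln((1 + n) / (1 + df(t))) + 1` followed by L2 normalisation of each row, which is the behaviour scikit-learn documents for `TfidfVectorizer` with its default settings. The toy reviews and the helper name `tfidf_vector` are made up.

```python
import math
from collections import Counter

toy_reviews = [
    "good movie good plot",
    "bad movie",
    "good acting bad plot",
]
tokenized = [review.split() for review in toy_reviews]
vocabulary = sorted({word for tokens in tokenized for word in tokens})
n_docs = len(tokenized)

# Document frequency: number of reviews that contain each word
doc_freq = {word: sum(word in tokens for tokens in tokenized) for word in vocabulary}
# Smoothed inverse document frequency (assumed formula, see the text above)
idf = {word: math.log((1 + n_docs) / (1 + doc_freq[word])) + 1 for word in vocabulary}

def tfidf_vector(tokens):
    """Weight raw term counts by IDF, then scale the vector to unit L2 length."""
    counts = Counter(tokens)
    weights = [counts[word] * idf[word] for word in vocabulary]
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights]

tfidf_matrix = [tfidf_vector(tokens) for tokens in tokenized]

for word in vocabulary:
    print(f"{word}: df = {doc_freq[word]}, idf = {idf[word]:.4f}")
```

In this toy corpus "acting" appears in one review and every other word in two, so "acting" receives the largest IDF and, within the third review, a larger weight than "bad", "good", or "plot".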
# Using TfidfVectorizer to convert text data to numbers.
from sklearn.feature_extraction.text import TfidfVectorizer
tfidf_vect = TfidfVectorizer(max_features = 1000)
data_features = tfidf_vect.fit_transform(data['review'])
data_features = data_features.toarray()
# Feature shape
data_features.shape
(9982, 1000)
X = data_features
y = data.sentiment
# Split data into training and testing set.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, shuffle=False)
# Training a Random Forest classifier and evaluating it on the test data
clf1 = RandomForestClassifier(n_estimators = 100)
clf1.fit(X_train, y_train)
y_pred_test1 = clf1.predict(X_test)
metrics_score(y_test, y_pred_test1)
precision recall f1-score support 0 0.80 0.84 0.82 1218 1 0.84 0.80 0.82 1278 accuracy 0.82 2496 macro avg 0.82 0.82 0.82 2496 weighted avg 0.82 0.82 0.82 2496
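# Get the feature names from the vectorizer
# (get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() is available from 1.0)
features = tfidf_vect.get_feature_names_out()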
get_top40_words(clf1,features)
#convert the test samples into a dataframe where the columns are
#the y_test(ground truth labels),tf-idf model predicted labels(tf_idf_predicted),
#Count Vectorizer model predicted labels(count_vectorizer_predicted)
df = pd.DataFrame(y_test.tolist(), columns = ['y_test'])
df['count_vectorizer_predicted'] = y_pred_test
df['tf_idf_predicted'] = y_pred_test1
df.head()
| | y_test | count_vectorizer_predicted | tf_idf_predicted |
---|---|---|---|
0 | 1 | 1 | 1 |
1 | 1 | 1 | 0 |
2 | 1 | 1 | 1 |
3 | 0 | 1 | 0 |
4 | 1 | 1 | 1 |
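A table like this makes it easy to measure how often each model matches the ground truth and how often the two models agree with each other. The sketch below uses only the five rows displayed above, so the numbers say nothing about the full test set; the helper name `match_rate` is hypothetical.

```python
# First five test rows, copied from the table above
y_true     = [1, 1, 1, 0, 1]
bow_pred   = [1, 1, 1, 1, 1]
tfidf_pred = [1, 0, 1, 0, 1]

def match_rate(a, b):
    """Fraction of positions where the two label lists agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

print(match_rate(y_true, bow_pred))      # 0.8
print(match_rate(y_true, tfidf_pred))    # 0.8
print(match_rate(bow_pred, tfidf_pred))  # 0.6
```

On the full dataframe the same quantities can be computed with pandas, for example `(df['y_test'] == df['tf_idf_predicted']).mean()`.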
VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool. It reports not only whether a text is positive or negative, but also how strongly, through its negative, neutral, positive, and compound scores.
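To compare VADER with the labelled sentiment, the compound score has to be turned into a 0/1 label. The sketch below shows one hypothetical way to do that: the function name `compound_to_label` and the cut-off of 0 are assumptions, not part of VADER. The VADER authors suggest thresholds of ±0.05 when a neutral class is wanted, but this dataset has only two classes. The compound values are made up for illustration.

```python
def compound_to_label(compound, threshold=0.0):
    """Hypothetical helper: map a VADER compound score in [-1, 1] to 1 (positive) or 0 (negative)."""
    return 1 if compound >= threshold else 0

example_compounds = [0.84, -0.08, 0.0, -0.65]  # illustrative values only
print([compound_to_label(c) for c in example_compounds])
# [1, 0, 1, 0]
```

Given a dataframe with a column of compound scores, the helper could be applied with `.apply(compound_to_label)` on that column, and the result compared with the `sentiment` column.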
from nltk.sentiment.vader import SentimentIntensityAnalyzer
sentiment = SentimentIntensityAnalyzer()
data_new = data.copy() # make a copy of dataframe and do unsupervised operations on that dataframe
# Calculate the polarity score of the reviews
data_new['scores'] = data_new['review'].apply(lambda text: sentiment.polarity_scores(text))
data_new.head()
| | review | sentiment | scores |
---|---|---|---|
0 | okay know nt project india good light overall theme movie india shakti power warlord power mother relationship nandini husband son swallow warmth thing go terribly wrong interaction nandini father law power dysfunctional relationship life changed strength movie shah rukh khan performance seems mere cameo compared believable desperation karisma kapoor easy get caught love violence redemption life film find heaving sigh relief sadness climax musical interlude strength believable well done | 1 | {‘neg’: 0.17, ‘neu’: 0.549, ‘pos’: 0.281, ‘compound’: 0.836} |
1 | despite john travolta statement interview favorite role career cool prof disappointing sequel witty clever get shorty br br travolta delivers pleasant enough performance mildly entertaining film ultimately movie fall flat due underdeveloped plot unlikeable character surprising lack chemistry lead travolta uma thurman although laugh unfunny dialog example appeared frequently trailer kind say thurman dance travolta hey brooklyn br br film suggests everyone entertainment business gangster aspires one likening organized crime get shorty premise gangster going legitimate getting movie clever fish water idea cool seems biz entirely gone crooked since br br film interestingly casted absolute highlight monolgue delivered rock whose character aspiring actor well goon reenacts scene gabrielle union kirsten dunst bring vince vaughan character think black often seen dressed pimp quite funny first scene introduces get tired embarrassing almost immediately afterward br br overall cool may worth rental john travolta die hards one may want keep finger close fast forward button get without feeling wasted much time fan get shorty may actually wish avoid sequel devoid thing made one winner rate movie admittedly harsh | 0 | {‘neg’: 0.109, ‘neu’: 0.632, ‘pos’: 0.259, ‘compound’: 0.9836} |
2 | kung fu fan woo fan interest gangster movie filled top gun play martial art beautiful john woo surprised producing highly entertaining kung fu movie almost much fighting thing possible good stuff br br many fight scene good le good main character amusing likable bad guy bit unbelievably evil entertaining none le gotta see sleeping wizard fight asleep hysterical br br upon repeated viewing however last hurrah chivalry tend get little boring long winded also especially many fight scene actually good hence rate really almost br br one better kung fu movie made smack dab heart kung fu cinema prime really good kung fu movie mid late y notable exception late y early y early y fair | 1 | {‘neg’: 0.26, ‘neu’: 0.43, ‘pos’: 0.309, ‘compound’: 0.5946} |
3 | seems control freak heard comment losing control show tell another guest brought live animal one rule snake need hire comedy writer joke lame reason watch great guest band br br watched craig ferguson show show even worse like bull sh burn time think either man much future late night talk show br br daily also annoying habit sticking tongue lick lip must least time show like joe firstman band carson daily need lighten late | 0 | {‘neg’: 0.153, ‘neu’: 0.696, ‘pos’: 0.15, ‘compound’: -0.0772} |
4 | admittedly scene movie seem little unrealistic ravishing woman first panic minute later start kissing young lad old guy right next film go along learn little volatile girl slut partly explains behavior cinematography movie well done get see elevator almost every angle perspective image scene really raise tension g tz george play character well wannabe hot shot getting old overpowered young men like jaennicke character wolfgang kieling admired hitchcock torn curtain delivers great performance although say much far best actor play one critic complained unrealistic film real case emergency nothing would really happen realistic film mission impossible phone booth given fact talking movie movie always deal scene likely occur real life still enjoy movie lot better many thing see german tv day think vintage style added something film | 1 | {‘neg’: 0.118, ‘neu’: 0.676, ‘pos’: 0.206, ‘compound’: 0.9335} |
# Calculate the compound score of the reviews
data_new['compound'] = data_new['scores'].apply(lambda score_dict: score_dict['compound'])
data_new.head()
| | review | sentiment | scores | compound |
---|---|---|---|---|
0 | okay know nt project india good light overall theme movie india shakti power warlord power mother relationship nandini husband son swallow warmth thing go terribly wrong interaction nandini father law power dysfunctional relationship life changed strength movie shah rukh khan performance seems mere cameo compared believable desperation karisma kapoor easy get caught love violence redemption life film find heaving sigh relief sadness climax musical interlude strength believable well done | 1 | {‘neg’: 0.17, ‘neu’: 0.549, ‘pos’: 0.281, ‘compound’: 0.836} | 0.8360 |
1 | despite john travolta statement interview favorite role career cool prof disappointing sequel witty clever get shorty br br travolta delivers pleasant enough performance mildly entertaining film ultimately movie fall flat due underdeveloped plot unlikeable character surprising lack chemistry lead travolta uma thurman although laugh unfunny dialog example appeared frequently trailer kind say thurman dance travolta hey brooklyn br br film suggests everyone entertainment business gangster aspires one likening organized crime get shorty premise gangster going legitimate getting movie clever fish water idea cool seems biz entirely gone crooked since br br film interestingly casted absolute highlight monolgue delivered rock whose character aspiring actor well goon reenacts scene gabrielle union kirsten dunst bring vince vaughan character think black often seen dressed pimp quite funny first scene introduces get tired embarrassing almost immediately afterward br br overall cool may worth rental john travolta die hards one may want keep finger close fast forward button get without feeling wasted much time fan get shorty may actually wish avoid sequel devoid thing made one winner rate movie admittedly harsh | 0 | {‘neg’: 0.109, ‘neu’: 0.632, ‘pos’: 0.259, ‘compound’: 0.9836} | 0.9836 |
2 | kung fu fan woo fan interest gangster movie filled top gun play martial art beautiful john woo surprised producing highly entertaining kung fu movie almost much fighting thing possible good stuff br br many fight scene good le good main character amusing likable bad guy bit unbelievably evil entertaining none le gotta see sleeping wizard fight asleep hysterical br br upon repeated viewing however last hurrah chivalry tend get little boring long winded also especially many fight scene actually good hence rate really almost br br one better kung fu movie made smack dab heart kung fu cinema prime really good kung fu movie mid late y notable exception late y early y early y fair | 1 | {‘neg’: 0.26, ‘neu’: 0.43, ‘pos’: 0.309, ‘compound’: 0.5946} | 0.5946 |
3 | seems control freak heard comment losing control show tell another guest brought live animal one rule snake need hire comedy writer joke lame reason watch great guest band br br watched craig ferguson show show even worse like bull sh burn time think either man much future late night talk show br br daily also annoying habit sticking tongue lick lip must least time show like joe firstman band carson daily need lighten late | 0 | {‘neg’: 0.153, ‘neu’: 0.696, ‘pos’: 0.15, ‘compound’: -0.0772} | -0.0772 |
4 | admittedly scene movie seem little unrealistic ravishing woman first panic minute later start kissing young lad old guy right next film go along learn little volatile girl slut partly explains behavior cinematography movie well done get see elevator almost every angle perspective image scene really raise tension g tz george play character well wannabe hot shot getting old overpowered young men like jaennicke character wolfgang kieling admired hitchcock torn curtain delivers great performance although say much far best actor play one critic complained unrealistic film real case emergency nothing would really happen realistic film mission impossible phone booth given fact talking movie movie always deal scene likely occur real life still enjoy movie lot better many thing see german tv day think vintage style added something film | 1 | {‘neg’: 0.118, ‘neu’: 0.676, ‘pos’: 0.206, ‘compound’: 0.9335} | 0.9335 |
# Classify each review by thresholding the compound score (>= 0 is positive, otherwise negative)
data_new['comp_score'] = data_new['compound'].apply(lambda c: '1' if c >= 0 else '0')
data_new.head()
| | review | sentiment | scores | compound | comp_score |
---|---|---|---|---|---|
0 | okay know nt project india good light overall theme movie india shakti power warlord power mother relationship nandini husband son swallow warmth thing go terribly wrong interaction nandini father law power dysfunctional relationship life changed strength movie shah rukh khan performance seems mere cameo compared believable desperation karisma kapoor easy get caught love violence redemption life film find heaving sigh relief sadness climax musical interlude strength believable well done | 1 | {‘neg’: 0.17, ‘neu’: 0.549, ‘pos’: 0.281, ‘compound’: 0.836} | 0.8360 | 1 |
1 | despite john travolta statement interview favorite role career cool prof disappointing sequel witty clever get shorty br br travolta delivers pleasant enough performance mildly entertaining film ultimately movie fall flat due underdeveloped plot unlikeable character surprising lack chemistry lead travolta uma thurman although laugh unfunny dialog example appeared frequently trailer kind say thurman dance travolta hey brooklyn br br film suggests everyone entertainment business gangster aspires one likening organized crime get shorty premise gangster going legitimate getting movie clever fish water idea cool seems biz entirely gone crooked since br br film interestingly casted absolute highlight monolgue delivered rock whose character aspiring actor well goon reenacts scene gabrielle union kirsten dunst bring vince vaughan character think black often seen dressed pimp quite funny first scene introduces get tired embarrassing almost immediately afterward br br overall cool may worth rental john travolta die hards one may want keep finger close fast forward button get without feeling wasted much time fan get shorty may actually wish avoid sequel devoid thing made one winner rate movie admittedly harsh | 0 | {‘neg’: 0.109, ‘neu’: 0.632, ‘pos’: 0.259, ‘compound’: 0.9836} | 0.9836 | 1 |
2 | kung fu fan woo fan interest gangster movie filled top gun play martial art beautiful john woo surprised producing highly entertaining kung fu movie almost much fighting thing possible good stuff br br many fight scene good le good main character amusing likable bad guy bit unbelievably evil entertaining none le gotta see sleeping wizard fight asleep hysterical br br upon repeated viewing however last hurrah chivalry tend get little boring long winded also especially many fight scene actually good hence rate really almost br br one better kung fu movie made smack dab heart kung fu cinema prime really good kung fu movie mid late y notable exception late y early y early y fair | 1 | {‘neg’: 0.26, ‘neu’: 0.43, ‘pos’: 0.309, ‘compound’: 0.5946} | 0.5946 | 1 |
3 | seems control freak heard comment losing control show tell another guest brought live animal one rule snake need hire comedy writer joke lame reason watch great guest band br br watched craig ferguson show show even worse like bull sh burn time think either man much future late night talk show br br daily also annoying habit sticking tongue lick lip must least time show like joe firstman band carson daily need lighten late | 0 | {‘neg’: 0.153, ‘neu’: 0.696, ‘pos’: 0.15, ‘compound’: -0.0772} | -0.0772 | 0 |
4 | admittedly scene movie seem little unrealistic ravishing woman first panic minute later start kissing young lad old guy right next film go along learn little volatile girl slut partly explains behavior cinematography movie well done get see elevator almost every angle perspective image scene really raise tension g tz george play character well wannabe hot shot getting old overpowered young men like jaennicke character wolfgang kieling admired hitchcock torn curtain delivers great performance although say much far best actor play one critic complained unrealistic film real case emergency nothing would really happen realistic film mission impossible phone booth given fact talking movie movie always deal scene likely occur real life still enjoy movie lot better many thing see german tv day think vintage style added something film | 1 | {‘neg’: 0.118, ‘neu’: 0.676, ‘pos’: 0.206, ‘compound’: 0.9335} | 0.9335 | 1 |
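The thresholding step above can be sketched in isolation. This is a minimal illustration, not the notebook's own code: the helper name `label_from_score` and the stricter cut-off of `0.6` are assumptions made for the example, while the five scores are the compound values shown in the table. Returning integers directly would also avoid the later `astype(int)` conversion.

```python
def label_from_score(score, threshold=0.0):
    """Return 1 (positive) if score >= threshold, else 0 (negative).

    Hypothetical helper; the notebook uses an inline lambda with threshold 0.
    """
    return 1 if score >= threshold else 0


# Compound scores of the five reviews displayed above
compound_scores = [0.8360, 0.9836, 0.5946, -0.0772, 0.9335]

# Same rule as the notebook: anything at or above 0 counts as positive
default_labels = [label_from_score(s) for s in compound_scores]

# A stricter cut-off, chosen only to show how the labels shift
strict_labels = [label_from_score(s, threshold=0.6) for s in compound_scores]

print(default_labels)
print(strict_labels)
```

With a threshold of 0, a score of exactly 0 is labelled positive, so the choice of cut-off is worth treating as a tunable parameter rather than a fixed rule.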
data["VADER_pred"] = data_new['comp_score'].tolist()
data.head()
| | review | sentiment | VADER_pred |
---|---|---|---|
0 | okay know nt project india good light overall theme movie india shakti power warlord power mother relationship nandini husband son swallow warmth thing go terribly wrong interaction nandini father law power dysfunctional relationship life changed strength movie shah rukh khan performance seems mere cameo compared believable desperation karisma kapoor easy get caught love violence redemption life film find heaving sigh relief sadness climax musical interlude strength believable well done | 1 | 1 |
1 | despite john travolta statement interview favorite role career cool prof disappointing sequel witty clever get shorty br br travolta delivers pleasant enough performance mildly entertaining film ultimately movie fall flat due underdeveloped plot unlikeable character surprising lack chemistry lead travolta uma thurman although laugh unfunny dialog example appeared frequently trailer kind say thurman dance travolta hey brooklyn br br film suggests everyone entertainment business gangster aspires one likening organized crime get shorty premise gangster going legitimate getting movie clever fish water idea cool seems biz entirely gone crooked since br br film interestingly casted absolute highlight monolgue delivered rock whose character aspiring actor well goon reenacts scene gabrielle union kirsten dunst bring vince vaughan character think black often seen dressed pimp quite funny first scene introduces get tired embarrassing almost immediately afterward br br overall cool may worth rental john travolta die hards one may want keep finger close fast forward button get without feeling wasted much time fan get shorty may actually wish avoid sequel devoid thing made one winner rate movie admittedly harsh | 0 | 1 |
2 | kung fu fan woo fan interest gangster movie filled top gun play martial art beautiful john woo surprised producing highly entertaining kung fu movie almost much fighting thing possible good stuff br br many fight scene good le good main character amusing likable bad guy bit unbelievably evil entertaining none le gotta see sleeping wizard fight asleep hysterical br br upon repeated viewing however last hurrah chivalry tend get little boring long winded also especially many fight scene actually good hence rate really almost br br one better kung fu movie made smack dab heart kung fu cinema prime really good kung fu movie mid late y notable exception late y early y early y fair | 1 | 1 |
3 | seems control freak heard comment losing control show tell another guest brought live animal one rule snake need hire comedy writer joke lame reason watch great guest band br br watched craig ferguson show show even worse like bull sh burn time think either man much future late night talk show br br daily also annoying habit sticking tongue lick lip must least time show like joe firstman band carson daily need lighten late | 0 | 0 |
4 | admittedly scene movie seem little unrealistic ravishing woman first panic minute later start kissing young lad old guy right next film go along learn little volatile girl slut partly explains behavior cinematography movie well done get see elevator almost every angle perspective image scene really raise tension g tz george play character well wannabe hot shot getting old overpowered young men like jaennicke character wolfgang kieling admired hitchcock torn curtain delivers great performance although say much far best actor play one critic complained unrealistic film real case emergency nothing would really happen realistic film mission impossible phone booth given fact talking movie movie always deal scene likely occur real life still enjoy movie lot better many thing see german tv day think vintage style added something film | 1 | 1 |
# Calculate the accuracy of the Vader Sentiment Analysis
data["sentiment"] = data["sentiment"].astype(int) #convert the sentiment column values into int data type
data["VADER_pred"] = data["VADER_pred"].astype(int) #convert the vader_predicted column values into int data type
metrics_score(data["sentiment"], data["VADER_pred"])
| | precision | recall | f1-score | support |
---|---|---|---|---|
0 | 0.78 | 0.47 | 0.59 | 4949 |
1 | 0.63 | 0.87 | 0.73 | 5033 |
accuracy | | | 0.67 | 9982 |
macro avg | 0.70 | 0.67 | 0.66 | 9982 |
weighted avg | 0.70 | 0.67 | 0.66 | 9982 |
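To make the columns of the report concrete, here is a minimal sketch of how precision, recall and F1 are derived for one class. It is not the `metrics_score` helper used above; `per_class_metrics` is a hypothetical name, and the inputs are only the five `sentiment` / `VADER_pred` pairs displayed in the table, so the numbers differ from the full-dataset report.

```python
def per_class_metrics(y_true, y_pred, label):
    """Precision, recall and F1 for a single class label (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # correctly predicted as label
    fp = sum(1 for t, p in pairs if t != label and p == label)  # wrongly predicted as label
    fn = sum(1 for t, p in pairs if t == label and p != label)  # missed instances of label
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# The five rows shown above: true sentiment and VADER prediction
y_true = [1, 0, 1, 0, 1]
y_pred = [1, 1, 1, 0, 1]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
negative = per_class_metrics(y_true, y_pred, label=0)
positive = per_class_metrics(y_true, y_pred, label=1)

print(accuracy, negative, positive)
```

On this small sample one of the two negative reviews is predicted as positive, which lowers the recall of class 0 in the same way the full report shows.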
from textblob import TextBlob
data_new = data.copy() # make a copy of dataframe and do unsupervised operations on that dataframe
# Calculate the polarity score of the reviews
data_new['polarity'] = data_new['review'].apply(lambda text: TextBlob(text).sentiment.polarity)
data_new.head()
| | review | sentiment | VADER_pred | polarity |
---|---|---|---|---|
0 | okay know nt project india good light overall theme movie india shakti power warlord power mother relationship nandini husband son swallow warmth thing go terribly wrong interaction nandini father law power dysfunctional relationship life changed strength movie shah rukh khan performance seems mere cameo compared believable desperation karisma kapoor easy get caught love violence redemption life film find heaving sigh relief sadness climax musical interlude strength believable well done | 1 | 1 | 0.230303 |
1 | despite john travolta statement interview favorite role career cool prof disappointing sequel witty clever get shorty br br travolta delivers pleasant enough performance mildly entertaining film ultimately movie fall flat due underdeveloped plot unlikeable character surprising lack chemistry lead travolta uma thurman although laugh unfunny dialog example appeared frequently trailer kind say thurman dance travolta hey brooklyn br br film suggests everyone entertainment business gangster aspires one likening organized crime get shorty premise gangster going legitimate getting movie clever fish water idea cool seems biz entirely gone crooked since br br film interestingly casted absolute highlight monolgue delivered rock whose character aspiring actor well goon reenacts scene gabrielle union kirsten dunst bring vince vaughan character think black often seen dressed pimp quite funny first scene introduces get tired embarrassing almost immediately afterward br br overall cool may worth rental john travolta die hards one may want keep finger close fast forward button get without feeling wasted much time fan get shorty may actually wish avoid sequel devoid thing made one winner rate movie admittedly harsh | 0 | 1 | 0.158824 |
2 | kung fu fan woo fan interest gangster movie filled top gun play martial art beautiful john woo surprised producing highly entertaining kung fu movie almost much fighting thing possible good stuff br br many fight scene good le good main character amusing likable bad guy bit unbelievably evil entertaining none le gotta see sleeping wizard fight asleep hysterical br br upon repeated viewing however last hurrah chivalry tend get little boring long winded also especially many fight scene actually good hence rate really almost br br one better kung fu movie made smack dab heart kung fu cinema prime really good kung fu movie mid late y notable exception late y early y early y fair | 1 | 1 | 0.196548 |
3 | seems control freak heard comment losing control show tell another guest brought live animal one rule snake need hire comedy writer joke lame reason watch great guest band br br watched craig ferguson show show even worse like bull sh burn time think either man much future late night talk show br br daily also annoying habit sticking tongue lick lip must least time show like joe firstman band carson daily need lighten late | 0 | 0 | -0.151240 |
4 | admittedly scene movie seem little unrealistic ravishing woman first panic minute later start kissing young lad old guy right next film go along learn little volatile girl slut partly explains behavior cinematography movie well done get see elevator almost every angle perspective image scene really raise tension g tz george play character well wannabe hot shot getting old overpowered young men like jaennicke character wolfgang kieling admired hitchcock torn curtain delivers great performance although say much far best actor play one critic complained unrealistic film real case emergency nothing would really happen realistic film mission impossible phone booth given fact talking movie movie always deal scene likely occur real life still enjoy movie lot better many thing see german tv day think vintage style added something film | 1 | 1 | 0.111097 |
# Classify the class of the review by keeping threshold on the polarity score
data_new['polarity_score'] = data_new['polarity'].apply(lambda c: '1' if c >=0 else '0')
data_new.head()
| | review | sentiment | VADER_pred | polarity | polarity_score |
---|---|---|---|---|---|
0 | okay know nt project india good light overall theme movie india shakti power warlord power mother relationship nandini husband son swallow warmth thing go terribly wrong interaction nandini father law power dysfunctional relationship life changed strength movie shah rukh khan performance seems mere cameo compared believable desperation karisma kapoor easy get caught love violence redemption life film find heaving sigh relief sadness climax musical interlude strength believable well done | 1 | 1 | 0.230303 | 1 |
1 | despite john travolta statement interview favorite role career cool prof disappointing sequel witty clever get shorty br br travolta delivers pleasant enough performance mildly entertaining film ultimately movie fall flat due underdeveloped plot unlikeable character surprising lack chemistry lead travolta uma thurman although laugh unfunny dialog example appeared frequently trailer kind say thurman dance travolta hey brooklyn br br film suggests everyone entertainment business gangster aspires one likening organized crime get shorty premise gangster going legitimate getting movie clever fish water idea cool seems biz entirely gone crooked since br br film interestingly casted absolute highlight monolgue delivered rock whose character aspiring actor well goon reenacts scene gabrielle union kirsten dunst bring vince vaughan character think black often seen dressed pimp quite funny first scene introduces get tired embarrassing almost immediately afterward br br overall cool may worth rental john travolta die hards one may want keep finger close fast forward button get without feeling wasted much time fan get shorty may actually wish avoid sequel devoid thing made one winner rate movie admittedly harsh | 0 | 1 | 0.158824 | 1 |
2 | kung fu fan woo fan interest gangster movie filled top gun play martial art beautiful john woo surprised producing highly entertaining kung fu movie almost much fighting thing possible good stuff br br many fight scene good le good main character amusing likable bad guy bit unbelievably evil entertaining none le gotta see sleeping wizard fight asleep hysterical br br upon repeated viewing however last hurrah chivalry tend get little boring long winded also especially many fight scene actually good hence rate really almost br br one better kung fu movie made smack dab heart kung fu cinema prime really good kung fu movie mid late y notable exception late y early y early y fair | 1 | 1 | 0.196548 | 1 |
3 | seems control freak heard comment losing control show tell another guest brought live animal one rule snake need hire comedy writer joke lame reason watch great guest band br br watched craig ferguson show show even worse like bull sh burn time think either man much future late night talk show br br daily also annoying habit sticking tongue lick lip must least time show like joe firstman band carson daily need lighten late | 0 | 0 | -0.151240 | 0 |
4 | admittedly scene movie seem little unrealistic ravishing woman first panic minute later start kissing young lad old guy right next film go along learn little volatile girl slut partly explains behavior cinematography movie well done get see elevator almost every angle perspective image scene really raise tension g tz george play character well wannabe hot shot getting old overpowered young men like jaennicke character wolfgang kieling admired hitchcock torn curtain delivers great performance although say much far best actor play one critic complained unrealistic film real case emergency nothing would really happen realistic film mission impossible phone booth given fact talking movie movie always deal scene likely occur real life still enjoy movie lot better many thing see german tv day think vintage style added something film | 1 | 1 | 0.111097 | 1 |
data["Text_Blob_pred"] = data_new['polarity_score'].tolist()
data.head()
| | review | sentiment | VADER_pred | Text_Blob_pred |
---|---|---|---|---|
0 | okay know nt project india good light overall theme movie india shakti power warlord power mother relationship nandini husband son swallow warmth thing go terribly wrong interaction nandini father law power dysfunctional relationship life changed strength movie shah rukh khan performance seems mere cameo compared believable desperation karisma kapoor easy get caught love violence redemption life film find heaving sigh relief sadness climax musical interlude strength believable well done | 1 | 1 | 1 |
1 | despite john travolta statement interview favorite role career cool prof disappointing sequel witty clever get shorty br br travolta delivers pleasant enough performance mildly entertaining film ultimately movie fall flat due underdeveloped plot unlikeable character surprising lack chemistry lead travolta uma thurman although laugh unfunny dialog example appeared frequently trailer kind say thurman dance travolta hey brooklyn br br film suggests everyone entertainment business gangster aspires one likening organized crime get shorty premise gangster going legitimate getting movie clever fish water idea cool seems biz entirely gone crooked since br br film interestingly casted absolute highlight monolgue delivered rock whose character aspiring actor well goon reenacts scene gabrielle union kirsten dunst bring vince vaughan character think black often seen dressed pimp quite funny first scene introduces get tired embarrassing almost immediately afterward br br overall cool may worth rental john travolta die hards one may want keep finger close fast forward button get without feeling wasted much time fan get shorty may actually wish avoid sequel devoid thing made one winner rate movie admittedly harsh | 0 | 1 | 1 |
2 | kung fu fan woo fan interest gangster movie filled top gun play martial art beautiful john woo surprised producing highly entertaining kung fu movie almost much fighting thing possible good stuff br br many fight scene good le good main character amusing likable bad guy bit unbelievably evil entertaining none le gotta see sleeping wizard fight asleep hysterical br br upon repeated viewing however last hurrah chivalry tend get little boring long winded also especially many fight scene actually good hence rate really almost br br one better kung fu movie made smack dab heart kung fu cinema prime really good kung fu movie mid late y notable exception late y early y early y fair | 1 | 1 | 1 |
3 | seems control freak heard comment losing control show tell another guest brought live animal one rule snake need hire comedy writer joke lame reason watch great guest band br br watched craig ferguson show show even worse like bull sh burn time think either man much future late night talk show br br daily also annoying habit sticking tongue lick lip must least time show like joe firstman band carson daily need lighten late | 0 | 0 | 0 |
4 | admittedly scene movie seem little unrealistic ravishing woman first panic minute later start kissing young lad old guy right next film go along learn little volatile girl slut partly explains behavior cinematography movie well done get see elevator almost every angle perspective image scene really raise tension g tz george play character well wannabe hot shot getting old overpowered young men like jaennicke character wolfgang kieling admired hitchcock torn curtain delivers great performance although say much far best actor play one critic complained unrealistic film real case emergency nothing would really happen realistic film mission impossible phone booth given fact talking movie movie always deal scene likely occur real life still enjoy movie lot better many thing see german tv day think vintage style added something film | 1 | 1 | 1 |
# Calculate the accuracy of the TextBlob Sentiment Analysis
data["sentiment"] = data["sentiment"].astype(int) #convert the sentiment column values into int data type
data["Text_Blob_pred"] = data["Text_Blob_pred"].astype(int) #convert the textblob predicted column values into int data type
metrics_score(data["sentiment"], data["Text_Blob_pred"])
| | precision | recall | f1-score | support |
---|---|---|---|---|
0 | 0.88 | 0.46 | 0.61 | 4949 |
1 | 0.64 | 0.94 | 0.76 | 5033 |
accuracy | | | 0.70 | 9982 |
macro avg | 0.76 | 0.70 | 0.68 | 9982 |
weighted avg | 0.76 | 0.70 | 0.68 | 9982 |
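The `macro avg` and `weighted avg` rows can be reproduced from the per-class rows. The sketch below is an illustration only: it starts from the rounded precision, recall and support values printed in the TextBlob report, so it can recover the averages to two decimals but not the exact unrounded figures. The names `report`, `macro` and `weighted` are hypothetical.

```python
# Per-class values copied from the TextBlob report above (already rounded)
report = {
    0: {"precision": 0.88, "recall": 0.46, "support": 4949},
    1: {"precision": 0.64, "recall": 0.94, "support": 5033},
}

total_support = sum(row["support"] for row in report.values())


def macro(metric):
    """Unweighted mean over classes."""
    return sum(row[metric] for row in report.values()) / len(report)


def weighted(metric):
    """Mean over classes, weighted by the number of true samples per class."""
    return sum(row[metric] * row["support"] for row in report.values()) / total_support


print(total_support)
print(round(macro("precision"), 2), round(macro("recall"), 2))
print(round(weighted("precision"), 2), round(weighted("recall"), 2))
```

Because the two classes have almost the same support, the macro and weighted averages nearly coincide here. The support-weighted recall is also the overall accuracy, since it counts correctly classified reviews over all reviews.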
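For sentiment analysis, we compared two supervised approaches, built on Bag-of-Words (BoW) and TF-IDF features, with two unsupervised, lexicon-based tools, TextBlob and VADER.
Among the supervised approaches, TF-IDF outperformed BoW, since it weights each word by how informative it is across the corpus rather than by its raw frequency alone.
Among the unsupervised tools, TextBlob performed better than VADER (accuracy 0.70 versus 0.67). One possible reason is that movie reviews tend to use more formal language than the social-media text VADER is tuned for. Both tools recall fewer than half of the negative reviews (0.46 and 0.47), which suggests that the `>= 0` threshold leans towards the positive class.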
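The difference between BoW and TF-IDF weighting can be illustrated on a toy corpus. This is a minimal sketch under stated assumptions: the three documents are invented for the example, and the weight uses the plain textbook form `count * log(N / df)`. scikit-learn's `TfidfVectorizer` applies a smoothed IDF and L2 normalisation by default, so its numbers would differ.

```python
import math
from collections import Counter

# Toy corpus, invented for illustration only
docs = ["good movie good cast", "bad movie", "boring movie bad plot"]
tokenized = [doc.split() for doc in docs]
n_docs = len(tokenized)

# Document frequency: in how many documents each term appears
doc_freq = Counter(term for tokens in tokenized for term in set(tokens))


def bow(tokens):
    """Bag of Words: raw term counts."""
    return Counter(tokens)


def tfidf(tokens):
    """Textbook TF-IDF: count * log(N / df), without smoothing or normalisation."""
    counts = Counter(tokens)
    return {term: count * math.log(n_docs / doc_freq[term]) for term, count in counts.items()}


first_bow = bow(tokenized[0])
first_tfidf = tfidf(tokenized[0])

print(first_bow)
print(first_tfidf)
```

In the first document BoW gives "movie" the same count as "cast", while TF-IDF assigns "movie" a weight of zero because it occurs in every document and so carries no information about sentiment.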