Amazon Food Reviews (Word Cloud) with an example using Python

Word cloud example using the Python wordcloud and NLTK libraries

About the dataset:

This dataset consists of reviews of fine foods from Amazon. The data span a period of more than 10 years, including all ~500,000 reviews up to October 2012. Reviews include product and user information, ratings, and a plain text review. It also includes reviews from all other Amazon categories.

Amazon Food Review using NLTK:

A word cloud is a visualization tool often used in text mining to show the most frequent words in a dataset, with more frequent words drawn larger. It’s great for quick, intuitive insights but doesn’t perform any analysis by itself.
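
To make "most frequent words" concrete, here is a minimal sketch of the counting that sits behind a word cloud. The sample reviews, the tiny stop-word list, and the helper name word_frequencies are all made up for illustration; the wordcloud library does its own tokenizing and counting internally.

```python
import re
from collections import Counter

# Hypothetical sample reviews, made up for illustration only
reviews = [
    "Great taffy at a great price.",
    "The taffy was soft and chewy.",
]

# A tiny hand-picked stop-word list (an assumption; NLTK's list is much longer)
stop_words = {"a", "at", "the", "was", "and"}

def word_frequencies(texts, stop_words):
    """Count how often each non-stop word appears across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase the text and keep alphabetic tokens only
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in stop_words)
    return counts

freqs = word_frequencies(reviews, stop_words)
print(freqs.most_common(3))
```

Words with higher counts would be drawn larger in the cloud; here "great" and "taffy" each appear twice.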

Step 1: Importing Required Libraries and packages

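import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import nltk
from PIL import Image
from wordcloud import WordCloud
from nltk.corpus import stopwords

# One-time download of the NLTK stop-word list, needed before stopwords.words('english') can be used
nltk.download('stopwords')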

Step 2: Reading the dataset using Pandas:

df = pd.read_csv('/Users/Sample Datasets Kaggle/AmazonFine/Reviews.csv')
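
Since only the review text feeds the word cloud, one option is to load just the columns you need. The sketch below uses a hypothetical two-row stand-in for Reviews.csv (the rows are invented; the column names match the dataset) so that it runs without the real file.

```python
import io
import pandas as pd

# Hypothetical two-row stand-in for Reviews.csv (rows invented for illustration)
sample_csv = io.StringIO(
    "Id,ProductId,Score,Summary,Text\n"
    "1,B001,5,Good,Great taffy at a great price.\n"
    "2,B002,1,Bad,Not as advertised.\n"
)

# Load only the columns needed for the word cloud
df_small = pd.read_csv(sample_csv, usecols=["Score", "Text"])
print(df_small.shape)
```

With the real file, the same usecols argument can be passed to the pd.read_csv call above to reduce memory use.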

Step 3: Describing the dataset using the pandas library:

df.describe()
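
By default, describe() summarizes only the numeric columns, so text columns such as Text and Summary do not appear in its output. As a small sketch on a hypothetical three-row frame (invented for illustration), passing include="all" adds count, unique, top, and freq for the text columns:

```python
import pandas as pd

# Hypothetical miniature frame with one numeric and one text column
df_demo = pd.DataFrame({
    "Score": [5, 1, 4],
    "Text": ["Great taffy", "Not as advertised", None],
})

numeric_summary = df_demo.describe()            # numeric columns only
full_summary = df_demo.describe(include="all")  # also summarizes text columns

print(list(numeric_summary.columns))
print(list(full_summary.columns))
```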

Step 4: Removing the NA values using dropna()

df_clean = df.dropna()
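
Note that dropna() with no arguments removes a row if any column is missing, so a review with valid text but a missing ProfileName or Summary is dropped too. If only the review text matters, a possible alternative is to restrict the check with subset. The frame below is hypothetical and exists only to show the difference.

```python
import pandas as pd

# Hypothetical rows: the second is missing ProfileName, the third is missing Text
df_demo = pd.DataFrame({
    "ProfileName": ["alice", None, "carol"],
    "Text": ["Great taffy", "Not as advertised", None],
})

drop_any = df_demo.dropna()                  # keeps only the first row
drop_text = df_demo.dropna(subset=["Text"])  # keeps the first two rows

print(len(drop_any), len(drop_text))
```

Which variant is appropriate depends on whether the other columns are used later in the analysis.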

Step 5: Reviewing the head of the dataset

df_clean.head()

Step 6: Checking whether any NA values remain in the dataset and listing its columns

df_clean.isna().sum()
df_clean.columns

Index(['Id', 'ProductId', 'UserId', 'ProfileName', 'HelpfulnessNumerator',
       'HelpfulnessDenominator', 'Score', 'Time', 'Summary', 'Text'],
      dtype='object')

Step 7: Extracting the Text column from the cleaned dataset

df_clean_text = df_clean['Text']
df_clean_text.head()

0    I have bought several of the Vitality canned d...
1    Product arrived labeled as Jumbo Salted Peanut...
2    This is a confection that has been around a fe...
3    If you are looking for the secret ingredient i...
4    Great taffy at a great price.  There was a wid...
Name: Text, dtype: object

Step 8: Generating the Word Cloud with stop words removed

# Combine all text into a single string
text = " ".join(df_clean_text.astype(str))

# Generate the word cloud, removing stopwords
stop_words = set(stopwords.words('english'))
wc = WordCloud(stopwords=stop_words, background_color='white', width=800, height=400).generate(text)

# Display the word cloud
plt.figure(figsize=(15, 7))
plt.imshow(wc, interpolation='bilinear')
plt.axis('off')
plt.show()
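
Review text collected from web pages sometimes carries HTML remnants such as line-break tags, which can surface in the cloud as stray tokens like "br". Whether that applies to this dataset is an assumption rather than something checked above. If it does, a small cleaning step along these lines could be applied before joining the text (the helper name clean_review and the sample string are hypothetical):

```python
import html
import re

def clean_review(text):
    """Strip HTML tags and unescape entities in a single review string."""
    without_tags = re.sub(r"<[^>]+>", " ", text)   # e.g. "<br />" becomes " "
    unescaped = html.unescape(without_tags)        # e.g. "&amp;" becomes "&"
    return re.sub(r"\s+", " ", unescaped).strip()  # collapse repeated whitespace

sample = "Great taffy!<br /><br />Soft &amp; chewy."
print(clean_review(sample))
```

In the pipeline above, this could be used as df_clean_text.astype(str).map(clean_review) before the " ".join(...) call.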

Conclusion:

Text mining lets us extract meaningful insights from unstructured text data, and word clouds serve as an intuitive visual entry point into that process. By highlighting the most frequent terms in a dataset, word clouds make it easy to spot dominant themes, recurring patterns, and potential areas for deeper analysis. While they don't replace statistical rigor, they offer a compelling first look, bridging raw data and human understanding in a way that is both accessible and impactful.