AB Testing between Control vs Treatment using multiple metrics
A/B testing using Python and SciPy

Why A/B testing?
A/B testing is used to compare two versions (A: control, B: treatment) of a process, product, or feature to determine which performs better. It helps identify if changes lead to statistically significant improvements in key metrics, guiding data-driven decisions and minimizing guesswork.
About the dataset:
The dataset contains 127 metrics, each with results for 'Control' and 'Treatment' groups. For each metric, mean values are calculated and compared between groups. Statistical significance is assessed using t-tests, with p-values provided for most metrics. Most metrics show similar mean values between control and treatment, suggesting no major effect. Metrics with low p-values indicate significant differences and may require further analysis.
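Before applying this to the dataset, the core significance test can be sketched in isolation. The following is a minimal, self-contained example of Welch's t-test via `scipy.stats.ttest_ind` (`equal_var=False`) on synthetic data; the sample sizes, means, and the 0.05 threshold are illustrative assumptions, not values from the dataset:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)

# Two synthetic samples: the treatment mean is deliberately shifted
control = rng.normal(loc=100, scale=10, size=500)
treatment = rng.normal(loc=105, scale=10, size=500)

# Welch's t-test does not assume equal variances between groups
t_stat, p_val = ttest_ind(control, treatment, equal_var=False)

# A p-value below the conventional 0.05 threshold flags a
# statistically significant difference between the group means
print(f"t = {t_stat:.3f}, p = {p_val:.4g}, significant = {p_val < 0.05}")
```

With a shift this large relative to the noise, the test reports a significant difference; with identical population means it usually would not.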
Step 1: Importing the required libraries and packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import random
from scipy.stats import ttest_ind
### Setting up the random seed:
random.seed(123)
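Note that `random.seed` only seeds Python's built-in `random` module; NumPy draws from its own generator, so for fully reproducible NumPy-based results both should be seeded. A small sketch of the pattern:

```python
import random
import numpy as np

random.seed(123)     # seeds Python's built-in generator
np.random.seed(123)  # seeds NumPy's legacy global generator

# The same seed always reproduces the same draws
a = np.random.rand(3)
np.random.seed(123)
b = np.random.rand(3)
print(np.array_equal(a, b))  # True
```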
Step 2: Reading the dataset using pandas and performing exploratory data analysis:
df = pd.read_csv('/Users/Sample Datasets Kaggle/A_B Test Results - Test Results.csv')
df.shape
(128, 4)
df.head()
Metrics Unnamed: 1 Variant Unnamed: 3
0 Metric Category Metric Name Control Treatment
1 Participants user_counts 5,191 5,314
2 Account Type Switches Free_to_Paid 4,719 4,603
3 Account Type Switches Paid_to_Free 1,056 1,088
4 Account Type Switches upsell 360 362
Note:
The A/B test compares control vs. treatment separately for each metric, so the interpretation depends on the metric in question.
df.info  # note: without parentheses this prints the bound method's repr; use df.info() for the dtype summary
<bound method DataFrame.info of Metrics Unnamed: 1 Variant Unnamed: 3
0 Metric Category Metric Name Control Treatment
1 Participants user_counts 5,191 5,314
2 Account Type Switches Free_to_Paid 4,719 4,603
3 Account Type Switches Paid_to_Free 1,056 1,088
4 Account Type Switches upsell 360 362
.. ... ... ... ...
123 Template _5Plus_Table_Clones 2 2
124 Template _1Plus_PDF_Clones 429 363
125 Template _5Plus_PDF_Clones 23 11
126 Template _1Plus_Any_App_Clones 159 159
127 Template _5Plus_Any_App_Clones 3 7
Step 3: Check for missing values and replace any NA values with appropriate values (e.g., the column mean)
# Remove the current header and set row 0 as the new header
df.columns = df.iloc[0]
df = df.drop(df.index[0]).reset_index(drop=True)
df.index = range(1, len(df) + 1) # Set proper indexing starting from 1
df.head()
Metric Category Metric Name Control Treatment
1 Participants user_counts 5,191 5,314
2 Account Type Switches Free_to_Paid 4,719 4,603
3 Account Type Switches Paid_to_Free 1,056 1,088
4 Account Type Switches upsell 360 362
5 Account Type Switches downgrade 1,069 1,106
missing_values = df.isna().any()
print("Columns with any missing values:\n", missing_values)
Columns with any missing values:
0
Metric Category False
Metric Name False
Control False
Treatment False
dtype: bool
total_missing = df.isna().sum().sum()
print(f'Total Missing values from the dataframe : {total_missing}')
Total Missing values from the dataframe : 0
print(df.isna().sum())
Metric Category 0
Metric Name 0
Control 0
Treatment 0
dtype: int64
df.describe()
Metric Category Metric Name Control Treatment
count 127 127 127 127
unique 18 127 117 108
top App and Store user_counts 3 1
freq 27 1 3 4
df.head()
Metric Category Metric Name Control Treatment
1 Participants user_counts 5,191 5,314
2 Account Type Switches Free_to_Paid 4,719 4,603
3 Account Type Switches Paid_to_Free 1,056 1,088
4 Account Type Switches upsell 360 362
5 Account Type Switches downgrade 1,069 1,106
df.columns
Index(['Metric Category', 'Metric Name', 'Control', 'Treatment'], dtype='object', name=0)
Checking whether NA values are present across multiple columns at once:
df[['Metric Category', 'Metric Name', 'Control', 'Treatment']].isna().sum()
Metric Category 0
Metric Name 0
Control 0
Treatment 0
dtype: int64
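No imputation is needed here, since the dataset has no missing values. If some were present, the pattern referenced in the step title (filling numeric NAs with the column mean) would look like the sketch below; the toy DataFrame is illustrative only, not part of the dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing numeric value
toy = pd.DataFrame({
    'Metric Name': ['a', 'b', 'c'],
    'Control': [10.0, np.nan, 20.0],
})

# Replace NAs in a numeric column with that column's mean
toy['Control'] = toy['Control'].fillna(toy['Control'].mean())
print(toy['Control'].tolist())  # [10.0, 15.0, 20.0]
```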
Step 4: Performing t-tests assuming independent samples:
from scipy.stats import ttest_ind

def run_t_test(control, treatment):
    # Welch's t-test: independent samples, variances not assumed equal
    t_stat, p_val = ttest_ind(control, treatment, equal_var=False)
    return t_stat, p_val

def clean_and_convert(value):
    # Keep only numeric characters and convert each to float;
    # commas (thousands separators) and other non-digits are dropped
    return [float(i) for i in value if i.replace('.', '', 1).isdigit()]

df['Control'] = df['Control'].apply(clean_and_convert)
df['Treatment'] = df['Treatment'].apply(clean_and_convert)
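To make the cleaning step concrete, here is what `clean_and_convert` produces for one of the comma-formatted cells. Note that it iterates over the string character by character, so each digit becomes a separate float in the resulting list (and the comma is dropped):

```python
def clean_and_convert(value):
    # Keep only numeric characters; commas and other non-digits are dropped
    return [float(i) for i in value if i.replace('.', '', 1).isdigit()]

# Each cell becomes a list of per-character floats, not a single number
print(clean_and_convert('5,191'))  # [5.0, 1.0, 9.0, 1.0]
```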
Step 5: Running the t-test for each metric:
df[['t-statistic', 'p-value']] = df.apply(
    lambda row: pd.Series(run_t_test(row['Control'], row['Treatment'])),
    axis=1,
)
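Once the `p-value` column exists, the metrics worth a closer look can be pulled out with a simple boolean filter. A sketch on a toy results frame (the metric names and p-values below are illustrative, and the 0.05 threshold is an assumed convention):

```python
import pandas as pd

# Toy results frame with the same columns produced in Step 5
results = pd.DataFrame({
    'Metric Name': ['user_counts', 'Free_to_Paid', 'upsell'],
    'p-value': [0.62, 0.03, 0.48],
})

# Keep only metrics whose difference is significant at alpha = 0.05
significant = results[results['p-value'] < 0.05]
print(significant['Metric Name'].tolist())  # ['Free_to_Paid']
```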
Step 6: Plot the A/B comparison between control and treatment for all metrics using matplotlib
plt.figure(figsize=(10, 6))
control_means = df['Control'].apply(np.mean)
treatment_means = df['Treatment'].apply(np.mean)
x = np.arange(len(df))
plt.plot(x, control_means, label='Control', marker='o')
plt.plot(x, treatment_means, label='Treatment', marker='o')
plt.xlabel('Metric Index')
plt.ylabel('Mean Value')
plt.title('Mean Comparison: Control vs Treatment')
plt.legend()
plt.tight_layout()
plt.show()

Summary of the final matplotlib plot:
The plot compares the mean values of the 'Control' and 'Treatment' groups for each metric in the dataset.
The x-axis represents the metric index (from 0 to 126), while the y-axis shows the mean value for each group.
Each line (with markers) visualizes the trend of mean values across all metrics for both groups.
The legend distinguishes between 'Control' and 'Treatment'.
This visualization helps to quickly identify metrics where the treatment group differs from the control group in terms of mean values.
Inference from the above plot:
For most metrics, the control and treatment means show no significant difference, so we fail to reject the null hypothesis. Note also that with 127 simultaneous tests, a handful of small p-values are expected by chance alone, so any individually significant metric warrants follow-up analysis rather than an immediate conclusion.