A/B Testing Between Control and Treatment Using Multiple Metrics

A/B testing using Python and SciPy

Why A/B testing?

A/B testing is used to compare two versions (A: control, B: treatment) of a process, product, or feature to determine which performs better. It helps identify if changes lead to statistically significant improvements in key metrics, guiding data-driven decisions and minimizing guesswork.
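As a minimal illustration of the idea (with made-up per-user conversion outcomes, not values from this dataset), we can simulate two variants and test whether their means differ significantly:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(123)

# Hypothetical per-user conversion outcomes (1 = converted), purely illustrative
control = rng.binomial(1, 0.10, size=5000)    # baseline ~10% conversion rate
treatment = rng.binomial(1, 0.12, size=5000)  # variant ~12% conversion rate

# Welch's t-test: is the difference in means statistically significant?
t_stat, p_val = ttest_ind(control, treatment, equal_var=False)
print(f"control mean={control.mean():.3f}, treatment mean={treatment.mean():.3f}, p={p_val:.4f}")
```

A small p-value (conventionally below 0.05) would lead us to reject the null hypothesis that the two variants perform the same.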

About the dataset:

The dataset contains 127 metrics, each reported for a 'Control' and a 'Treatment' group. For each metric, the values are compared between the two groups, and statistical significance is assessed with a t-test. Most metrics show similar values for control and treatment, suggesting no major effect; metrics with low p-values indicate significant differences and may warrant further analysis.

Step 1: Importing the required libraries and packages

import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import random
# Set the random seeds for reproducibility
random.seed(123)
np.random.seed(123)

Step 2: Reading the dataset with pandas and performing exploratory data analysis:

df = pd.read_csv('/Users/Sample Datasets Kaggle/A_B Test Results - Test Results.csv')
df.shape
(128, 4)
df.head()
	Metrics	Unnamed: 1	Variant	Unnamed: 3
0	Metric Category	Metric Name	Control	Treatment
1	Participants	user_counts	5,191	5,314
2	Account Type Switches	Free_to_Paid	4,719	4,603
3	Account Type Switches	Paid_to_Free	1,056	1,088
4	Account Type Switches	upsell	360	362
Note:
The A/B test compares Control vs Treatment for each metric; the appropriate comparison depends on the metric's category and scale.
print(df)
                   Metrics             Unnamed: 1  Variant Unnamed: 3
0          Metric Category            Metric Name  Control  Treatment
1             Participants            user_counts    5,191      5,314
2    Account Type Switches           Free_to_Paid    4,719      4,603
3    Account Type Switches           Paid_to_Free    1,056      1,088
4    Account Type Switches                 upsell      360        362
..                     ...                    ...      ...        ...
123               Template    _5Plus_Table_Clones        2          2
124               Template      _1Plus_PDF_Clones      429        363
125               Template      _5Plus_PDF_Clones       23         11
126               Template  _1Plus_Any_App_Clones      159        159
127               Template  _5Plus_Any_App_Clones        3          7

Step 3: Checking for missing values and replacing any NA values with appropriate values (e.g. column means)

# Remove the current header and set row 0 as the new header
df.columns = df.iloc[0]
df = df.drop(df.index[0]).reset_index(drop=True)
df.index = range(1, len(df) + 1)  # Set proper indexing starting from 1
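Alternatively, the two-row header can be handled at read time. A small sketch on a toy CSV that mimics the file's layout (the text below is made up for illustration; values with thousands separators are quoted):

```python
import io
import pandas as pd

# Toy CSV mimicking the file's two-row header layout
csv_text = (
    "Metrics,Unnamed: 1,Variant,Unnamed: 3\n"
    "Metric Category,Metric Name,Control,Treatment\n"
    'Participants,user_counts,"5,191","5,314"\n'
)

# skiprows=1 skips the placeholder header so the real header row is used directly
df = pd.read_csv(io.StringIO(csv_text), skiprows=1)
print(list(df.columns))
```

This avoids the manual header promotion, at the cost of re-reading the file.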

df.head()

Metric Category	Metric Name	Control	Treatment
1	Participants	user_counts	5,191	5,314
2	Account Type Switches	Free_to_Paid	4,719	4,603
3	Account Type Switches	Paid_to_Free	1,056	1,088
4	Account Type Switches	upsell	360	362
5	Account Type Switches	downgrade	1,069	1,106
missing_values = df.isna().any()
print("Columns with any missing values:\n", missing_values)
Columns with any missing values:
 0
Metric Category    False
Metric Name        False
Control            False
Treatment          False
dtype: bool
total_missing = df.isna().sum().sum()
print(f'Total Missing values from the dataframe : {total_missing}') 

Total Missing values from the dataframe : 0
print(df.isna().sum())
Metric Category    0
Metric Name        0
Control            0
Treatment          0
dtype: int64
df.describe()
Metric Category	Metric Name	Control	Treatment
count	127	127	127	127
unique	18	127	117	108
top	App and Store	user_counts	3	1
freq	27	1	3	4
df.head()
Metric Category	Metric Name	Control	Treatment
1	Participants	user_counts	5,191	5,314
2	Account Type Switches	Free_to_Paid	4,719	4,603
3	Account Type Switches	Paid_to_Free	1,056	1,088
4	Account Type Switches	upsell	360	362
5	Account Type Switches	downgrade	1,069	1,106
df.columns
Index(['Metric Category', 'Metric Name', 'Control', 'Treatment'], dtype='object', name=0)
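The `name=0` in the columns Index above is a residue of the header promotion: the promoted row's index label becomes the columns' name. If that is distracting, it can be cleared. A small self-contained sketch (toy frame, not the real dataset):

```python
import pandas as pd

# Reproduce the residue: promoting a row to header keeps the old index label as the columns' name
df = pd.DataFrame([["Metric Category", "Control"], ["Participants", "5,191"]])
df.columns = df.iloc[0]
df = df.drop(df.index[0]).reset_index(drop=True)
print(df.columns.name)  # the promoted row's index label (0)

df.columns.name = None  # clear the residue
print(df.columns.name)  # None
```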
Checking whether any of the selected columns contain NA values:
df[['Metric Category', 'Metric Name', 'Control', 'Treatment']].isna().sum()

Metric Category    0
Metric Name        0
Control            0
Treatment          0
dtype: int64
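This dataset turns out to have no missing values, but had there been any, the mean imputation mentioned in Step 3 could look like this (toy frame, purely illustrative):

```python
import pandas as pd

# Toy frame with gaps (hypothetical; the real dataset has no missing values)
toy = pd.DataFrame({"Control": [10.0, None, 30.0], "Treatment": [12.0, 25.0, None]})

# Impute each numeric column's NaNs with that column's mean
toy = toy.fillna(toy.mean(numeric_only=True))
print(toy)
```

Mean imputation is only appropriate for roughly symmetric numeric columns; for categorical columns like 'Metric Category', the mode or a sentinel value would be used instead.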

Step 4: Performing Welch's t-test for independent samples:

from scipy.stats import ttest_ind

def run_t_test(control, treatment):
    # Welch's t-test (equal variances not assumed); expects array-like samples
    t_stat, p_val = ttest_ind(control, treatment, equal_var=False)
    return t_stat, p_val

def clean_and_convert(value):
    # Strip thousands separators (e.g. '5,191' -> 5191.0) and convert to float
    return float(str(value).replace(',', ''))

df['Control'] = df['Control'].apply(clean_and_convert)
df['Treatment'] = df['Treatment'].apply(clean_and_convert)

Step 5: Running the t-test:

Each metric carries only a single aggregate value per variant, so a per-metric t-test would require the underlying per-user observations. With only the aggregates available, we compare the Control and Treatment values across all 127 metrics in one test:

t_stat, p_val = run_t_test(df['Control'], df['Treatment'])
print(f't-statistic: {t_stat:.4f}, p-value: {p_val:.4f}')
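Since the Control and Treatment values are paired by metric, a paired t-test (`scipy.stats.ttest_rel`) is a reasonable alternative to the independent-samples test. A sketch using the first five metrics from the preview above:

```python
import numpy as np
from scipy.stats import ttest_rel

# Per-metric aggregates for the first five metrics shown in df.head()
control = np.array([5191.0, 4719.0, 1056.0, 360.0, 1069.0])
treatment = np.array([5314.0, 4603.0, 1088.0, 362.0, 1106.0])

# Paired t-test on the per-metric differences (treatment - control)
t_stat, p_val = ttest_rel(control, treatment)
print(f"t={t_stat:.3f}, p={p_val:.4f}")
```

The paired test removes the large between-metric variance from the comparison, which generally gives it more power than an independent-samples test on the same aggregates.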

Step 6: Plotting the control vs treatment comparison for all metrics using matplotlib

plt.figure(figsize=(10, 6))
control_means = df['Control'].apply(np.mean)
treatment_means = df['Treatment'].apply(np.mean)
x = np.arange(len(df))

plt.plot(x, control_means, label='Control', marker='o')
plt.plot(x, treatment_means, label='Treatment', marker='o')
plt.xlabel('Metric Index')
plt.ylabel('Mean Value')
plt.title('Mean Comparison: Control vs Treatment')
plt.legend()
plt.tight_layout()
plt.show()
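Because the metric means span several orders of magnitude (from single digits to thousands), large metrics dominate the y-axis and differences in small metrics become invisible. A relative-difference view can be clearer; a sketch using five of the metric pairs shown earlier as stand-ins for the full control_means / treatment_means series:

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in per-metric values taken from the previews above (first four rows and _5Plus_PDF_Clones)
control_means = np.array([5191.0, 4719.0, 1056.0, 360.0, 23.0])
treatment_means = np.array([5314.0, 4603.0, 1088.0, 362.0, 11.0])

# Relative difference in percent; large metrics no longer dominate the scale
rel_diff = (treatment_means - control_means) / control_means * 100

plt.figure(figsize=(8, 4))
plt.bar(np.arange(len(rel_diff)), rel_diff)
plt.axhline(0, color="grey", linewidth=0.8)
plt.xlabel("Metric Index")
plt.ylabel("Treatment vs Control (%)")
plt.title("Relative Difference per Metric")
plt.tight_layout()
plt.show()
```

On this scale, a metric like _5Plus_PDF_Clones (23 vs 11) stands out immediately, whereas it is invisible next to user_counts in the absolute-means plot.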

Summary of the final matplotlib plot:

The plot compares the mean values of the 'Control' and 'Treatment' groups for each metric in the dataset.
The x-axis represents the metric index (from 0 to 126), while the y-axis shows the mean value for each group.
Each line (with markers) visualizes the trend of mean values across all metrics for both groups.
The legend distinguishes between 'Control' and 'Treatment'.
This visualization helps to quickly identify metrics where the treatment group differs from the control group in terms of mean values.

Inference from the above plot:

For most metrics, the control and treatment means show no visible significant difference; hence we fail to reject the null hypothesis.