Pandas to SQL Converter

Samples to Understand Pandas to SQL Query Conversion

Sample 1: GroupBy and Aggregate

Pandas Code:

import pandas as pd
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Alice', 'Bob'],
                   'Amount': [100, 200, 150, 50]})
df_grouped = df.groupby('Name').sum()

Expected SQL Output:

SELECT 
  Name, 
  SUM(Amount) AS Amount
FROM 
  TableName
GROUP BY 
  Name;

How it Works:

The Pandas .groupby() method groups the data by the Name column, and .sum() computes the sum of the Amount column for each group.
The equivalent SQL query uses the GROUP BY clause to group the data by Name and the SUM() function to aggregate the Amount values.

Sample 2: Filtering Rows with Condition

Pandas Code:

import pandas as pd
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 30, 35, 40]})
df_filtered = df[df['Age'] > 30]

Expected SQL Output:

SELECT 
  Name, 
  Age
FROM 
  TableName
WHERE 
  Age > 30;

How it Works:

In Pandas, filtering is done by applying a condition on the Age column. Only rows where Age > 30 are retained.
In SQL, this translates to a WHERE clause that filters the rows where the Age column is greater than 30.

Sample 3: Joining DataFrames

Pandas Code:

import pandas as pd
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [1, 2, 4], 'Age': [25, 30, 35]})
df_merged = pd.merge(df1, df2, on='ID', how='inner')

Expected SQL Output:

SELECT 
  df1.ID, 
  df1.Name, 
  df2.Age
FROM 
  Table1 df1
INNER JOIN 
  Table2 df2
ON 
  df1.ID = df2.ID;

How it Works:

The Pandas .merge() function is used to join two DataFrames on a common column (ID) with the how='inner' argument, meaning only rows with matching ID values in both DataFrames are kept.
In SQL, this is done using an INNER JOIN to combine the two tables (Table1 and Table2) on the ID column.

Sample 4: Sorting Data

Pandas Code:

import pandas as pd
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 30, 35, 40]})
df_sorted = df.sort_values(by='Age', ascending=False)

Expected SQL Output:

SELECT 
  Name, 
  Age
FROM 
  TableName
ORDER BY 
  Age DESC;

How it Works:

In Pandas, .sort_values() sorts the DataFrame based on the Age column in descending order (ascending=False).
In SQL, this is achieved using the ORDER BY clause, where Age is sorted in descending order (DESC).

Sample 5: Adding a New Column

Pandas Code:

import pandas as pd
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35]})
df['Salary'] = [50000, 60000, 70000]

Expected SQL Output:

SELECT 
  Name, 
  Age, 
  Salary
FROM 
  TableName;

How it Works:

In Pandas, a new column Salary is added to the DataFrame by assigning a list of values.
In SQL, adding a column is typically done by including the new column (Salary) in the SELECT statement. Since it’s already part of the data, we simply query the table to include it in the result.