Samples to Understand Pandas to SQL Query Conversion
Sample 1: GroupBy and Aggregate
Pandas Code:
import pandas as pd
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Alice', 'Bob'],
'Amount': [100, 200, 150, 50]})
df_grouped = df.groupby('Name').sum()
Expected SQL Output:
SELECT
Name,
SUM(Amount) AS Amount
FROM
TableName
GROUP BY
Name;
How it Works:
- The Pandas
.groupby()
method groups the data by theName
column, and.sum()
computes the sum of theAmount
column for each group. - The equivalent SQL query uses the
GROUP BY
clause to group the data byName
and theSUM()
function to aggregate theAmount
values.
Sample 2: Filtering Rows with Condition
Pandas Code:
import pandas as pd
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 35, 40]})
df_filtered = df[df['Age'] > 30]
Expected SQL Output:
SELECT
Name,
Age
FROM
TableName
WHERE
Age > 30;
How it Works:
- In Pandas, filtering is done by applying a condition on the
Age
column. Only rows whereAge > 30
are retained. - In SQL, this translates to a
WHERE
clause that filters the rows where theAge
column is greater than 30.
Sample 3: Joining DataFrames
Pandas Code:
import pandas as pd
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [1, 2, 4], 'Age': [25, 30, 35]})
df_merged = pd.merge(df1, df2, on='ID', how='inner')
Expected SQL Output:
SELECT
df1.ID,
df1.Name,
df2.Age
FROM
Table1 df1
INNER JOIN
Table2 df2
ON
df1.ID = df2.ID;
How it Works:
- The Pandas
.merge()
function is used to join two DataFrames on a common column (ID
) with thehow='inner'
argument, meaning only rows with matchingID
values in both DataFrames are kept. - In SQL, this is done using an
INNER JOIN
to combine the two tables (Table1
andTable2
) on theID
column.
Sample 4: Sorting Data
Pandas Code:
import pandas as pd
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 35, 40]})
df_sorted = df.sort_values(by='Age', ascending=False)
Expected SQL Output:
SELECT
Name,
Age
FROM
TableName
ORDER BY
Age DESC;
How it Works:
- In Pandas,
.sort_values()
sorts the DataFrame based on theAge
column in descending order (ascending=False
). - In SQL, this is achieved using the
ORDER BY
clause, whereAge
is sorted in descending order (DESC
).
Sample 5: Adding a New Column
Pandas Code:
import pandas as pd
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35]})
df['Salary'] = [50000, 60000, 70000]
Expected SQL Output:
SELECT
Name,
Age,
Salary
FROM
TableName;
How it Works:
- In Pandas, a new column
Salary
is added to the DataFrame by assigning a list of values. - In SQL, adding a column is typically done by including the new column (
Salary
) in theSELECT
statement. Since it’s already part of the data, we simply query the table to include it in the result.