Samples to Understand Pandas to SQL Query Conversion
Sample 1: GroupBy and Aggregate
Pandas Code:
import pandas as pd
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Alice', 'Bob'],
'Amount': [100, 200, 150, 50]})
df_grouped = df.groupby('Name').sum()Expected SQL Output:
SELECT
Name,
SUM(Amount) AS Amount
FROM
TableName
GROUP BY
Name;How it Works:
- The Pandas
.groupby()method groups the data by theNamecolumn, and.sum()computes the sum of theAmountcolumn for each group. - The equivalent SQL query uses the
GROUP BYclause to group the data byNameand theSUM()function to aggregate theAmountvalues.
Sample 2: Filtering Rows with Condition
Pandas Code:
import pandas as pd
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 35, 40]})
df_filtered = df[df['Age'] > 30]Expected SQL Output:
SELECT
Name,
Age
FROM
TableName
WHERE
Age > 30;How it Works:
- In Pandas, filtering is done by applying a condition on the
Agecolumn. Only rows whereAge > 30are retained. - In SQL, this translates to a
WHEREclause that filters the rows where theAgecolumn is greater than 30.
Sample 3: Joining DataFrames
Pandas Code:
import pandas as pd
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [1, 2, 4], 'Age': [25, 30, 35]})
df_merged = pd.merge(df1, df2, on='ID', how='inner')Expected SQL Output:
SELECT
df1.ID,
df1.Name,
df2.Age
FROM
Table1 df1
INNER JOIN
Table2 df2
ON
df1.ID = df2.ID;How it Works:
- The Pandas
.merge()function is used to join two DataFrames on a common column (ID) with thehow='inner'argument, meaning only rows with matchingIDvalues in both DataFrames are kept. - In SQL, this is done using an
INNER JOINto combine the two tables (Table1andTable2) on theIDcolumn.
Sample 4: Sorting Data
Pandas Code:
import pandas as pd
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 35, 40]})
df_sorted = df.sort_values(by='Age', ascending=False)Expected SQL Output:
SELECT
Name,
Age
FROM
TableName
ORDER BY
Age DESC;How it Works:
- In Pandas,
.sort_values()sorts the DataFrame based on theAgecolumn in descending order (ascending=False). - In SQL, this is achieved using the
ORDER BYclause, whereAgeis sorted in descending order (DESC).
Sample 5: Adding a New Column
Pandas Code:
import pandas as pd
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35]})
df['Salary'] = [50000, 60000, 70000]Expected SQL Output:
SELECT
Name,
Age,
Salary
FROM
TableName;How it Works:
- In Pandas, a new column
Salaryis added to the DataFrame by assigning a list of values. - In SQL, adding a column is typically done by including the new column (
Salary) in theSELECTstatement. Since it’s already part of the data, we simply query the table to include it in the result.
