♦Photo by Simon Berger on UnsplashAs a beginner, it’s essential to have a good understanding of all steps in your learning journey. Skipping the fundamentals can leave gaps in your knowledge that might hinder your progress toward becoming a true SQL guru (this advice comes from my experience as a solo learner and many other SQL professionals🧑💻).
Think about how you prioritize your tasks based on urgency. It’s a common productivity tip that helps us manage our time effectively. Interestingly, SQL commands operate on a similar principle, following a specific order of execution that ensures efficient data processing.
Imagine SQL as a general leading a group of soldiers in a training camp. Each command is a soldier trained in tactics and strategies, ready to execute in a precise order to achieve the best results. What would happen if all the soldiers went to battle without any instruction? I don’t think it would end well for them or our SQL engine.
Overview of SQL ClausesBefore moving on to execution tactics, let’s remember all the clauses SQL provides us to solve our tasks:
- SELECT — Here is where you tell SQL what you want, not what you want to eat, but to retrieve from the database 😂; things named columns or attributes, such as FirstName and LastName from a table.
- FROM — When you go on a journey you take your map to know where you want to go, right? That’s the case here; to retrieve something SQL needs to know where it can find the information you demanded earlier, so here you specify the name of the table you want the data to come from.
- JOIN — This clause is a kind of host at a party that helps everyone meet everyone; in our SQL, it helps us to form a bond between 2 tables based on a common column.
♦Source: www.blazesql.com/blog/sql-meme4. WHERE — When you throw a birthday party you give invitations to your guests, because you’ll not invite the whole neighborhood, right? (If you do, please remember to invite me too, even if it’s on Teams 😄). So you create a list and invite your best friends, relatives, and probably a few others; that’s what our WHERE clause does, helps us to filter the data based on certain conditions (more about WHEREclause here)
5. GROUP BY — Some games are played in teams with different roles (warrior, shaman, etc.). To be assigned to a team you need to have some skills in common with the others on the team. This is how GROUP BY clause works, it groups rows that share the same values in specified columns.
6. HAVING — Want to see the products that gave you a profit above $1000, use a HAVING clause in your query, because this clause is top-notch when it comes to putting conditions over an aggregate column (more about HAVING clause here)
7. ORDER BY — I don't know if it happened to you, but in sports classes, I often had the extreme position when we had a handball match, even though I wanted to be a goalkeeper. And that’s because our teacher ordered us according to our height at the beginning of each class. The same thing happens with the ORDER BY clause, it orders your rows ascending or descending based on a column you specify.
8. LIMIT or TOP— imagine that you’re going to a party and at the entry there is a huge guy who checks the tickets and doesn’t let in more than 100 people; the same way the LIMIT or TOP clause ensures only a certain number of rows are retrieved.
Who’s first?Now that we have covered all the clauses typically used when composing a query, let's move on to see which clause comes first and what follows after it.
♦Made by Author with 💓Let the party begin and see which clause has a VIP ticket 😄
- FROM — It seems that someone was fast and caught an extra VIP ticket and has the best seat. This indicates the data sources and the place from which we will extract data.
- JOIN (if applicable) — if a single table is not enough to get what we need, we apply the join and forward the link to another table we will use as a source.
- WHERE — No query contains a filter, especially when working on a real-world project. So after having the sources we need to take only the rows we need and meet our conditions.
- GROUP BY — We got what we needed and now we have to group the rows, especially when we use aggregation functions; so for each group, there will be a calculated sum, average, or any other calculation.
- HAVING — Let’s say that we need a further filter, but this time, based on an aggregation function; that’s where the HAVING clause is next in line and served.
- SELECT — now that we have all the data we found in our sources, we need only a few columns; It’s your turn, baby, because we need to search the columns that were specified in the SELECT clause.
- ORDER BY — the data came wave, after wave, but we need them to be organized and theORDER BY clause in next to be executed and align the data based on the value in the columns that were specified in this clause.
- LIMIT or TOP— want a certain number of rows? This clause is next and the one that caught the last ticket (unfortunately 🙃).
Let’s play a bitIs theory effective without practice? True understanding comes from practical experience, so let's apply what we've learned.
We have 2 tables: Orders and Customers tables are linked by a foreign key on the customer_id column.
♦Made by Author with 💓The request is to take the first 3 customers based on who spent the most after January 1, 2023, and the total spent is more than 1000;
You can already mark it as done by using the following query that contains all the clauses we’ve talked about before:
SELECT TOP 3 customer_name, SUM(order_amount) AS total_spent
FROM orders
JOIN customers ON orders.customer_id = customers.customer_id
WHERE order_date > '2023-01-01'
GROUP BY customer_name
HAVING SUM(order_amount) > 1000
ORDER BY total_spent DESC
Interpretation of the codeThis query will:
- FROM — establishes the source from which to retrieve the data -> orderstable
- JOIN — get in touch with the other source -> customerstable
- WHERE — filter orders to include only those placed after January 1, 2023.
- GROUP BY — group the resulting rows by customer_name
- HAVING — the second filter to take only the total_spent that is bigger than 1000
- SELECT — take customer_name and the total amount spent (total_spent) for each customer_name
- ORDER BY — order the results by total_spent in descending order
- LIMIT or TOP — using SQL Server as IDE I used TOP clause to limit the output to the top 3 customers.
🥁 Result is 🥁
♦Made by Author with 💓Common Mistakes And Tips- Common Mistake: Forgot to add the GROUP BY clause when we have HAVING
Tip: Remember that HAVING is used to filter aggregated data. Use WHERE for non-aggregated data (want more details about these 2, click here 🙃)
- Common Mistake: Incorrectly ordering clauses.
Tip: Follow the logical order of execution I presented to you above to ensure your query runs correctly and has no error that will make you lose some good minutes of your life looking for the problem.
- Common Mistake: Forgetting about the ONconditions for a JOIN
Tip: To avoid any syntax errors (for me these errors are like the red light on my way home after a long day 😣) make sure that your JOIN is correct and completely written. Missing an ON condition is like forgetting the wheels on your bike—you're not going anywhere fast.
- Common Mistake: Assuming aggregation works without grouped data
Tip: Check your query before running it to ensure it contains all you need. Check twice, and run once.
- Common Mistake: Not adding all the non-aggregating values from SELECTin the GROUP BY clause.
Tip: Omitting some of the non-aggregating values leads to the following error and forces you to review more than once your query.
♦Pretty familiar, also for me 😅ConclusionKnowing the order of execution of the SQL clauses is like knowing the choreography of your dance. Each step (clause) must follow the previous one to create a harmonious and error-free query.
When you start writing your next query think of it as organizing a party: establish the location for your party (FROM), make sure the location has a parking spot (JOIN), choose your invitations (WHERE), group guests by their age (GROUP BY) because you still want your guests to enjoy your party with people to talk to, apply any special criteria (HAVING), arrange everything at the spot of your party (ORDER BY), and last but not least, manage the crowd ( LIMIT — for MySQL, or TOP — for SQL server, ROWNUM — for Oracle) at tables of n people.
Till next time I wish you:
Happy querying, and may your data always be well-behaved and your results ever insightful! 🤗
P.S.: Your query will not always contain all the clauses specified above, so even though not all of them are in, the order is the same.
P.S.S.: Below are some exercises to strengthen your new skills
Write a query to find:
- Top 5 Customers by Order Count
- Orders in the Last Month
- Total Revenue by Month
and here are the tables I used in my examples and you can use to solve the task above:
CREATE TABLE customers(
customer_id int NOT NULL PRIMARY KEY,
customer_name varchar(50) NULL
)
CREATE TABLE orders(
order_id int NOT NULL primary key,
customer_id int NULL,
order_date date NULL,
order_amount decimal(10, 2) NULL
)
INSERT customers(customer_id, customer_name) VALUES (1, N'Alice')
INSERT customers(customer_id, customer_name) VALUES (2, N'Bob')
INSERT customers(customer_id, customer_name) VALUES (3, N'Charlie')
INSERT customers(customer_id, customer_name) VALUES (4, N'Diana')
INSERT customers(customer_id, customer_name) VALUES (5, N'Edward')
INSERT orders(order_id, customer_id, order_date, order_amount) VALUES (1, 1, CAST(N'2023-01-15' AS Date), CAST(250.00 AS Decimal(10, 2)))
INSERT orders(order_id, customer_id, order_date, order_amount) VALUES (2, 1, CAST(N'2023-02-15' AS Date), CAST(350.00 AS Decimal(10, 2)))
INSERT orders(order_id, customer_id, order_date, order_amount) VALUES (3, 2, CAST(N'2023-01-20' AS Date), CAST(1000.00 AS Decimal(10, 2)))
INSERT orders(order_id, customer_id, order_date, order_amount) VALUES (4, 2, CAST(N'2023-02-25' AS Date), CAST(150.00 AS Decimal(10, 2)))
INSERT orders(order_id, customer_id, order_date, order_amount) VALUES (5, 3, CAST(N'2023-03-10' AS Date), CAST(750.00 AS Decimal(10, 2)))
INSERT orders(order_id, customer_id, order_date, order_amount) VALUES (6, 3, CAST(N'2023-03-15' AS Date), CAST(300.00 AS Decimal(10, 2)))
INSERT orders(order_id, customer_id, order_date, order_amount) VALUES (7, 4, CAST(N'2023-04-10' AS Date), CAST(1250.00 AS Decimal(10, 2)))
INSERT orders(order_id, customer_id, order_date, order_amount) VALUES (8, 5, CAST(N'2023-04-15' AS Date), CAST(500.00 AS Decimal(10, 2)))
INSERT orders(order_id, customer_id, order_date, order_amount) VALUES (9, 5, CAST(N'2023-05-01' AS Date), CAST(600.00 AS Decimal(10, 2)))
ALTER TABLE orders WITH CHECK ADD FOREIGN KEY(customer_id)
REFERENCES customers(customer_id)
Stay tuned and also, do not forget you can find other interesting data-related articles on my Medium.
If you like my articles, you can support me here. Thank you!🤗
♦♦The Mystery Behind the Curtains: How SQL Processes Clauses in Your Queries was originally published in Code Like A Girl on Medium, where people are continuing the conversation by highlighting and responding to this story.