๐Ÿ  Portal

Lab: SQL Query Patterns and Alternative Approaches

๐ŸŽฏ Learning Objectives

By the end of this lab, you will be able to: - Understand different approaches to solving the same SQL problem - Compare CASE expressions with CTE (Common Table Expression) approaches - Analyze trade-offs between readability, performance, and maintainability - Choose the appropriate query pattern based on context and requirements - Debug complex queries by breaking them down into logical steps

๐Ÿ“‹ Prerequisites

๐Ÿ“š Introduction

In SQL, there are often multiple valid ways to solve the same problem. While all approaches may produce correct results, they can differ significantly in:

This lab explores these trade-offs by examining the same problem solved with different query patterns.

๐ŸŽ“ The Problem: Finding Browse-Heavy Sessions

Note: This lab uses the same dataset from the CASE Expressions lab. If you've already created the `case_lab` database, you can skip the setup and jump to the comparisons.

Setup (if needed)

-- You can use the same database from Lab 06
USE case_lab;

-- Verify data exists
SELECT COUNT(*) FROM app_events;

Business Requirement

Find user sessions that indicate a poor user experience - sessions where users:

  1. Spend > 30 minutes browsing
  2. Have at least 5 scroll events
  3. Have a click-to-scroll ratio < 0.2 (less than 20% clicks relative to scrolls)
  4. Don't make any purchases

๐Ÿ”ฅ Approach 1: Using CASE Expressions

The Query

SELECT 
    session_id,
    user_id,
    TIMESTAMPDIFF(MINUTE, MIN(event_timestamp), MAX(event_timestamp)) AS session_duration_minutes,
    COUNT(CASE WHEN event_type = 'scroll' THEN 1 END) AS scroll_count,
    COUNT(CASE WHEN event_type = 'click' THEN 1 END) AS click_count
FROM app_events
GROUP BY session_id, user_id
HAVING session_duration_minutes > 30
    AND scroll_count >= 5
    AND click_count / scroll_count < 0.2
    AND COUNT(CASE WHEN event_type = 'purchase' THEN 1 END) = 0
ORDER BY scroll_count DESC, session_id;

Analysis

Strengths:

Weaknesses:

๐Ÿ”„ Approach 2: Using CTEs (Common Table Expressions)

The Query

WITH A AS (
    -- Step 1: Count events by type for each session
    SELECT session_id, user_id, event_type, COUNT(*) AS c
    FROM app_events
    GROUP BY session_id, user_id, event_type
),
E AS (
    -- Step 2: Ensure all sessions have a 'click' row (even if count is 0)
    SELECT * FROM A
    UNION
    SELECT session_id, user_id, 'click', 0
    FROM A
    WHERE session_id NOT IN (
        SELECT session_id FROM A WHERE event_type = 'click'
    )
),
T AS (
    -- Step 3: Calculate session duration in seconds
    SELECT  
        session_id,
        UNIX_TIMESTAMP(MAX(event_timestamp)) - UNIX_TIMESTAMP(MIN(event_timestamp)) AS d
    FROM app_events
    GROUP BY session_id
),
TT AS (
    -- Step 4: Join everything together
    SELECT 
        T.session_id,
        E_c.user_id,
        E_s.c AS scroll_count,
        E_c.c AS clicks,
        T.d AS duration
    FROM E E_c
    JOIN E E_s ON E_c.session_id = E_s.session_id
    JOIN T ON E_c.session_id = T.session_id
    WHERE E_s.event_type = 'scroll'
        AND E_c.event_type = 'click'
)

-- Step 5: Apply filters and format output
SELECT 
    session_id,
    user_id,
    duration/60 AS session_duration_minutes,
    scroll_count,
    clicks AS click_count
FROM TT
WHERE session_id NOT IN (
    SELECT session_id FROM E WHERE event_type = 'purchase'
)
    AND duration > 30*60
    AND scroll_count >= 5
    AND clicks/scroll_count < 0.2
ORDER BY scroll_count DESC, session_id;

Analysis

Strengths:

Weaknesses:

๐Ÿ“Š Side-by-Side Comparison

Aspect CASE Expression CTE Approach
Lines of Code ~10 lines ~45 lines
Performance โšก Better - single scan โš ๏ธ Potentially slower - multiple operations
Readability (Experienced developers) โšก High - compact and clear โœ“ Good - explicit steps
Readability (Beginners) โš ๏ธ Requires CASE knowledge โšก Better - step-by-step logic
Debugging โš ๏ธ Harder - can't inspect intermediate steps easily โšก Easier - query each CTE separately
Maintainability โšก Better - less code to maintain โš ๏ธ More complex - multiple dependencies
Production Use โšก Preferred - efficient and concise โš ๏ธ Use cautiously - verify performance
Learning/Teaching โœ“ Good for teaching conditional aggregation โšก Excellent for showing query decomposition

๐Ÿค” When to Use Each Approach

Use CASE Expressions When:

Use CTEs When:

๐Ÿ’ก Hybrid Approach: Best of Both Worlds

Often, the best solution combines techniques strategically:

-- Use CTE for complex calculation, CASE for conditional aggregation
WITH session_stats AS (
    SELECT 
        session_id,
        user_id,
        TIMESTAMPDIFF(MINUTE, MIN(event_timestamp), MAX(event_timestamp)) AS duration_minutes,
        COUNT(CASE WHEN event_type = 'scroll' THEN 1 END) AS scrolls,
        COUNT(CASE WHEN event_type = 'click' THEN 1 END) AS clicks,
        COUNT(CASE WHEN event_type = 'purchase' THEN 1 END) AS purchases
    FROM app_events
    GROUP BY session_id, user_id
)
SELECT *
FROM session_stats
WHERE duration_minutes > 30
    AND scrolls >= 5
    AND clicks / scrolls < 0.2
    AND purchases = 0
ORDER BY scrolls DESC;

Benefits:

๐Ÿงช Practice: Choosing the Right Approach

Exercise 1: Analyze and Choose

For each scenario below, decide whether to use CASE, CTEs, or a hybrid approach. Justify your choice.

  1. Scenario: Calculate monthly revenue by product category for a dashboard that updates every minute. - Data size: 10 million rows - Requirement: Must be fast
  2. Scenario: Debug a complex query that's producing incorrect results. - Context: Multiple joins and aggregations - Need: Understand where the logic is failing
  3. Scenario: Create a one-time report for executives showing user engagement trends. - Context: Ad-hoc analysis - Audience: Non-technical stakeholders
Suggested Answers
  1. Use CASE expressions: Performance is critical with 10M rows and frequent updates. CASE with GROUP BY will be most efficient.
  2. Use CTEs: When debugging, ability to query each step independently is invaluable. CTEs make it easy to isolate the problem.
  3. Flexible - hybrid could work well: One-time report doesn't require peak performance, but should still be reasonably efficient. A hybrid approach balances clarity and performance.

๐Ÿ”‘ Key Takeaways

  1. Multiple valid solutions exist for most SQL problems
  2. CASE expressions are typically more performant and concise
  3. CTEs excel at debugging and breaking down complex logic
  4. Context matters: Choose based on performance needs, team experience, and maintenance requirements
  5. Hybrid approaches often provide the best balance
  6. Don't over-optimize: For small datasets or one-time queries, readability may trump performance
  7. Test both approaches with EXPLAIN ANALYZE when performance is critical

๐Ÿ“š Additional Resources

๐ŸŽฏ Summary

In this lab, you compared CASE expressions with CTE approaches for the same problem. You learned that:

Remember: The "best" query is the one that's correct, maintainable, and performant enough for your use case!