๐Ÿ  Portal

Lab: SQL CASE Expressions

๐ŸŽฏ Learning Objectives

By the end of this lab, you will be able to: - Understand and use both Simple and Searched CASE expressions - Apply CASE expressions in SELECT, WHERE, HAVING, and ORDER BY clauses - Use CASE with aggregate functions for conditional counting and summing - Transform and categorize data using CASE expressions - Apply CASE expressions to solve real-world business problems

๐Ÿ“‹ Prerequisites

๐Ÿ“š Introduction

The CASE expression is one of SQL's most powerful and versatile tools. It allows you to implement conditional logic directly within your SQL queries, making your code more readable and often more efficient.

CASE Expression Syntax

-- Simple CASE (comparing against a single column)
CASE column_name
    WHEN value1 THEN result1
    WHEN value2 THEN result2
    ELSE default_result
END

-- Searched CASE (more flexible with conditions)
CASE
    WHEN condition1 THEN result1
    WHEN condition2 THEN result2
    ELSE default_result
END

Common Use Cases

  1. Conditional Aggregation: Count or sum only rows meeting specific criteria
  2. Data Transformation: Convert values or categorize data
  3. Complex Filtering: Implement sophisticated WHERE/HAVING conditions
  4. Pivot Operations: Transform rows into columns

๐Ÿš€ Getting Started

Step 1: Create the Database and Sample Data

We'll work with an app_events table that tracks user interactions in a mobile application. This realistic scenario demonstrates how CASE expressions can analyze user behavior patterns.

-- Create database
DROP DATABASE IF EXISTS case_lab;
CREATE DATABASE case_lab;
USE case_lab;

-- Create app_events table
CREATE TABLE app_events (
    event_id INT AUTO_INCREMENT PRIMARY KEY,
    session_id INT NOT NULL,
    user_id INT NOT NULL,
    event_type VARCHAR(20) NOT NULL,
    event_timestamp TIMESTAMP NOT NULL,
    INDEX idx_session (session_id),
    INDEX idx_user (user_id),
    INDEX idx_type (event_type)
);

-- Insert sample data with various event types
INSERT INTO app_events (session_id, user_id, event_type, event_timestamp) VALUES
-- Session 1: Active session with lots of scrolling, few clicks, no purchase
(1, 101, 'scroll', '2024-01-15 10:00:00'),
(1, 101, 'scroll', '2024-01-15 10:05:00'),
(1, 101, 'scroll', '2024-01-15 10:10:00'),
(1, 101, 'scroll', '2024-01-15 10:15:00'),
(1, 101, 'scroll', '2024-01-15 10:20:00'),
(1, 101, 'scroll', '2024-01-15 10:25:00'),
(1, 101, 'scroll', '2024-01-15 10:30:00'),
(1, 101, 'click', '2024-01-15 10:35:00'),
(1, 101, 'scroll', '2024-01-15 10:40:00'),

-- Session 2: Short session, doesn't meet criteria
(2, 102, 'scroll', '2024-01-15 11:00:00'),
(2, 102, 'scroll', '2024-01-15 11:05:00'),
(2, 102, 'click', '2024-01-15 11:08:00'),

-- Session 3: Session with purchase (should be excluded)
(3, 103, 'scroll', '2024-01-15 12:00:00'),
(3, 103, 'scroll', '2024-01-15 12:10:00'),
(3, 103, 'scroll', '2024-01-15 12:20:00'),
(3, 103, 'scroll', '2024-01-15 12:30:00'),
(3, 103, 'scroll', '2024-01-15 12:40:00'),
(3, 103, 'scroll', '2024-01-15 12:50:00'),
(3, 103, 'click', '2024-01-15 12:55:00'),
(3, 103, 'purchase', '2024-01-15 13:00:00'),

-- Session 4: Long session with many scrolls, few clicks, no purchase
(4, 104, 'scroll', '2024-01-15 14:00:00'),
(4, 104, 'scroll', '2024-01-15 14:08:00'),
(4, 104, 'scroll', '2024-01-15 14:16:00'),
(4, 104, 'scroll', '2024-01-15 14:24:00'),
(4, 104, 'scroll', '2024-01-15 14:32:00'),
(4, 104, 'scroll', '2024-01-15 14:40:00'),
(4, 104, 'scroll', '2024-01-15 14:48:00'),
(4, 104, 'scroll', '2024-01-15 14:56:00'),
(4, 104, 'scroll', '2024-01-15 15:04:00'),
(4, 104, 'click', '2024-01-15 15:12:00'),

-- Session 5: High scroll-to-click ratio session
(5, 105, 'scroll', '2024-01-15 16:00:00'),
(5, 105, 'scroll', '2024-01-15 16:07:00'),
(5, 105, 'scroll', '2024-01-15 16:14:00'),
(5, 105, 'scroll', '2024-01-15 16:21:00'),
(5, 105, 'scroll', '2024-01-15 16:28:00'),
(5, 105, 'scroll', '2024-01-15 16:35:00'),
(5, 105, 'scroll', '2024-01-15 16:42:00'),
(5, 105, 'scroll', '2024-01-15 16:49:00'),
(5, 105, 'scroll', '2024-01-15 16:56:00'),
(5, 105, 'scroll', '2024-01-15 17:03:00'),
(5, 105, 'click', '2024-01-15 17:10:00');

๐ŸŽ“ Problem: Finding Browse-Heavy Sessions

Business Goal: Identify user sessions that indicate a poor user experience - users who scroll a lot but don't click much and don't make purchases. These sessions may indicate: - Confusing UI/UX - Irrelevant content - Poor product recommendations - Potential bugs or performance issues

Criteria for Browse-Heavy Sessions:

  1. Session duration > 30 minutes
  2. At least 5 scroll events
  3. Click-to-scroll ratio < 0.2 (less than 20% clicks relative to scrolls)
  4. No purchase events

๐Ÿ’ก Solution: Using CASE Expressions

CASE expressions allow us to conditionally count and filter events directly within the query. Let's break down the solution step by step:

-- Write your MySQL query statement below
SELECT 
    session_id,
    user_id,
    TIMESTAMPDIFF(MINUTE, MIN(event_timestamp), MAX(event_timestamp)) AS session_duration_minutes,
    COUNT(CASE WHEN event_type = 'scroll' THEN 1 END) AS scroll_count
FROM app_events
GROUP BY session_id, user_id
HAVING session_duration_minutes > 30
    AND scroll_count >= 5
    AND COUNT(CASE WHEN event_type = 'click' THEN 1 END) / scroll_count < 0.2
    AND COUNT(CASE WHEN event_type = 'purchase' THEN 1 END) = 0
ORDER BY scroll_count DESC, session_id;

How This Query Works:

  1. Conditional Counting with CASE: sql COUNT(CASE WHEN event_type = 'scroll' THEN 1 END)
  2. The CASE expression returns 1 only when event_type is 'scroll'
  3. COUNT ignores NULL values, so only scroll events are counted
  4. This is more readable than filtering with WHERE

  5. Session Duration Calculation: sql TIMESTAMPDIFF(MINUTE, MIN(event_timestamp), MAX(event_timestamp))

  6. Calculates the difference between first and last event in minutes

  7. Click-to-Scroll Ratio: sql COUNT(CASE WHEN event_type = 'click' THEN 1 END) / scroll_count < 0.2

  8. Divides click count by scroll count to get the ratio

  9. No Purchase Condition: sql COUNT(CASE WHEN event_type = 'purchase' THEN 1 END) = 0

  10. Ensures no purchase events exist in the session

Why CASE Expressions Are Powerful:

๐Ÿ’ก Note: For students interested in comparing CASE expressions with alternative query approaches like CTEs (Common Table Expressions), see the Query Patterns and Optimization lab in the advanced section.

๐Ÿงช Practice Exercises

Exercise 1: Basic CASE Usage

Write a query to categorize sessions as 'short' (< 15 min), 'medium' (15-45 min), or 'long' (> 45 min):

-- Your solution here
Solution
SELECT 
    session_id,
    user_id,
    TIMESTAMPDIFF(MINUTE, MIN(event_timestamp), MAX(event_timestamp)) AS duration,
    CASE
        WHEN TIMESTAMPDIFF(MINUTE, MIN(event_timestamp), MAX(event_timestamp)) < 15 THEN 'short'
        WHEN TIMESTAMPDIFF(MINUTE, MIN(event_timestamp), MAX(event_timestamp)) <= 45 THEN 'medium'
        ELSE 'long'
    END AS session_category
FROM app_events
GROUP BY session_id, user_id;

Exercise 2: Multiple Conditional Counts

Write a query that shows for each session: - Total events - Number of scrolls - Number of clicks
- Number of purchases - An engagement score (scrolls * 1 + clicks * 3 + purchases * 10)

-- Your solution here
Solution
SELECT 
    session_id,
    user_id,
    COUNT(*) AS total_events,
    COUNT(CASE WHEN event_type = 'scroll' THEN 1 END) AS scroll_count,
    COUNT(CASE WHEN event_type = 'click' THEN 1 END) AS click_count,
    COUNT(CASE WHEN event_type = 'purchase' THEN 1 END) AS purchase_count,
    (COUNT(CASE WHEN event_type = 'scroll' THEN 1 END) * 1 +
     COUNT(CASE WHEN event_type = 'click' THEN 1 END) * 3 +
     COUNT(CASE WHEN event_type = 'purchase' THEN 1 END) * 10) AS engagement_score
FROM app_events
GROUP BY session_id, user_id
ORDER BY engagement_score DESC;

Exercise 3: CASE in WHERE Clause

Write a query to find sessions where: - If the session has any purchase, include it regardless of other criteria - If no purchase, only include if scroll_count >= 5 and duration > 20 minutes

-- Your solution here
Solution
SELECT 
    session_id,
    user_id,
    TIMESTAMPDIFF(MINUTE, MIN(event_timestamp), MAX(event_timestamp)) AS duration,
    COUNT(CASE WHEN event_type = 'scroll' THEN 1 END) AS scroll_count,
    COUNT(CASE WHEN event_type = 'purchase' THEN 1 END) AS purchase_count
FROM app_events
GROUP BY session_id, user_id
HAVING 
    CASE 
        WHEN COUNT(CASE WHEN event_type = 'purchase' THEN 1 END) > 0 THEN 1
        WHEN COUNT(CASE WHEN event_type = 'scroll' THEN 1 END) >= 5 
             AND TIMESTAMPDIFF(MINUTE, MIN(event_timestamp), MAX(event_timestamp)) > 20 THEN 1
        ELSE 0
    END = 1;

Exercise 4: User Activity Levels

Write a query to classify users based on their total activity across all sessions: - 'Highly Active': >= 30 total events - 'Moderately Active': 15-29 total events - 'Low Activity': < 15 total events

Include user_id, total_events, and activity_level in the output.

-- Your solution here
Solution
SELECT 
    user_id,
    COUNT(*) AS total_events,
    CASE
        WHEN COUNT(*) >= 30 THEN 'Highly Active'
        WHEN COUNT(*) >= 15 THEN 'Moderately Active'
        ELSE 'Low Activity'
    END AS activity_level
FROM app_events
GROUP BY user_id
ORDER BY total_events DESC;

๐ŸŽ“ Advanced Topics

Using CASE with SUM for Weighted Counts

-- Calculate engagement score with different weights
SELECT 
    session_id,
    user_id,
    SUM(CASE 
        WHEN event_type = 'scroll' THEN 1
        WHEN event_type = 'click' THEN 3
        WHEN event_type = 'purchase' THEN 10
        ELSE 0
    END) AS weighted_engagement
FROM app_events
GROUP BY session_id, user_id;

CASE in ORDER BY for Custom Sorting

-- Order by purchase first, then by scroll count
SELECT 
    session_id,
    user_id,
    COUNT(CASE WHEN event_type = 'purchase' THEN 1 END) AS purchases,
    COUNT(CASE WHEN event_type = 'scroll' THEN 1 END) AS scrolls
FROM app_events
GROUP BY session_id, user_id
ORDER BY 
    CASE WHEN COUNT(CASE WHEN event_type = 'purchase' THEN 1 END) > 0 THEN 0 ELSE 1 END,
    COUNT(CASE WHEN event_type = 'scroll' THEN 1 END) DESC;

โšก Performance Tips

๐Ÿ”‘ Key Takeaways

  1. CASE expressions implement conditional logic directly in SQL queries
  2. Two forms: Simple CASE (equality checks) and Searched CASE (complex conditions)
  3. Can be used in SELECT, WHERE, HAVING, and ORDER BY clauses
  4. Conditional Aggregation: CASE with COUNT/SUM enables powerful data analysis
  5. Data Transformation: Perfect for categorizing and pivoting data
  6. Performance: Efficient single-pass through data
  7. Readability: Makes conditional logic explicit and maintainable

๐Ÿ“š Additional Resources

๐ŸŽฏ Summary

In this lab, you learned: - How to use CASE expressions for conditional logic - The difference between Simple and Searched CASE - How CASE expressions work with aggregate functions for conditional counting - Using CASE in different query clauses (SELECT, WHERE, HAVING, ORDER BY) - Real-world applications of CASE expressions in data analysis - Performance considerations when using CASE expressions

๐Ÿ“– Next Steps

Keep practicing! CASE expressions are fundamental to becoming proficient in SQL data analysis.