悠悠楠杉
SQL数据分析实战:常用查询语句与聚合函数应用指南
一、SQL查询:从基础到进阶
1. SELECT核心语法
SELECT是SQL的"望远镜",决定了你能看到哪些数据:sql
-- 基础查询(注意避免SELECT *)
SELECT product_id, product_name, price
FROM products
WHERE price > 100;
实战技巧:
- 字段显式命名提升可读性
- WHERE条件遵循SARG原则(可优化索引使用)
2. 多表连接(JOIN)的四种方式
当数据分散在不同表时,JOIN如同"数据拼图":
sql
-- 内连接(只返回匹配记录)
SELECT o.orderid, c.customername
FROM orders o
INNER JOIN customers c ON o.customer_id = c.id;
-- 左连接(保留左表全部记录)
SELECT p.productname, COALESCE(s.quantity, 0) AS stock
FROM products p
LEFT JOIN stock s ON p.id = s.productid;
易错点:
- 忘记连接条件会导致笛卡尔积
- NULL值处理需用COALESCE/NVL函数
二、聚合函数:数据计算的瑞士军刀
1. 五大核心聚合函数
| 函数 | 作用 | 示例 |
|----------|-----------------------|--------------------------|
| COUNT() | 计数(忽略NULL) | COUNT(DISTINCT user_id)
|
| AVG() | 平均值 | AVG(salary)
|
| SUM() | 求和 | SUM(revenue)
|
| MAX() | 最大值 | MAX(temperature)
|
| MIN() | 最小值 | MIN(create_time)
|
特殊场景:sql
-- 计算非重复值比例
SELECT COUNT(DISTINCT city)/COUNT(*) AS diversity_index
FROM users;
2. GROUP BY分组逻辑
GROUP BY如同"数据分类器":sql
-- 按部门统计薪资
SELECT
department,
AVG(salary) AS avg_salary,
COUNT(*) AS headcount
FROM employees
GROUP BY department
HAVING COUNT(*) > 5; -- 过滤分组结果
关键区别:
- WHERE:过滤原始数据
- HAVING:过滤分组后数据
三、真实业务场景案例
案例1:电商用户行为分析
sql
-- 计算每周复购率
SELECT
DATE_TRUNC('week', order_date) AS week,
COUNT(DISTINCT user_id) AS total_users,
COUNT(DISTINCT CASE WHEN order_count > 1 THEN user_id END) AS repeat_users,
ROUND(100.0 * COUNT(DISTINCT CASE WHEN order_count > 1 THEN user_id END) /
COUNT(DISTINCT user_id), 2) AS repeat_rate
FROM (
SELECT
user_id,
order_date,
COUNT(*) OVER (PARTITION BY user_id) AS order_count
FROM orders
) t
GROUP BY 1;
案例2:销售漏斗分析
sql
WITH funnel AS (
SELECT
COUNT(DISTINCT visit_id) AS visits,
COUNT(DISTINCT CASE WHEN page_type='checkout' THEN visit_id END) AS checkouts,
COUNT(DISTINCT order_id) AS purchases
FROM user_sessions
)
SELECT
visits,
ROUND(100.0*checkouts/visits,2) AS visit_to_checkout,
ROUND(100.0*purchases/checkouts,2) AS checkout_to_purchase
FROM funnel;
四、性能优化建议
- 索引策略:为WHERE、JOIN、GROUP BY字段建索引
- 执行计划:使用EXPLAIN分析查询路径
- 避免全表扫描:限制查询时间范围(如
WHERE date BETWEEN...
)
提示:在MySQL 8.0+中,窗口函数(如ROW_NUMBER())可替代复杂子查询,提升40%以上性能。
总结:SQL数据分析的本质是"用数据库语言提问"。掌握查询逻辑与聚合思维,就能将原始数据转化为商业洞察。建议从简单查询开始,逐步叠加JOIN和聚合,最终构建完整的分析模型。