今天公司需要导出一批手动添加的人脉数据,涉及到一个看起来颇为复杂的 SQL 语句。正好借此机会,我们一起来拆解分析一下它的逻辑与结构,看看能否从中总结出一些实用的查询优化技巧。
前提分析
首先梳理一下这张报表涉及到的核心数据表:
- connections_batch_relation:批量添加人脉表
- connections_company_initialization:公司初始化任务表
- connections_my_relation:我的人脉表
- connections_task_data:数据处理任务表
- user:用户信息表
- winlabel_company:企业信息表
排查的第一步,是从 公司初始化任务表 中取出所有主账号的用户 ID(即 uid),并将其作为下面 SQL 中 @abc 变量的值。
接着,执行核心的 SQL 查询。其大致结构与变量设置如下:
SET @abc = '200001720,200000161,200000348,200000162,200001986,200000164,200002219,
200001983,200000651,200000808,200002217,200000328,200000809,200001774,200002582,
200001850,200002631,200002337,200002516,200000493,200000293,200002163,200001772,
200002720,200002718,200002733,200002736,200002740,200002753,200002819,200000890,
200002834,200002825,200002039,200000423,200002734,200002964,200002985,200002677,
200002998,200002975,200002480,200001992,200003070,200001854,200003172,200003178,
200002010,200003217,200002465,200001721,200002777,200003257,200003159,200003326,
200003398,200003365,200002021,200003491,200003573,200003580,200003465,200003409,
200003622,200000789,200003633,200001758,200001532,200003660,200002986,200003675,
200003677,200003679,200002935,200003705,200003712,200003715,200001745,200003787,
200003825,200001497,200003884,200003912,200003919,200003982,200003983,200003994,
200004002,200004019,200004048,200004063,200004082,200003724,200004098,200004101,
200004148,200004020,200004247,200004294,200004295,200004300,200003768,200002026,
200004440,200004406,200004450,200004463,200004506,200004530,200004697,200004470,
200001786,200001870,200004498,200002603,200004532,200004727,200003619,200004852,
200002896,200004872,200001755,200004720,200002147,200004928,200004369,200004935,
200004936,200002595,200002342,200004972,200004989,200004990,200004994,200005006';
-- SELECT DISTINCT temp2.pid FROM (
SELECT
IF(temp.`企业名称` is not null, temp.`企业名称`,
(SELECT company_name FROM connections_company_initialization
WHERE user_id IN (SELECT IF(parent_id = 0, id, parent_id) as 'pid'
FROM `user` WHERE id = temp.uid))
) AS '企业名称',
(SELECT IF(parent_id = 0, id, parent_id) as 'pid'
FROM `user` WHERE id = temp.uid) as 'pid',
NULL AS '数量',
temp.`uid`,
temp.`添加人脉企业`,
temp.`类型`,
temp.`添加时间`,
(SELECT EXISTS( SELECT * FROM connections_my_relation
WHERE user_id = temp.uid
AND company_name = temp.`添加人脉企业` )) AS '是否删除',
(SELECT phone FROM `user` WHERE id = temp.uid) as '手机号',
(SELECT contact_name FROM `user` WHERE id = temp.uid) as '姓名',
(SELECT CASE
WHEN user_comment is null THEN ''
WHEN user_comment = 0 THEN '过期用户'
WHEN user_comment = 1 THEN '新用户'
WHEN user_comment = 2 THEN '内部用户'
WHEN user_comment = 3 THEN '试用账户'
WHEN user_comment = 4 THEN '成交会员'
END AS userComment
FROM `user` WHERE id = temp.uid) as '用户备注',
(SELECT w.company_name FROM `user` u
LEFT JOIN winlabel_company w ON u.company_id = w.id
WHERE u.id = temp.uid) as '用户注册公司名称'
FROM
(
SELECT
null AS '企业名称',
tt2.user_id AS 'uid',
tt1.`name` AS '添加人脉企业',
(SELECT company_type FROM connections_my_relation mr
WHERE mr.user_id = tt2.user_id AND mr.company_name = tt1.`name`) AS '类型',
tt1.create_time AS '添加时间'
FROM
(
SELECT `name`, batch_id, user_id, create_time
FROM connections_task_data
WHERE join_type = 2
AND user_id IN (SELECT id FROM `user`
WHERE deleted = 0
AND (FIND_IN_SET(id, @abc) OR FIND_IN_SET(parent_id, @abc)))
) tt1
LEFT JOIN connections_batch_relation tt2 ON tt1.batch_id = tt2.batch_id
UNION
SELECT
null AS '企业名称',
user_id AS 'uid',
company_name AS '添加人脉企业',
company_type AS '类型',
create_time AS '添加时间'
FROM connections_my_relation
WHERE user_id IN (SELECT id FROM `user`
WHERE deleted = 0
AND (FIND_IN_SET(id, @abc) OR FIND_IN_SET(parent_id, @abc)))
AND batch_id != '预制数据'
AND CONCAT( company_name, user_id ) NOT IN
(SELECT CONCAT(tt1.`name`, tt2.user_id)
FROM
(SELECT `name`, batch_id, user_id, create_time
FROM connections_task_data
WHERE join_type = 2
AND user_id IN (SELECT id FROM `user`
WHERE deleted = 0
AND (FIND_IN_SET(id, @abc) OR FIND_IN_SET(parent_id, @abc)))
) tt1
LEFT JOIN connections_batch_relation tt2 ON tt1.batch_id = tt2.batch_id)
) temp
WHERE temp.`添加时间` >= '2024-02-19 00:00:00'
ORDER BY '企业名称' ASC
-- ) temp2 ORDER BY temp2.pid
从整体框架来看,这份看似冗余的 SQL 其实是对一个名为 temp 的临时表进行一次外层查询。

那么,temp 表到底是何方神圣?展开代码折叠后可以发现,它由两个字段结果一致的子查询通过 UNION 合并而来。
第一个子查询:起底 connections_task_data
第一个子查询主要关联 tt1 和 tt2 两张表。其中 tt2 就是 批量添加人脉表,而 tt1 本身又是一个子查询。先来看 tt1 的逻辑:
SELECT
`name`,
batch_id,
user_id,
create_time
FROM
connections_task_data
WHERE
join_type = 2
AND user_id IN (
SELECT id FROM `user`
WHERE deleted = 0
AND (FIND_IN_SET( id, @abc ) OR FIND_IN_SET( parent_id, @abc ))
)
) tt1
这部分代码是从 数据处理任务表 中捞取数据。它限制了 join_type = 2(通常用于过滤非直接人脉数据),并确保主账号 uid 是有效未删除的,且处于我们那一长串 ID 列表中。
再来看第一个子查询的主体:
SELECT NULL AS '企业名称',
tt2.user_id AS 'uid',
tt1.`name` AS '添加人脉企业',
( SELECT company_type FROM connections_my_relation mr
WHERE mr.user_id = tt2.user_id AND mr.company_name = tt1.`name` ) AS '类型',
tt1.create_time AS '添加时间'
FROM
( /* 此处为上述 tt1 查询 */ ) tt1
LEFT JOIN connections_batch_relation tt2 ON tt1.batch_id = tt2.batch_id
简单来说,第一个子查询的任务是:封装 tt1 表数据,并通过 batch_id 连接 tt2 表作为补充参考。它输出的字段包括:企业名称、uid、添加人脉企业、类型和添加时间。其业务含义是——从数据处理任务中,找到特定用户对应的人脉数据。
第二个子查询:来自 connections_my_relation 的补充
第二个子查询的结构如下:
SELECT NULL AS '企业名称',
user_id AS 'uid',
company_name AS '添加人脉企业',
company_type AS '类型',
create_time AS '添加时间'
FROM
connections_my_relation
WHERE
user_id IN (
SELECT id FROM `user`
WHERE deleted = 0
AND (FIND_IN_SET( id, @abc ) OR FIND_IN_SET( parent_id, @abc ))
)
AND batch_id != '预制数据'
AND CONCAT( company_name, user_id ) NOT IN (
SELECT CONCAT( tt1.`name`, tt2.user_id )
FROM (
SELECT `name`, batch_id, user_id, create_time
FROM connections_task_data
WHERE join_type = 2
AND user_id IN (
SELECT id FROM `user`
WHERE deleted = 0
AND (FIND_IN_SET( id, @abc ) OR FIND_IN_SET( parent_id, @abc ))
)
) tt1
LEFT JOIN connections_batch_relation tt2 ON tt1.batch_id = tt2.batch_id
)
这个查询直接从 我的人脉表 中取数。除了常规的用户有效性验证外,它还加上了两个关键过滤条件:
- 排除预置数据:
batch_id != '预制数据',过滤掉系统初始化产生的非手动数据。
- 关键去重逻辑:通过
CONCAT(company_name, user_id) 构造唯一标识,确保不会选出那些已经在第一个子查询(connections_task_data 中不展示的直接人脉数据)里出现过的组合。
最后,同样通过与 connections_batch_relation 的关联,来辅助完善批次信息。
由于两个子查询返回的字段列完全一致,UNION 操作符会将它们的结果集合并,并自动剔除重复的行。
经过上述操作,temp 临时表大致呈现了以下样子(运行结果示例):

总结一下:这个 SQL 巧妙地利用 UNION 合并了“任务流数据”与“个人库数据”这两个来源。外层通过复杂的标量子查询,在 user 与 winlabel_company 表间进行关联,补全了每条记录的企业详情、用户手机号、姓名及状态备注。最终,只需在 temp 表结果集上轻松完成数据导出。在日常的后端开发与数据维护工作中,理解这类多层嵌套查询的拆解逻辑,对于后续针对 数据库 性能的优化至关重要。同时,严谨地构建数据源并做好去重,也是保障报表准确性的第一步。