
来源: cnblogs.com/aaron8219/p/9259379.html
我们常听到一种说法:“使用了NULL值的列将会使索引失效”。但如果你实际测试一下就会发现,IS NULL这样的查询其实可以使用索引。可见,这个广为流传的说法存在漏洞。
如果你想直接看结论,可以跳到文章最后。
前言
NULL值是对列的一种特殊约束。当我们创建新列时,如果没有明确使用NOT NULL关键字进行声明,MySQL 就会默认为该列添加NULL约束。
有些开发者出于便利(减少判空代码)的考虑,在创建数据表时直接采用MySQL的默认设置,即允许字段为NULL值。然而,这一习惯很容易在使用NULL的场景中导致不确定的查询结果,甚至引发数据库性能下降。
什么是NULL?
NULL并不意味着什么都没有,我们必须注意 NULL 与 ''(空字符串)是两个完全不同的值。
MySQL中主要使用三个操作符来处理NULL值:
IS NULL
IS NOT NULL
<=> (太空船操作符)。这个操作符很像=,例如SELECT NULL<=>NULL会返回true,但SELECT NULL=NULL会返回false。
此外还有一个IFNULL()函数,具体用法请自行查阅。
示例
NULL通过任何比较操作符(除<=>外)与其他值比较,结果都会得到NULL。
1 (root@localhost mysql3306.sock)[zlm]> create table test_null(
2 -> id int not null,
3 -> name varchar(10)
4 -> );
5 Query OK, 0 rows affected (0.02 sec)
6
7 (root@localhost mysql3306.sock)[zlm]> insert into test_null values(1,'zlm');
8 Query OK, 1 row affected (0.00 sec)
9
10 (root@localhost mysql3306.sock)[zlm]> insert into test_null values(2,null);
11 Query OK, 1 row affected (0.00 sec)
12
13 (root@localhost mysql3306.sock)[zlm]> select * from test_null;
14 +----+------+
15 | id | name |
16 +----+------+
17 | 1 | zlm |
18 | 2 | NULL |
19 +----+------+
20 2 rows in set (0.00 sec)
21 // ---------------------->这个很有代表性<----------------------
22 (root@localhost mysql3306.sock)[zlm]> select * from test_null where name=null;
23 Empty set (0.00 sec)
24
25 (root@localhost mysql3306.sock)[zlm]> select * from test_null where name is null;
26 +----+------+
27 | id | name |
28 +----+------+
29 | 2 | NULL |
30 +----+------+
31 1 row in set (0.00 sec)
32
33 (root@localhost mysql3306.sock)[zlm]> select * from test_null where name is not null;
34 +----+------+
35 | id | name |
36 +----+------+
37 | 1 | zlm |
38 +----+------+
39 1 row in set (0.00 sec)
40
41 (root@localhost mysql3306.sock)[zlm]> select * from test_null where null=null;
42 Empty set (0.00 sec)
43
44 (root@localhost mysql3306.sock)[zlm]> select * from test_null where null<>null;
45 Empty set (0.00 sec)
46
47 (root@localhost mysql3306.sock)[zlm]> select * from test_null where null<=>null;
48 +----+------+
49 | id | name |
50 +----+------+
51 | 1 | zlm |
52 | 2 | NULL |
53 +----+------+
54 2 rows in set (0.00 sec)
55 // null<=>null always return true,it‘s equal to ”where 1=1“.
NULL代表一个“缺失且未知”的值。这意味着,即使是两个NULL,它们也不一定相等(类似于C语言中未初始化的局部变量)。
1 (root@localhost mysql3306.sock)[zlm]> SELECT 0 IS NULL, 0 IS NOT NULL, ‘‘ IS NULL, ‘‘ IS NOT NULL;
2 +-----------+---------------+------------+----------------+
3 | 0 IS NULL | 0 IS NOT NULL | ‘‘ IS NULL | ‘‘ IS NOT NULL |
4 +-----------+---------------+------------+----------------+
5 | 0 | 1 | 0 | 1 |
6 +-----------+---------------+------------+----------------+
7 1 row in set (0.00 sec)
8
9 //It‘s not equal to zero number or vacant string.
10 //In MySQL,0 means fasle,1 means true.
11
12 (root@localhost mysql3306.sock)[zlm]>SELECT 1 = NULL, 1 <> NULL, 1 < NULL, 1 > NULL;
13 +----------+-----------+----------+----------+
14 | 1 = NULL | 1 <> NULL | 1 < NULL | 1 > NULL |
15 +----------+-----------+----------+----------+
16 | NULL | NULL | NULL | NULL |
17 +----------+-----------+----------+----------+
18 1 row in set (0.00 sec)
19
20 //It cannot be compared with number.
21 //In MySQL,null means false,too.
任何包含NULL值的表达式,其结果都会是NULL。
1 (root@localhost mysql3306.sock)[zlm]> select ifnull(null, ’First is null‘),ifnull(null+10, ’First is null‘),ifnull(concat(’abc‘,null), ’First is null‘);
2 +------------------------------+---------------------------------+--------------------------------------------+
3 | ifnull(null,‘First is null‘) | ifnull(null+10,‘First is null‘) | ifnull(concat(’abc‘,null),‘First is null‘) |
4 +------------------------------+---------------------------------+--------------------------------------------+
5 | First is null | First is null | First is null |
6 +------------------------------+---------------------------------+--------------------------------------------+
7 1 row in set (0.00 sec)
8
9 //null value needs to be disposed with ifnull() function,what usually causes sql statement more complex.
10 //As we all know,MySQL does not support funcion index.Therefore,indexes on the column may not be used.That‘s really worse.
NULL带来的具体问题
影响统计结果
使用count(*)与count(null column)的结果不同,count(null column)的结果会小于或等于count(*)。
1 (root@localhost mysql3306.sock)[zlm]> select count(*),count(name) from test_null;
2 +----------+-------------+
3 | count(*) | count(name) |
4 +----------+-------------+
5 | 2 | 1 |
6 +----------+-------------+
7 1 row in set (0.00 sec)
8
9 //count(*) returns all rows ignore the null while count(name) returns the non-null rows in column ”name“.
10 //This will also leads to uncertainty if someone is unaware of the details above.
如果使用者对NULL属性不熟悉,很容易统计出错误的结果。
干扰分组、排序和去重
虽然SELECT NULL=NULL的结果为false,但在使用distinct、group by、order by时,所有NULL值会被认为是相同的。
1 (root@localhost mysql3306.sock)[zlm]> insert into test_null values(3,null);
2 Query OK, 1 row affected (0.00 sec)
3
4 (root@localhost mysql3306.sock)[zlm]> select distinct name from test_null;
5 +------+
6 | name |
7 +------+
8 | zlm |
9 | NULL |
10 +------+
11 2 rows in set (0.00 sec)
12
13 //Two rows of null value returned one and the result became two.
14
15 (root@localhost mysql3306.sock)[zlm]> select name from test_null group by name;
16 +------+
17 | name |
18 +------+
19 | NULL |
20 | zlm |
21 +------+
22 2 rows in set (0.00 sec)
23
24 //Two rows of null value were put into the same group.
25 //By default,group by will also sort the result(null row showed first).
26
27 (root@localhost mysql3306.sock)[zlm]> select id,name from test_null order by name;
28 +----+------+
29 | id | name |
30 +----+------+
31 | 2 | NULL |
32 | 3 | NULL |
33 | 1 | zlm |
34 +----+------+
35 3 rows in set (0.00 sec)
36
37 //Three rows were sorted(two null rows showed first).
关于索引的真相
与Oracle不同,MySQL支持在包含NULL值的列上使用索引。我们常说的“列上含有NULL会使索引失效”对MySQL而言并不完全准确。
1 (root@localhost mysql3306.sock)[sysbench]> show tables;
2 +--------------------+
3 | Tables_in_sysbench |
4 +--------------------+
5 | sbtest1 |
6 | sbtest10 |
7 | sbtest2 |
8 | sbtest3 |
9 | sbtest4 |
10 | sbtest5 |
11 | sbtest6 |
12 | sbtest7 |
13 | sbtest8 |
14 | sbtest9 |
15 +--------------------+
16 10 rows in set (0.00 sec)
17
18 (root@localhost mysql3306.sock)[sysbench]> show create table sbtest1\G
19 *************************** 1. row ***************************
20 Table: sbtest1
21 Create Table: CREATE TABLE `sbtest1` (
22 `id` int(11) NOT NULL AUTO_INCREMENT,
23 `k` int(11) NOT NULL DEFAULT ‘0‘,
24 `c` char(120) NOT NULL DEFAULT ‘‘,
25 `pad` char(60) NOT NULL DEFAULT ‘‘,
26 PRIMARY KEY (`id`),
27 KEY `k_1` (`k`)
28 ) ENGINE=InnoDB AUTO_INCREMENT=100001 DEFAULT CHARSET=utf8
29 1 row in set (0.00 sec)
30
31 (root@localhost mysql3306.sock)[sysbench]> alter table sbtest1 modify k int null,modify c char(120) null,modify pad char(60) null;
32 Query OK, 0 rows affected (4.14 sec)
33 Records: 0 Duplicates: 0 Warnings: 0
34
35 (root@localhost mysql3306.sock)[sysbench]> insert into sbtest1 values(100001,null,null,null);
36 Query OK, 1 row affected (0.00 sec)
37
38 (root@localhost mysql3306.sock)[sysbench]> explain select id,k from sbtest1 where id=100001;
39 +----+-------------+---------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
40 | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
41 +----+-------------+---------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
42 | 1 | SIMPLE | sbtest1 | NULL | const | PRIMARY | PRIMARY | 4 | const | 1 | 100.00 | NULL |
43 +----+-------------+---------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
44 1 row in set, 1 warning (0.00 sec)
45
46 (root@localhost mysql3306.sock)[sysbench]> explain select id,k from sbtest1 where k is null;
47 +----+-------------+---------+------------+------+---------------+------+---------+-------+------+----------+--------------------------+
48 | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
49 +----+-------------+---------+------------+------+---------------+------+---------+-------+------+----------+--------------------------+
50 | 1 | SIMPLE | sbtest1 | NULL | ref | k_1 | k_1 | 5 | const | 1 | 100.00 | Using where; Using index |
51 +----+-------------+---------+------------+------+---------------+------+---------+-------+------+----------+--------------------------+
52 1 row in set, 1 warning (0.00 sec)
53
54 //In the first query,the newly added row is retrieved(检索) by primary key.
55 //In the second query,the newly added row is retrieved by secondary key ”k_1“
56 //It has been proved that indexes can be used on the columns which contain null value.
通过explain可以看到MySQL确实支持在含有NULL值的列上使用索引。
需要注意的是,列“k”是int类型,占用4字节,但执行计划中key_len却显示为5。这是因为存储NULL值需要额外的1个字节来标记该行为NULL。
下面是一个补充的测试例子:
mysql> select * from test_1;
+-----------+------+------+
| name | code | id |
+-----------+------+------+
| gaoyi | wo | 1 |
| gaoyi | w | 2 |
| chuzhong | wo | 3 |
| chuzhong | w | 4 |
| xiaoxue | dd | 5 |
| xiaoxue | dfdf | 6 |
| sujianhui | su | 99 |
| sujianhui | NULL | 99 |
+-----------+------+------+
8 rows in set (0.00 sec)
mysql> explain select * from test_1 where code is NULL;
+----+-------------+--------+------------+------+---------------+------------+---------+-------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+------+---------------+------------+---------+-------+------+----------+-----------------------+
| 1 | SIMPLE | test_1 | NULL | ref | index_code | index_code | 161 | const | 1 | 100.00 | Using index condition |
+----+-------------+--------+------------+------+---------------+------------+---------+-------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)
mysql> explain select * from test_1 where code is not NULL;
+----+-------------+--------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
| 1 | SIMPLE | test_1 | NULL | range | index_code | index_code | 161 | NULL | 7 | 100.00 | Using index condition |
+----+-------------+--------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)
mysql> explain select * from test_1 where code=‘dd‘;
+----+-------------+--------+------------+------+---------------+------------+---------+-------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+------+---------------+------------+---------+-------+------+----------+-----------------------+
| 1 | SIMPLE | test_1 | NULL | ref | index_code | index_code | 161 | const | 1 | 100.00 | Using index condition |
+----+-------------+--------+------------+------+---------------+------------+---------+-------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)
mysql> explain select * from test_1 where code like ”dd%“;
+----+-------------+--------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
| 1 | SIMPLE | test_1 | NULL | range | index_code | index_code | 161 | NULL | 1 | 100.00 | Using index condition |
+----+-------------+--------+------------+-------+---------------+------------+---------+------+------+----------+-----------------------+
1 row in set, 1 warning (0.00 sec)
总结
在列中使用NULL值容易引发不可控的问题,有时还会严重拖慢系统性能。
例如:
- 影响聚合函数结果:对含有
NULL值的列进行统计计算(如count()、max()、min()),结果可能不符合预期。
- 干扰数据操作:影响
distinct、group by、order by的行为,导致错误的排序或分组结果。
- 增加代码复杂度:为了处理
NULL带来的不确定性,往往需要在SQL中使用IFNULL()等函数来确保结果可控,这增加了程序的复杂性。
- 额外存储开销:
NULL值并非占用原有字段的空间存储,而是需要额外申请一个字节来标记该字段为NULL(就像一个额外的标志位)。
鉴于以上种种弊端,我们不推荐在列中设置NULL作为默认值。最佳实践是在所有列上明确使用NOT NULL约束,并使用0、空字符串‘’或其他的业务默认值来替代NULL,从而确保数据的确定性和查询的高效性。
如果你对数据库设计规范、SQL优化等话题感兴趣,欢迎到 云栈社区 与更多开发者一起交流探讨。