文章目录
  1. 1. ER分片
    1. 1.1. 按主键关联的ER分片
    2. 1.2. 按其他字段的ER分片
  2. 2. 连续分片
    1. 2.1. 自定义数字范围分片
    2. 2.2. 自然月分片
  3. 3. 离散分片
    1. 3.1. 枚举分片
    2. 3.2. 固定分片hash算法

MyCAT提供了多种分片规则,本文以实例展示这些常用的分片方法。

ER分片

传统的表分片多数是采用连续(如时间分片)或离散(如hash分片)分片,这样分片在有表join的时候,需要访问所有分片上的表,消耗巨大。ER分片解决了这个问题,将有关联的数据放在同一个分片上,这样,在需要表join的时候,只需要访问单个分片上有关联的数据即可,降低了系统资源的消耗。常见的ER分片包括,按主键关联的ER分片,按其他字段的ER分片等。

按主键关联的ER分片

假设有customer和orders表,customer按主键id分片,orders表的customer_id列与customer按主键id关联。schema.xml中可以如下配置,

1
2
3
4
5
6
<table name="customer1" primaryKey="ID" dataNode="dn1,dn2,dn3"
rule="mod-long">
<childTable name="orders1" primaryKey="ID" joinKey="customer_id"
parentKey="id">
</childTable>
</table>

rule.xml中可以如下配置,

1
2
3
4
<function name="mod-long" class="org.opencloudb.route.function.PartitionByMod">
<!-- how many data nodes -->
<property name="count">3</property>
</function>

建表语句,

1
2
create table customer1 (id bigint not null primary key, name varchar(100), register_date DATE);
create table orders1 (id bigint not null primary key, customer_id bigint, order_date DATE, money_paid decimal);

看一下插入是否已经执行分片,

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
mysql> explain insert into customer1(id, name, register_date) values (10000, 'aaa', '20160131');
+-----------+----------------------------------------------------------------------------------+
| DATA_NODE | SQL |
+-----------+----------------------------------------------------------------------------------+
| dn2 | insert into customer1(id, name, register_date) values (10000, 'aaa', '20160131') |
+-----------+----------------------------------------------------------------------------------+
1 row in set (0.20 sec)
mysql> explain insert into customer1(id, name, register_date) values (20000, 'bbb', '20160131');
+-----------+----------------------------------------------------------------------------------+
| DATA_NODE | SQL |
+-----------+----------------------------------------------------------------------------------+
| dn3 | insert into customer1(id, name, register_date) values (20000, 'bbb', '20160131') |
+-----------+----------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> explain insert into customer1(id, name, register_date) values (30000, 'ccc', '20160131');
+-----------+----------------------------------------------------------------------------------+
| DATA_NODE | SQL |
+-----------+----------------------------------------------------------------------------------+
| dn1 | insert into customer1(id, name, register_date) values (30000, 'ccc', '20160131') |
+-----------+----------------------------------------------------------------------------------+
1 row in set (0.01 sec)

看来已经执行了分片,运行如下插入语句,

1
2
3
4
5
6
7
insert into customer1(id, name, register_date) values (10000, 'aaa', '20160131');
insert into customer1(id, name, register_date) values (20000, 'bbb', '20160131');
insert into customer1(id, name, register_date) values (30000, 'ccc', '20160131');
insert into orders1(id, customer_id, order_date, money_paid) values (90000, 10000, '20160131', 100);
insert into orders1(id, customer_id, order_date, money_paid) values (90000, 20000, '20160131', 100);
insert into orders1(id, customer_id, order_date, money_paid) values (90000, 30000, '20160131', 100);

执行中发现,对于order表的insert,没有explain,如果执行explain的话,会直接插入数据。执行查询语句,可以发现,已经使用了ER分片,

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
mysql> explain select c.id, c.name, o.id from customer1 c, orders1 o where c.id=o.customer_id and c.id=10000;
+-----------+-----------------------------------------------------------------------------------------------+
| DATA_NODE | SQL |
+-----------+-----------------------------------------------------------------------------------------------+
| dn2 | select c.id, c.name, o.id from customer1 c, orders1 o where c.id=o.customer_id and c.id=10000 |
+-----------+-----------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
mysql> select c.id, c.name, o.id from customer1 c, orders1 o where c.id=o.customer_id and c.id=10000;
+-------+------+-------+
| id | name | id |
+-------+------+-------+
| 10000 | aaa | 90000 |
+-------+------+-------+
1 row in set (0.00 sec)

select语句对应的log如下,

1
2
3
4
5
6
7
8
9
10
11
12
01/31 18:12:33.823 DEBUG [$_NIOREACTOR-0-RW] (EnchachePool.java:76) -SQLRouteCache miss cache ,key:TESTDBselect c.id, c.name, o.id from customer1 c, orders1 o where c.id=o.customer_id and c.id=10000
01/31 18:12:33.825 DEBUG [$_NIOREACTOR-0-RW] (RouterUtil.java:951) -try to find cache by primary key
01/31 18:12:33.825 DEBUG [$_NIOREACTOR-0-RW] (EnchachePool.java:76) -TableID2DataNodeCache.TESTDB_CUSTOMER1 miss cache ,key:10000
01/31 18:12:33.826 DEBUG [$_NIOREACTOR-0-RW] (EnchachePool.java:59) -SQLRouteCache add cache ,key:TESTDBselect c.id, c.name, o.id from customer1 c, orders1 o where c.id=o.customer_id and c.id=10000 value:select c.id, c.name, o.id from customer1 c, orders1 o where c.id=o.customer_id and c.id=10000, route={
1 -> dn2{select c.id, c.name, o.id from customer1 c, orders1 o where c.id=o.customer_id and c.id=10000}
}
01/31 18:12:33.826 DEBUG [$_NIOREACTOR-0-RW] (NonBlockingSession.java:113) -ServerConnection [id=1, schema=TESTDB, host=192.168.1.6, user=test,txIsolation=3, autocommit=true, schema=TESTDB]select c.id, c.name, o.id from customer1 c, orders1 o where c.id=o.customer_id and c.id=10000, route={
1 -> dn2{select c.id, c.name, o.id from customer1 c, orders1 o where c.id=o.customer_id and c.id=10000}
} rrs
01/31 18:12:33.826 DEBUG [$_NIOREACTOR-0-RW] (PhysicalDBPool.java:452) -select read source hostM1 for dataHost:localhost1
01/31 18:12:33.827 DEBUG [$_NIOREACTOR-3-RW] (NonBlockingSession.java:229) -release connection MySQLConnection [id=7, lastTime=1454235153813, user=root, schema=db2, old shema=db2, borrowed=true, fromSlaveDB=false, threadId=2, charset=latin1, txIsolation=3, autocommit=true, attachment=dn2{select c.id, c.name, o.id from customer1 c, orders1 o where c.id=o.customer_id and c.id=10000}, respHandler=SingleNodeHandler [node=dn2{select c.id, c.name, o.id from customer1 c, orders1 o where c.id=o.customer_id and c.id=10000}, packetId=6], host=192.168.1.6, port=3306, statusSync=null, writeQueue=0, modifiedSQLExecuted=false]
01/31 18:12:33.827 DEBUG [$_NIOREACTOR-3-RW] (PhysicalDatasource.java:403) -release channel MySQLConnection [id=7, lastTime=1454235153813, user=root, schema=db2, old shema=db2, borrowed=true, fromSlaveDB=false, threadId=2, charset=latin1, txIsolation=3, autocommit=true, attachment=null, respHandler=null, host=192.168.1.6, port=3306, statusSync=null, writeQueue=0, modifiedSQLExecuted=false]

可见,只执行了dn2上的sql,其他分片上没有sql执行。

按其他字段的ER分片

仍然假设有customer和orders表,customer按列register_date分片,orders表的customer_id列与customer按主键id关联。schema.xml中可以如下配置,

1
2
3
4
5
6
<table name="customer2" primaryKey="ID" dataNode="dn1,dn2,dn3"
rule="sharding-by-namehash">
<childTable name="orders2" primaryKey="ID" joinKey="customer_id"
parentKey="id">
</childTable>
</table>

rule.xml中可以如下配置,

1
2
3
4
5
6
7
8
9
10
11
12
13
<tableRule name="sharding-by-namehash">
<rule>
<columns>name</columns>
<algorithm>namehash</algorithm>
</rule>
</tableRule>
<function name="namehash"
class="org.opencloudb.route.function.PartitionByMurmurHash">
<property name="seed">0</property><!-- 默认是0 -->
<property name="count">3</property><!-- 要分片的数据库节点数量,必须指定,否则没法分片 -->
<property name="virtualBucketTimes">160</property>
</function>

建表语句和插入语句与前文类似,如下,

1
2
3
4
5
6
7
8
9
10
create table customer2 (id bigint not null primary key, name varchar(100), register_date DATE);
create table orders2 (id bigint not null primary key, customer_id bigint, order_date DATE, money_paid decimal);
insert into customer2(id, name, register_date) values (10000, 'aaa', '20160131');
insert into customer2(id, name, register_date) values (20000, 'bbb', '20160131');
insert into customer2(id, name, register_date) values (30000, 'eee', '20160131');
insert into orders2(id, customer_id, order_date, money_paid) values (90000, 10000, '20160131', 100);
insert into orders2(id, customer_id, order_date, money_paid) values (90000, 20000, '20160131', 100);
insert into orders2(id, customer_id, order_date, money_paid) values (90000, 30000, '20160131', 100);

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
mysql> explain insert into customer2(id, name, register_date) values (10000, 'aaa', '20160131');
+-----------+----------------------------------------------------------------------------------+
| DATA_NODE | SQL |
+-----------+----------------------------------------------------------------------------------+
| dn2 | insert into customer2(id, name, register_date) values (10000, 'aaa', '20160131') |
+-----------+----------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> explain insert into customer2(id, name, register_date) values (20000, 'bbb', '20160131');
+-----------+----------------------------------------------------------------------------------+
| DATA_NODE | SQL |
+-----------+----------------------------------------------------------------------------------+
| dn1 | insert into customer2(id, name, register_date) values (20000, 'bbb', '20160131') |
+-----------+----------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> explain insert into customer2(id, name, register_date) values (30000, 'eee', '20160131');
+-----------+----------------------------------------------------------------------------------+
| DATA_NODE | SQL |
+-----------+----------------------------------------------------------------------------------+
| dn3 | insert into customer2(id, name, register_date) values (30000, 'eee', '20160131') |
+-----------+----------------------------------------------------------------------------------+
1 row in set (0.00 sec)

查看select的explain,

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
mysql> explain select c.id, c.name, o.id from customer2 c, orders2 o where c.id=o.customer_id and c.name='aaa';
+-----------+-------------------------------------------------------------------------------------------------+
| DATA_NODE | SQL |
+-----------+-------------------------------------------------------------------------------------------------+
| dn2 | select c.id, c.name, o.id from customer2 c, orders2 o where c.id=o.customer_id and c.name='aaa' |
+-----------+-------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> select c.id, c.name, o.id from customer2 c, orders2 o where c.id=o.customer_id and c.name='aaa';
+-------+------+-------+
| id | name | id |
+-------+------+-------+
| 10000 | aaa | 90000 |
+-------+------+-------+
1 row in set (0.00 sec)

select语句对应的log如下,

1
2
3
4
5
6
7
8
9
10
11
12
01/31 19:32:55.975 DEBUG [$_NIOREACTOR-0-RW] (ServerQueryHandler.java:56) -ServerConnection [id=1, schema=TESTDB, host=192.168.1.6, user=test,txIsolation=3, autocommit=true, schema=TESTDB]select c.id, c.name, o.id from customer2 c, orders2 o where c.id=o.customer_id and c.name='aaa'
01/31 19:32:55.976 DEBUG [$_NIOREACTOR-0-RW] (EnchachePool.java:70) -SQLRouteCache hit cache ,key:TESTDBselect c.id, c.name, o.id from customer2 c, orders2 o where c.id=o.customer_id and c.name='aaa'
01/31 19:32:55.976 DEBUG [$_NIOREACTOR-0-RW] (NonBlockingSession.java:113) -ServerConnection [id=1, schema=TESTDB, host=192.168.1.6, user=test,txIsolation=3, autocommit=true, schema=TESTDB]select c.id, c.name, o.id from customer2 c, orders2 o where c.id=o.customer_id and c.name='aaa', route={
1 -> dn2{select c.id, c.name, o.id from customer2 c, orders2 o where c.id=o.customer_id and c.name='aaa'}
} rrs
01/31 19:32:55.976 DEBUG [$_NIOREACTOR-0-RW] (PhysicalDBPool.java:452) -select read source hostM1 for dataHost:localhost1
01/31 19:32:55.977 DEBUG [$_NIOREACTOR-3-RW] (NonBlockingSession.java:229) -release connection MySQLConnection [id=3, lastTime=1454239975957, user=root, schema=db2, old shema=db2, borrowed=true, fromSlaveDB=false, threadId=26, charset=latin1, txIsolation=3, autocommit=true, attachment=dn2{select c.id, c.name, o.id from customer2 c, orders2 o where c.id=o.customer_id and c.name='aaa'}, respHandler=SingleNodeHandler [node=dn2{select c.id, c.name, o.id from customer2 c, orders2 o where c.id=o.customer_id and c.name='aaa'}, packetId=6], host=192.168.1.6, port=3306, statusSync=null, writeQueue=0, modifiedSQLExecuted=false]
01/31 19:32:55.977 DEBUG [$_NIOREACTOR-3-RW] (PhysicalDatasource.java:403) -release channel MySQLConnection [id=3, lastTime=1454239975957, user=root, schema=db2, old shema=db2, borrowed=true, fromSlaveDB=false, threadId=26, charset=latin1, txIsolation=3, autocommit=true, attachment=null, respHandler=null, host=192.168.1.6, port=3306, statusSync=null, writeQueue=0, modifiedSQLExecuted=false]
01/31 19:32:56.751 DEBUG [Timer0] (SQLJob.java:85) -con query sql:select user() to con:MySQLConnection [id=5, lastTime=1454239976751, user=root, schema=db3, old shema=db3, borrowed=true, fromSlaveDB=false, threadId=18, charset=utf8, txIsolation=0, autocommit=true, attachment=null, respHandler=null, host=192.168.1.6, port=3306, statusSync=null, writeQueue=0, modifiedSQLExecuted=false]
01/31 19:32:56.752 DEBUG [Timer0] (SQLJob.java:85) -con query sql:select user() to con:MySQLConnection [id=11, lastTime=1454239976752, user=root, schema=db3, old shema=db3, borrowed=true, fromSlaveDB=false, threadId=6, charset=utf8, txIsolation=0, autocommit=true, attachment=null, respHandler=null, host=192.168.1.6, port=3316, statusSync=null, writeQueue=0, modifiedSQLExecuted=false]
01/31 19:32:56.752 DEBUG [$_NIOREACTOR-1-RW] (PhysicalDatasource.java:403) -release channel MySQLConnection [id=5, lastTime=1454239976741, user=root, schema=db3, old shema=db3, borrowed=true, fromSlaveDB=false, threadId=18, charset=utf8, txIsolation=0, autocommit=true, attachment=null, respHandler=null, host=192.168.1.6, port=3306, statusSync=null, writeQueue=0, modifiedSQLExecuted=false]
01/31 19:32:56.753 DEBUG [$_NIOREACTOR-3-RW] (PhysicalDatasource.java:403) -release channel MySQLConnection [id=11, lastTime=1454239976741, user=root, schema=db3, old shema=db3, borrowed=true, fromSlaveDB=false, threadId=6, charset=utf8, txIsolation=0, autocommit=true, attachment=null, respHandler=null, host=192.168.1.6, port=3316, statusSync=null, writeQueue=0, modifiedSQLExecuted=false]

可见,只在dn2上执行了sql,其他分片没有执行sql。

连续分片

MyCAT连续分片有很多方法,如,自定义数字范围分片、按日期(天)分片、按单月小时分片、自然月分片等。其特点是天然的支持范围查询sql,不需要去每个分片上查找,缺点是,多数分片方法有数据范围限制,本文介绍其中两种。

自定义数字范围分片

自定义数字范围分片是对指定的数字(一般是主键)指定所在范围进行分片,比较简单直观。需要先在schema.xml中配置,

1
<table name="test_c1" dataNode="dn1,dn2,dn3" rule="auto-sharding-range" />

rule.xml中配置,

1
2
3
4
5
6
7
8
9
10
<tableRule name="auto-sharding-range">
<rule>
<columns>id</columns>
<algorithm>rang-long</algorithm>
</rule>
</tableRule>
<function name="auto-sharding-range"
class="org.opencloudb.route.function.AutoPartitionByLong">
<property name="mapFile">auto-sharding-range.txt</property>
</function>

以上配置指定了,在mycat的conf目录下的auto-sharding-range.txt文件,作为范围分片文件,文件内容如下,

1
2
3
4
5
# range start-end ,data node index
# K=1000,M=10000.
0-50K=0
50K-100K=1
100K-150K=2

该配置表示,id在0~50K的时候,存在0分片(即dn1);id在50~100K的时候,存在1分片(即dn2);id在100K~150K的时候,存在2分片(即dn3)。

测试表的建表和插入语句如下,

1
2
3
4
5
create table test_c1(id bigint not null primary key, name varchar(100), register_date DATE);
insert into test_c1(id, name, register_date) values (10000, 'aaa', '20160131');
insert into test_c1(id, name, register_date) values (70000, 'bbb', '20160131');
insert into test_c1(id, name, register_date) values (130000, 'ccc', '20160131');

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
mysql> explain insert into test_c1(id, name, register_date) values (10000, 'aaa', '20160131');
+-----------+--------------------------------------------------------------------------------+
| DATA_NODE | SQL |
+-----------+--------------------------------------------------------------------------------+
| dn1 | insert into test_c1(id, name, register_date) values (10000, 'aaa', '20160131') |
+-----------+--------------------------------------------------------------------------------+
1 row in set (0.01 sec)
mysql> explain insert into test_c1(id, name, register_date) values (70000, 'bbb', '20160131');
+-----------+--------------------------------------------------------------------------------+
| DATA_NODE | SQL |
+-----------+--------------------------------------------------------------------------------+
| dn2 | insert into test_c1(id, name, register_date) values (70000, 'bbb', '20160131') |
+-----------+--------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> explain insert into test_c1(id, name, register_date) values (130000, 'ccc', '20160131');
+-----------+---------------------------------------------------------------------------------+
| DATA_NODE | SQL |
+-----------+---------------------------------------------------------------------------------+
| dn3 | insert into test_c1(id, name, register_date) values (130000, 'ccc', '20160131') |
+-----------+---------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> explain insert into test_c1(id, name, register_date) values (150001, 'aaa', '20160131');
ERROR 1064 (HY000): can't find any valid datanode :TEST_C1 -> ID -> 150001

以上可以看出,如果插入数据超出了分片的范围,就会报错。一般select语句的explain如下,

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
mysql> select * from test_c1 where id between 100 and 20000;
+-------+------+---------------+
| id | name | register_date |
+-------+------+---------------+
| 10000 | aaa | 2016-01-31 |
+-------+------+---------------+
1 row in set (0.10 sec)
mysql> explain select * from test_c1 where id between 100 and 20000;
+-----------+----------------------------------------------------------------+
| DATA_NODE | SQL |
+-----------+----------------------------------------------------------------+
| dn1 | SELECT * FROM test_c1 WHERE id BETWEEN 100 AND 20000 LIMIT 100 |
+-----------+----------------------------------------------------------------+
1 row in set (0.00 sec)

可见,只访问的dn1,自动范围分片生效。

自然月分片

自然月分片是说每个自然月一个分片。需要注意的是:需要提前将分片规划好,建好,否则有可能日期超出实际配置分片数。需要先在schema.xml中配置,

1
<table name="test_c2" dataNode="dn1,dn2,dn3" rule="auto-sharding-month" />

rule.xml中配置,

1
2
3
4
5
6
7
8
9
10
11
<tableRule name="auto-sharding-month">
<rule>
<columns>register_date</columns>
<algorithm>auto-sharding-month</algorithm>
</rule>
</tableRule>
<function name="auto-sharding-month" class="org.opencloudb.route.function.PartitionByMonth">
<property name="dateFormat">yyyy-MM-dd</property>
<property name="sBeginDate">2016-01-01</property>
</function>

测试表的建表和插入语句如下,注意日期格式已经定义为yyyy-MM-dd形式,

1
2
3
4
5
create table test_c2(id bigint not null primary key, name varchar(100), register_date DATE);
insert into test_c2(id, name, register_date) values (10000, 'aaa', '2016-01-01');
insert into test_c2(id, name, register_date) values (70000, 'bbb', '2016-02-01');
insert into test_c2(id, name, register_date) values (130000, 'ccc', '2016-03-01');

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
mysql> explain insert into test_c2(id, name, register_date) values (10000, 'aaa', '2016-01-01');
+-----------+----------------------------------------------------------------------------------+
| DATA_NODE | SQL |
+-----------+----------------------------------------------------------------------------------+
| dn1 | insert into test_c2(id, name, register_date) values (10000, 'aaa', '2016-01-01') |
+-----------+----------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> explain insert into test_c2(id, name, register_date) values (70000, 'bbb', '2016-02-01');
+-----------+----------------------------------------------------------------------------------+
| DATA_NODE | SQL |
+-----------+----------------------------------------------------------------------------------+
| dn2 | insert into test_c2(id, name, register_date) values (70000, 'bbb', '2016-02-01') |
+-----------+----------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> explain insert into test_c2(id, name, register_date) values (130000, 'ccc', '2016-03-01');
+-----------+-----------------------------------------------------------------------------------+
| DATA_NODE | SQL |
+-----------+-----------------------------------------------------------------------------------+
| dn3 | insert into test_c2(id, name, register_date) values (130000, 'ccc', '2016-03-01') |
+-----------+-----------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> explain insert into test_c2(id, name, register_date) values (230000, 'ddd', '2016-04-01');
ERROR 1064 (HY000): Index: 3, Size: 3

以上可以看出,如果插入数据超出了分片的时间范围,就会报错。一般select语句的explain如下,

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
mysql> explain select * from test_c2 where register_date between '2016-01-01' and '2016-01-15';
+-----------+-------------------------------------------------------------------------------------------+
| DATA_NODE | SQL |
+-----------+-------------------------------------------------------------------------------------------+
| dn1 | SELECT * FROM test_c2 WHERE register_date BETWEEN '2016-01-01' AND '2016-01-15' LIMIT 100 |
+-----------+-------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> select * from test_c2 where register_date between '2016-01-01' and '2016-01-15';
+-------+------+---------------+
| id | name | register_date |
+-------+------+---------------+
| 10000 | aaa | 2016-01-01 |
+-------+------+---------------+
1 row in set (0.00 sec)

可见,只访问的dn1,按月自然分片生效。

离散分片

MyCAT离散分片也有很多方法,如,枚举分片、程序指定分区的分片、十进制求模分片、字符串hash分片、一致性哈希分片等。离散分片的特点是,分布比较均匀,但是对范围查找可能需要全部分片查找。其实,前文在ER分片的时候已经介绍了两种,分别是求模分片和一致性哈希分片,本文再介绍两种。

枚举分片

枚举分片是通过在配置文件中配置可能的枚举id,自己配置分片,本规则适用于特定的场景,比如有些业务需要按照省份或区县来做保存,而全国省份区县固定的。需要先在schema.xml中配置,

1
<table name="test_d1" dataNode="dn1,dn2,dn3" rule="sharding-province" />

rule.xml中配置,

1
2
3
4
5
6
7
8
9
10
11
12
<tableRule name="sharding-province">
<rule>
<columns>province</columns>
<algorithm>sharding-province</algorithm>
</rule>
</tableRule>
<function name="sharding-province"
class="org.opencloudb.route.function.PartitionByFileMap">
<property name="mapFile">sharding-province.txt</property>
<property name="type">1</property> <!-- type默认值为0(0表示Integer,非零表示String) -->
<property name="defaultNode">0</property> <!-- 默认节点,枚举分片时,如果碰到不识别的枚举值,就让它路由到默认节点 -->
</function>

以上配置指定了,在mycat的conf目录下的sharding-province.txt文件,作为枚举分片文件,文件内容如下,

1
2
3
4
Shanghai=0
Beijingg=1
Guangzhou=2
Hangzhou=0

该配置表示,province是Shanghai和Hangzhou的时候,存在0分片(即dn1);province是Beijingg的时候,存在1分片(即dn2);province是Guangzhou的时候,存在2分片(即dn3)。

测试表的建表和插入语句如下,

1
2
3
4
5
6
7
create table test_d1(id bigint not null primary key, name varchar(100), province varchar(100));
insert into test_d1(id, name, province) values (10000, 'aaa', 'Shanghai');
insert into test_d1(id, name, province) values (20000, 'bbb', 'Beijingg');
insert into test_d1(id, name, province) values (30000, 'ccc', 'Guangzhou');
insert into test_d1(id, name, province) values (40000, 'ddd', 'Huangzhou');
insert into test_d1(id, name, province) values (50000, 'eee', 'Shenzhen');

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
mysql> explain insert into test_d1(id, name, province) values (10000, 'aaa', 'Shanghai');
+-----------+---------------------------------------------------------------------------+
| DATA_NODE | SQL |
+-----------+---------------------------------------------------------------------------+
| dn1 | insert into test_d1(id, name, province) values (10000, 'aaa', 'Shanghai') |
+-----------+---------------------------------------------------------------------------+
1 row in set (0.01 sec)
mysql> explain insert into test_d1(id, name, province) values (20000, 'bbb', 'Beijingg');
+-----------+-------------------------------------------------------------------------+
| DATA_NODE | SQL |
+-----------+-------------------------------------------------------------------------+
| dn2 | insert into test_d1(id, name, province) values (20000, 'bbb', 'Beijingg') |
+-----------+-------------------------------------------------------------------------+
1 row in set (0.01 sec)
mysql> explain insert into test_d1(id, name, province) values (30000, 'ccc', 'Guangzhou');
+-----------+----------------------------------------------------------------------------+
| DATA_NODE | SQL |
+-----------+----------------------------------------------------------------------------+
| dn3 | insert into test_d1(id, name, province) values (30000, 'ccc', 'Guangzhou') |
+-----------+----------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> explain insert into test_d1(id, name, province) values (40000, 'ddd', 'Huangzhou');
+-----------+----------------------------------------------------------------------------+
| DATA_NODE | SQL |
+-----------+----------------------------------------------------------------------------+
| dn1 | insert into test_d1(id, name, province) values (40000, 'ddd', 'Huangzhou') |
+-----------+----------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> explain insert into test_d1(id, name, province) values (50000, 'eee', 'Shenzhen');
+-----------+---------------------------------------------------------------------------+
| DATA_NODE | SQL |
+-----------+---------------------------------------------------------------------------+
| dn1 | insert into test_d1(id, name, province) values (50000, 'eee', 'Shenzhen') |
+-----------+---------------------------------------------------------------------------+
1 row in set (0.00 sec)

以上可以看出,即使插入的数据超出了枚举分片的范围,但是因为有默认分片,所以,超出范围的数据被分配在了默认的dn1分片。一般select语句的explain如下,

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
mysql> explain select * from test_d1 where province = 'Guangzhou';
+-----------+--------------------------------------------------------------+
| DATA_NODE | SQL |
+-----------+--------------------------------------------------------------+
| dn3 | SELECT * FROM test_d1 WHERE province = 'Guangzhou' LIMIT 100 |
+-----------+--------------------------------------------------------------+
1 row in set (0.01 sec)
mysql> select * from test_d1 where province = 'Guangzhou';
+-------+------+-----------+
| id | name | province |
+-------+------+-----------+
| 30000 | ccc | Guangzhou |
+-------+------+-----------+
1 row in set (0.01 sec)

可见,分片已经成功。

固定分片hash算法

固定分片hash(二进制),本条规则类似于十进制的求模运算,区别在于是二进制的操作,是取id的二进制低10位,即id二进制&1111111111,实际效果与求模范围类似。此算法根据二进制则可能会分到连续的分片。严格来说,该分片方法应该算是综合分片方法,即部分连续分片和部分离散分片的综合,在两者之间取了折中。需要先在schema.xml中配置,

1
<table name="test_d2" dataNode="dn1,dn2,dn3" rule="sharding-fix-hash" />

rule.xml中配置,

1
2
3
4
5
6
7
8
9
10
11
<tableRule name="sharding-fix-hash">
<rule>
<columns>id</columns>
<algorithm>sharding-fix-hash</algorithm>
</rule>
</tableRule>
<function name="sharding-fix-hash"
class="org.opencloudb.route.function.PartitionByLong">
<property name="partitionCount">2,1</property>
<property name="partitionLength">256,512</property>
</function>

以上参数中,partitionCount 分片个数列表,partitionLength是分区长度,默认为最大2^n=1024,即最大支持1024分区。注意,count,length两个数组的长度必须是一致的,1024 = sum((count[i]*length[i]))。 count和length两个向量的点积恒等于1024。上述配置,将数据水平分成3份,前两份各占25%,第三份占50%,是一种非均匀分片。测试表的建表和插入语句如下,

1
2
3
4
5
create table test_d2(id bigint not null primary key, name varchar(100), province varchar(100));
insert into test_d2(id, name, province) values (20000, 'bbb', 'Beijing');
insert into test_d2(id, name, province) values (30000, 'ccc', 'Guangzhou');
insert into test_d2(id, name, province) values (40000, 'ddd', 'Huangzhou');

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
mysql> explain insert into test_d2(id, name, province) values (20000, 'bbb', 'Beijing');
+-----------+-------------------------------------------------------------------------+
| DATA_NODE | SQL |
+-----------+-------------------------------------------------------------------------+
| dn3 | insert into test_d2(id, name, province) values (20000, 'bbb', 'Beijing') |
+-----------+-------------------------------------------------------------------------+
1 row in set (0.01 sec)
mysql> explain insert into test_d2(id, name, province) values (30000, 'ccc', 'Guangzhou');
+-----------+----------------------------------------------------------------------------+
| DATA_NODE | SQL |
+-----------+----------------------------------------------------------------------------+
| dn2 | insert into test_d2(id, name, province) values (30000, 'ccc', 'Guangzhou') |
+-----------+----------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> explain insert into test_d2(id, name, province) values (40000, 'ddd', 'Huangzhou');
+-----------+----------------------------------------------------------------------------+
| DATA_NODE | SQL |
+-----------+----------------------------------------------------------------------------+
| dn1 | insert into test_d2(id, name, province) values (40000, 'ddd', 'Huangzhou') |
+-----------+----------------------------------------------------------------------------+
1 row in set (0.00 sec)

一般select语句的explain如下,

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
mysql> explain select * from test_d2 where id between 20000 and 20100;
+-----------+------------------------------------------------------------------+
| DATA_NODE | SQL |
+-----------+------------------------------------------------------------------+
| dn3 | SELECT * FROM test_d2 WHERE id BETWEEN 20000 AND 20100 LIMIT 100 |
+-----------+------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> select * from test_d2 where id between 20000 and 20100;
+-------+------+----------+
| id | name | province |
+-------+------+----------+
| 20000 | bbb | Beijing |
+-------+------+----------+
1 row in set (0.01 sec)
mysql> select * from test_d2 where id between 20000 and 30000;
ERROR 1064 (HY000): NullPointerException

可见,分片已经生效,而且,虽然部分采用了hash分片,但仍然有连续分片的一些特点。但是,大范围查找的时候会报错,不确定这是否是一个bug,对应的log如下,

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
01/31 21:40:52.308 DEBUG [$_NIOREACTOR-0-RW] (ServerQueryHandler.java:56) -ServerConnection [id=1, schema=TESTDB, host=192.168.1.6, user=test,txIsolation=3, autocommit=true, schema=TESTDB]select * from test_d2 where id between 20000 and 30000
01/31 21:40:52.308 DEBUG [$_NIOREACTOR-0-RW] (EnchachePool.java:76) -SQLRouteCache miss cache ,key:TESTDBselect * from test_d2 where id between 20000 and 30000
01/31 21:40:52.309 WARN [$_NIOREACTOR-0-RW] (ServerConnection.java:209) -ServerConnection [id=1, schema=TESTDB, host=192.168.1.6, user=test,txIsolation=3, autocommit=true, schema=TESTDB]select * from test_d2 where id between 20000 and 30000 err:java.lang.NullPointerException
java.lang.NullPointerException
at org.opencloudb.route.util.RouterUtil.findRouteWithcConditionsForTables(RouterUtil.java:1007)
at org.opencloudb.route.util.RouterUtil.tryRouteForOneTable(RouterUtil.java:896)
at org.opencloudb.route.util.RouterUtil.tryRouteForTables(RouterUtil.java:780)
at org.opencloudb.parser.druid.impl.DruidSelectParser.tryRoute(DruidSelectParser.java:376)
at org.opencloudb.parser.druid.impl.DruidSelectParser.changeSql(DruidSelectParser.java:264)
at org.opencloudb.parser.druid.impl.DefaultDruidParser.parser(DefaultDruidParser.java:66)
at org.opencloudb.route.impl.DruidMycatRouteStrategy.routeNormalSqlWithAST(DruidMycatRouteStrategy.java:64)
at org.opencloudb.route.impl.AbstractRouteStrategy.route(AbstractRouteStrategy.java:76)
at org.opencloudb.route.RouteService.route(RouteService.java:116)
at org.opencloudb.server.ServerConnection.routeEndExecuteSQL(ServerConnection.java:201)
at org.opencloudb.server.ServerConnection.execute(ServerConnection.java:154)
at org.opencloudb.server.handler.SelectHandler.handle(SelectHandler.java:102)
at org.opencloudb.server.ServerQueryHandler.query(ServerQueryHandler.java:77)
at org.opencloudb.net.FrontendConnection.query(FrontendConnection.java:308)
at org.opencloudb.net.handler.FrontendCommandHandler.handle(FrontendCommandHandler.java:71)
at org.opencloudb.net.FrontendConnection.rawHandle(FrontendConnection.java:435)
at org.opencloudb.net.FrontendConnection.handle(FrontendConnection.java:417)
at org.opencloudb.net.AbstractConnection.onReadData(AbstractConnection.java:313)
at org.opencloudb.net.NIOSocketWR.asynRead(NIOSocketWR.java:186)
at org.opencloudb.net.AbstractConnection.asynRead(AbstractConnection.java:274)
at org.opencloudb.net.NIOReactor$RW.run(NIOReactor.java:96)
at java.lang.Thread.run(Thread.java:745)

文章目录
  1. 1. ER分片
    1. 1.1. 按主键关联的ER分片
    2. 1.2. 按其他字段的ER分片
  2. 2. 连续分片
    1. 2.1. 自定义数字范围分片
    2. 2.2. 自然月分片
  3. 3. 离散分片
    1. 3.1. 枚举分片
    2. 3.2. 固定分片hash算法

欢迎来到Valleylord的博客!

本博的文章尽量原创。