谷粒商城 - Cluster - 90 - ShardingSphere, a Powerful Tool for Sharding Databases and Tables

1. Introduction to ShardingSphere

ShardingSphere is an ecosystem of open-source distributed database middleware solutions. It consists of three mutually independent products: Sharding-JDBC, Sharding-Proxy, and Sharding-Sidecar (planned). All of them provide standardized data sharding, distributed transactions, and database orchestration, and they fit a wide range of scenarios such as homogeneous Java applications, heterogeneous languages, and cloud-native deployments.

2. Sharding-Proxy

Sharding-Proxy is positioned as a transparent database proxy. It is a server-side product that implements the database binary protocol, so clients written in any (heterogeneous) language can connect to it.


3. Data Sharding

1) Core Concepts

Logical table

The collective name for the tables that share the same logic and data structure after a table has been split horizontally. Example: order data is split into 10 tables by the last digit of the primary key, t_order_0 through t_order_9; their logical table name is t_order.

Real table

A physical table that actually exists in a sharded database, i.e. t_order_0 through t_order_9 in the example above.

Data node

The smallest unit of data sharding. It consists of a data source name and a table name, e.g. ds_0.t_order_0.

Binding tables

Primary and child tables whose sharding rules are identical. For example, if both t_order and t_order_item are sharded by order_id, the two tables are binding tables of each other. Multi-table joins between binding tables never degenerate into a Cartesian product, so join queries become far more efficient. For example, given the SQL:

SELECT i.* FROM t_order o JOIN t_order_item i ON o.order_id=i.order_id WHERE o.order_id in (10, 11);

Without the binding-table relationship configured, and assuming the sharding key order_id routes the value 10 to shard 0 and the value 11 to shard 1, the routed SQL would be 4 statements that form a Cartesian product:

SELECT i.* FROM t_order_0 o JOIN t_order_item_0 i ON o.order_id=i.order_id WHERE o.order_id in (10, 11);

SELECT i.* FROM t_order_0 o JOIN t_order_item_1 i ON o.order_id=i.order_id WHERE o.order_id in (10, 11);

SELECT i.* FROM t_order_1 o JOIN t_order_item_0 i ON o.order_id=i.order_id WHERE o.order_id in (10, 11);

SELECT i.* FROM t_order_1 o JOIN t_order_item_1 i ON o.order_id=i.order_id WHERE o.order_id in (10, 11);

With the binding-table relationship configured, the routed SQL becomes 2 statements:

SELECT i.* FROM t_order_0 o JOIN t_order_item_0 i ON o.order_id=i.order_id WHERE o.order_id in (10, 11);

SELECT i.* FROM t_order_1 o JOIN t_order_item_1 i ON o.order_id=i.order_id WHERE o.order_id in (10, 11);

Because t_order is the left-most table in the FROM clause, ShardingSphere treats it as the primary table of the binding group. All routing is calculated with the primary table's strategy only, so the sharding calculation for t_order_item uses the conditions of t_order. For this to work, the sharding keys of binding tables must be exactly the same.
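
As a minimal sketch (mirroring the full example later in this article), binding tables are declared inside the sharding rule like this:

shardingRule:
  tables:
    t_order:
      actualDataNodes: ds_${0..1}.t_order_${0..1}
    t_order_item:
      actualDataNodes: ds_${0..1}.t_order_item_${0..1}
  bindingTables:
    - t_order,t_order_item  # both tables must be sharded by the same key (order_id)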

4. Configuration

Sharding rule

The main entry point of sharding configuration. It contains the data source configuration, table configuration, binding-table configuration, read/write splitting configuration, and so on.

Data source configuration

The list of real data sources.

Table configuration

Configuration of logical table names, data nodes, and sharding rules.

Data node configuration

Configures the mapping between logical tables and real tables. Two distribution forms are supported: uniform distribution and custom distribution.

  • Uniform distribution

The data tables are distributed evenly across every data source, for example:

db0
  ├── t_order0 
  └── t_order1 
db1
  ├── t_order0 
  └── t_order1

The data node configuration is then:

db0.t_order0, db0.t_order1, db1.t_order0, db1.t_order1
  • Custom distribution

The data tables follow a specific, non-uniform distribution, for example:

db0
  ├── t_order0 
  └── t_order1 
db1
  ├── t_order2
  ├── t_order3
  └── t_order4

The data node configuration is then:

db0.t_order0, db0.t_order1, db1.t_order2, db1.t_order3, db1.t_order4
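
In a YAML rule file these two layouts are usually written with inline expressions. A minimal sketch, reusing the names from the examples above:

# uniform distribution: every data source holds the same tables
actualDataNodes: db${0..1}.t_order${0..1}

# custom distribution: enumerate the nodes that differ per data source
actualDataNodes: db0.t_order${0..1},db1.t_order${2..4}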

Sharding strategy configuration

Sharding strategies come in two dimensions: the database sharding strategy and the table sharding strategy.

  • Database sharding strategy
    Corresponds to DatabaseShardingStrategy. It configures the target data source that data is routed to.

  • Table sharding strategy
    Corresponds to TableShardingStrategy. It configures the target table that data is routed to. That table lives inside the data source selected for the data, so the table sharding strategy depends on the result of the database sharding strategy.

The APIs of the two strategies are exactly the same.
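
A minimal sketch of the two dimensions using inline strategies (the column names match the full example below):

databaseStrategy:            # first dimension: choose the data source
  inline:
    shardingColumn: user_id
    algorithmExpression: ds_${user_id % 2}
tableStrategy:               # second dimension: choose the table inside that data source
  inline:
    shardingColumn: order_id
    algorithmExpression: t_order_${order_id % 2}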

Auto-increment key generation strategy
By generating the auto-increment primary key on the client side instead of relying on the database's native auto-increment column, primary keys stay unique across the whole distributed setup.

Configuration manual

Data sharding + read/write splitting

schemaName: sharding_master_slave_db

dataSources:
  ds0:
    url: jdbc:postgresql://localhost:5432/ds0
    username: root
    password: 
    connectionTimeoutMilliseconds: 30000
    idleTimeoutMilliseconds: 60000
    maxLifetimeMilliseconds: 1800000
    maxPoolSize: 65
  ds0_slave0:
    url: jdbc:postgresql://localhost:5432/ds0_slave0
    username: root
    password: 
    connectionTimeoutMilliseconds: 30000
    idleTimeoutMilliseconds: 60000
    maxLifetimeMilliseconds: 1800000
    maxPoolSize: 65
  ds0_slave1:
    url: jdbc:postgresql://localhost:5432/ds0_slave1
    username: root
    password: 
    connectionTimeoutMilliseconds: 30000
    idleTimeoutMilliseconds: 60000
    maxLifetimeMilliseconds: 1800000
    maxPoolSize: 65
  ds1:
    url: jdbc:postgresql://localhost:5432/ds1
    username: root
    password: 
    connectionTimeoutMilliseconds: 30000
    idleTimeoutMilliseconds: 60000
    maxLifetimeMilliseconds: 1800000
    maxPoolSize: 65
  ds1_slave0:
    url: jdbc:postgresql://localhost:5432/ds1_slave0
    username: root
    password: 
    connectionTimeoutMilliseconds: 30000
    idleTimeoutMilliseconds: 60000
    maxLifetimeMilliseconds: 1800000
    maxPoolSize: 65
  ds1_slave1:
    url: jdbc:postgresql://localhost:5432/ds1_slave1
    username: root
    password: 
    connectionTimeoutMilliseconds: 30000
    idleTimeoutMilliseconds: 60000
    maxLifetimeMilliseconds: 1800000
    maxPoolSize: 65

shardingRule:  
  tables:
    t_order: 
      actualDataNodes: ms_ds${0..1}.t_order${0..1}
      databaseStrategy:
        inline:
          shardingColumn: user_id
          algorithmExpression: ms_ds${user_id % 2}
      tableStrategy: 
        inline:
          shardingColumn: order_id
          algorithmExpression: t_order${order_id % 2}
      keyGenerator:
        type: SNOWFLAKE
        column: order_id
    t_order_item:
      actualDataNodes: ms_ds${0..1}.t_order_item${0..1}
      databaseStrategy:
        inline:
          shardingColumn: user_id
          algorithmExpression: ms_ds${user_id % 2}
      tableStrategy:
        inline:
          shardingColumn: order_id
          algorithmExpression: t_order_item${order_id % 2}
      keyGenerator:
        type: SNOWFLAKE
        column: order_item_id
  bindingTables:
    - t_order,t_order_item
  broadcastTables:
    - t_config

  defaultDataSourceName: ds0
  defaultTableStrategy:
    none:

  masterSlaveRules:
    ms_ds0:
      masterDataSourceName: ds0
      slaveDataSourceNames:
        - ds0_slave0
        - ds0_slave1
      loadBalanceAlgorithmType: ROUND_ROBIN
    ms_ds1:
      masterDataSourceName: ds1
      slaveDataSourceNames: 
        - ds1_slave0
        - ds1_slave1
      loadBalanceAlgorithmType: ROUND_ROBIN

5. Hands-On

1) Download Sharding-Proxy

Download Sharding-Proxy.

2) Add the dependency

If the backend connects to a MySQL database, you need to download MySQL Connector/J (or use a domestic mirror such as the sohu MySQL Connector/J mirror). After unpacking it, copy mysql-connector-java-5.1.47.jar into the ${sharding-proxy}\lib directory.


Copy the downloaded jar into the lib directory of the Sharding-Proxy distribution downloaded above, as sketched below:
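A minimal sketch, assuming the driver archive sits in the current directory and Sharding-Proxy was unpacked to /opt/sharding-proxy (adjust the paths to your environment):

# unpack the Connector/J archive and copy the jar into the proxy's lib directory
tar -xzf mysql-connector-java-5.1.47.tar.gz
cp mysql-connector-java-5.1.47/mysql-connector-java-5.1.47.jar /opt/sharding-proxy/lib/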

3) Rule configuration

Edit %SHARDING_PROXY_HOME%\conf\server.yaml. See the configuration manual for details.

Sharding-Proxy uses conf/server.yaml to configure the registry center, authentication information, and common properties.

Authentication:
Controls who may log in to Sharding-Proxy. Once usernames, passwords, and accessible schemas are configured, the proxy can only be accessed with a correct username and password.

authentication:
  users:
    root: # custom username
      password: root # custom password
    sharding: # custom username
      password: sharding # custom password
      authorizedSchemas: sharding_db, masterslave_db # schemas this user may access, comma-separated; if omitted, the user has root privileges and can access every schema

Authentication configuration:
conf/server.yaml

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

######################################################################################################
# 
# If you want to configure orchestration, authorization and proxy properties, please refer to this file.
# 
######################################################################################################
#
#orchestration:
#  orchestration_ds:
#    orchestrationType: registry_center,config_center,distributed_lock_manager
#    instanceType: zookeeper
#    serverLists: localhost:2181
#    namespace: orchestration
#    props:
#      overwrite: false
#      retryIntervalMilliseconds: 500
#      timeToLiveSeconds: 60
#      maxRetries: 3
#      operationTimeoutMilliseconds: 500
#
authentication:
  users:
    root:
      password: root
    sharding:
      password: sharding 
      authorizedSchemas: sharding_db
props:
#  max.connections.size.per.query: 1
#  acceptor.size: 16  # The default value is available processors count * 2.
  executor.size: 16  # Infinite by default.
#  proxy.frontend.flush.threshold: 128  # The default value is 128.
#    # LOCAL: Proxy will run with LOCAL transaction.
#    # XA: Proxy will run with XA transaction.
#    # BASE: Proxy will run with B.A.S.E transaction.
#  proxy.transaction.type: LOCAL
#  proxy.opentracing.enabled: false
#  proxy.hint.enabled: false
#  query.with.cipher.column: true
  sql.show: true
#  allow.range.query.with.inline.sharding: false

Edit %SHARDING_PROXY_HOME%\conf\config-xxx.yaml. See the configuration manual for details.

Database and table sharding configuration:
conf/config-sharding.yaml

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

######################################################################################################
# 
# Here you can configure the rules for the proxy.
# This example is configuration of sharding rule.
#   
# If you want to use sharding, please refer to this file; 
# if you want to use master-slave, please refer to the config-master_slave.yaml.
# 
######################################################################################################
#
#schemaName: sharding_db
#
#dataSources:
#  ds_0:
#    url: jdbc:postgresql://127.0.0.1:5432/demo_ds_0?serverTimezone=UTC&useSSL=false
#    username: postgres
#    password: postgres
#    connectionTimeoutMilliseconds: 30000
#    idleTimeoutMilliseconds: 60000
#    maxLifetimeMilliseconds: 1800000
#    maxPoolSize: 50
#  ds_1:
#    url: jdbc:postgresql://127.0.0.1:5432/demo_ds_1?serverTimezone=UTC&useSSL=false
#    username: postgres
#    password: postgres
#    connectionTimeoutMilliseconds: 30000
#    idleTimeoutMilliseconds: 60000
#    maxLifetimeMilliseconds: 1800000
#    maxPoolSize: 50
#
#shardingRule:
#  tables:
#    t_order:
#      actualDataNodes: ds_${0..1}.t_order_${0..1}
#      tableStrategy:
#        inline:
#          shardingColumn: order_id
#          algorithmExpression: t_order_${order_id % 2}
#      keyGenerator:
#        type: SNOWFLAKE
#        column: order_id
#    t_order_item:
#      actualDataNodes: ds_${0..1}.t_order_item_${0..1}
#      tableStrategy:
#        inline:
#          shardingColumn: order_id
#          algorithmExpression: t_order_item_${order_id % 2}
#      keyGenerator:
#        type: SNOWFLAKE
#        column: order_item_id
#  bindingTables:
#    - t_order,t_order_item
#  defaultDatabaseStrategy:
#    inline:
#      shardingColumn: user_id
#      algorithmExpression: ds_${user_id % 2}
#  defaultTableStrategy:
#    none:

######################################################################################################
#
# If you want to connect to MySQL, you should manually copy MySQL driver to lib directory.
#
######################################################################################################

schemaName: sharding_db

dataSources:
  ds_0:
    url: jdbc:mysql://192.168.10.10:3307/demo_ds_0?serverTimezone=UTC&useSSL=false
    username: root
    password: root
    connectionTimeoutMilliseconds: 30000
    idleTimeoutMilliseconds: 60000
    maxLifetimeMilliseconds: 1800000
    maxPoolSize: 50
  ds_1:
    url: jdbc:mysql://192.168.10.10:3307/demo_ds_1?serverTimezone=UTC&useSSL=false
    username: root
    password: root
    connectionTimeoutMilliseconds: 30000
    idleTimeoutMilliseconds: 60000
    maxLifetimeMilliseconds: 1800000
    maxPoolSize: 50
#
shardingRule:
  tables:
    t_order:
      actualDataNodes: ds_${0..1}.t_order_${0..1}
      tableStrategy:
        inline:
          shardingColumn: order_id
          algorithmExpression: t_order_${order_id % 2}
      keyGenerator:
        type: SNOWFLAKE
        column: order_id
    t_order_item:
      actualDataNodes: ds_${0..1}.t_order_item_${0..1}
      tableStrategy:
        inline:
          shardingColumn: order_id
          algorithmExpression: t_order_item_${order_id % 2}
      keyGenerator:
        type: SNOWFLAKE
        column: order_item_id
  bindingTables:
    - t_order,t_order_item
  defaultDatabaseStrategy:
    inline:
      shardingColumn: user_id
      algorithmExpression: ds_${user_id % 2}
  defaultTableStrategy:
    none:

读写分离配置:
第一个数据库的读写分离配置:
conf/config-master_slave.yaml

######################################################################################################
#
# If you want to connect to MySQL, you should manually copy MySQL driver to lib directory.
#
######################################################################################################

########################
#
# Read/write splitting for the first database
# Reads go to the slave; writes, updates, and deletes go to the master
# 
########################

schemaName: sharding_db_1
#
dataSources:
  master_0_ds:
    url: jdbc:mysql://192.168.10.10:3307/demo_ds_0?serverTimezone=UTC&useSSL=false
    username: root
    password: root
    connectionTimeoutMilliseconds: 30000
    idleTimeoutMilliseconds: 60000
    maxLifetimeMilliseconds: 1800000
    maxPoolSize: 50
  slave_ds_0:
    url: jdbc:mysql://192.168.10.10:3317/demo_ds_0?serverTimezone=UTC&useSSL=false
    username: root
    password: root
    connectionTimeoutMilliseconds: 30000
    idleTimeoutMilliseconds: 60000
    maxLifetimeMilliseconds: 1800000
    maxPoolSize: 50

masterSlaveRule:
  name: ms_ds
  masterDataSourceName: master_0_ds
  slaveDataSourceNames:
    - slave_ds_0
#    - slave_ds_1
  loadBalanceAlgorithmType: ROUND_ROBIN

Read/write splitting configuration for the second database (once all of the configuration files are in place, the service can be started; see step 6):
conf/config-master_slave_2.yaml

######################################################################################################
#
# If you want to connect to MySQL, you should manually copy MySQL driver to lib directory.
#
######################################################################################################

########################
#
# Read/write splitting for the second database
# Reads go to the slave; writes, updates, and deletes go to the master
# 
########################

schemaName: sharding_db_2
#
dataSources:
  master_1_ds:
    url: jdbc:mysql://192.168.10.10:3307/demo_ds_1?serverTimezone=UTC&useSSL=false
    username: root
    password: root
    connectionTimeoutMilliseconds: 30000
    idleTimeoutMilliseconds: 60000
    maxLifetimeMilliseconds: 1800000
    maxPoolSize: 50
  slave_ds_1:
    url: jdbc:mysql://192.168.10.10:3317/demo_ds_1?serverTimezone=UTC&useSSL=false
    username: root
    password: root
    connectionTimeoutMilliseconds: 30000
    idleTimeoutMilliseconds: 60000
    maxLifetimeMilliseconds: 1800000
    maxPoolSize: 50

masterSlaveRule:
  name: ms_ds
  masterDataSourceName: master_1_ds
  slaveDataSourceNames:
    - slave_ds_1
#    - slave_ds_1
  loadBalanceAlgorithmType: ROUND_ROBIN

4) Modify the master-slave replication configuration

(1) Stop the master and the slave first

[root@localhost master]# docker stop  mysql-master mysql-slaver-01
mysql-master
mysql-slaver-01

(2) Add the databases to be replicated on the master
vi /mydata/mysql/master/conf/my.cnf

[client]
default-character-set=utf8

[mysql]
default-character-set=utf8
[mysqld]
init_connect='SET collation_connection = utf8_unicode_ci'
init_connect='SET NAMES utf8'
character-set-server=utf8
collation-server=utf8_unicode_ci
skip-character-set-client-handshake
skip-name-resolve

# master-slave replication
server_id=1
log-bin=mysql-bin
read-only=0
binlog-do-db=gulimall_ums
binlog-do-db=gulimall_pms
binlog-do-db=gulimall_oms
binlog-do-db=gulimall_wms
binlog-do-db=gulimall_admin
# newly added
binlog-do-db=demo_ds_0
binlog-do-db=demo_ds_1
replicate-ignore-db=mysql
replicate-ignore-db=sys
replicate-ignore-db=information_schema
replicate-ignore-db=performance_schema

(3) Add the databases to be replicated on the slave
vi /mydata/mysql/slaver/conf/my.cnf

[client]
default-character-set=utf8

[mysql]
default-character-set=utf8
[mysqld]
init_connect='SET collation_connection = utf8_unicode_ci'
init_connect='SET NAMES utf8'
character-set-server=utf8
collation-server=utf8_unicode_ci
skip-character-set-client-handshake
skip-name-resolve

# master-slave replication
server_id=2
log-bin=mysql-bin
read-only=1
binlog-do-db=gulimall_ums
binlog-do-db=gulimall_pms
binlog-do-db=gulimall_oms
binlog-do-db=gulimall_wms
binlog-do-db=gulimall_admin
# newly added
binlog-do-db=demo_ds_0
binlog-do-db=demo_ds_1

replicate-ignore-db=mysql
replicate-ignore-db=sys
replicate-ignore-db=information_schema
replicate-ignore-db=performance_schema

(4) Restart the master and the slave

[root@localhost conf]# docker start mysql-master mysql-slaver-01
mysql-master
mysql-slaver-01
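
Optionally, a quick sanity check that replication is still healthy after the restart (this assumes the replication channel was already set up in the earlier master-slave article of this series):

-- on the master
SHOW MASTER STATUS;
-- on the slave: Slave_IO_Running and Slave_SQL_Running should both be Yes
SHOW SLAVE STATUS\G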

5) Create the databases
Manually create the demo_ds_0 and demo_ds_1 test databases on the master; you will see that the slave replicates these two newly created databases as well.
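
A minimal sketch of the statements to run on the master:

CREATE DATABASE IF NOT EXISTS demo_ds_0;
CREATE DATABASE IF NOT EXISTS demo_ds_1;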

6) Start Sharding-Proxy
Start Sharding-Proxy and specify port 3388:

cd /Users/kaiyiwang/Web/sharding-proxy/bin
➜  bin ./start.sh 3388
Starting the Sharding-Proxy ...
The port is 3388
The classpath is /Users/kaiyiwang/Web/sharding-proxy/conf:.:..:/Users/kaiyiwang/Web/sharding-proxy/lib/*:/Users/kaiyiwang/Web/sharding-proxy/lib/*:/Users/kaiyiwang/Web/sharding-proxy/ext-lib/*
Please check the STDOUT file: /Users/kaiyiwang/Web/sharding-proxy/logs/stdout.log

Log output:

➜  cat /Users/kaiyiwang/Web/sharding-proxy/logs/stdout.log 
...
[INFO ] 13:02:02.701 [nioEventLoopGroup-2-1] i.n.handler.logging.LoggingHandler - [id: 0x17cbeb2f] REGISTERED
[INFO ] 13:02:02.703 [nioEventLoopGroup-2-1] i.n.handler.logging.LoggingHandler - [id: 0x17cbeb2f] BIND: 0.0.0.0/0.0.0.0:3388
[INFO ] 13:02:02.707 [nioEventLoopGroup-2-1] i.n.handler.logging.LoggingHandler - [id: 0x17cbeb2f, L:/0.0.0.0:3388] ACTIVE

You can see that Sharding-Proxy has started successfully.

Then connect to it from the local machine; the connection succeeds:

Here we can see three schemas; they are the schemaName values we wrote in the configuration files above. From now on we only work with the sharding_db schema.
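
A minimal sketch of connecting with a standard MySQL client (the host is assumed to be the machine running the proxy; the credentials come from server.yaml above):

mysql -h 127.0.0.1 -P 3388 -u root -proot
mysql> SHOW DATABASES;   -- lists sharding_db, sharding_db_1 and sharding_db_2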

Create the test tables:

CREATE TABLE IF NOT EXISTS t_order (order_id BIGINT NOT NULL AUTO_INCREMENT, user_id INT NOT NULL, status VARCHAR(50), PRIMARY KEY (order_id))

CREATE TABLE IF NOT EXISTS t_order_item (order_item_id BIGINT NOT NULL AUTO_INCREMENT, order_id BIGINT NOT NULL, user_id INT NOT NULL, PRIMARY KEY (order_item_id))

INSERT INTO t_order (user_id, status) VALUES (10, 'INIT')
INSERT INTO t_order_item (order_id, user_id) VALUES (%d, 10) # %d: 1-10, run this a few times

INSERT INTO t_order (user_id, status) VALUES (11, 'INIT')
INSERT INTO t_order_item (order_id, user_id) VALUES (%d, 11)  # %d: 1-10, run this a few times

Logical table: (screenshot)

Logical sharded tables: (screenshot)

Real databases and sharded tables: (screenshot)

As you can see, orders are sharded into databases by user ID and then into tables by order ID. sharding-sphere is very powerful: the databases and tables are sharded without touching any application code, and the corresponding logical tables make it convenient to query everything as a whole.
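
A minimal sketch of verifying the routing through the proxy (with sql.show: true enabled in server.yaml above, the rewritten SQL also appears in the proxy log):

-- the logical table merges results from every real table
SELECT * FROM t_order;
-- user_id picks the database, order_id picks the table inside it; 1 is just an illustrative value
SELECT * FROM t_order WHERE user_id = 10 AND order_id = 1;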

Summary

sharding-sphere is a very powerful distributed database middleware solution.

Its simple, easy-to-understand inline expressions make it straightforward to configure data nodes and sharding algorithms.

It also ships many heavyweight features of its own, such as forced (hint) routing.

The official documentation is complete and the example projects cover most cases, so database and table sharding can be put in place in a fairly short time.


Related articles:
ShardingSphere data sharding core concepts
sharding-sphere, a powerful tool for sharding databases and tables

Those who keep doing will succeed; those who keep walking will arrive.