This is an automated archive.

The original was posted on /r/mysql by /u/victorvic369 on 2023-11-30 05:32:16+00:00.


I have a table with 5 Billion rows and 14 columns. In case you are interested to know:

CREATE TABLE IF NOT EXISTS `school_tab` (
  `school_id` bigint(20) UNSIGNED NOT NULL PRIMARY KEY, 
  `account_id` bigint(20) UNSIGNED NOT NULL, 
  `external_id` bigint(20) UNSIGNED, 
  `class_type` tinyint UNSIGNED NOT NULL, 
  `class_status` tinyint UNSIGNED NOT NULL, 
  `is_online` tinyint(1) UNSIGNED NOT NULL, 
  `hours` bigint(20) NOT NULL, 
  `period` bigint(20) UNSIGNED NOT NULL DEFAULT 0, 
  `room_id` bigint(20) UNSIGNED NOT NULL DEFAULT 0, 
  `has_special_instructor` tinyint(1) NOT NULL, 
  `ext_info` varchar(4096), 
  `ctime` int(11) UNSIGNED NOT NULL DEFAULT 0, 
  `mtime` int(11) UNSIGNED NOT NULL DEFAULT 0, 
  `teacher_id` bigint(20) UNSIGNED NULL,

  UNIQUE INDEX `idx_external_id_class_type` (`external_id`, `class_type`), 
  UNIQUE INDEX `idx_account_id_class_type` (
    `account_id`, `class_type`, `class_status`
  ),
  INDEX `idx_is_online_status_class_type_mtime` (
    `is_online`, `class_status`, `class_type`, `mtime`
  )
)

I will need to use MySQL to help me store these data.Since there are a lot of rows in the table, one idea comes into mind is to do sharding.

So, my questions are:

  1. What should I shard my table on (which key)?

For example if I shard on account_id, and if the clients don’t pass account_id in their request, how can I query the table? 2. How can I ensure the following constraints are met after sharded (rmb that there are 5 Billion rows in total), for examples:

UNIQUE INDEX `idx_external_id_class_type` (`external_id`, `class_type`),
UNIQUE INDEX `idx_account_id_class_type` (
  `account_id`, `class_type`, `class_status`
)

Thank you r/mysql masters