MongoDB Shard Cluster Data Distribution
Implement a MongoDB shard cluster to improve data availability and scalability. This setup distributes data across multiple nodes for better performance and resilience as your database needs to expand.
This document will help you successfully deploy Pulse components and configure MongoDB sharding. These steps are designed to ensure a comprehensive setup, enabling efficient data management and scalability for your MongoDB clusters.
MongoDB Shard Cluster Deployment Using accelo
Prerequisites
Before deploying a MongoDB Sharded cluster using Accelo, ensure you meet the following prerequisites:
- Minimum Node Requirement: While it's possible to run this setup on a single node, additional nodes may be required for optimal performance.
- MongoDB Version: Ensure you are using MongoDB version 6 for this setup.
- Pulse Integration Version: This setup is designed to work with Pulse version 3.3.20.
Deployment Steps
Deploying Pulse Components
- Set Environment Variables: Configure your environment by setting the MONGO_URI with the output of the previous command. Enable MongoDB encryption and specify the Pulse service node:
export MONGO_URI="" # Paste the URI within the quotes
export MONGO_ENCRYPTED=true
export PULSE_SA_NODE=true
- Prepare the Installation Directory:
mkdir -p /data01/acceldata/
mv accelo.linux /data01/acceldata/accelo
cd /data01/acceldata/
- Initialize accelo:
./accelo init
source /etc/profile.d/ad.sh
accelo init
accelo info
Setting Up MongoDB Sharding
To initiate the MongoDB shard cluster, ensure the following components are configured and initiated:
Deploy the Query router after initiating the replica set for the config server and adding shards following the initiation of the replica set for the shard server. Therefore, it's advisable to initiate both config servers and shard servers before deploying the Query router.
For more information on which networks to use, see Pulse Server Configuration Requirements documentation.
- Deploy Config Servers: Set up config servers as a replica set on different nodes (an odd number is recommended, with a minimum of 3).
- Deploy Shard Servers: Shard servers (optional replica set) should each have a different disk or node. Deploy shards in a round-robin manner.
- Deploy a Query Router: Install a Query Router on the main Pulse node on port
30000
. Use the Accelo command to initiate the sharding ecosystem. - Reroute
ad-sa-router
: After the above steps, bring downad-db_default
on the main Pulse node and reroutead-sa-router
asad-db
, setting the original standalone database asad-db-old
.
Configuring Each Component
Detailed configurations for each component, including config servers, shard servers, and query routers, are essential. Steps include generating configuration files, starting services, and adding members to replica sets.
Setting Up Configuration Server
To establish a configuration server for your system, follow these steps:
- Generate Configuration File: Start by creating a configuration file for your configuration server with the command:
accelo admin makeconfig ad-sa-configs
- Configure the Generated File: Next, open the configuration file that was generated. You'll find it at the specified path. Edit the file to match the following structure, ensuring to replace placeholders with actual values specific to your setup:
version: "2"
services:
ad-sa-configs:
image: ad-database
container_name: ad-sa-configs
environment:
- MONGO_DATA_DIR=/data/db
- MONGO_LOG_DIR=/dev/null
- ENABLE_MONGODB_RS=true
- CONFIG_SVR=true
- MONGODB_RS_NAME=cfgrs
- IS_MONGODB_PRIMARY=true # Set to true only for the primary member of the replica set
volumes:
- /etc/localtime:/etc/localtime:ro
- /data01/acceldata/config/db/mongokey:/mongokey
- /data01/acceldata/data/db_cs:/data/db
ulimits: {}
ports:
- 27019:27017
depends_on: []
opts: {}
restart: ""
extra_hosts: # Add the IPs of all members of the config server and shard replica sets
- ad-cs:10.90.5.111
- ad-cs2:10.90.5.112
- ad-cs3:10.90.5.113
- ad-sh:10.90.5.111
- ad-sh2:10.90.5.112
- ad-sh3:10.90.5.113
network_alias:
- ad-cs # Provide an alias for this container
label: CONFIG SERVER DB
- Start the Service: With the configuration file set up, initiate the configuration server service by running:
accelo deploy addons
? Select the SA components you would like to install: [Use arrows to move, enter to select,type to filter ]
[] Kafka Connector
[] Kafka 0.10.2 Connector
[] StandAlone Connector
[] Events
[x]CONFIG SERVER DB
[] QUERY ROUTER DB
> [] SHARD SERVER DB
When prompted, select the CONFIG SERVER DB from the list of services to install.
- Verify Service Status: Confirm the service is running correctly by executing:
docker ps
- Configure Replica Set Members: Login to the MongoDB shell of the primary configuration server container to add the other members to the replica set:
docker exec -it ad-sa-configs bash
mongosh mongodb://admin:<password>@localhost:27017/admin
db.adminCommand({
"replSetReconfig": {
"_id": "cfgrs",
"version": 2,
"configsvr": true,
"members": [
{ "_id": 0, "host": "ad-cs:27019" },
{ "_id": 1, "host": "ad-cs2:27019" },
{ "_id": 2, "host": "ad-cs3:27019" }
]
}
})
- Check Replica Set Status: Finally, ensure that a primary member is elected in the replica set by checking its status:
rs.status()
Setting Up a Shard Server
Follow these steps to configure a shard server:
- Generate Shard Server Configuration File: Create a configuration file for your shard server:
version: "2"
services:
ad-sa-shard:
image: ad-database
container_name: ad-sa-shard
environment:
- MONGO_DATA_DIR=/data/db
- MONGO_LOG_DIR=/dev/null
- ENABLE_MONGODB_RS=true
- MONGODB_RS_NAME=shard1rs # Each shard should have a unique replica set name
- SHARD_SVR=true
- IS_MONGODB_PRIMARY=true # True only for the primary member of the replica set
volumes:
- /etc/localtime:/etc/localtime:ro
- /data01/acceldata/config/db/mongokey:/mongokey
- /data01/acceldata/data/db_sh:/data/db
ulimits: {}
ports:
- 27018:27017
depends_on: []
opts: {}
restart: ""
extra_hosts: # Include the IPs of all config server and shard server members
- ad-cs:10.90.5.111
- ad-cs2:10.90.5.112
- ad-cs3:10.90.5.113
- ad-sh:10.90.5.111
- ad-sh2:10.90.5.112
- ad-sh3:10.90.5.113
network_alias:
- ad-sh
label: SHARD SERVER DB
- Start the Shard Server Service: With your configuration file ready, launch the shard server service by running:
accelo deploy addons
? Select the SA components you would like to install: [Use arrows to move, enter to select, type to filter]
[] Kafka Connector
[] Kafka 0.10.2 Connector
[] StandAlone Connector
[] Events
[] CONFIG SERVER DB
[] QUERY ROUTER DB
> [x] SHARD SERVER DB
Select SHARD SERVER DB from the list of available services to install.
- Verify Service Status: Confirm that the shard server service is operational by executing:
docker ps
- Add Members to the Replica Set: Login to the MongoDB shell of the primary shard server container. Here, you will configure the members of your replica set. The following is an example command to configure a single-member replica set:
docker exec -it ad-sa-shard bash
mongosh mongodb://admin:<password>@localhost:27017/admin
db.adminCommand({
"replSetReconfig": {
"_id": "SHARD_RS_NAME",
"version": 2,
"members": [
{ "_id": 0, "host": "ad-sh:27018" }
]
}
})
Remember to repeat this process for each shard replica set, adjusting the replica set name (SHARD
_RS_NAME
) and member details as necessary.
- Check the status of the replica set.
rs.status()
- Adding Additional Shards: If you need to add more shards on the same virtual machine (VM), create another service within the same configuration yaml file. Make sure to use unique names, IP addresses, and ports for the new shard. Additionally, update all related config server and router members with the new shard's details.
Setting Up a Query Router
Follow these steps to configure a query router:
- Generate the Router Configuration File: Start by creating the configuration file for your Query Router:
accelo admin makeconfig ad-sa-router
- Edit the Generated Configuration: Navigate to the provided path and modify the yaml file according to the specifications below. Ensure the
CONFIGSVR_PATH
reflects your actual configuration server setup:
version: "2"
services:
ad-sa-router:
image: ad-database
container_name: ad-sa-router
environment:
- MONGO_DATA_DIR=/data/db
- MONGO_LOG_DIR=/dev/null
- MONGO_ROUTER=yes
- CONFIGSVR_PATH=cfgrs/ad-cs:27019,ad-cs2:27019,ad-cs3:27019
volumes:
- /etc/localtime:/etc/localtime:ro
- /data01/acceldata/config/db/mongokey:/mongokey
- /data01/acceldata/data/db_router:/data/db
ulimits: {}
ports:
- 30000:27017
depends_on: []
opts: {}
restart: ""
extra_hosts:
- ad-cs:10.90.5.111
- ad-cs2:10.90.5.112
- ad-cs3:10.90.5.113
- ad-sh:10.90.5.111
- ad-sh2:10.90.5.112
- ad-sh3:10.90.5.113
network_alias:
- ad-router
label: QUERY ROUTER DB
- Deploy the Query Router Service: With your configuration set, initiate the Query Router service by executing:
accelo deploy addons
? Select the SA components you would like to install:[Use arrows to move, enter to select, type to filter]
[] Kafka Connector
[] Kafka 0.10.2 Connector
[] StandAlone Connector
[] Events
[] CONFIG SERVER DB
> [x] QUERY ROUTER DB
[] SHARD SERVER DB
When prompted, ensure you select the QUERY ROUTER DB option to install.
- Confirm the Service is Running: Verify that the Query Router service has started successfully:
docker ps
Initializing Sharding and Data Migration
- Define Initial Shards: Determine the initial number of shards and zones, considering the data retention requirements.
- Migrate Static Files: Use Accelo commands to migrate static files and create initial zones.
To initialize sharding and migrate data within your MongoDB setup, follow these detailed steps:
- Initialize Database Sharding: On the main Pulse node, where both
ad-db
andad-sa-router
are operational, execute the command to initiate sharding across all clusters:
accelo admin database init-sharding-db
Respond to the prompts to configure sharding:
INFO: Initiating database sharding
Is the 'Database Service' up and running? [y/n]: : y
WARN: Gauntlet is running in dry run mode. Disable this to delete indices from elastic and purge data from mongo DB
Enabling sharding for cluster odp_odin
✔ Enter initial shards count: : 5
✔ Are all the shards added in the Query Router? [y/n]: : n
Enter Shard Name: : shard1rs
Enter Connection string for shard1rs: shard1rs/ad-sh:27018
Are all the shards added in the Query Router? [y/n]: : y
Enter initial data limit count in days for each zone: : 30
Zone limits will start for 3 months from today. So give the zone count accordingly!
Enter initial zone count: : 6
✔ Do you want to migrate static configuration collections from standalone database(ad-db) ? [y/n]: : y
- Verify Sharding Status: Check the sharding setup and zone configurations:
docker exec -it ad-sa-router bash
mongosh mongodb://admin:<password>@localhost:27017/admin
sh.status()
- Convert
ad-db
to a Router: Modify thead-db
service configuration to function as a router by updating the environment variables and adding the required hosts in the/data01/acceldata/config/docker/ad-core.yml
file:
version: "2"
services:
ad-db:
image: ad-database
container_name: ""
environment:
- MONGO_DATA_DIR=/data/db
- MONGO_LOG_DIR=/dev/null
- NUMA_MONGO=no
- ENABLE_MONGODB_RS=false
- MONGO_ROUTER=yes # Newly added
- CONFIGSVR_PATH=cfgrs/ad-cs:27019,ad-cs2:27019,ad-cs3:27019 # Newly added
volumes:
- /data01/acceldata/config/db/mongokey:/mongokey
- /etc/localtime:/etc/localtime:ro
- /data01/acceldata/data/db_router:/data/db
ulimits: {}
ports:
- 27017:27017
depends_on: []
opts:
memory: "53687091200"
restart: ""
extra_hosts: # Newly added
- ad-cs:10.90.5.111
- ad-cs2:10.90.5.112
- ad-cs3:10.90.5.113
- ad-sh:10.90.5.111
- ad-sh2:10.90.5.112
- ad-sh3:10.90.5.113
network_alias: []
ad-deployer:
- Add the configs for
ad-db-tmp
in thead-sa-router.yml
and deploy the Query Router again to incorporate these changes. - Verify the rerouting and create new indices:
docker exec -it ad-db_default bash
mongosh mongodb://admin:<password>@localhost:27017/admin
sh.status()
################################################
docker exec -it ad-db-old_default bash
mongosh mongodb://admin:<password>@localhost:27017/admin
show dbs
#IT SHOULD CONSIST OF OLD DATA
###############################################
accelo admin database index-db
accelo admin database reload-events
- Deploy another additional router on a different node following the above steps, but excluding step 5.
- Migrate Data: Transfer data from
ad-db-tmp_default
to the newly configured routerad-router_default
, excluding the Acceldata database as it is already migrated as static data:
docker exec -it ad-sa-router bash
cd /data/db
nohup mongodump -h <ad-db-old IP>:<ad-db-old PORT> -u accel -p <username> --authenticationDatabase admin &
# After completion, restore the dump to the new router
nohup mongorestore -h localhost:27017 -u accel -p <username> --authenticationDatabase admin ./dump &
- Verify Data Migration: Ensure the migration was successful by checking the databases:
docker exec -it ad-db_default bash
mongosh mongodb://admin:<password>@localhost:27017/admin
show dbs
Gauntlet Configurations to Update Zones on a Timely Basis
To update the zones on a timely basis with Gauntlet configurations after deploying a MongoDB Shard Cluster using Accelo, follow these steps:
Configure Data Retention:
- Execute the accelo config retention command to set up data retention according to your needs. When prompted, confirm that MongoDB is sharded by answering y to
Is MongoDB sharded [y/n]?
:
accelo config retention
- Specify the number of days you wish to retain data in MongoDB for HDFS reports, and in TSDB.
- Indicate how often MongoDB cleanup and compaction should occur by providing a comma-separated string of hours (e.g., "0,8,12,15,18").
✔ How many days of data would you like to retain at Mongo DB ?: 15
How many days of data would you like to retain at Mongo DB for HDFS reports ?: 15
How many days of data would you like to retain at TSDB ?: 31
✔ How often should Mongo DB clean up & compaction run, provide a comma separated string of hours (valid values are [0,23] (Ex. 8,12,15,18)?: 0
Is Mongo DB sharded [y/n] ?: y
INFO: Updating accelo.yml with gauntlet config info
INFO: Updated accelo.yml
Push Configuration Updates:
- Run
accelo admin database push-config
to apply the configured settings.
Adjust Zone Ranges:
- If necessary, generate or update the ad-core.yml file to change the range of a zone. Use the
accelo admin makeconfig ad-core
command to generate this file if it doesn't already exist. - Open ad-core.yml in a text editor and insert the ZONE_TIMESPAN_IN_HOURS=<value> line under the environment section for ad-gauntlet, substituting <value> with the desired time in hours for a zone's range.
- Example snippet from ad-core.yml:
ad-gauntlet:
image: ad-gauntlet
container_name: ad-gauntlet
environment:
- MONGO_URI=ZN4v8cuUTXYvdnDJIDp+R8Z+ZsVXXjv8zDOvh8UwQXosC8vfVkGYGWGPNnX64ZVSp9yHgErQknPBAfYZ9cOG1A==
- MONGO_ENCRYPTED=true
- ELASTIC_ADDRESSES=http://ad-elastic:9200
- DRY_RUN_ENABLE=true
- CRON_TAB_DURATION=*/5 * * * *
- ZONE_TIMESPAN_IN_HOURS=24
Ensure there's no conflict between the snap_mongo_cleanup_frequency_in_hours
value in accelo.yml
and the cron configuration for Gauntlet in ad-core.yml
. The default values are optimized for most scenarios and are generally recommended not to be altered.
Adding New Shards in the Future
Purpose:
- To extend data retention capabilities.
- To enhance the performance of the existing cluster.
Procedure:
- Deploy a New Shard Server: Begin by deploying a new shard server.
- Integrate the Shard with the Router: Use the following command to add the new shard to your router:
sh.addShard("shardXrs/IP:HOST")
- Establish a New Zone for the Shard: Execute this command to create a new zone for the newly added shard:
mongosh --quiet mongodb://admin:ACCELROOT_01082018@localhost --eval "db.adminCommand({addShardToZone: \"$Shard_NAME\", zone: \"$ZONE_NAME\"})"
- Set New Zone Limits:
- Retrieve the maximum limit from the current setup and define the new zone's range accordingly.
- Adjust the zone key range for various collections by running commands similar to the following, substituting the placeholders with your actual data:
mongosh --quiet mongodb://admin:ACCELROOT_01082018@localhost --eval "db.adminCommand({updateZoneKeyRange: \"$CLUSTER_NAME.collectionName\", min: {key: $MIN_LIMIT}, max: {key: $MAX_LIMIT}, zone: \"$ZONE_NAME\"})"
- Repeat the above step for all relevant collections, modifying the collectionName and key as necessary to match your cluster's configuration.
Adjusting Zone Ranges:
- To decrease the performance by altering the size of each zone, follow the Gauntlet Configurations to Update Zones on a Timely Basis instructions.
- To extend the data retention period, also refer to the Gauntlet Configurations to Update Zones on a Timely Basis steps for updating zones as required.
Final Steps
Ensure you've correctly set up and configured each part of your MongoDB sharding environment. Regularly check the status of your sharding setup and make adjustments as needed to maintain optimal performance and data management.