MongoDB Shard Cluster Data Distribution

Implement a MongoDB shard cluster to improve data availability and scalability. A sharded cluster distributes data across multiple nodes, which improves performance and resilience as your database grows.

This document describes how to deploy the Pulse components and configure MongoDB sharding so that your MongoDB clusters can manage data efficiently and scale.

MongoDB Shard Cluster Deployment Using Accelo

Prerequisites

Before deploying a MongoDB sharded cluster using Accelo, make sure you meet the following prerequisites:

  • Minimum Node Requirement: While it's possible to run this setup on a single node, additional nodes may be required for optimal performance.
  • MongoDB Version: Ensure you are using MongoDB version 6 for this setup.
  • Pulse Integration Version: This setup is designed to work with Pulse version 3.3.20.

Deployment Steps

Deploying Pulse Components

  1. Set Environment Variables: Set MONGO_URI to the output of the previous command, enable MongoDB encryption, and specify the Pulse service node:
  2. Prepare the Installation Directory:
  3. Initialize Accelo:

Setting Up MongoDB Sharding

To initiate the MongoDB shard cluster, configure and initiate the components listed below.

Deploy the Query Router only after the config server replica set has been initiated, and add each shard only after its shard server replica set has been initiated. Therefore, initiate both the config servers and the shard servers before you deploy the Query Router.

For more information on which networks to use, see the Pulse Server Configuration Requirements documentation.

  1. Deploy Config Servers: Set up config servers as a replica set on different nodes (an odd number is recommended, with a minimum of 3).
  2. Deploy Shard Servers: Place each shard server (optionally deployed as a replica set) on a different disk or node, and distribute the shards across the nodes in a round-robin manner.
  3. Deploy a Query Router: Install a Query Router on the main Pulse node, listening on port 30000. Then use the Accelo command to initiate the sharding ecosystem.
  4. Reroute ad-sa-router: After the above steps, bring down ad-db_default on the main Pulse node and reroute ad-sa-router as ad-db, setting the original standalone database as ad-db-old.
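The following minimal sketch shows the round-robin placement from step 2. It assumes that round-robin means shard N goes on node N modulo the number of nodes. The script only prints a placement plan and deploys nothing; the node names and shard count are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: print a round-robin placement plan for shard servers.
# Node names and the shard count are hypothetical inputs; nothing is deployed.
set -euo pipefail

# Usage: round_robin_shards SHARD_COUNT NODE [NODE ...]
round_robin_shards() {
  local count="$1"
  shift
  if (( $# == 0 )); then
    echo "at least one node is required" >&2
    return 1
  fi
  local -a nodes=("$@")
  local i
  for (( i = 0; i < count; i++ )); do
    # Shard i+1 lands on node (i mod number_of_nodes).
    printf 'shard%d -> %s\n' "$(( i + 1 ))" "${nodes[i % ${#nodes[@]}]}"
  done
}

# Example (hypothetical nodes). Prints:
#   shard1 -> node1, shard2 -> node2, shard3 -> node3,
#   shard4 -> node1, shard5 -> node2 (one per line)
round_robin_shards 5 node1 node2 node3
```

A node receives a second shard only after every node has one, so the number of shards per node never differs by more than one.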

Configuring Each Component

Each component (config servers, shard servers, and Query Routers) is configured individually. For each one, you generate a configuration file, start the service, and, for replica sets, add the members.

Setting Up a Configuration Server

To establish a configuration server for your system, follow these steps:

  1. Generate the Configuration File: Create a configuration file for the configuration server by running:
  2. Edit the Generated File: Open the generated configuration file at the specified path and edit it to match the following structure, replacing the placeholders with values specific to your setup:
  3. Start the Service: Start the configuration server service by running:

When prompted, select CONFIG SERVER DB from the list of services to install.

  4. Verify Service Status: Confirm the service is running correctly by executing:
  5. Configure Replica Set Members: Log in to the MongoDB shell of the primary configuration server container and add the other members to the replica set:
  6. Check Replica Set Status: Finally, check the replica set status to confirm that a primary member has been elected:
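For step 5, the following sketch builds a standard MongoDB rs.initiate() document for a config server replica set. The configsvr: true field is what marks a replica set as a config server replica set. The replica set name cfgRS and the host:port values are hypothetical placeholders. Port 27019 is MongoDB's default for config servers, but your Accelo-generated configuration may use different values.

```bash
#!/usr/bin/env bash
# Sketch: print an rs.initiate() command for a config server replica set.
# The replica set name and host:port values are hypothetical placeholders.
set -euo pipefail

# Usage: gen_configsvr_initiate RS_NAME HOST:PORT [HOST:PORT ...]
gen_configsvr_initiate() {
  local rs_name="$1"
  shift
  if (( $# == 0 )); then
    echo "at least one member host is required" >&2
    return 1
  fi
  local members="" host
  local id=0
  for host in "$@"; do
    # Separate member documents with ", ".
    if [[ -n "$members" ]]; then
      members+=", "
    fi
    members+="{ _id: ${id}, host: \"${host}\" }"
    id=$(( id + 1 ))
  done
  # configsvr: true is required when initiating a config server replica set.
  printf 'rs.initiate({ _id: "%s", configsvr: true, members: [ %s ] })\n' \
    "$rs_name" "$members"
}

# Example with three hypothetical config servers:
gen_configsvr_initiate cfgRS cfg1:27019 cfg2:27019 cfg3:27019
```

You could pass the output to the shell of the primary config server, for example with mongosh --eval "$(gen_configsvr_initiate cfgRS cfg1:27019 cfg2:27019 cfg3:27019)". If the replica set has already been initiated, rs.add("host:port") adds a single member instead.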

Setting Up a Shard Server

Follow these steps to configure a shard server:

  1. Generate Shard Server Configuration File: Create a configuration file for your shard server:
  2. Start the Shard Server Service: With your configuration file ready, launch the shard server service by running:

Select SHARD SERVER DB from the list of available services to install.

  3. Verify Service Status: Confirm that the shard server service is operational by executing:
  4. Add Members to the Replica Set: Log in to the MongoDB shell of the primary shard server container and configure the members of your replica set. The following example command configures a single-member replica set:

Repeat this process for each shard replica set, adjusting the replica set name (SHARD_RS_NAME) and member details as necessary.

  5. Check Replica Set Status: Check the status of the replica set:
  6. Add Additional Shards: To add more shards on the same virtual machine (VM), create another service in the same YAML configuration file, using a unique name, IP address, and port for the new shard. Then update all related config server and router members with the new shard's details.
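Each shard needs its own replica set. The following minimal sketch prints one rs.initiate() command per shard replica set, assuming each one starts with a single member. The SHARD_RS_NAME values and hosts are hypothetical. Port 27018 is MongoDB's default shard server port, and shards that share a VM need distinct ports.

```bash
#!/usr/bin/env bash
# Sketch: print one single-member rs.initiate() command per shard replica set.
# Each argument is SHARD_RS_NAME=HOST:PORT; all values are hypothetical.
set -euo pipefail

# Usage: gen_shard_initiates NAME=HOST:PORT [NAME=HOST:PORT ...]
gen_shard_initiates() {
  local spec rs_name host
  for spec in "$@"; do
    if [[ "$spec" != *=* ]]; then
      echo "expected NAME=HOST:PORT, got '$spec'" >&2
      return 1
    fi
    rs_name="${spec%%=*}"   # text before the first '='
    host="${spec#*=}"       # text after the first '='
    printf 'rs.initiate({ _id: "%s", members: [ { _id: 0, host: "%s" } ] })\n' \
      "$rs_name" "$host"
  done
}

# Example (hypothetical shard replica sets on two nodes):
gen_shard_initiates shard1=node1:27018 shard2=node2:27018
```

Run each printed command in the MongoDB shell of that shard's own primary container, not on a single node for all shards.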

Setting Up a Query Router

Follow these steps to configure a query router:

  1. Generate the Router Configuration File: Start by creating the configuration file for your Query Router:
  2. Edit the Generated Configuration: Open the YAML file at the provided path and modify it according to the following specification. Make sure CONFIGSVR_PATH reflects your actual configuration server setup:
  3. Deploy the Query Router Service: Start the Query Router service by running:

When prompted, select QUERY ROUTER DB from the list of services to install.

  4. Confirm the Service is Running: Verify that the Query Router service has started successfully:
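CONFIGSVR_PATH in the router configuration has to identify the config server replica set. This sketch assumes it uses the same form as the mongos --configdb option, <replicaSetName>/<host:port>,<host:port>,...; that is an assumption, not documented Accelo behavior. The script builds the string from a hypothetical replica set name and hosts.

```bash
#!/usr/bin/env bash
# Sketch: build a "<replicaSetName>/<host:port>,<host:port>" string, the form
# mongos accepts for --configdb. Assumes CONFIGSVR_PATH uses the same form;
# the replica set name and hosts below are hypothetical.
set -euo pipefail

# Usage: build_configdb RS_NAME HOST:PORT [HOST:PORT ...]
build_configdb() {
  local rs_name="$1"
  shift
  if (( $# == 0 )); then
    echo "at least one config server host is required" >&2
    return 1
  fi
  local IFS=','
  # "$*" joins the remaining arguments with the first character of IFS.
  printf '%s/%s\n' "$rs_name" "$*"
}

# Example (hypothetical hosts):
CONFIGSVR_PATH="$(build_configdb cfgRS cfg1:27019 cfg2:27019 cfg3:27019)"
echo "$CONFIGSVR_PATH"   # cfgRS/cfg1:27019,cfg2:27019,cfg3:27019
```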

Initializing Sharding and Data Migration

  • Define Initial Shards: Determine the initial number of shards and zones based on your data retention requirements.
  • Migrate Static Files: Use Accelo commands to migrate static files and create initial zones.
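A rough estimate helps when choosing the initial number of zones. The following minimal sketch assumes that each zone covers a fixed time span, as the ZONE_TIMESPAN_IN_HOURS setting in the Gauntlet section suggests, and that the zones together must cover the whole retention window. The inputs are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: estimate how many fixed-span, time-based zones cover a retention
# window. Assumes each zone spans ZONE_TIMESPAN_HOURS; inputs are hypothetical.
set -euo pipefail

# Usage: zones_needed RETENTION_DAYS ZONE_TIMESPAN_HOURS
zones_needed() {
  local retention_hours=$(( $1 * 24 )) span="$2"
  if (( span <= 0 )); then
    echo "zone timespan must be a positive number of hours" >&2
    return 1
  fi
  # Integer ceiling division: (a + b - 1) / b.
  echo $(( (retention_hours + span - 1) / span ))
}

zones_needed 30 168   # 30 days = 720 hours; 720 / 168 rounded up = 5
```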

To initialize sharding and migrate data within your MongoDB setup, follow these steps:

  1. Initialize Database Sharding: On the main Pulse node, where both ad-db and ad-sa-router are running, execute the following command to initiate sharding across all clusters:

Respond to the prompts to configure sharding:

  2. Verify Sharding Status: Check the sharding setup and zone configurations:
  3. Convert ad-db to a Router: In the /data01/acceldata/config/docker/ad-core.yml file, modify the ad-db service configuration so that it functions as a router by updating its environment variables and adding the required hosts:
  4. Add the configuration for ad-db-tmp to ad-sa-router.yml, then deploy the Query Router again to apply these changes.
  5. Verify the rerouting and create new indices:
  6. Deploy an additional router on a different node by following the steps above, excluding step 5.
  7. Migrate Data: Transfer data from ad-db-tmp_default to the newly configured router ad-router_default, excluding the Acceldata database, which was already migrated as static data:
  8. Verify Data Migration: Ensure the migration was successful by checking the databases:
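As a generic illustration, and not necessarily the command Accelo uses, the MongoDB Database Tools can stream a dump from one deployment directly into another while skipping one database. mongodump --archive writes the dump to standard output, and mongorestore --nsExclude filters out namespaces. The following dry-run sketch only prints such a pipeline. The URIs and the database name acceldata are hypothetical placeholders.

```bash
#!/usr/bin/env bash
# Dry-run sketch: print a mongodump | mongorestore pipeline that copies all
# databases except one. SOURCE_URI, TARGET_URI and EXCLUDE_DB are hypothetical.
set -euo pipefail

# Usage: build_migration_cmd SOURCE_URI TARGET_URI EXCLUDE_DB
build_migration_cmd() {
  local source_uri="$1" target_uri="$2" exclude_db="$3"
  # --archive streams the dump through stdout/stdin; --nsExclude skips every
  # collection in the excluded database during the restore.
  printf "mongodump --uri='%s' --archive | mongorestore --uri='%s' --archive --nsExclude='%s.*'\n" \
    "$source_uri" "$target_uri" "$exclude_db"
}

SOURCE_URI="${SOURCE_URI:-mongodb://SOURCE_HOST:27017}"
TARGET_URI="${TARGET_URI:-mongodb://ROUTER_HOST:30000}"
EXCLUDE_DB="${EXCLUDE_DB:-acceldata}"

# Review the printed command before running it by hand.
build_migration_cmd "$SOURCE_URI" "$TARGET_URI" "$EXCLUDE_DB"
```

Printing the command instead of running it lets you check the endpoints first, because restoring into the wrong router is hard to undo.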

Gauntlet Configurations to Update Zones Periodically

To have Gauntlet update the zones on a regular schedule after you deploy a MongoDB shard cluster using Accelo, follow these steps:

Configure Data Retention:

  • Run the accelo config retention command to configure data retention for your needs. When prompted with Is MongoDB sharded [y/n]?, answer y to confirm that MongoDB is sharded:
  • Specify the number of days to retain data in MongoDB, for HDFS reports, and in TSDB.
  • Indicate the hours at which MongoDB cleanup and compaction should run by providing a comma-separated string of hours (e.g., "0,8,12,15,18").
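You can catch a malformed cleanup schedule before you answer the prompt. The following sketch checks a string such as "0,8,12,15,18". It assumes that each entry is an hour of the day from 0 to 23 and that a repeated hour is a mistake. accelo itself may apply different rules.

```bash
#!/usr/bin/env bash
# Sketch: validate a comma-separated cleanup schedule such as "0,8,12,15,18".
# Assumes entries are hours of the day (0-23) and duplicates are mistakes.
set -euo pipefail

validate_cleanup_hours() {
  local input="$1"
  if [[ -z "$input" ]]; then
    echo "no hours given" >&2
    return 1
  fi
  local rest="${input}," field h
  local -a seen=()
  while [[ -n "$rest" ]]; do
    field="${rest%%,*}"   # next entry, up to the first comma
    rest="${rest#*,}"     # drop that entry and its comma
    # Accept one or two digits only (no spaces, signs or empty entries).
    if [[ ! "$field" =~ ^[0-9]{1,2}$ ]]; then
      echo "invalid entry: '$field'" >&2
      return 1
    fi
    h=$(( 10#$field ))    # force base 10 so "08" is not read as octal
    if (( h > 23 )); then
      echo "hour out of range: $field" >&2
      return 1
    fi
    if [[ -n "${seen[h]:-}" ]]; then
      echo "duplicate hour: $h" >&2
      return 1
    fi
    seen[h]=1
  done
  echo "ok"
}

validate_cleanup_hours "0,8,12,15,18"   # prints "ok"
```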

Push Configuration Updates:

  • Run accelo admin database push-config to apply the configured settings.

Adjust Zone Ranges:

  • To change the range of a zone, update the ad-core.yml file. If the file doesn't already exist, generate it with the accelo admin makeconfig ad-core command.
  • Open ad-core.yml in a text editor and add the line ZONE_TIMESPAN_IN_HOURS=<value> under the environment section for ad-gauntlet, replacing <value> with the desired range of a zone, in hours.
  • Example snippet from ad-core.yml:

Make sure the snap_mongo_cleanup_frequency_in_hours value in accelo.yml does not conflict with the cron configuration for Gauntlet in ad-core.yml. The default values suit most scenarios, and we recommend leaving them unchanged.

Adding New Shards in the Future

Purpose:

  • To extend data retention capabilities.
  • To enhance the performance of the existing cluster.

Procedure:

  1. Deploy a New Shard Server: Deploy a new shard server as described in Setting Up a Shard Server.
  2. Integrate the Shard with the Router: Use the following command to add the new shard to your router:
  3. Create a New Zone for the Shard: Run the following command to create a zone for the newly added shard:
  4. Set New Zone Limits:
    • Retrieve the maximum limit from the current setup and define the new zone's range accordingly.
    • Adjust the zone key range for each relevant collection by running commands similar to the following, replacing the placeholders with your actual values:
    • Repeat the above step for all relevant collections, modifying the collectionName and key as necessary to match your cluster's configuration.
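The following sketch prints the standard mongosh commands for steps 2 to 4: sh.addShard(), sh.addShardToZone(), and one sh.updateZoneKeyRange() per collection. It assumes the new zone starts at the current maximum limit and extends by a fixed span. The shard, zone, key field, limits, and namespaces are all hypothetical, and the collections must already be sharded on that key.

```bash
#!/usr/bin/env bash
# Sketch: print mongosh commands that add a shard, assign it to a new zone and
# give that zone a key range on several collections. All names and limits are
# hypothetical; the new range is [CURRENT_MAX, CURRENT_MAX + SPAN).
set -euo pipefail

# Usage: gen_zone_cmds SHARD_RS HOST:PORT ZONE KEY CURRENT_MAX SPAN NAMESPACE...
gen_zone_cmds() {
  local shard_rs="$1" host_port="$2" zone="$3" key="$4"
  local min="$5" span="$6"
  shift 6
  # The new zone starts at the current maximum and extends by SPAN.
  local max=$(( min + span ))
  # Register the new shard replica set with the router.
  printf 'sh.addShard("%s/%s")\n' "$shard_rs" "$host_port"
  # A replica set shard is named after its replica set by default.
  printf 'sh.addShardToZone("%s", "%s")\n' "$shard_rs" "$zone"
  local ns
  for ns in "$@"; do
    printf 'sh.updateZoneKeyRange("%s", { %s: %s }, { %s: %s }, "%s")\n' \
      "$ns" "$key" "$min" "$key" "$max" "$zone"
  done
}

# Example (hypothetical values):
gen_zone_cmds shard3 node3:27018 zone3 ts 1000 500 mydb.collA mydb.collB
```

You could run the output against the Query Router, for example with mongosh --port 30000 --eval "$(gen_zone_cmds shard3 node3:27018 zone3 ts 1000 500 mydb.collA mydb.collB)". A zone range includes its lower bound and excludes its upper bound, so starting the new zone at the current maximum leaves no gap or overlap with the previous zone.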

Adjusting Zone Ranges:

Final Steps

Confirm that each part of your MongoDB sharding environment is set up and configured correctly. Check the status of your sharding setup regularly and adjust it as needed to maintain performance and effective data management.
