For very large migrations, AWS Database Migration Service (AWS DMS) replication can run for hours or days, depending on the volume of data being replicated. Monitoring your AWS DMS resources during that time helps the migration run smoothly: it lets you detect anomalies and trigger notifications when a metric crosses a threshold you configure. You can use Amazon CloudWatch to collect, track, and monitor AWS resources using metrics. With CloudWatch, you can create alarms that watch metrics and send notifications when a threshold is breached. You can configure CloudWatch to monitor your replication using the AWS Management Console, the AWS Command Line Interface (AWS CLI), or the AWS DMS API.
AWS DMS replication is set up between two databases with multiple tasks performing full load or change data capture (CDC). With AWS DMS, you can perform one-time migrations and replicate ongoing changes to keep sources and targets in sync. These replication tasks run for hours or days, and can run into various replication issues, such as the following:
- Low memory – The replication instance is running low on memory
- High CPU – CPU is used at capacity
- Excessive swap usage – Not enough memory is allocated
- Network disruptions – Network failure between source and target instances
These issues can disrupt replication, create repeated work for DBAs, and delay migration timelines, so it’s best to detect them before they cause a failure. Setting up CloudWatch alarms on the AWS DMS replication instance and its tasks alerts you proactively, so that you can act before replication is disrupted.
This post describes how to set up and configure the monitoring of AWS DMS resources. You can implement the solution with the AWS CLI or the console. For this post, we use the AWS CLI to create our CloudWatch alarms.
The AWS CLI is an open-source tool that enables you to interact with AWS services using commands in your command line shell. For more information about installing and configuring the AWS CLI, see Installing the AWS CLI.
AWS DMS helps you migrate databases to AWS securely. It supports homogeneous and heterogeneous migrations between different database platforms, such as Oracle to Amazon Aurora. AWS DMS supports continuous data replication while maintaining high availability, and has been widely adopted for database migrations because it’s easy to configure. For more information, see What Is AWS Database Migration Service?
To use AWS DMS, you need to create a DMS replication instance, create a source endpoint that connects the source database to read data, and create a target endpoint that connects to the target database and loads the data. You can create one or more replication tasks to replicate and migrate your data between source and target databases. In this post, we describe the metrics that you can monitor on your AWS DMS resources.
An AWS DMS replication instance is an Amazon Elastic Compute Cloud (Amazon EC2) instance that performs the actual data migration between source and target databases. The replication instance also caches the changes during the migration, which is very CPU and memory intensive. You can monitor various CloudWatch metrics for the replication instance. For this post, we set up alarms on CPU utilization, free storage space, freeable memory, and write IOPS.
AWS DMS replication tasks are responsible for migrating and replicating data between the source and target endpoints. For this post, we set up alarms on the CDCThroughputRowsTarget and CDCThroughputRowsSource metrics.
To follow along with this solution, you need the following resources:
- An AWS account with permissions to access resources in AWS DMS and CloudWatch
- Access to a Linux/Unix machine with the AWS CLI installed and configured
- Necessary AWS Identity and Access Management (IAM) permissions granted to the EC2 instance for accessing the CloudWatch alarms
- An AWS DMS replication instance running in your AWS account
Setting up your replication instance
To set up monitoring, it’s a best practice to name your CloudWatch alarms uniquely. For this post, we create an AWS DMS replication instance (sample-dms-replication-instance) and an AWS DMS task (dms-task-1), and configure the source and target endpoints. Complete the following steps:
- On the AWS DMS console, choose Replication instances.
- Choose your desired instance (for this post, we use sample-dms-replication-instance).
To name your alarm uniquely, use the following parameters:
- team_tag_value – A unique tag value that identifies your team
- resource_name – The unique resource name of your choice (this post uses
- metric_name – The name of the metric that the alarm watches
- replication_instance_identifier – The replication instance identifier (for this post, we use sample-dms-replication-instance)
- region – The Region of your AWS DMS replication instance
- replication_instance_arn – The ARN of your replication instance (for this post, it’s
- Run the following commands on any Linux/Unix system to set up the environment variables referring to the preceding parameters:
- To check all the available metrics on your AWS DMS resource, enter the following code:
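The two steps above can be sketched as follows. This is a minimal example, not the original commands: every value is a placeholder (the account ID, ARN suffix, Region, tag value, and resource name are all assumptions), and `alarm_name` is a hypothetical helper that shows one possible naming scheme.

```shell
# Placeholder values; replace each one with your own before running.
export team_tag_value="my-team"   # hypothetical tag value
export resource_name="dms"        # hypothetical resource name
export replication_instance_identifier="sample-dms-replication-instance"
export region="us-east-1"         # assumed Region
# Placeholder ARN; copy the real one from the AWS DMS console.
export replication_instance_arn="arn:aws:dms:${region}:123456789012:rep:EXAMPLE"

# Hypothetical naming scheme: <team tag>-<resource name>-<instance>-<metric>.
alarm_name() {
  printf '%s-%s-%s-%s\n' "$team_tag_value" "$resource_name" \
    "$replication_instance_identifier" "$1"
}

# List the AWS/DMS metrics that CloudWatch has recorded for the instance.
list_instance_metrics() {
  aws cloudwatch list-metrics \
    --namespace "AWS/DMS" \
    --dimensions "Name=ReplicationInstanceIdentifier,Value=${replication_instance_identifier}" \
    --region "$region"
}
```

Run `list_instance_metrics` to see the metrics, and `alarm_name CPUUtilization` to generate a name for a given metric.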
Setting up AWS DMS replication instance CloudWatch metrics
The following are some of the significant CloudWatch metrics for monitoring the AWS DMS replication instance:
- CPUUtilization – Amount of CPU used
- FreeStorageSpace – Amount of available storage space in bytes
- FreeableMemory – Amount of available random-access memory
- WriteIOPS – Average number of disk write I/O operations per second
To set up an alarm for CPU utilization, run the following command. It creates a CloudWatch alarm for the replication instance that fires when CPU utilization exceeds 70 percent for 3 data points, each covering a 5-minute period:
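A command along these lines would create that alarm. It is a sketch under stated assumptions: the Region, the `alarm_prefix` naming convention, and the choice of the `Average` statistic are not taken from the original post.

```shell
: "${replication_instance_identifier:=sample-dms-replication-instance}"
: "${region:=us-east-1}"          # assumed Region
: "${alarm_prefix:=my-team-dms}"  # hypothetical <team tag>-<resource name> prefix

# Alarm when average CPU utilization stays above 70 percent
# for 3 consecutive 5-minute (300-second) periods.
create_cpu_alarm() {
  aws cloudwatch put-metric-alarm \
    --alarm-name "${alarm_prefix}-${replication_instance_identifier}-CPUUtilization" \
    --alarm-description "CPU above 70 percent on ${replication_instance_identifier}" \
    --namespace "AWS/DMS" \
    --metric-name "CPUUtilization" \
    --dimensions "Name=ReplicationInstanceIdentifier,Value=${replication_instance_identifier}" \
    --statistic Average \
    --period 300 \
    --evaluation-periods 3 \
    --datapoints-to-alarm 3 \
    --threshold 70 \
    --comparison-operator GreaterThanThreshold \
    --region "$region"
}
```

Call `create_cpu_alarm` to create the alarm. To receive notifications, you can add `--alarm-actions` with the ARN of an Amazon SNS topic that you own.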
To check the allocated storage (in GB) for your running AWS DMS instance, run the following command:
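One way to read the allocated storage is with `describe-replication-instances` and a `--query` filter. The Region default below is an assumption.

```shell
: "${replication_instance_identifier:=sample-dms-replication-instance}"
: "${region:=us-east-1}"  # assumed Region

# Print the allocated storage (in GB) of the replication instance.
get_allocated_storage_gb() {
  aws dms describe-replication-instances \
    --filters "Name=replication-instance-id,Values=${replication_instance_identifier}" \
    --query "ReplicationInstances[0].AllocatedStorage" \
    --output text \
    --region "$region"
}
```

Calling `get_allocated_storage_gb` prints a single number that you can use to choose a free storage threshold.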
To set up an alarm for free storage space (in bytes), run the following command:
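The post does not state a threshold for this alarm, so the sketch below assumes one: alarm when free space drops below a chosen percentage of the allocated storage. The helper `storage_threshold_bytes` is hypothetical and treats 1 GB as 1024^3 bytes.

```shell
: "${replication_instance_identifier:=sample-dms-replication-instance}"
: "${region:=us-east-1}"          # assumed Region
: "${alarm_prefix:=my-team-dms}"  # hypothetical <team tag>-<resource name> prefix

# Convert a percentage of the allocated storage into bytes,
# because FreeStorageSpace is reported in bytes.
# $1 = allocated storage in GB, $2 = percentage that should stay free
storage_threshold_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 * $2 / 100 ))
}

# Alarm when average free storage falls below the given byte threshold
# for 3 consecutive 5-minute periods (period and count are assumptions).
# $1 = threshold in bytes
create_free_storage_alarm() {
  aws cloudwatch put-metric-alarm \
    --alarm-name "${alarm_prefix}-${replication_instance_identifier}-FreeStorageSpace" \
    --namespace "AWS/DMS" \
    --metric-name "FreeStorageSpace" \
    --dimensions "Name=ReplicationInstanceIdentifier,Value=${replication_instance_identifier}" \
    --statistic Average \
    --period 300 \
    --evaluation-periods 3 \
    --threshold "$1" \
    --comparison-operator LessThanThreshold \
    --region "$region"
}
```

For example, for an instance with 50 GB allocated, `create_free_storage_alarm "$(storage_threshold_bytes 50 20)"` alarms when less than 20 percent of the storage is free.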
Freeable memory is memory that can be freed and used for other processes; it’s the amount of cache used on the replication instance. Although the FreeableMemory metric doesn’t reflect the actual free memory available, the combination of the FreeableMemory and SwapUsage metrics can indicate whether the replication instance is overloaded.
To set up an alarm for freeable memory less than 1 GB, run the following command:
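A sketch of such an alarm follows. FreeableMemory is reported in bytes, so 1 GB is expressed here as 1024^3 bytes; the period, evaluation count, Region, and naming prefix are assumptions.

```shell
: "${replication_instance_identifier:=sample-dms-replication-instance}"
: "${region:=us-east-1}"          # assumed Region
: "${alarm_prefix:=my-team-dms}"  # hypothetical <team tag>-<resource name> prefix

# 1 GB expressed in bytes (1024^3).
one_gb_bytes=$(( 1024 * 1024 * 1024 ))

# Alarm when average freeable memory stays below 1 GB
# for 3 consecutive 5-minute periods.
create_freeable_memory_alarm() {
  aws cloudwatch put-metric-alarm \
    --alarm-name "${alarm_prefix}-${replication_instance_identifier}-FreeableMemory" \
    --namespace "AWS/DMS" \
    --metric-name "FreeableMemory" \
    --dimensions "Name=ReplicationInstanceIdentifier,Value=${replication_instance_identifier}" \
    --statistic Average \
    --period 300 \
    --evaluation-periods 3 \
    --threshold "$one_gb_bytes" \
    --comparison-operator LessThanThreshold \
    --region "$region"
}
```

Call `create_freeable_memory_alarm` to create the alarm.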
To set up the alarm for the average number of WriteIOPS, run the following command:
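The post gives no threshold for this alarm, so the value below (`write_iops_threshold`, defaulting to 1000) is purely a placeholder to tune against your own workload. The other settings are assumptions as well.

```shell
: "${replication_instance_identifier:=sample-dms-replication-instance}"
: "${region:=us-east-1}"            # assumed Region
: "${alarm_prefix:=my-team-dms}"    # hypothetical <team tag>-<resource name> prefix
: "${write_iops_threshold:=1000}"   # placeholder; tune for your workload

# Alarm when the average write IOPS exceeds the threshold
# for 3 consecutive 5-minute periods.
create_write_iops_alarm() {
  aws cloudwatch put-metric-alarm \
    --alarm-name "${alarm_prefix}-${replication_instance_identifier}-WriteIOPS" \
    --namespace "AWS/DMS" \
    --metric-name "WriteIOPS" \
    --dimensions "Name=ReplicationInstanceIdentifier,Value=${replication_instance_identifier}" \
    --statistic Average \
    --period 300 \
    --evaluation-periods 3 \
    --threshold "$write_iops_threshold" \
    --comparison-operator GreaterThanThreshold \
    --region "$region"
}
```

Call `create_write_iops_alarm` once you have picked a threshold that reflects the normal write load on the instance.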
After you set up these metrics, you can see the alarms in CloudWatch. The following screenshot shows the alarms set up for sample-dms-replication-instance with the state OK.
To describe the alarm that you configured, run the following command:
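As an illustration, the helper below describes one alarm by name and prints its state and threshold. The alarm name follows a hypothetical `<prefix>-<instance>-<metric>` convention, so adjust it to match the name you used when you created the alarm.

```shell
: "${replication_instance_identifier:=sample-dms-replication-instance}"
: "${region:=us-east-1}"          # assumed Region
: "${alarm_prefix:=my-team-dms}"  # hypothetical <team tag>-<resource name> prefix

# Show the name, state, and threshold of one alarm.
# $1 = metric name used as the alarm-name suffix (for example, CPUUtilization)
describe_instance_alarm() {
  aws cloudwatch describe-alarms \
    --alarm-names "${alarm_prefix}-${replication_instance_identifier}-$1" \
    --query "MetricAlarms[].[AlarmName,StateValue,Threshold]" \
    --output table \
    --region "$region"
}
```

For example, `describe_instance_alarm CPUUtilization` shows the CPU utilization alarm.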
Setting up an alarm for replication tasks
After you configure the alarms for the AWS DMS replication instance, you set up alarms for the replication tasks.
- On the AWS DMS console, choose Database migration tasks.
- Choose the task you want to set up an alarm for.
You need the following parameters to name your alarm uniquely:
- replication_task_identifier – Replication task identifier
- replication_task_arn – Replication task ARN
- task_name – Name of the replication task
Run the following commands on any Linux/Unix system to set up the CloudWatch metrics:
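A minimal sketch of those variables follows. The ARN is a placeholder, and `task_resource_id` is a hypothetical helper. It reflects our understanding that CloudWatch uses the resource ID at the end of the task ARN as the `ReplicationTaskIdentifier` dimension value, which you should verify against your own metrics.

```shell
: "${region:=us-east-1}"  # assumed Region

export replication_task_identifier="dms-task-1"
export task_name="dms-task-1"
# Placeholder ARN; copy the real one from the AWS DMS console.
export replication_task_arn="arn:aws:dms:${region}:123456789012:task:EXAMPLETASKID"

# Print the resource ID, which is the last colon-separated field of the ARN.
task_resource_id() {
  printf '%s\n' "${replication_task_arn##*:}"
}
```

With the placeholder ARN above, `task_resource_id` prints `EXAMPLETASKID`.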
To check the available metrics for your task, run the following command:
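Task metrics carry two dimensions, `ReplicationInstanceIdentifier` and `ReplicationTaskIdentifier`. The sketch below filters on both. The `task_resource_id` value is a placeholder, on the assumption that the task dimension holds the resource ID from the task ARN.

```shell
: "${replication_instance_identifier:=sample-dms-replication-instance}"
: "${task_resource_id:=EXAMPLETASKID}"  # placeholder: resource ID from the task ARN
: "${region:=us-east-1}"                # assumed Region

# List the AWS/DMS metrics recorded for one task on one instance.
list_task_metrics() {
  aws cloudwatch list-metrics \
    --namespace "AWS/DMS" \
    --dimensions \
      "Name=ReplicationInstanceIdentifier,Value=${replication_instance_identifier}" \
      "Name=ReplicationTaskIdentifier,Value=${task_resource_id}" \
    --region "$region"
}
```

If `list_task_metrics` returns nothing, drop the task dimension from the filter to see which task dimension values CloudWatch has recorded for the instance.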
Setting up AWS DMS replication tasks CloudWatch metrics
The following are some of the significant metrics for monitoring AWS DMS replication tasks:
- CDCThroughputRowsTarget – Outgoing task changes for the target in rows per second
- CDCThroughputRowsSource – Incoming task changes from the source in rows per second
To describe the task and look up its ReplicationTaskIdentifier, enter the following code:
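One way to do this is with `describe-replication-tasks`, filtered by the task ID. The sketch assumes the task is named dms-task-1, as in this post, and the Region default is an assumption.

```shell
: "${replication_task_identifier:=dms-task-1}"
: "${region:=us-east-1}"  # assumed Region

# Print the identifier, ARN, and status of the replication task.
describe_task() {
  aws dms describe-replication-tasks \
    --filters "Name=replication-task-id,Values=${replication_task_identifier}" \
    --query "ReplicationTasks[0].[ReplicationTaskIdentifier,ReplicationTaskArn,Status]" \
    --output text \
    --region "$region"
}
```

Calling `describe_task` prints the three fields on one line.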
To set up the alarm for CDCThroughputRowsTarget, run the following command:
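The original threshold and comparison are not stated, so this sketch assumes an alarm that fires when the average outgoing rate exceeds a placeholder value (`cdc_target_rows_threshold`). The task dimension value is also a placeholder.

```shell
: "${replication_instance_identifier:=sample-dms-replication-instance}"
: "${task_name:=dms-task-1}"
: "${task_resource_id:=EXAMPLETASKID}"    # placeholder: resource ID from the task ARN
: "${region:=us-east-1}"                  # assumed Region
: "${alarm_prefix:=my-team-dms}"          # hypothetical <team tag>-<resource name> prefix
: "${cdc_target_rows_threshold:=1000}"    # placeholder, in rows per second

# Alarm when average rows per second applied to the target exceeds
# the threshold for 3 consecutive 5-minute periods.
create_cdc_target_alarm() {
  aws cloudwatch put-metric-alarm \
    --alarm-name "${alarm_prefix}-${task_name}-CDCThroughputRowsTarget" \
    --namespace "AWS/DMS" \
    --metric-name "CDCThroughputRowsTarget" \
    --dimensions \
      "Name=ReplicationInstanceIdentifier,Value=${replication_instance_identifier}" \
      "Name=ReplicationTaskIdentifier,Value=${task_resource_id}" \
    --statistic Average \
    --period 300 \
    --evaluation-periods 3 \
    --threshold "$cdc_target_rows_threshold" \
    --comparison-operator GreaterThanThreshold \
    --region "$region"
}
```

Call `create_cdc_target_alarm` after you choose a threshold that reflects the change rate you expect on the target.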
To set up the alarm for CDCThroughputRowsSource, run the following command:
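Because the task-level alarms differ only in metric name and threshold, one option is a small parameterized helper. This is a sketch, not the original command: `create_task_throughput_alarm` is a hypothetical function, and the threshold, comparison, and task dimension value are all assumptions.

```shell
: "${replication_instance_identifier:=sample-dms-replication-instance}"
: "${task_name:=dms-task-1}"
: "${task_resource_id:=EXAMPLETASKID}"  # placeholder: resource ID from the task ARN
: "${region:=us-east-1}"                # assumed Region
: "${alarm_prefix:=my-team-dms}"        # hypothetical <team tag>-<resource name> prefix

# Create a throughput alarm for any task-level metric.
# $1 = metric name, $2 = threshold in rows per second
create_task_throughput_alarm() {
  aws cloudwatch put-metric-alarm \
    --alarm-name "${alarm_prefix}-${task_name}-$1" \
    --namespace "AWS/DMS" \
    --metric-name "$1" \
    --dimensions \
      "Name=ReplicationInstanceIdentifier,Value=${replication_instance_identifier}" \
      "Name=ReplicationTaskIdentifier,Value=${task_resource_id}" \
    --statistic Average \
    --period 300 \
    --evaluation-periods 3 \
    --threshold "$2" \
    --comparison-operator GreaterThanThreshold \
    --region "$region"
}
```

For example, `create_task_throughput_alarm CDCThroughputRowsSource 1000` creates the source-side alarm with a placeholder threshold of 1,000 rows per second.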
These alarms aren’t initialized until the task is started.
The following screenshot shows the alarms configured after running the alarm setup commands.
To describe the alarms for this task, run the following command:
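If the task alarms share a name prefix, `describe-alarms --alarm-name-prefix` can list them together. The prefix below follows a hypothetical `<prefix>-<task name>` convention, so adjust it to match the names you used.

```shell
: "${task_name:=dms-task-1}"
: "${region:=us-east-1}"          # assumed Region
: "${alarm_prefix:=my-team-dms}"  # hypothetical <team tag>-<resource name> prefix

# List the name, metric, and state of every alarm created for the task.
describe_task_alarms() {
  aws cloudwatch describe-alarms \
    --alarm-name-prefix "${alarm_prefix}-${task_name}" \
    --query "MetricAlarms[].[AlarmName,MetricName,StateValue]" \
    --output table \
    --region "$region"
}
```

Until the task is started, you can expect `describe_task_alarms` to report these alarms as `INSUFFICIENT_DATA` rather than `OK`.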
- CDCLatencySource – The gap, in seconds, between the last event captured from the source endpoint and current system timestamp of the AWS DMS instance. If no changes have been captured from the source due to task scoping, AWS DMS sets this value to zero.
- CDCLatencyTarget – The gap, in seconds, between the first event timestamp waiting to commit on the target and the current timestamp of the AWS DMS instance. Target latency should never be smaller than the source latency.
- NetworkTransmitThroughput – The outgoing (transmit) network traffic on the replication instance, including customer database traffic and AWS DMS traffic used for monitoring and replication.
- NetworkReceiveThroughput – The incoming (receive) network traffic on the replication instance, including customer database traffic and AWS DMS traffic used for monitoring and replication.
This post discussed some of the CloudWatch metrics you can use to monitor your AWS DMS replication instance and replication tasks.
For information about debugging AWS DMS, see Debugging Your AWS DMS Migrations: What to Do When Things Go Wrong (Part 1) and Debugging Your AWS DMS Migrations: What to Do When Things Go Wrong (Part 2).
Try this approach in your environment and see the benefits. We hope this post helps you with your AWS DMS replication monitoring. Please reach out with questions or feature requests via the comments.
About the Authors
Jeevith Anumalla is an Oracle Database Cloud Architect with the Professional Services team at Amazon Web Services. He works as a database migration specialist, helping internal and external Amazon customers move their on-premises database environments to AWS data stores.
Sagar Patel is a Database Specialty Architect with the Professional Services team at Amazon Web Services. He works as a database migration specialist to provide technical guidance and help Amazon customers to migrate their on-premises databases to AWS.