AWS Database Migration Service (AWS DMS) is a cloud service that makes it easy to migrate relational databases, data warehouses, NoSQL databases, and other types of data stores. You can use it to migrate data into the AWS Cloud, between on-premises instances, or between combinations of cloud and on-premises setups.

During data migration with AWS DMS, it's important to monitor the status of the ongoing replication tasks, which you can do through the task's control tables and with Amazon CloudWatch. You can monitor your task's progress, along with the resources and network connectivity used, via the AWS Management Console, the AWS Command Line Interface (AWS CLI), or the AWS DMS API.

You may often use multiple tasks to perform a migration. These tasks are independent and can run concurrently, and the number of replication tasks can vary depending on the circumstances. When you have many ongoing replication tasks, monitoring the progress of each task manually becomes tedious.

In this post, we provide an automated solution using AWS CloudFormation templates. The solution includes the following steps:

  1. Create a CloudWatch alarm for the replication task.
  2. Create AWS DMS event subscriptions.
  3. Configure Amazon Simple Notification Service (Amazon SNS) to notify you of errors in the CloudWatch logs for the task.
  4. Create an AWS Lambda function to send an SNS notification for recurring CloudWatch alarms.

Prerequisites

Before you get started, you must have the following resources:

After you have these prerequisites, you can start automating your replication task monitoring.

CloudWatch alarms for an AWS DMS replication task

Creating CloudWatch alarms for a replication task is the preferred way to monitor task status, because an alarm fires whenever a monitored replication task metric crosses the threshold you define.

We recommend setting alarms for the following metrics:

  • CDCLatencySource
  • CDCLatencyTarget
  • CDCChangesDiskSource
  • CDCChangesDiskTarget

For more information about AWS DMS metrics, see AWS Database Migration Service metrics.

CDCLatencySource

CDCLatencySource is the gap, in seconds, between the last event captured from the source endpoint and the current system timestamp of the AWS DMS instance. If no changes are captured from the source due to task scoping, AWS DMS sets this value to zero.

AWS DMS reads changes from the source database transaction logs during ongoing replication.

Depending on the source DB engine, the source transaction log may contain uncommitted data. During ongoing replication, AWS DMS reads incoming changes from the transaction logs but forwards only committed changes to the target, which can result in source latency.

CDCLatencyTarget

CDCLatencyTarget is the gap, in seconds, between the first event timestamp waiting to commit on the target and the current timestamp of the AWS DMS instance. This latency occurs when there are transactions the target hasn't yet handled; otherwise, target latency is the same as source latency, because all transactions have been applied. Target latency should never be smaller than source latency.

Target latency is at least as high as source latency because it's the total latency of a record from the time it's inserted in the source database until it's committed on the target.

CDCChangesDiskSource

CDCChangesDiskSource is the number of rows accumulating on disk and waiting to be committed from the source.

All the rows counted in CDCChangesDiskSource were once in memory and spilled to disk because they exceeded the time they were allowed to reside in memory. Our goal is to understand the internals of the engine and minimize CDCChangesDiskSource as much as possible using task settings. Some task settings that can help achieve that are MemoryLimitTotal and MemoryKeepTime. For more information, see Debugging Your AWS DMS Migrations: What to Do When Things Go Wrong (Part 2).
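The memory-related task settings mentioned above live in the ChangeProcessingTuning section of the task settings JSON. The following sketch shows how they might be adjusted with boto3; the values and the task ARN are illustrative placeholders, not recommendations:

```python
# Illustrative task-settings fragment; MemoryLimitTotal and MemoryKeepTime
# control when in-flight transactions spill from memory to disk.
tuning = {
    "ChangeProcessingTuning": {
        "MemoryLimitTotal": 2048,  # MB shared by all transactions before spilling (default 1024)
        "MemoryKeepTime": 120,     # seconds a transaction may stay in memory (default 60)
    }
}

def apply_tuning(task_arn, settings):
    """Apply the settings to a replication task (placeholder ARN expected)."""
    import json
    import boto3  # imported lazily so the sketch itself runs without AWS credentials
    dms = boto3.client("dms")
    return dms.modify_replication_task(
        ReplicationTaskArn=task_arn,
        ReplicationTaskSettings=json.dumps(settings),
    )
```

Note that a replication task must be stopped before its settings can be modified.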

CDCChangesDiskTarget

CDCChangesDiskTarget is the number of rows accumulating on disk and waiting to be committed to the target.

We should concentrate on making sure processing occurs in memory. If CDCChangesDiskTarget is increasing, it can mean a couple of things: the memory on the replication instance might be overutilized, or the target DB instance might not be able to accept changes at the rate AWS DMS is sending them.

Creating a CloudWatch alarm

The following CloudFormation stack creates the CloudWatch alarms for your AWS DMS task:

Provide the stack with the following information:

  • Stack name
  • AWS DMS task identifier
  • AWS DMS replication instance name
  • SNS topic ARN

You can leave all other settings at their defaults.
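As a sketch of what such a stack configures under the hood, the same kind of alarm can be created with boto3's put_metric_alarm. The threshold, periods, and identifiers below are illustrative placeholders; the dimensions follow the AWS/DMS metric namespace:

```python
def build_latency_alarm(task_id, instance_id, topic_arn, threshold_seconds=600):
    """put_metric_alarm parameters for the CDCLatencySource metric of one task."""
    return {
        "AlarmName": "dms-{}-cdc-latency-source".format(task_id),
        "Namespace": "AWS/DMS",
        "MetricName": "CDCLatencySource",
        "Dimensions": [
            {"Name": "ReplicationInstanceIdentifier", "Value": instance_id},
            {"Name": "ReplicationTaskIdentifier", "Value": task_id},
        ],
        "Statistic": "Average",
        "Period": 300,           # evaluate in 5-minute windows
        "EvaluationPeriods": 3,  # alarm after 15 minutes over the threshold
        "Threshold": threshold_seconds,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],
    }

def create_alarm(params):
    import boto3  # lazy import; the sketch itself runs without AWS credentials
    return boto3.client("cloudwatch").put_metric_alarm(**params)
```

You can build one such parameter set per metric you want to watch (CDCLatencyTarget, CDCChangesDiskSource, and so on).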

Creating AWS DMS event subscriptions

You can use AWS DMS event subscriptions to receive notifications when a specified event occurs for a replication task or replication instance (for example, when an instance is created or deleted).

For replication tasks, create subscriptions for the following events:

  • Configuration change
  • Creation
  • Deletion
  • Failure
  • State change

For replication instances, create subscriptions for the following events:

  • Configuration change
  • Creation
  • Deletion
  • Failover
  • Failure
  • Low storage
  • Maintenance

AWS DMS sends event notifications to the addresses you provide when you create an event subscription. You might want to create several different subscriptions, such as one subscription to receive all event notifications and another subscription that includes only critical events for your production AWS DMS resources.

You can easily turn off notifications without deleting a subscription by setting the Enabled option to No on the AWS DMS console or by setting the Enabled parameter to false using the AWS DMS API. For more information, see Working with events and notifications in AWS Database Migration Service.

The following CloudFormation stack creates event subscriptions for your AWS DMS task:

Provide the stack with the following information:

  • Stack name
  • AWS DMS task name
  • SNS topic ARN

Leave the other settings at their defaults.
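If you prefer the API to the console or CloudFormation, an equivalent task-level subscription can be sketched with boto3. The subscription name, topic ARN, and task IDs are placeholders; the event categories are the task-level ones listed above:

```python
def build_task_event_subscription(name, topic_arn, task_ids):
    """Parameters for dms.create_event_subscription covering replication task events."""
    return {
        "SubscriptionName": name,          # placeholder name
        "SnsTopicArn": topic_arn,
        "SourceType": "replication-task",
        "SourceIds": task_ids,
        "EventCategories": ["configuration change", "creation",
                            "deletion", "failure", "state change"],
        "Enabled": True,
    }

def create_subscription(params):
    import boto3  # lazy import; the sketch itself runs without AWS credentials
    return boto3.client("dms").create_event_subscription(**params)
```

A second subscription with SourceType set to replication-instance would cover the instance-level events.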

Creating an SNS notification for errors in CloudWatch logs

AWS DMS can publish detailed task information to Amazon CloudWatch Logs. You can use these logs to monitor your task's progress as it runs and to diagnose any problems that occur.

By default, logs are stored in the log stream dms-task-<Task Identifier> in the log group dms-tasks-<replication-instance-name>. For more information, see Logging task settings.

To get a notification of an error logged in your CloudWatch log, create a subscription filter on the log group. See the following Python script:

import base64
import json
import os
import zlib

import boto3

def logstream_handler(event, context):
    # CloudWatch Logs delivers the payload base64-encoded and gzip-compressed.
    bstream_data = event.get("awslogs").get("data")
    decoded_data = json.loads(
        zlib.decompress(base64.b64decode(bstream_data), 16 + zlib.MAX_WBITS))
    client = boto3.client('sns')
    subscription_filters = decoded_data.get("subscriptionFilters")
    subject = ""
    if subscription_filters:
        subject = "Log Filter Alert : {0}".format(subscription_filters[0])
    log_events = decoded_data.get("logEvents")
    msg = "logGroup : {0}\nlogStream : {1}".format(
        decoded_data.get("logGroup"),
        decoded_data.get("logStream"))
    msg = "{0}\n\nMessages:\n".format(msg)
    for m in log_events:
        msg = "{0}\n{1}".format(msg, m.get("message"))
    # The SNS topic ARN is supplied through the function's environment.
    args = {
        "TargetArn": os.environ.get("topicARN"),
        "Message": msg,
    }
    if subject:
        args["Subject"] = subject
    client.publish(**args)
    return {
        "statusCode": 200,
        "body": json.dumps('Sent Message.')
    }
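To wire this function to the task's log group, create the subscription filter the section describes. A minimal sketch of the parameters follows; the filter name is a placeholder, and the pattern assumes DMS's ]E: error and ]W: warning severity markers in the log lines:

```python
def build_error_filter(log_group, lambda_arn):
    """Parameters for logs.put_subscription_filter; forwards matching log
    events from the DMS task log group to the notification Lambda."""
    return {
        "logGroupName": log_group,
        "filterName": "dms-task-errors",  # placeholder name
        # Match lines carrying DMS error (]E:) or warning (]W:) markers.
        "filterPattern": '?"]E:" ?"]W:"',
        "destinationArn": lambda_arn,
    }

def create_filter(params):
    import boto3  # lazy import; the sketch itself runs without AWS credentials
    return boto3.client("logs").put_subscription_filter(**params)
```

CloudWatch Logs also needs permission to invoke the function, typically granted with Lambda's add-permission for the CloudWatch Logs service principal.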

The following CloudFormation stack creates the environment to send an SNS notification for errors:

Provide the stack with the following information:

  • Stack name
  • Log group name
  • SNS topic ARN
  • Filter pattern

Leave all other settings at their defaults.

Creating a Lambda function to send SNS notifications for recurring CloudWatch alarms

You can create multiple CloudWatch alarms to know when the state of an alarm changes. In some use cases, an alarm stays in the alert state for a long time, and you might miss the single notification sent when it first changed state. To receive recurring alarms, we provide a Lambda function that checks an alarm's state and how long it has been in that state, and sends a notification.

The following Python script, invoked by a CloudWatch Events rule, sends an SNS notification:

import json
import boto3
import os

cloudwatch = boto3.client('cloudwatch')
sns = boto3.client('sns')

subject_str = '{}: "{}" in {}'
message_str = """You are receiving this email because your Amazon CloudWatch Alarm "{}" in the {} region has entered the {} state, because "{}".

Alarm Details :
    - Name: {}
    - Description: {}
    - Reason for State Change: {}


Monitored Metric:
    - MetricNamespace: {}
    - MetricName: {}
    - Dimensions: {}
    - Period: {}
    - Statistic: {}
    - Unit: {}
    - TreatMissingData: {}
"""

def send_alarm(topic, subject, message):
    """ Sends SNS Notification to given topic """
    response = sns.publish(
                    TopicArn=topic,
                    Message=message,
                    Subject=subject
                )
    print("Alarm Sent Subject : {}".format(subject))
    return

def main_handler(event, context):
    """
        Describes the named alarms in the current region and checks their states.
        If a state matches the alarmState passed in the event, sends a notification.

        Parameters
        ----------
        alarmNames - ['string']
        alarmState - string (Alarm/OK/INSUFFICIENT_DATA)
    """
    alarm_names = event['alarmNames']
    alarm_state = event.get('alarmState', 'Alarm').lower()
    region = os.environ["AWS_REGION"]
    response = cloudwatch.describe_alarms(
        AlarmNames=alarm_names
        )
    metric_alarms = response["MetricAlarms"]
    if len(metric_alarms) == 0:
        return {
            'statusCode': 200,
            'body': json.dumps('No Alarms Configured')
        }
    for alarm in metric_alarms:
        if alarm["StateValue"].lower() != alarm_state:
            continue
        actions_by_state = {
            'alarm': alarm["AlarmActions"],
            'ok': alarm["OKActions"],
            'insufficient_data': alarm["InsufficientDataActions"],
        }
        topics = actions_by_state.get(alarm_state, [])
        if len(topics) == 0:
            print('No Topics Configured for state %s to %s' %(alarm_state, alarm['AlarmName']))
            continue
        subject = subject_str.format(alarm["StateValue"], alarm['AlarmName'], region)
        message = message_str.format(alarm['AlarmName'], region,
                                    alarm['StateValue'], alarm['StateReason'],
                                    alarm['AlarmName'], alarm.get('AlarmDescription', ''),
                                    alarm['StateReason'], alarm['Namespace'],
                                    alarm['MetricName'], str(["{}={}".format(d['Name'], d['Value']) for d in alarm["Dimensions"]]),
                                    alarm['Period'], alarm['Statistic'],
                                    alarm.get('Unit', 'not specified'), alarm['TreatMissingData'])
        for topic in topics:
            send_alarm(topic, subject, message)
    return {
        'statusCode': 200,
        'body': json.dumps('Success')
    }
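The function above expects alarmNames and alarmState in its event, so the CloudWatch Events rule must pass them as constant input. A sketch of the rule and target parameters (the rule name, Lambda ARN, and schedule are placeholders) that would be passed to events.put_rule and events.put_targets:

```python
import json

def build_schedule(rule_name, lambda_arn, alarm_names):
    """Rule and target parameters that invoke the Lambda on a fixed schedule
    with the alarm names to re-check."""
    rule = {
        "Name": rule_name,
        "ScheduleExpression": "rate(30 minutes)",  # placeholder cadence
    }
    target = {
        "Rule": rule_name,
        "Targets": [{
            "Id": "recurring-alarm-check",
            "Arn": lambda_arn,
            # This JSON becomes the `event` argument of main_handler above.
            "Input": json.dumps({"alarmNames": alarm_names,
                                 "alarmState": "Alarm"}),
        }],
    }
    return rule, target
```

The constant Input is what lets one rule drive repeated checks of the same set of alarms.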

The following CloudFormation stack creates the environment to send recurring SNS notifications for CloudWatch alarms:

Provide the stack with the following information:

  • Stack name
  • AWS DMS task name
  • SNS topic ARN

Leave all other settings at their defaults.

Conclusion

In this post, we showed you how to use CloudWatch, AWS DMS event subscriptions, Amazon SNS, and AWS Lambda to automate the monitoring and alerts of AWS DMS replication tasks.

With this solution, you can easily track your replication task status without using the console; you’re notified of every event change and get alarms if any errors occur.

We hope this post is useful for monitoring your database migration using AWS DMS.


About the authors

Venkata Naveen Koppula is an Associate Consultant with AWS Professional Services. He works with AWS DMS, AWS SCT, and Aurora PostgreSQL to bring the best possible experience to customers.
Vijaya Diddi is an Associate Consultant with AWS Professional Services. She works with AWS DMS, AWS SCT, AWS Config, and SSM documents. She enjoys building automation tools in Python that make migrations easier.