AWS Data Migration Service – Replication Aspect

20 Oct, 2017 | 4 minutes read

AWS Data Migration Service (DMS) reduces migration time and migrates databases to and from AWS easily and securely. This AWS service supports homogeneous migrations such as MS SQL to MSSQL, heterogeneous migrations among different database platforms, such as Oracle to PostgreSQL or MySQL to Oracle or migration to different versions (upgrade or downgrade) from the same databases with virtually no downtime. Migration can be done to and from the most widely used commercial and open-source databases, such as Oracle, MySQL, and PostgreSQL.

AWS Database Migration Service can migrate your data in several ways:

  • Migrating Data to the Target Database – This process creates files or tables in the target database, automatically defines the metadata that is required at the target, and populates the tables with data from the source. The data from the tables is loaded in parallel for improved efficiency. This process is the Migrate existing data option in the AWS console and is called Full Load in the API.
  • Capturing Changes During Migration – This process captures changes to the source database that occur while the data is being migrated from the source to the target. When the migration of the originally requested data has been completed, the change data capture (CDC) process then applies the captured changes to the target database. Changes are captured and applied as units of single committed transactions, and several different target tables can be updated as a single source commit. This approach guarantees transactional integrity in the target database. This process is the Migrate existing data and replicate ongoing changes option in the AWS console and is called full-load-and-cdc in the API and will be in focus of this blog.
  • Replicating Only Data Changes on the Source Database – This process reads the recovery log file of the source database management system (DBMS) and groups together the entries for each transaction. If AWS DMS can’t apply changes to the target within a reasonable time (for example, if the target is not accessible), AWS DMS buffers the changes on the replication server for as long as necessary. It doesn’t reread the source DBMS logs, which can take a large amount of time. This process is the Replicate data changes only option in the AWS DMS console.

Using AWS DMS allows customized table mappings and translations. This post is focused on the continuous data replication feature, which keeps the target database up to date after the initial migration.

Migrate from on-premise MS SQL to AWS RDS PostgreSQL database – continuous replication

The main players in one AWS DMS migration process are replication instances, endpoints, and migration tasks.

how DMS operates

To understand what happened, we need to consider how DMS operates. DMS functions as an intermediate broker to a variety of different database engines. To make this conversion happen, DMS first maps the source data type (JSON in this case) to an intermediate DMS type and then maps that type to the target data type (long text). It’s very important to take into account these mappings when designing your migration strategy. Note that I used the default mapping. DMS also allows you to customize the mappings if you want.

AWS DMS provides comprehensive ongoing replication of data, although it replicates only a limited amount of data definition language (DDL). AWS DMS doesn’t propagate items such as indexes, users, privileges, stored procedures, and other database changes not directly related to table data.

If you want to use ongoing replication, you must enable the Multi-AZ option on your replication instance. The Multi-AZ option provides high availability and failover support for the replication instance.

Also, there are several configuration steps that needs to be done on source Endpoint. For example, if you migrate from an MS SQL source, you need to decide how Capture Data Change will be done:

  • MS-Replication
  • MS-CDC
  • Without MS-Replication and MS-CDC only for INSERT and DELETE

If MS SQL source is configured to use MS-CDC there are limitation listed here:

http://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.SQLServer.html#CHAP_Source.SQLServer.CDCLimitations .

After the source is configured are ready to go, you can configure the DMS replication task.

Ongoing replication setup

Let’s create one task that will support the replication of source changes to the target. We assume that endpoints and replication instances are already set up: http://iwconnect.com/data-migration-service/

Next is to create a task:

  • Navigate to Tasks from the DMS main menu
  • Select Create Task
  • Choose Task name, Replication instance, Source, and Target endpoints
  • For Migration type choose Migrate existing data and replicate ongoing changes
  • Because the task will do the initial load in Task Setting you can choose Drop tables on target (if any)
  • If you have LOB, choose a setting that suits your need
  • You can enable logging for extra info about the migration process
  • That’s it. You can hit Create task.

DMS first migrates initial data and then enables replication for ongoing changes. If you make data change in the source you will quickly see the change in the destination data set.

DMS provides a good statistics dashboard for Tasks: Task Monitoring and Table Statistics.

Conclusion

Companies change their database resources and migrate the data which usually causes them extended outages. The AWS DMS provides quick setup, and fast database migration workloads to AWS or database’s engine change and minimizes any associated downtime.