Generic Guide to Database Migration
The process of database migration involves transferring definitions of meta-objects, data, stored procedures, functions and triggers from one platform to another, while also implementing necessary changes to the application. This includes preparing, selecting, extracting, and transforming data with respect to differences between the source and target database management systems.
There are many reasons for database migration:
- Cost – commercial DBMS licenses can be expensive and carry strict licensing terms, whereas open-source database management systems with similar capabilities are free to install and use.
- Customization – unlike commercial systems, open-source DBMS usually come with a wide collection of extensions and add-ons for database management and development tasks, most of which are free to use.
- Flexibility – most open-source DBMS integrate easily with DBaaS (cloud) providers such as AWS, which reduces the risk of lock-in to a single vendor.
Because the migration involves transferring data between two relational database management systems (RDBMS) with different structures and data types, the process can be difficult and time-consuming. The steps involved in the database migration process can be grouped into the following phases:
- Assessment (generic compatibility, architecture and application code)
- Migration of schemas and related meta-objects
- Functional and performance testing
- Migration of data
Assessment. The first phase in planning a database migration involves an assessment of the application to determine the feasibility of moving it from the source DBMS to the target DBMS. This phase requires a comprehensive analysis of technology-related issues, including an evaluation of the compatibility of the client, application server, data access, and database features.
One critical aspect that can be easily overlooked when considering a move to a new database management system is verifying that the packaged software application you are using is properly certified for the target DBMS, especially if you do not control your own application. If the application is not certified, you will need to either persuade the vendor to add support for the new database or select another application.
After ensuring compatibility between the source and target databases, certain prerequisites should be met before initiating data migration: server resources, the operating system, and the installation and configuration of data migration software and related drivers. It is important to ensure that the target server resources are sufficient and scalable enough to handle the volume of data being received. If the data volume is extensive, an online migration may not be feasible, and an export-and-reload approach may be necessary. It is also recommended to use a migration strategy that divides the migration into parts, as described under Data Migration below.
Schema Migration. After completing the assessment phase, the next step in the migration process is to identify any discrepancies in schema and data formatting between the source and target database management systems. It is essential to address these differences before the data migration to prevent potential errors that can be frustrating and time-consuming.
For example, if one DBMS supports ANSI-standard SQL syntax and data types and the other does not, the unsupported features should be identified and then converted manually, either to syntax that the target system supports or to workarounds for the missing features.
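The identification step described above can be partly automated. The following is a minimal sketch, assuming a hand-maintained type map: the entries in `TYPE_MAP` and `PASSTHROUGH` are illustrative placeholders and are not authoritative for any particular pair of database systems. Any type the script does not recognize is flagged for manual conversion rather than guessed.

```python
import re

# Hypothetical mapping of source types to target types. These pairs are
# illustrative only; a real map must be checked against both vendors' docs.
TYPE_MAP = {
    "DATETIME": "TIMESTAMP",
    "DOUBLE": "DOUBLE PRECISION",
    "TINYINT": "SMALLINT",
}

# Types assumed (for this sketch) to be accepted unchanged by the target.
PASSTHROUGH = {"INTEGER", "SMALLINT", "VARCHAR", "CHAR", "DECIMAL", "DATE"}

TYPE_PATTERN = re.compile(r"\s*([A-Za-z]+(?: [A-Za-z]+)*)\s*(\([^)]*\))?\s*")


def convert_type(source_type):
    """Return (target_type, needs_review) for one column type."""
    match = TYPE_PATTERN.fullmatch(source_type)
    if match is None:
        return source_type.strip(), True
    base = match.group(1).upper()
    args = match.group(2) or ""
    if base in PASSTHROUGH:
        return base + args, False
    if base in TYPE_MAP:
        # Length or width arguments are dropped for mapped types.
        return TYPE_MAP[base], False
    # Unknown type: keep it as written and flag it for manual conversion.
    return base + args, True


def convert_columns(columns):
    """Convert a {column_name: source_type} dict and list columns to review."""
    converted = {}
    review = []
    for name, source_type in columns.items():
        target_type, needs_review = convert_type(source_type)
        converted[name] = target_type
        if needs_review:
            review.append(name)
    return converted, review


converted, review = convert_columns({
    "id": "integer",
    "name": "varchar(50)",
    "created": "datetime",
    "shape": "geometry",
})
print(converted)
print("manual review needed:", review)
```

Keeping the unknown types in a separate review list makes the manual part of the conversion explicit instead of letting unsupported definitions fail later during the data load.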
Functional and performance testing. This is a crucial step in the migration process because some built-in transactions or features of the source database may behave differently on the target platform, which can affect the application. This phase involves identifying and capturing these differences, which can then be addressed through tuning at the application, data access (driver), and database levels.
At this stage it is also crucial to test the converted schema on a sample dataset. One recommended approach is to load sample data, taken from a source development or testing environment that holds production-like data, into the target database, then set up an application connection using the appropriate data access drivers. Once the application has connected to the database, run full functional tests on the converted objects using DML statements.
To ensure accuracy, it is advisable to load the same small fragment of the dataset into both the source and target databases, then run the same SQL queries against each and confirm that the results are identical. Any issues revealed by the functional tests should be reviewed and addressed promptly.
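The comparison of SQL results can be scripted. The sketch below uses two in-memory SQLite databases purely as stand-ins for the source and target systems; the `customers` table, the helper names, and the query are hypothetical. It compares the row count and a hash of the rows returned by the same query on each side.

```python
import hashlib
import sqlite3


def result_fingerprint(conn, query):
    """Return (row_count, sha256 hex digest) of the rows a query returns."""
    digest = hashlib.sha256()
    count = 0
    for row in conn.execute(query):
        digest.update(repr(row).encode("utf-8"))
        count += 1
    return count, digest.hexdigest()


def results_match(source_conn, target_conn, query):
    """True when both databases return identical rows in identical order."""
    return result_fingerprint(source_conn, query) == result_fingerprint(
        target_conn, query
    )


def make_demo_db(rows):
    """Build an in-memory database that stands in for a real DBMS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    conn.commit()
    return conn


source = make_demo_db([(1, "Ann"), (2, "Bob")])
target = make_demo_db([(1, "Ann"), (2, "Bob")])

# ORDER BY makes the row order deterministic, so the hashes are comparable.
QUERY = "SELECT id, name FROM customers ORDER BY id"
print(results_match(source, target, QUERY))
```

With two different database systems, the drivers may return the same value as different Python types (for example a decimal on one side and a float on the other), so a real comparison would likely need to normalize values before hashing.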
Data Migration. Various approaches and tools are available in the market for data migration. These approaches can generally be classified into three categories:
- Snapshot. This method involves taking a snapshot of the source database state and applying it to the target database. Data is moved from source DBMS to the target all at once, and no WRITE operations are permitted on the source database during the snapshot process. The snapshot approach is considered one of the cleaner and simpler ways of data migration.
- Parallel (multi-threaded) snapshot. This is a variation of the snapshot method in which the data is split into fragments and the snapshots are taken simultaneously. Most tools support this method. There are two ways to perform a snapshot in chunks: on a table-by-table basis, or by splitting a large table into smaller sets using primary keys or any unique row identifiers. With this approach, the snapshot duration and downtime window are significantly reduced. Good scripting skills are required to configure data migration tools for per-table or split-table migration.
- Change Data Capture (CDC). This is a well-established approach for data migration that has been around for decades. In this approach, software tracks and captures changes on the source database in real time and then applies those changes to the target database. CDC software is widely used today because it offers reliable, low-latency, and scalable data distribution between heterogeneous databases. The most commonly used CDC approaches for database migration are the trigger-based and transaction log-based methods.
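Splitting a large table by primary key, as described for the parallel snapshot, can be sketched as follows. This is an illustration under simplifying assumptions: Python dictionaries keyed by an integer primary key stand in for the source and target tables, and the function names are hypothetical. In a real tool each worker would use its own database connection and issue a range query such as `WHERE id BETWEEN low AND high`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def key_ranges(min_id, max_id, chunk_size):
    """Split the inclusive key range [min_id, max_id] into inclusive chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    ranges = []
    start = min_id
    while start <= max_id:
        end = min(start + chunk_size - 1, max_id)
        ranges.append((start, end))
        start = end + 1
    return ranges


def parallel_snapshot(source_rows, chunk_size, workers=4):
    """Copy source_rows into a new dict, one key range per worker task."""
    target = {}
    if not source_rows:
        return target
    lock = threading.Lock()

    def copy_chunk(bounds):
        low, high = bounds
        # Stand-in for: SELECT ... WHERE id BETWEEN low AND high
        chunk = {k: v for k, v in source_rows.items() if low <= k <= high}
        with lock:  # serialize writes to the shared stand-in target
            target.update(chunk)
        return len(chunk)

    ranges = key_ranges(min(source_rows), max(source_rows), chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        copied = sum(pool.map(copy_chunk, ranges))
    if copied != len(source_rows):
        raise RuntimeError("row count mismatch after snapshot")
    return target


source_table = {i: f"row-{i}" for i in range(1, 26)}
target_table = parallel_snapshot(source_table, chunk_size=7)
print(len(target_table), "rows copied")
```

Fixed-width key ranges are simple to compute, but they produce uneven chunks when the key values are sparse, so some tools split on row counts instead.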
The snapshot and parallel snapshot approaches require application downtime because data is written only once from the source database management system to the target. In contrast, CDC loads data continuously and needs a smaller downtime window. It is essential to select a data migration approach that fits within the required downtime window.
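To illustrate why CDC needs only a short downtime window, here is a minimal sketch of the trigger-based method. It uses in-memory SQLite databases as stand-ins for the source and target systems; the `accounts` and `change_log` tables and the `apply_changes` helper are hypothetical. Triggers on the source record every change in a log table, and a separate step replays the log on the target from the last applied position.

```python
import sqlite3

SOURCE_SCHEMA = """
CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER);
CREATE TABLE change_log (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    op TEXT, id INTEGER, balance INTEGER
);
CREATE TRIGGER accounts_ins AFTER INSERT ON accounts BEGIN
    INSERT INTO change_log (op, id, balance) VALUES ('I', NEW.id, NEW.balance);
END;
CREATE TRIGGER accounts_upd AFTER UPDATE ON accounts BEGIN
    INSERT INTO change_log (op, id, balance) VALUES ('U', NEW.id, NEW.balance);
END;
CREATE TRIGGER accounts_del AFTER DELETE ON accounts BEGIN
    INSERT INTO change_log (op, id, balance) VALUES ('D', OLD.id, NULL);
END;
"""


def apply_changes(source, target, last_seq):
    """Replay logged changes newer than last_seq; return the new position.

    Assumes primary key values are never updated on the source.
    """
    rows = source.execute(
        "SELECT seq, op, id, balance FROM change_log WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()
    for seq, op, row_id, balance in rows:
        if op == "D":
            target.execute("DELETE FROM accounts WHERE id = ?", (row_id,))
        else:
            # Inserts and updates are both applied as an upsert.
            target.execute(
                "INSERT OR REPLACE INTO accounts (id, balance) VALUES (?, ?)",
                (row_id, balance),
            )
        last_seq = seq
    target.commit()
    return last_seq


source = sqlite3.connect(":memory:")
source.executescript(SOURCE_SCHEMA)
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")

# The application keeps writing to the source while migration is under way.
source.execute("INSERT INTO accounts VALUES (1, 100)")
source.execute("INSERT INTO accounts VALUES (2, 50)")
source.execute("UPDATE accounts SET balance = 75 WHERE id = 2")
source.execute("DELETE FROM accounts WHERE id = 1")
source.commit()

position = apply_changes(source, target, 0)
print(target.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())
```

Because the replay can run repeatedly while the application stays online, downtime is needed only for the final catch-up and the switch to the target. Transaction log-based CDC follows the same replay idea but reads changes from the database log instead of adding triggers to the source tables.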