Achieving Seamless PostgreSQL Upgrades from 10 to 12 on AWS: Lessons from Tata 1mg

Co-authors:

  1. Swati Grover ( Associate Technical Architect @ TataĀ 1mg)
  2. Prashant Mishra (Technical Architect @ TataĀ 1mg)

*Introduction

At Tata 1mg, we recently undertook the challenge of upgrading our PostgreSQL databases from version 10 to 12 on AWS. Through a Proof of Concept (POC) approach and meticulous planning, we encountered and resolved a host of challenges that we believe can benefit others in similar situations. In this article, weā€™ll detail the critical steps we took to ensure a seamless upgrade process, outlining our journey from preparation to post-upgrade tasks.

DB UpgradeĀ Overall

*Problem Statement

Upgrading a critical component like a PostgreSQL database is a fundamental requirement for maintaining optimal performance, security, and accessing new features. However, the process of upgrading a database system introduces challenges, particularly when it comes to minimizing or eliminating downtime, which can significantly impact the availability and functionality of applications that rely on the database.Traditional upgrade methods often involve taking the database offline during the upgrade process, resulting in service disruption, dissatisfied users, and potential revenue loss for businesses.The overarching problem is to devise a strategy that allows for the seamless upgrade of PostgreSQL databases while minimizing or eliminating downtime, ensuring that applications can continue serving users uninterrupted.

*Pre-Requisites: Setting the Stage forĀ Success

Before delving into the upgrade process, we established several essential prerequisites:Create a Parameter Group

  • Create a parameter group named production-postgres-12.

Create and Promote ReadĀ Replica

  • Create a read replica with the same size as the master database.
  • Follow the naming convention: source ā†’ source-db (e.g., chronos-production ā†’ chronos-production-db).
  • Promote the read replica to master once itā€™s ready and available, Now this will be our targetĀ RDS.

Storage Considerations SourceĀ RDS

  • Make sure thereā€™s ample available storage in the source RDS. Expand the size of the source RDS by adding a buffer of 30 to 50% of its current size. This precautionary step will prevent storage problems in the source RDS during the upgradeĀ process.
  • Since the source RDS will be deleted after the upgrade, any cost increase due to the size expansion will be short-term and wonā€™t have lasting cost implications.

Target RDSĀ Settings

  • Apply newly created parameter group(production-postgres-12) to thisĀ RDS.
  • Apply various security and critical settings which were applied on sourceĀ RDS.

Unsupported Usage checks on TargetĀ RDS.Commit or roll back all open prepared transactions before attempting anĀ upgrade.SELECT count(*) FROM pg_catalog.pg_prepared_xacts;Verify ā€œunknownā€ data type does not exist in the DatabaseĀ schemaSELECT DISTINCT data_type FROM information_schema.columns WHERE data_type ILIKE 'unknown';Truncate on TargetĀ RDS.Be careful and doubly check that you are executing this command on target(new) RDSĀ only.7a) target-rds-master=> CREATE OR REPLACE FUNCTION truncate_tables(username IN VARCHAR) RETURNS void AS $$DECLARE statements CURSOR FOR SELECT tablename FROM pg_tables WHERE tableowner='username' AND schemaname='public';BEGIN FOR stmt IN statements LOOP EXECUTE 'TRUNCATE TABLE ' || quote_ident(stmt.tablename) || ' CASCADE;'; END LOOP;END;$$ LANGUAGE plpgsql;-------------------------------------7b) target-rds-master=> SELECT truncate_tables('username'); // it should be particular DB user nameAccess Control and Enable Logical Replication on SourceĀ RDSEmpower the master user with SELECT privileges across all tables in the publicĀ schema.source-rds-master=>grant SELECT on all tables in schema public to dev;Check for logical replication slotsource-rds-master=> select * from pg_replication_slots;slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn -----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------(0 rows)Create a publication for all tables or selective table in the source database. See the following codesource-rds-master=> CREATE PUBLICATION alltables FOR ALL TABLES;CREATE PUBLICATION OR source-rds-master=> CREATE PUBLICATION <pub_name> FOR table <tablename>; CREATE PUBLICATIONsource-rds-master=> select * from pg_replication_slots; slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn -----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------(0 rows)SELECT * FROM pg_publication; // verify is publication createdEnable Subscription on TargetĀ RDSCreate a subscription on the target database. See the following codetarget-rds-master=> CREATE SUBSCRIPTION sub CONNECTION 'dbname=<db to be replicated> host=<source-rds> user=<master-user> password=pasxxxxxx' PUBLICATION alltables;eg - patient_service_db=> CREATE SUBSCRIPTION sub CONNECTION 'dbname=patient_service_db host=patient.cwpw16dip7iz.ap-south-1.rds.amazonaws.com user=master password=pasxxxx' PUBLICATION alltables;NOTICE: created replication slot "sub" on publisherCREATE SUBSCRIPTIONVerify that subscription slots with temporary flag true areĀ createdpatient_service_db=> select * from pg_replication_slots; slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn----------------------+----------+-----------+--------+--------------------+-----------+--------+------------+------+--------------+--------------+--------------------- sub | pgoutput | logical | 16401 | patient_service_db | f | t | 25397 | | 57094487 | 456/254A52F0 | 456/254DE170 sub_50777_sync_16906 | pgoutput | logical | 16401 | patient_service_db | t | t | 26081 | | 57083701 | 455/DD8BEAE0 | 455/DD8BEB18 sub_50777_sync_16870 | pgoutput | logical | 16401 | patient_service_db | t | t | 25403 | | 57083682 | 455/DD5748F8 | 455/DD574930(3 rows)Enforcing Cohesive Replication Slots for Dependent Multi-Table ReplicationIn case of multiple tables replication ensure dependent tables are part of sameĀ slot.Fine tuning

  • Fine-tune source RDS storage and IOPS for smooth replication.
  • Fine-tune Target RDS storage type, instance type, IOPS settings for faster replication. This step is very crucial if we see below message under ā€œlogs andĀ eventsā€
Log messages

*Executing the Upgrade: Guided Steps toĀ Success

Systematically perform the upgrade process when replication slots are in a synchronized state (temporary flag in the replication slot is set to false). This activity is carried out when both the source RDS and target RDS are synchronized, meaning the latest changes from the source RDS have been successfully replicated to the targetĀ RDS.Achieving this synchronization might take a variable amount of time, typically spanning from a few hours to several days, contingent upon the size of the databases. Therefore, ensure that you initiate the following activity only when both RDS instances are in this synchronized state:

  • Validate schema and index consistency on both RDS 10 and RDSĀ 12.
  • Confirm data counts and parity between RDS 10 and RDSĀ 12.

Source RDS activities

  1. With master user perform belowĀ commands

REVOKE CONNECT ON DATABASE $DB_NAME from PUBLIC; REVOKE CONNECT ON DATABASE $DB_NAME from $DB_USER_NAME; Wait for few seconds generally ~5 seconds;2. With DB_USER perform belowĀ commandsselect pg_terminate_backend(pid) from pg_stat_activity where datname = '$DB_NAME' and usename = '$DB_USER_NAME' and pid <> pg_backend_pid();select pid, query from pg_stat_activity where usename = '$DB_USER_NAME'; # should give 0 rows3. We need to synchronize sequences between Source and target. Execute below command on SourceĀ RDSselect 'SELECT SETVAL(' ||quote_literal(sequence_name) || ',' ||nextval(sequence_name) || ');' from information_schema.sequences; ##Copy full output from above command and keep it safe in notepad

  • Make sure that the terminal connection established with the source RDS remains open until the activity is finished. If the connection is accidentally closed, you will need to reconnect using the master user, grant connect access to the DB_USER, and then restart from stepĀ 1.

Target RDS Activities

  • Connect with DB_USER and execute sequences command saved inĀ notepad.
  • Grant necessary permissions on the this RDS to fortify the environment.

*Post-Upgrade Tasks: Ensuring Stability and Excellence

After a successful upgrade, address the following:1.Database Validation (within approximately 10 seconds) through parallel execution of activitiesĀ :

  • Validate data counts and parity between RDS v10 and RDS v12 by executing pre-written queries.
  • Leverage CloudWatch to monitor RDS performance.
  • Ensure data integrity by executing sampleĀ queries.

2. DNS switch andĀ Downtime

  • Adjust the DNS to point to new RDS and monitor application connectivity.
  • The entire process, including blocking new connections on the source RDS, updating sequences on the target RDS, performing Database Validation, and switching DNS, entails an estimated downtime of around 30Ā seconds.

2. Application Validation:

  • Confirm application connectivity with the upgraded database.
  • Rigorously test end-to-end service functionality.

*Rollback Strategy: Navigating Unexpected Terrain

GRANT CONNECT ON DATABASE $DB_NAME TO PUBLIC on SourceĀ RDS;GRANT CONNECT ON DATABASE $DB_NAME to $DB_USER_NAME on SourceĀ RDS;DNS Switch back from target to sourceĀ RDS.Migrating delta records from target to source RDS ifĀ any.Delete Replication slots on both Source and TargetĀ RDSsource_rds_master=> DROP PUBLICATION alltables; // You can check publication name using SELECT * FROM pg_publication;target_rds_master=> DROP SUBSCRIPTION sub; // You can check subscription name using SELECT * FROM pg_subscription;

*Conclusion: Elevating Expertise in Database Management

Mastering PostgreSQL database upgrades is an intricate dance of planning, precision, and execution. This guide empowers adept database experts to navigate the landscape with confidence. Armed with precise steps and code snippets, youā€™re ready to embrace database evolution while safeguarding data integrity and application. Elevate your expertise and embark on a journey of seamless upgrades, ensuring a harmonious integration of technology and proficiency.

*References:

  • Upgrading the PostgreSQL DB engine for Amazon RDS
  • Using logical replication to replicate managed Amazon RDS for PostgreSQL and Amazon Aurora to self-managed PostgreSQL | Amazon Web Services



Achieving Seamless PostgreSQL Upgrades from 10 to 12 on AWS: Lessons from Tata 1mg was originally published in Tata 1mg Technology on Medium, where people are continuing the conversation by highlighting and responding to this story.

GUID
https://medium.com/p/64712fd962ea
Category Feed
rds
postgres
zero-downtime
seamless
database-upgrade
Blog Author
Pankaj Pandey
Feed Source