In this article, I’ll tell you how we migrated our databases from our previous provider to AWS RDS.
Our previous DB infrastructure was already in the cloud, and the entire database was sharded: about 350 GB of data split across roughly 1,000 databases. Want to know more about how we achieved database sharding? Read here!
So, how did we migrate to AWS?
We could stop our apps for about an hour (I know, we were very fortunate!), so we designed the migration process to fit in this interval. We first designed the new infrastructure entirely as code, using Ansible as our automation tool. By running a few Ansible playbooks, we created the entire new infrastructure (web, db, services, DNS, …) on AWS (we gave a talk at the AWS Rome Meetup about it!).
Then, to move all the relational data, we had two options: using DMS or writing a custom Ansible playbook. We had about 15–20 minutes for moving the data, so, despite DMS being a great solution, we chose to stick with custom migration scripts, which were simpler to set up than DMS for our case!
Since everything was scripted, we could rehearse the entire migration many times, which made us quite confident in the whole process.
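A rehearsal could be as simple as running the same playbook against a non-production inventory. As a hypothetical sketch (the staging inventory path and environment name below are illustrative, not from our actual setup):

```bash
# Rehearse the whole migration end to end against a staging environment.
# "inventory/staging" and "nuvola_env=staging" are made-up example values.
ansible-playbook -i inventory/staging -l dbserver \
  ansible/mysql_migrate_dbserver.yml \
  -e "provider=blabla nuvola_env=staging"
```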
Here is the main playbook for moving data.
```yaml
vars:
  mysql_user: root
  user_destination: ubuntu
  remote_key_ssh_publ: /home/{{ user_destination }}/.ssh/authorized_keys
  hostname_dest_aws: nuvola-{{ nuvola_env }}-database-{{ ansible_hostname[2:4] }}.ops.madisoft.it

vars_files:
  - vars/vars_dbserver_{{ provider }}.yml

tasks:
  - name: MIGRATE ALL DB SERVER | Backup remote public ssh key on remote server
    command: cp ~/.ssh/authorized_keys ~/.ssh/authorized_keys_orig
    delegate_to: "{{ hostname_dest_aws }}"

  - name: MIGRATE ALL DB SERVER | Copy ssh public key on remote server
    authorized_key:
      user: "{{ user_destination }}"
      key: "{{ lookup('file', 'roles/mysql/files/id_rsa_public_dbserver') }}"
    delegate_to: "{{ hostname_dest_aws }}"

  - name: MIGRATE ALL DB SERVER | Fetch all databases on the source server
    shell: echo "SHOW databases" | mysql -s | tr "\n" " "
    become: true
    become_user: root
    register: dblist

  - name: MIGRATE ALL DB SERVER | Copy raw dump from source to destination server
    shell: >
      mysqldump -u{{ mysql_user }} -p{{ mysql.root_password }} --databases {{ dblist.stdout }} |
      ssh -p{{ ssh_port }} -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no
      {{ user_destination }}@{{ hostname_dest_aws }} "cat | mysql -s"
    become: true
    become_user: root

  - name: MIGRATE ALL DB SERVER | Restore remote public ssh key on {{ hostname_dest_aws }}
    command: mv ~/.ssh/authorized_keys_orig ~/.ssh/authorized_keys
    delegate_to: "{{ hostname_dest_aws }}"
```
Thanks to the ansible_hostname fact, we could derive the new hostname directly from the source one:
```yaml
hostname_dest_aws: nuvola-{{ nuvola_env }}-database-{{ ansible_hostname[2:4] }}.ops.madisoft.it
```
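For example, for a source host named db10 (as in the inventory shown below), ansible_hostname[2:4] evaluates to 10, so with nuvola_env=prod the destination becomes nuvola-prod-database-10.ops.madisoft.it: the numeric suffix of the old hostname carries over to the new one.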
Then we read the database list on the source server, dumped the data, and restored it on the destination. Please note that dump, transfer, and restore were chained together with pipes.
```bash
mysqldump -u{{ mysql_user }} -p{{ mysql.root_password }} --databases {{ dblist.stdout }} | \
  ssh -p{{ ssh_port }} -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no \
  {{ user_destination }}@{{ hostname_dest_aws }} "cat | mysql -s"
```
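Rendered with concrete values, the pipeline looks roughly like the sketch below (host, port, password variable, and database names are made-up examples). Because mysqldump streams SQL straight through ssh into the destination's mysql client, no intermediate dump file ever touches a disk and the restore starts as soon as the first bytes arrive:

```bash
# Illustrative rendering of the templated command above; all concrete
# values here are hypothetical. The dump is applied as it streams in.
mysqldump -uroot -p"$ROOT_PW" --databases nuvola_db_1 nuvola_db_2 \
  | ssh -p22 -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no \
    ubuntu@nuvola-prod-database-10.ops.madisoft.it "cat | mysql -s"
```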
This is an example of the inventory file:
```ini
[dbserver]
db10.nuvola.madisoft.it ansible_host=0.0.0.0 private_ip_address=10.0.0.100
db11.nuvola.madisoft.it ansible_host=0.0.0.0 private_ip_address=10.0.0.101
```
Finally we created a bash script to run the playbook:
```bash
#!/bin/bash

LIMIT='dbserver'
NUVOLA_ENV='prod'
PROVIDER='blabla'

ansible-playbook -l $LIMIT \
  ansible/mysql_migrate_dbserver.yml -e "provider=$PROVIDER nuvola_env=$NUVOLA_ENV" \
  --vault-password-file ./secrets/provision_nuvola_dbserver.secret \
  $TAGS_OPTION
```
We also prepared a Python script to compare source and migrated databases, launched from a small bash wrapper:
```bash
MESSAGE="Want to check databases (y/n) ? "
. libs/confirm_action.sh

LIST_DB_SERVER=$(ansible -m debug -a "var=groups['$LIMIT']" localhost \
  | grep -Ev "localhost|group|]|}" | tr "\"" " " | tr "," " ")

COUNT=0
for DBS in $LIST_DB_SERVER
do
  ./tools/db/mysql_dbserver_compare.py -s $DBS -u root \
    -d nuvola-$NUVOLA_ENV-database-$COUNT.ops.madisoft.it \
    -U ubuntu > /tmp/$DBS.log &
  ((COUNT++))
done
```
Here is an extract of the comparison script: for each server, we compared the number of databases, the number of tables per database, and the number of rows in each table.
```python
# src() and dest() (defined elsewhere in the script) run a shell command
# on the source and destination server respectively and return its output.

# Number of application databases on each side.
num_databases_src = src(
    "mysql -N --silent -e "
    "\"SELECT COUNT(*) FROM information_schema.SCHEMATA "
    "WHERE SCHEMA_NAME LIKE 'nuvola%';\"")
num_databases_dest = dest(
    "mysql -N --silent -e "
    "\"SELECT COUNT(*) FROM information_schema.SCHEMATA "
    "WHERE SCHEMA_NAME LIKE 'nuvola%';\"")

# Number of tables in a given database on each side.
num_tables_src = src(
    "mysql %s -N --silent -e "
    "\"SELECT COUNT(*) FROM information_schema.TABLES "
    "WHERE TABLE_SCHEMA='%s';\"" % (dbname, dbname))
num_tables_dest = dest(
    "mysql %s -N --silent -e "
    "\"SELECT COUNT(*) FROM information_schema.TABLES "
    "WHERE TABLE_SCHEMA='%s';\"" % (dbname, dbname))

# Total number of rows across all tables of a given database on each side.
tables_rows_dest = dest(
    "echo \"SELECT CONCAT('SELECT COUNT(*) FROM ', table_name, ';') "
    "FROM information_schema.tables WHERE table_schema = '%s';\" "
    "| mysql -N --silent %s | mysql -N --silent %s "
    "| jq -s add" % (dbname, dbname, dbname))
tables_rows_src = src(
    "echo \"SELECT CONCAT('SELECT COUNT(*) FROM ', table_name, ';') "
    "FROM information_schema.tables WHERE table_schema = '%s';\" "
    "| mysql -N --silent %s | mysql -N --silent %s "
    "| jq -s add" % (dbname, dbname, dbname))
```
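The row-count check deserves a closer look: the first mysql invocation turns information_schema into a list of SELECT COUNT(*) statements, the second executes them, and jq -s add sums the per-table counts into a single number per database. Unrolled as a standalone pipeline, it looks roughly like this (the database name is illustrative):

```bash
# Generate one "SELECT COUNT(*) FROM <table>;" statement per table,
# execute them all, then sum the resulting counts with jq.
echo "SELECT CONCAT('SELECT COUNT(*) FROM ', table_name, ';')
      FROM information_schema.tables
      WHERE table_schema = 'nuvola_db_1';" \
  | mysql -N --silent nuvola_db_1 \
  | mysql -N --silent nuvola_db_1 \
  | jq -s add
```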
We migrated about 350 GB of data in roughly 20 minutes, thanks to Ansible's parallelism (20 database servers migrated in parallel).
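For context, Ansible's degree of parallelism is controlled by the forks setting (default 5), either in ansible.cfg or on the command line. A sketch of how the launcher above could be told to drive 20 servers at once:

```bash
# -f/--forks raises the number of hosts Ansible works on concurrently
# from the default of 5 to 20.
ansible-playbook -f 20 -l dbserver \
  ansible/mysql_migrate_dbserver.yml \
  -e "provider=$PROVIDER nuvola_env=$NUVOLA_ENV"
```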
In the next article we’ll show you another chapter of our migration to AWS.
Stay tuned!