Over time, as the data in your production database grows, it will inevitably degrade the DB’s performance. When that happens, you know it’s time to get rid of some of that years old data. But of course, you can’t just delete it. You have to archive it first.
The traditional way to do this would be to export the old data using maybe
mysqldump or some other format available in MySQL Workbench. You’ll then delete the old data from your database. If you ever need to access the archived data, you’ll have to import it back into a database in order to query it.
But there’s a better way of doing this in AWS. The best part is that you don’t need to import the archive into a database to query it. You can query it in-place!
Let’s look at how this is done. We start with a demo DB with a very simple demo table, as shown below:
When it’s time to archive the data, create a snapshot of the database & export it to S3 as shown below:
After the export completes, find the exported Parquet file in S3 & test it by querying it as shown below:
There you go! Now that your data is safe in S3, you can delete the same from the database. If you ever need data from the exported archive, just use the S3 console or Athena to query it. No need to import it into a database.