Query Optimisation
While in a project which involved handling lots of data, we came across an interesting issue. After importing data into our database, we had to validate the data and mark the invalid ones.
This is what we did. 100 records were selected that weren't error flagged. Each column in a record was then checked for errors and a StringBuffer appended with the error description. When the end column was reached for that record, the record was flagged for error if an errors were found and its error description was written. This took approx 105 minutes for 10000 records which had many invalid data.
As our POC took shape of an application we concluded we couldn't use this approach as we would have to handle millions of records in most cases.
We then agreed that we would go column-wise rather than row-wise. Update query with if statements were used and record data were compared to the REGEXP approx 100 minutes for 0.4 million records.
We also kept a backup algorithm of validating data only if that row isn't already invalid. This narrowed down our search and took approx 15 minutes for 0.4 million records.
This whole experience made us learn quite a lot in optimizing query for processing data in large databases.


Comments
Post new comment