Query Optimisation

While in a project which involved handling lots of data, we came across an interesting issue. After importing data into our database, we had to validate the data and mark the invalid ones.

 

This is what we did. 100 records were selected that weren't error flagged. Each column in a record was then checked for errors and a StringBuffer appended with the error description. When the end column was reached for that record, the record was flagged for error if an errors were found and its error description was written. This took approx 105 minutes for 10000 records which had many invalid data.

 

As our POC took shape of an application we concluded we couldn't use this approach as we would have to handle millions of records in most cases.

 

We then agreed that we would go column-wise rather than row-wise. Update query with if statements were used and record data were compared to the REGEXP approx 100 minutes for 0.4 million records.

 

We also kept a backup algorithm of validating data only if that row isn't already invalid. This narrowed down our search and took approx 15 minutes for 0.4 million records.

 

This whole experience made us learn quite a lot in optimizing query for processing data in large databases.

Comments

Post new comment

The content of this field is kept private and will not be shown publicly.
CAPTCHA
This question is for testing whether you are a human visitor and to prevent automated spam submissions.
Image CAPTCHA
Copy the characters (respecting upper/lower case) from the image.