Dumping from a database, revisited
Some time ago, a user named VB6Boy posted a handy little VBScript that dumps out records from a database table to a series of text files. The method works fine, but takes an exorbitant amount of time for large datasets. Can we do better?
Another user commented on the post mentioned above, stating that he (or she) was using the script as a basis for a modified version that wrote records to XML files. The script seemed to run fine, however it was taking over an hour to process the five thousand records, and write them out to XML files. The VBScript was pretty bad at handling large datasets.
So this weekend, I decided to see if I could get some better performance out of the process. Pretty much an exercise to keep my mind busy while I was doing laundry. Plus, I wanted to see if I could get things running better.
Before I get started on this, I've posted the test results, as well as the schema of the database table that I am working with, at the bottom of this post. Just in case you might want to give this a try. I would definitely be interested in hearing your results in the comments!
My first task was to get a baseline. So I set up a MySQL database on my own machine, with content from a decent sized table, about 229,304 records. Next, I brought the VBScript to my machine as well, and modified the connectivity and field names to match my database.
I fired off the script, and went off to start a load of laundry. Honestly, I thought the script would be done in about 15 minutes. Much to my chagrin, it was still churning. The script finally finished sometime during my movie viewing, with an elapsed time of one hour, twenty-six minutes. Woah.
PowerShell to the rescue?
So, I got to thinking, I can do better than this with PowerShell, right? I knew I wanted to build a couple different versions for PowerShell, with the first version being a straight conversion from the original VBScript. It's a pretty straightforward conversion, and it only took about ten minutes to convert it over. So I kicked off the script, and started another load of laundry.
Version 1 of the PowerShell script grabs the entirety 229,304 records into an array. Talk about a memory hog! It then runs through the array, writing each records out to an XML file as it goes. I thought for sure this would be faster than the VBScript.
As it turns out, it was actually more than twice as slow as the VBScript! The only thing I can think would cause this would be the sheer largeness of the dataset that is being read from the database, then processed line-by-line by the script. We can do better.
Another try with PowerShell
So on to version 2 for PowerShell. With this one, I wanted to be able to take smaller bites from the database, processing each of these smaller bites as they came in. So I modified version 1 to do just that. It would now read 500 records at a time, squirting each batch out to disk as they were read from the database. This has got to be faster!
Well, not so much. It was actually six minutes slower. How could this be? Alright, let's run it again, this time with ten lines per bite. Maybe we just can't handle larger chunks of data, so reading smaller chunks will speed us up?
Taking smaller bites
How about if I drop the size of the result set that I am working with? So this time, I dropped it down to ten records per query. Unfortunately, this only made the script run that much slower. I'm not sure where the sweet spot is for using PowerShell, but I certainly haven't found it in my limited testing. I think I'll leave off PowerShell for now, though. I've got one more thing that I'd like to test.
How about using a compiled language?
I have no shame in admitting that I still mess around with Visual Basic 6 from time to time. I do have some legacy code to maintain, which is my main reason for keeping it around. But I still crack it open when I want to make a quick and dirty proof of concept or to test an idea. With it, I can break out an app very quickly, for the GUI or for the command line. So why not give it a try on this exercise.
Admittedly, a compiled language is going to blow away an interpreted language. No question on that. But I wanted to see if my trusty old VB6 would have the chops to chew this data, quickly.
Since VBScript and VB6 are so closely related, it is very easy to port from one to the other. I grabbed the VBScript from the original test, plugged it in to VB6 with minor modifications such as referencing MS ADO 6.1, and click the Execute button.
Prepared for at least a few minutes of wait, I started to push away from the keyboard to go manage my laundry. However, the script stopped running, as if complete. What? I checked the code, and it looked fine, no errors had popped up. So I ran it again. Once more, it came back as done after a brief pause.
I checked for output, and lo, the XML awaited in the target directory! I quickly added some timing and display items to the script, and found it was running in five seconds. Yes, you read that right, five seconds. From within the Integrated Development Environment (IDE).
Amazing, but would it run even faster if compiled? I modified the project file to compile for fast code, remove floating point checks, etc. I also made sure all my variables were set as the right type, then I compiled it. Interestingly enough, it ran in the same length of time, about five seconds.
So why did I do this? Why is this important enough to publish to the world at large? First, as I mentioned earlier, I needed something to keep me busy on a Sunday.
This little exercise also reminds us that while one particular language or method might be your favorite at the moment, it may not necessarily be the best tool for the job at hand. It's good to keep that in mind.
|PowerShell, Straight Conversion||16:28:30||19:25:55||02:57:24|
|PowerShell, Mod 1 (500)||08:02:09||11:05:34||03:03:25|
|PowerShell, Mod 2 (10)||11:27:09||14:43:21||03:16:12|
|Visual Basic 6 (IDE)||10:08:23||10:08:28||00:00:05|
|Visual Basic 6 (Compiled)||10:16:54||10:16:59||00:00:05|
|Field Name||Data Type|
|Total Records in Database: 229,304|
About the Author
dwirch has posted a total of 172 articles.
Comments On This Post
No comments on this post yet!
Do you have a thought relating to this post? You can post your comment here. If you have an unrelated question, you can use the Q&A section to ask it.
Or you can drop a note to the administrators if you're not sure where you should post.