Search Tools Links Login

Dumping from a database, revisited


Some time ago, a user named VB6Boy posted a handy little VBScript that dumps out records from a database table to a series of text files. The method works fine, but takes an exorbitant amount of time for large datasets. Can we do better?

Another user commented on the post mentioned above, stating that he (or she) was using the script as a basis for a modified version that wrote records to XML files. The script seemed to run fine, however it was taking over an hour to process the five thousand records, and write them out to XML files. The VBScript was pretty bad at handling large datasets.

So this weekend, I decided to see if I could get some better performance out of the process. Pretty much an exercise to keep my mind busy while I was doing laundry. Plus, I wanted to see if I could get things running better.

Before I get started on this, I've posted the test results, as well as the schema of the database table that I am working with, at the bottom of this post. Just in case you might want to give this a try. I would definitely be interested in hearing your results in the comments!

The baseline

My first task was to get a baseline. So I set up a MySQL database on my own machine, with content from a decent sized table, about 229,304 records. Next, I brought the VBScript to my machine as well, and modified the connectivity and field names to match my database.

I fired off the script, and went off to start a load of laundry. Honestly, I thought the script would be done in about 15 minutes. Much to my chagrin, it was still churning. The script finally finished sometime during my movie viewing, with an elapsed time of one hour, twenty-six minutes. Woah.

PowerShell to the rescue?

So, I got to thinking, I can do better than this with PowerShell, right? I knew I wanted to build a couple different versions for PowerShell, with the first version being a straight conversion from the original VBScript. It's a pretty straightforward conversion, and it only took about ten minutes to convert it over. So I kicked off the script, and started another load of laundry.

Version 1 of the PowerShell script grabs the entirety 229,304 records into an array. Talk about a memory hog! It then runs through the array, writing each records out to an XML file as it goes. I thought for sure this would be faster than the VBScript.

As it turns out, it was actually more than twice as slow as the VBScript! The only thing I can think would cause this would be the sheer largeness of the dataset that is being read from the database, then processed line-by-line by the script. We can do better.

Another try with PowerShell

So on to version 2 for PowerShell. With this one, I wanted to be able to take smaller bites from the database, processing each of these smaller bites as they came in. So I modified version 1 to do just that. It would now read 500 records at a time, squirting each batch out to disk as they were read from the database. This has got to be faster!

Well, not so much. It was actually six minutes slower. How could this be? Alright, let's run it again, this time with ten lines per bite. Maybe we just can't handle larger chunks of data, so reading smaller chunks will speed us up?

Taking smaller bites

How about if I drop the size of the result set that I am working with? So this time, I dropped it down to ten records per query. Unfortunately, this only made the script run that much slower. I'm not sure where the sweet spot is for using PowerShell, but I certainly haven't found it in my limited testing. I think I'll leave off PowerShell for now, though. I've got one more thing that I'd like to test.

How about using a compiled language?

I have no shame in admitting that I still mess around with Visual Basic 6 from time to time. I do have some legacy code to maintain, which is my main reason for keeping it around. But I still crack it open when I want to make a quick and dirty proof of concept or to test an idea. With it, I can break out an app very quickly, for the GUI or for the command line. So why not give it a try on this exercise.

Admittedly, a compiled language is going to blow away an interpreted language. No question on that. But I wanted to see if my trusty old VB6 would have the chops to chew this data, quickly.

Since VBScript and VB6 are so closely related, it is very easy to port from one to the other. I grabbed the VBScript from the original test, plugged it in to VB6 with minor modifications such as referencing MS ADO 6.1, and click the Execute button.

VB6 Version DBtoXMLPrepared for at least a few minutes of wait, I started to push away from the keyboard to go manage my laundry. However, the script stopped running, as if complete. What? I checked the code, and it looked fine, no errors had popped up. So I ran it again. Once more, it came back as done after a brief pause.

I checked for output, and lo, the XML awaited in the target directory! I quickly added some timing and display items to the script, and found it was running in five seconds. Yes, you read that right, five seconds. From within the Integrated Development Environment (IDE).

Amazing, but would it run even faster if compiled? I modified the project file to compile for fast code, remove floating point checks, etc. I also made sure all my variables were set as the right type, then I compiled it. Interestingly enough, it ran in the same length of time, about five seconds.

Summary

So why did I do this? Why is this important enough to publish to the world at large? First, as I mentioned earlier, I needed something to keep me busy on a Sunday.

This little exercise also reminds us that while one particular language or method might be your favorite at the moment, it may not necessarily be the best tool for the job at hand. It's good to keep that in mind.

Test Results

Language/Method Start End Elapsed
VBScript 13:15:00 14:41:00 01:26:00
PowerShell, Straight Conversion 16:28:30 19:25:55 02:57:24
PowerShell, Mod 1 (500) 08:02:09 11:05:34 03:03:25
PowerShell, Mod 2 (10) 11:27:09 14:43:21 03:16:12
Visual Basic 6 (IDE) 10:08:23 10:08:28 00:00:05
Visual Basic 6 (Compiled) 10:16:54 10:16:59 00:00:05

 

Table Schema

Field Name Data Type
EntryID int(10)
VisitDateTime DateTime
IPAddress ShortText
Agent LongText
Total Records in Database: 229,304

About this post

Posted: 2017-03-19
By: dwirch
Viewed: 1,870 times

Categories

Visual Basic Script (VBS)

Powershell

PowerShell Code Cache

Visual Basic 6

Attachments

No attachments for this post


Loading Comments ...

Comments

No comments have been added for this post.

You must be logged in to make a comment.