fortypoundhead.com

Speed Optimization Using Arrays

Posted On 2017-09-24 by VB6Boy
Keywords:
Tags: VB6 Files and I/O VB6 String Handling VB6 Miscellaneous Windows
Views: 228

Download Attachment


I was recently given a small programming challenge by a friend, which entailed splitting a text file into multiple text files, based on word length. I got run time down to 11 seconds. Can you do better?

The text file in question is a listing of dictionary words, A through Z, containing a total of 234,936 words. The goal is to have the program create a file containing words of each length in individual files. For example, all the words in the main list that are nine characters long would end up in a file called 9.txt.

Sounds pretty straightforward, no? My first swag at this didn't use any pre-processing at all, and took a whopping twelve minutes to run. The initial draft, which gave the correct output, followed this flow:

  1. Open the main file
  2. Read a line, determine length of word
  3. Create filename, open file for append, write to file, close file
  4. Repeat until done

The speed killer in this process is the constant opening/writing/closing of files. The program was basically strangling itself on disk I/O operations, which resulted in a run time of about 11 minutes.

I gave the output to my friend, and he was happy. However, I knew I could improve this program. By doing some basic preprocessing, and performing most operations in memory, run time for the same list is down to about 11 seconds, compiled. Withing the IDE, run time is about 17 seconds.

The flow of operations is basically the same, with some exceptions:

  1. Get a count of words in the file
  2. Read the words into an array
  3. Scan the array to find the longest word length
  4. For all words of length n, put into a string
  5. Write entire string to file at once
  6. Repeat until all lengths are complete

Wordthing

While there are more steps in this method, it is much faster due to the large reduction of small disk I/O operations. Rather than writing each individual string to each file, almost 235,000 times, this method only writes to disk 24 times.

Think of it this way:

For words of length 12, there 20,460 words. You could write to disk 20,460 times, or you could write to disk once. Which would be faster?

I've attached the project and word list to this post. I can see at least one way to speed this up a bit more, but I was interested to see if anyone else could think of a way to speed it up? Can we get this to run even faster? I look forward to seeing your solutions!

Concepts

There are a few concepts demonstrated in this program:

  • File input/output
  • Basic string processing
  • Dynamic arrays


About the Author

VB6Boy has posted a total of 22 articles.


Comments On This Post

No comments on this post yet!


Do you have a thought relating to this post? You can post your comment here. If you have an unrelated question, you can use the Q&A section to ask it.

Or you can drop a note to the administrators if you're not sure where you should post.


Your IP address is:107.20.115.174

Before you can post, you need to prove you are human. If you log in, this test goes away.




Quick Links