« Surprise -- DST comes early this year | Main | Updates on recent posts »
January 15, 2007
Adjusting algorithms for dynamic languages
One of my college Computer Science classes introduced me to Sedgewick's Algorithms in C. Though after a decade I don't remember specifics of the class, I regularly find myself applying the principles I learned, evaluating the cost of each way a problem can be solved (and once in a while pulling the book from my reference library).
A Perl program I was working on recently seemed like a textbook case to apply Algorithms. After some testing I realized that though the fundamentals might be the same, Algorithms in Perl would be quite different from Algorithms in C.
The program required dropping the smallest value from a random list of numbers if it was more than 50% smaller than the next-smallest value, and similar for the largest value on the list.
Two possible algorithms came to mind. First would be to run through the list and find the two smallest and two largest values, then compare them and delete as necessary. Another option would be to sort the list, then look at the first two (smallest) and last two values (largest), and delete as necessary.
Sedgewick says sorting takes at least N log N time, while a sequential search takes only N / 2, so the first option seems the best. I was working with a short list (~200 values), so could easily be done in memory. (Of course, you might be wondering why I'm going through all of this when it's probably not noticeable which I use on a small list--don't spoil the fun). So I hacked up both implementations, and iterated each 100,000 times to amplify the difference:
Sequential search: 34.75 seconds
Sort: 7.85 seconds
Surprising, kind of (more in a moment). For comparison, I implemented the same thing using D, a successor to C/C++ (and with similar performance):
Sequential search: 0.566 seconds
Sort: 3.47 seconds
Sedgewick predicts that the sort method would be around 4.6x slower than sequential search, that's pretty close to the results in D. But Perl is almost the opposite (search 4.4x slower than sort).
Explanation? Not suprisingly, Perl implements sort in C (or maybe C++) using quicksort or mergesort. From the perlfunc manpage
Perl 5.6 and earlier used a quicksort algorithm to implement sort. That algorithm was not stable, and could go quadratic. (A stable sort preserves the input order of elements that compare equal. Although quicksort's run time is O(NlogN) when averaged over all arrays of length N, the time can be O(N**2), quadratic behavior, for some inputs.) In 5.7, the quicksort implementation was replaced with a stable mergesort algorithm whose worst case behavior is O(NlogN).
All of this leaves me wondering how the principles from my algorithm class apply to dynamic languages (where's Algorithms in Perl--maybe Sedgewick's book on Java would be a start). Or maybe algorithms and machine performance aren't such a big deal anymore, and I should worry more about cookbooks, patterns and developer performance.
Posted by pete at January 15, 2007 8:33 PM