Macro Express Forums

Almost Sorting

Les Hazlett

The PGMACROS.COM forum is a wonderful resource for learning to use Macro Express. We’ve gained a lot from the ideas and advice found there. We always wanted to give back something of value. Our new LineGroup macro works so well that we want to share it with other ME users.



At our company, we use ME to process text and database files. We needed a way to sort the records in these files. Our requirement is “almost sorting”, not really a sort: we need to group related records together. The order of the records within a group is not important, and neither is the order of the groups. We call it “grouping”.



The lines are records and the fields in the records are delimited with Tabs. The first field in each line is the “key” field. Our LineGroup macro re-orders the lines so as to group records having the same key field.
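
For readers who think in code, here is a minimal Python sketch of what “grouping” means for Tab-delimited lines. It is not the LineGroup macro and does not use its method; the function name `group_lines` and the sample records are made up for illustration, and a Python dictionary stands in for the ME variables.

```python
def group_lines(lines):
    """Group Tab-delimited lines by their first field (the key).

    Groups appear in the order their key is first seen; lines inside a
    group keep their original relative order. Neither order is sorted.
    """
    groups = {}  # key -> list of lines; dicts keep insertion order
    for line in lines:
        key = line.split("\t", 1)[0]  # first field is the key
        groups.setdefault(key, []).append(line)
    # Flatten the groups back into one list of lines
    return [line for group in groups.values() for line in group]


# Hypothetical sample data, not the data used in the post
records = ["B\tblue", "A\tapple", "B\tberry", "A\tant"]
grouped = group_lines(records)
```

Note that the result is grouped but not sorted: the “B” group comes before the “A” group simply because “B” was seen first.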



The algorithm for this macro is “elegantly simple”. It mostly consists of two Switch/Case blocks, the second one nested inside the Default Case of the first. As designed, it collects 30 groups per pass through the unsorted lines. The key fields are held in “Key” variables T21-T50. The grouped lines are held in “Line” variables T51-T80. The macro reruns itself recursively to collect all the groups, up to 30 in each pass.



For many ME users, the script may be easier to understand than the following explanation. In the first Switch/Case block, the current line is moved to the appropriate “Line” variable if its key is already held in one of the “Key” variables; this is how lines with duplicate keys join their group. If the key is not held, control passes to the Default Case. There, the second Switch/Case block finds a blank “Key” variable, assigns it the key of the current line, and puts the line into the associated “Line” variable. When all the “Key” variables are in use, the current line is held in an excess buffer to wait for a later recursion of the macro. Because blank keys are common, we made a special case for records with blank keys: they are grouped first, before any groups with keys.
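
The explanation above can be sketched in Python as follows. This is an approximation under stated assumptions, not a translation of the LineGroup.mxe script: the names `group_pass` and `line_group` are hypothetical, two Python lists stand in for the “Key” and “Line” variables, an if/elif chain stands in for the two Switch/Case blocks, and a loop replaces the macro rerunning itself.

```python
def group_pass(lines, width):
    """One pass over the lines: fill at most `width` key slots.

    Returns (grouped, excess), where `excess` holds lines whose key
    found no free slot and must wait for a later pass.
    """
    keys = [None] * width               # stand-in for the "Key" variables
    bufs = [[] for _ in range(width)]   # stand-in for the "Line" variables
    blank = []                          # special case: lines with a blank key
    excess = []                         # lines deferred to a later pass
    for line in lines:
        key = line.split("\t", 1)[0]
        if key == "":
            blank.append(line)
        elif key in keys:               # first Switch/Case: key already held
            bufs[keys.index(key)].append(line)
        elif None in keys:              # Default Case: claim a blank slot
            slot = keys.index(None)
            keys[slot] = key
            bufs[slot].append(line)
        else:                           # every slot is in use
            excess.append(line)
    grouped = blank + [ln for buf in bufs for ln in buf]
    return grouped, excess


def line_group(lines, width=5):
    """Repeat passes until no lines are left; width must be at least 1."""
    result, passes = [], 0
    remaining = list(lines)
    while remaining:
        grouped, remaining = group_pass(remaining, width)
        result.extend(grouped)
        passes += 1
    return result, passes


# Hypothetical demo data: two slots, three non-blank keys, so two passes
demo = ["b\t1", "\t2", "a\t3", "c\t4", "b\t5", "\t6", "a\t7"]
demo_grouped, demo_passes = line_group(demo, width=2)
```

In this sketch the blank-key lines are all collected during the first pass, so they come out ahead of every keyed group, and each later pass only sees the lines left in the excess buffer.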



To test the algorithm, we made the Switch/Case blocks only 5 groups wide (T21-T25 for “Key” variables and T51-T55 for “Line” variables). Execution was so fast that we have never bothered to make it any wider. As is, it groups 1,659 lines of about 600 characters per line in 10 seconds. There were 44 different “Key” fields in the test file, requiring nine recursions. In comparison, it takes ME 35 seconds to parse the original line format before grouping and 14 seconds to format the lines after grouping.
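
The nine recursions follow from the slot count: each pass can claim at most 5 new keys, so 44 keys need 44 / 5 rounded up. A small Python check of that arithmetic is shown below; the helper name `passes_needed` is hypothetical, and the figure for a 30-wide version is a calculation, not a measured result.

```python
def passes_needed(distinct_keys, width):
    """Passes required when each pass can claim at most `width` new keys."""
    return -(-distinct_keys // width)  # ceiling division on integers


test_passes = passes_needed(44, 5)     # 9, matching the nine recursions reported
design_passes = passes_needed(44, 30)  # 2 if the blocks were 30 groups wide
lines_per_second = 1659 / 10           # roughly 166 lines grouped per second
```

If the blank key is one of the 44 and needs no slot, 43 keyed groups still take nine passes at a width of 5.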



The LineGroup.mxe file is attached. To demonstrate how it works, the ShowLineGroup.mxe file will be attached to the first reply to this message. It should be obvious how it provides the data and sets up the global variables needed by the LineGroup macro. The data in the ShowLineGroup macro is just for demonstration. It is not the “live” data used for performance testing. Both macros have lots of comments to help you understand how they work. Just run the ShowLineGroup macro to see both the un-grouped source data and the grouped results shown side by side.


We hope that someone out there benefits from this very fast “Almost Sort”; we certainly have. We would also like to see the concept put to other uses. We welcome your comments and opinions. If I am slow in responding to any replies, it is because I leave next week for a 3-week vacation.


Happy Grouping


Les Hazlett

Automated Mailing Services

Fargo, ND

