Jump to content
Macro Express Forums
acantor

Challenge: Script to extract a postal code from an address

Recommended Posts

A fun Macro Express project: I needed a way to extract Canadian postal codes from the full address in which they appear.

 

Canadian postal codes consist of six alternating letters and digits, plus a space in the fourth position. (L = Letter, D = Digit):


LDL DLD

 

Examples:

 

M4A 1V1

K7L 2D3

V6B 3A2

 

The postal code might appear anywhere in the address, which is always a single line:

 

"25 Clark Street, 1st Floor, Toronto, ON, M8C 1X9, Canada"

"111 rue de Chaplain, Montreal PQ H2P 7W9"

"J2E 6G2 (main entrance)"

 

After many happy hours of tinkering, I came up with a solution. But I am curious how others might approach this problem using ONLY Macro Express.

 

If you are interested in this problem, start with the following line and see how far you get!

 

Variable Set String %Location% to "1234 Chaplin Dr. Twr. 1- 6th floor , Mississauga, ON L4Q 5Z3 Canada"

 

If anyone takes up the challenge and posts a possible solution, I'll respond with my solution.

Share this post


Link to post
Share on other sites

Hi Alan,

 

I'm sorely tempted, but that would mean raising the task several places up my To Do list!

 

Another excuse is that because MXPro sadly lacks Regular Expression features I would always tackle this sort of requirement with assistance from another application, usually my text editor, TextPad. Your challenge is akin to asking me to attach a screw without using a screwdriver.

 

And yet another reason is that I expect you already have a brilliant solution 😉

 

But if curiosity does get the better of me, then I think I might start as follows:

 

// @acantor's challenge to extract a single post code from a line of text.
// No external applications are allowed.
// Assumption: a reasonable max length of a single address line like the examples provided is 100 characters.
Variable Set String %tResult% to "" // Just in case.
Variable Set String %tInput%: Prompt // (Disabled for testing)
Variable Set String %tInput% to "1234 Chaplin Dr. Twr. 1- 6th floor , Mississauga, ON L4Q 5Z3 Canada"
Variable Set Integer %nLength% to the length of variable %tInput%
// A line of that length contains (nLength-6) groups of 7 characters.
// So my first approach will be to test every group until I find one with the post code format LDL DLD.
// The parsing of that should then be straightforward.

 

 

Terry

Share this post


Link to post
Share on other sites



And yet another reason is that I expect you already have a brilliant solution

 

Thank you for the compliment, Terry! I have a solution, which I will be happy to share, but I wouldn't go so far as to say that it's brilliant! But it does work. I doubt I have found the most elegant solution, so I'm curious to know how other experienced Macro Express users would tackle a problem that has more than one solution.


Your sketched out solution overlaps with mine to an astonishing degree. I hope you'll give it a whirl. It took me many hours to get to where you already are.

 

I enjoy the challenge of working within Macro Express's constraints, which includes no Regular Expressions, and no built-in tests for digits or alphanumeric characters. These would be nice features to have when tackling a problem like this, but they are not available. So I had to find other ways. Due to the pandemic, my work has slowed down, so I'm using some of my extra time to dig deeper into Macro Express.

Share this post


Link to post
Share on other sites

Hi Terry,

 

I'll add (as a hint!) that my approach has much in common with the one your proposed, but is also different.

 

I'm unsure which approach will yield the most satisfactory (and satisfying) result. Probably either approach is fine.

 

And there are probably other approaches. I tried (and subsequently abandoned) at least two methods before I found one that I could make work.

Share this post


Link to post
Share on other sites

I'm just guessing but more than likely you utilized the Split String command to create an array to work through.  As fun as that sounds, I used the External Script command instead.  Not what you are looking for but it is indeed a Macro Express command.

 

<VARIABLE SET STRING Option="\x00" Destination="%location%" Value="1234 Chaplin Dr. Twr. 1- 6th floor , Mississauga, ON L4Q 5Z3 Canada" NoEmbeddedVars="FALSE"/>
<VARIABLE SET STRING Option="\x00" Destination="%pattern%" Value="\\b[ABCEGHJKLMNPRSTVXY][0-9][A-Z] [0-9][A-Z][0-9]\\b" NoEmbeddedVars="FALSE"/>
<EXTERNAL SCRIPT Language="VBScript" Dest="%console%" Script="Dim regEx, matches\r\nSet regEx = New RegExp\r\nregEx.Pattern = \"%pattern%\"\r\nregEx.Global = False\r\nSet matches = regEx.Execute( \"%location%\" )\r\nFor Each match In matches\r\n   WScript.StdOut.Write match.Value\r\nNext" Encoding="0"/>
<MESSAGEBOX Caption="Result" Message="%console%" Icon="0"/>

 

Also, the regular expression pattern I placed in the code is good, but not perfect ... c'est la vie

 

Share this post


Link to post
Share on other sites

This solution builds the LDL DLD pattern by substituting characters:
Any letter becomes @
Any number becomes #
Any space becomes $
Then it searches for the pattern @#@$#@# within the substituted text, and extracts the corresponding positions from the original text.  The solution is a little crude because it doesn't allow for multiple pattern matches within the address -- probably I would add code to take the last match in that case.  

 

Joe's solution using VBSCRIPT is certainly more elegant, but kind of breaks the ME-only rule.😧

 

//  
// Extract Canadian postal code
Variable Set String %T1% "1234 Chaplin Dr. Twr. 1- 6th floor , Mississauga, ON L4Q 5Z3 Canada"
Variable Set String %T1% "25 Clark Street, 1st Floor, Toronto, ON, M8C 1X9, Canada"
Variable Set String %T1% "111 rue de Chaplain, Montreal PQ H2P 7W9"
Variable Set String %T1% "J2E 6G2 (main entrance)"
// Move address to working variable T2, clear destination variable T3
Variable Set String %T3% ""
Variable Modify String: Copy %T1% to %T2%
Variable Modify String: Trim %T2%
// Get rid of any special characters @, #, and $ that may already exist
Replace "@" with "" in %T2%
Replace "#" with "" in %T2%
Replace "$" with "" in %T2%
Text Box Display: Debug special characters removed
// Replace letters with @, numbers with #, blank spaces with $, building result in T3
Variable Set Integer %N1% to 0
Variable Set Integer %N2% from Length of Variable %T2%
Repeat Start (Repeat %N2% times)
  Variable Modify Integer: Inc (%N1%)
  Variable Modify String: Copy Part of %T2% to %T9%
  If Variable %T9% >= "A"
    AND
  If Variable %T9% <= "Z"
    Variable Set String %T9% "@"
  End If
  If Variable %T9% >= "a"
    AND
  If Variable %T9% <= "z"
    Variable Set String %T9% "@"
  End If
  If Variable %T9% >= "0"
    AND
  If Variable %T9% <= "9"
    Variable Set String %T9% "#"
  End If
  If Variable %T9% = " "
    Variable Set String %T9% "$"
  End If
  Variable Modify String: Append %T9% to %T3%
Repeat End
Text Box Display: Debug letters/numbers replaced with special characters
// Postal code LDL DLD will have been replaced by @#@$#@#, so find position of that string within the address in variable T3
Variable Set Integer %N9% from Position of Text in Variable %T3%
// Extract corresponding characters from original [trimmed] address in T2
Variable Modify String: Copy Part of %T2% to %T12%
Text Box Display: Debug original address and extracted postal code
//  
//  
Macro Return
// 

Share this post


Link to post
Share on other sites

Well rberq ... your idea of converting many possible characters to a single-known character is a good concept, indeed.

Share this post


Link to post
Share on other sites

@rberg: Neat! Could you post the code so I can explore it further please, as I haven't grasped how it handles an address containing special chars.

 

 

 

 

Share this post


Link to post
Share on other sites

Thank you everybody for your contributions!

 

Like rberq, I used the substitution method. But I came to the approach relatively late. My initial attempt was similar to what Terry suggested, but I found the coding too complicated. I'm curious whether someone else might have better success with the group of characters approach.

 

rberq, nice touch to delete the three tokens you used before parsing the address. I used "L" (for letters) "D" (for digits) and "X" for anything else. I hope you are OK with me borrowing your idea!

 

Joe, your stretch-the-rules approach is probably as good as it's going to get. I'm going to study your solution. I've not been able to make sense of MEP's external scripting feature, but you've given me a foothold to understanding.

 

Here is my solution:

 

// Extract a Canadian postal code from an address.
// For example: "123 Main, M4Z 1V4, Canada" --> "M4Z 2V4"
 
// Assume postal code is always in this format: "LNL NLN"
// It's 7 characters long with a space in position 4. "L" is a letter and "N" is a digit.
 
// How this works:
// Parse the address one character at a time to build a variable %Result%
// If the character is an upper-case letter, append an "L" to %Result%
// If the character is a digit, append a "D" to %Result%
// If the character is anything else, append "X"
// Search %Result% for this 7-digit string: "LNLXNLN"
// Its starting position within the address is the postal code.
Variable Set String %Address% to "68 Bob Dr. Twr. 2, Mississauga, ON L5N 7Y5" // Leave only one of the following line uncommented
Variable Set String %Address% to "M4K 1Z1 (Main building)"
Variable Set String %Address% to "123 rue Charlevoix, Sherbrooke, H5N 7Y5 Quebec (PQ)"
Variable Set String %Address% to "123 Moncton, Halifax, NB (B1N 4A5)"
 
Variable Set Integer %Chars% to the length of variable %Address%
Variable Set String %Result% to ""
Variable Set Integer %Count% to 1
 
Repeat with Variable: Repeat %Chars% times
  Variable Modify String: Copy part of text in %Address% starting at %Count% and 1 characters long to %Item%
   
  If Variable %Item% Is Greater Than or Equal To "A"
    AND
  If Variable %Item% Is Less Than or Equal To "Z"
    Variable Modify String %Result%: Append Text (A)
  Else
    If Variable %Item% Is Greater Than or Equal To "0"
      AND
    If Variable %Item% Is Less Than or Equal To "9"
      Variable Modify String %Result%: Append Text (D)
    Else
      Variable Modify String %Result%: Append Text (X)
    End If
  End If
  Variable Modify Integer %Count%: Increment
End Repeat
 
Variable Set Integer %PostalCodeStart% to the position of "ADAXDAD" in %Result%
Variable Modify String: Copy part of text in %Address% starting at %PostalCodeStart% and 7 characters long to %PostalCode%
 
Text Box Display: %PostalCode%
<COMMENT Value="Extract a Canadian postal code from an address."/>
<COMMENT Value="For example: \"123 Main, M4Z 1V4, Canada\" --> \"M4Z 2V4\""/>
<COMMENT/>
<COMMENT Value="Assume postal code is always in this format: \"LNL NLN\""/>
<COMMENT Value="It's 7 characters long with a space in position 4. \"L\" is a letter and \"N\" is a digit."/>
<COMMENT/>
<COMMENT Value="How this works:"/>
<COMMENT Value="Parse the address one character at a time to build a variable %Result%"/>
<COMMENT Value="If the character is an upper-case letter, append an \"L\" to %Result%"/>
<COMMENT Value="If the character is a digit, append a \"D\" to %Result%"/>
<COMMENT Value="If the character is anything else, append \"X\""/>
<COMMENT Value="Search %Result% for this 7-digit string: \"LNLXNLN\""/>
<COMMENT Value="Its starting position within the address is the postal code."/>
<VARIABLE SET STRING Option="\x00" Destination="%Address%" Value="6880 Financial Dr. Twr. 2- 6th floor , Mississauga, ON L5N 7Y5" NoEmbeddedVars="FALSE" _ENABLED="FALSE" _COMMENT="Leave only one of the following line uncommented"/>
<VARIABLE SET STRING Option="\x00" Destination="%Address%" Value="M4K 1V1 (Main building)" NoEmbeddedVars="FALSE" _ENABLED="FALSE"/>
<VARIABLE SET STRING Option="\x00" Destination="%Address%" Value="123 rue Charlevoix, Sherbrooke, H5N 7Y5 Quebec (PQ)" NoEmbeddedVars="FALSE"/>
<VARIABLE SET STRING Option="\x00" Destination="%Address%" Value="123 Moncton, Halifax, NB (B1N 4A5)" NoEmbeddedVars="FALSE" _ENABLED="FALSE"/>
<COMMENT/>
<VARIABLE SET INTEGER Option="\x0D" Destination="%Chars%" Text_Variable="%Address%"/>
<VARIABLE SET STRING Option="\x00" Destination="%Result%" NoEmbeddedVars="FALSE"/>
<VARIABLE SET INTEGER Option="\x00" Destination="%Count%" Value="1"/>
<COMMENT/>
<REPEAT WITH VARIABLE Variable="%Chars%" UseCounter="FALSE"/>
<VARIABLE MODIFY STRING Option="\x09" Destination="%Item%" Variable="%Address%" Start="%Count%" Count="1" NoEmbeddedVars="FALSE"/>
<COMMENT/>
<IF VARIABLE Variable="%Item%" Condition="\x04" Value="A" IgnoreCase="FALSE"/>
<AND/>
<IF VARIABLE Variable="%Item%" Condition="\x05" Value="Z" IgnoreCase="FALSE"/>
<VARIABLE MODIFY STRING Option="\x06" Destination="%Result%" Value="A" NoEmbeddedVars="FALSE"/>
<ELSE/>
<IF VARIABLE Variable="%Item%" Condition="\x04" Value="0" IgnoreCase="FALSE"/>
<AND/>
<IF VARIABLE Variable="%Item%" Condition="\x05" Value="9" IgnoreCase="FALSE"/>
<VARIABLE MODIFY STRING Option="\x06" Destination="%Result%" Value="D" NoEmbeddedVars="FALSE"/>
<ELSE/>
<VARIABLE MODIFY STRING Option="\x06" Destination="%Result%" Value="X" NoEmbeddedVars="FALSE"/>
<END IF/>
<END IF/>
<VARIABLE MODIFY INTEGER Option="\x07" Destination="%Count%"/>
<END REPEAT/>
<COMMENT/>
<VARIABLE SET INTEGER Option="\x0E" Destination="%PostalCodeStart%" Text_Variable="%Result%" Text="ADAXDAD" Ignore_Case="FALSE"/>
<VARIABLE MODIFY STRING Option="\x09" Destination="%PostalCode%" Variable="%Address%" Start="%PostalCodeStart%" Count="7" NoEmbeddedVars="FALSE"/>
<COMMENT/>
<TEXT BOX DISPLAY Title="%PostalCode%" Content="{\\rtf1\\ansi\\ansicpg1252\\deff0\\deflang1033{\\fonttbl{\\f0\\fnil Tahoma;}}\r\n\\viewkind4\\uc1\\pard\\f0\\fs14 \r\n\\par }\r\n" Left="Center" Top="Center" Width="278" Height="200" Monitor="0" OnTop="TRUE" Keep_Focus="TRUE" Mode="\x00" Delay="0"/>
	

 

Share this post


Link to post
Share on other sites
5 hours ago, joe said:

I'm just guessing but more than likely you utilized the Split String command to create an array to work through.

 

I attempted it, but unfortunately I couldn't get it to work. Did anyone try to crack the puzzle via Split String?

Share this post


Link to post
Share on other sites

As for my intended approach, getting the 7-character groups into an array proved easy enough. But my assumption that the subsequent manipulation of these would be easy was over optimistic!

 

Your solution, Alan, incorporating @rberg’s substitution logic looks great to me, well done, and thanks for an interesting thread.

(Trivial point: it seems you switched to using A instead of L for alphabetic characters,  so your introductory comment needs editing.)

 

But wish I had learnt VBS 😉

Share this post


Link to post
Share on other sites

acantor -

 

Here is a generic explanation for the regex (regular expression) pattern.  What regex engines do is to find portions of strings based on a pattern.  It processes the string passed to it one character at a time moving from left to right.  In this case it returns only the first thing found and is case-insensitive.

 

\b[ABCEGHJKLMNPRSTVXY][0-9][A-Z] [0-9][A-Z][0-9]\b

Assert position at a word boundary (preceded or followed—but not both—by an ASCII letter, digit, or underscore) «\b»
Match a single character from the list "ABCEGHJKLMNPRSTVXY" «[ABCEGHJKLMNPRSTVXY]»
Match a single character in the range between "0" and "9" «[0-9]»
Match a single character in the range between "A" and "Z" «[A-Z]»
Match the space character " " literally « »
Match a single character in the range between "0" and "9" «[0-9]»
Match a single character in the range between "A" and "Z" «[A-Z]»
Match a single character in the range between "0" and "9" «[0-9]»

Assert position at a word boundary (preceded or followed—but not both—by an ASCII letter, digit, or underscore) «\b»

 

rberq did something similar to this by logically changing the characters in the string to be known characters which gave him the ability to find a pattern.

 

Most, if not all, programming and scripting languages provide a regex engine.  Macro Express does not, even though it could, because Delphi itself has one (TRegEx).  But in my opinion, the implementation of it would be beyond the general meaning and purpose of Macro Express.  That being said, Macro Express provides both the Program Launch and External Script commands which allow us to use features of other programming languages and return results back to Macro Express.

Share this post


Link to post
Share on other sites

Joe, if you are not a Canadian, you deserve to be declared an honourary citizen for recognizing that the first character of Canadian postal codes is a subset of the letters from A to Z:

 

A, B, C, E, G, H, J, K, L, M, N, P, R, S, T, V, X, and Y

 

This is easy to detect with Regedit, and more challenging via the substitution method.

 

As I've tested with real-life address data, I've had to make adaptations to my script. For example, I've seen these "errors" in my data:

 

1. The postal code is in lowercase instead of uppercase: "b2c d3e"
2. The space between the two groups is missing:"B2CD3E"
3. There are two or more spaces between the two groups: "B2C  D3E"
4. There is a tab character between the two groups.
5. There is a non-breaking space between the two groups, ASCII(160)
6. There is a hyphen between the two groups: "B2C-D3E"
7. The address contains % characters, which causes mischief for Macro Express. (Perhaps because MEP treats percentage signs as delimiters in variable names?)

 

None of these problems are insurmountable using the substitution method, but handling irregular postal codes adds complexity to the script. Because most these errors are rare, I'm only acting on the most easily handled issues.

 

Share this post


Link to post
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...

×
×
  • Create New...