Jump to content
Macro Express Forums

Challenge: Script to extract a postal code from an address


Recommended Posts

A fun Macro Express project: I needed a way to extract Canadian postal codes from the full address in which they appear.

 

Canadian postal codes consist of six alternating letters and digits, plus a space in the fourth position. (L = Letter, D = Digit):


LDL DLD

 

Examples:

 

M4A 1V1

K7L 2D3

V6B 3A2

 

The postal code might appear anywhere in the address, which is always a single line:

 

"25 Clark Street, 1st Floor, Toronto, ON, M8C 1X9, Canada"

"111 rue de Chaplain, Montreal PQ H2P 7W9"

"J2E 6G2 (main entrance)"

 

After many happy hours of tinkering, I came up with a solution. But I am curious how others might approach this problem using ONLY Macro Express.

 

If you are interested in this problem, start with the following line and see how far you get!

 

Variable Set String %Location% to "1234 Chaplin Dr. Twr. 1- 6th floor , Mississauga, ON L4Q 5Z3 Canada"

 

If anyone takes up the challenge and posts a possible solution, I'll respond with my solution.

Link to comment
Share on other sites

Hi Alan,

 

I'm sorely tempted, but that would mean raising the task several places up my To Do list!

 

Another excuse is that because MXPro sadly lacks Regular Expression features I would always tackle this sort of requirement with assistance from another application, usually my text editor, TextPad. Your challenge is akin to asking me to attach a screw without using a screwdriver.

 

And yet another reason is that I expect you already have a brilliant solution 😉

 

But if curiosity does get the better of me, then I think I might start as follows:

 

// @acantor's challenge to extract a single post code from a line of text.
// No external applications are allowed.
// Assumption: a reasonable max length of a single address line like the examples provided is 100 characters.
Variable Set String %tResult% to "" // Just in case.
Variable Set String %tInput%: Prompt // (Disabled for testing)
Variable Set String %tInput% to "1234 Chaplin Dr. Twr. 1- 6th floor , Mississauga, ON L4Q 5Z3 Canada"
Variable Set Integer %nLength% to the length of variable %tInput%
// A line of that length contains (nLength-6) groups of 7 characters.
// So my first approach will be to test every group until I find one with the post code format LDL DLD.
// The parsing of that should then be straightforward.

 

 

Terry

Link to comment
Share on other sites



And yet another reason is that I expect you already have a brilliant solution

 

Thank you for the compliment, Terry! I have a solution, which I will be happy to share, but I wouldn't go so far as to say that it's brilliant! But it does work. I doubt I have found the most elegant solution, so I'm curious to know how other experienced Macro Express users would tackle a problem that has more than one solution.


Your sketched out solution overlaps with mine to an astonishing degree. I hope you'll give it a whirl. It took me many hours to get to where you already are.

 

I enjoy the challenge of working within Macro Express's constraints, which includes no Regular Expressions, and no built-in tests for digits or alphanumeric characters. These would be nice features to have when tackling a problem like this, but they are not available. So I had to find other ways. Due to the pandemic, my work has slowed down, so I'm using some of my extra time to dig deeper into Macro Express.

Link to comment
Share on other sites

Hi Terry,

 

I'll add (as a hint!) that my approach has much in common with the one your proposed, but is also different.

 

I'm unsure which approach will yield the most satisfactory (and satisfying) result. Probably either approach is fine.

 

And there are probably other approaches. I tried (and subsequently abandoned) at least two methods before I found one that I could make work.

Link to comment
Share on other sites

I'm just guessing but more than likely you utilized the Split String command to create an array to work through.  As fun as that sounds, I used the External Script command instead.  Not what you are looking for but it is indeed a Macro Express command.

 

<VARIABLE SET STRING Option="\x00" Destination="%location%" Value="1234 Chaplin Dr. Twr. 1- 6th floor , Mississauga, ON L4Q 5Z3 Canada" NoEmbeddedVars="FALSE"/>
<VARIABLE SET STRING Option="\x00" Destination="%pattern%" Value="\\b[ABCEGHJKLMNPRSTVXY][0-9][A-Z] [0-9][A-Z][0-9]\\b" NoEmbeddedVars="FALSE"/>
<EXTERNAL SCRIPT Language="VBScript" Dest="%console%" Script="Dim regEx, matches\r\nSet regEx = New RegExp\r\nregEx.Pattern = \"%pattern%\"\r\nregEx.Global = False\r\nSet matches = regEx.Execute( \"%location%\" )\r\nFor Each match In matches\r\n   WScript.StdOut.Write match.Value\r\nNext" Encoding="0"/>
<MESSAGEBOX Caption="Result" Message="%console%" Icon="0"/>

 

Also, the regular expression pattern I placed in the code is good, but not perfect ... c'est la vie

 

Link to comment
Share on other sites

This solution builds the LDL DLD pattern by substituting characters:
Any letter becomes @
Any number becomes #
Any space becomes $
Then it searches for the pattern @#@$#@# within the substituted text, and extracts the corresponding positions from the original text.  The solution is a little crude because it doesn't allow for multiple pattern matches within the address -- probably I would add code to take the last match in that case.  

 

Joe's solution using VBSCRIPT is certainly more elegant, but kind of breaks the ME-only rule.😧

 

//  
// Extract Canadian postal code
Variable Set String %T1% "1234 Chaplin Dr. Twr. 1- 6th floor , Mississauga, ON L4Q 5Z3 Canada"
Variable Set String %T1% "25 Clark Street, 1st Floor, Toronto, ON, M8C 1X9, Canada"
Variable Set String %T1% "111 rue de Chaplain, Montreal PQ H2P 7W9"
Variable Set String %T1% "J2E 6G2 (main entrance)"
// Move address to working variable T2, clear destination variable T3
Variable Set String %T3% ""
Variable Modify String: Copy %T1% to %T2%
Variable Modify String: Trim %T2%
// Get rid of any special characters @, #, and $ that may already exist
Replace "@" with "" in %T2%
Replace "#" with "" in %T2%
Replace "$" with "" in %T2%
Text Box Display: Debug special characters removed
// Replace letters with @, numbers with #, blank spaces with $, building result in T3
Variable Set Integer %N1% to 0
Variable Set Integer %N2% from Length of Variable %T2%
Repeat Start (Repeat %N2% times)
  Variable Modify Integer: Inc (%N1%)
  Variable Modify String: Copy Part of %T2% to %T9%
  If Variable %T9% >= "A"
    AND
  If Variable %T9% <= "Z"
    Variable Set String %T9% "@"
  End If
  If Variable %T9% >= "a"
    AND
  If Variable %T9% <= "z"
    Variable Set String %T9% "@"
  End If
  If Variable %T9% >= "0"
    AND
  If Variable %T9% <= "9"
    Variable Set String %T9% "#"
  End If
  If Variable %T9% = " "
    Variable Set String %T9% "$"
  End If
  Variable Modify String: Append %T9% to %T3%
Repeat End
Text Box Display: Debug letters/numbers replaced with special characters
// Postal code LDL DLD will have been replaced by @#@$#@#, so find position of that string within the address in variable T3
Variable Set Integer %N9% from Position of Text in Variable %T3%
// Extract corresponding characters from original [trimmed] address in T2
Variable Modify String: Copy Part of %T2% to %T12%
Text Box Display: Debug original address and extracted postal code
//  
//  
Macro Return
// 

Link to comment
Share on other sites

Thank you everybody for your contributions!

 

Like rberq, I used the substitution method. But I came to the approach relatively late. My initial attempt was similar to what Terry suggested, but I found the coding too complicated. I'm curious whether someone else might have better success with the group of characters approach.

 

rberq, nice touch to delete the three tokens you used before parsing the address. I used "L" (for letters) "D" (for digits) and "X" for anything else. I hope you are OK with me borrowing your idea!

 

Joe, your stretch-the-rules approach is probably as good as it's going to get. I'm going to study your solution. I've not been able to make sense of MEP's external scripting feature, but you've given me a foothold to understanding.

 

Here is my solution:

 

// Extract a Canadian postal code from an address.
// For example: "123 Main, M4Z 1V4, Canada" --> "M4Z 2V4"
 
// Assume postal code is always in this format: "LNL NLN"
// It's 7 characters long with a space in position 4. "L" is a letter and "N" is a digit.
 
// How this works:
// Parse the address one character at a time to build a variable %Result%
// If the character is an upper-case letter, append an "L" to %Result%
// If the character is a digit, append a "D" to %Result%
// If the character is anything else, append "X"
// Search %Result% for this 7-digit string: "LNLXNLN"
// Its starting position within the address is the postal code.
Variable Set String %Address% to "68 Bob Dr. Twr. 2, Mississauga, ON L5N 7Y5" // Leave only one of the following line uncommented
Variable Set String %Address% to "M4K 1Z1 (Main building)"
Variable Set String %Address% to "123 rue Charlevoix, Sherbrooke, H5N 7Y5 Quebec (PQ)"
Variable Set String %Address% to "123 Moncton, Halifax, NB (B1N 4A5)"
 
Variable Set Integer %Chars% to the length of variable %Address%
Variable Set String %Result% to ""
Variable Set Integer %Count% to 1
 
Repeat with Variable: Repeat %Chars% times
  Variable Modify String: Copy part of text in %Address% starting at %Count% and 1 characters long to %Item%
   
  If Variable %Item% Is Greater Than or Equal To "A"
    AND
  If Variable %Item% Is Less Than or Equal To "Z"
    Variable Modify String %Result%: Append Text (A)
  Else
    If Variable %Item% Is Greater Than or Equal To "0"
      AND
    If Variable %Item% Is Less Than or Equal To "9"
      Variable Modify String %Result%: Append Text (D)
    Else
      Variable Modify String %Result%: Append Text (X)
    End If
  End If
  Variable Modify Integer %Count%: Increment
End Repeat
 
Variable Set Integer %PostalCodeStart% to the position of "ADAXDAD" in %Result%
Variable Modify String: Copy part of text in %Address% starting at %PostalCodeStart% and 7 characters long to %PostalCode%
 
Text Box Display: %PostalCode%
<COMMENT Value="Extract a Canadian postal code from an address."/>
<COMMENT Value="For example: \"123 Main, M4Z 1V4, Canada\" --> \"M4Z 2V4\""/>
<COMMENT/>
<COMMENT Value="Assume postal code is always in this format: \"LNL NLN\""/>
<COMMENT Value="It's 7 characters long with a space in position 4. \"L\" is a letter and \"N\" is a digit."/>
<COMMENT/>
<COMMENT Value="How this works:"/>
<COMMENT Value="Parse the address one character at a time to build a variable %Result%"/>
<COMMENT Value="If the character is an upper-case letter, append an \"L\" to %Result%"/>
<COMMENT Value="If the character is a digit, append a \"D\" to %Result%"/>
<COMMENT Value="If the character is anything else, append \"X\""/>
<COMMENT Value="Search %Result% for this 7-digit string: \"LNLXNLN\""/>
<COMMENT Value="Its starting position within the address is the postal code."/>
<VARIABLE SET STRING Option="\x00" Destination="%Address%" Value="6880 Financial Dr. Twr. 2- 6th floor , Mississauga, ON L5N 7Y5" NoEmbeddedVars="FALSE" _ENABLED="FALSE" _COMMENT="Leave only one of the following line uncommented"/>
<VARIABLE SET STRING Option="\x00" Destination="%Address%" Value="M4K 1V1 (Main building)" NoEmbeddedVars="FALSE" _ENABLED="FALSE"/>
<VARIABLE SET STRING Option="\x00" Destination="%Address%" Value="123 rue Charlevoix, Sherbrooke, H5N 7Y5 Quebec (PQ)" NoEmbeddedVars="FALSE"/>
<VARIABLE SET STRING Option="\x00" Destination="%Address%" Value="123 Moncton, Halifax, NB (B1N 4A5)" NoEmbeddedVars="FALSE" _ENABLED="FALSE"/>
<COMMENT/>
<VARIABLE SET INTEGER Option="\x0D" Destination="%Chars%" Text_Variable="%Address%"/>
<VARIABLE SET STRING Option="\x00" Destination="%Result%" NoEmbeddedVars="FALSE"/>
<VARIABLE SET INTEGER Option="\x00" Destination="%Count%" Value="1"/>
<COMMENT/>
<REPEAT WITH VARIABLE Variable="%Chars%" UseCounter="FALSE"/>
<VARIABLE MODIFY STRING Option="\x09" Destination="%Item%" Variable="%Address%" Start="%Count%" Count="1" NoEmbeddedVars="FALSE"/>
<COMMENT/>
<IF VARIABLE Variable="%Item%" Condition="\x04" Value="A" IgnoreCase="FALSE"/>
<AND/>
<IF VARIABLE Variable="%Item%" Condition="\x05" Value="Z" IgnoreCase="FALSE"/>
<VARIABLE MODIFY STRING Option="\x06" Destination="%Result%" Value="A" NoEmbeddedVars="FALSE"/>
<ELSE/>
<IF VARIABLE Variable="%Item%" Condition="\x04" Value="0" IgnoreCase="FALSE"/>
<AND/>
<IF VARIABLE Variable="%Item%" Condition="\x05" Value="9" IgnoreCase="FALSE"/>
<VARIABLE MODIFY STRING Option="\x06" Destination="%Result%" Value="D" NoEmbeddedVars="FALSE"/>
<ELSE/>
<VARIABLE MODIFY STRING Option="\x06" Destination="%Result%" Value="X" NoEmbeddedVars="FALSE"/>
<END IF/>
<END IF/>
<VARIABLE MODIFY INTEGER Option="\x07" Destination="%Count%"/>
<END REPEAT/>
<COMMENT/>
<VARIABLE SET INTEGER Option="\x0E" Destination="%PostalCodeStart%" Text_Variable="%Result%" Text="ADAXDAD" Ignore_Case="FALSE"/>
<VARIABLE MODIFY STRING Option="\x09" Destination="%PostalCode%" Variable="%Address%" Start="%PostalCodeStart%" Count="7" NoEmbeddedVars="FALSE"/>
<COMMENT/>
<TEXT BOX DISPLAY Title="%PostalCode%" Content="{\\rtf1\\ansi\\ansicpg1252\\deff0\\deflang1033{\\fonttbl{\\f0\\fnil Tahoma;}}\r\n\\viewkind4\\uc1\\pard\\f0\\fs14 \r\n\\par }\r\n" Left="Center" Top="Center" Width="278" Height="200" Monitor="0" OnTop="TRUE" Keep_Focus="TRUE" Mode="\x00" Delay="0"/>
	

 

Link to comment
Share on other sites

5 hours ago, joe said:

I'm just guessing but more than likely you utilized the Split String command to create an array to work through.

 

I attempted it, but unfortunately I couldn't get it to work. Did anyone try to crack the puzzle via Split String?

Link to comment
Share on other sites

As for my intended approach, getting the 7-character groups into an array proved easy enough. But my assumption that the subsequent manipulation of these would be easy was over optimistic!

 

Your solution, Alan, incorporating @rberg’s substitution logic looks great to me, well done, and thanks for an interesting thread.

(Trivial point: it seems you switched to using A instead of L for alphabetic characters,  so your introductory comment needs editing.)

 

But wish I had learnt VBS 😉

Link to comment
Share on other sites

acantor -

 

Here is a generic explanation for the regex (regular expression) pattern.  What regex engines do is to find portions of strings based on a pattern.  It processes the string passed to it one character at a time moving from left to right.  In this case it returns only the first thing found and is case-insensitive.

 

\b[ABCEGHJKLMNPRSTVXY][0-9][A-Z] [0-9][A-Z][0-9]\b

Assert position at a word boundary (preceded or followed—but not both—by an ASCII letter, digit, or underscore) «\b»
Match a single character from the list "ABCEGHJKLMNPRSTVXY" «[ABCEGHJKLMNPRSTVXY]»
Match a single character in the range between "0" and "9" «[0-9]»
Match a single character in the range between "A" and "Z" «[A-Z]»
Match the space character " " literally « »
Match a single character in the range between "0" and "9" «[0-9]»
Match a single character in the range between "A" and "Z" «[A-Z]»
Match a single character in the range between "0" and "9" «[0-9]»

Assert position at a word boundary (preceded or followed—but not both—by an ASCII letter, digit, or underscore) «\b»

 

rberq did something similar to this by logically changing the characters in the string to be known characters which gave him the ability to find a pattern.

 

Most, if not all, programming and scripting languages provide a regex engine.  Macro Express does not, even though it could, because Delphi itself has one (TRegEx).  But in my opinion, the implementation of it would be beyond the general meaning and purpose of Macro Express.  That being said, Macro Express provides both the Program Launch and External Script commands which allow us to use features of other programming languages and return results back to Macro Express.

Link to comment
Share on other sites

Joe, if you are not a Canadian, you deserve to be declared an honourary citizen for recognizing that the first character of Canadian postal codes is a subset of the letters from A to Z:

 

A, B, C, E, G, H, J, K, L, M, N, P, R, S, T, V, X, and Y

 

This is easy to detect with Regedit, and more challenging via the substitution method.

 

As I've tested with real-life address data, I've had to make adaptations to my script. For example, I've seen these "errors" in my data:

 

1. The postal code is in lowercase instead of uppercase: "b2c d3e"
2. The space between the two groups is missing:"B2CD3E"
3. There are two or more spaces between the two groups: "B2C  D3E"
4. There is a tab character between the two groups.
5. There is a non-breaking space between the two groups, ASCII(160)
6. There is a hyphen between the two groups: "B2C-D3E"
7. The address contains % characters, which causes mischief for Macro Express. (Perhaps because MEP treats percentage signs as delimiters in variable names?)

 

None of these problems are insurmountable using the substitution method, but handling irregular postal codes adds complexity to the script. Because most these errors are rare, I'm only acting on the most easily handled issues.

 

Link to comment
Share on other sites

Well, I couldn't let this one go!


While using the macro to extract postal codes from addresses, I've seen more and more failures when the script encounters "real-world" data. For example:
 

  1. The postal code is missing.
  2. The postal code is in lowercase instead of uppercase: "b2c d3e"
  3. The space between the two groups is missing: "B2CD3E"
  4. There are two (or more) spaces between the two groups: "B2C    D3E"
  5. There is a tab character between the two groups.
  6. There is a combination of tab characters and spaces between the two groups.
  7. There is a non-breaking space between the two groups: ASCII(160)
  8. There is a hyphen between the two groups: "B2C-D3E"
  9. The address contains % characters, which is a reserved character in Macro Express.
  10. The letter "O" is used as a zero. (Presumably the opposite happens as well.)

 

I rewrote my script to deal with all of these issues -- except the last one!

 

The new logic:

  1. Uppercase the address.
  2. Strip out everything that is not a letter or a digit. So there will be no spaces, tabs, punctuation, etc.
  3. Change each letter to @ and each digit to #. (Thanks to rberq!)
  4. Search for the postal code pattern "@#@#@" and extract the postal code (six characters long and lacking its space).
  5. Insert a space in the middle position.
// Extract a Canadian postal code from anywhere in the address.
// For example: "123 Main, M4Z 1V4, Canada" --> "M4Z 2V4"
 
// Canadian postal codes are supposed to be in this format: "LNL NLN"
// ("L" is a letter and "N" is a digit)
// 7 characters long, with a space in position 4. 
// But sometimes there are formatting mistakes in the postal code. For example:
//     The code includes lowercase letters; or instead of one space, there is
//     a hyphen. a tab character, a non-breaking space, or several spaces.
// Sometimes the postal code is omitted from the address
// This script tries to anticipate these issues.
 
// How it works:
// 1. Uppercase the address --> %x%
// 2. Parse the address so it consists only of the characters A-Z and 0-9 --> %y%
// 3. Replace each letter with "@" and each digit with "#" --> %z%
// 4. Search %z% for "@#@#@#" and extract the postal code from %y%
// 5. Add a space to position 4 of the Postal Code
 
Variable Set String %Address% to "M4K 1V1 (Main building)" // Test addresses. Uncomment one at a time
Variable Set String %Address% to "6880 Bottle Dr. Twr. 2- 11th floor , Mississauga, ON L5N 7V4"
Variable Set String %Address% to "123 rue Charlevoix, Sherbrooke, H5N- 7Y5 Quebec (PQ)"
Variable Set String %Address% to "123 Moncton, Halifax, NB (B1N   4A5)"
Variable Set String %Address% to "555 George Street, Unit A2B 2nd floor, Winnipeg MB"
 
// 1. Uppercase the address (%x%)
Variable Set String %x% to "%Address%"
Variable Modify String %x%: Uppercase
 
// 2. Parse %x% to build %y% consisting only of A to Z and 0 to 9.
Variable Set Integer %Chars% to the length of variable %x%
Variable Set String %y% to ""
 
Repeat Start (Repeat %Chars% times)
  Variable Modify String: Copy part of text in %x% starting at %Count% and 1 characters long to %Item%
  If Variable %Item% Is Greater Than or Equal To "0"
    AND
  If Variable %Item% Is Less Than or Equal To "9"
    OR
  If Variable %Item% Is Greater Than or Equal To "A"
    AND
  If Variable %Item% Is Less Than or Equal To "Z"
    Variable Modify String %y%: Append Text (%Item%)
  End If
End Repeat
 
// 3. Parse %y% to build %z% consisting of "@" for each letter and "#" for each digit.
Variable Set Integer %Chars% to the length of variable %y%
Variable Set String %z% to ""
 
Repeat Start (Repeat %Chars% times)
  Variable Modify String: Copy part of text in %y% starting at %Count% and 1 characters long to %Item%
  If Variable %Item% Is Greater Than or Equal To "A"
    AND
  If Variable %Item% Is Less Than or Equal To "Z"
    Variable Modify String %z%: Append Text (@)
  End If
  If Variable %Item% Is Greater Than or Equal To "0"
    AND
  If Variable %Item% Is Less Than or Equal To "9"
    Variable Modify String %z%: Append Text (#)
  End If
End Repeat
 
// 4. Search %z% for "@#@#@#" and extract the postal code from %y%
If Variable %z% Contains "@#@#@#"
  Variable Set Integer %PostalCodePosition% to the position of "@#@#@#" in %z%
  Variable Modify String: Copy part of text in %y% starting at %PostalCodePosition% and 6 characters long to %PostalCodeNoSpace%
Else
  Variable Set String %PostalCodeNoSpace% to "X0X0X0"
End If
 
// 5. Add a space to position 4 of the Postal Code
Variable Modify String: Copy part of text in %PostalCodeNoSpace% starting at 1 and 3 characters long to %PostalCodeGroup1%
Variable Modify String: Copy part of text in %PostalCodeNoSpace% starting at 4 and 6 characters long to %PostalCodeGroup2%
Variable Set String %PostalCode% to "%PostalCodeGroup1% %PostalCodeGroup2%"
Text Box Display: Postal code result
<COMMENT Value="Extract a Canadian postal code from anywhere in the address."/>
<COMMENT Value="For example: \"123 Main, M4Z 1V4, Canada\" --> \"M4Z 2V4\""/>
<COMMENT/>
<COMMENT Value="Canadian postal codes are supposed to be in this format: \"LNL NLN\""/>
<COMMENT Value="(\"L\" is a letter and \"N\" is a digit)"/>
<COMMENT Value="7 characters long, with a space in position 4. "/>
<COMMENT Value="But sometimes there are formatting mistakes in the postal code. For example:"/>
<COMMENT Value="    The code includes lowercase letters; or instead of one space, there is"/>
<COMMENT Value="    a hyphen. a tab character, a non-breaking space, or several spaces."/>
<COMMENT Value="Sometimes the postal code is omitted from the address"/>
<COMMENT Value="This script tries to anticipate these issues."/>
<COMMENT/>
<COMMENT Value="How it works:"/>
<COMMENT Value="1. Uppercase the address --> %x%"/>
<COMMENT Value="2. Parse the address so it consists only of the characters A-Z and 0-9 --> %y%"/>
<COMMENT Value="3. Replace each letter with \"@\" and each digit with \"#\" --> %z%"/>
<COMMENT Value="4. Search %z% for \"@#@#@#\" and extract the postal code from %y%"/>
<COMMENT Value="5. Add a space to position 4 of the Postal Code"/>
<COMMENT/>
<VARIABLE SET STRING Option="\x00" Destination="%Address%" Value="M4K 1V1 (Main building)" NoEmbeddedVars="FALSE" _ENABLED="FALSE" _COMMENT="Test addresses. Uncomment one at a time"/>
<VARIABLE SET STRING Option="\x00" Destination="%Address%" Value="6880 Bottle Dr. Twr. 2- 11th floor , Mississauga, ON L5N 7V4" NoEmbeddedVars="FALSE"/>
<VARIABLE SET STRING Option="\x00" Destination="%Address%" Value="123 rue Charlevoix, Sherbrooke, H5N- 7Y5 Quebec (PQ)" NoEmbeddedVars="FALSE" _ENABLED="FALSE"/>
<VARIABLE SET STRING Option="\x00" Destination="%Address%" Value="123 Moncton, Halifax, NB (B1N   4A5)" NoEmbeddedVars="FALSE" _ENABLED="FALSE"/>
<VARIABLE SET STRING Option="\x00" Destination="%Address%" Value="555 George Street, Unit A2B 2nd floor, Winnipeg MB" NoEmbeddedVars="FALSE" _ENABLED="FALSE"/>
<COMMENT/>
<COMMENT Value="1. Uppercase the address (%x%)"/>
<VARIABLE SET STRING Option="\x00" Destination="%x%" Value="%Address%" NoEmbeddedVars="FALSE"/>
<VARIABLE MODIFY STRING Option="\x0B" Destination="%x%"/>
<COMMENT/>
<COMMENT Value="2. Parse %x% to build %y% consisting only of A to Z and 0 to 9."/>
<VARIABLE SET INTEGER Option="\x0D" Destination="%Chars%" Text_Variable="%x%"/>
<VARIABLE SET STRING Option="\x00" Destination="%y%" NoEmbeddedVars="FALSE"/>
<COMMENT/>
<REPEAT START Start="1" Step="1" Count="%Chars%" Save="TRUE" Variable="%Count%"/>
<VARIABLE MODIFY STRING Option="\x09" Destination="%Item%" Variable="%x%" Start="%Count%" Count="1" NoEmbeddedVars="FALSE"/>
<IF VARIABLE Variable="%Item%" Condition="\x04" Value="0" IgnoreCase="FALSE"/>
<AND/>
<IF VARIABLE Variable="%Item%" Condition="\x05" Value="9" IgnoreCase="FALSE"/>
<OR/>
<IF VARIABLE Variable="%Item%" Condition="\x04" Value="A" IgnoreCase="FALSE"/>
<AND/>
<IF VARIABLE Variable="%Item%" Condition="\x05" Value="Z" IgnoreCase="FALSE"/>
<VARIABLE MODIFY STRING Option="\x06" Destination="%y%" Value="%Item%" NoEmbeddedVars="FALSE"/>
<END IF/>
<END REPEAT/>
<COMMENT/>
<COMMENT Value="3. Parse %y% to build %z% consisting of \"@\" for each letter and \"#\" for each digit."/>
<VARIABLE SET INTEGER Option="\x0D" Destination="%Chars%" Text_Variable="%y%"/>
<VARIABLE SET STRING Option="\x00" Destination="%z%" NoEmbeddedVars="FALSE"/>
<COMMENT/>
<REPEAT START Start="1" Step="1" Count="%Chars%" Save="TRUE" Variable="%Count%"/>
<VARIABLE MODIFY STRING Option="\x09" Destination="%Item%" Variable="%y%" Start="%Count%" Count="1" NoEmbeddedVars="FALSE"/>
<IF VARIABLE Variable="%Item%" Condition="\x04" Value="A" IgnoreCase="FALSE"/>
<AND/>
<IF VARIABLE Variable="%Item%" Condition="\x05" Value="Z" IgnoreCase="FALSE"/>
<VARIABLE MODIFY STRING Option="\x06" Destination="%z%" Value="@" NoEmbeddedVars="FALSE"/>
<END IF/>
<IF VARIABLE Variable="%Item%" Condition="\x04" Value="0" IgnoreCase="FALSE"/>
<AND/>
<IF VARIABLE Variable="%Item%" Condition="\x05" Value="9" IgnoreCase="FALSE"/>
<VARIABLE MODIFY STRING Option="\x06" Destination="%z%" Value="#" NoEmbeddedVars="FALSE"/>
<END IF/>
<END REPEAT/>
<COMMENT/>
<COMMENT Value="4. Search %z% for \"@#@#@#\" and extract the postal code from %y%"/>
<IF VARIABLE Variable="%z%" Condition="\x06" Value="@#@#@#" IgnoreCase="FALSE"/>
<VARIABLE SET INTEGER Option="\x0E" Destination="%PostalCodePosition%" Text_Variable="%z%" Text="@#@#@#" Ignore_Case="FALSE"/>
<VARIABLE MODIFY STRING Option="\x09" Destination="%PostalCodeNoSpace%" Variable="%y%" Start="%PostalCodePosition%" Count="6" NoEmbeddedVars="FALSE"/>
<ELSE/>
<VARIABLE SET STRING Option="\x00" Destination="%PostalCodeNoSpace%" Value="X0X0X0" NoEmbeddedVars="FALSE"/>
<END IF/>
<COMMENT/>
<COMMENT Value="5. Add a space to position 4 of the Postal Code"/>
<VARIABLE MODIFY STRING Option="\x09" Destination="%PostalCodeGroup1%" Variable="%PostalCodeNoSpace%" Start="1" Count="3" NoEmbeddedVars="FALSE"/>
<VARIABLE MODIFY STRING Option="\x09" Destination="%PostalCodeGroup2%" Variable="%PostalCodeNoSpace%" Start="4" Count="6" NoEmbeddedVars="FALSE"/>
<VARIABLE SET STRING Option="\x00" Destination="%PostalCode%" Value="%PostalCodeGroup1% %PostalCodeGroup2%" NoEmbeddedVars="FALSE"/>
<TEXT BOX DISPLAY Title="Postal code result" Content="{\\rtf1\\ansi\\ansicpg1252\\deff0\\deflang1033{\\fonttbl{\\f0\\fnil\\fcharset0 Tahoma;}{\\f1\\fnil Tahoma;}}\r\n\\viewkind4\\uc1\\pard\\lang4105\\f0\\fs24 \"%PostalCode%\"\\lang1033\\f1\\fs14 \r\n\\par }\r\n" Left="821" Top="417" Width="1085" Height="446" Monitor="0" OnTop="TRUE" Keep_Focus="TRUE" Mode="\x00" Delay="0"/>

 

Link to comment
Share on other sites

10 minutes ago, acantor said:

The letter "O" is used as a zero. (Presumably the opposite happens as well.)

.

I haven't thought this through all the way, but since the letter 'O' never occurs within a postal code, perhaps after upper-casing the address you could change every "O" to zero, then proceed as usual.  The danger would be possibly generating false positives elsewhere in the address, but as I suggested above, if there are two or more patterns that seem to be postal codes, most likely you would be safe accepting the last one. 

 

I'm surprised you haven't also had the problem of people using a lower case "L" instead of the digit "1", or vice versa.  "L" is a legitimate code, so it wouldn't be amenable to the above "O" technique.  

Link to comment
Share on other sites

Quote


I'm surprised you haven't also had the problem of people using a lower case "L" instead of the digit "1", or vice versa.  "L" is a legitimate code, so it wouldn't be amenable to the above "O" technique.  

 

I wouldn't be surprised to discover "lowercase L" instead of  "1" in a very large data set.

 

This raises a question: how far is far enough when trying to "bullet-proof" a script. I can imagine that in time, there would be confusions between 3 and E, 5 and S, 6 and G, and maybe others.

 

I recognize that the technique I am using could potentially pluck postal codes out of an address where there is none, although probably very rarely. My hunch is that attempts to anticipate and correct problems (such as the above examples) will lead to false-positives more often than when an address includes a string of characters that isn't a postal code, but matches the postal code pattern.

 

Terry, number zero is used in Canadian postal codes. Trivial factoid: a zero in the second place means a rural address. But there is one exception. Canada Post employees (and volunteers) have for years responded to letters sent to this address:

 

Santa Claus

North Pole

HOH OH0

 

The initial "H" means the address is in Montreal, but the zero that follows it means that Montreal must be rural rather than a city with 3.5 million people!

Link to comment
Share on other sites

Quote

 


I haven't thought this through all the way, but since the letter 'O' never occurs within a postal code, perhaps after upper-casing the address you could change every "O" to zero, then proceed as usual.  The danger would be possibly generating false positives elsewhere in the address, but as I suggested above, if there are two or more patterns that seem to be postal codes, most likely you would be safe accepting the last one. 
 

 

 

The idea of repairing slightly incorrect postal codes is appealing, and I may take it on!

 

Checking if the postal code pattern appears more than once is worth considering. I think you're right: trying to repair a postal code could lead to a false positive.

 

I have been trying to imagine hypothetical addresses that, stripped of everything except numbers and letters, would result in the "LDLDLD" pattern. The longest example I have come up with is three, maybe four characters. But not six.

 

Apt. #A1, B. Street --> ...A1B... (LDL)

Link to comment
Share on other sites

I'm interested!

 

Based on Joe's posting, I realize RegEx would be a very efficient way to crack this puzzle.

 

On the other hand, it was educational (and fun) developing the original macro, getting feedback from others on this forum, discovering the limits imposed by real world data, and then redesigning the entire macro to deal with some of the limitations.

Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
×
×
  • Create New...