Jump to content
Macro Express Forums

Challenge: a macro to "clean up" email addresses


Recommended Posts

This challenge is arguably more of a mid-week diversion than a real challenge. But I thought others might enjoy this not-too-hard puzzle.

 

I recently updated an email mailing list consisting of about 1000 addresses. The list looked like this...

 

Quote

Midge, John <jmidge@abc.ca>
de Ville, Cruella <cruel@deville.com>
<info@blablabla.gov>
smith@smithsmith.net
Didge, Onn <onn@abc.ca>
Jack <jack@oooo.com>
<contact@blablabla.gov>
ssmith@smithsmith.net

#Bingo!

...

(A total of about 1000)

 

Note that each line appears in one of four formats:

 

Name <email@address>

<email@address>

email@address

not an email address (e.g., #Bingo!)

 

The challenge: use Macro Express to simplify the task of cleaning up the 1000 lines so that each line consists only of an email address:

 

Quote

jmidge@abc.ca
cruel@deville.com
info@blablabla.gov
smith@smithsmith.net
onn@abc.ca
jack@oooo.com
contact@blablabla.gov
ssmith@smithsmith.net


etc.

 

Any line that is not an email address, such as "#Bingo!" or a blank line, should be removed.

Link to comment
Share on other sites

Well, this challenge didn't exactly break the internet!

 

I came up with two solutions. The first solution appears in this post, and the second is in the next.

 

The first solution is what I had in mind when I posted the challenge. It's a "quick and dirty" way to strip out text that doesn't form part of an email address. The reason it was relatively easy to script is that the addresses appeared in only four formats. In two of the formats, the less-than sign always marks the start of the email address, and the greater-than sign always marks its end. So the handling of both was identical.

 

email@address

<email@address>

Name <email@address>

Not an email address

 

The macro only took 10 or 15 minutes to prepare, and saved me 45 minutes or an hour of repetitive drudgery. The macro acts on ten email addresses at a time, so although much quicker than making the repairs by hand, the macro is still fairly manual.

 

// Clean-up 10 email addresses by deleting non-email address text
Repeat Start (Repeat 10 times)
  Clipboard Empty
  Text Type (Simulate Keystrokes): <END><SHIFT><HOME> // Select the line
  Text Type (Simulate Keystrokes): <CONTROL>c
  Variable Set String %Clip% from the clipboard contents
  If Variable %Clip% Does not Contain "@"
    Text Type (Simulate Keystrokes): <BACKSPACE>
  Else
    If Variable %Clip% Contains "<" // Handle email address wrapped with < and >
      Variable Set Integer %LessThanPos% to the position of "<" in %Clip%
      Variable Modify String: Delete part of text from %Clip% starting at 1 and %LessThanPos% characters long
      Variable Modify String: Replace ">" in %Clip% with ""
      Text Type (Simulate Keystrokes): %Clip%
    End If
  End If
  Text Type (Simulate Keystrokes): <ARROW DOWN>
End Repeat

<COMMENT Value="Clean-up 10 email addresses by deleting non-email address text"/>
<REPEAT START Start="1" Step="1" Count="10" Save="FALSE"/>
<CLIPBOARD EMPTY/>
<TEXT TYPE Action="0" Text="<END><SHIFT><HOME>" _COMMENT="Select the line"/>
<TEXT TYPE Action="0" Text="<CONTROL>c"/>
<VARIABLE SET STRING Option="\x02" Destination="%Clip%" NoEmbeddedVars="FALSE"/>
<IF VARIABLE Variable="%Clip%" Condition="\x07" Value="@" IgnoreCase="FALSE"/>
<TEXT TYPE Action="0" Text="<BACKSPACE>"/>
<ELSE/>
<IF VARIABLE Variable="%Clip%" Condition="\x06" Value="<" IgnoreCase="FALSE" _COMMENT="Handle email address wrapped with < and >"/>
<VARIABLE SET INTEGER Option="\x0E" Destination="%LessThanPos%" Text_Variable="%Clip%" Text="<" Ignore_Case="FALSE"/>
<VARIABLE MODIFY STRING Option="\x0A" Destination="%Clip%" Start="1" Count="%LessThanPos%"/>
<VARIABLE MODIFY STRING Option="\x0F" Destination="%Clip%" ToReplace=">" All="FALSE" IgnoreCase="FALSE" NoEmbeddedVars="FALSE"/>
<TEXT TYPE Action="0" Text="%Clip%"/>
<END IF/>
<END IF/>
<TEXT TYPE Action="0" Text="<ARROW DOWN>"/>
<END REPEAT/>

 

 

Link to comment
Share on other sites

As soon as I got my "quick-and-dirty" macro working, I realized I wanted a general solution, i.e., a way to recognize any email address regardless of how it is formatted. In other words, I wanted a macro that would extract any properly-formed email address.

 

I assume these are the rules for a properly-formed email address:

 

left@right

 

It's got an @-sign sandwiched between the left side and the right side.

 

These characters are legal on both sides:

 

A through Z          (both cases)

0 through 9

.    (period)

-    (hyphen)

_   (underscore)

 

However, the last character on the right side cannot be a period, hyphen, or underscore.

 

To make a long story short, after a lot of work, I came up with a solution that does the heavy-lifting entirely in variable-space.  But instead of knocking it out in 15 or 20 minutes, as I did for the original solution, this version took 15 or 20 hours to work out the kinks. And I'm sure there are still ways to improve it.

 

// Macro to "scrape" email addresses from selected text
 
// How to use it:
// 1. Select text that contains email addresses.
// 2. Run this macro.
// 3. The list of email addresses will replace the selected text.
 
// How it works (simplified)
//    0. Copy selected text to variable %Clip% for processing.
//    1. Find leftmost @-sign in %Clip%.
//    2. Parse left from @ until illegal character encountered. Capture legal characters in %LeftReversed%
//    3. Reverse %LeftReversed%. That will be the left side of the email address, %Left%
//    4. Parse right from @ until an illegal character is found. Capture legal characters in %Right%
//    5. %Result% = %Left%@%Right% 
//    6. Store results. Delete the first @-sign in %Clip%.
//    7. Repeat Steps 1 through 6 until all @-signs have been deleted.
 
// Legal characters for email addresses
Variable Set String %LegalChars% to "abcdefghijklmnopqrstuvwxyz1234567890._-"
// Three legal characters cannot appear in the last position: period, hyphen, and underscore
Variable Set String %IllegalLastChars% to "._-"
 
Clipboard Copy
Variable Set String %Clip% from the clipboard contents
Variable Set String %Clip% to "john@johnny.ca, Sandy Smith <sandy@smith.com>; alan@cantoraccess.com." // Test
 
Variable Set String %Clip% to "*%Clip%*" // Add an "illegal" character to the start and end of %Clip%
 
// Loop through %Clip% until all @-signs are processed
Repeat Until %Clip% Does not Contain "@"
 
// Get position of first @-sign
  Variable Set Integer %CharsToAtSign% to the position of "@" in %Clip%
  Variable Set Integer %Count% to %CharsToAtSign%
   
  // Parse by character <-- RIGHT to LEFT, from the @-sign, until an illegal character is encountered
  Repeat Start (Repeat %CharsToAtSign% times)
    Variable Modify Integer %Count%: Decrement
    Variable Modify String: Copy part of text in %Clip% starting at %Count% and 1 characters long to %Char%
    If Variable %LegalChars% Contains "%Char%"
      Variable Modify String %LeftReversed%: Append Text (%Char%)
    Else
      Repeat Exit
    End If
  End Repeat
   
  // Reverse the order of %LeftReversed% to produce %Left%, the left side of the email address
  Variable Set Integer %LeftLength% to the length of variable %LeftReversed%
  Repeat Start (Repeat %LeftLength% times)
    Variable Modify String: Copy part of text in %LeftReversed% starting at %LeftLength% and 1 characters long to %Char%
    Variable Modify String %Left%: Append Text (%Char%)
    Variable Modify Integer: %LeftLength% = %LeftLength% - 1
  End Repeat
   
  // Parse by character --> RIGHT to LEFT, from the @-sign, until an illegal character is encountered
  Variable Modify Integer: %CharsToAtSign% = %CharsToAtSign% + 1
  Variable Set Integer %Pointer% to %CharsToAtSign%
  Variable Set Integer %ClipLength% to the length of variable %Clip%
  Variable Modify Integer: %MaxCharsToCheck% = %ClipLength% - %CharsToAtSign%
   
  Repeat Start (Repeat %MaxCharsToCheck% times)
    Variable Modify String: Copy part of text in %Clip% starting at %Pointer% and 1 characters long to %Char%
    If Variable %LegalChars% Contains "%Char%"
      Variable Modify String %Right%: Append Text (%Char%)
      Variable Modify Integer %Pointer%: Increment
    Else
      Repeat Exit
    End If
  End Repeat
   
  // Handle illegal characters in the last position in %Right% 
  Variable Set Integer %RightLength% to the length of variable %Right%
  Variable Modify String: Copy part of text in %Right% starting at %RightLength% and 1 characters long to %Char%
  If Variable %IllegalLastChars% Contains "%Char%"
    Variable Modify String: Delete part of text from %Right% starting at %RightLength% and 1 characters long
  End If
   
  // Construct email address: %Left%@%Right% and store all addresses in %Results%
  Variable Set String %Result% to "%Left%@%Right%"
  Variable Modify String %Results%: Append Text (%Result%
)
  Text Box Display: Result
   
  // Delete the @-sign identified above from %Clip%
  Variable Modify String: Replace "@" in %Clip% with ""
   
  // Prepare for the next iteration
  Variable Set String %Left% to ""
  Variable Set String %LeftReversed% to ""
  Variable Set String %Right% to ""
   
End Repeat
 
Text Type (Use Clipboard and Paste Text): %Results%

<COMMENT Value="Macro to \"scrape\" email addresses from selected text"/>
<COMMENT/>
<COMMENT Value="How to use it:"/>
<COMMENT Value="1. Select text that contains email addresses."/>
<COMMENT Value="2. Run this macro."/>
<COMMENT Value="3. The list of email addresses will replace the selected text."/>
<COMMENT/>
<COMMENT Value="How it works (simplified)"/>
<COMMENT Value="   0. Copy selected text to variable %Clip% for processing."/>
<COMMENT Value="   1. Find leftmost @-sign in %Clip%."/>
<COMMENT Value="   2. Parse left from @ until illegal character encountered. Capture legal characters in %LeftReversed%"/>
<COMMENT Value="   3. Reverse %LeftReversed%. That will be the left side of the email address, %Left%"/>
<COMMENT Value="   4. Parse right from @ until an illegal character is found. Capture legal characters in %Right%"/>
<COMMENT Value="   5. %Result% = %Left%@%Right% "/>
<COMMENT Value="   6. Store results. Delete the first @-sign in %Clip%."/>
<COMMENT Value="   7. Repeat Steps 1 through 6 until all @-signs have been deleted."/>
<COMMENT/>
<COMMENT Value="Legal characters for email addresses"/>
<VARIABLE SET STRING Option="\x00" Destination="%LegalChars%" Value="abcdefghijklmnopqrstuvwxyz1234567890._-" NoEmbeddedVars="FALSE"/>
<COMMENT Value="Three legal characters cannot appear in the last position: period, hyphen, and underscore"/>
<VARIABLE SET STRING Option="\x00" Destination="%IllegalLastChars%" Value="._-" NoEmbeddedVars="FALSE"/>
<COMMENT/>
<CLIPBOARD COPY/>
<VARIABLE SET STRING Option="\x02" Destination="%Clip%" NoEmbeddedVars="FALSE"/>
<VARIABLE SET STRING Option="\x00" Destination="%Clip%" Value="john@johnny.ca, Sandy Smith <sandy@smith.com>; alan@cantoraccess.com." NoEmbeddedVars="FALSE" _ENABLED="FALSE" _COMMENT="Test"/>
<COMMENT/>
<VARIABLE SET STRING Option="\x00" Destination="%Clip%" Value="*%Clip%*" NoEmbeddedVars="FALSE" _COMMENT="Add an \"illegal\" character to the start and end of %Clip%"/>
<COMMENT/>
<COMMENT Value="Loop through %Clip% until all @-signs are processed"/>
<REPEAT UNTIL Variable="%Clip%" Condition="\x07" Value="@"/>
<COMMENT/>
<COMMENT Value="Get position of first @-sign"/>
<VARIABLE SET INTEGER Option="\x0E" Destination="%CharsToAtSign%" Text_Variable="%Clip%" Text="@" Ignore_Case="FALSE"/>
<VARIABLE SET INTEGER Option="\x00" Destination="%Count%" Value="%CharsToAtSign%"/>
<COMMENT/>
<COMMENT Value="Parse by character <-- RIGHT to LEFT, from the @-sign, until an illegal character is encountered"/>
<REPEAT START Start="1" Step="1" Count="%CharsToAtSign%" Save="FALSE"/>
<VARIABLE MODIFY INTEGER Option="\x08" Destination="%Count%"/>
<VARIABLE MODIFY STRING Option="\x09" Destination="%Char%" Variable="%Clip%" Start="%Count%" Count="1" NoEmbeddedVars="FALSE"/>
<IF VARIABLE Variable="%LegalChars%" Condition="\x06" Value="%Char%" IgnoreCase="TRUE"/>
<VARIABLE MODIFY STRING Option="\x06" Destination="%LeftReversed%" Value="%Char%" NoEmbeddedVars="FALSE"/>
<ELSE/>
<REPEAT EXIT/>
<END IF/>
<END REPEAT/>
<COMMENT/>
<COMMENT Value="Reverse the order of %LeftReversed% to produce %Left%, the left side of the email address"/>
<VARIABLE SET INTEGER Option="\x0D" Destination="%LeftLength%" Text_Variable="%LeftReversed%"/>
<REPEAT START Start="1" Step="1" Count="%LeftLength%" Save="FALSE"/>
<VARIABLE MODIFY STRING Option="\x09" Destination="%Char%" Variable="%LeftReversed%" Start="%LeftLength%" Count="1" NoEmbeddedVars="FALSE"/>
<VARIABLE MODIFY STRING Option="\x06" Destination="%Left%" Value="%Char%" NoEmbeddedVars="FALSE"/>
<VARIABLE MODIFY INTEGER Option="\x01" Destination="%LeftLength%" Value1="%LeftLength%" Value2="1"/>
<END REPEAT/>
<COMMENT/>
<COMMENT Value="Parse by character --> RIGHT to LEFT, from the @-sign, until an illegal character is encountered"/>
<VARIABLE MODIFY INTEGER Option="\x00" Destination="%CharsToAtSign%" Value1="%CharsToAtSign%" Value2="1"/>
<VARIABLE SET INTEGER Option="\x00" Destination="%Pointer%" Value="%CharsToAtSign%"/>
<VARIABLE SET INTEGER Option="\x0D" Destination="%ClipLength%" Text_Variable="%Clip%"/>
<VARIABLE MODIFY INTEGER Option="\x01" Destination="%MaxCharsToCheck%" Value1="%ClipLength%" Value2="%CharsToAtSign%"/>
<COMMENT/>
<REPEAT START Start="1" Step="1" Count="%MaxCharsToCheck%" Save="FALSE"/>
<VARIABLE MODIFY STRING Option="\x09" Destination="%Char%" Variable="%Clip%" Start="%Pointer%" Count="1" NoEmbeddedVars="FALSE"/>
<IF VARIABLE Variable="%LegalChars%" Condition="\x06" Value="%Char%" IgnoreCase="TRUE"/>
<VARIABLE MODIFY STRING Option="\x06" Destination="%Right%" Value="%Char%" NoEmbeddedVars="FALSE"/>
<VARIABLE MODIFY INTEGER Option="\x07" Destination="%Pointer%"/>
<ELSE/>
<REPEAT EXIT/>
<END IF/>
<END REPEAT/>
<COMMENT/>
<COMMENT Value="Handle illegal characters in the last position in %Right% "/>
<VARIABLE SET INTEGER Option="\x0D" Destination="%RightLength%" Text_Variable="%Right%"/>
<VARIABLE MODIFY STRING Option="\x09" Destination="%Char%" Variable="%Right%" Start="%RightLength%" Count="1" NoEmbeddedVars="FALSE"/>
<IF VARIABLE Variable="%IllegalLastChars%" Condition="\x06" Value="%Char%" IgnoreCase="FALSE"/>
<VARIABLE MODIFY STRING Option="\x0A" Destination="%Right%" Start="%RightLength%" Count="1"/>
<END IF/>
<COMMENT/>
<COMMENT Value="Construct email address: %Left%@%Right% and store all addresses in %Results%"/>
<VARIABLE SET STRING Option="\x00" Destination="%Result%" Value="%Left%@%Right%" NoEmbeddedVars="FALSE"/>
<VARIABLE MODIFY STRING Option="\x06" Destination="%Results%" Value="%Result%\r\n" NoEmbeddedVars="FALSE"/>
<TEXT BOX DISPLAY Title="Result" Content="{\\rtf1\\ansi\\ansicpg1252\\deff0\\deflang1033{\\fonttbl{\\f0\\fnil\\fcharset0 Tahoma;}}\r\n\\viewkind4\\uc1\\pard\\lang4105\\f0\\fs28 %Result%\r\n\\par }\r\n" Left="821" Top="417" Width="1014" Height="216" Monitor="0" OnTop="TRUE" Keep_Focus="TRUE" Mode="\x00" Delay="0" _ENABLED="FALSE" _BACK="0080FFFF"/>
<COMMENT/>
<COMMENT Value="Delete the @-sign identified above from %Clip%"/>
<VARIABLE MODIFY STRING Option="\x0F" Destination="%Clip%" ToReplace="@" All="FALSE" IgnoreCase="FALSE" NoEmbeddedVars="FALSE"/>
<COMMENT/>
<COMMENT Value="Prepare for the next iteration"/>
<VARIABLE SET STRING Option="\x00" Destination="%Left%" NoEmbeddedVars="FALSE"/>
<VARIABLE SET STRING Option="\x00" Destination="%LeftReversed%" NoEmbeddedVars="FALSE"/>
<VARIABLE SET STRING Option="\x00" Destination="%Right%" NoEmbeddedVars="FALSE"/>
<COMMENT/>
<END REPEAT/>
<COMMENT/>
<TEXT TYPE Action="1" Text="%Results%"/>

 

Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
×
×
  • Create New...