acantor Posted August 25, 2021 Report Share Posted August 25, 2021 This challenge is arguably more of a mid-week diversion than a real challenge. But I thought others might enjoy this not-too-hard puzzle. I recently updated an email mailing list consisting of about 1000 addresses. The list looked like this... Quote Midge, John <jmidge@abc.ca> de Ville, Cruella <cruel@deville.com> <info@blablabla.gov> smith@smithsmith.net Didge, Onn <onn@abc.ca> Jack <jack@oooo.com> <contact@blablabla.gov> ssmith@smithsmith.net #Bingo! ... (A total of about 1000) Note that each line appears in one of four formats: Name <email@address> <email@address> email@address not an email address (e.g., #Bingo!) The challenge: use Macro Express to simplify the task of cleaning up the 1000 lines so that each line consists only of an email address: Quote jmidge@abc.ca cruel@deville.com info@blablabla.gov smith@smithsmith.net onn@abc.ca jack@oooo.com contact@blablabla.gov ssmith@smithsmith.net etc. Any line that is not an email address, such as "#Bingo!" or a blank line, should be removed. Quote Link to comment Share on other sites More sharing options...
acantor Posted August 28, 2021 Author Report Share Posted August 28, 2021 Well, this challenge didn't exactly break the internet! I came up with two solutions. The first solution appears in this post, and the second is in the next. The first solution is what I had in mind when I posted the challenge. It's a "quick and dirty" way to strip out text that doesn't form part of an email address. The reason it was relatively easy to script is that the addresses appeared in only four formats. In two of the formats, the less-than sign always marks the start of the email address, and the greater-than sign always marks its end. So the handling of both was identical. email@address <email@address> Name <email@address> Not an email address The macro only took 10 or 15 minutes to prepare, and saved me 45 minutes or an hour of repetitive drudgery. The macro acts on ten email addresses at a time, so although much quicker than making the repairs by hand, the macro is still fairly manual. // Clean-up 10 email addresses by deleting non-email address text Repeat Start (Repeat 10 times) Clipboard Empty Text Type (Simulate Keystrokes): <END><SHIFT><HOME> // Select the line Text Type (Simulate Keystrokes): <CONTROL>c Variable Set String %Clip% from the clipboard contents If Variable %Clip% Does not Contain "@" Text Type (Simulate Keystrokes): <BACKSPACE> Else If Variable %Clip% Contains "<" // Handle email address wrapped with < and > Variable Set Integer %LessThanPos% to the position of "<" in %Clip% Variable Modify String: Delete part of text from %Clip% starting at 1 and %LessThanPos% characters long Variable Modify String: Replace ">" in %Clip% with "" Text Type (Simulate Keystrokes): %Clip% End If End If Text Type (Simulate Keystrokes): <ARROW DOWN> End Repeat <COMMENT Value="Clean-up 10 email addresses by deleting non-email address text"/> <REPEAT START Start="1" Step="1" Count="10" Save="FALSE"/> <CLIPBOARD EMPTY/> <TEXT TYPE Action="0" Text="<END><SHIFT><HOME>" _COMMENT="Select the line"/> <TEXT TYPE Action="0" Text="<CONTROL>c"/> <VARIABLE SET STRING Option="\x02" Destination="%Clip%" NoEmbeddedVars="FALSE"/> <IF VARIABLE Variable="%Clip%" Condition="\x07" Value="@" IgnoreCase="FALSE"/> <TEXT TYPE Action="0" Text="<BACKSPACE>"/> <ELSE/> <IF VARIABLE Variable="%Clip%" Condition="\x06" Value="<" IgnoreCase="FALSE" _COMMENT="Handle email address wrapped with < and >"/> <VARIABLE SET INTEGER Option="\x0E" Destination="%LessThanPos%" Text_Variable="%Clip%" Text="<" Ignore_Case="FALSE"/> <VARIABLE MODIFY STRING Option="\x0A" Destination="%Clip%" Start="1" Count="%LessThanPos%"/> <VARIABLE MODIFY STRING Option="\x0F" Destination="%Clip%" ToReplace=">" All="FALSE" IgnoreCase="FALSE" NoEmbeddedVars="FALSE"/> <TEXT TYPE Action="0" Text="%Clip%"/> <END IF/> <END IF/> <TEXT TYPE Action="0" Text="<ARROW DOWN>"/> <END REPEAT/> Quote Link to comment Share on other sites More sharing options...
acantor Posted August 28, 2021 Author Report Share Posted August 28, 2021 As soon as I got my "quick-and-dirty" macro working, I realized I wanted a general solution, i.e., a way to recognize any email address regardless of how it is formatted. In other words, I wanted a macro that would extract any properly-formed email address. I assume these are the rules for a properly-formed email address: left@right It's got an @-sign sandwiched between the left side and the right side. These characters are legal on both sides: A through Z (both cases) 0 through 9 . (period) - (hyphen) _ (underscore) However, the last character on the right side cannot be a period, hyphen, or underscore. To make a long story short, after a lot of work, I came up with a solution that does the heavy-lifting entirely in variable-space. But instead of knocking it out in 15 or 20 minutes, as I did for the original solution, this version took 15 or 20 hours to work out the kinks. And I'm sure there are still ways to improve it. // Macro to "scrape" email addresses from selected text // How to use it: // 1. Select text that contains email addresses. // 2. Run this macro. // 3. The list of email addresses will replace the selected text. // How it works (simplified) // 0. Copy selected text to variable %Clip% for processing. // 1. Find leftmost @-sign in %Clip%. // 2. Parse left from @ until illegal character encountered. Capture legal characters in %LeftReversed% // 3. Reverse %LeftReversed%. That will be the left side of the email address, %Left% // 4. Parse right from @ until an illegal character is found. Capture legal characters in %Right% // 5. %Result% = %Left%@%Right% // 6. Store results. Delete the first @-sign in %Clip%. // 7. Repeat Steps 1 through 6 until all @-signs have been deleted. // Legal characters for email addresses Variable Set String %LegalChars% to "abcdefghijklmnopqrstuvwxyz1234567890._-" // Three legal characters cannot appear in the last position: period, hyphen, and underscore Variable Set String %IllegalLastChars% to "._-" Clipboard Copy Variable Set String %Clip% from the clipboard contents Variable Set String %Clip% to "john@johnny.ca, Sandy Smith <sandy@smith.com>; alan@cantoraccess.com." // Test Variable Set String %Clip% to "*%Clip%*" // Add an "illegal" character to the start and end of %Clip% // Loop through %Clip% until all @-signs are processed Repeat Until %Clip% Does not Contain "@" // Get position of first @-sign Variable Set Integer %CharsToAtSign% to the position of "@" in %Clip% Variable Set Integer %Count% to %CharsToAtSign% // Parse by character <-- RIGHT to LEFT, from the @-sign, until an illegal character is encountered Repeat Start (Repeat %CharsToAtSign% times) Variable Modify Integer %Count%: Decrement Variable Modify String: Copy part of text in %Clip% starting at %Count% and 1 characters long to %Char% If Variable %LegalChars% Contains "%Char%" Variable Modify String %LeftReversed%: Append Text (%Char%) Else Repeat Exit End If End Repeat // Reverse the order of %LeftReversed% to produce %Left%, the left side of the email address Variable Set Integer %LeftLength% to the length of variable %LeftReversed% Repeat Start (Repeat %LeftLength% times) Variable Modify String: Copy part of text in %LeftReversed% starting at %LeftLength% and 1 characters long to %Char% Variable Modify String %Left%: Append Text (%Char%) Variable Modify Integer: %LeftLength% = %LeftLength% - 1 End Repeat // Parse by character --> RIGHT to LEFT, from the @-sign, until an illegal character is encountered Variable Modify Integer: %CharsToAtSign% = %CharsToAtSign% + 1 Variable Set Integer %Pointer% to %CharsToAtSign% Variable Set Integer %ClipLength% to the length of variable %Clip% Variable Modify Integer: %MaxCharsToCheck% = %ClipLength% - %CharsToAtSign% Repeat Start (Repeat %MaxCharsToCheck% times) Variable Modify String: Copy part of text in %Clip% starting at %Pointer% and 1 characters long to %Char% If Variable %LegalChars% Contains "%Char%" Variable Modify String %Right%: Append Text (%Char%) Variable Modify Integer %Pointer%: Increment Else Repeat Exit End If End Repeat // Handle illegal characters in the last position in %Right% Variable Set Integer %RightLength% to the length of variable %Right% Variable Modify String: Copy part of text in %Right% starting at %RightLength% and 1 characters long to %Char% If Variable %IllegalLastChars% Contains "%Char%" Variable Modify String: Delete part of text from %Right% starting at %RightLength% and 1 characters long End If // Construct email address: %Left%@%Right% and store all addresses in %Results% Variable Set String %Result% to "%Left%@%Right%" Variable Modify String %Results%: Append Text (%Result% ) Text Box Display: Result // Delete the @-sign identified above from %Clip% Variable Modify String: Replace "@" in %Clip% with "" // Prepare for the next iteration Variable Set String %Left% to "" Variable Set String %LeftReversed% to "" Variable Set String %Right% to "" End Repeat Text Type (Use Clipboard and Paste Text): %Results% <COMMENT Value="Macro to \"scrape\" email addresses from selected text"/> <COMMENT/> <COMMENT Value="How to use it:"/> <COMMENT Value="1. Select text that contains email addresses."/> <COMMENT Value="2. Run this macro."/> <COMMENT Value="3. The list of email addresses will replace the selected text."/> <COMMENT/> <COMMENT Value="How it works (simplified)"/> <COMMENT Value=" 0. Copy selected text to variable %Clip% for processing."/> <COMMENT Value=" 1. Find leftmost @-sign in %Clip%."/> <COMMENT Value=" 2. Parse left from @ until illegal character encountered. Capture legal characters in %LeftReversed%"/> <COMMENT Value=" 3. Reverse %LeftReversed%. That will be the left side of the email address, %Left%"/> <COMMENT Value=" 4. Parse right from @ until an illegal character is found. Capture legal characters in %Right%"/> <COMMENT Value=" 5. %Result% = %Left%@%Right% "/> <COMMENT Value=" 6. Store results. Delete the first @-sign in %Clip%."/> <COMMENT Value=" 7. Repeat Steps 1 through 6 until all @-signs have been deleted."/> <COMMENT/> <COMMENT Value="Legal characters for email addresses"/> <VARIABLE SET STRING Option="\x00" Destination="%LegalChars%" Value="abcdefghijklmnopqrstuvwxyz1234567890._-" NoEmbeddedVars="FALSE"/> <COMMENT Value="Three legal characters cannot appear in the last position: period, hyphen, and underscore"/> <VARIABLE SET STRING Option="\x00" Destination="%IllegalLastChars%" Value="._-" NoEmbeddedVars="FALSE"/> <COMMENT/> <CLIPBOARD COPY/> <VARIABLE SET STRING Option="\x02" Destination="%Clip%" NoEmbeddedVars="FALSE"/> <VARIABLE SET STRING Option="\x00" Destination="%Clip%" Value="john@johnny.ca, Sandy Smith <sandy@smith.com>; alan@cantoraccess.com." NoEmbeddedVars="FALSE" _ENABLED="FALSE" _COMMENT="Test"/> <COMMENT/> <VARIABLE SET STRING Option="\x00" Destination="%Clip%" Value="*%Clip%*" NoEmbeddedVars="FALSE" _COMMENT="Add an \"illegal\" character to the start and end of %Clip%"/> <COMMENT/> <COMMENT Value="Loop through %Clip% until all @-signs are processed"/> <REPEAT UNTIL Variable="%Clip%" Condition="\x07" Value="@"/> <COMMENT/> <COMMENT Value="Get position of first @-sign"/> <VARIABLE SET INTEGER Option="\x0E" Destination="%CharsToAtSign%" Text_Variable="%Clip%" Text="@" Ignore_Case="FALSE"/> <VARIABLE SET INTEGER Option="\x00" Destination="%Count%" Value="%CharsToAtSign%"/> <COMMENT/> <COMMENT Value="Parse by character <-- RIGHT to LEFT, from the @-sign, until an illegal character is encountered"/> <REPEAT START Start="1" Step="1" Count="%CharsToAtSign%" Save="FALSE"/> <VARIABLE MODIFY INTEGER Option="\x08" Destination="%Count%"/> <VARIABLE MODIFY STRING Option="\x09" Destination="%Char%" Variable="%Clip%" Start="%Count%" Count="1" NoEmbeddedVars="FALSE"/> <IF VARIABLE Variable="%LegalChars%" Condition="\x06" Value="%Char%" IgnoreCase="TRUE"/> <VARIABLE MODIFY STRING Option="\x06" Destination="%LeftReversed%" Value="%Char%" NoEmbeddedVars="FALSE"/> <ELSE/> <REPEAT EXIT/> <END IF/> <END REPEAT/> <COMMENT/> <COMMENT Value="Reverse the order of %LeftReversed% to produce %Left%, the left side of the email address"/> <VARIABLE SET INTEGER Option="\x0D" Destination="%LeftLength%" Text_Variable="%LeftReversed%"/> <REPEAT START Start="1" Step="1" Count="%LeftLength%" Save="FALSE"/> <VARIABLE MODIFY STRING Option="\x09" Destination="%Char%" Variable="%LeftReversed%" Start="%LeftLength%" Count="1" NoEmbeddedVars="FALSE"/> <VARIABLE MODIFY STRING Option="\x06" Destination="%Left%" Value="%Char%" NoEmbeddedVars="FALSE"/> <VARIABLE MODIFY INTEGER Option="\x01" Destination="%LeftLength%" Value1="%LeftLength%" Value2="1"/> <END REPEAT/> <COMMENT/> <COMMENT Value="Parse by character --> RIGHT to LEFT, from the @-sign, until an illegal character is encountered"/> <VARIABLE MODIFY INTEGER Option="\x00" Destination="%CharsToAtSign%" Value1="%CharsToAtSign%" Value2="1"/> <VARIABLE SET INTEGER Option="\x00" Destination="%Pointer%" Value="%CharsToAtSign%"/> <VARIABLE SET INTEGER Option="\x0D" Destination="%ClipLength%" Text_Variable="%Clip%"/> <VARIABLE MODIFY INTEGER Option="\x01" Destination="%MaxCharsToCheck%" Value1="%ClipLength%" Value2="%CharsToAtSign%"/> <COMMENT/> <REPEAT START Start="1" Step="1" Count="%MaxCharsToCheck%" Save="FALSE"/> <VARIABLE MODIFY STRING Option="\x09" Destination="%Char%" Variable="%Clip%" Start="%Pointer%" Count="1" NoEmbeddedVars="FALSE"/> <IF VARIABLE Variable="%LegalChars%" Condition="\x06" Value="%Char%" IgnoreCase="TRUE"/> <VARIABLE MODIFY STRING Option="\x06" Destination="%Right%" Value="%Char%" NoEmbeddedVars="FALSE"/> <VARIABLE MODIFY INTEGER Option="\x07" Destination="%Pointer%"/> <ELSE/> <REPEAT EXIT/> <END IF/> <END REPEAT/> <COMMENT/> <COMMENT Value="Handle illegal characters in the last position in %Right% "/> <VARIABLE SET INTEGER Option="\x0D" Destination="%RightLength%" Text_Variable="%Right%"/> <VARIABLE MODIFY STRING Option="\x09" Destination="%Char%" Variable="%Right%" Start="%RightLength%" Count="1" NoEmbeddedVars="FALSE"/> <IF VARIABLE Variable="%IllegalLastChars%" Condition="\x06" Value="%Char%" IgnoreCase="FALSE"/> <VARIABLE MODIFY STRING Option="\x0A" Destination="%Right%" Start="%RightLength%" Count="1"/> <END IF/> <COMMENT/> <COMMENT Value="Construct email address: %Left%@%Right% and store all addresses in %Results%"/> <VARIABLE SET STRING Option="\x00" Destination="%Result%" Value="%Left%@%Right%" NoEmbeddedVars="FALSE"/> <VARIABLE MODIFY STRING Option="\x06" Destination="%Results%" Value="%Result%\r\n" NoEmbeddedVars="FALSE"/> <TEXT BOX DISPLAY Title="Result" Content="{\\rtf1\\ansi\\ansicpg1252\\deff0\\deflang1033{\\fonttbl{\\f0\\fnil\\fcharset0 Tahoma;}}\r\n\\viewkind4\\uc1\\pard\\lang4105\\f0\\fs28 %Result%\r\n\\par }\r\n" Left="821" Top="417" Width="1014" Height="216" Monitor="0" OnTop="TRUE" Keep_Focus="TRUE" Mode="\x00" Delay="0" _ENABLED="FALSE" _BACK="0080FFFF"/> <COMMENT/> <COMMENT Value="Delete the @-sign identified above from %Clip%"/> <VARIABLE MODIFY STRING Option="\x0F" Destination="%Clip%" ToReplace="@" All="FALSE" IgnoreCase="FALSE" NoEmbeddedVars="FALSE"/> <COMMENT/> <COMMENT Value="Prepare for the next iteration"/> <VARIABLE SET STRING Option="\x00" Destination="%Left%" NoEmbeddedVars="FALSE"/> <VARIABLE SET STRING Option="\x00" Destination="%LeftReversed%" NoEmbeddedVars="FALSE"/> <VARIABLE SET STRING Option="\x00" Destination="%Right%" NoEmbeddedVars="FALSE"/> <COMMENT/> <END REPEAT/> <COMMENT/> <TEXT TYPE Action="1" Text="%Results%"/> Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.