Macro Express Forums
acantor

Challenge: script to automatically output HTML code


 

Here is another Macro Express challenge: Can you write a script to automate the process of writing HTML code?

 

A hypertext link on a website looks something like this:

Welcome to the ABC 123 website

 

 

 

The words are underlined. Clicking the link will take you to a website, for example:

 

https://www.abc123.com

 

The HTML code that produces a hypertext link on a webpage is this:

 

<a href="https://www.abc123.com">Welcome to the ABC 123 website</a>

 

 

 

Create a Macro Express script that analyzes data in two locations:

 

 

1. The clipboard

 

2. The selection (i.e., whatever text is selected)

 

 

The script tests the clipboard to decide whether it is a web address or "ordinary text."

 

The script also tests the selection to decide whether it is a web address or "ordinary text." (The rules for deciding whether text is a web address or ordinary text need not be foolproof, but they should work most of the time!)

 

 

Once decided, the script outputs code for an HTML hypertext link:

<a href="WEB_ADDRESS">ORDINARY_TEXT</a>

 

If nothing is selected, the script applies the rules to the clipboard to decide whether it's a web address or ordinary text.  If it's a web address, the script outputs:

 

 

<a href="WEB_ADDRESS"></a> 

 

 

If it's ordinary text, the script outputs:

 

 

<a href="">ORDINARY_TEXT</a>

 

If both the clipboard and selection are empty, the script outputs:

 

 

<a href=""></a>
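The output rules above can be sketched outside Macro Express. The following Python is only an illustration of the decision logic, not the author's macro. The names `looks_like_url` and `build_link` are hypothetical, the URL test is a deliberately simple assumption (starts with `http://`, `https://`, or `www.` and contains no whitespace), and the HTML escaping is an extra that the challenge does not ask for.

```python
import html
import re

# Assumed heuristic: a web address has no whitespace and starts with a
# scheme or "www.". The challenge says the rule need not be foolproof.
_URL_PATTERN = re.compile(r"^(https?://|www\.)\S+$", re.IGNORECASE)


def looks_like_url(text):
    """Return True if the text is probably a web address."""
    return bool(_URL_PATTERN.match(text.strip()))


def build_link(clipboard, selection):
    """Build <a href="WEB_ADDRESS">ORDINARY_TEXT</a> from the two inputs."""
    address = ""
    label = ""
    # Look at the selection first, then the clipboard. The first web address
    # found fills href; the first ordinary text found fills the link text.
    for candidate in (selection.strip(), clipboard.strip()):
        if not candidate:
            continue
        if looks_like_url(candidate):
            address = address or candidate
        else:
            label = label or candidate
    # Escaping is an addition beyond the challenge, to keep the HTML valid.
    return f'<a href="{html.escape(address)}">{html.escape(label)}</a>'


print(build_link("https://www.abc123.com", "Welcome to the ABC 123 website"))
print(build_link("", ""))
```

Giving the two slots descriptive names (`address`, `label`) keeps the branching readable, which is the stated goal of the challenge.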

 

I first developed a Macro Express script to do this a decade ago, but the code was challenging to maintain and even harder to understand. It looked like a plate of spaghetti!

 

 

Recently, I totally rewrote the script to make it easy (or at least easier!) to understand.

 

 

How would you tackle this problem using only Macro Express? The goal is clarity!

 

I'll post my solution here after others have chimed in.

 

4 hours ago, acantor said:

The goal is clarity!

Clarity is largely a matter of EXTENSIVE commenting.  Complex macros or routines often will have almost as many remarks as functioning code.

 

As to solving this, I have to say your Canadian Postal Code problem was fun and called for some cleverness.  This one appears to be mostly grunt work, not fun.🙄  I'd be delighted to do it for a hundred bucks an hour (US bucks, not Canadian -- hours can be measured on a US, Canadian, Slovakian, or 24-hour military clock).  But I can understand if you're not willing to pay that high since you've already written the code.🙃 

 

Still eager for future challenges, though.  😛  In fact, here's one: Clipboard contains a piece of text with one or potentially many embedded strings of one or more blanks.  Total text can be up to 32K characters.  A string of contiguous blanks can be up to 2K in length.  Write a macro that changes every string of blanks to a single blank, without otherwise changing the overall text string.  At the end of the macro, display a count of how many blanks have been removed.  The challenge is to do this in the fewest lines of code.     

 

Sorry, not trying to hijack your thread, I just don't want to work that hard tonight. 


By posing challenges, I guess I run the risk of suggesting problems that fellow forum members might not find all that interesting!

 

You are so right about the importance of comments to make the code clear. But here is a hint: the clarity of my solution comes less from the comments I added, and more from my naming convention with variables.

 

And you are right about another thing: a complete solution would take grunt work. On the other hand, I don't think it's necessary to come up with a comprehensive solution. The hard part (for me) was figuring out the structure of the macro. Once I settled on its logic, the "grunt work" was straightforward. But I don't expect others to deal with every possibility... unless there are others who would actually find the macro useful. I wish I had had this macro when I redid my website earlier this year. It would have saved me days of persnickety hand-coding of my website.

 

Finally: I hope other forum members will post their challenges. How about posing your problem as a separate topic? It's a good one, and I'd be happy to try!

9 hours ago, acantor said:

But here is a hint: the clarity of my solution comes less from the comments I added, and more from my naming convention with variables.

Sounds like COBOL code (showing my age, here).  You can define a "conditional" value.  For example, you have a numeric data element called Programmer's-Age.  Then you define the conditional under it, like "88 the-programmer-is-a-dinosaur   values 65 to 100".  Within the program logic, then, you can say neat stuff like

"if the-programmer-is-a-dinosaur

    goto eligible-for-social-security

        else

    goto standard-working-age." 

When people say COBOL code is like reading English, it all comes from the naming. 

 

Anyhow, the ability to use better naming is one thing that distinguishes ME Pro from my old ME 3 version.  I really would like to see your solution and your naming convention for this challenge, even though I'm too lazy to work on it myself. 


Rberq, before I post my solution, let's wait to see if anyone takes up my challenge.

 

In the meantime, I have accepted your challenge. I assumed that "blank" means a space. It should not be difficult to adapt my script to check for carriage returns, new lines, Tab, non-breaking spaces, optional hyphens, and other "invisible" characters that might be considered blank.

 

My strategy: replace two spaces with one space, again and again, until all double spaces are gone. If x is the variable holding the entire string, I worked out that this operation must be repeated length(x) - 1 times, at most.

 

My solution consists of ten lines of code. I could have reduced it to nine by repeating length(x) times instead of length(x) - 1. But I decided to keep the extra instruction in the interest of mathematical rigour!

 

To keep the macro compact, I made no effort to reduce the number of loops. So it's done via "brute force." If the variable we are testing contains 1000 characters, the macro always loops 999 times, even if the problem is solved after 1 loop. Although my solution is inefficient, Macro Express does string manipulations quickly, so my script runs fast. I tested with a string of 100,000 characters, and Macro Express outputted the solution in about four seconds.

 

My guess is that there are more efficient ways to solve this puzzle using RegEx!

// Variable Set String %a% to "A   B      C" // Uncomment lines for testing...
// Variable Set String %a% to "A   B       C"
// Variable Set String %a% to "A  B  C"
// Variable Set String %a% to "A   B                                                                       C"
Variable Set String %a% from the clipboard contents
 
Variable Set String %b% to "%a%"
 
Variable Set Integer %LengthA% to the length of variable %a%
Variable Modify Integer: %Count% = %LengthA% - 1 // Number of passes to ensure each double-space in the string is replaced with a single space
 
Repeat Start (Repeat %Count% times)
  Variable Modify String: Replace "  " in %b% with " "
End Repeat
 
Variable Set Integer %LengthB% to the length of variable %b%
Variable Modify Integer: %Result% = %LengthA% - %LengthB%
Text Box Display: Spaces removed = %Result%
<VARIABLE SET STRING Option="\x00" Destination="%a%" Value="A   B      C" NoEmbeddedVars="FALSE" _ENABLED="FALSE" _COMMENT="Uncomment lines for testing..."/>
<VARIABLE SET STRING Option="\x00" Destination="%a%" Value="A   B       C" NoEmbeddedVars="FALSE" _ENABLED="FALSE"/>
<VARIABLE SET STRING Option="\x00" Destination="%a%" Value="A  B  C" NoEmbeddedVars="FALSE" _ENABLED="FALSE"/>
<VARIABLE SET STRING Option="\x00" Destination="%a%" Value="A   B                                                                       C" NoEmbeddedVars="FALSE" _ENABLED="FALSE"/>
<VARIABLE SET STRING Option="\x02" Destination="%a%" NoEmbeddedVars="FALSE"/>
<COMMENT/>
<VARIABLE SET STRING Option="\x00" Destination="%b%" Value="%a%" NoEmbeddedVars="FALSE"/>
<COMMENT/>
<VARIABLE SET INTEGER Option="\x0D" Destination="%LengthA%" Text_Variable="%a%"/>
<VARIABLE MODIFY INTEGER Option="\x01" Destination="%Count%" Value1="%LengthA%" Value2="1" _COMMENT="Number of passes to ensure each double-space in the string is replaced with a single space"/>
<COMMENT/>
<REPEAT START Start="1" Step="1" Count="%Count%" Save="FALSE"/>
<VARIABLE MODIFY STRING Option="\x0F" Destination="%b%" ToReplace="  " ReplaceWith=" " All="TRUE" IgnoreCase="FALSE" NoEmbeddedVars="FALSE"/>
<END REPEAT/>
<COMMENT/>
<VARIABLE SET INTEGER Option="\x0D" Destination="%LengthB%" Text_Variable="%b%"/>
<VARIABLE MODIFY INTEGER Option="\x01" Destination="%Result%" Value1="%LengthA%" Value2="%LengthB%"/>
<TEXT BOX DISPLAY Title="Spaces removed = %Result%" Content="{\\rtf1\\ansi\\ansicpg1252\\deff0\\deflang1033{\\fonttbl{\\f0\\fnil\\fcharset0 Tahoma;}{\\f1\\fnil Tahoma;}}\r\n\\viewkind4\\uc1\\pard\\lang4105\\f0\\fs20 Finish = [%b%]\r\n\\par Begin = [%a%]\\lang1033\\f1 \r\n\\par }\r\n" Left="110" Top="417" Width="1816" Height="285" Monitor="0" OnTop="TRUE" Keep_Focus="TRUE" Mode="\x00" Delay="0"/>
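
For readers without Macro Express, the brute-force strategy can be sketched in Python. This is an illustration, not a translation of the macro: `collapse_brute_force` is a hypothetical name, and it assumes Python's `str.replace` behaves like the "replace all" option (non-overlapping matches, left to right).

```python
def collapse_brute_force(text):
    """Replace double spaces len(text) - 1 times, with no early exit."""
    result = text
    passes = max(len(text) - 1, 0)
    for _ in range(passes):
        # Each call replaces every non-overlapping pair of spaces.
        result = result.replace("  ", " ")
    spaces_removed = len(text) - len(result)
    return result, spaces_removed


sample = "A" + " " * 3 + "B" + " " * 6 + "C"
collapsed, removed = collapse_brute_force(sample)
print(f"Finish = [{collapsed}]")
print(f"Spaces removed = {removed}")
```

Most of the passes do nothing once the double spaces are gone, which is the inefficiency described above.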

 


Your solution was almost identical to mine, except I put no limit on the number of loops.  Instead, I included an IF to see when to exit the loop.  Here are the "business" lines of code, without the housekeeping and displaying of counters:

 

Repeat Until %N99% <> %N99%
  If Variable %T1% contains "  "
    Replace "  " with " " in %T1% (replace-all-instances option)
  Else
    Repeat Exit
  End If
Repeat End

 

With your macro, you can probably get away with far fewer passes than (length minus 1).  If you use the "replace all instances" option of the Replace command, then each pass through the repeat loop cuts each string of spaces roughly in half.  So the number of repeats needed grows with the base-2 logarithm of the longest string of spaces rather than with overall string length.  For example, I made a string of about 78K bytes, whose longest contiguous string of spaces was about 41K.  It required only 16 iterations of the Replace command to do the job.  Which makes sense, because 2 to the 16th power is 64K.  

Since my "challenge" specified a maximum of 2K contiguous spaces, your macro should be successful if you always repeated 11 times (maybe 12?) (2 to 11th power is 2K).  Knowing that, you could get rid of the lines that worry about overall string length, and you would beat me by a couple lines of code.  You win, pending any other entries.   
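
The early-exit loop and the halving effect can be tried out in Python. This is a sketch under the assumption that `str.replace` matches the "replace all instances" option; `collapse_counting_passes` is a hypothetical name, not part of either macro.

```python
def collapse_counting_passes(text):
    """Replace every double space until none remain; count the passes."""
    passes = 0
    while "  " in text:
        # A run of n spaces becomes n // 2 + n % 2 spaces after one pass.
        text = text.replace("  ", " ")
        passes += 1
    return text, passes


# A run of 2K spaces, the maximum the challenge allows.
result, passes = collapse_counting_passes("start" + " " * 2048 + "end")
print(result, passes)
```

Under that assumption, a run of 2,048 spaces needs 11 passes, so a fixed count of 11 would cover the limit stated in the challenge.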


After I posted my solution but before I saw yours, I modified my macro to develop a feeling for the number of loops that are needed to remove the extra spaces. For a string of 100,000, it's around 12.

 

So when my macro is parsing a string of 100,000 characters, it cycles through the repeat loop an extra 99,987 times!

 

In this case, the time penalty for brute force computing is trivial: my macro loops 25,000 times per second. The four-second wait for my script to process 100,000 characters is not an issue. But what if we wanted to process 100 million or 100 billion characters?

 

Probably the way forward is to jettison the requirement for compactness, and add calculations to determine the maximum number of loops, which (I think) is a function of the length of the string. I'm not quite sure how to do that. I used to have reasonable mathematical intuitions, but it's been decades since I last studied the subject. My math skills have become rusty and my mathematical understanding is tenuous!

 

Rberq, thank you for posting this interesting and irresistible challenge!


Too much else on my plate to spend serious time on it. But I’m wondering about another approach: replace all spaces with CRLF, then use one of the text processing loops to add a single space to the resultant ‘lines’.

1 hour ago, acantor said:

Probably the way forward is to jettison the requirement for compactness, and add calculations to determine the maximum number of loops, which (I think) is a function of the length of the string. I'm not quite sure how to do that. I used to have reasonable mathematical intuitions, but it's been decades since I last studied the subject. My math skills have become rusty and my mathematical understanding is tenuous!

I think maximum required loops depends on the length of the longest string of blanks, not the length of the entire string.  But by the time a macro did the counting, the brute-force method could be all done processing.  As you say, often the time penalty for brute force computing is trivial.  When I started programming, I worked with a guy who was immensely proud if he could cut a routine from, say, 44 to 42 bytes of machine-language code, to reduce run time.  The downside was, the next programmer to work with his stuff could spend hours or days trying to decipher his magic, because he also felt program logic was self-documenting and comments were for sissies.  🙂




rberq said:

I think maximum required loops depends on the length of the longest string of blanks, not the length of the entire string. 

 

My guess is that the maximum number of loops is related to the length of the entire string, not to the longest string of blanks.

 

The two "best case" scenarios: either there are no spaces in the string, or there are only single spaces in the string. The job is complete before the first loop even happens.

 

If there is one set of two consecutive spaces, the job is complete after one loop. If there are two sets of two consecutive spaces, the job is complete after two loops.

 

The "worst case" scenario is when the string consists entirely of spaces. The number of loops needed to strip away all of the extra spaces is, I believe, length - 1.

 

I think the formula to determine the number of loops might be this:

 

loops = log base 2 (length of string)

 

The result for 100,000 is 16.6. For 1,000, it's 9.97. Macro Express doesn't support logarithms, but I think I'm on track to figuring out approximate values.


terrypin said:

But I’m wondering about another approach: replace all spaces with CRLF, then use one of the text processing loops to add a single space to the resultant ‘lines’.

 

Terry, I hope you find the time to give these two challenges a try. My initial thinking about rberq's problem was similar to what you are proposing.

1 hour ago, acantor said:

The "worst case" scenario is when the string consists entirely of spaces. The number of loops needed to strip away all of the extra spaces is, I believe, length - 1.

 

Remember the story where Joe says, "I will work for you starting for a penny a day, but you must double my pay every day."  So day 2, Joe earns 2 cents.  Day 3, 4 cents.  Day 4, 8 cents.  Then 16 cents, then 32, then 64, then $1.28, then $2.56.  Keep doubling, and in a few weeks Joe is very very wealthy.  The space-replacement works just the reverse: the total spaces are DIVIDED by 2 each time the macro says "replace every instance of space-space by space." 

 

I created a text string of about 120K, entirely spaces, and here's the text box display at the finish of the macro:

   Beginning text length = 121311
   Ending text length      = 1
   Spaces removed         = 121310

   Number of loops         = 17

 

2 to the power 17 is about 131K (131,072).  So if my starting text string was more than 131K, there would be 18 loops.  To test, I doubled my text string, and voila:

   Beginning text length = 242622
   Ending text length       = 1
   Spaces removed         = 242621

   Number of loops         = 18

 

If you DON'T use the "replace all instances" option of the ME command, then I believe the number of loops needed is (length - 1) as you suggest. 

 

 

//  
// Convert all multi-space strings to a single space -- starting with text in clipboard
Variable Set String %T1% from Clipboard
// Initialize counter for number of loops
Variable Set Integer %N20% to 0
// Save length of beginning text
Variable Set Integer %N1% from Length of Variable %T1%
// Repeat until there are no multi-space strings, then put the result back into the clipboard
Repeat Until %N99% <> %N99%
  If Variable %T1% contains "  "
    Replace "  " with " " in %T1%
    Variable Modify Integer: Inc (%N20%)
  Else
    Repeat Exit
  End If
Repeat End
Variable Modify String: Save %T1% to Clipboard
// Save length of ending text
Variable Set Integer %N2% from Length of Variable %T1%
// Display beginning and ending text lengths and number of spaces removed
Variable Modify Integer: %N3% = %N1% - %N2%
Text Box Display: Result
//  
Macro Return
// 

 

 

15 hours ago, terrypin said:

Too much else on my plate to spend serious time on it. But I’m wondering about another approach: replace all spaces with CRLF, then use one of the text processing loops to add a single space to the resultant ‘lines’.

I'm thinking Terry has us both beat.  Should be a two-line macro:

Replace all spaces with CRLF.

Variable Modify String [Strip CR/LF].

 

Bedtime now, but I'll try it tomorrow.

 

strip.JPG

 

EDIT: Well, so much for my midnight enthusiasm.  In the light of day, I can't see how it would work.

The following would ALMOST work:

1) Replace any existing CRLF by x'01'

2) Replace all spaces by CRLF

3) Strip all CRLF

4) Restore original existing CRLF by replacing x'01' by CRLF

But that would remove ALL spaces, and not leave one space where each string of spaces originally existed. 

No prize yet, Terry, unless you can see your way out of this. 🙄

 

EDIT AGAIN: Actually the above ALMOST works, if (step 2) all DOUBLE spaces are replaced by CRLF.  But some strings of spaces have an even number of spaces, and some have an odd number of spaces.  The above logic leaves odd-number strings with a single space, as desired.  But it completely removes even-number strings.  Have to find a way around that -- still thinking....  It runs pretty fast, though.  Have to go work in the garden.  I'll probably plant the flowers upside down if my mind is on this problem.
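
The odd/even problem is easy to reproduce. The Python below is a simplified sketch of the idea, not the macro itself: it uses a single marker character (`\x01`, assumed not to occur in the input) in place of CRLF, and `collapse_via_marker` is a hypothetical name.

```python
MARKER = "\x01"  # stand-in for CRLF; assumed absent from the input text


def collapse_via_marker(text):
    """Replace each double space with a marker, then strip all markers."""
    marked = text.replace("  ", MARKER)
    return marked.replace(MARKER, "")


odd_run = "A" + " " * 3 + "B"    # three spaces between A and B
even_run = "A" + " " * 4 + "B"   # four spaces between A and B

print(repr(collapse_via_marker(odd_run)))   # one leftover space survives
print(repr(collapse_via_marker(even_run)))  # every space is consumed
```

An odd run leaves one unpaired space behind, which happens to be the desired result, while an even run pairs up completely and disappears.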


Here is my latest version, which calculates the minimum number of times to loop instead of what we have been doing until now:

 

1. My first example always loops the maximum number of times needed to strip away extra spaces (Length of string - 1)

 

2. Your example checks the result each pass through the loop for evidence that the work is done: the result does not contain two spaces.

 

So this latest attempt tries to simplify the repeat loop!

 

To calculate the number of times to loop, I used Log Base 2 (Length of String). Since Macro Express does not have a Log function, I estimated the value by repeatedly halving the length until it was one or less. (I use a Decimal variable for this). When I tested with a string length of 242,622 (as you did), the calculated value was 18.
 

Although I did not measure execution speed, my impression is that this solution doesn't run much faster than my first solution, which is surprising: for a string of 100,000 characters, my first solution looped 99,999 times, and my new solution looped 17 times. I guess the overhead is not caused by looping, but by manipulating the string.

 

I hope my fake logarithm calculator is right! It's been years since I have thought about logarithms!

 

Variable Set String %a% from the clipboard contents
Variable Set String %b% to "%a%"
 
Variable Set Integer %LengthA% to the length of variable %a%
 
// Estimate the number of passes to strip out extra spaces
// Maximum number of passes is approximately Log Base 2 (Length of String)
// Estimate by repeatedly halving value of Length until it's 1 or less
Variable Set Integer %Count% to 0
Variable Set Decimal %PseudoLogBase2% to %LengthA%
Repeat Until %PseudoLogBase2% Is Less Than or Equal To "1"
  Variable Modify Decimal: %PseudoLogBase2% = %PseudoLogBase2% / 2
  Variable Modify Integer %Count%: Increment
End Repeat
 
Repeat Start (Repeat %Count% times)
  Variable Modify String: Replace "  " in %b% with " "
End Repeat
 
Variable Set Integer %LengthB% to the length of variable %b%
Variable Modify Integer: %Result% = %LengthA% - %LengthB%
Text Box Display: Spaces removed = %Result%
<VARIABLE SET STRING Option="\x02" Destination="%a%" NoEmbeddedVars="FALSE"/>
<VARIABLE SET STRING Option="\x00" Destination="%b%" Value="%a%" NoEmbeddedVars="FALSE"/>
<COMMENT/>
<VARIABLE SET INTEGER Option="\x0D" Destination="%LengthA%" Text_Variable="%a%"/>
<COMMENT/>
<COMMENT Value="Estimate the number of passes to strip out extra spaces"/>
<COMMENT Value="Maximum number of passes is approximately Log Base 2 (Length of String)"/>
<COMMENT Value="Estimate by repeatedly halving value of Length until it's 1 or less"/>
<VARIABLE SET INTEGER Option="\x00" Destination="%Count%" Value="0"/>
<VARIABLE SET DECIMAL Option="\x00" Destination="%PseudoLogBase2%" Value="%LengthA%"/>
<REPEAT UNTIL Variable="%PseudoLogBase2%" Condition="\x05" Value="1"/>
<VARIABLE MODIFY DECIMAL Option="\x03" Destination="%PseudoLogBase2%" Value1="%PseudoLogBase2%" Value2="2"/>
<VARIABLE MODIFY INTEGER Option="\x07" Destination="%Count%"/>
<END REPEAT/>
<COMMENT/>
<REPEAT START Start="1" Step="1" Count="%Count%" Save="TRUE" Variable="%Loops%"/>
<VARIABLE MODIFY STRING Option="\x0F" Destination="%b%" ToReplace="  " ReplaceWith=" " All="TRUE" IgnoreCase="FALSE" NoEmbeddedVars="FALSE"/>
<END REPEAT/>
<COMMENT/>
<VARIABLE SET INTEGER Option="\x0D" Destination="%LengthB%" Text_Variable="%b%"/>
<VARIABLE MODIFY INTEGER Option="\x01" Destination="%Result%" Value1="%LengthA%" Value2="%LengthB%"/>
<TEXT BOX DISPLAY Title="Spaces removed = %Result%" Content="{\\rtf1\\ansi\\ansicpg1252\\deff0\\deflang1033{\\fonttbl{\\f0\\fnil\\fcharset0 Tahoma;}{\\f1\\fnil Tahoma;}}\r\n\\viewkind4\\uc1\\pard\\lang4105\\f0\\fs20 Length = %LengthA%\r\n\\par Loops to strip out extra spaces = %Count%\r\n\\par Finish = [%b%]\r\n\\par Begin = [%a%]\\lang1033\\f1 \r\n\\par }\r\n" Left="110" Top="417" Width="1816" Height="285" Monitor="0" OnTop="TRUE" Keep_Focus="TRUE" Mode="\x00" Delay="0"/>
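
The halving estimate is simple to check outside Macro Express. This Python sketch mirrors the loop in the macro (halve a decimal value until it is 1 or less, counting the steps); `pseudo_log2` is a hypothetical name chosen to echo the `%PseudoLogBase2%` variable.

```python
def pseudo_log2(length):
    """Count how many halvings bring the value down to 1 or less."""
    count = 0
    value = float(length)
    while value > 1:
        value /= 2
        count += 1
    return count


for length in (1_000, 100_000, 242_622):
    print(length, pseudo_log2(length))
```

For any length greater than 1, the count is the base-2 logarithm rounded up to a whole number, which is why 242,622 gives 18.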

 


@rberq
 

Yep, I'm struggling too! Motivated by your midnight post (and rain cutting short my garden chores) I've just spent a frustrating hour, but no joy yet.

4 hours ago, acantor said:

I hope my fake logarithm calculator is right! It's been years since I have thought about logarithms!

Yes, appears to be correct, and by calculating the minimum number of times you MUST go through the repeat loop, you very nicely minimize execution time. 

Well .... not really minimize, because you will always loop enough times to handle the worst-case condition where the entire string, or most of it, consists of spaces.  Probably still faster than my version, which checks on every pass whether any double spaces still remain.  My version potentially saves passes through the "Replace" command, but at the expense of re-scanning the string each time to see if it's time to quit. 


Here are some thoughts on optimizing performance of this kind of script:

 

1. There appears to be a limit on the number of characters that 2020-era operating systems and hardware can handle when solving a problem like this: If my log calculator is right, it takes 17 passes to process a string of 100 thousand characters, 20 passes for a million characters, 30 passes for a billion characters, and 57 for 100 quadrillion (100,000,000,000,000,000).

 

(I checked an online log calculator and discovered that my homemade log estimator fails when the number is greater than 2 ^ 31. Oh well!)

 

2. But string manipulation seems to be computationally more intensive than looping, at least with Macro Express. There may not be enough time to process very long strings before the universe collapses back to its primordial beginnings.

 

3. So for practical purposes, one could repeat, say, 24 times. Then we could delete my log calculator and stop testing every pass through the loop.

 

4. This might be the best way to meet rberq's requirement for a solution with the fewest lines of code. Only nine lines:

 

 

 

Variable Set String %a% from the clipboard contents
Variable Set String %b% to "%a%"
Variable Set Integer %LengthA% to the length of variable %a%
 
Repeat Start (Repeat 24 times)
  Variable Modify String: Replace "  " in %b% with " "
End Repeat
 
Variable Set Integer %LengthB% to the length of variable %b%
Variable Modify Integer: %Result% = %LengthA% - %LengthB%
Text Box Display: Spaces removed = %Result%


<VARIABLE SET STRING Option="\x02" Destination="%a%" NoEmbeddedVars="FALSE"/>
<VARIABLE SET STRING Option="\x00" Destination="%b%" Value="%a%" NoEmbeddedVars="FALSE"/>
<VARIABLE SET INTEGER Option="\x0D" Destination="%LengthA%" Text_Variable="%a%"/>
<COMMENT/>
<REPEAT START Start="1" Step="1" Count="24" Save="FALSE" Variable="%Loops%"/>
<VARIABLE MODIFY STRING Option="\x0F" Destination="%b%" ToReplace="  " ReplaceWith=" " All="TRUE" IgnoreCase="FALSE" NoEmbeddedVars="FALSE"/>
<END REPEAT/>
<COMMENT/>
<VARIABLE SET INTEGER Option="\x0D" Destination="%LengthB%" Text_Variable="%b%"/>
<VARIABLE MODIFY INTEGER Option="\x01" Destination="%Result%" Value1="%LengthA%" Value2="%LengthB%"/>
<TEXT BOX DISPLAY Title="Spaces removed = %Result%" Content="{\\rtf1\\ansi\\ansicpg1252\\deff0\\deflang1033{\\fonttbl{\\f0\\fnil\\fcharset0 Tahoma;}{\\f1\\fnil Tahoma;}}\r\n\\viewkind4\\uc1\\pard\\lang4105\\f0\\fs20 Finish = [%b%]\r\n\\par Begin = [%a%]\\lang1033\\f1 \r\n\\par }\r\n" Left="110" Top="417" Width="1816" Height="285" Monitor="0" OnTop="TRUE" Keep_Focus="TRUE" Mode="\x00" Delay="0"/>

 

5. If there is an elegant RegEx solution, the contributors of this forum who know how to implement it have been silent (so far!)
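
Pass counts like those in point 1, and the reach of a fixed 24 passes in point 3, can be checked with integer arithmetic. The Python below is a sketch that assumes each replace-all pass halves a run of spaces (rounding up); `passes_for_run` is a hypothetical helper, and it avoids floating-point logarithms by using `int.bit_length`.

```python
def passes_for_run(run_length):
    """Passes needed to shrink one run of spaces down to a single space."""
    # For n >= 1, (n - 1).bit_length() equals the base-2 log of n, rounded up.
    return max(run_length - 1, 0).bit_length()


for n in (100_000, 1_000_000, 1_000_000_000, 10**17):
    print(f"{n:>26,} spaces -> {passes_for_run(n)} passes")

# Longest run that a fixed 24 passes is guaranteed to collapse.
print(f"24 passes cover runs of up to {2**24:,} spaces")
```

On that assumption, a fixed count of 24 is safe for any run of up to roughly 16.7 million consecutive spaces, far beyond the 2K limit in the original challenge.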


"5. If there is an elegant RegEx solution, the contributors of this forum who know how to implement it have been silent (so far!)"

 

I'd assumed it had to be done entirely in MXPro? I'm off to bed shortly in the hope that I'll dream up the bit of my approach that's evading me. But if a paste into my text editor is not breaking the rules I'll see if Regex can solve it.

 

EDIT:

OK, it came to me a few minutes later. This Regex should do it
Find
([^ ])([ ]+)
and replace with
\1

 

(That's \1 plus a space.)

 

In English: Find string #1 that is 'not a space', followed by string #2 that is 'one or more spaces', and replace it with string #1 followed by one space.
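
The same find-and-replace can be tried with Python's `re` module. This is a sketch for experimenting, not a TextPad or Macro Express feature; `collapse_after_nonspace` is a hypothetical name.

```python
import re


def collapse_after_nonspace(text):
    """Apply the find/replace pair above using Python's regex engine."""
    # Group 1 is one non-space character; group 2 is the run of spaces
    # that follows it. The replacement keeps group 1 and adds one space.
    return re.sub(r"([^ ])([ ]+)", r"\1 ", text)


print(repr(collapse_after_nonspace("A" + " " * 3 + "B" + " " * 6 + "C")))
print(repr(collapse_after_nonspace(" " * 3 + "A  B")))
```

One thing the sketch makes visible: a run of spaces at the very start of the text has no non-space character in front of it, so this pattern leaves it unchanged.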

 

But the purist in me still wants it done in-house. 😉

 

Tomorrow...

 

 

4 hours ago, acantor said:

This might be the best way to meet rberq's requirement for a solution with the fewest lines of code. Only nine lines

 

We can cut your nine lines down to eight, if we eliminate
Variable Set String %b% to "%a%"
and then do all operations on %a%.  That should help us finish before the universe either collapses or squirts through a black hole into somebody else's universe.  

 

I tried all day to figure out how to use Terry's CRLF idea, just because I wanted so badly to use the Strip feature.  Couldn't come up with a reasonable way to deal with even-length space strings.

 

I did find that my version runs considerably faster if I precede my Repeat loop with the following series -- but only if it's a VERY large string and contains VERY long strings of contiguous blanks.  I realize this takes me out of contention for the fewest-statements prize, but you've now got me worried about the cosmic deadline so I feel it's a worthwhile addition.

// Initial removal of long space strings
Replace "                              " with " " in %T1%
Replace "                             " with " " in %T1%
Replace "                            " with " " in %T1%
Replace "                           " with " " in %T1%
Replace "                          " with " " in %T1%
Replace "                         " with " " in %T1%
Replace "                        " with " " in %T1%
Replace "                       " with " " in %T1%
Replace "                      " with " " in %T1%
Replace "                     " with " " in %T1%
Replace "                    " with " " in %T1%
Replace "                   " with " " in %T1%
Replace "                  " with " " in %T1%
Replace "                 " with " " in %T1%
Replace "                " with " " in %T1%
Replace "               " with " " in %T1%
Replace "              " with " " in %T1%
Replace "             " with " " in %T1%
Replace "            " with " " in %T1%
Replace "           " with " " in %T1%
Replace "          " with " " in %T1%
Replace "         " with " " in %T1%
Replace "        " with " " in %T1%
Replace "       " with " " in %T1%
Replace "      " with " " in %T1%
Replace "     " with " " in %T1%
Replace "    " with " " in %T1%
Replace "   " with " " in %T1%
Replace "  " with " " in %T1%
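
The effect of this pre-pass can be measured in a small Python experiment. It is a sketch, assuming `str.replace` behaves like the replace-all option; `prepass` and `count_passes` are hypothetical names, and the sample string is made up for illustration.

```python
def count_passes(text):
    """Collapse double spaces with early exit; return (result, passes)."""
    passes = 0
    while "  " in text:
        text = text.replace("  ", " ")
        passes += 1
    return text, passes


def prepass(text, longest=30):
    """Replace runs of `longest` spaces, then longest - 1, ... down to 2."""
    for width in range(longest, 1, -1):
        text = text.replace(" " * width, " ")
    return text


sample = "x" + " " * 5000 + "y" + " " * 7 + "z"
_, passes_plain = count_passes(sample)
final, passes_after = count_passes(prepass(sample))
print(f"Loop passes without pre-pass: {passes_plain}")
print(f"Loop passes after pre-pass:   {passes_after}")
print(f"Result: [{final}]")
```

The pre-pass can only shorten each run, so the loop that follows never needs more passes than it would have needed on its own.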

 




terrypin said:

But the purist in me still wants it done in-house.

 

I think I mentioned on another thread that I find it satisfying to work within the limitations of Macro Express. I learn a lot by pushing Macro Express to its limits. So when I discovered there were no logarithms in Macro Express, I was strongly motivated to come up with my own log calculator.

 

I've had reason to use RegEx in at least three applications, and I'm fairly confident I could figure out how to implement it with the Macro Express External Script command. (Did you know that Microsoft Word supports RegEx searches? However, Microsoft doesn't call it that.) But when I'm working with Macro Express, like you, Terry, I prefer to do everything "in-house."


I ran that Regex in my text editor, TextPad, on many files varying in size, number of lines, words and successive spaces. Most were almost immediate in response. The number of lines seemed to be the dominant factor. But note also that I added spaces to only a single line in each file, although some of them also already had longish sequences of spaces. Here are four example results:

#1 7 KB,  15 lines,  6.7K chars,  most spaces 6.4K = Under 1 s
#2 289 KB, 60 lines, 289K chars,  most spaces 6.4K = Under 1 s
#3 111 KB, 282 lines, 15K chars,  most spaces 101K  = Under 1 s
#4 471 KB, 11K lines, 443K chars,  most spaces 101K  = 5 s

Those timings would obviously increase if I used a macro to apply the Replace command.

 

Haven't quite given up on my 'MXPro only' method, but close to doing so!

 

EDIT: Someone I contacted with real regex skill pointed out a much simpler setting:

Find: _+
Replace with:_

(where '_' is a space.)
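
In Python, the simpler pattern also covers the counting requirement from the original challenge. This is a sketch using the standard `re` module; `collapse_spaces` is a hypothetical name.

```python
import re


def collapse_spaces(text):
    """Collapse each run of spaces to one space; report how many were removed."""
    collapsed = re.sub(r" +", " ", text)
    return collapsed, len(text) - len(collapsed)


result, removed = collapse_spaces(" " * 3 + "A  B")
print(f"[{result}] spaces removed = {removed}")
```

Because this pattern needs no preceding character, it also collapses a run at the very start of the text. Writing it as ` {2,}` would give the same result while skipping single spaces that need no change.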

 

 


I haven't given up on going solo with Macro Express, either.

 

After discovering my logarithmic calculator breaks with inputs of very large numbers, I decided to opt for a solution that won't fail, even if the macro needs cosmic time scales to complete. :)

 

So I came up with this. Instead of comparing the before and after strings at the start of each pass, this version compares the before and after string lengths. So for an initial string of 100k characters, it compares six-digit, three-digit, etc. integers instead of 100k, 50k, etc. long strings. Not sure this is really an efficiency gain. Maybe I'll test this one day.

Variable Set String %a% from the clipboard contents
Variable Set String %b% to "%a%"
Variable Set Integer %LengthA% to the length of variable %a%
 
Repeat Until %LengthB% Equals "%LengthC%"
  Variable Set Integer %LengthB% to the length of variable %b%
  Variable Modify String: Replace "  " in %b% with " "
  Variable Set Integer %LengthC% to the length of variable %b%
End Repeat
 
Variable Set Integer %LengthB% to the length of variable %b%
Variable Modify Integer: %Result% = %LengthA% - %LengthB%
Text Box Display: Spaces removed = %Result%

<VARIABLE SET STRING Option="\x02" Destination="%a%" NoEmbeddedVars="FALSE"/>
<VARIABLE SET STRING Option="\x00" Destination="%b%" Value="%a%" NoEmbeddedVars="FALSE"/>
<VARIABLE SET INTEGER Option="\x0D" Destination="%LengthA%" Text_Variable="%a%"/>
<COMMENT/>
<REPEAT UNTIL Variable="%LengthB%" Condition="\x00" Value="%LengthC%"/>
<VARIABLE SET INTEGER Option="\x0D" Destination="%LengthB%" Text_Variable="%b%"/>
<VARIABLE MODIFY STRING Option="\x0F" Destination="%b%" ToReplace="  " ReplaceWith=" " All="TRUE" IgnoreCase="FALSE" NoEmbeddedVars="FALSE"/>
<VARIABLE SET INTEGER Option="\x0D" Destination="%LengthC%" Text_Variable="%b%"/>
<END REPEAT/>
<COMMENT/>
<VARIABLE SET INTEGER Option="\x0D" Destination="%LengthB%" Text_Variable="%b%"/>
<VARIABLE MODIFY INTEGER Option="\x01" Destination="%Result%" Value1="%LengthA%" Value2="%LengthB%"/>
<TEXT BOX DISPLAY Title="Spaces removed = %Result%" Content="{\\rtf1\\ansi\\ansicpg1252\\deff0\\deflang1033{\\fonttbl{\\f0\\fnil\\fcharset0 Tahoma;}{\\f1\\fnil Tahoma;}}\r\n\\viewkind4\\uc1\\pard\\lang4105\\f0\\fs20 Finish = [%b%]\r\n\\par Begin = [%a%]\\lang1033\\f1 \r\n\\par }\r\n" Left="110" Top="417" Width="1816" Height="285" Monitor="0" OnTop="TRUE" Keep_Focus="TRUE" Mode="\x00" Delay="0"/>
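
The same idea can be sketched outside Macro Express. The Python below is only an illustration, not a translation of the macro: it assumes that Python's `str.replace` behaves like the macro's replace-all step (non-overlapping matches, left to right), it checks the two lengths after each pass, and the function name and the extra pass counter are hypothetical additions.

```python
def remove_double_spaces(a: str):
    """Repeat the double-space replace until the string length stops changing.

    Returns the cleaned string, the number of spaces removed,
    and the number of passes made (including the final no-change pass).
    """
    b = a
    passes = 0
    while True:
        length_before = len(b)      # plays the role of %LengthB%
        b = b.replace("  ", " ")    # one pass over all non-overlapping double spaces
        passes += 1
        if len(b) == length_before:  # plays the role of %LengthC%
            break
    return b, len(a) - len(b), passes

cleaned, removed, passes = remove_double_spaces("a    b")
print(repr(cleaned), removed, passes)
```

Each pass roughly halves every run of spaces, so the number of passes grows slowly with the length of the longest run, plus one final pass that changes nothing and ends the loop.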

Finally, a challenge within the challenge: I realize now that there is a way to reduce the number of lines of every example we have posted by one. What is that method? (Hint: It's not a particularly practical method so you might not use it in practice. But it does work.)

 


I'm reluctantly giving up on my method for an MXPro only solution; life's too short!

 

Alan: Am I right that your current solution is for a single line of text? I've been assuming multiple lines; essentially a file of text.

 

3 hours ago, terrypin said:

Those timings would obviously increase if I used a macro to apply the Replace command.

Not necessarily. My macro runs in about a second on a 91K file without too many long runs of spaces. It runs in three seconds on a 335K file that is entirely spaces. When the ME script runs, the hard work must be done by a pre-compiled routine within ME, so it's not necessarily slower than a RegEx command, which likewise (I assume) uses pre-compiled logic.
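
A comparison like this is easy to repeat in other environments. As a rough sketch only, the Python below times a repeated double-space replace against a single regex pass on a string made entirely of spaces. Everything here is an assumption for illustration: the function names are hypothetical, the 335,000-character string merely stands in for the all-spaces file, and timings from Python say nothing about how Macro Express or TextPad would perform.

```python
import re
import time

def loop_replace(text: str) -> str:
    # Repeat the double-space replace until the length stops changing.
    while True:
        before = len(text)
        text = text.replace("  ", " ")
        if len(text) == before:
            return text

def regex_replace(text: str) -> str:
    # Single pass: every run of spaces becomes one space.
    return re.sub(r" +", " ", text)

def time_call(func, text: str) -> float:
    # Wall-clock time for one call, in seconds.
    start = time.perf_counter()
    func(text)
    return time.perf_counter() - start

all_spaces = " " * 335_000  # hypothetical stand-in for the all-spaces file
for func in (loop_replace, regex_replace):
    print(f"{func.__name__}: {time_call(func, all_spaces):.4f} s")
```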

14 minutes ago, terrypin said:

Alan: Am I right that your current solution is for a single line of text? I've been assuming multiple lines; essentially a file of text.

 

 

rberq's challenge is to eliminate double spaces from the contents of the clipboard.

 

To test, I've been using single lines consisting of alpha-numerics and spaces. But there is nothing in the articulation of the challenge that would forbid multiple lines of text in the clipboard.
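
To illustrate why multiple lines should not matter, here is a small Python sketch. It assumes the clipboard text uses ordinary line-break characters and that only the space character is being matched; `squeeze_spaces` is a hypothetical name. Whether Macro Express treats line breaks the same way is an assumption worth testing.

```python
import re

def squeeze_spaces(text: str) -> str:
    """Collapse runs of two or more spaces; line breaks are left alone."""
    return re.sub(r" {2,}", " ", text)

clipboard_text = "first   line\nsecond    line  here\n\nfourth line"
cleaned = squeeze_spaces(clipboard_text)

# The number of line breaks is unchanged because the pattern matches spaces only.
print(cleaned.count("\n") == clipboard_text.count("\n"))
print(cleaned)
```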
