Macro Express Forums

Extracting Text Between Tags


HeyJim


Thanks to some help on this board before, I was able to write a macro that has saved me a ton of work and more time than I could ever accurately estimate. It was my first.

 

My next little project involves looking at the source code of a web page, extracting everything that falls between two tags, stripping the <b> tags out, and pasting or saving the results to a word processor document.

 

For instance, I think what I want the macro to do is go through a page of source code and, when it finds onMouseOut="cs()">Jumbo Jumping <b>Frogs</b></a>, treat onMouseOut="cs()"> as the flag that it should copy the text up to </a>, strip out the bold tags, and return "Jumbo Jumping Frogs".

 

I really don't have a clue as to the concept involved here, or even how to place the 100+ resulting phrases per web page into a document or spreadsheet. Hmmm. Is the easiest way to save the results as a CSV file?

 

Any guidance on this will be really appreciated.


HeyJim -

 

What you want to do seems fairly simple and can be done within a loop by using the built-in Macro Express string manipulation commands. For instance, you would set an integer to the position of the text "onMouseOut". Then you would delete the string up to that point. Next you would track down the ending tag "</a>". This time you would save that portion of the string. The Replace Substring command is ideal for replacing the bold tags with "nothing".

 

This sounds like a lot and that it would take forever to process this kind of string, but trust me, it will work faster than you can possibly imagine.
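
The steps above can be checked outside Macro Express. Here is a minimal sketch in Python, not Macro Express code: the function name extract_between and the sample string are made up for illustration, and the start flag is assumed to be onMouseOut="cs()"> as in the original question.

```python
def extract_between(source, start_flag, end_tag, strip_tags=("<b>", "</b>")):
    """Collect the text between each start_flag and the next end_tag."""
    results = []
    remaining = source
    while True:
        start = remaining.find(start_flag)      # position of the flag, -1 if absent
        if start == -1:
            break
        # Delete everything up to and including the flag.
        remaining = remaining[start + len(start_flag):]
        end = remaining.find(end_tag)           # track down the ending tag
        if end == -1:
            break
        phrase = remaining[:end]                # save that portion
        for tag in strip_tags:                  # replace the bold tags with nothing
            phrase = phrase.replace(tag, "")
        results.append(phrase)
        # Drop the saved phrase and the ending tag before searching again.
        remaining = remaining[end + len(end_tag):]
    return results


# Hypothetical sample source, for illustration only.
sample = ('junk onMouseOut="cs()">Jumbo Jumping <b>Frogs</b></a> more junk '
          'onMouseOut="cs()">Tiny <b>Toads</b></a>')
phrases = extract_between(sample, 'onMouseOut="cs()">', "</a>")
print(phrases)
```

Each pass shortens the remaining string, so the loop ends as soon as the flag can no longer be found.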


Here's an example derived from Floyd's type of approach.

Wouldn't you be doing it in a "Repeat" loop, defining each string to find, etc., as you go?

String Splitting

Variable Set String %T98% from Clipboard
// Set web source to clipboard, then set variable from clipboard; here just example
Variable Set String %T98% "rubbish..."onMouseOut="cs()">Jumbo Jumping <b>Frogs1</b></a>gsdsdghumbusfgs"onMouseOut="cs()">Jumbo Jumping <b>Frogs2</b></a>....."onMouseOut="cs()">Jumbo Jumping <b>Frogs3</b></a>"
Variable Set String %T9% ""onMouseOut="cs()"
Variable Set Integer %N3% from Length of Variable %T9%
Variable Modify Integer: Inc (%N3%)
Variable Set String %T10% "</a>"
Variable Modify String: Copy %T98% to %T97%
Replace "<b>" with "" in %T97%
Replace "</b>" with "" in %T97%
Repeat Until %N1% = 0
  Variable Set Integer %N1% from Position of Text in Variable %T97%   // position of %T9%
  If Variable %N1% > 0
    Variable Modify Integer: %N1% = %N1% + %N3%   // reaches the ">" that follows the flag
    Variable Modify String: Delete Part of %T97%   // from position 1, %N1% characters
    Text Box Display:
    Variable Set Integer %N1% from Position of Text in Variable %T97%   // position of %T10%
    Variable Modify Integer: Dec (%N1%)
    Variable Modify String: Copy Part of %T97% to %T1%   // from position 1, %N1% characters
    Variable Modify Integer: %N1% = %N1% + 3
    Variable Modify String: Delete Part of %T97%   // from position 1, %N1% characters
    Variable Modify String: Append "%T1%|" to %T2%
  End If
Repeat End
Variable Set Integer %N2% from Length of Variable %T2%
Variable Modify Integer: Dec (%N2%)
Variable Modify String: Copy Part of %T2% to %T1%   // from position 1, %N2% characters; drops the trailing "|"
Text Box Display:

<DIS:<TVAR2:98:03:><REM2:Set web source to clipboard, then set variable from clipboard; here just example><TVAR2:98:01:rubbish..."onMouseOut="cs()">Jumbo Jumping <b>Frogs1</b></a>gsdsdghumbusfgs"onMouseOut="cs()">Jumbo Jumping <b>Frogs2</b></a>....."onMouseOut="cs()">Jumbo Jumping <b>Frogs3</b></a>><TVAR2:09:01:"onMouseOut="cs()><IVAR2:03:12:9><NMVAR:08:03:0:0000001:0:0000000><TVAR2:10:01:</a>><TMVAR2:09:97:98:000:000:><TMVAR2:21:97:01:001:000:<b>><TMVAR2:21:97:01:001:000:</b>><REP3:08:000001:000002:0001:0:01:0><IVAR2:01:13:97:%T9%><IFVAR2:2:01:4:0><NMVAR:01:01:1:0000001:1:0000003><TMVAR2:11:97:00:001:N01:><DIS:<TBOX4:T:1:000075000252000899000337:000:T98=T%98%
T97=%T97%
T2=%T2%
T1=%T1%><IVAR2:01:13:97:%T10%><NMVAR:09:01:0:0000001:0:0000000><TMVAR2:10:01:97:001:N01:><NMVAR:01:01:1:0000001:2:0000003><TMVAR2:11:97:00:001:N01:><TMVAR2:07:02:00:000:000:%T1%|><ENDIF><ENDREP><IVAR2:02:12:2><NMVAR:09:02:0:0000001:0:0000000><TMVAR2:10:01:02:001:N02:><TBOX4:T:1:000075000252000899000337:000:T98=%T98%
T97=%T97%
T2=%T2%
T1=%T1%>
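
The macro above leaves the phrases in %T1% as one pipe-delimited string. To get them into a spreadsheet, as the original question asked, one option is to write one phrase per row of a CSV file. Here is a minimal sketch in Python rather than Macro Express: pipe_to_csv is a hypothetical helper, and the sample string stands in for what %T1% would be expected to hold for the example source.

```python
import csv
import io


def pipe_to_csv(pipe_delimited, out):
    """Write each '|'-separated phrase as its own CSV row."""
    writer = csv.writer(out)
    for phrase in pipe_delimited.split("|"):
        if phrase:                      # skip empty fields, e.g. from a trailing "|"
            writer.writerow([phrase])


# Stand-in for the contents of %T1% (assumed, based on the example source).
t1 = "Jumbo Jumping Frogs1|Jumbo Jumping Frogs2|Jumbo Jumping Frogs3"
buffer = io.StringIO()                  # in-memory file, so nothing is written to disk
pipe_to_csv(t1, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file instead, pass a file opened with newline="" (for instance a hypothetical phrases.csv) in place of the StringIO buffer. The csv module takes care of quoting any phrase that happens to contain a comma.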
