Macro Express Forums

Extract All Links From A Google Search



This macro will:

1. load the source HTML of a Google search results page,

2. extract all URLs from the HTML,

3. filter out Google-related links (e.g. About Google, Advanced Search, cached pages, etc.),

4. and display the list of URLs it managed to extract.
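
For readers who don't have Macro Express handy, steps 3 and 4 can be sketched roughly in Python. This is only an illustration, not the macro itself: the two substrings tested ("google.com" and "q=cache") are the ones the macro listing checks for, while the function names and the sample URLs are made up for the example.

```python
# Substrings that mark a link as Google's own (the same two the macro tests for).
GOOGLE_MARKERS = ("google.com", "q=cache")


def is_google_link(url):
    """Return True if the url contains any of the Google markers."""
    return any(marker in url for marker in GOOGLE_MARKERS)


def keep_external(urls):
    """Drop Google-related links and keep everything else, in order."""
    return [url for url in urls if not is_google_link(url)]


# Hypothetical sample data, standing in for urls pulled from a results page.
found = [
    "http://www.macros.com/",
    "http://www.google.com/intl/en/about.html",
    "http://cache.example/search?q=cache:abc",
]

# Join with CR/LF, as the macro does before showing its text box.
print("\r\n".join(keep_external(found)))
```

A plain substring test like this is crude: any result whose address happens to contain "google.com" is dropped as well.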

 

This macro only works with Firefox, and I've set its scope to Google search result pages only. You can easily change this, of course.

 

I stopped using IE months ago, so I'll leave it to the holdouts to modify this script for IE or other browsers. It should be really trivial, though.

 

This script has no error-checking, and may explode if it hits bad HTML. However, this is usually not a problem for machine-generated HTML like Google's results. You might also need to increase the delays on slower machines.

 

To use:

 

1. Search for anything in Google, using Mozilla Firefox.

2. Wait for the page to finish loading completely.

3. Hit Ctrl-Tab.

4. The extracted URLs should be displayed in under 2 seconds.

 

The URL extraction works by finding pairs of href="http and "> , which together indicate a clickable link. For example:

<a href="http://www.macros.com/">
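
The same pair-finding idea can be sketched in Python, for anyone who wants to try it outside Macro Express. This is a rough equivalent under my own assumptions, not a translation of the macro: the function name and the sample string are invented, and it scans forward with a position index instead of deleting the front of the text as the macro does.

```python
def extract_urls(html):
    """Collect the text between each href="http marker and the next ">."""
    start_marker = 'href="'
    end_marker = '">'
    urls = []
    pos = 0
    while True:
        # Find the next link whose target starts with http.
        start = html.find(start_marker + "http", pos)
        if start == -1:
            break  # no more links
        url_start = start + len(start_marker)  # skip past href="
        end = html.find(end_marker, url_start)
        if end == -1:
            break  # unterminated tag: stop rather than copy garbage
        urls.append(html[url_start:end])
        pos = end
    return urls


# Hypothetical sample: one absolute link and one relative link.
sample = '<p><a href="http://www.macros.com/">Macros</a> <a href="/local">x</a></p>'
print(extract_urls(sample))
```

Relative links are skipped because they don't start with http. A tag that carries more attributes after the href will drag them into the result, since the sketch cuts only at the first "> it finds.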

 

-Lemming

 

Clear Text Variables: All
Clipboard Empty
// Define CR/LF
Variable Set %T95% to ASCII Char of 13
Variable Set %T96% to ASCII Char of 10
Variable Set String %T95% "%T95%%T96%"
// obtain source code
Text Type: <CONTROL>u
Wait For Window Title: "view-source"
Delay 20 Milliseconds
Text Type: <CONTROL>a
Delay 250 Milliseconds
Clipboard Copy
Delay 100 Milliseconds
Variable Set String %T1% from Clipboard
Window Close: "view-source"
Delay 10 Milliseconds
// Process urls
Variable Set String %T99% "CONTINUE"
Repeat Until %T99% = "STOP"
 Variable Set String %T98% "NORMAL LINK"
 // Look for  href="http
 Variable Set Integer %N1% from Position of Text in Variable %T1%
 If Variable %N1% = 0
   // No more links: stop, and skip the extraction steps
   Variable Set String %T99% "STOP"
 Else
   // Delete everything up to the first http
   Variable Modify Integer: %N1% = %N1% + 5
   Variable Modify String: Delete Part of %T1%
   // Look for ">
   Variable Set Integer %N2% from Position of Text in Variable %T1%
   // Calculate length of url
   Variable Modify Integer: %N3% = %N2% - 1
   // Copy url
   Variable Modify String: Copy Part of %T1% to %T2%
   // Filter out Google links
   If Variable %T2% contains "google.com"
     // Indicate this is a Google link
     Variable Set String %T98% "GOOGLE LINK"
   End If
   If Variable %T2% contains "q=cache"
     // Indicate this is a cached-page link
     Variable Set String %T98% "GOOGLE LINK"
   End If
   If Variable %T98% = "GOOGLE LINK"
     // Don't add to list
   Else
     // Append url to T3, with CRLF
     Variable Set String %T3% "%T3%%T2%%T95%"
   End If
 End If
Repeat End
// display results
Text Box Display: URLs obtained from Google

parse_links_v0.2.mex


  • 2 years later...

Sorry, but how do you use this? I need it and am new to programming.
