lemming Posted April 4, 2006

This macro will:
1. Load the source HTML of a Google search result page,
2. extract all URLs from the HTML,
3. filter out Google-related links (e.g. About Google, Advanced Search, cached pages, etc.),
4. and display a list of the URLs it managed to extract.

This macro only works with Firefox, and I've set the scope to Google search result pages only. You can easily change this, of course. I stopped using IE months ago, so I'll leave it to the holdouts to modify this script for IE or other browsers. It should be really trivial, though.

This script has no error checking, and may explode if it hits bad HTML. However, this is usually not a problem for machine-generated HTML like Google's results. You might also need to tweak the delays for slower machines.

To use:
1. Search for anything in Google, using Mozilla Firefox.
2. Wait for the page to finish loading completely.
3. Hit Ctrl-Tab.
4. The URL extraction results should be displayed in under 2 seconds.
The URL extraction works by finding pairs of href="http and "> , which indicate a clickable link, for example: <a href="http://www.macros.com/">

-Lemming

Clear Text Variables: All
Clipboard Empty
// Define CR/LF
Variable Set %T95% to ASCII Char of 13
Variable Set %T96% to ASCII Char of 10
Variable Set String %T95% "%T95%%T96%"
// obtain source code
Text Type: <CONTROL>u
Wait For Window Title: "view-source"
Delay 20 Milliseconds
Text Type: <CONTROL>a
Delay 250 Milliseconds
Clipboard Copy
Delay 100 Milliseconds
Variable Set String %T1% from Clipboard
Window Close: "view-source"
Delay 10 Milliseconds
// Process urls
Variable Set String %T99% "CONTINUE"
Repeat Until %T99% = "STOP"
  Variable Set String %T98% "NORMAL LINK"
  // Look for href="http
  Variable Set Integer %N1% from Position of Text in Variable %T1%
  If Variable %N1% = 0
    Variable Set String %T99% "STOP"
  End If
  // Delete everything up till first http
  Variable Modify Integer: %N1% = %N1% + 5
  Variable Modify String: Delete Part of %T1%
  // Look for ">
  Variable Set Integer %N2% from Position of Text in Variable %T1%
  // calc length of url
  Variable Modify Integer: %N3% = %N2% - 1
  // copy url
  Variable Modify String: Copy Part of %T1% to %T2%
  // Filter out Google links
  If Variable %T2% contains "google.com"
    // Indicate this is a Google link
    Variable Set String %T98% "GOOGLE LINK"
  End If
  If Variable %T2% contains "q=cache"
    // Indicate this is a Google link
    Variable Set String %T98% "GOOGLE LINK"
  End If
  If Variable %T98% = "GOOGLE LINK"
    // don't add to list
  Else
    // append url to T3, with CRLF
    Variable Set String %T3% "%T3%%T2%%T95%"
  End If
Repeat End
// display results
Text Box Display: URLs obtained from Google

parse_links_v0.2.mex
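For readers who don't have Macro Express, the same scan can be sketched in Python. This is only an illustration of the technique described above (find href="http, cut at the next "> , drop anything containing google.com or q=cache), not a translation of the attached .mex file. The function name extract_urls and its skip parameter are made up for this example. One deliberate difference: the sketch leaves the loop as soon as no further link is found, whereas the listing as shown sets the STOP flag and still finishes that pass.

```python
def extract_urls(html, skip=("google.com", "q=cache")):
    """Return http(s) link targets found in html, minus filtered ones.

    Hypothetical helper: scans for href="http ... "> pairs the same
    way the macro does, using plain string searches.
    """
    start_marker = 'href="http'
    end_marker = '">'
    urls = []
    pos = 0
    while True:
        start = html.find(start_marker, pos)
        if start == -1:
            break  # no more links; stop before doing any further work
        start += len('href="')  # skip past href=" so the url starts at http
        end = html.find(end_marker, start)
        if end == -1:
            break  # unterminated tag: give up rather than return junk
        url = html[start:end]
        pos = end + len(end_marker)
        # Filter out Google's own links and cached-page links
        if not any(fragment in url for fragment in skip):
            urls.append(url)
    return urls


if __name__ == "__main__":
    page = (
        '<a href="http://www.macros.com/">Macros</a> '
        '<a href="http://www.google.com/about">About Google</a>'
    )
    print("\n".join(extract_urls(page)))
```

Like the macro, this assumes every link ends in "> right after the URL. A link with extra attributes after href would come out with those attributes glued on, so treat it as a quick hack for known-good pages.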
MORG22 Posted January 29, 2009

Sorry, but how do you use this? I need it and I'm new to programming.
Cory Posted April 22, 2009

Using a downloader app to dump a raw HTML file to disk works well too. Then one just has to parse the file.
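A rough sketch of that save-then-parse approach in Python, assuming the page has already been saved to disk by whatever downloader you use. The names LinkCollector and urls_from_file are invented for this example, and it swaps the string search for the standard library's html.parser, which copes with extra attributes on the link tag. The google.com and q=cache filters are carried over from the macro earlier in the thread.

```python
from html.parser import HTMLParser
from pathlib import Path


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects absolute http(s) hrefs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # value is None for attributes written without a value
            if name == "href" and value and value.startswith("http"):
                self.urls.append(value)


def urls_from_file(path, skip=("google.com", "q=cache")):
    """Parse a saved HTML file and return its links, minus filtered ones."""
    html = Path(path).read_text(encoding="utf-8", errors="replace")
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return [u for u in collector.urls if not any(s in u for s in skip)]
```

Calling urls_from_file("results.html") (the file name is just a placeholder) returns a list you can print or write out one URL per line.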