Macro Express Forums

MORG22

Posts posted by MORG22

  1. This macro will:

    1. load the HTML source of a Google search result page,

    2. extract all URLs from the HTML,

    3. filter out Google-related links (e.g. About Google, Advanced Search, cached pages, etc.),

    4. and display a list of the URLs it managed to extract.

     

    This macro works only with Firefox, and I've set its scope to Google search result pages only. You can easily change this, of course.

     

    I stopped using IE months ago, so I'll leave it to the holdouts to modify this script for IE or other browsers. It should be trivial, though.

     

    This script has no error checking and may explode if it hits bad HTML. However, this is usually not a problem for machine-generated HTML like Google's results. You might also need to increase the delays on slower machines.

     

    To use:

     

    1. Search for anything in Google, using Mozilla Firefox.

    2. Wait for the page to finish loading completely.

    3. Hit Ctrl-Tab.

    4. The extracted URLs should be displayed in under 2 seconds.

     

    The URL extraction works by finding pairs of href="http and "> , which together mark a clickable link. For example:

    <a href="http://www.macros.com/">
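As a rough illustration of the same idea outside Macro Express, here is a minimal Python sketch of the pair-finding and the Google filter. It is not part of the macro; the function names `extract_urls` and `is_google_link` and the sample HTML are made up for this example, and like the macro it does no error checking.

```python
def extract_urls(html: str) -> list[str]:
    """Collect the text between each href="http ... "> pair, in order."""
    start_marker = 'href="http'
    end_marker = '">'
    urls = []
    pos = 0
    while True:
        start = html.find(start_marker, pos)
        if start == -1:
            break  # no more links
        # Skip past href=" so the URL begins at "http".
        url_start = start + len('href="')
        end = html.find(end_marker, url_start)
        if end == -1:
            break  # unterminated link; give up rather than guess
        urls.append(html[url_start:end])
        pos = end + len(end_marker)
    return urls


def is_google_link(url: str) -> bool:
    """Same two checks as the macro: Google's own pages and cached copies."""
    return "google.com" in url or "q=cache" in url


# Hypothetical sample input, just for demonstration.
sample = (
    '<a href="http://www.macros.com/">Macros</a> '
    '<a href="http://www.google.com/advanced_search">Advanced</a> '
    '<a href="/relative">ignored, does not start with http</a>'
)

results = [u for u in extract_urls(sample) if not is_google_link(u)]
print("\r\n".join(results))  # one URL per line, joined with CR/LF
```

Note that because the end marker is "> rather than just ", a link with extra attributes after the href (say, a class) will come out with those attributes still attached. The macro has the same limitation.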

     

    -Lemming

     

    Clear Text Variables: All
    Clipboard Empty
    // Define CR/LF
    Variable Set %T95% to ASCII Char of 13
    Variable Set %T96% to ASCII Char of 10
    Variable Set String %T95% "%T95%%T96%"
    // obtain source code
    Text Type: <CONTROL>u
    Wait For Window Title: "view-source"
    Delay 20 Milliseconds
    Text Type: <CONTROL>a
    Delay 250 Milliseconds
    Clipboard Copy
    Delay 100 Milliseconds
    Variable Set String %T1% from Clipboard
    Window Close: "view-source"
    Delay 10 Milliseconds
    // Process urls
    Variable Set String %T99% "CONTINUE"
    Repeat Until %T99% = "STOP"
      Variable Set String %T98% "NORMAL LINK"
      // Look for  href="http
      Variable Set Integer %N1% from Position of Text in Variable %T1%
      If Variable %N1% = 0
        // No more links: stop, and skip the extraction steps below
        Variable Set String %T99% "STOP"
      Else
        // Delete everything up to the first http
        Variable Modify Integer: %N1% = %N1% + 5
        Variable Modify String: Delete Part of %T1%
        // Look for ">
        Variable Set Integer %N2% from Position of Text in Variable %T1%
        // calc length of url
        Variable Modify Integer: %N3% = %N2% - 1
        // copy url
        Variable Modify String: Copy Part of %T1% to %T2%
        // Filter out Google links
        If Variable %T2% contains "google.com"
          // Indicate this is a Google link
          Variable Set String %T98% "GOOGLE LINK"
        End If
        If Variable %T2% contains "q=cache"
          // Indicate this is a Google link (cached page)
          Variable Set String %T98% "GOOGLE LINK"
        End If
        If Variable %T98% = "GOOGLE LINK"
          // don't add to list
        Else
          // append url to T3, with CRLF
          Variable Set String %T3% "%T3%%T2%%T95%"
        End If
      End If
    Repeat End
    // display results
    Text Box Display: URLs obtained from Google

    Sorry, but how do you use this? I need it and am new to programming.
