sorlov Posted January 18, 2008

Greetings! I am new to ME and have a question. I want to go through an *.mrc file, copy all the ISBNs from it, and paste them into Excel. The *.mrc file holds bibliographic records and consists of one extremely long line with information on hundreds of books. No line breaks.

I wrote a macro that opens the file in EditPlus, searches for the regular expression a66[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9], copies the match, and pastes it into Excel. It works fine, but I don't know how to tell the program to stop. I'm not sure it would work with Text File Begin/End Process, since there are no line breaks. And I don't see what variable I could define for Repeat Until, since the file is just one long stream of data.

Is there a way to tell the program to stop looping when there is no more text left? If not, what would you suggest?

Thanks in advance, Stanislav
joe Posted January 18, 2008

Can you supply a sample file? Zipped, if possible. It would make it easier to determine a solution.
sorlov Posted January 18, 2008

Please find attached. This file can be opened in any text editor, including Notepad, but I'm using EditPlus because I need to use the regular expression a66[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9] in the search. I'm looking for electronic ISBNs, and since the file has a couple of varieties of those (some are 13 digits long and some are 10), I need to find the ones that begin with "a66" and have 10 digits after the "a". If you use another editor, the macro could be edited for use with EditPlus, right?

SPRHum1.zip
joe Posted January 18, 2008

So we are on the same page: if your regular expression is correct, the file you sent has 133 matches in it. Do you just want a list of the "a66########" numbers within the file?

a6612345678
a6687654321
a6656473829
...

Or do you need other information from the file, too?

Also, is there always whitespace following the pattern of "a66" + 8 digits? Or could there be non-whitespace characters adjacent to them?
sorlov Posted January 18, 2008

Just a list of the "a66########" numbers will do.
joe Posted January 18, 2008

Here is a macro that will parse through a string without using an external regular expression, although for something like this regular expressions are MUCH better and easier. The clipboard will contain your string of numbers at the end of the macro. You will need to adjust the location and name of the input file.

If I have the time, I will post a macro that generates a temporary external regular expression, uses it to get the values from a string or file, and then deletes itself when finished.

// Create CR/LF string and a string of digits
Variable Set %T10% to ASCII Char of 10
Variable Set %T13% to ASCII Char of 13
Variable Modify String: Append %T10% to %T13%
Variable Set String %T4% "0123456789"

// Read in the file to search and begin main loop
Variable Set String %T1% from File: "SPRHum1.mrc"
Repeat Until %N1% <> %N1%

  // Locate next occurrence of "a66". If not found then we are done.
  Variable Set Integer %N1% from Position of Text in Variable %T1%
  If Variable %N1% = 0
    Repeat Exit
  End If

  // Delete everything in the search string prior to "a66"
  Variable Modify Integer: Dec (%N1%)
  Variable Modify String: Delete Part of %T1%

  // "a66" must be followed by 8 digits and a space so if a space is in position 12, this could be a good find.
  // If not, then delete the first character ("a") so the next search won't find it again.
  Variable Set Integer %N1% from Position of Text in Variable %T1%
  If Variable %N1% = 12
    Variable Modify String: Copy Part of %T1% to %T2%
    Variable Modify String: Delete Part of %T1%
    Variable Modify String: Trim %T2%

    // Make sure that positions 4 through 11 are all digits.
    // If not then this is not a good find so don't append it to the save string
    Repeat Start (Repeat 8 times)
      Variable Modify String: Copy Part of %T2% to %T3%
      If Variable %T4% does not contain variable %T3%
        Variable Set String %T2% ""
        Repeat Exit
      End If
    Repeat End
    If Variable %T2% > ""
      Variable Modify String: Append "%T2%%T13%" to %T5%
    End If
  Else
    Variable Modify String: Delete Part of %T1%
  End If
Repeat End

// Copy the save string to the clipboard.
Variable Modify String: Save %T5% to Clipboard
Delay 250 Milliseconds
Macro Return

<REM2:Create CR/LF string and a string of digits><ASCIIC:10:1:10><ASCIIC:13:1:13><TMVAR2:08:13:10:000:000:><TVAR2:04:01:0123456789><REM2:><REM2:Read in the file to search and begin main loop><TVAR2:01:04:C:\Temp\SPRHum1.mrc><REP3:08:000002:000002:0001:1:01:N1><REM2:><REM2:Locate next occurrence of "a66". If not found then we are done.><IVAR2:01:13:1:a66><IFVAR2:2:01:1:0><EXITREP><ENDIF><REM2:><REM2:Delete everything in the search string prior to "a66"><NMVAR:09:01:0:0000001:0:0000000><TMVAR2:11:01:00:001:N01:><REM2:><REM2:"a66" must be followed by 8 digits and a space so if a space is in position 12, this could be a good find.><REM2:If not, then delete the first character ("a") so the next search won't find it again.><IVAR2:01:13:1: ><IFVAR2:2:01:1:12><TMVAR2:10:02:01:001:012:><TMVAR2:11:01:00:001:012:><TMVAR2:01:02:00:000:000:><REM2:><REM2:Make sure that positions 4 through 11 are all digits.><REM2:If not then this is not a good find so don't append it to the save string><REP3:01:000004:000001:00008:1:01:><TMVAR2:10:03:02:N01:001:><IFVAR2:4:04:8:T3T><TVAR2:02:01:><EXITREP><ENDIF><ENDREP><IFVAR2:1:02:4:T><TMVAR2:07:05:00:000:000:%T2%%T13%><ENDIF><ELSE><TMVAR2:11:01:00:001:001:><ENDIF><ENDREP><REM2:><REM2:Copy the save string to the clipboard.><TMVAR2:16:05:00:000:000:><MSD:250><MRETURN>
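For readers who do not have Macro Express at hand, the scanning idea in the macro above can be sketched in Python. This is a rough illustration only, not a line-for-line translation: the function name extract_a66 is made up for this example, and on a failed candidate the sketch always moves forward by one character, which is slightly different from how the macro trims its search string. The stop condition is the same as in the macro: the loop ends as soon as no further "a66" can be found.

```python
DIGITS = "0123456789"

def extract_a66(text):
    """Collect every "a66" + 8 digits that is followed by a space.

    extract_a66 is a hypothetical helper name used only for this sketch.
    """
    found = []
    pos = text.find("a66")
    while pos != -1:  # find() returns -1 when no "a66" is left, so the loop stops
        candidate = text[pos:pos + 11]       # "a66" plus the next 8 characters
        follower = text[pos + 11:pos + 12]   # the character in position 12
        if (len(candidate) == 11 and follower == " "
                and all(ch in DIGITS for ch in candidate[3:])):
            found.append(candidate)
            pos = text.find("a66", pos + 11)  # continue after this match
        else:
            pos = text.find("a66", pos + 1)   # skip the "a" and search again
    return found

sample = "xxa6612345678 yy a66123 a6687654321 z a66abcdefgh "
print("\r\n".join(extract_a66(sample)))
```

The result is joined with CR/LF, as in the macro, so each number would land in its own cell when pasted into a spreadsheet column.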
sorlov Posted January 18, 2008

OMG!!! I edited the file location, clicked on "Test Run Macro" and, at first, thought it was still waiting when it had already finished. Your macro was so fast! And so efficient. As a librarian, I have to work with text a lot, so handling it is a must. This macro does it very well. If you have time for a macro using external reg expressions, it would be great, but even w/o it, you saved me lots of time and headache.

Thanks a lot!
joe Posted January 18, 2008

I did forget to mention that it will do its job in under 1 second on a file that small (473 k). We got lucky in that the data you wanted to extract was easy to find and parse. Anything more complicated would be better served using regular expressions.

Glad it worked for you!
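As a point of comparison with the regular-expression route mentioned in this thread, here is a minimal Python sketch that applies the same pattern (a66 followed by 8 digits) to the whole file at once. The helper names are made up for this example, and reading the file as latin-1 is an assumption chosen only so that every byte decodes without error; the real encoding of the .mrc file is not stated in the thread. Unlike the macro, this pattern does not require a space after the digits.

```python
import re
from pathlib import Path

# Same pattern as in the original question, written with a repeat count.
PATTERN = re.compile(r"a66[0-9]{8}")

def find_isbn_fields(text):
    """Return every non-overlapping match of the pattern, in file order."""
    return PATTERN.findall(text)

def extract_from_file(path):
    """Read the file and return the matches, one per line (hypothetical helper)."""
    # latin-1 is an assumption: it maps every byte to a character, so the
    # read cannot fail on bytes that are not valid in another encoding.
    text = Path(path).read_text(encoding="latin-1")
    return "\n".join(find_isbn_fields(text))

print(find_isbn_fields("..a6612345678 a66123 xa6687654321z"))
```

Calling extract_from_file on the sample file from this thread (SPRHum1.mrc, in whatever folder it was saved) would give a block of text that can be pasted into a single spreadsheet column. There is no loop to stop here: findall simply returns once it reaches the end of the text.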