Macro Express Forums
MakaPakaTobyHannah

Manipulating a text file


I need to devise a process for modifying an EPS file. The EPS file is plain text, of course, and can be viewed in any text editor.

Unfortunately, depending on the page size chosen, the program I use inserts an illegal "statusdict" command into the EPS file. This causes problems when trying to distill the file to a PDF. Removing the entire "statusdict" line, which could be something like:

"statusdict /setpage known {statusdict begin 792 1224 1 setpage end} if"

always solves the problem. I can easily do this in Notepad, for instance, by locating the instruction and then deleting the entire line.

My strategy has been to assign the contents of the file to a variable and then either split the string several times and rejoin the split segments, or use "Text File Begin Process" to read through the file line by line, appending every line except the one containing "statusdict" to a receiver variable. Once that is done, I save the receiver variable with Modify String: Save to Text File.
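For comparison, here is a minimal Python sketch of the same filter logic. It is not Macro Express code, and the function names are hypothetical. It works on raw bytes so that any binary preview or non-ASCII data in the EPS survives untouched. It collects the kept lines in a list and joins them once at the end, rather than growing a single string line by line.

```python
from pathlib import Path


def strip_statusdict(data: bytes) -> bytes:
    """Return data with every line containing b"statusdict" removed."""
    # bytes.splitlines() breaks only on \n, \r\n and \r; keepends=True keeps
    # each line's own terminator, so surviving lines are byte-for-byte intact.
    kept = [line for line in data.splitlines(keepends=True)
            if b"statusdict" not in line]
    # One join at the end instead of repeatedly appending to a growing buffer.
    return b"".join(kept)


def clean_eps_file(src: Path, dst: Path) -> None:
    """Hypothetical helper: read src, drop the statusdict lines, write dst."""
    dst.write_bytes(strip_statusdict(src.read_bytes()))
```

Usage would be along the lines of `clean_eps_file(Path("dirty.eps"), Path("clean.eps"))`. The file names are only illustrative.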

 

This does work. The problem is that such an EPS file can be quite long; the one I tested with is 31,191 lines. As a result, the process is extremely slow: easily more than a minute.

I cannot think of any other procedure that would produce the desired result, yet at this speed I'd be faster opening the file in Notepad and editing it manually.

Is there a better approach?

Thanks in advance!

Are you using the latest version? The change log for v4.2.1.1 says: "17. Optimized the 'Split String' command."

Actually, the current version is v4.3.0.1!

I didn't say the current version is v4.2.1.1, only that the Split String command was optimized in that version.

This takes me back. In my former life I was a CADD draftsman/designer, and I had created a homebrew documentation system with this ‘new’ program called Acrobat. AutoCAD couldn’t create PDFs and the PDF print driver had not been invented yet, so one had to use Distiller. Since AutoCAD could export EPS, this worked well. Unfortunately, given the 3D nature of the models, arc rotation was often negative, which was legal in EPS but Distiller choked on it. So I wrote a program, much like yours, to identify the G-strokes and edit them to positive rotation. It worked great. Anyway, I have a bit of experience with what you are dealing with.

I know exactly the problem you’re having with MEP here. It has great functions, but for large amounts of data it’s simply too slow. One suggestion: instead of appending to a variable while plowing through a Text File process, write each line out to the output file one at a time. It sounds counterintuitive, but in some cases it’s faster. I believe one of MEP’s major performance problems comes from resizing variables, and because Windows caches disk writes to open files, writing line by line can often be faster.
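The write-as-you-go idea can be sketched in Python as follows. This illustrates the technique and is not MEP code; the function name is hypothetical. Both files are held open, and each line is either skipped or written straight through. Python's buffered file objects batch the small writes, so the full result never has to be assembled in memory. The files are opened as latin-1 text with newline="" so every byte round-trips unchanged, and CRLF, LF and bare-CR line endings are all recognized and preserved.

```python
def copy_without_marker(src_path: str, dst_path: str,
                        marker: str = "statusdict") -> int:
    """Stream src to dst, skipping lines that contain marker.

    Returns the number of lines removed.
    """
    removed = 0
    # latin-1 maps each byte to exactly one character, so nothing is lost;
    # newline="" recognizes \r\n, \n and \r but leaves them untranslated.
    with open(src_path, "r", encoding="latin-1", newline="") as src, \
         open(dst_path, "w", encoding="latin-1", newline="") as dst:
        for line in src:          # reads one line at a time
            if marker in line:
                removed += 1      # drop this line
                continue
            dst.write(line)       # buffered; flushed to disk in larger chunks
    return removed
```

The return value gives a quick sanity check: for the file described in this thread it should normally be 1.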

 

The second option I would suggest is one I’m having to turn to more and more, for exactly the reasons you’re experiencing, and that’s to use outside programming resources. Programmers deal with this sort of problem all the time, so a long time ago they created, and continue to refine, a weapon known as RegEx (regular expressions). It’s hugely powerful and difficult to understand at first, but in this simple case there’s a method to replace or remove text based on a pattern. Here, imagine the instruction being “Find ‘statusdict<any number of characters><end of line>’ and replace it with nothing.” Even on a file as large as yours, the result will be practically instantaneous. You could put this in a VBScript and run it from MEP if this is part of a larger macro, or just use a standalone VBScript. It is a little more advanced, but if you’re hacking EPS files you might find it easy, and there are tons of really good simple examples online. And if you need some help, just contact me directly.
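Cory suggested VBScript. To keep one language for the examples here, the sketch below uses Python's re module instead, so treat it as an illustration of the regex technique rather than the script he had in mind. There is one deliberate difference. His instruction removes text from "statusdict" to the end of the line, which would leave an empty line behind, so this pattern matches the whole line plus its line terminator. The negative lookbehind (?<![^\r\n]) only lets a match start at the beginning of a line, so the engine does not rescan each line from every character in it. The names are hypothetical.

```python
import re

# One whole line that contains "statusdict", plus its terminator (CRLF, LF or CR).
#   (?<![^\r\n])  a match may only start at the beginning of the data or of a line
#   [^\r\n]*      cannot cross a line break, so a match never spans two lines
STATUSDICT_LINE = re.compile(rb"(?<![^\r\n])[^\r\n]*statusdict[^\r\n]*(?:\r\n|\r|\n)?")


def remove_statusdict_lines(data: bytes) -> bytes:
    """Delete every line containing b"statusdict" in a single regex pass."""
    return STATUSDICT_LINE.sub(b"", data)
```

To use it, read the EPS file in binary mode, pass the bytes through `remove_statusdict_lines`, and write the result back out in binary mode.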

Thank you, Cory, as always, for your very helpful response. It's good to know I'm not the only one who has run into this kind of issue. I will explore your suggested solutions. I've only just begun using external scripts as part of ME routines. For instance, input windows with both radio buttons and check boxes (and maybe drop-down menus, etc.) are in-your-dreams-only features of ME, at least for now, but they can be accomplished by inserting external scripts.

Cheers to everyone who chimed in.

To wrap up: following Cory's suggestion to use an external script, this macro now runs in a fraction of a second, where the old approach could easily take up to a minute:

Variable Set String %origEPSFile% to "*.EPS" // this will display only EPS files in the next line
Variable Set String %origEPSFile%: Prompt for a filename // I select the EPS file to be processed here
External Script: AutoIT

//The AutoIT script looks like this:
//
//#include <file.au3>
//Dim $aRecords
//_FileReadToArray("%origEPSFile%",$aRecords)
//; Loop from the last line up, so deleting a line does not shift the lines still to be checked
//For $x = $aRecords[0] to 1 Step -1
// if stringinstr($aRecords[$x], "statusdict") then _FileWriteToLine("%origEPSFile%", $x, "", 1)
//Next
//
//End of AutoIT script

Text Box Display: "Statusdict" command has been removed.

Good thing, that!

You don't even need Macex for this. You could use the well-known grep program from Unix. For Windows, you can get grep via GNU utilities for Win32, which can be found at http://unxutils.sourceforge.net/

You only need the file egrep.exe, which is in \usr\local\wbin

For your example:

statusdict /setpage known {statusdict begin 792 1224 1 setpage end} if

This one-liner will DISPLAY all lines in the file dirty.eps that match the pattern:

egrep "\{statusdict begin [0-9]{3} [0-9]{4} [0-9] setpage end\} if" dirty.eps

This one-liner will REMOVE all lines that match the pattern from dirty.eps and write the result to clean.eps:

egrep -v "\{statusdict begin [0-9]{3} [0-9]{4} [0-9] setpage end\} if" dirty.eps >clean.eps

-v means invert-match, i.e. select non-matching lines
[0-9]{3} means "match any three-digit number"
[0-9]{4} means "match any four-digit number"
[0-9] means "match any one-digit number"

The curly braces have a special meaning in regex, so you need to "escape" them if you're looking for the literal characters. Hence the backslash before each brace in the search pattern, i.e. \{ and \}

I regularly have to grapple with extremely large text files (over 200 million lines, 2 GB+ each). I processed one such file using a much more complex pattern, and egrep took only about 10 seconds. That may not seem fast, but it is still faster and easier than AutoIt/AutoHotkey, and processing files that large would not be feasible with Macro Express at all. For your file, which is "only" about 30,000 lines and has a relatively simple pattern, I would estimate egrep will take less than a second. You can also call egrep from Macex.
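If installing the GNU utilities isn't an option, the egrep -v filter translates directly into a short Python script. This sketch is not part of the original post, and the script name egrep_v.py is hypothetical. The pattern is deliberately loosened from [0-9]{3} [0-9]{4} to [0-9]+ [0-9]+, because other page sizes would not fit the original. US Letter, for example, is 612 x 792 points, so its second number has only three digits. Like grep, the script reads and writes one line at a time, so memory use stays flat even on huge files.

```python
import re
import sys

# Same pattern as the egrep command, in Python syntax. The braces are escaped
# because { } are quantifier syntax in Python regexes as well.
# [0-9]+ loosens [0-9]{3} / [0-9]{4} so page sizes of any digit count match.
SETPAGE = re.compile(rb"\{statusdict begin [0-9]+ [0-9]+ [0-9] setpage end\} if")


def invert_match(lines):
    """Yield only the lines that do NOT match the pattern, like egrep -v."""
    for line in lines:
        if not SETPAGE.search(line):
            yield line


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python egrep_v.py dirty.eps > clean.eps
    # Binary mode on both ends, so line endings and non-ASCII bytes pass through as-is.
    with open(sys.argv[1], "rb") as src:
        sys.stdout.buffer.writelines(invert_match(src))
```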
