Macro Express Forums

Split string performance warning



I did a Split String on CRLF on a TSV file with approx. 61k lines and it took 36 minutes, so beware: I don't think this command was intended for any heavy lifting.

 

This is intolerable for my purposes, so I'm going to have to find another solution. I think I'm just trying to push MEP too far at this point, so it's no condemnation of MEP. I'm just using the wrong tool for the job.
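For readers outside MEP, here is a rough Python sketch of the operation being described: splitting TSV text into rows on CRLF, then into fields on tabs. The function names `split_rows` and `split_tsv` are made up for illustration, and this says nothing about how MEP's Split String command works internally.

```python
def split_rows(text: str, delimiter: str = "\r\n") -> list[str]:
    """Split text into rows on the delimiter in a single pass."""
    rows = text.split(delimiter)
    # A file that ends with CRLF yields one empty trailing piece; drop it.
    if rows and rows[-1] == "":
        rows.pop()
    return rows


def split_tsv(text: str) -> list[list[str]]:
    """Split TSV text into rows, then each row into tab-separated fields."""
    return [row.split("\t") for row in split_rows(text)]
```

On ordinary hardware a split like this over tens of thousands of lines should take well under a second, which is the kind of baseline the 36 minutes is being measured against.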


I don't think I agree with you. I just wrote an AutoIt script to split a 4MB text file containing over 23,000 rows, and it runs in less than 2 seconds. If AutoIt, which allows for only one universal data type (no integers, booleans, strings, etc.), can manage this, why can't MEP? After all, AutoIt offers vastly richer functionality than MEP, and I'm sure you can do everything in AutoIt that you can in MEP. All MEP has that AutoIt lacks is a script-building interface.

 

I can see no reason why MEP shouldn't be able to equal AutoIt in its efficiency at running code. There are usually only a few ways of writing robust and efficient code; unfortunately MEP appears to contain much code that exhibits neither of these qualities. And just remember that AutoIt is free and is written by enthusiasts (who are extremely good at what they do).


Let me explain my comment a little better. I believe a program like MEP should be able to handle this sort of thing, but going back even to the early beta days we had performance hits whenever variables were resized frequently or en masse. As I understand it, the problem is buried deep in the code and difficult to remedy at this point in its development. I.e., it was not designed for heavy lifting, and I'm just resigning myself to the reality that this is not likely to be improved soon. And MEP was really designed to run at the speed of a user. I.e., you're not going to notice a performance hit if you have to wait for user action on each iteration, and one is not likely to have the user perform 60k ops. Since the specific application I was working on in this case ran unattended and invisibly on a file server, it really is something I should be doing in a real programming language, and I will do so as soon as I learn a modern one ;-) But I do agree with your sentiment. 60k is not a big number in the computer world.


For many that use ME, AutoIt is not a viable alternative. It's programming versus filling in foolproof dialogs. I agree with Paul though that ME is always relatively slow. Aren't both macros and mxe files compiled at runtime? Even if there is a lot of overhead in the programming dialogs to make them foolproof, once the code is saved, surely it's possible to store in a compiled form that will be quick to run?

 

Given that ME has to be installed on the host PC, I assume that the code is incompatible with other programming apps normally found on a PC. There could be commercial reasons for doing that, e.g. someone has to buy the software to run a macro.

 

I'm not that unhappy with ME's speed for most tasks, but I have macros that take an hour of pure command runs (no waiting). PixelSearch in AutoIt is probably 30x faster than the equivalent in ME, so that type of application is frustrating. The way one gets around that is to make a hybrid using AutoIt's versatility with ME's user interface. Things slow down, though, if you have to handshake information. If you can do that with AutoIt, can't the same things be done within ME? I guess these come under "Request for Features".

 

A big difference with AutoIt is that you have a number of programmers writing UDFs that make many sophisticated tasks simple (like all the Excel manipulation scripts). Easier to do when the labour is free.


For many that use ME, AutoIt is not a viable alternative. It's programming versus filling in foolproof dialogs.

Agreed, though MEP is a long way from offering much that's "foolproof"!

Aren't both macros and mxe files compiled at runtime? Even if there is a lot of overhead in the programming dialogs to make them foolproof, once the code is saved, surely it's possible to store in a compiled form that will be quick to run?

And they are! Macros run using F9 always seem to run slower than their compiled counterparts. But if what you're compiling is, fundamentally, poor code, then compiling is not going to turn it into super code.

A big difference with AutoIt is that you have a number of programmers writing UDFs that make many sophisticated tasks simple (like all the Excel manipulation scripts). Easier to do when the labour is free.

No, not at all. UDFs (user-defined functions that extend AutoIt's functionality; take a look at some of the amazing stuff you can do with Internet Explorer thanks to the IE Management UDFs) are written by people very good at what they do, and very experienced in AutoIt development. There aren't many of these people around, and the fact that they do what they do for no financial compensation makes them even rarer!


It is my understanding that macros are not compiled, even at run time. A macro is to MEP what a VBS script is to CScript, or an HTML page is to a web browser: text interpreted by a host. MacExp.exe is compiled, of course, but just as a routine written in VBS will never run as fast as a compiled C# app, a macro will never run as fast as compiled code.

 

But MEP is written in Delphi and I'm sure Delphi has no problem splitting strings. What I suspect is happening is the MEP programmers do a lot of things to make it work better for newbs or whatever and I think that gets in the way. For instance using CTRL+C to copy is usually much faster than the MEP Clipboard command that does the same. You see the clipboard is a much more mysterious place than most people think and I think what ISS is doing is attempting to make it that simple for us to interact with. Unfortunately that adds overhead. It's like training wheels. Makes it so any idiot can operate a bike but you're never going to corner very fast.


  • 5 months later...

Actually, Delphi does not have any split string utility (though it would be nice if it did), so I had to write it up myself. Also, unfortunately, the version of Delphi we're using is much slower with strings than the version we built ME3 in, which accounts for most of the slowness.

 

I'll see if I can optimize my code a bit. This is one time (of many) when I wish we had used C/C++ to write MEPro, because the speed would have been much better, as there are things like this built into the language.


I'll see if I can optimize my code a bit.

 

Chris,

 

I'm afraid I strongly echo Paul's comments. Performance criticisms of ME Pro (and unfavourable comparisons with ME) have been around for ages, from several of us. And often supported by hard details, particularly from Paul.

 

It really does seem to me that a fundamental revision of some parts of the code is justified. Of course, finding which part is the hard bit! ;)

 

--

Terry, East Grinstead, UK


I can say for sure that the version of Delphi that we're using is MUCH slower at anything requiring memory allocations than previous versions of Delphi (but those versions of Delphi have significant bugs, too, which is why we upgraded).

 

We are looking into ways around this. I may have to start preallocating memory and just reuse it (like the Linux OS does for its apps).
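To put rough numbers on why allocation strategy matters so much, here is a small Python simulation (purely illustrative, not Delphi and not MEP's code). It assumes a simplified model in which every resize copies the buffer's current contents to a new block, and counts how many bytes get copied while a buffer grows one byte at a time. The names `bytes_copied`, `grow_by_one`, `grow_by_doubling` and `preallocate` are hypothetical.

```python
from typing import Callable


def bytes_copied(final_size: int, grow: Callable[[int], int]) -> int:
    """Count bytes moved while filling a buffer that is copied on each resize."""
    capacity = 0
    size = 0
    copied = 0
    while size < final_size:
        if size == capacity:
            capacity = grow(capacity)  # ask the strategy for a new capacity
            copied += size             # existing contents move to the new block
        size += 1
    return copied


def grow_by_one(capacity: int) -> int:
    """Resize on every single append."""
    return capacity + 1


def grow_by_doubling(capacity: int) -> int:
    """Resize rarely by doubling the capacity each time."""
    return max(1, capacity * 2)


def preallocate(final_size: int) -> Callable[[int], int]:
    """Allocate the full size once, up front, and reuse it."""
    return lambda capacity: final_size
```

Under this model, growing to 1,000 bytes copies 499,500 bytes with `grow_by_one`, 1,023 bytes with `grow_by_doubling`, and none with `preallocate(1000)`. The first figure grows with the square of the final size, which is how a slower allocator can turn a sluggish command into an unusable one on large inputs.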


You could try the ASCII File Process command instead of Split String, and see if you can get better performance. It supports tab-delimited text.

 

But yeah, 36 minutes is waaaay too long :-(
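The general idea behind processing a delimited file record by record, rather than loading it all and splitting it, can be sketched in Python with the standard `csv` module. This is only an analogy: it is not a description of how the ASCII File Process command is implemented, and `iter_tsv_rows` is a made-up name.

```python
import csv
import io
from typing import Iterable, Iterator


def iter_tsv_rows(stream: Iterable[str]) -> Iterator[list[str]]:
    """Yield one row of tab-separated fields at a time from a text stream."""
    reader = csv.reader(stream, delimiter="\t")
    for row in reader:
        yield row


# Usage with an in-memory stream; a real file would be opened with
# open(path, newline="") so the csv module sees the CRLF line endings as-is.
sample = io.StringIO("a\tb\r\nc\td\r\n", newline="")
rows = list(iter_tsv_rows(sample))
```

Because only one row is held at a time, memory use and per-row cost stay roughly constant no matter how long the file is.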

 


  • 3 weeks later...

You may be pleased to know that we found the performance bottleneck. Apparently it was all in the variable processing routines. Specifically, there were more memory allocations and deallocations occurring than we really needed. This wasn't a problem in ME3 because the version of Delphi we used for that version was faster at these memory allocations than the current version.

 

Case in point, in MEPro 4.1.7.1, if you load a 1.03MB file containing all % symbols, it was taking more than 30 minutes to process (much like Cory reported above with the split string). In my new test build, it took under a second.

 

Since all of the commands run through this particular function, this most likely accounts for many of the performance problems everyone is experiencing. I think the main reason it's not more widespread is that most people are using smaller datasets and thus don't notice the issue as much as those who use much larger datasets.

 

As such, Cory, could I trouble you to send me a copy of the file that's causing the split string slow down? I'd like to make sure that this fix will work for that particular function, too. And, if it doesn't affect performance in this area, I would like to be able to find a proper fix for you.

 

I do apologize that it has taken this long to find the problem. I will do my best to make sure that situations like this don't happen again.


You may be pleased to know that we found the performance bottleneck. Apparently it was all in the variable processing routines. Specifically, there were more memory allocations and deallocations occurring than we really needed. This wasn't a problem in ME3 because the version of Delphi we used for that version was faster at these memory allocations than the current version.

Do you think this will also solve the problem of reading large files into a text variable and doing a replace all?


Do you think this will also solve the problem of reading large files into a text variable and doing a replace all?

I think so. What I was testing against was a text file that contained a large number of percent signs. Actually, the test file I used was 1.03MB of just percent signs (with a few 5s thrown in for good measure). It was taking upwards of 30 minutes to process the "Variable Set String from File" command just because the variable processing was having a difficult time working through all of those percents and trying to extract the contents of non-existent variables. After my change, the time decreased to just under a second.

 

So, if your text file that you're doing a replace all in contained a fairly large number of percent signs, then yes, it will increase the speed. However, I would be willing to test your file out and make any corrections to the program that are necessary to increase the speed there, too.
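As a rough illustration of why a pile of percent signs need not be expensive, here is a Python sketch of a single-pass variable expansion. It assumes variables are written as `%name%` with identifier-style names, which is a simplification of MEP's syntax, and the names `expand` and `_VAR` are made up. It is not the fix that went into MEPro.

```python
import re

# Assumed syntax: a name between two percent signs, e.g. %name%.
_VAR = re.compile(r"%([A-Za-z_][A-Za-z0-9_]*)%")


def expand(text: str, variables: dict[str, str]) -> str:
    """Replace known %name% references in one pass over the text."""

    def replace(match: re.Match) -> str:
        # Unknown variables are left exactly as written.
        return variables.get(match.group(1), match.group(0))

    return _VAR.sub(replace, text)
```

For example, `expand("Hi %name%, 100%% done %x%", {"name": "Cory"})` returns `"Hi Cory, 100%% done %x%"`. Percent signs that do not delimit a name are skipped without building any intermediate strings, so a file made up of nothing but percent signs costs one scan.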

 

Good news, thanks Chris. Don't you just love Eureka! moments?

You have no idea :).


Cory, please disregard my request for a test file. I created a 61265 line file totaling approximately 4.08MB. The original split string command took forever, just as you noted. I played with it this morning as my fix yesterday did nothing to speed up this particular command. As of right now, I am now able to split the file in just under two seconds. I found that loading the text file into memory twice (so, in RAM it occupies 8.16MB instead of 4.08MB) gave me a significant performance boost. Of course, that memory will be freed as soon as the command finishes.

 

Now, I just have to optimize it when using the debugger (when used in the debugger, the original, significantly slow, speed is still retained).
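For comparison, here are two hand-written ways to split on a delimiter, sketched in Python. The post does not say what the original or the fixed Delphi code looks like, so neither function is claimed to match it, and both names are hypothetical. The point is only that leaving the source string intact and scanning it with an index holds roughly two copies of the data (the source plus the pieces) but does linear work, while repeatedly chopping the front off the source re-copies the remainder on every row.

```python
def split_by_scanning(text: str, delimiter: str = "\r\n") -> list[str]:
    """Leave the source intact and walk it with an index: linear work."""
    pieces = []
    start = 0
    while True:
        end = text.find(delimiter, start)
        if end == -1:
            pieces.append(text[start:])
            return pieces
        pieces.append(text[start:end])
        start = end + len(delimiter)


def split_by_chopping(text: str, delimiter: str = "\r\n") -> list[str]:
    """Cut each row off the front of the source: quadratic work."""
    pieces = []
    while True:
        end = text.find(delimiter)
        if end == -1:
            pieces.append(text)
            return pieces
        pieces.append(text[:end])
        # This slice copies everything that is left, once per row.
        text = text[end + len(delimiter):]
```

Both return the same list as Python's built-in `str.split` with the same delimiter. On a file of about 61,000 lines and 4MB, the chopping version copies the remaining text tens of thousands of times, which adds up to gigabytes of memory traffic for a 4MB input.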

