Jump to content
Macro Express Forums

MEP and Excel 2007


Recommended Posts

I trying to create a macro that will compare two columns in two different excel sheets.

 

If an entry from excel file 1 is found in excel file 2, then delete the whole row in excel file 1

 

this is what I am doing:

 

copy the cell from excel file 1 into the clip board

activate excel file 2

CTRL+F (to open the search box) <---------------------------- how do you do that in MEP??

paste from clipboard and search excel file 2

 

if loop:

=====

if there is a match <-----------------------------------------------how do i do that in MEP???

activate excel file 1

delete row

 

ELSE

if there search result returns nothing, then <----------------------how can i make MEP see that??

close search box

activate excel file 1

go down one row and copy the cell from excel file 1

 

 

Repeat until reach end of rows in excel file 1 <--------------------------how can i do that in MEP?

 

 

I would appreciate any help in any of the steps please or if you recommend it I do it differently?

 

thanks!

Link to comment
Share on other sites

Using Excel might be a better path…………………...........

I do this task all the time in Excel and my technique is to use the CountIf function. Here’s how: Consider a list in column A of sheet 1 which we want to cull by comparing to a list on sheet 2 column A. In sheet one B1 write this formula “=COUNTIF(Sheet2!A:A,A1)”. This counts the number of occurrences of what is in S1A1 in the list on sheet 2. Now double click the grip in the lower right of B2 while selected and double click. This copies it all the way down. Now use the Autofilter to only show those with a value greater than zero, highlight and delete. Depending on you list size it will be much quicker and more reliable than making a macro. If it’s really big consider coping all of the B column and pasting it back with the Paste Special > Values option to blow away the formulas. If not Excel will recalculate on every row it deletes and if you have 30,000 rows you will be there awhile. I'll post some macro solutions in another message.

Link to comment
Share on other sites

MEP ‘internal’ methods…………………….

If I were doing this I would copy both ranges in Excel and process them internally using the variables and programming capabilities of MEP. This avoids timing issues which you will run into and makes the process practically instantaneous. However given that you have only ever posted here twice I’m concerned that some of the concepts might be a bit advanced for you at this point so I will not describe. But if you start running into problems with this method or are feeling adventurous let me know and I’ll give you some pointers. I’m a testimony to doing it this ‘internal’ method because I started as you doing human analogous processes and learned to do it this way because I was so frustrated trying to get it to work the other way. It takes a little learning but it was well worth it to me.

 

Plug alert!

Also I’m available for hire if you need a quick solution. I also make myself available to clients and will often tutor them on the macros I create so they can learn and become self sufficient. I do email, phone, VoIP, online meeting, remote sessions, and even come on site. If interested send me a PM with contact info, of visit the contact page on my website. There is also a Google Voice link there if you prefer to talk.

Link to comment
Share on other sites

Thanks Cory for your response, but Each excel file (or should i say CSV file) contains around 200,000 to 1,000,000 rows.

 

I dont think excel program will allow me to create two sheets each one with that many rows, thats why i thought only a macro will do this job.

 

quick question: do you know how to create a keystroke : CTRL + F ? is it <ctrl><f> ?

Link to comment
Share on other sites

A method more like what you're looking for...............................................

I believe the tack your taking is not the best and will make some suggestions later. First let’s answer your questions:

  1. To open the find box use the Text Type command and type CTRL+F. Wait for the window, type the search string and text type Enter.
  2. If there was no match Excel will give you a window popup that says “blah blah cannot find the data…” But the window title is simply “Microsoft Excel” so you would really need to do something like using Windows Controls to find that text in the box to know that it was not found. But that’s a more advanced subject.
  3. This ties into the previous one. In the logic condition you want to look for a failure as described above and if that window doesn’t appear then you want to use the Else command. You can read about how these logical operands work in the help file.
  4. Probably the easiest way is to create an infinite loop and at the end when the copy of the cell is empty use the Break command. EG If %Cell% = “” then Break. They double quote symbolizes nothing, in the dialog you just leave it blank.

Here’s how I would approach it. Have the user copy the cells in the second file to the clipboard and save that to a variable in MEP. EG %List%. Now you can compare to that internally instead of hopping back and forth and using the find. Now your process would look something like this:

  • Copy the cell in Excel
  • Save the clipboard contents to a variable. EG %Cell%
  • If %List% contains %Cell%
  • - Delete Row
  • Endif
  • Move to the next cell and repeat

There are some things I left out here for simplicity but how about you try this first then we can add the technicalities later. For instance you will probably want to Trim the %Cell% first. Also we need to do something to contend with partial matches. EG if your cell is “123” and in the list there is a “1234” you will get a false match. But this depends on you situation. IE if all the cells contain the same number of digits then it’s not needed. And if you do use the Excel find method you still need to think about partial matches and somehow ensure that the advanced option “Match entire cell contents” is enabled.

Link to comment
Share on other sites

Consider CSV files instead of Excel files………………………………………..

A parting thought is to step back and consider the source data. If it absolutely needs to be in Excel format then so be it. But if you don’t’ need formatting and formulas you can probably save it to CSV instead. If you can do that then you can use the Text File Process or ASCII File Process commands and never have to deal with Excel at all as MEP can process a CSV natively.

Link to comment
Share on other sites

See my other post about using the Text File Process command………………

If you have this many you really don’t want to do it the way your thinking. Let’s say you do have 200,000 and it takes 5 seconds each. Well that’s 11.5 days to process. A million would take 2 months. And you will see timing errors and bomb outs in a few thousand rows. I have another comment but I’m going to add it to the other thread.

Link to comment
Share on other sites

This is definitely the method you want…………………………

The simplest way to compare two lists is to nest two Text File Process commands. The outer goes thru the primary list and the inner looks for matches in the secondary list one at a time. If not found then output the value from the primary list to a tertiary list. It’s simple in concept and easy to code but depending on the sizes of the lists it could intolerable amount of time. Steve, others and I have all ran into this and there are some clever solutions like using arrays and binary searches (sounds scary but isn’t really). Anyway take the attached macro file and see if this works for you. If it is too slow maybe you could give us some more info and we can do some more advanced tricks. I just can't spend any more charity time today. In any case this is the direction you want to head.

Compare_two_lists.mex

Link to comment
Share on other sites

[*]To open the find box use the Text Type command and type CTRL+F. Wait for the window, type the search string and text type Enter.

 

when i do this:

 

it only actually types CTRL+F in the cell

 

 

when i try <CONTROL><F> : it doesnt do anything

 

what am i doing wrong?

Link to comment
Share on other sites

take the attached macro file and see if this works for you.

 

thanks for the Macro Cory,

 

Everytime i ran it I got the blue Screen of death! :S (maybe it's my computer)

 

I created a macro that does take 6 seconds for each search. (which is not very practical for me)

 

 

let me re-summarize the problem

 

I have two CSV files

 

File 1: one file has 10 column (one of them is the email column) - size is 1.5 million rows

File 2: the other has 1 column ( one column = email column) - size is 5 million rows

 

I am comparing the "email" columns against each other.

 

If an email in File 1 is found in File 2, then delete the whole row in File 1

If not, then go down one row and repeat.

 

----

 

is there anyway to do these without having to open the files? or the files have to be open for the macros to work?

Link to comment
Share on other sites

Have no fear, it can be done..................................................................

This can be done in MEP but to be honest it would be better to have a proper compiled app. You might want to consider having a VB guy write you a quick app. It's something a decent programmer can write in about an hour. And you can find them online in many places where you can post your need and guys will bid on the job. In a month or so I should be able to do it. 

 

The other possibility is using Access or other database. This would be a simple action query in there to purge your list.

 

Having said that it can be done in MEP and would be a fun project and MEP might be something more friendly and understandable to you. I'm on site with a client today so I can't do much in the way of examples but I can give you a few ideas about how I would do it.

 

First I would use Text File Process to suck the secondary file into an array. Alternatively one could suck it all into one variable and use the split command. I would do a simple test macro and time to see which is quickest. If your list is not already sorted I would sort it. I came up with a cool routine on how to do that when running the Text File Process loop. The end result is that you want a sorted array. If the array is not sorted you can not perform smart searching in it and plowing thru every record will take way too long.

 

Next I would go thru each record in your primary file and look for matches. For this I would probably use ASCII File Process. Now in a recent post from SCasper rberg and I (I think) were discussing ways to perform more efficient searches. If the two lists you can use a high/low water mark. IE you know how high you got in the list on the last time so it’s imperative that your next item be higher so why start from the beginning? Also you know that when an item becomes greater than the one your comparing you do not need to go any further. Keeping these high/low values forms a limited bracket. The other alternative is to do a binary search. Sounds fancy but it’s simple and humans do it all the time. What you do is divide the number of array elements in half and compare to see if you’re greater or less than. Then do that iteratively until you find the match. Bob was pointing out that you can search very large arrays in a very few iterations. I think he said at most to find a match in 1M all you needed was 20 tests. EG 2^20. However you will often hit the max because most are misses. Also you could do a hybrid. Keep a high/low watermark and do a binary search within that bracket. I would try one, the other and possibly both to see if the speed benefit is worth the extra work.

 

I don’t know why you got a BSOD. At worse MEP should only crash. Did you try my macro against real data or small sample files? It could well be that the huge files caused the problem. However MEP should never cause a BSOD so if this is the case you should report it to ISS. Also my macro would not work for these kinds of numbers. I intended it as a simple example but you do not want to use it because you need a more sophisticated approach for these huge files.

 

Don’t forget I’m available for hire if you just need to get this done. I could then explain to you the concepts with the complete macro. Otherwise I suggest that over the next few days I could walk you thru the process in easily digestible chunks.

Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
×
×
  • Create New...