patgenn123 Posted December 9, 2010

Hello all,

I am busy with questions today. I am parsing a text file and want to grab any text that is an email address. In general, it would go something like this:

If variable contains @ and .com or .net or ... (other extensions)
do something ...

My question is, is there a better way to make sure it's an email address and nothing else?

Pat
Cory Posted December 9, 2010

I struggled with this once. The only test I could really find was that some people were using scripts to initiate an email and see whether their mail server would refuse the address as an invalid format. I thought that inelegant, so I created my own script to do it.

It performs several tests. I think I had it parse on the @ and the last period. Then I maintained a list of TLDs (Top Level Domains, like .com) and made sure the TLD was in that list. Then, I think, I made sure that the length of the domain name was OK, then checked every character in the domain and the alias to make sure they were all valid. I might have done some other tests too. Not sure.

You can check for valid characters by converting each character to its ASCII decimal value and making sure it falls in an allowed range: 48-57 (digits), 65-90 (upper-case letters), 97-122 (lower-case letters), plus whatever the values are for "! # $ % & ' * + - / = ? ^ _ ` { | } ~ .". You can read about all the email syntax rules here, but they're fairly simple. I see a couple I didn't test for, like two periods in a row.

Anyway, I have it all as one subroutine that runs pretty quickly, so wherever I use it I have one command and a yes or no answer. It might even have returned the rule that failed. I can't remember now without digging it out and looking at it.
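For readers who want to try the idea outside Macro Express, here is a minimal Python sketch of the checks Cory describes: split on the @ and the last period, look the TLD up in a list, then test each character by its ASCII decimal value. It is not Cory's subroutine. The names `KNOWN_TLDS`, `LOCAL_SPECIALS`, `is_alnum_code` and `check_email` are hypothetical, the TLD list is a tiny sample, the 253-character domain limit is an assumed value, and the "two periods in a row" test is the one Cory says he had not covered.

```python
# Illustrative sketch only; names and limits below are assumptions, not Cory's code.
KNOWN_TLDS = {"com", "net", "org", "edu", "gov"}  # assumed sample, far from complete
LOCAL_SPECIALS = set("!#$%&'*+-/=?^_`{|}~.")      # specials listed in the post


def is_alnum_code(ch):
    """True for ASCII digits (48-57), upper-case (65-90) and lower-case (97-122)."""
    code = ord(ch)
    return 48 <= code <= 57 or 65 <= code <= 90 or 97 <= code <= 122


def check_email(text):
    """Return (ok, failed_rule); failed_rule is None when every test passes."""
    if text.count("@") != 1:
        return False, "exactly one @"
    alias, domain = text.split("@")
    if not alias or not domain:
        return False, "empty alias or domain"
    # Parse the domain on its last period to isolate the TLD.
    name, dot, tld = domain.rpartition(".")
    if not dot or not name:
        return False, "domain needs a period"
    if tld.lower() not in KNOWN_TLDS:
        return False, "unknown TLD"
    if len(domain) > 253:  # assumed limit
        return False, "domain too long"
    if not all(is_alnum_code(c) or c in LOCAL_SPECIALS for c in alias):
        return False, "invalid character in alias"
    if not all(is_alnum_code(c) or c in "-." for c in domain):
        return False, "invalid character in domain"
    if ".." in text:
        return False, "two periods in a row"
    return True, None


print(check_email("pat@example.com"))
print(check_email("pat@example.xyz"))
```

Returning the failed rule alongside the yes/no answer keeps the single-call usage Cory mentions while making it easier to see why a candidate was rejected.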
arekowczarek Posted December 10, 2010

Guess it's your lucky day, Pat!

The attached macro processes a text file and displays a TBD every time an email address is found.

Tests included:
- local part length 1 to 64
- domain part length 1 to 253
- local part contains only the chars allowed in it
- domain part contains only the chars allowed in it
- domain part contains at least one "."
- first or last char of local part can't be "."
- local part does not contain ".."
- email address length ≤254

There are more conditions than these that decide whether an email address is correct. If you feel like adding more validation tests, go ahead; refer to the website Cory provided. I think it's pretty clear in the script where a given validation test should be inserted. If not, just ask.

Cory wrote: "I think I had it parse on the @ and the last period."

What about ".co.uk", "com.pl" and others? Did you create a rule for every country?

I didn't put any validation tests on those. It's important, so it shouldn't be neglected. I'm leaving this part to you, Pat. Time for you to contribute to the script.

The attached test file is the one I was testing the macro with.

Attachments: EMAIL EXTRACTOR.mex, test.txt
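The tests listed above can be sketched in Python roughly as follows. This is not a translation of the attached macro, whose contents are not shown in the thread. The function names `passes_listed_tests` and `extract_emails` are hypothetical, the allowed domain characters (letters, digits, "." and "-") are an assumption, and trimming punctuation from the edges of each whitespace-separated word is a simplification that can also trim a legitimate leading or trailing character.

```python
import re

# Allowed characters per part; the domain set is an assumption for this sketch.
LOCAL_RE = re.compile(r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~.-]+")
DOMAIN_RE = re.compile(r"[A-Za-z0-9.-]+")


def passes_listed_tests(candidate):
    """Apply the eight tests from the list above to one candidate string."""
    if len(candidate) > 254:                  # email address length <= 254
        return False
    local, at, domain = candidate.rpartition("@")
    if not at:                                # no @ at all
        return False
    if not 1 <= len(local) <= 64:             # local part length
        return False
    if not 1 <= len(domain) <= 253:           # domain part length
        return False
    if not LOCAL_RE.fullmatch(local):         # local part chars
        return False
    if not DOMAIN_RE.fullmatch(domain):       # domain part chars
        return False
    if "." not in domain:                     # at least one "." in domain
        return False
    if local.startswith(".") or local.endswith("."):
        return False
    if ".." in local:                         # no consecutive periods
        return False
    return True


def extract_emails(text):
    """Return every whitespace-separated word in text that passes the tests."""
    found = []
    for token in text.split():
        # Trim sentence punctuation around the word (a heuristic, see lead-in).
        token = token.strip(".,;:()<>[]\"'")
        if passes_listed_tests(token):
            found.append(token)
    return found


sample = "Write to pat@example.com, or cory@mail.example.co.uk. Not one: foo@bar"
print(extract_emails(sample))
```

Like the macro as described, this sketch does no TLD validation, so multi-part suffixes such as ".co.uk" pass without any per-country rule; that check would still have to be added.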