Programming Forums

Jimbo Jun 29th, 2006 1:39 AM

Email validating regex
 
I know this was discussed in another thread, but that one's been dead and I thought it seemed peaceful that way. Anyways, I was writing a regex for email addresses and came across a little question:

This page gives the following grammar for domain names:
Quote:

<domain> ::= <subdomain> | " "

<subdomain> ::= <label> | <subdomain> "." <label>

<label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]

<ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>

<let-dig-hyp> ::= <let-dig> | "-"

<let-dig> ::= <letter> | <digit>

<letter> ::= any one of the 52 alphabetic characters A through Z in
upper case and a through z in lower case

<digit> ::= any one of the ten digits 0 through 9

Does the spec for label mean [letter][ldh-str]?[let-dig] or [letter]([ldh-str]*[let-dig])* or something else? I went for the 2nd one, but wasn't sure, as I'd not seen the [] notation used the way it is in the quoted grammar.

And for those who care, my regex (which is not perfect yet) is:

^[!#$%\'*+/=?^_`{|}~A-Za-z0-9-]([!#$%\'*+/=?^_`{|}~\.A-Za-z0-9-]*[!#$%\'*+/=?^_`{|}~A-Za-z0-9-])@([A-Za-z]([A-Za-z0-9-]*[A-Za-z0-9])*\.)*[A-Za-z]([A-Za-z0-9-]*[A-Za-z0-9])?$
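
On the bracket question above: in this style of BNF, square brackets mark an optional part, so a label reads as one letter, optionally followed by an optional ldh-str and then a required let-dig. Below is a minimal PHP sketch of that reading. The helper name `is_valid_label`, the constant name and the sample labels are made up for illustration. The `(...)*` form accepts the same strings, but its nested quantifier can backtrack heavily on long inputs that fail to match, so the sketch uses `(...)?`.

```php
<?php
// <label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]
// Square brackets mean "optional": one letter, then optionally
// (any run of letters/digits/hyphens followed by one letter or digit).
// The D modifier keeps "$" from matching before a trailing newline.
const LABEL_PATTERN = '/^[A-Za-z](?:[A-Za-z0-9-]*[A-Za-z0-9])?$/D';

// Hypothetical helper: true if $label fits the <label> rule above.
function is_valid_label(string $label): bool
{
    return preg_match(LABEL_PATTERN, $label) === 1;
}

// Try a few made-up labels.
foreach (['a', 'a1', 'my-host', 'a-', '1abc', '-abc', ''] as $label) {
    printf("%-9s %s\n", "'$label'", is_valid_label($label) ? 'ok' : 'rejected');
}
```

With this reading, `a`, `a1` and `my-host` pass, while `a-`, `1abc`, `-abc` and the empty string are rejected.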

stevengs Jun 29th, 2006 6:07 AM

Hi Jimbo,
Have you read the RFC? The grammar described in RFC 822 is extremely complex, and many are quick to underestimate email. Implementing validation with regular expressions somewhat pushes the limits of what it is sensible to do with them. Perl does it better than some:

EDIT:
OK, I am having trouble formatting it... I have one I copied from "Mastering Regular Expressions".
Let this much be said: it is over 4700 bytes long!


In any case, good luck mon ami!

Ooble Jun 29th, 2006 7:09 AM

This may burn your eyes a little.

Jimbo Jun 29th, 2006 11:14 AM

I hadn't read the RFC, just glanced through it. I'd seen Ooble's link in the other thread and gotten a kick out of it. I was just trying to match the domain language (the one above) and the language for the local segment:
Quote:

Originally Posted by RFC 2822
atext = ALPHA / DIGIT / ; Any character except controls,
        "!" / "#" /     ; SP, and specials.
        "$" / "%" /     ; Used for atoms
        "&" / "'" /
        "*" / "+" /
        "-" / "/" /
        "=" / "?" /
        "^" / "_" /
        "`" / "{" /
        "|" / "}" /
        "~"

atom = [CFWS] 1*atext [CFWS]

dot-atom = [CFWS] dot-atom-text [CFWS]

dot-atom-text = 1*atext *("." 1*atext)

which I had simplified to [atext]([atext]|.)*[atext]. I wasn't going for perfection, just more accuracy than most people bother with (e.g. ".+@.+\..+"). I wasn't even checking that the local part is at most 64 characters or that the domain is at most 255, though I could do those separately with strlen().

And just looking at it after a nice night's sleep, I think it should be ([atext]*.)*[atext]* so I'll probably work things out again... maybe on paper this time... :o
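
The length checks mentioned above can indeed live outside the regex. Here is a rough sketch with strlen(), assuming the limits are at most 64 bytes for the local part and at most 255 for the domain (the figures given in RFC 2821) and splitting on the last "@". The function name `has_valid_lengths` and the sample addresses are hypothetical.

```php
<?php
// Hypothetical helper: checks only the lengths, not the characters.
// Assumed limits: local part <= 64 bytes, domain <= 255 bytes.
function has_valid_lengths(string $email): bool
{
    // Split on the LAST "@"; the unquoted forms discussed in this
    // thread cannot contain "@" in either part anyway.
    $at = strrpos($email, '@');
    if ($at === false) {
        return false; // no "@" at all
    }

    $local  = substr($email, 0, $at);
    $domain = substr($email, $at + 1);

    // Both parts must be non-empty and within the assumed limits.
    return strlen($local) >= 1 && strlen($local) <= 64
        && strlen($domain) >= 1 && strlen($domain) <= 255;
}

var_dump(has_valid_lengths('jimbo@example.com'));                  // bool(true)
var_dump(has_valid_lengths(str_repeat('a', 65) . '@example.com')); // bool(false)
```

Running this before the regex keeps the pattern itself free of length bookkeeping.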

Jimbo Jun 29th, 2006 12:18 PM

I looked things over yet again, and here's what I came up with, broken down:

<letter> = '[A-Za-z]'
<letdig> = '[A-Za-z0-9]'
<locChar> = '[!#$%&\'*+/=?^_`{|}~A-Za-z0-9-]'  # valid chars for the local part, minus the dot
<domChar> = '[A-Za-z0-9-]'  # adds the hyphen to letdig
<local> = '(<locChar>+\.)*<locChar>+'
<domain> = '(<letter>(<domChar>*<letdig>)*\.)*<letter>(<domChar>*<letdig>)*'
<email> = '^<local>@<domain>$'

which brings the regex to:

'^([!#$%&\'*+/=?^_`{|}~A-Za-z0-9-]+\.)*[!#$%&\'*+/=?^_`{|}~A-Za-z0-9-]+@([A-Za-z]([A-Za-z0-9-]*[A-Za-z0-9])\.)*[A-Za-z]([A-Za-z0-9-]*[A-Za-z0-9])$'

Jimbo Jun 30th, 2006 3:09 PM

The regex had a bug with single-char domain labels, but it should be good now :o

'^([!#$%&\'*+/=?^_`{|}~A-Za-z0-9-]+\.)*[!#$%&\'*+/=?^_`{|}~A-Za-z0-9-]+@([A-Za-z]([A-Za-z0-9-]*[A-Za-z0-9])*\.)*[A-Za-z]([A-Za-z0-9-]*[A-Za-z0-9])*$'
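
The breakdown from the earlier post maps fairly directly onto PHP string concatenation. The sketch below rebuilds the regex above from those pieces and tries it on a few made-up addresses. The variable names, the `is_valid_email` function, the `<`...`>` delimiters and the `D` modifier are additions for illustration only, not part of the original regex.

```php
<?php
// Building blocks, named after the breakdown in the earlier post.
$letter  = '[A-Za-z]';
$letdig  = '[A-Za-z0-9]';
$locChar = '[!#$%&\'*+/=?^_`{|}~A-Za-z0-9-]'; // local-part chars, minus the dot
$domChar = '[A-Za-z0-9-]';                     // letdig plus the hyphen

// Dot-separated runs of local-part characters.
$local  = '(' . $locChar . '+\.)*' . $locChar . '+';
// One domain label: a letter, then optional chars ending in a letter or digit.
$label  = $letter . '(' . $domChar . '*' . $letdig . ')*';
// Dot-separated labels.
$domain = '(' . $label . '\.)*' . $label;
$email  = '^' . $local . '@' . $domain . '$';

// Hypothetical helper: true if $address matches $pattern from start to end.
function is_valid_email(string $address, string $pattern): bool
{
    // "<" and ">" serve as delimiters because the pattern itself already
    // uses "/", "#", "~" and friends; D pins "$" to the very end.
    return preg_match('<' . $pattern . '>D', $address) === 1;
}

$samples = ['jimbo@example.com', 'first.last@a.b-c.org', 'x@y',
            '.dot@example.com', 'two..dots@example.com', 'user@-bad.com'];
foreach ($samples as $address) {
    printf("%-24s %s\n", $address, is_valid_email($address, $email) ? 'ok' : 'rejected');
}
```

The first three samples pass and the last three are rejected. A single-label domain such as `x@y` gets through because the pattern does not require a dot in the domain.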

kurifu Jul 6th, 2006 2:58 PM

You guys should check out a cool website, http://www.regexplib.com. It is an online resource where people literally post regexp strings for certain tasks for review, and the community even rates them.

Plus, the website has a cool printable cheat sheet on the syntax, and a flexible online app for testing regular expressions. I find that very handy on Windows, since it doesn't come with a good utility for this.


All times are GMT -5.