Programming Forums
Old Jun 29th, 2006, 1:39 AM   #1
Jimbo
Battle Programmer
 
Join Date: Feb 2006
Location: Bellevue, WA, USA
Posts: 748
Rep Power: 3 Jimbo is on a distinguished road
Email validating regex

I know this was discussed in another thread, but that one's been dead and I thought it seemed peaceful that way. Anyways, I was writing a regex for email addresses and came across a little question:

On this page it has the following language for domain names:
Quote:
<domain> ::= <subdomain> | " "

<subdomain> ::= <label> | <subdomain> "." <label>

<label> ::= <letter> [ [ <ldh-str> ] <let-dig> ]

<ldh-str> ::= <let-dig-hyp> | <let-dig-hyp> <ldh-str>

<let-dig-hyp> ::= <let-dig> | "-"

<let-dig> ::= <letter> | <digit>

<letter> ::= any one of the 52 alphabetic characters A through Z in
upper case and a through z in lower case

<digit> ::= any one of the ten digits 0 through 9
Does the spec for label mean [letter][ldh-str]?[let-dig] or [letter]([ldh-str]*[let-dig])* or something else? I went for the second one, but wasn't sure, as I'd not seen the [] notation used the way it is in the quoted grammar.
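
For what it's worth, in this style of BNF square brackets conventionally mark an optional element, so the label rule would read as: a letter, optionally followed by (an optional ldh-str, then a let-dig). Below is a minimal Python sketch of that reading next to the second reading from the post. The names LABEL_OPTIONAL, LABEL_REPEATED and agree are made up for illustration. On these samples the two readings accept the same strings, so the ? form is mainly the more literal translation.

```python
import re

# Literal reading of <label> ::= <letter> [ [ <ldh-str> ] <let-dig> ],
# treating [ ... ] as "optional" (an assumption about the notation).
LABEL_OPTIONAL = re.compile(r"[A-Za-z]([A-Za-z0-9-]*[A-Za-z0-9])?")

# The second reading from the post, with * in place of ?.
LABEL_REPEATED = re.compile(r"[A-Za-z]([A-Za-z0-9-]*[A-Za-z0-9])*")

samples = ["a", "ab", "a1", "a-b", "a--b", "a-", "-a", "1a", "a-b-", ""]


def agree(candidates):
    """Return True if both readings accept exactly the same candidates."""
    return all(
        bool(LABEL_OPTIONAL.fullmatch(s)) == bool(LABEL_REPEATED.fullmatch(s))
        for s in candidates
    )


for s in samples:
    print(repr(s), bool(LABEL_OPTIONAL.fullmatch(s)))
print("readings agree:", agree(samples))
```

Either way, a label has to start with a letter and must not end with a hyphen.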

And for those who care, my regex (which is not perfect yet) is:
^[!#$%\'*+/=?^_`{|}~A-Za-z0-9-]([!#$%\'*+/=?^_`{|}~\.A-Za-z0-9-]*[!#$%\'*+/=?^_`{|}~A-Za-z0-9-])@([A-Za-z]([A-Za-z0-9-]*[A-Za-z0-9])*\.)*[A-Za-z]([A-Za-z0-9-]*[A-Za-z0-9])?$

Last edited by Jimbo; Jun 29th, 2006 at 2:01 AM.
Old Jun 29th, 2006, 6:07 AM   #2
stevengs
Professional Programmer
 
Join Date: May 2005
Location: Bad Nauheim, Germany
Posts: 436
Rep Power: 4 stevengs is on a distinguished road
Hi Jimbo,
Have you read the RFC? The grammar described in RFC 822 is extremely complex, and many are quick to underestimate email. Implementing validation with regular expressions somewhat pushes the limits of what it is sensible to do with them. Perl does it better than some:

EDIT:
OK, I am having trouble formatting it. I have one that I copied from "Mastering Regular Expressions".
Let this much be said: it is over 4,700 bytes long!


In any case, good luck mon ami!
__________________
-Steven
"Is this a piece of your brain?" - Basil Fawlty

Last edited by stevengs; Jun 29th, 2006 at 6:23 AM.
Old Jun 29th, 2006, 7:09 AM   #3
Ooble
I eat cake for breakfast.
 
Join Date: Jul 2004
Location: In my box.
Posts: 4,434
Rep Power: 9 Ooble is on a distinguished road
This may burn your eyes a little.
__________________
Me :: You :: Them
Old Jun 29th, 2006, 11:14 AM   #4
Jimbo
I hadn't read the RFC, just glanced through it. I'd seen Ooble's link in the other thread and gotten a kick out of it. I was just trying to match the domain language (the one above) and the language for the local segment:
Quote:
Originally Posted by RFC 2822
atext = ALPHA / DIGIT / ; Any character except controls,
"!" / "#" / ; SP, and specials.
"$" / "%" / ; Used for atoms
"&" / "'" /
"*" / "+" /
"-" / "/" /
"=" / "?" /
"^" / "_" /
"`" / "{" /
"|" / "}" /
"~"

atom = [CFWS] 1*atext [CFWS]

dot-atom = [CFWS] dot-atom-text [CFWS]

dot-atom-text = 1*atext *("." 1*atext)
which I had simplified to [atext]([atext]|.)*[atext]. I wasn't going for perfection, just more in-depth accuracy than most people bother with (e.g. ".+@.+\..+"). I wasn't even checking that the local part is at most 64 characters or that the domain is at most 255, though I could do those separately with strlen().

And just looking at it after a nice night's sleep, I think it should be ([atext]*.)*[atext]* so I'll probably work things out again... maybe on paper this time... :o
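
As a rough illustration of the quoted dot-atom-text rule and the separate length checks mentioned above, here is a minimal Python sketch. The names ATEXT, DOT_ATOM_TEXT and plausible_address are made up for this example, CFWS is ignored, and the domain is deliberately left unvalidated apart from its length.

```python
import re

# atext from the quoted RFC 2822 grammar: letters, digits and the listed
# punctuation. The hyphen goes last so it is taken literally.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"

# dot-atom-text = 1*atext *("." 1*atext)
DOT_ATOM_TEXT = re.compile(rf"{ATEXT}+(?:\.{ATEXT}+)*")


def plausible_address(addr: str) -> bool:
    """Check the local part's shape and the two length limits only."""
    local, sep, domain = addr.rpartition("@")
    if not sep or not domain:
        return False
    # Length limits checked outside the regex, like the strlen() idea.
    if len(local) > 64 or len(domain) > 255:
        return False
    return DOT_ATOM_TEXT.fullmatch(local) is not None


for addr in ["a.b@c", "a..b@c", ".a@c", "a.@c", "x" * 65 + "@c"]:
    print(addr[:20], plausible_address(addr))
```

Since "." is not part of atext, the pattern has only one way to split a local part into atoms, which keeps matching cheap.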
Old Jun 29th, 2006, 12:18 PM   #5
Jimbo
I looked things over yet again, and here's what I came up with, broken down:
<letter> = '[A-Za-z]'
<letdig> = '[A-Za-z0-9]'
<locChar> = '[!#$%&\'*+/=?^_`{|}~A-Za-z0-9-]'   # valid chars for the local part, minus the dot
<domChar> = '[A-Za-z0-9-]'  # adds the hyphen to letdig
<local> = '(<locChar>+\.)*<locChar>+'
<domain> = '(<letter>(<domChar>*<letdig>)*\.)*<letter>(<domChar>*<letdig>)*'
<email> = '^<local>@<domain>$'
which brings the regex to:
'^([!#$%&\'*+/=?^_`{|}~A-Za-z0-9-]+\.)*[!#$%&\'*+/=?^_`{|}~A-Za-z0-9-]+@([A-Za-z]([A-Za-z0-9-]*[A-Za-z0-9])\.)*[A-Za-z]([A-Za-z0-9-]*[A-Za-z0-9])$'
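
One way to keep the expanded pattern from drifting away from the named parts is to assemble it in code. This is only a sketch in Python, following the breakdown as written, so the label group carries the trailing *. The helper name LABEL is not in the original breakdown and is introduced purely for readability.

```python
import re

# Named pieces copied from the breakdown. The \' escape is dropped
# because it is only needed inside a single-quoted string.
LETTER = r"[A-Za-z]"
LETDIG = r"[A-Za-z0-9]"
LOC_CHAR = r"[!#$%&'*+/=?^_`{|}~A-Za-z0-9-]"  # local-part chars, minus the dot
DOM_CHAR = r"[A-Za-z0-9-]"                     # letdig plus the hyphen

LOCAL = rf"({LOC_CHAR}+\.)*{LOC_CHAR}+"
LABEL = rf"{LETTER}({DOM_CHAR}*{LETDIG})*"     # hypothetical helper name
DOMAIN = rf"({LABEL}\.)*{LABEL}"
EMAIL = rf"^{LOCAL}@{DOMAIN}$"

EMAIL_RE = re.compile(EMAIL)

print(EMAIL)
print(bool(EMAIL_RE.match("jimbo@example.com")))
```

Printing EMAIL gives the fully expanded pattern, which can then be compared character by character with a hand-expanded version.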
Old Jun 30th, 2006, 3:09 PM   #6
Jimbo
The regex had a bug for single-char domains, but it should be good now :o
'^([!#$%&\'*+/=?^_`{|}~A-Za-z0-9-]+\.)*[!#$%&\'*+/=?^_`{|}~A-Za-z0-9-]+@([A-Za-z]([A-Za-z0-9-]*[A-Za-z0-9])*\.)*[A-Za-z]([A-Za-z0-9-]*[A-Za-z0-9])*$'
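
A quick way to sanity-check a pattern like this is to run it against a handful of addresses. Here is a hedged Python sketch: the test addresses and the names EMAIL_RE, is_valid and cases are invented for illustration, and the pattern is the one above with \' written as a plain '. It uses fullmatch because in Python $ also matches just before a trailing newline.

```python
import re

# The pattern from the post, split across lines for readability.
EMAIL_RE = re.compile(
    r"^([!#$%&'*+/=?^_`{|}~A-Za-z0-9-]+\.)*[!#$%&'*+/=?^_`{|}~A-Za-z0-9-]+"
    r"@([A-Za-z]([A-Za-z0-9-]*[A-Za-z0-9])*\.)*"
    r"[A-Za-z]([A-Za-z0-9-]*[A-Za-z0-9])*$"
)


def is_valid(addr: str) -> bool:
    # fullmatch rejects "a@b\n", which match() plus $ would accept.
    return EMAIL_RE.fullmatch(addr) is not None


cases = {
    "jimbo@example.com": True,
    "first.last@sub.example.org": True,
    "a@b": True,                    # single-character domain
    "a@b.c": True,                  # single-character labels
    ".dot@example.com": False,      # leading dot in local part
    "dot.@example.com": False,      # trailing dot in local part
    "two..dots@example.com": False,
    "user@-example.com": False,     # label starts with a hyphen
    "user@example-.com": False,     # label ends with a hyphen
    "user@1example.com": False,     # labels must start with a letter here
    "no-at-sign": False,
}

for addr, expected in cases.items():
    print(f"{addr!r}: {is_valid(addr)} (expected {expected})")
```

One caveat: the nested quantifier in ([A-Za-z0-9-]*[A-Za-z0-9])* can backtrack heavily on long inputs that fail to match in backtracking engines, so capping the input length before matching is probably worthwhile.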
Old Jul 6th, 2006, 2:58 PM   #7
kurifu
Expert Programmer
 
Join Date: Jul 2004
Location: Halifax, Nova Scotia (Canada)
Posts: 784
Rep Power: 5 kurifu is on a distinguished road
You guys should check out the website http://www.regexplib.com. It is an online resource where people literally post regexp strings for particular tasks for review, and the community even rates them.

Plus, the website has a cool cheat sheet on the syntax that you can print, and a flexible online app for testing regular expressions. I find that very handy on Windows, since it doesn't come with a good utility for this.
__________________
Clifford Matthew Roche <geek@cliffordroche.com>
Web Hosting: http://www.crd-hosting.com
Consulting: http://www.crdev-consulting.com