I've been doing some data validation recently, and this afternoon wanted to get a decent date field set up. I like ISO 8601 dates because they sort well in most default locales, and for the application in question I've gone for YYYY-MM-DD as the template.
This is trivially easy to detect as distinct from plain text, with
\d{4}-\d{2}-\d{2}
but of course this does not validate the content at all, i.e. 9999-99-99 is seen as correct.
It's easy to improve on this expression to match the range of numbers for months 01 to 12 ⇒
(0[1-9]|1[012])
, but actually making it correct is much harder. We need to know which months have only 30 days as a maximum, and to be able to detect leap-years that permit February to get to 29 days.
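To make the gap between the two concrete, here is a minimal Python sketch. The names `SHAPE`, `MONTH` and `valid_months` are mine, purely for illustration, and I'm assuming `re.fullmatch` so that the whole string has to match rather than just a prefix.

```python
import re

# The "shape only" pattern and the month-range pattern from the text.
SHAPE = re.compile(r"\d{4}-\d{2}-\d{2}")
MONTH = re.compile(r"(0[1-9]|1[012])")

# The shape pattern happily accepts a nonsense date.
shape_accepts_nonsense = SHAPE.fullmatch("9999-99-99") is not None

# Try every two-digit string and keep the ones the month pattern accepts.
valid_months = [f"{m:02d}" for m in range(100) if MONTH.fullmatch(f"{m:02d}")]

if __name__ == "__main__":
    print(shape_accepts_nonsense)
    print(valid_months)
```

The month pattern only lets through 01 to 12, but it still knows nothing about how many days each month has.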
I had a quick google around – there are quite a few regexp cargo cult code libraries around the place, many don't match quite the same date format (yy/dd/mm is horrible), and most of them are simply not explained.
Then I found a nice article from Michael Ash (http://bit.ly/mashregex) that talked about ways to extend the simple versions, and explained the best bit, leap-year detection! I didn't need everything from his examples (I didn't want to make a leading zero optional, nor allow non-hyphen separators), and I only needed to validate from year 2000 to 2099, so I ended up with a much less complex expression.
The leap-year detection was easy enough, Michael had spotted a nice pattern that enumerates all the possible leap-years in a century (specifically, one where the initial year is also leap, which 2000 was) :-
| 00 | 04 | 08 | 12 | 16 |
| 20 | 24 | 28 | 32 | 36 |
| 40 | 44 | 48 | 52 | 56 |
| 60 | 64 | 68 | 72 | 76 |
| 80 | 84 | 88 | 92 | 96 |
([02468][048]|[13579][26])
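As a sanity check on that pattern, a small Python sketch can compare it against the standard library's own leap-year rule for 2000 to 2099. The helper names `leap_suffixes` and `expected_suffixes` are hypothetical; `calendar.isleap` is the stdlib function.

```python
import calendar
import re

# Two-digit year suffixes that should be leap years in a century
# whose first year (here 2000) is itself a leap year.
LEAP_YY = re.compile(r"([02468][048]|[13579][26])")

def leap_suffixes():
    """Suffixes 00..99 accepted by the regex."""
    return [f"{yy:02d}" for yy in range(100) if LEAP_YY.fullmatch(f"{yy:02d}")]

def expected_suffixes(century_start=2000):
    """Suffixes 00..99 that the calendar module says are leap years."""
    return [f"{yy:02d}" for yy in range(100) if calendar.isleap(century_start + yy)]

if __name__ == "__main__":
    print(leap_suffixes() == expected_suffixes())
```

Note that this only holds for a century like 2000 to 2099; for 1900 to 1999 the suffix 00 would be wrong, since 1900 was not a leap year.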
The building blocks look like this :-
(0[13578]|1[02])-31
(0[13-9]|1[012])-(30|29)
(0[1-9]|1[012])-(0[1-9]|1[0-9]|2[0-8])
([02468][048]|[13579][26])-02-29
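The three year-independent blocks can be checked on their own. The sketch below (Python, with hypothetical names `ANY_YEAR`, `accepted_month_days` and `real_month_days`) joins them as alternatives and compares the MM-DD strings they accept against the real days of a non-leap year. I've written the non-February months as `0[13-9]` and the 31-day months as `0[13578]`; a comma inside a character class would match a literal comma.

```python
import re
from datetime import date

# The three blocks that do not depend on the year.
ANY_YEAR_BLOCKS = [
    r"(0[13578]|1[02])-31",                     # 31st of the long months
    r"(0[13-9]|1[012])-(30|29)",                # 29th/30th of every month except February
    r"(0[1-9]|1[012])-(0[1-9]|1[0-9]|2[0-8])",  # 1st to 28th of any month
]
ANY_YEAR = re.compile("|".join(ANY_YEAR_BLOCKS))

def accepted_month_days():
    """Every 'MM-DD' string (00-00 .. 99-99) that the combined blocks accept."""
    return {
        f"{m:02d}-{d:02d}"
        for m in range(100)
        for d in range(100)
        if ANY_YEAR.fullmatch(f"{m:02d}-{d:02d}")
    }

def real_month_days(year=2001):
    """Every 'MM-DD' that really exists in the given (non-leap) year."""
    days = set()
    for m in range(1, 13):
        for d in range(1, 32):
            try:
                date(year, m, d)
            except ValueError:
                continue  # e.g. 02-30 or 04-31
            days.add(f"{m:02d}-{d:02d}")
    return days

if __name__ == "__main__":
    print(accepted_month_days() == real_month_days())
```

Only 02-29 is left over, which is what the fourth, leap-year-specific block is for.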
Three of these four building blocks are valid for any year, and can therefore be grouped together. The pseudo-code looks like :-
20 ( leap-year-02-29 | any-year-( any-month-01..28 | non-February-month-29..30 | long-month-31 ) )
And the final expression (tested with Visual REGEXP, which is already packaged for Ubuntu as visual-regexp) ..
20(([02468][048]|[13579][26])-02-29|\d\d-((0[1-9]|1[012])-(0[1-9]|1[0-9]|2[0-8])|(0[13-9]|1[012])-(30|29)|(0[13578]|1[02])-31))
Formatted for readability:
20
(
([02468][048]|[13579][26])-02-29
|
\d\d-
(
(0[1-9]|1[012])-(0[1-9]|1[0-9]|2[0-8])
|
(0[13-9]|1[012])-(30|29)
|
(0[13578]|1[02])-31
)
)
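For anyone wanting more than eyeballing in a regexp tool, here is a rough Python sketch that checks the whole expression against `datetime.date` for every candidate day from 2000 to 2099. The names `ISO_DATE_20XX`, `is_valid` and `mismatches` are my own. The pattern below writes the 31-day months as `0[13578]` (so August is in and September is out) and the non-February months as `0[13-9]`. I'm also assuming `re.ASCII`, so that `\d` means 0-9 only, and `fullmatch`, so no explicit `^` and `$` anchors are needed.

```python
import re
from datetime import date

ISO_DATE_20XX = re.compile(
    r"20("
    r"([02468][048]|[13579][26])-02-29"           # 29 February in a leap year
    r"|\d\d-("
    r"(0[1-9]|1[012])-(0[1-9]|1[0-9]|2[0-8])"     # days 01-28, any month
    r"|(0[13-9]|1[012])-(30|29)"                  # days 29-30, not February
    r"|(0[13578]|1[02])-31"                       # day 31, long months only
    r"))",
    re.ASCII,
)

def is_valid(text):
    """True if text is a real YYYY-MM-DD date between 2000 and 2099."""
    return ISO_DATE_20XX.fullmatch(text) is not None

def mismatches():
    """Candidate dates where the regex and datetime.date disagree."""
    bad = []
    for y in range(2000, 2100):
        for m in range(1, 13):
            for d in range(1, 32):
                text = f"{y:04d}-{m:02d}-{d:02d}"
                try:
                    date(y, m, d)
                    real = True
                except ValueError:
                    real = False
                if is_valid(text) != real:
                    bad.append(text)
    return bad

if __name__ == "__main__":
    print(mismatches())
```

An empty list from `mismatches()` means the expression and the calendar agree on every day-of-month combination in the century. It does not prove anything about strings of a different shape, although `fullmatch` rejects those by construction.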