HTML: hrefs that have "&" with missing "amp;"
2008-11-06 10:58 i1565 [permalink]
HTML is getting pretty old by now. It grew out of a parent technology (SGML) and has inspired other standards since (XML!). A basic idea from the start was that it should be parsed very forgivingly. That is something system builders would happily give up, since it makes parsing more complex. XML is strict by nature, and XHTML is HTML's strict incarnation, but for us keyboard-pounders it's one more thing to keep track of when we're scripting up pages and websites.

A commonly forgotten thing when writing HTML that contains a link or image is to write each ampersand in the URL as "&amp;" inside the href or src attribute. It's easy to forget, and browsers rendering transitional HTML still cover this up silently, but in XHTML parsed as XML it's a killer: the bare "&" is a well-formedness error, and the rest of the document from that point on is dropped.
Knowing that, enter DirFind to grep your code for these instances! Regular expressions to the rescue: with a bit of negative lookahead you can list where you've forgotten "amp;" in no time:
(src|href)="[^"]+?&(?!amp;)
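If you don't have DirFind at hand, the same pattern works in most regex engines that support lookahead. Here is a minimal sketch in Python, not a definitive tool: the function names find_missing_amp and scan_tree, the "*.html" file mask and the UTF-8 encoding are assumptions made for illustration. It checks line by line, so an attribute value that wraps across lines would be missed.

```python
import re
from pathlib import Path

# Same expression as above: a src or href attribute whose value contains
# an "&" that is not immediately followed by "amp;".
PATTERN = re.compile(r'(src|href)="[^"]+?&(?!amp;)')


def find_missing_amp(text):
    """Return (line_number, line) pairs for lines with a bare "&" in src/href."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if PATTERN.search(line):
            hits.append((number, line.strip()))
    return hits


def scan_tree(root, mask="*.html"):
    """Yield (path, line_number, line) for every hit below root.

    The file mask and the encoding are assumptions; adjust them to your site.
    """
    for path in sorted(Path(root).rglob(mask)):
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in find_missing_amp(text):
            yield path, number, line
```

Something like `for hit in scan_tree("."): print(*hit)` would list the offenders. Note that the expression also flags other entities, such as "&#38;" or "&lt;", since it only accepts "&amp;"; that is usually what you want inside a URL, but check a hit before you fix it.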