February 08, 2005

Another Filter Evasion Trick

A recent medz spammer is pulling out all the stops to prevent content-based spam filters from spotting his garbage. In Spam Wars, I describe Message Trick #18 (pp. 160-161), which uses what are called numeric character references to disguise URLs (and which may be decoded using the deobfuscator tool I provide at this site).

Related to those numeric references is a way of adding various symbols to HTML body text—like the one that let me use an em-dash in this sentence. Or the copyright (©) symbol. Pretty handy stuff.

Among the characters that can be represented this way are a bunch of Greek characters, which might normally show up in things like math formulas (and a πr² to you, too). A lot of Greek characters look like Latin characters, even if they don't sound the same. An uppercase rho character looks like a capital "P," and an uppercase alpha looks just like a capital "A."
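To see the resemblance for yourself, here is a minimal Python sketch that decodes a few character references and reports which Unicode character each one produces. The `describe` helper is a name I made up for illustration; `html.unescape` and `unicodedata.name` come from Python's standard library.

```python
import html
import unicodedata

def describe(entity):
    """Decode one character reference; return (character, code point, Unicode name)."""
    char = html.unescape(entity)
    return char, f"U+{ord(char):04X}", unicodedata.name(char)

# Named and numeric references for uppercase rho and uppercase alpha.
for entity in ("&Rho;", "&#929;", "&Alpha;", "&#913;"):
    char, code_point, name = describe(entity)
    print(entity, char, code_point, name)
```

Both forms of each reference decode to the same Greek character, which most fonts draw much like the Latin "P" or "A" even though it is a different code point.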

(It's just a coincidence that I talk about this one day after discussing another technique that is based on the same concept, called a homograph. This one, however, is simply an elaboration on the old trick that substitutes zeros for uppercase Os.)

This Greek substitution trick is how today's medz spammer spelled "PHARMACY" using the following character entities:


When given certain font family and styling parameters, this gobbledygook renders as follows (I'll let your browser do the rendering):


The uppercase upsilon at the end makes kind of a weak "Y," but you get the picture.

More importantly for the spammer, until users' filters learn about this trick, some messages will probably sneak through.
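One way a filter might learn the trick is to decode the character references and then fold Greek look-alikes back to their Latin twins before matching keywords. The Python sketch below is only an illustration under my own assumptions: the `GREEK_TO_LATIN` table, the `fold_lookalikes` function, and the sample string are hypothetical (the sample is not the spammer's actual markup), and the table covers only a handful of uppercase letters.

```python
import html

# Hypothetical, deliberately incomplete table of uppercase Greek letters
# that resemble Latin capitals in most fonts.
GREEK_TO_LATIN = str.maketrans({
    "\u0391": "A",  # Alpha
    "\u0392": "B",  # Beta
    "\u0395": "E",  # Epsilon
    "\u0397": "H",  # Eta
    "\u0399": "I",  # Iota
    "\u039A": "K",  # Kappa
    "\u039C": "M",  # Mu
    "\u039D": "N",  # Nu
    "\u039F": "O",  # Omicron
    "\u03A1": "P",  # Rho
    "\u03A4": "T",  # Tau
    "\u03A5": "Y",  # Upsilon
    "\u03A7": "X",  # Chi
})

def fold_lookalikes(markup):
    """Decode character references, then map Greek look-alikes to Latin letters."""
    decoded = html.unescape(markup)
    return decoded.translate(GREEK_TO_LATIN)

# Assumed sample: Greek entities mixed with plain Latin "R" and "C".
sample = "&Rho;&Eta;&Alpha;R&Mu;&Alpha;C&Upsilon;"
print(fold_lookalikes(sample))
```

After folding, the text can be handed to whatever keyword or statistical test the filter already applies. A real filter would need a far larger table, since look-alikes also exist in lowercase and in other scripts.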


Posted on February 08, 2005 at 02:11 PM