Spam Wars - Our Last Best Chance to Defeat Spammers, Scammers, and Hackers

February 25, 2007

Stock and Medz Spammers Try New Images

There has been a lot of discussion among spam fighters about the rise, over the last several months, in spam messages consisting of nothing but attached images. Sometimes the images are accompanied by real text extracted from public domain literature or web pages, because it's too easy to filter every message that contains nothing but an image. The text, which avoids spammy words and phrases, is intended to get past content-sniffing filters, such as the Bayesian filters common to a lot of anti-spam products and services.
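To see why innocuous padding text helps the spammer, here is a minimal sketch of the naive Bayes combining step many content filters use: each word carries a learned "spamminess" probability, and the probabilities are multiplied together. The word list, the probabilities, and the default for unseen words are all hypothetical values chosen for illustration, not taken from any real product.

```python
from math import prod

# Hypothetical per-word spam probabilities, as a trained filter might learn them.
TOKEN_SPAM_PROB = {
    "viagra": 0.99,
    "stock": 0.90,
    "symbol": 0.85,
    "whale": 0.20,
    "captain": 0.25,
    "ocean": 0.30,
}
UNKNOWN_TOKEN_PROB = 0.4  # assumed default for words never seen in training


def spam_score(text: str) -> float:
    """Combine word probabilities with the naive Bayes formula
    P = prod(p) / (prod(p) + prod(1 - p))."""
    probs = [TOKEN_SPAM_PROB.get(tok, UNKNOWN_TOKEN_PROB)
             for tok in text.lower().split()]
    if not probs:
        return 0.5  # no words at all: no evidence either way
    spam = prod(probs)
    ham = prod(1 - p for p in probs)
    return spam / (spam + ham)


print(spam_score("stock symbol"))                   # high: spammy words
print(spam_score("the captain watched the ocean"))  # low: literary padding
```

Because the sales pitch lives inside the image, the filter only ever sees the literary words, and the combined score comes out low.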

The content of an image, however, is more difficult for spam filters to identify. About the middle of last year, stock spammers (followed closely by prescription medz spammers) started blending their text-in-graphic images with a background whose little specks and dots could be changed almost at random without interfering with the text. Two image spam messages originating from the same bot-infested PC could have the same text in the image, but the background would be different. Since a file's "signature" changes when even one speck moves, it would be impossible for a spam filter to predict what the signature of that image file could be for every potential combination.
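A minimal sketch of why randomized backgrounds defeat signature matching, assuming the signature is a cryptographic hash (SHA-256 here) of the raw image bytes. The "image" is a toy grayscale pixel buffer, and the `render_spam_image` helper, its dimensions, and the speck count are hypothetical.

```python
from __future__ import annotations

import hashlib
import random

WIDTH, HEIGHT = 64, 16


def render_spam_image(text_mask: set[tuple[int, int]], seed: int) -> bytes:
    """Build a toy grayscale image: white background, a seed-dependent
    scattering of gray specks, and identical black 'text' pixels on top."""
    rng = random.Random(seed)
    pixels = bytearray([255] * (WIDTH * HEIGHT))  # white background
    for _ in range(40):  # sprinkle random background specks
        x, y = rng.randrange(WIDTH), rng.randrange(HEIGHT)
        pixels[y * WIDTH + x] = 128
    for x, y in text_mask:  # draw the same "text" regardless of seed
        pixels[y * WIDTH + x] = 0
    return bytes(pixels)


def signature(image: bytes) -> str:
    """Exact-match fingerprint of the file contents."""
    return hashlib.sha256(image).hexdigest()


TEXT = {(x, 8) for x in range(10, 50)}  # stand-in for the rendered message
a = render_spam_image(TEXT, seed=1)
b = render_spam_image(TEXT, seed=2)
print(signature(a) == signature(b))  # the text is identical, the hashes are not
```

Any exact-match fingerprint behaves this way, which is why filters had to look at what the image says rather than at its bytes.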

As happens constantly in the spam wars, the spam fighters saw a mole popping up and whacked it by incorporating optical character recognition (OCR) into their image inspection routines. Stock image spam almost always has some key words in it ("symbol", "trading", "target", etc.) that OCR could read (it's even easier for brand name medications). If the OCR software could "read" the text on the image, the spam was toast (mmmm, toasted SPAM™ sandwich).
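Once OCR has turned the image back into text, the check itself can be simple. Here is a minimal sketch, assuming some OCR engine has already returned a plain string (no real OCR library is shown). The keyword set uses the three words mentioned above, and the `min_hits` threshold is a hypothetical tuning choice.

```python
import re

# Keywords mentioned above; a real filter would use a larger, tuned list.
STOCK_KEYWORDS = {"symbol", "trading", "target"}


def ocr_text_looks_like_stock_spam(ocr_text: str, min_hits: int = 2) -> bool:
    """Return True if at least `min_hits` distinct stock-spam keywords
    appear in text recovered from an image by an OCR engine."""
    words = set(re.findall(r"[a-z]+", ocr_text.lower()))
    return len(STOCK_KEYWORDS & words) >= min_hits


print(ocr_text_looks_like_stock_spam("Symbol: ABCD  Target: $2.00"))  # True
print(ocr_text_looks_like_stock_spam("5ymb0l tr4ding"))              # False
```

The second call shows the weak spot: if distortion makes the OCR return garbled characters, no keyword matches and the image sails through.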

Having found one of their mole holes blocked by a mallet, the spammers started digging yet another hole to get through. This time, the trick is to modify the image enough so that it is hopefully still readable by humans, but not by OCR technology. Here is an example I spotted today (identifiable names/symbols intentionally smudged by me):

In this example, the text is rotated at an angle, causing the letters to become less distinct. There is also far more background "noise," which is (I suppose) intended to hamper OCR still further.

Several other spam images that have come by here were modified using different techniques. Sometimes the text is barely readable by humans.

Time for the spam fighting technologists to start carving yet another mallet.

Posted on February 25, 2007 at 03:11 PM