Development of an anti-spam SDK filtering engine built on artificial intelligence (AI)
Founded in 1999, with offices in the US, Europe, and Asia, the customer is a leading provider of OEM spam filtering and anti-phishing software, competing with Symantec, McAfee, and Cloudmark. Over 5,000 companies and 10 million consumers worldwide rely on its services.
Business Case for anti spam SDK software development
The goal was to develop an email spam filtering engine that offered current information, up-to-date categorization, scalability, reliability, and a flexible, extensible framework. The client needed to scan as many URLs as possible against as many categories as possible, including fraudulent practices, phishing, spam, and viruses.
Solution - Anti spam filtering software, OEM Engine, SDK API Phishing
Fifteen developers worked for almost three years to produce an OEM email spam filtering engine with the following components:
Anti-Spam SDK is the engine for anti-spam solutions, providing powerful functions to perform spam filtering and other advanced email message analysis.
In addition to the usual originating information (often falsified) in the message header, the engine can identify additional information in the message. Additional functions use this information to calculate sophisticated Bayesian statistics to determine the probability that any given message is spam or legitimate email. The engine can identify up to 99.9% of spam with a near-zero false positive rate, even for foreign-language spam.
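As a rough illustration of how per-token evidence can be combined into a single spam probability, here is a minimal sketch of the classic naive-Bayes combination. It is not the engine's actual formula, which the source does not describe. The function name, the clamping bounds, and the assumption of equal priors and independent tokens are all hypothetical choices made for the example.

```python
import math

def combine_spam_probability(token_probs):
    """Combine per-token spam probabilities into one message score.

    Assumes tokens are independent and that spam and legitimate mail
    are equally likely a priori (both are simplifying assumptions).
    Works in log space so long messages do not underflow.
    """
    # Clamp so a single 0.0 or 1.0 cannot break log() or dominate the result.
    clamped = [min(max(p, 0.01), 0.99) for p in token_probs]
    log_spam = sum(math.log(p) for p in clamped)
    log_ham = sum(math.log(1.0 - p) for p in clamped)

    # P(spam) = S / (S + H), evaluated as a numerically stable sigmoid.
    x = log_spam - log_ham
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)
```

With no evidence at all (an empty list) the function returns 0.5, and two tokens that each look 90% spammy combine to a score of about 0.988.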
A finely-tuned implementation can analyze up to 1,100 messages per second per process. The engine scales to arbitrarily large numbers of machines, limited only by external factors such as incoming bandwidth.
The engine can identify:
- The language or languages of a message;
- Phone numbers;
- Domain names;
- Key words and phrases.
A key component of the engine is the ability to identify key words and phrases even when hidden or obfuscated by the spammer. V14gr4 is identified as Viagra; www dot im a spammer dot com is identified as www.imaspammer.com. Sophisticated pattern analysis can identify even more subtly hidden key words and phrases.
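The two examples above can be reproduced with a very small normalizer. This is only a sketch of the general idea, assuming a simple character-substitution table and one regular expression; the table contents, the pattern, and the function names are hypothetical, and the engine's own pattern analysis is described as considerably more sophisticated.

```python
import re

# Hypothetical substitution table; a production engine would need a far larger one.
LEET_TABLE = str.maketrans({
    "0": "o", "1": "i", "3": "e", "4": "a",
    "5": "s", "7": "t", "@": "a", "$": "s",
})

def normalize_word(word):
    """Lower-case a word and undo common digit/symbol substitutions."""
    return word.lower().translate(LEET_TABLE)

# Matches "www dot <one or more words> dot <tld>" with the dots spelled out.
# Limitation of this sketch: only a single label between the two "dot"s is rebuilt.
SPELLED_DOMAIN = re.compile(
    r"\bwww\s+dot\s+((?:[a-z0-9-]+\s+)+?)dot\s+([a-z]{2,})\b",
    re.IGNORECASE,
)

def restore_domains(text):
    """Rewrite spelled-out domains such as 'www dot x y dot com' as 'www.xy.com'."""
    def join(match):
        label = re.sub(r"\s+", "", match.group(1)).lower()
        return f"www.{label}.{match.group(2).lower()}"
    return SPELLED_DOMAIN.sub(join, text)
```

Here `normalize_word("V14gr4")` yields `viagra`, and `restore_domains("www dot im a spammer dot com")` yields `www.imaspammer.com`, after which ordinary key word and domain lookups can be applied.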
The SDK API (Application Programming Interface) allows developers to integrate Anti-Spam solutions with a wide variety of applications.
Scores of configuration options allow OEMs to balance memory usage, throughput and detection.
Statistics and Analysis
Spammers do not rest; they're always looking for ways to overcome anti-spam measures. The engine not only evaluates messages, but also collects feedback and analyzes messages so analysts can respond to new spam techniques.
OEMs can use the SDK to:
- Monitor spam detection information and collect statistics;
- Automatically remove redundant and irrelevant key words;
- Suggest new patterns to identify spam.
The client also uses the engine internally to implement a continuous information-gathering process that analyzes data from hundreds of thousands of messages collected from "honeypots" and known-good sources. This analysis continually refines and enhances the engine's ability to detect spam.
The engine looks for image attributes that are unlikely to exist in legitimate email, including:
- Jigsaw puzzle-style images;
- CAPTCHA-style images that intentionally obscure content;
- Images designed to emulate plain text.
The anti-spam SDK engine treats images, and parts of images, as attributes that can be extracted and tracked over large numbers of messages.
Ordinary graylisting solutions temporarily reject an email message from a new sender once, assuming that legitimate senders will try to resend the message, but spammers will not bother.
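The ordinary graylisting behavior described above can be sketched in a few lines. This is a generic illustration, not the server discussed in this case study: the class name, the (client IP, sender, recipient) key, the five-minute minimum delay, and the in-memory dictionary are all assumptions made for the example.

```python
import time

class Graylist:
    """Minimal in-memory graylist keyed on (client IP, sender, recipient)."""

    def __init__(self, min_delay=300.0, clock=time.time):
        self.min_delay = min_delay   # seconds a new sender must wait before a retry counts
        self.clock = clock           # injectable clock, which makes the class easy to test
        self.first_seen = {}         # key -> time of the first delivery attempt

    def check(self, client_ip, sender, recipient):
        """Return 'tempfail' for new or too-early senders, 'accept' after a valid retry."""
        key = (client_ip, sender.lower(), recipient.lower())
        now = self.clock()
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.min_delay:
            return "accept"
        # The mail server would answer with a temporary (4xx) SMTP failure here.
        return "tempfail"
```

A legitimate mail server queues the message and retries later, at which point `check` returns `accept`; a sender that never retries is never accepted. A real deployment would also expire old entries and persist the table.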
SoloSoft developed a graylisting server that does much more: it also establishes and maintains a reputation for hundreds of thousands of different servers and originators to help weed out unreliable senders. It uses proprietary technology to offer the fastest throughput even on congested networks and can be easily and securely accessed across firewalls and other network security devices without diverting IT resources for elaborate configuration.
Sophisticated access and licensing controls allow OEMs to profitably offer secure graylisting services.
The anti-spam SDK engine combines its own home-grown reputation filters with global access to advanced data networks to block phishing and other forms of email fraud.
Reputation analysis and email authentication help the system identify the rightful owners of IP addresses, domains, email addresses, and even message content.
The global data network includes near real-time reporting of phishing outbreaks. The engine identifies and segregates phishing from other types of spam, allowing OEMs to reject, delete, or quarantine phishing messages before they reach customers' inboxes.
Features for the Email spam filtering service
Scan IP addresses for spam/legit attributes
Extensive spyware database
Verify IP address and domain name owners
Verify software owners
Ability to prevent, detect, and disinfect zombie machines
Detection of viruses in real-time with and without signatures
Ximian Evolution plugin
Novell Groupwise plugin
Microsoft Exchange plugin
Fully customizable rule evaluation and weighting
Block spammers who spoof domain names
Extensive options to tune performance and accuracy
Adaptability to variable network conditions; reduce or delay analysis during periods of high traffic
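To make "fully customizable rule evaluation and weighting" more concrete, the following sketch shows one common way such a feature can work: each rule is a named test with a weight, and a message is flagged when the sum of the weights of the rules that fire reaches a threshold. The rule names, weights, message fields, and threshold are invented for illustration and do not come from the SDK.

```python
def score_message(message, rules, threshold):
    """Evaluate weighted rules against a message.

    rules is a list of (name, predicate, weight) tuples; predicate takes
    the message and returns True when the rule fires. Negative weights
    let a rule count as evidence of legitimate mail.
    Returns (total_score, is_spam, names_of_fired_rules).
    """
    fired = [(name, weight) for name, predicate, weight in rules if predicate(message)]
    total = sum(weight for _, weight in fired)
    return total, total >= threshold, [name for name, _ in fired]

# Hypothetical rule set; an OEM would tune names, tests, and weights.
EXAMPLE_RULES = [
    ("obfuscated_keyword", lambda m: "v14gr4" in m["body"].lower(), 3.0),
    ("sender_domain_mismatch", lambda m: m["from_domain"] != m["helo_domain"], 2.0),
    ("has_unsubscribe_header", lambda m: "list-unsubscribe" in m["headers"], -1.0),
]
```

Because the rules are plain data, weights can be adjusted or rules added and removed without touching the evaluation code.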
Benefits of the anti spam Email filtering software
At its default settings, the SDK catches more than 95% of spam with less than 0.005% 'false positives.' Virtually all of the false positives are non-English bulk emails such as newsletters and legitimate advertisements.
Tools and Technologies
Supported platforms include: Linux (certified for Red Hat, Mandrake, and SUSE), Microsoft Windows, Solaris 8 (SPARC), Solaris Intel, FreeBSD, AIX, Mac OS X, HP-UX
Languages and Tools: C/C++, Perl, PHP, Apache, Sendmail, GCC compiler, GDB debugger, gprof, Valgrind memory leak checker, Flex text parser