Programming Forums
Aug 22nd, 2005, 3:45 AM   #1
jonkata
Newbie
How to write a spam filter?

Hello, everyone,
I am a programming newbie, and I have an idea. I decided to post a question in this forum, hoping that some of you will help me. I need some advice about writing a spam filter. The thing is that I don't really know where to start or what exactly I am trying to make. Could you please tell me what technology and programming language to use, and where I can read more about it? I know there are a lot of good spam filters and I don't need to reinvent the wheel... maybe there are some free ones?
Any advice will be useful to me.
Aug 22nd, 2005, 6:59 AM   #2
Arevos
Programming Guru
It depends on what OS you're using, how technical you want to get and what mail client you're using.

The Thunderbird mail client (from the same group that developed Firefox; you are using Firefox, right?) has built-in Bayesian spam filtering. That's the simplest option I know of, though not everyone is comfortable switching their email client.

I can't really give you any other advice without knowing a bit about your system.
Aug 22nd, 2005, 7:08 AM   #3
DaWei
Resident Grouch
Nothing wrong with writing complicated things; that's how one learns. Often, one is not yet advanced enough to produce certain things. It's called "biting off more than you can chew." You rectify that by researching (Google is a wondermous tool) the subject. If way too many components pop up that ring no bells for you, then you ease up on your expectations. Take one of the required components that you don't currently understand and shift your attention to that. Maybe you can learn it directly, and maybe you have to break it down. Rinse and repeat until you have a grasp of all the components you need to construct the original, then construct it.

In the course of these activities you will discover that when you have done all you can do you still have things you just can't seem to get around. The members here, if the issue is in their toolchest, will be delighted to help you past those points.

Believe it or not, the piecemeal approach is how one plans, schedules, and executes multi-million dollar programs. Your education deserves as much.
__________________
Abstraction doesn't make it impossible to write bad code; it makes it possible to write superior code.
Aug 22nd, 2005, 8:21 AM   #4
Infinite Recursion
Programming Guru
Define what OS you want this spam filter to run on. Then decide which email clients you want to interact with. Take time to review any APIs, etc., before going in headfirst. Linux with the Evolution email client would probably be a good target, at least at first; there are tons of open source utilities out there that can get you started. However, as mentioned above, understand the foundation, the concepts behind the scenes, before diving in.
__________________
http://jasonpowers.net

"There are a thousand hacking at the branches of evil to one who is striking at the root."
Aug 22nd, 2005, 9:45 AM   #5
Pizentios
Programming Guru
Also, please specify whether you want this to be server-side or client-side.
__________________
Profanity is the one language that all programmers understand.

Aug 29th, 2005, 2:29 AM   #6
jonyzz
Programmer
Suppose I have a webmail client written in PHP. I would like to write a server-side spam filter, like Yahoo's for example. Is it possible to use PHP for that too? And what can you tell me about virus scanning of e-mails?
Aug 29th, 2005, 3:18 AM   #7
Arevos
Programming Guru
Oh, certainly it's possible. As long as you have access to the webmail client's source, you can do just about anything. There are a number of regularly updated blacklists you can use to filter out IPs. You can filter against a wordlist of your choice. Or you can make a Bayesian filter, which learns which combinations of words are most likely to indicate spam.
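
To make the Bayesian idea concrete, here is a minimal sketch in Python (used only for illustration; the same logic ports to PHP). It is a plain multinomial naive Bayes classifier with add-one smoothing, not the algorithm any particular product uses. The class name `NaiveBayesFilter`, the regex tokenizer and the training data are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Hypothetical two-class (spam/ham) naive Bayes filter.

    Both classes need at least one training message before is_spam() is called.
    """

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record the words of one message labelled "spam" or "ham"."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def _log_probability(self, tokens, label):
        counts = self.word_counts[label]
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_words = sum(counts.values())
        # Log prior: share of training messages that carry this label
        log_p = math.log(self.message_counts[label] / sum(self.message_counts.values()))
        for token in tokens:
            # Add-one (Laplace) smoothing keeps unseen words from zeroing the product
            log_p += math.log((counts[token] + 1) / (total_words + len(vocabulary)))
        return log_p

    def is_spam(self, text):
        """Classify as spam if the spam log-probability beats the ham one."""
        tokens = tokenize(text)
        return self._log_probability(tokens, "spam") > self._log_probability(tokens, "ham")
```

For example, after `train("cheap viagra buy now", "spam")`, `train("win money now cheap offer", "spam")`, `train("meeting notes for monday", "ham")` and `train("lunch on monday with the team", "ham")`, the filter flags "buy cheap viagra" as spam and lets "team meeting monday" through. Working in log space avoids floating-point underflow on long messages.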

Or you could create a whitelist, which is the approach my father took when creating a PHP spam filter. Those not on the whitelist would receive an automated reply saying "please email this whitelist address with the reason you want to contact me", or words to that effect.
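
A rough Python sketch of the whitelist-plus-auto-reply scheme, purely for illustration. The function names, the reply text and the in-memory set are hypothetical stand-ins for whatever storage and mail-sending code the webmail client really has. Comparing addresses case-insensitively is also an assumption.

```python
from email.utils import parseaddr

# Hypothetical wording for the automated challenge reply
CHALLENGE_TEXT = (
    "Your message was held because you are not on my whitelist. "
    "Please email the whitelist address with the reason you want to contact me."
)


def normalize(address):
    """Extract the bare address from a From: header value and lower-case it."""
    return parseaddr(address)[1].strip().lower()


def handle_incoming(from_header, whitelist):
    """Return ("deliver", None) for whitelisted senders, else ("hold", auto_reply).

    `whitelist` is assumed to be a set of already-normalized addresses.
    """
    sender = normalize(from_header)
    if sender in whitelist:
        return "deliver", None
    # Not whitelisted: hold the message and prepare an automated challenge
    return "hold", {"to": sender, "body": CHALLENGE_TEXT}
```

Usage: build the set once with `{normalize(a) for a in user_addresses}`; then `handle_incoming("Alice <Alice@Example.com>", whitelist)` delivers if `alice@example.com` is listed, and any other sender gets the challenge.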

Personally, I just have a Bayesian filter (bogofilter) on my computer. KMail supports external filters quite well, so that works. Around 95% of spam gets killed.
Aug 29th, 2005, 4:26 AM   #8
jonyzz
Programmer
Thank you for the information and the quick replies, guys.
One thing I would like to ask you, Arevos: "There are a number of regularly updated blacklists you can use to filter out IPs." How can I get access to any of those blacklists?
And about the whitelist: as I understand it, that is a list of e-mail addresses specified by each user, and no messages are received from addresses that are not on that list. Did I understand correctly?
Thank you in advance.
Aug 29th, 2005, 5:51 AM   #9
Arevos
Programming Guru
Yep, a whitelist is the opposite of a blacklist. A blacklist stops the listed IPs from getting in. A whitelist stops everyone except the listed IPs.
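
The difference between the two policies fits in a few lines. This Python sketch uses the standard `ipaddress` module; the function names and the example addresses (taken from the reserved documentation ranges) are hypothetical.

```python
import ipaddress


def to_ip_set(addresses):
    """Parse a list of IP strings into a set of ip_address objects."""
    return {ipaddress.ip_address(a) for a in addresses}


def blocked_by_blacklist(ip, blacklist):
    """Blacklist policy: reject only listed IPs; everyone else gets in."""
    return ipaddress.ip_address(ip) in blacklist


def blocked_by_whitelist(ip, whitelist):
    """Whitelist policy: reject everyone except listed IPs."""
    return ipaddress.ip_address(ip) not in whitelist
```

With `listed = to_ip_set(["192.0.2.10"])`, the blacklist blocks only `192.0.2.10`, while the whitelist blocks every address except `192.0.2.10`. Parsing into `ip_address` objects means that differently written forms of the same address compare equal.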

SPEWS provides some information on how to use its blacklist. SPFilter appears to be a Perl program that automatically gets the latest blacklists from a variety of sites. The latest daily updates for SPFilter can be found here. They come in a variety of formats, compressed with bz2. You'll need software capable of decompressing bz2 (just google it; there should be plenty of programs about), and you'll need to choose which format you want the blacklist in.
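
If you don't want to install a separate tool, Python's standard `bz2` module can decompress the download directly. This is a minimal sketch, and the file names are only placeholders.

```python
import bz2
import shutil


def decompress_bz2(src_path, dest_path):
    """Decompress a .bz2 file (e.g. a downloaded blacklist) into dest_path."""
    with bz2.open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        # Stream in chunks so large lists don't have to fit in memory at once
        shutil.copyfileobj(src, dest)
```

For example, `decompress_bz2("DEFAULT.exim.bz2", "DEFAULT.exim")` would leave a plain-text copy next to the original.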

DEFAULT.exim.bz2 seems a good place to start. Download it. Decompress it. Then write a program to strip all comments from it (anything after a "#" character), and to get the IPs (from the start of the line to the first ":" character).
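
Here is one way to write the stripping step just described (drop everything after "#", keep the text before the first ":"). The thread doesn't show the real layout of DEFAULT.exim, so the sample lines in the usage note are invented.

```python
def parse_exim_blacklist(lines):
    """Return the IP entries from exim-style blacklist lines.

    Anything after a '#' is a comment; the IP is the text before the first ':'.
    Note: this simple rule would mangle IPv6 addresses, which contain ':'.
    """
    ips = []
    for line in lines:
        line = line.split("#", 1)[0]          # strip the comment, if any
        ip = line.partition(":")[0].strip()   # keep text up to the first ':'
        if ip:                                # skip blank and comment-only lines
            ips.append(ip)
    return ips
```

Since it accepts any iterable of strings, you can pass an open file directly (`with open("DEFAULT.exim") as f: ips = parse_exim_blacklist(f)`). For lines like `# header`, `192.0.2.1: spam source  # added 2005` and `198.51.100.0/24:`, it returns `["192.0.2.1", "198.51.100.0/24"]`.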

Blacklists tend to be a bit of a brute-force tactic. Some blacklists are less precise than others, so you may get some false positives. I'd suggest testing it first.

Programs such as SpamAssassin use a variety of techniques. They give each email a 'score' depending on how likely it is to be spam. For instance, if the email has an IP in its header that comes from a blacklist, add 50 to its score. If the email contains the word "Viagra", add 70 to its score. If the email comes from "hotmail.com", add 10 to its score. If the score is greater than 90, place the message in the Spam bin.
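
A Python sketch of the scoring idea, using exactly the example weights above (50, 70, 10, threshold 90). These numbers are only an illustration and are not SpamAssassin's real rules; its default threshold (`required_score`) is 5.0, with much smaller per-rule scores. The `Message` type and its fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """Hypothetical, simplified view of an incoming email."""
    sender: str
    body: str
    header_ips: list = field(default_factory=list)


SPAM_THRESHOLD = 90  # example threshold from the post, not SpamAssassin's


def score_message(msg, blacklisted_ips):
    """Add up the example rule weights; a higher score means more likely spam."""
    score = 0
    if any(ip in blacklisted_ips for ip in msg.header_ips):
        score += 50  # an IP in the headers appears on a blacklist
    if "viagra" in msg.body.lower():
        score += 70  # suspicious keyword (case-insensitive match)
    if msg.sender.lower().endswith("@hotmail.com"):
        score += 10  # sender domain weighting from the example
    return score


def is_spam(msg, blacklisted_ips):
    """"Greater than 90" in the post, so the comparison is strict."""
    return score_message(msg, blacklisted_ips) > SPAM_THRESHOLD
```

With these weights, a hotmail.com message mentioning Viagra scores 80 and still gets through. The same message relayed through a blacklisted IP scores 130 and is binned. Because no single rule reaches the threshold on its own, one false positive can't condemn a message.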
DaniWeb IT Discussion Community
All times are GMT -5.

Copyright ©2007 DaniWeb® LLC