Programming Forums
Old Mar 3rd, 2006, 8:17 AM   #11
DaWei
Resident Grouch
 
Join Date: Jun 2005
Posts: 6,453
Rep Power: 10 DaWei is on a distinguished road
Put your spam information in a database. It has nothing to do with web design. It might be as simple as importing a .csv file or as complex as writing a program to parse the file and stuff it into the database. It isn't an HTTP transaction, subject to the limitations thereof. Just do it.

Design your web application to query/update/maintain/whatever.
__________________
Abstraction doesn't make it impossible to write bad code; it makes it possible to write superior code.
Contributor's Corner: Grumpy on C++ Exceptions DaWei on Pointers
Old Mar 3rd, 2006, 11:50 AM   #12
guess
Programmer
 
Join Date: Feb 2006
Posts: 40
Rep Power: 0 guess is on a distinguished road
So what do you advise me to use for inserting the data into the database? Should I use another program? Maybe Eclipse, C++ Builder, or C#? What do you advise?
Old Mar 3rd, 2006, 12:29 PM   #13
DaWei
Resident Grouch
 
Join Date: Jun 2005
Posts: 6,453
Rep Power: 10 DaWei is on a distinguished road
Maybe you'd like to show a couple lines from the raw file and a proposed DB structure.
Old Mar 3rd, 2006, 5:25 PM   #14
Arevos
Programming Guru
 
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5 Arevos is on a distinguished road
Quote:
Originally Posted by guess
I am given a job to create a database,parse the whole 12MB file and then put them into the appropriate locations in the database.
As DaWei says, if you only have one 12MB spam file you want to put into the database, then you can treat that as a separate task.

Ideally, you'd have a program that parses and inserts the spam file into the database, and a PHP script on your webserver that searches the database.

If you tell us how your spam file is formatted, and how you want your database to be structured, then we can advise you further.

For instance, say you had a spam file that contained a large number of blacklisted IP addresses, with one IP address per line:
86.34.1.34
103.23.6.100
...
And you wanted to put this into a database table "spamlist" that had a single field called "ipaddress".

I favour Python and MySQL for most programming tasks like this, so that's what I'll use in this example:
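import MySQLdb

# Connect to the "spam" database (the password is a placeholder)
database = MySQLdb.connect(passwd="password", db="spam")
cursor = database.cursor()

# Insert one row per line; the parameters are passed as a tuple
with open("spamfile.txt") as spamfile:
    for line in spamfile:
        cursor.execute("INSERT INTO spamlist VALUES (%s)", (line.strip(),))

database.commit()  # required if the table uses a transactional engine
database.close()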
Unless your spam file has some really complex formatting, it should be reasonably easy to transfer it to a database.
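For a 12MB file it may be worth sending the rows in one batch rather than one execute call per line. Here is a minimal sketch of that idea. It uses Python's built-in sqlite3 module as a stand-in so that it runs without a MySQL server, which is an assumption on my part; with MySQLdb the placeholder would be %s rather than ?. The function name load_ips is made up for illustration.

```python
import sqlite3


def load_ips(lines, connection):
    """Insert one row per non-empty line into spamlist and return the row count."""
    # Strip whitespace and skip blank lines; each row is a 1-tuple of parameters.
    rows = [(line.strip(),) for line in lines if line.strip()]
    connection.executemany("INSERT INTO spamlist (ipaddress) VALUES (?)", rows)
    connection.commit()
    return len(rows)


# In-memory database standing in for the real "spam" database (assumption).
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE spamlist (ipaddress TEXT)")

inserted = load_ips(["86.34.1.34\n", "103.23.6.100\n", "\n"], connection)
print(inserted)
```

An open file object can be passed as the lines argument, since iterating over a file yields its lines.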
Old Mar 3rd, 2006, 5:40 PM   #15
guess
Programmer
 
Join Date: Feb 2006
Posts: 40
Rep Power: 0 guess is on a distinguished road
Quote:
17/02/06 01:02:02: [qsheff], VIRUS, queue=q-1140130762-200475-94521, recvfrom=62.10.103.114, from=a.stoppiello@virgilio.it, to=epson@datapro.com.tr, subj=`Your day'
17/02/06 01:02:04: [qsheff], ATTACH, queue=q-1140130903-759515-95055, recvfrom=85.98.59.95, from=john@bilkent.edu.tr, to=alice@prizma.net.tr, subj=`no subject', spam=`message.zip', rule=`message.zip'
17/02/06 01:02:14: [qsheff], SAFE, queue=q-1140130931-28763-95124, recvfrom=87.122.8.203, from=bvxgtfap@osgen.com, to=alptekina@sofrayemek.com.tr, subj=`Fw: Finance for alptekina'
Here it is: three lines from the file.

The table consists of 11 fields in all. I was told that the [qsheff] part will not be used, so I put a rowcount column in its place; it is not auto-increment, I am incrementing it manually. Anyway, I read the whole text and put each line into an array, and then I split each line on its commas. The first element of the resulting array consists of three parts: the date, the time, and [qsheff], which I don't need. So I split that element on spaces and discard the unneeded parts with string functions.

I don't know how helpful this is, but what I wonder is whether there is a problem with how I get the data from the page. I read it with the fgets function, with the length set to 4096 bytes. As far as I understand, that limit applies per line. Would my script run faster if I reduced the size?

Here is the code for that part:

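[PHP]// Path of the log file, as submitted by the form
$sf = @$_POST["selectedFile"];
$fp = fopen($sf, "r") or die("Couldn't open file");
$data = "";

// Read the whole file into one string, at most 4095 bytes per fgets call
while (!feof($fp)) {
    $data .= fgets($fp, 4096);
}

fclose($fp);

// Split on apostrophe + CRLF, which also drops each line's closing quote.
// A line that does not end in ' stays joined to the line after it.
$values = explode("'\r\n", $data);
for ($i = 0; $i < sizeof($values); $i++) {
    // Split the line into its comma-separated fields
    $values2[$i] = explode(",", $values[$i]);
    // Split the first field on spaces and drop the last piece ([qsheff])
    $values3[$i] = explode(" ", $values2[$i][0], -1);
}[/PHP]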
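The same splitting can also be done one line at a time instead of on one big string. Below is a minimal Python sketch of that, assuming every line starts with the date, the time, [qsheff] and a status word, as the three sample lines do. The names HEADER and split_header are made up for illustration.

```python
import re

# Assumed start of every line: "DD/MM/YY HH:MM:SS: [qsheff], STATUS, ..."
HEADER = re.compile(r"^(\S+) (\S+): \[qsheff\], (\w+), (.*)$")


def split_header(line):
    """Return (date, time, status, rest) for one log line, or None if it does not match."""
    match = HEADER.match(line.rstrip("\r\n"))
    return match.groups() if match else None


sample = ("17/02/06 01:02:02: [qsheff], VIRUS, "
          "queue=q-1140130762-200475-94521, recvfrom=62.10.103.114\r\n")
print(split_header(sample))
```

The rest value still holds the name=value pairs, which need a second pass.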
Old Mar 3rd, 2006, 5:49 PM   #16
Arevos
Programming Guru
 
Join Date: Aug 2005
Location: England
Posts: 1,499
Rep Power: 5 Arevos is on a distinguished road
What does your database table look like? Also, is the spam file just going to be entered into the database once, or is it going to be continuously updated, with the updates being added to the database periodically?
Old Mar 3rd, 2006, 5:52 PM   #17
guess
Programmer
 
Join Date: Feb 2006
Posts: 40
Rep Power: 0 guess is on a distinguished road
Arevos, I have just read your message. Actually, this is not the only thing I did for parsing. I really pushed the limits of my brain while building the dynamic query. Now everything is fine except one thing. I thought everything was finished: the queries were returning accurate results (for the part that was inserted before the CGI error, of course) and my JavaScript works fine. However, when I was checking the system by trying each query one by one, I saw that the lines that have ERR instead of VIRUS or ATTACH must be parsed in a different way.
Now I have to find another algorithm for that.

Quote:
17/02/06 02:41:39: [qsheff], ERR, error=ATTACH, hint=open_attachlist,opendir_tempdir, queue=q-1140136897-869388-19577, from=postmaster@ser-gmbh.net, to=5bepson@datapro.com.tr
This is the part I talked about. What I want to show you is that I have spent a lot of time parsing this damn text and I still have more work to do. In short, it is not a text that can be easily parsed. I wish it were a CSV file, so that I could do what you advised.

I hope we can work this problem out. I didn't know this error would keep me away from my work this much.
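One way to treat the ERR lines and the other lines alike is to read the name=value pairs by name rather than by position. This is a minimal Python sketch of that idea, not a finished parser. It assumes the input is the part of the line after the status word, with the line ending already stripped, and that a new field always starts with ", name=". A subject that itself contains such text would be split wrongly. The names FIELD and parse_fields are made up for illustration.

```python
import re

# A field is "name=value"; the value runs until the next ", name=" or the end.
# This keeps the bare comma inside hint=open_attachlist,opendir_tempdir intact.
FIELD = re.compile(r"(\w+)=(.*?)(?=, \w+=|$)")


def parse_fields(rest):
    """Return a dict of the name=value pairs found in one log line."""
    fields = {}
    for name, value in FIELD.findall(rest):
        # Strip the `...' quoting used around the subj, spam and rule values.
        if value.startswith("`") and value.endswith("'"):
            value = value[1:-1]
        fields[name] = value
    return fields


err_fields = parse_fields(
    "error=ATTACH, hint=open_attachlist,opendir_tempdir, "
    "queue=q-1140136897-869388-19577, "
    "from=postmaster@ser-gmbh.net, to=5bepson@datapro.com.tr"
)
print(err_fields["hint"])
```

Because the result is keyed by field name, a line that lacks recvfrom or subj simply has no such key, and no separate algorithm is needed for it.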
Old Mar 3rd, 2006, 5:57 PM   #18
guess
Programmer
 
Join Date: Feb 2006
Posts: 40
Rep Power: 0 guess is on a distinguished road
It is just going to be entered once. I work with only one table; actually, that's how I was told to do it. Here is the definition of the table:

Quote:
create table t1(
    rowcount int(11),
    date1 varchar(50),
    time1 varchar(50),
    stats varchar(20),
    queue varchar(150),
    recvfrom varchar(150),
    fromm varchar(150),
    tto varchar(150),
    subj varchar(300),
    spam varchar(300),
    rule varchar(300)
);
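Here is a minimal Python sketch of how one parsed line could be turned into a row for this table. It assumes the line has already been split into a date, a time, a status and a dict of name=value fields, and it uses the built-in sqlite3 module with untyped columns as a stand-in for the real MySQL table. The names COLUMNS, LOG_TO_COLUMN and to_row are made up for illustration.

```python
import sqlite3

# Column order of t1; "from" and "to" in the log map to fromm and tto.
COLUMNS = ["rowcount", "date1", "time1", "stats", "queue", "recvfrom",
           "fromm", "tto", "subj", "spam", "rule"]
LOG_TO_COLUMN = {"from": "fromm", "to": "tto"}


def to_row(rowcount, date, time, status, fields):
    """Build an 11-value tuple for t1; columns absent from the line become None."""
    record = {"rowcount": rowcount, "date1": date, "time1": time, "stats": status}
    for name, value in fields.items():
        column = LOG_TO_COLUMN.get(name, name)
        # Fields with no matching column (such as error or hint) are dropped.
        if column in COLUMNS and column not in record:
            record[column] = value
    return tuple(record.get(column) for column in COLUMNS)


# In-memory stand-in for the real table (assumption: column types omitted).
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE t1 (%s)" % ", ".join(COLUMNS))

row = to_row(1, "17/02/06", "01:02:02", "VIRUS",
             {"queue": "q-1140130762-200475-94521",
              "recvfrom": "62.10.103.114",
              "from": "a.stoppiello@virgilio.it",
              "to": "epson@datapro.com.tr",
              "subj": "Your day"})
placeholders = ", ".join("?" for _ in COLUMNS)
connection.execute("INSERT INTO t1 VALUES (%s)" % placeholders, row)
connection.commit()
```

Columns that a line does not supply, such as spam and rule here, are stored as NULL.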