SpamAssassin Bayes DB Token limit
  •  
Machete

Messages: 262
Karma: 5
I've read that the Bayes DB only holds so many 'tokens' - and there's supposedly a place to increase this limit, but I haven't found it with Kerio's implementation of Spam Assassin.

Since I routinely have an issue with SPAM making it past the filters, regardless of what I set, I think my next step toward a longer-term solution for my increasing SPAM is modifying how many tokens the DB stores.
See here: http://spamassassin.apache.org/full/3.1.x/doc/sa-learn.html#expiration
  •  
MarkK

Messages: 454
Karma: 46
MAKE A BACKUP COPY FIRST!!!! Also, Kerio probably does not endorse making changes to this file. Also, it will be overwritten at the next Kerio update.

I believe the file you are looking for is the .\plugins\spamserver\spamassassin\site\lib\mail\SpamAssassin\Config.pm
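
For anyone searching that file for the relevant settings, a minimal Python sketch such as the following could list the lines that mention the Bayes expiry options. The option names `bayes_auto_expire` and `bayes_expiry_max_db_size` come from the stock SpamAssassin documentation; whether Kerio's bundled copy uses them in the same way is an assumption, and the `find_bayes_settings` helper is hypothetical.

```python
import re
from pathlib import Path

# Stock SpamAssassin option names; Kerio's bundled copy honoring them is an assumption.
BAYES_OPTIONS = re.compile(r"bayes_(auto_expire|expiry_max_db_size)")

def find_bayes_settings(path):
    """Return (line_number, text) pairs for lines that mention a Bayes expiry option."""
    hits = []
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    for number, line in enumerate(text.splitlines(), start=1):
        if BAYES_OPTIONS.search(line):
            hits.append((number, line.strip()))
    return hits
```

Pass it the full path to the file on your own installation and print the returned pairs. It only reads the file, so it is safe to run before making the backup copy.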

I would suggest creating some of your own custom Spam Assassin scores and rules. I have written a post on this, as well as posted a good starting custom rule set. It has worked wonders for me.

http://forums.kerio.com/mv/msg/27477/0/
  •  
Machete

Thanks Mark - You've helped me in the past with the rules, and I've written and tweaked and it's helped immensely - but as you know, SPAM isn't a set it and forget it process.

I've got it down to where I'm only blocking 40% of SPAM, and the other 60% ends up in the Likely Spam folder - better than the inbox, but finding 1-2 HAM messages a day among 100+ in Likely Spam is aggravating for users. I can't spend 2 hours a day creating/tweaking rules and then restarting the mail service each time for the new rules to take effect. And that's what I've been doing.

And this problem subsides each time I start with a fresh Bayes.db file. In other words, I go from 40% Block & 60% Likely Spam to 80% block with a fresh Bayes DB.
  •  
MarkK

It did take some time to get things tweaked. Unfortunately, from what I can tell, it will never be perfect (100% of spam detected and no legitimate mail mismarked). I do know that I'm at a point where I don't have to do much now. When I do see unmarked spam, I'll look to see if I can tweak a rule, but I don't restart the server right away just for that one change.

In Connect's custom rules, I have added an Allow exception for various email addresses or domains that are valid but get caught by a rule.

I do have coworkers that complain spam is getting caught and put in their spam folder instead of deleted. I just explain that the spam filter is doing its job, allowing them the opportunity to catch a good email that has been mismarked. For us, that is maybe 1 a month.

Not sure what to say on your Bayes issue.
  •  
Machete

After you provided the suggested place to look (which provided some additional description of the variables I was curious about) I started digging.
- Kerio's implementation has auto-expire off
- The Bayes.DB file itself is over 50 MB (and only a year old)
- Using a SQLBrowser, there are over 800,000 tokens in the DB (whatever the upper limit is defined as, the DB is not being held to the 150,000 default limit)

So the Bayes DB being full of tokens doesn't appear to be my issue unless the overall size of the DB is an issue.
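
The token count above can also be checked without a GUI browser. The sketch below assumes bayes.db is a SQLite file (the bayes.db-journal file next to it suggests so, but that is an assumption) and makes no assumption about table names: it simply reports the row count of every table it finds. The `table_row_counts` helper is hypothetical.

```python
import sqlite3
from pathlib import Path

def table_row_counts(db_path):
    """Return {table_name: row_count} for every table in a SQLite file."""
    # Open read-only so the check cannot modify the database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")]
        counts = {}
        for name in names:
            # Quote the identifier so unusual table names still work.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(
                f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()
```

Running it against a copy of the file, rather than the live one, avoids any contention with the mail server.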

I'm curious how much disk space other users see their bayes.db consuming. All of my previous bayes.db files were in the 10MB-27MB range.

I've followed your other posts on custom SA rules, so I'll keep plugging along there until someone who knows more about expiring tokens, etc. and the bayes.db provides more details.
  •  
MarkK

Mine: 7 months old
bayes.db 15,236KB
bayes.db-journal 1,954KB
  •  
Machete

Thanks Mark - My journal size is the same as yours, and I recognize that volume of SPAM, number of users, etc. will affect the size of the bayes.db as much as time does.

I really appreciate you chiming in and providing some help and assistance.
  •  
McIrish

Messages: 236
Karma: 8
Sorry to raise this thread from the dead but I have a question about bayes.db. Mine is now at 2.7GB which seems like it might be excessive. I'd hate to have to start over again with the learning process. What's recommended?
  •  
MarkK

Wow, 2.7GB seems huge. I'm at 13 months old and only 17.5MB for the database, which is the biggest it has ever been for my installation. As mentioned before, I'm guessing the volume of spam and the number of users probably play a role in the size of this. I really don't know anything directly about the Bayes filtering.

Do you think this is causing an issue? If so, what you can try is to stop Connect, copy the existing files into a holding folder, such as .\MailServer\store\spamassassin\bayes\20150619, and then delete the files in the .\bayes folder. When you start Connect again, it will create fresh files. Then if you are seeing adverse effects from deleting the Bayes databases, you can stop Connect again and put back the old files.
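
The holding-folder step could be scripted roughly as follows. This is only a sketch: it moves the files instead of copying and then deleting them (same end state), it does not stop or start Connect for you, and the `archive_bayes_files` helper and its `stamp` parameter are hypothetical names.

```python
import shutil
from datetime import date
from pathlib import Path

def archive_bayes_files(bayes_dir, stamp=None):
    """Move every file in bayes_dir into a dated subfolder and return that folder."""
    bayes_dir = Path(bayes_dir)
    stamp = stamp or date.today().strftime("%Y%m%d")
    holding = bayes_dir / stamp
    holding.mkdir()  # raises FileExistsError rather than mixing with an earlier archive
    for item in list(bayes_dir.iterdir()):
        # Only plain files are moved; the dated folders themselves are skipped.
        if item.is_file():
            shutil.move(str(item), str(holding / item.name))
    return holding
```

Run it only while Connect is stopped. To roll back, stop Connect again and move the files from the dated folder back into the bayes folder.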
  •  
ksnyder

Messages: 557
Karma: 36
To add to this: once wiped, the Bayes DB begins learning again after around 200 spam emails. This shouldn't take long at all in most cases and is well worth the minor short-term inconvenience.

Ken Snyder
  •  
McIrish

Ken,
Is the size of my database a problem? I'd hate to start over again only to find that it was never part of the problem.
  •  
ksnyder

I don't know that the size, per se, is what really matters. A combination of the age of data in the Bayes filter and the sophistication of spammers can spoil the database. See http://kb.kerio.com/product/kerio-connect/server-configuration/antispam/optimizing-spam-protection-in-kerio-connect-265.html and the section, "Managing SpamAssassin Bayes". In there you'll see a recommendation to check the Bayes score of detected, undetected, and legitimate mails to determine if you need to reset it.

At the end of the day, if the Bayes filter hasn't been reset in over a year, there's a good chance you'll benefit from a reset. The current size of your file is potentially slowing the system down and confusing the filter.
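
One way to do that check in bulk is to pull the Bayes rule name out of each message's headers. The sketch below assumes the messages carry an `X-Spam-Status` header in the usual SpamAssassin style, with the matched tests listed by name (for example `BAYES_99`); whether Kerio Connect writes the header in exactly that form is an assumption, and `bayes_rules` is a hypothetical helper.

```python
import re
from email import message_from_string

# Stock SpamAssassin Bayes rules are named BAYES_00 through BAYES_99.
BAYES_RULE = re.compile(r"\bBAYES_\d+\b")

def bayes_rules(raw_message):
    """Return the BAYES_NN rule names found in the X-Spam-Status header(s)."""
    msg = message_from_string(raw_message)
    found = []
    for value in msg.get_all("X-Spam-Status", []):
        found.extend(BAYES_RULE.findall(value))
    return found
```

If missed spam keeps coming back with low-numbered rules, or legitimate mail with high-numbered ones, that points toward a spoiled database rather than a size problem.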

Ken Snyder
  •  
McIrish

I stopped the Kerio service and then deleted the bayes.db (after making a backup) and tried to restart kerio. It wouldn't start. Event viewer shows that it couldn't start the spam filter. I'm not sure what I might have done wrong. Got any ideas?
  •  
MarkK

What about the bayes.db-journal file?
Typically, what I do is create a dated folder (.\bayes\20150731), move the 3 files into it so that the .\bayes\ folder is empty, and restart the server.

I'm wondering whether, if you didn't delete the journal file, the server is finding it and then looking for the missing db file. Just a guess.
  •  
McIrish

Hi Mark,
I didn't delete the autowhitelist. My next attempt, I stopped the service and then just renamed the bayes directory. That time it worked. Man... I was freaking out when it wouldn't start up. whew!

[Updated on: Fri, 31 July 2015 19:24]
