Restrict Bayes training to specific users?
  •  
aj08

Messages: 57
Karma: 1
I've noticed that lately SpamAssassin has begun to tag quite a few legitimate messages as spam, largely because they get a BAYES_99 score. I believe this is caused by multiple users mistraining spam/not spam, either because they are lazy and don't want to unsubscribe from certain mailings, or because they use third-party applications which move messages to the Junk folder, where they are then learned by the Bayes system.

I know that I'm going to have to reset the Bayes system since it's pretty messed up by now, but what I'd like to know is how I can prevent this from happening all over again. I'm used to being able to control what messages are added to the Bayes database by training it from the command line. On the mail systems where I have this control, the Bayesian filter is extremely accurate. Is it possible to disable automatic learning in Kerio for specific users or for all users, and to require manual learning from messages that I choose?

Basically I just can't trust users to properly classify messages and I would like to take that ability away from them and handle it myself. Any ideas?
  •  
TorW

Messages: 769
Karma: 9
You can use the bayes_ignore_to setting in local.cf to turn off Bayesian scoring for certain recipients, but the mails will still score (too) high for other users. You could also use whitelist_from_rcvd in local.cf, but that's hard to maintain.

I would simply nuke the bayes database and try to train the users in how they should classify spam. It'll pay off in the long run, and everybody will get their ham and spam separated better.

Stuff like this is hard to solve with technology.

[Updated on: Fri, 05 December 2008 21:29]

  •  
sgongola

Messages: 109
Karma: 0
We use POP for the "normal" users so their junking an email does not affect SpamAssassin. Some more knowledgeable people have IMAP access to get email. They have a better understanding of what should be junked into SpamAssassin and what shouldn't. Since spam is usually not recipient-specific, the IMAP users likely see the same spam as others and their SpamAssassin training works for everybody.

Of course, it affects how you want your email handled, whether you want to keep it all on the server or not, but it seems to work for us.
  •  
aj08

Messages: 57
Karma: 1
I'm familiar with training SpamAssassin using the sa-learn command so I'm curious how Kerio's SpamAssassin learns from Ham. Will it only learn Ham if it's tagged incorrectly as spam first and then marked as not spam? Will it learn Ham if a message just has a score below a certain value? What I really want is a way to train Ham without having to mark it as spam first. There are many very high scoring bayes messages that are just squeaking under the spam tagging score and I want to lower the bayes scores by just marking them as Ham.

Does Kerio only learn Ham if a message has been marked as spam first or does it have another way for learning Ham? I've never trusted automatic training of bayes on my other servers and so far I'm not too pleased with how it's working on the Kerio server. I'm afraid that Kerio just can't give me what I want which is a method of controlling exactly what does and does not get added to the bayes training.

[Updated on: Sat, 10 January 2009 23:22]

  •  
TorW

Messages: 769
Karma: 9
I'm curious too, since there are few or no clues, either in the logs or in the documentation, as to just how KMS learns spam and ham. The only thing I've found is some log snippets on what happens when the user clicks the spam button in KOC.

In general, this is my beef with the anti spam system in KMS (on Red Hat):

1. Impossible to learn spam/ham from a central location (this is best practice almost everywhere) since there is no (?) way to start a manual learning process. Spam- and hamtraps (Bayes db seeders) thus aren't possible.

2. The quarantine is hard to use for anything since mails are forwarded there. Using the Spam/Not spam buttons on them is also probably a bad idea (particularly since it involves hours of manual labour every day). Headers are lost, and the users are confused if you send someone a false positive from the quarantine.

3. No separate spamd daemon. The keriomailserver daemon must be restarted if you tweak the SpamAssassin settings (yeah, I know we aren't supposed to), bringing the entire mail system to a standstill for half a minute.

4. No indication of SpamAssassin autolearn in the logs.

5. DCC and Razor available only through extremely convoluted and probably unsupported means.

6. Scoring on the built-in SpamAssassin rules is too conservative compared to the rule set which accompanies a vanilla SA download.

Overall, the flexibility and adaptability of the original SpamAssassin system is modified too much and thereby in my opinion lost, leaving us unable to adapt and react to our (perceived or not) spam reality. Whenever we discover new types of spam slipping through the filters, all we can do is make simple SA custom rules (see #3) or hope that lots of users mark them as spam.

As for the latter, I'm not about to tell our customers that the spam in their inbox is their own fault ;)

The above is also being sent to support as feedback.
  •  
RHarmsen.nl

Messages: 189

Karma: 0
TorW wrote on Mon, 02 February 2009 13:21


The above is also being sent to support as feedback.


Let's hope Kerio will at least add this to the feature request queue.
  •  
freakinvibe

Messages: 1542
Karma: 62
Quote:

4. No indication of SpamAssassin autolearn in the logs.


If you tick "Spam Assassin Processing" in the debug log, you will get log entries like

[03/Feb/2009 17:34:22][3620] {spamassassin} dbg: learn: auto-learn: message score: 4.319, computed score for autolearn: 2.153
[03/Feb/2009 17:34:22][3620] {spamassassin} dbg: learn: auto-learn? ham=0.1, spam=12, body-points=0.001, head-points=2.152, learned-points=1.567
[03/Feb/2009 17:34:22][3620] {spamassassin} dbg: learn: auto-learn? no: inside auto-learn thresholds, not considered ham or spam
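
For anyone who wants to pull the auto-learn outcome out of this output, here is a minimal Python sketch. It assumes only the three line shapes shown above. The function name parse_autolearn is made up for illustration, and accepting a "yes" verdict is an assumption, since only a "no" line appears in this thread.

```python
import re

# First line: total message score and the score used for auto-learning.
SCORE_RE = re.compile(
    r"auto-learn: message score: (?P<score>-?[\d.]+), "
    r"computed score for autolearn: (?P<computed>-?[\d.]+)"
)
# Last line: the verdict. Only "no: <reason>" is confirmed by the log above;
# matching "yes" as well is an assumption.
VERDICT_RE = re.compile(
    r"auto-learn\? (?P<verdict>yes|no)\b[:,]?\s*(?P<detail>.*)$"
)


def parse_autolearn(lines):
    """Collect scores and verdict from SpamAssassin auto-learn debug lines."""
    info = {}
    for line in lines:
        line = line.rstrip()
        m = SCORE_RE.search(line)
        if m:
            info["message_score"] = float(m["score"])
            info["autolearn_score"] = float(m["computed"])
            continue
        m = VERDICT_RE.search(line)
        if m:
            info["learned"] = m["verdict"] == "yes"
            info["detail"] = m["detail"]
    return info


sample = [
    "[03/Feb/2009 17:34:22][3620] {spamassassin} dbg: learn: auto-learn: "
    "message score: 4.319, computed score for autolearn: 2.153",
    "[03/Feb/2009 17:34:22][3620] {spamassassin} dbg: learn: auto-learn? "
    "ham=0.1, spam=12, body-points=0.001, head-points=2.152, learned-points=1.567",
    "[03/Feb/2009 17:34:22][3620] {spamassassin} dbg: learn: auto-learn? "
    "no: inside auto-learn thresholds, not considered ham or spam",
]
print(parse_autolearn(sample))
```

The middle line (the thresholds and point counts) is deliberately skipped here; it does not carry a verdict.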

Quote:

6. Scoring on the built-in SpamAssassin rules is too conservative compared to the rule set which accompanies a vanilla SA download.

You can change or replace 50_scores.cf, no problem

Dexion AG - The Blackberry Specialists in Switzerland
https://dexionag.ch
  •  
TorW

Messages: 769
Karma: 9
freakinvibe wrote on Tue, 03 February 2009 17:43

If you tick "Spam Assassin Processing" in the debug log, you will get log entries like


Huh? All I get is this on 6.6.1.

[03/Feb/2009 22:43:04][31037] {spam} SpamAssassin result string for message file /opt/kerio/mailserver/store/queue/28/4988ba67-00004dc5.eml, time 0.17s: No, -1.388,5,AWL: 0.276,BAYES_00: -1.665,HTML_MESSAGE: 0.001
[03/Feb/2009 22:43:04][31037] {spam} Spam Filter: SpamAssassin check finished, adding score -1.39
[03/Feb/2009 22:43:04][31037] {spam} Spam Filter: Custom spam rules check finished, adding score 0.00
[03/Feb/2009 22:43:04][31037] {spam} Spam Filter: Message 4988ba67-00004dc5 from <sender<_a.t_>example.com> to <rcpt<_a.t_>example.cm> got 0.00 hits, total spam score is -1.388
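
The result string in the first of these lines can be split into its parts with a few lines of Python. This is only a sketch: reading the fields as verdict, total score, spam threshold, and then rule/score pairs is an interpretation of this one line, and a "Yes" verdict for spam is assumed because only "No" is shown. The name parse_result_line is made up for illustration.

```python
import re

RESULT_RE = re.compile(
    r"SpamAssassin result string for message file (?P<path>\S+), "
    r"time (?P<secs>[\d.]+)s: (?P<result>.*)$"
)


def parse_result_line(line):
    """Split a 'SpamAssassin result string' log line into its fields."""
    m = RESULT_RE.search(line.rstrip())
    if m is None:
        return None
    fields = [f.strip() for f in m["result"].split(",")]
    if len(fields) < 3:
        return None
    verdict, score, threshold, *hits = fields
    rules = {}
    for hit in hits:
        name, _, value = hit.partition(":")
        if value.strip():
            rules[name.strip()] = float(value)
    return {
        "path": m["path"],
        "is_spam": verdict == "Yes",   # assumption: spam is reported as "Yes"
        "score": float(score),
        "threshold": float(threshold),  # assumption: third field is the tag level
        "rules": rules,
    }


sample = (
    "[03/Feb/2009 22:43:04][31037] {spam} SpamAssassin result string for "
    "message file /opt/kerio/mailserver/store/queue/28/4988ba67-00004dc5.eml, "
    "time 0.17s: No, -1.388,5,AWL: 0.276,BAYES_00: -1.665,HTML_MESSAGE: 0.001"
)
parsed = parse_result_line(sample)
print(parsed["score"], parsed["rules"])
```

In this sample the three rule scores add up to the total of -1.388, which is what suggests the pairs are the individual rule hits.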

And why is it in the debug log anyway? This isn't debug information, it's operational info.

Quote:

You can change or replace 50_scores.cf, no problem


... until KMS is upgraded and 50_scores.cf is replaced with new scores again plus perhaps brand new meta and normal rules. Can you spell maintenance nightmare?


I should maybe point out that we (where I work, that is) run several mail servers of various makes (qmail, Exim, Exchange, KMS), which use SpamAssassin. The ones besides KMS use a standardized SpamAssassin 3.25 setup, but on KMS we cannot employ years of SA experience because we don't know exactly how it works and how it interacts with the rest of the system. It's not documented anywhere on kerio.com, and the official SpamAssassin documentation is next to useless in this case.

I was also told today that DCC and Razor are illegal to use on KMS because of GPL licensing issues, which is a somewhat odd statement. Maybe support misunderstood when talking to the tech guys. Who knows.

I am not mad or angry or anything, just a tad frustrated that we are left with what to me seems like a half-baked, undocumented and too-customised SA setup. Spam is a hi-speed moving target and demands more of the spam filter than Kerio is offering at the moment. Plus, 7 years of SpamAssassin experience is now almost null and void.
  •  
Pavel Dobry (Kerio)

Messages: 5245
Karma: 251
TorW wrote on Tue, 03 February 2009 23:25

I was also told today that DCC and Razor are illegal to use on KMS because of GPL licensing issues, which is a somewhat odd statement. Maybe support misunderstood when talking to the tech guys. Who knows.


Yes, this is quite incorrect. You can use DCC and Razor on your servers as long as you are in compliance with the respective licenses. Unfortunately, it is (at least it was with SA 3.1.1, used in the current KMS) hard to fulfill these licenses in all KMS installations, so we couldn't offer this as an integral part of the KMS anti-spam solution.
You are free to download and install any additional SA module. KMS uses the standard directory hierarchy for Perl modules in mailserver/plugins/spamassassin.
Quote:


I am not mad or angry or anything, just a tad frustrated that we are left with what to me seems like a half-baked, undocumented and too-customised SA setup. Spam is a hi-speed moving target and demands more of the spam filter than Kerio is offering at the moment. Plus, 7 years of SpamAssassin experience is now almost null and void.


We are using the standard SpamAssassin package. The only difference is that SA is started directly from C++ code via the Perl interpreter and is not running as a spamd daemon. In fact, there are only two Kerio patches: a workaround for deadlocks in the DBM used for Bayes, and an increased score for Bayes tests (which means our scoring is not more conservative than the default SA scoresets).
  •  
Pavel Dobry (Kerio)

Messages: 5245
Karma: 251
TorW wrote on Tue, 03 February 2009 23:25

Huh? All I get is this on 6.6.1.



Enable "Spam Assassin Processing" in the debug log. The output you posted is from "Spam Filter" debugging.
  •  
freakinvibe

Messages: 1542
Karma: 62
Quote:

Huh? All I get is this on 6.6.1.

As pdobry already wrote, use the correct option. It is "Spam Assassin Processing" and NOT "Spam Filter".

Quote:

And why is it in the debug log anyway? This isn't debug information, it's operational info.

That seems to be a matter of preference. I personally consider it debugging information, because in my day-to-day business as an e-mail admin I would never look at it.

Quote:

Can you spell maintenance nightmare?

That's a matter of how you organize yourself. I have a couple of custom rules, DCC and other stuff. I just copy that over after an upgrade. And I only upgrade once a year, so I keep the admin work low.
  •  
TorW

Messages: 769
Karma: 9
freakinvibe wrote on Wed, 04 February 2009 09:25

Quote:

Huh? All I get is this on 6.6.1.

As pdobry already wrote, use the correct option. It is "Spam Assassin Processing" and NOT "Spam Filter".


Crikes. It logs about 100 lines of verbose debugging instead of just tagging "autolearn=spam|ham|unavailable" at the end of the SA result string.

But as you said: it's a matter of preference, and logs are important to us. Without them we cannot know what's going on. On the other hand, logging too much tends to drown out what we're looking for. Guess I'm just too used to getting exactly what I want from FOSS systems ...
  •  
jshaw541

Messages: 471
Karma: 0
TorW wrote on Thu, 05 February 2009 01:22


Crikes. It logs about 100 lines of verbose debugging instead of just tagging "autolearn=spam|ham|unavailable" at the end of the SA result string.



Which is why it's under debug.log and not "operational info" :)

Kerio MailServer 6.7.1 w/AD
Windows Server 2003 SP 1
Dell PowerEdge 2850 (Dual Xeon 3.2ghz and 2 GB RAM)
~1300 users
~1000+ concurrent IMAPS connections
iPhone users
Outlook 2007 KOFF users
Apple iCal 10.5/10.6 users
  •  
TorW

Messages: 769
Karma: 9
Yes, but knowing whether a mail was added to the Bayes db or not is not debugging information. Just because a mail is considered ham doesn't mean it's automatically learned as ham in the Bayes scoring system. Thus, we also need to feed the Bayes db with non-autolearned ham to keep the spam database in balance. A Bayes db with a skewed amount of ham vs. spam is practically useless.
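
To make the gap between "considered ham" and "learned as ham" concrete, here is a rough Python sketch of the auto-learn window, using the ham=0.1 and spam=12 values from the debug log quoted earlier in this thread. The 3-point minimum from both header and body before learning as spam is how I understand stock SpamAssassin to behave; the exact boundary comparisons, and whether Kerio's build does anything differently, are assumptions. The real logic also looks at learned-points, which this sketch ignores. The function name autolearn_decision is made up.

```python
def autolearn_decision(autolearn_score, head_points=0.0, body_points=0.0,
                       ham_threshold=0.1, spam_threshold=12.0,
                       min_points=3.0):
    """Return "ham", "spam", or None (not learned) for one message.

    Simplified: thresholds taken from the debug log in this thread,
    boundary handling and min_points are assumptions.
    """
    if autolearn_score < ham_threshold:
        return "ham"
    if autolearn_score >= spam_threshold:
        # Assumed extra requirement before a message is learned as spam.
        if head_points >= min_points and body_points >= min_points:
            return "spam"
        return None
    # Inside the window: neither ham nor spam is learned.
    return None


# The message from the debug log: total score 4.319, autolearn score 2.153.
print(autolearn_decision(2.153, head_points=2.152, body_points=0.001))  # None
```

With these values a message is only learned as ham when its autolearn score is below 0.1, so plenty of mail that is delivered as ham never reaches the Bayes db.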

On a very busy system, it's cheaper to traverse the logs to decide if a mail should be manually classified as ham than to just dump the whole inbox into the Bayes db. Take a look at the CPU load when learning happens, then take a look at it when you process a large text file.

Bayesian filtering is about probabilities and statistics, not binary decisions.

Anyway, that's the last thing from me on this particular issue.