Kerio Connect Queue SLOW (The KC server slows to a crawl and messages build up in the queue, people cannot connect)
  •  
support@KNOCKinc.com

Hello everyone!

My Kerio Connect server has serious, intermittent performance issues. We run Mac OS X Server v10.6.8 on an Intel Xserve (Early 2009) 2.26GHz Quad Core Xeon with an internal 120GB SSD, but the data store lives on a SmartStor RAID 5 array with four 7200RPM HDs, and the backups and archives live on a 2TB LaCie Quad. Both are connected via eSATA, which is a recent change we made to try to solve our problems.

The Kerio Connect server does internal authentication - no LDAP, OD or AD - and the only other process running on its host machine is CrashPlan Pro Server by Code42, which we have turned off, only to see the performance issues continue. Spotlight is off on the data and archive/backup volumes.

We moved off of our internal RAID (mirror) to the external RAID 5 array a week or two back. The problem went away for a bit, but has since returned. We also identified INBOXes and Sent Items folders with over 10,000 items in them and have begun separating the emails into smaller subfolders. We are almost done with this, and the biggest offenders have been dealt with.
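For anyone who wants to hunt down oversized folders the same way, here is a minimal sketch. It assumes each message is stored as a separate file somewhere under the mail store directory, so a simple per-directory file count approximates the item count. The function name `find_large_folders` and the 10,000 threshold are illustrative only, and the store path has to be supplied by you.

```python
import os
import sys


def find_large_folders(root, threshold=10000):
    """Return (path, file_count) pairs for directories under `root`
    that directly contain more than `threshold` files, largest first."""
    large = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if len(filenames) > threshold:
            large.append((dirpath, len(filenames)))
    large.sort(key=lambda item: item[1], reverse=True)
    return large


if __name__ == "__main__":
    # The store location is an assumption; pass your own path as the first argument.
    store_root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, count in find_large_folders(store_root):
        print(f"{count:>8}  {path}")
```

Walking a large store reads a lot of directory metadata, so it is probably best run outside the busy morning hours.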

What happens - and it usually happens in the mornings, then cools off at some point in the afternoon - is that the connections all drop (you can see it in the graphs: big gaps, as though the mail server shut down) and messages pile up in queue processing (20, 30 items), and the queue just builds and builds. It reaches as many as 300 emails and sometimes hovers around 150, 100 or 75. People get emails slowly and iCal users cannot update their calendars.

I've called and emailed support and I rarely get help beyond, "Well your first problem is (HFS | Mac OS X's kernel). It's just not designed to handle lots of little files."

I get the feeling that, for servers that maintain 100 or so accounts plus, that do a certain number of emails per hour, Kerio wants to gently, but firmly, help get us on Linux or Windows for better performance. But I need to rule out a lot before we'll spend 5 to 10K on new hardware and a whole new operating system.

Has anyone seen anything like this and solved it? Has anyone else moved to Linux or Windows due to performance lags like this? Can you offer any insight while I wait for more opinions from Kerio's support team?

[Attachment: HTTP.png]
  •  
GlennK

I feel your pain. I just finished clearing up a situation like this on an Xserve that was having an even worse backup problem. At least I hope it's cleared up. The data store was moved to a RAID 10 eSATA unit and it seemed to make all the difference. I ran disk speed tests beforehand to make sure it would be an improvement. Surprisingly, it is not a hardware RAID but relies on Apple DU RAID, yet it is very fast and uses four 7200RPM enterprise drives. DU RAID was selected for the highest compatibility in case of hardware failure. It turned out in tests to be faster than an eSATA RAID card we were going to use.
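A disk speed test that resembles mail-store traffic could look something like the sketch below. It is not the test that was actually run here, just an illustration: it writes many small files, forces each one to disk, and reports files per second. The function name, the file count and the 4 KB size are all assumptions chosen for the example.

```python
import os
import tempfile
import time


def small_file_write_rate(directory, count=500, size=4096):
    """Write `count` files of `size` bytes into a scratch folder inside
    `directory`, fsync each one, and return the rate in files per second."""
    payload = os.urandom(size)
    # The scratch folder and its files are removed when the block exits.
    with tempfile.TemporaryDirectory(dir=directory) as workdir:
        start = time.perf_counter()
        for i in range(count):
            path = os.path.join(workdir, f"msg{i}.eml")
            with open(path, "wb") as f:
                f.write(payload)
                f.flush()
                # Ask the OS to push the data out rather than leave it in cache.
                os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    return count / elapsed


if __name__ == "__main__":
    # Point this at a folder on the volume you want to measure.
    rate = small_file_write_rate(".")
    print(f"{rate:.0f} small files per second")
```

Run it against a folder on each candidate volume and compare the numbers. Note that on Mac OS X a plain fsync may not flush the drive's own write cache, so treat the results as relative rather than absolute.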

Another thing I believe made a difference was making sure that SpamAssassin was not checking messages from trusted networks. Previously it was, and I have a feeling that was slowing down user-to-user communication and contributing to queue backup for no good reason.

I have a sneaking suspicion the old drives were/are badly fragmented after several years of service, so fresh drives probably made a difference too, although I know that's not the case with you because you say the problem still exists. RAID 5 is known to be much slower than RAID 10 however, so if possible I would change that. It has been mentioned over and over in these forums that 10 is the preferred config.

All the clone operations on this server ended up to be at least 3 times as fast as before and no incoming messages stay in the queue longer than 15 seconds since the change, and it has been a few weeks. There are over 200 users on the Xserve.

IMAP users are going to make a huge difference, especially if they are endlessly deleting messages. Watch the operations log for clues.

What version of Connect are you running?

Also, definitely, positively make sure CrashPlan does not run during your work day if you are having trouble. Most of the time it will be fine, but occasionally it will need to maintain the archive and it seems to hit the disk hard.

What is your bandwidth at the server?

I've known some people that have moved to Linux but that seems like a drastic move esp for 100 users. Use what you know. Kerio is designed to run on Mac.

[Updated on: Sat, 11 February 2012 03:20]

  •  
support@KNOCKinc.com

Glenn! Thank goodness there's someone out there who is seeing something similar to us!

We are using 7.3.2 build 6388, and before that 7.3.1.

My SmartStor does not support RAID 10, but my LaCie RAID does... perhaps I can swap them.

Do you know how you can determine if SpamAssassin is scrutinizing emails from my trusted networks? All the other tabs let me exclude my local network, but I did notice that the SPF tab had an option for excluding my local network that I hadn't turned on. So I just changed that.

My CrashPlan backups are pulling from a drive other than my data drive at night. But the CrashPlan Pro server runs constantly and some things back up to it all day. They all do, however, go to a Drobo that isn't involved in Kerio in any way whatsoever. And, as I said before, when I experience this disastrous slowdown, I can turn off the server and client on the host Xserve and it makes no difference at all.

My Xserve has 2 aggregated gigabit Ethernet ports that are online. I am getting some odd readings from Server Admin on the network throughput. It flatlines at less than 200 KB/s, and the processor is flatlined at what appears to be 100%, but that can't be right. If I run top, it shows mailserver and other processes running in aggregate at well over 200%, and at times 300%. I am inclined to think of that as a reporting issue with whatever manages CPU and network tracking. But I suppose it could be a harbinger of my misery.

Thanks again for your response. And in advance for your reply.

Matthew
  •  
mwd

RAID 5 is really the worst type of setup you can use. Writing small files to disk with RAID 5 will always be slow.

Kerio is light on CPU & Ram, you just need to ensure that you have fast drives and a good raid setup. So Raid 10, 4xSAS 10k drives + Raid Cache is the best idea.
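To put rough numbers on why RAID 5 hurts small writes: the usual rule of thumb is that each small random write costs four disk operations on RAID 5 (read data, read parity, write data, write parity) but only two on RAID 10 (one write per mirror side). The sketch below applies that rule. The figure of 80 IOPS per 7200RPM disk is an assumed ballpark, not a measurement from anyone's hardware in this thread, and controller caches can change the picture considerably.

```python
# Rule-of-thumb write penalties: disk operations needed per small random write.
WRITE_PENALTY = {"raid5": 4, "raid10": 2}


def random_write_iops(level, disks, iops_per_disk):
    """Estimate the random-write IOPS of an array from the raw IOPS of
    its disks divided by the write penalty of the RAID level."""
    return disks * iops_per_disk / WRITE_PENALTY[level]


if __name__ == "__main__":
    # Assumed ballpark for a single 7200RPM drive; substitute your own figure.
    per_disk = 80
    for level in ("raid5", "raid10"):
        estimate = random_write_iops(level, disks=4, iops_per_disk=per_disk)
        print(f"{level}: about {estimate:.0f} random write IOPS")
```

Under these assumptions, the same four disks deliver about twice the small-write throughput in RAID 10 as in RAID 5.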
  •  
GlennK

Re SpamAssassin, the wording isn't clear, but go to Spam Filter > Spam Rating. You want "Enable spam rating on" but "Enable rating of messages from trustworthy networks" off. I say the wording isn't clear because, although it doesn't say so specifically, this setting is part of SpamAssassin, which is confusing since there is a separate tab for it. I am pretty sure the rating system is also SA, but regardless, turn that second box off if that makes sense to you. Why check messages from user to user on your own network?

I'm not sure about your numbers on the network throughput. I guess I was asking what speed your connection is and whether you have sufficient bandwidth in and out. 200KB in and out is low but use will peak way higher.

If the 200 and 300% cpu is accurate that is high. Maybe just open Activity Monitor and watch it throughout the day.

As far as Crashplan, I would be more comfortable if Crashplan Pro server was running on a different box. When you say you turn it off, do you actually kill the CPP service? Because if not, it could be performing maintenance, syncing with clients, etc. Even though it is a different drive, CPU use could be high working that data. I'm in general agreement with others that Kerio is more drive intensive than cpu intensive, however, I've noticed it does have some fairly large cpu spikes.

Are your blacklists doing a good job? Are you blocking spam? Any chance you are being hit hard there?
  •  
support@KNOCKinc.com

Glenn,

Thanks a million for your advice. I have a RAID chassis here that supports RAID 10, but the one we were using for mail supports only RAID 5 (boo!). The one that supports RAID 10 was running RAID 5 for our network homes. So I did some swapping over the weekend and presto, the chassis that supports RAID 10 is now set up with RAID 10 and is hosting the mail server data! The other one is currently receiving the Network Home data. I love rsync, BTW.

As for CPP, yes, the commands I use are to disable the daemon.

sudo launchctl unload /Library/LaunchDaemons/com.crashplan.XXX.plist

When I disable both the client daemon and the server daemon, no change in performance occurs. I would be concerned that CPP clients throughout the day might be saturating network bandwidth as they try to back up to the server (though I only have three backing up during the day), but Code42's Java-based CPP seems to use only one CPU core, so the backups trickle (due to the processing bottleneck). I don't think it gets anywhere near its maximum throughput - and that's also coming from observing the statistics in Activity Monitor. It should be able to do over 200 MB/s with aggregate gigabit, but it barely ever goes over 25 MB/s.

I also implemented your suggestions regarding spam. I have more Internet spam blacklists activated (just had SpamHaus before, now I have all but WPBL selected), which may help stop the server from having to analyze so many nefarious messages that get into the queue with its Bayesian models.

Anyway, with RAID 10 set up on the other RAID chassis and all my data transferred (as well as the aforementioned spam changes), I am not experiencing the pre-noon purgatory of my last couple of work days. Smooth sailing so far. I will keep my eye on it all this week and see if this helped.

Thanks, Glenn (and mwd) for your advice!

Matthew
  •  
sfpete


We've seen some of the same symptoms on some of our KC servers in operation... specifically those with higher user counts/traffic.

Are you running any Outlook 2011 Mac clients? Using calendar delegates can have similarly disastrous effects on server performance in the current Kerio Connect versions.

Are you running any/many attachment filters? This seemed to drag performance down as well.

  •  
support@KNOCKinc.com

sfpete,

Thanks for replying. We do have Outlook 2011, but its built-in calendar is such a travesty that we only use it as a mail client - no Exchange-style connections here. We also use LDAP for contacts, but all calendaring is done via iCal.

In iCal, we do have quite a few delegates being viewed by quite a few people. Do you see performance lag in iCal, too?

We have mail attachment filters off entirely.

The frustrating thing about this is that, when we are hit by it, things that should happen quickly end up taking a lot longer, and you can't tell whether that is a symptom or a cause. We're still okay right now, but I won't claim victory until we get to Friday without any issues.

Thanks!
  •  
GlennK

Glad to hear things are better. Let us know how it goes. Mondays seem to be the worst everywhere I've seen. People are catching up with email, trashing junk from the weekend, cleaning inboxes and getting back to work after slacking off on Friday. If you make it through a few Mondays, you'll probably be okay.

Check all your logs frequently for errors and see what you see!
  •  
support@KNOCKinc.com

Just a follow-up... this performance problem was solidly eliminated by moving off of OS X Server and on to Ubuntu 12.04 (not server). For many months we didn't have to think about performance. Now, our performance issues have returned. But we still think that OS X Server is not an ideal platform for Kerio Connect.