Serious Kerio Performance Degradation During RAID Rebuild
  •  
bluehat

Messages: 7
Karma: 0
Over the last 24 hrs we have experienced an utter meltdown of Kerio. We run a 120-user server, and, well, it just went south in a big way.
What happened is interesting:

1. One of the disks in our RAID 1+0 array went bad over the course of several days.
2. Once it went totally bad, we replaced it, and the Promise RAID controller initiated the rebuild.
3. The disks are 1 TB and the rebuild took approx. 10 hrs.
4. During the rebuild, performance of the RAID array was degraded (approx. 50%).
5. During the rebuild, Kerio became unusable and unresponsive: index file rebuilds took too long or failed, and users would see email disappear from their IMAP clients.
6. Webmail was utterly unresponsive, especially for users with large INBOXes.
7. The error log was full of messages about folders being locked after 20 retries.
8. Recovery of Kerio after the rebuild is still not 100%.

At minimum:

a) We need better diagnostics on what is really going on. Slow disk performance should not impact the correct behavior of Kerio. It's acceptable for it to be slower, but not utterly unusable.

b) For God's sake, give us a way to rebuild the index.fld file without having to restart Kerio. The auto rebuild is not workable when the above conditions exist.

c) What the hell does this mean:

[27/May/2010 17:00:08] ASyncKeyDatabase.cpp: ActiveSyncKeyDatabase::StoreFolderInfo: FolderName is empty
[27/May/2010 17:00:08] ASyncAirSync.cpp: ActiveSyncAirSync::ProcessCollection: Unable to open folder with CollectionId: 2B6242CD-F26E-4B01-BB09-7D757DF8165E

Mostly my point is that I have no clue what to do to "fix" such an error, or where to look. This is unacceptable.
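
Until better diagnostics exist, one low-tech way to see where errors cluster is to tally the error log by source file and function. The sketch below is only that: the regular expression is inferred from the two sample lines quoted above (treating the CollectionId as part of the second line), so other Kerio log lines may not match it, and the names `parse_line` and `summarize` are made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Format inferred from the two sample lines only (an assumption):
# [DD/Mon/YYYY HH:MM:SS] SourceFile: Class::Method: message
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{2}/\w{3}/\d{4} \d{2}:\d{2}:\d{2})\] "
    r"(?P<source>[^:]+): (?P<func>\w+(?:::\w+)*): (?P<msg>.*)$"
)


def parse_line(line):
    """Return (timestamp, source, function, message), or None if no match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    # %b expects English month abbreviations under the default C locale.
    ts = datetime.strptime(m.group("ts"), "%d/%b/%Y %H:%M:%S")
    return ts, m.group("source"), m.group("func"), m.group("msg")


def summarize(lines):
    """Count matching log lines per (source file, function) pair."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            _, source, func, _ = parsed
            counts[(source, func)] += 1
    return counts


sample = [
    "[27/May/2010 17:00:08] ASyncKeyDatabase.cpp: "
    "ActiveSyncKeyDatabase::StoreFolderInfo: FolderName is empty",
    "[27/May/2010 17:00:08] ASyncAirSync.cpp: "
    "ActiveSyncAirSync::ProcessCollection: Unable to open folder with "
    "CollectionId: 2B6242CD-F26E-4B01-BB09-7D757DF8165E",
]

for (source, func), n in summarize(sample).items():
    print(f"{n:5d}  {source}  {func}")
```

This does not explain what the errors mean, but a per-function count over the rebuild window at least shows which subsystem was complaining most and when it started.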

d) It sure would be nice to be able to tweak the write/read timeout that is obviously being used to lock folders.
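
To make request (d) concrete, here is a minimal sketch of what a tunable lock timeout could look like. It is not Kerio's implementation, which is not public in this thread; `acquire_with_retries`, `FolderLockedError`, and all the parameter names and defaults are hypothetical, with the retry count of 20 borrowed from the error message described above.

```python
import threading


class FolderLockedError(Exception):
    """Raised when a folder lock cannot be acquired within the retry budget."""


def acquire_with_retries(lock, retries=20, timeout=0.5, backoff=1.5):
    """Try to take `lock`, waiting longer on each attempt.

    `retries`, `timeout` (seconds) and `backoff` are the hypothetical knobs
    an administrator might raise while the storage is degraded.
    Returns the attempt number that succeeded.
    """
    wait = timeout
    for attempt in range(1, retries + 1):
        if lock.acquire(timeout=wait):
            return attempt
        wait *= backoff  # slow storage gets progressively more patience
    raise FolderLockedError(f"folder locked after {retries} retries")


folder_lock = threading.Lock()
attempt = acquire_with_retries(folder_lock)
print(f"acquired on attempt {attempt}")
folder_lock.release()
```

With settings like these exposed, an administrator could trade latency for availability during a rebuild: requests would wait longer instead of giving up and leaving the folder reported as locked.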


Anyway, that's enough for now...
  •  
gmaoret

Messages: 49
Karma: 2
This is normal and not a Kerio issue.

Every system will be very slow during a RAID 10 rebuild, especially if the RAID controller is an embedded or budget one.

If you want better performance, go to 15K rpm SAS disks with a professional RAID 5 SAS controller.

[Updated on: Fri, 28 May 2010 10:20]

  •  
giobbi

Messages: 90
Karma: 0
Man, do I agree with A and B!

It's like wandering in total darkness every time, guessing and guessing. It takes forever to fix stupid problems like the forever-breaking index.fld files...
  •  
cloud2

Messages: 35
Karma: 0
And it didn't occur to you to call or email Kerio's support department about your issues?
No, instead of that you are complaining on a user forum to calm your frustration.
  •  
rigo

Messages: 123
Karma: -3
It never ceases to amaze me how people blame software when their non-enterprise hardware leaves much to be desired. 10 hrs for a 1 TB rebuild should raise BIG red flags. One can cross the Atlantic in a sailboat or in a transatlantic ocean liner; both will get you there (maybe), but I can sleep well knowing people's mail will be there, served from enterprise hardware.

KMS has room for improvement. Sometimes it seems slow, and it is frustrating that KMS does not implement core mail server features -- but with all its quirks, it is a solid, viable solution.
  •  
bluehat

Messages: 7
Karma: 0
It's fascinating how people jump to conclusions.

Just for reference, the RAID is a Promise with 32 drives set up as two dual 16-drive bays, with Fibre Channel between them, the controller, and the dual Cisco switches that expose the array via Fibre Channel to the servers with volumes on the array. It's no slouch under normal conditions, with fairly decent read/write performance when it is happy.

The 10 hr rebuild is about right for a 1 TB 7200 RPM disk running flat out at its top sustained write speed.
So no issue there.

The issue with degraded I/O comes from the dual controller providing only approx. 50% throughput during rebuilds.
Just to be sure we are all on the same page, that means it goes from 240 MB/s to approx. 120 MB/s, which should still be just fine.
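
For anyone who wants to check the numbers being argued over, the arithmetic is short. The sketch below takes only the figures quoted in this thread (1 TB, 10 hrs, 240 MB/s, 50%) and assumes decimal units, i.e. 1 TB = 10^12 bytes and 1 MB = 10^6 bytes. The function name is made up for illustration. Note that the rebuild rate is a per-disk average while the 240/120 MB/s figures are array throughput, so the two are not directly comparable.

```python
def implied_rate_mb_s(capacity_bytes, hours):
    """Average transfer rate in MB/s needed to move capacity_bytes in `hours`."""
    return capacity_bytes / (hours * 3600) / 1e6


# Rebuild: 1 TB written to the replacement disk in about 10 hours.
rebuild_rate = implied_rate_mb_s(1e12, 10)
print(f"average rebuild write rate: {rebuild_rate:.1f} MB/s")

# Array throughput: nominal figure from the post, halved during the rebuild.
nominal_mb_s = 240
degraded_mb_s = nominal_mb_s * 0.5
print(f"array throughput during rebuild: {degraded_mb_s:.0f} MB/s")
```

The rebuild works out to an average of roughly 28 MB/s. Whether that matches the drive's top sustained write speed depends on the drive model and on how much the controller throttles the rebuild in favour of host I/O, neither of which is stated here.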

Kerio is pretty much flawless when the array performance is nominal. In degraded mode some reads/writes look to be delayed, so Kerio times out and *LOCKS* the folder it appeared to be accessing, thereby effectively rendering a user's email unusable -- not good in an enterprise setting. You can argue that you should spend more money on a higher-end disk solution, BUT sometimes that is not an option -- as is the case here.

The fact that *software* is responsible for *LOCKING* the folders is something that IS entirely in Kerio's control and is something that should be tweaked in order to deal with a situation such as this. Furthermore, it's not at all clear what the *LOCKED* state means, other than that owners of the folders apparently can't read their email.

It is entirely appropriate to discuss this in an open forum rather than just with support, because others may wish to know of this issue, and maybe, just maybe, if some of us raise a collective voice, Kerio will provide a solution that an administrator can choose to enable when something like this happens to them.

As I said in my initial posting, longer latency when working with email would have been fine; locking people out is not.

  •  
gmaoret

Messages: 49
Karma: 2
bluehat wrote on Fri, 28 May 2010 17:51
Just for reference, the RAID is a Promise with 32 drives set up as two dual 16-drive bays, with Fibre Channel between them, the controller, and the dual Cisco switches that expose the array via Fibre Channel to the servers with volumes on the array. It's no slouch under normal conditions, with fairly decent read/write performance when it is happy.

Strange... As far as I know, a RAID 0+1 config can be made across 4 disks, not 32...

However, I've never had corruption problems, even with a RAID 10 of 4 SATA 7200 rpm disks in degraded mode, only slowness.

I think you should open a ticket with Kerio support.
  •  
elias

Messages: 114
Karma: 0
bluehat wrote on Fri, 28 May 2010 08:51
Just for reference, the RAID is a Promise with 32 drives set up as two dual 16-drive bays, with Fibre Channel between them, the controller, and the dual Cisco switches that expose the array via Fibre Channel to the servers with volumes on the array.

That's a serious array. Is it shared with other systems or dedicated to KMS?

I'm wondering if it's possible that some corrupted data was written to the array as a result of the drive going bad before the controller disabled it. I've had a lot of drives go bad in arrays and have never had that happen, but it seems from your description that something like that may have happened.

-Elias
  •  
My IT Indy

Messages: 1262
Karma: 40
I wonder if the array disables the write-cache when it's rebuilding?

-
My IT Indy
Kerio Certified Reseller and Hosted Provider
http://www.myitindy.com