Remove thousands of duplicate messages?
  •  
cenoxo

Messages: 31
Karma: 0
Over a year ago, one of our users accumulated more than 10,000 duplicate messages (four or five copies of each message, eventually totaling over 10 GB) in their Outlook 2003/Kerio Inbox. The duplication problem was solved, but the user needed to store the daily messages (and dupes) in a subfolder under their Kerio Inbox, and they simply didn't have time to manually examine and cull the dupes.

Now we need to move that subfolder's messages into a local PST file on the user's Windows XP Pro PC. It takes a very long time to manually move a group of messages (usually only 2 or 3 days' worth at a time) from the user's Kerio subfolder into their local PST file. If we try moving more messages than this at one time, Outlook 2003 hangs or stops responding.

We've been trying to do this after hours to minimize any effects on our Kerio Mail Server (version 6.5.2). Even so, we're looking at many days of tedious work to get everything moved into the user's local PST file.

Questions:

1. There are several Outlook 2003 add-ins available that can automatically find and remove duplicate messages in individual Outlook mail folders, but will these work safely with Kerio Mail Server/Kerio Outlook Connector?

2. Is there a duplicate file remover that can be used directly on the user's EML files in their Kerio mail store on our Windows 2003 Server?

Any help is appreciated.
  •  
jamesf

Messages: 119
Karma: 2
I was an Exchange administrator for a number of years starting with Exchange 5. In my Exchange training I was told "Do Not create Inbox sub-folders" because Outlook did not always handle them correctly and could just lose them. Evidently this was removed in later releases of the training and people started creating Inbox sub-folders. I have seen and heard of problems similar to what you are experiencing and do not know of a work-around to it.

One thing you could try is to move the folder from under the Inbox and see if you can export a larger number of messages without Outlook hanging.

I know it is too late for you to change this but hope this will prevent others from doing the same thing and having similar problems.
  •  
pcunix

Messages: 592
Karma: 32
You can do manual removal: either stop the server entirely or make certain nothing is accessing those folders and rename index.fld after.

You could do this with a Perl script if you can figure out what a duplicate really is. If it's entirely the same, you could do MD5 sums to decide; if not you'll have to make decisions based on headers or whatever.
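As a rough illustration of the two approaches mentioned above (exact-content MD5 sums versus header-based decisions), here is a minimal Python sketch rather than the Perl harness linked in this thread. The function names are made up for this example, and it assumes the messages are stored as individual *.eml files in one folder. It only reports groups of duplicates; it does not delete or move anything.

```python
import hashlib
from collections import defaultdict
from email import message_from_binary_file
from pathlib import Path


def content_digest(path):
    """MD5 of the whole file, read in chunks so large messages are not loaded at once."""
    md5 = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            md5.update(chunk)
    return md5.hexdigest()


def message_id(path):
    """Message-ID header, or "" when it is missing (such files are never grouped)."""
    with path.open("rb") as fh:
        msg = message_from_binary_file(fh)
    return str(msg.get("Message-ID") or "").strip()


def duplicate_groups(folder, key=content_digest):
    """Return lists of .eml paths that share the same key, oldest file first."""
    groups = defaultdict(list)
    for path in Path(folder).iterdir():
        if path.is_file() and path.suffix.lower() == ".eml":
            k = key(path)
            if k:
                groups[k].append(path)
    return [
        sorted(g, key=lambda p: (p.stat().st_mtime, p.name))
        for g in groups.values()
        if len(g) > 1
    ]
```

Calling `duplicate_groups(folder)` finds byte-for-byte identical files; `duplicate_groups(folder, key=message_id)` is a looser match that also catches copies whose other headers differ. Whether Message-ID alone is a safe definition of "duplicate" for a given mailbox is a judgment call, so review the groups before acting on them.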

I have a sample script at http://aplawrence.com/Unixart/remove_duplicate_files.html That's just a harness with suggestions; you'd need to modify it for your specific needs.

Tony Lawrence
Kerio Preferred Partner and Reseller
Certified for Connect, Control
http://aplawrence.com
  •  
cenoxo

Messages: 31
Karma: 0
pcunix,

Your first suggestion sounds promising since I might bypass Outlook 2003 altogether:

Quote:
You can do manual removal: either stop the server entirely or make certain nothing is accessing those folders and rename index.fld after.


To confirm, I would manually (with a script or Windows utility) remove duplicate *.EML files in the user's appropriate #MSGS subfolder in their Kerio Mail Store folder on our Mail server.

If I know this user is not accessing their mail in Outlook or WebMail, must I stop the Kerio MailServer before removing the subfolder's dupes? I'll probably do this after hours, but this will take time and I'd prefer not to interrupt our other users.

After the dupes are removed, I would recreate that subfolder's INDEX.FLD by renaming it to INDEX.BAD, then log into the user's email account and open that subfolder, correct?

Quote:
I have a sample script at http://aplawrence.com/Unixart/remove_duplicate_files.html That's just a harness with suggestions...


Being unfamiliar with Perl scripting, I'd prefer an off-the-shelf Windows application/utility. At your link above, a (rather ungracious) commenter mentions a free duplicate finder named:

Fast Duplicate File Finder
http://www.mindgems.com/products/Fast-Duplicate-File-Finder/Fast-Duplicate-File-Finder-About.htm

FDFF apparently compares files by content, displays a sorted list of the dupes it finds, groups each oldest original file with its dupes, and gives the option of saving the dupes in a separate folder. I'll download this and test it on a sample copy of our user's *.EML files.

Any caveats I should watch out for?
  •  
cenoxo

Messages: 31
Karma: 0
After testing some other OTS duplicate finders today, this shareware utility seems to offer more controls and features, a decent interface, and is very affordable:

AcuteFinder
http://www.acutefinder.com/

Will post my results later.
  •  
cenoxo

Messages: 31
Karma: 0
Before using AcuteFinder to eliminate duplicate EML files directly on our Kerio MailServer, I submitted a ticket to Kerio Tech Support. They can't be responsible for the behavior of any third-party software, but said that:

1. The user should be logged out of Kerio WebMail and/or Outlook email before any of their duplicate *.EML files are deleted.

2. For the benefit of other users, the Kerio MailServer could be left running while I searched for and deleted the user's duplicate message files.

3. After the dupes are deleted or moved, the INDEX.FLD file for the user's #MSGS subfolder must be rebuilt. Do this by stopping the KMS engine, renaming the user's subfolder INDEX.FLD file to INDEX.BAD, restarting the KMS engine, logging into WebMail (or Outlook) as the user, and re-opening their mail subfolder to create a new INDEX.FLD file from information stored in the INDEX.BAD file (this may take a few minutes).
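The rename in step 3 is easy to do by hand, but for anyone scripting the cleanup, a minimal Python sketch of just that step might look like the following. The function name `retire_index` is invented for this example, and the script assumes the KMS engine has already been stopped by other means; it does not stop or restart anything itself.

```python
from pathlib import Path


def retire_index(folder):
    """Rename index.fld to index.bad inside `folder`.

    Returns the new path, or None if the folder has no index.fld.
    The lookup ignores case, so INDEX.FLD and index.fld are both matched.
    """
    for entry in Path(folder).iterdir():
        if entry.is_file() and entry.name.lower() == "index.fld":
            # Keep the casing style of the existing file name.
            new_name = "INDEX.BAD" if entry.name.isupper() else "index.bad"
            target = entry.with_name(new_name)
            if target.exists():
                # An older index.bad is still there; do not overwrite it blindly.
                raise FileExistsError(f"{target} already exists; inspect it first")
            entry.rename(target)
            return target
    return None
```

Pass it the path of the mail subfolder in question, then restart the engine and re-open the subfolder as the user, as described in step 3.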

Working after-hours about a week ago, I did the following:

1. Made sure the user was completely logged out of mail.

2. Opened AcuteFinder on our Kerio MailServer, set its options and search parameters*, and searched the user's appropriate #MSGS subfolder for duplicate *.EML files. This subfolder alone was about 9 GB in size with over 25,000 *.EML files.

*Although AF can find tens of thousands of duplicates in a single search, I used separate searches in descending file size ranges (500-100 MB, 100-50 MB, 50-25 MB, etc.) in case I ran into any problems and had to back out. This also let me find and remove the largest duplicates right away. My last search had no file size limits in order to catch any dupes that might have been missed previously.

I also moved the dupes out of the user's #MSGS folder onto a large USB drive attached to the mail server, but this is slower than deleting them directly in AF.

3. AF worked quickly, and it was easy to review and sort the dupes it found (these are listed in separate, color-coded sets like rows in an Excel spreadsheet.) When you select and move (or delete) an entire list of dupes, AF leaves one file in each set behind as the "original" file, and preserves its original time stamp.

4. I had no problems finding and removing all of the duplicate *.EML files in any of the separate AF searches. In the end, about 10,000 dupes were found and moved, and the user's #MSGS subfolder was reduced to about 5 GB total. Most of the remaining "original" messages were later moved into an Outlook 2003 *.PST file on the user's local hard drive.

5. Rebuilt the INDEX.FLD for the user's #MSGS subfolder as described above, then logged back into mail as the user and checked their cleaned-up subfolder.
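For readers who would rather script this workflow than buy a utility, the approach in steps 2 to 4 (search in descending size bands, keep the oldest copy of each set, move the rest aside instead of deleting) can be sketched in Python roughly as follows. This is not how AcuteFinder works internally; it is a stand-in that treats files as duplicates only when they are byte-for-byte identical. The names `BANDS`, `move_duplicates` and `run_all_bands` are invented for this example, and the band limits simply mirror the ones quoted above.

```python
import hashlib
import shutil
from collections import defaultdict
from pathlib import Path

MB = 1024 * 1024
# Descending size bands in bytes (low inclusive, high exclusive), ending with a catch-all pass.
BANDS = [(100 * MB, 500 * MB), (50 * MB, 100 * MB), (25 * MB, 50 * MB), (0, None)]


def _digest(path):
    """MD5 of the file content, read in 1 MB chunks."""
    md5 = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(MB), b""):
            md5.update(chunk)
    return md5.hexdigest()


def move_duplicates(folder, quarantine, min_size=0, max_size=None):
    """Move all but the oldest copy of each identical .eml file; return how many were moved."""
    folder, quarantine = Path(folder), Path(quarantine)
    quarantine.mkdir(parents=True, exist_ok=True)
    groups = defaultdict(list)
    for path in folder.iterdir():
        if not path.is_file() or path.suffix.lower() != ".eml":
            continue
        size = path.stat().st_size
        if size < min_size or (max_size is not None and size >= max_size):
            continue  # outside the current size band
        groups[(size, _digest(path))].append(path)
    moved = 0
    for copies in groups.values():
        # The oldest file (by modification time) stays behind as the "original".
        copies.sort(key=lambda p: (p.stat().st_mtime, p.name))
        for dupe in copies[1:]:
            shutil.move(str(dupe), str(quarantine / dupe.name))
            moved += 1
    return moved


def run_all_bands(folder, quarantine):
    """Run one pass per size band, largest files first; return the total moved."""
    return sum(move_duplicates(folder, quarantine, low, high) for low, high in BANDS)
```

Moving the dupes to a quarantine folder rather than deleting them keeps a way to back out, at the cost of speed. As in the steps above, the user should be logged out first and the folder's index rebuilt afterwards, and a trial run on a copy of the folder is a sensible first step.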

Mail has been working OK for the user during the last week, their Outlook is noticeably faster, and we're no longer backing up all those dupes.

At $14.00, AcuteFinder was a real bargain and we'll use it again for additional cleanups.
  •  
vernados

Messages: 1
Karma: 0
You can try Clone Remover. It helps you find duplicate images (photos, pictures). It searches by content, so it can match files that have different names but similar content. You can also use filters to search for duplicates of a certain type only. Try it.
The website is moleskinsoft com
  •  
just4work1974

Messages: 1
Karma: 0
This software worked for me - dublicatefilesdeleter.com/
  •  
alanjader

Messages: 1
Karma: 0
Thanks, I had the same trouble. I used dublicatefilesdeleter.com and now my PC is cleaned up.
  •  
ameliaIvyboose

Messages: 1
Karma: 0
Hi,
I never prefer the manual technique because it is a lengthy, time-consuming process and also not secure. Recently a friend sent me an Outlook PST file, and when I merged it into my own, my Outlook contacts, emails, etc. were duplicated. I then used the third-party software "sysTools Outlook duplicates remover", which was also suggested by my friend anjleena. If you want to try it, start with its demo version.
  •  
kamila rodrigo

Messages: 1
Karma: 0
For best results removing files, try "duplicatefilesdeleter".
Current Time: Tue Dec 06 18:52:55 CET 2016
