Email Archive Script
robvas

I thought I'd share a small script I made to help with archiving Kerio mailboxes. Like the rest of you, I had users with 10, 20, 30, even 50GB of email from over the years.

This script runs on Python 2.6 (I have not tested it under Python 3, as our email server runs CentOS 5, which is ancient!) and needs the pyzmail module. Run it from the terminal in a user's mail folder and it will move messages more than a year old into folders based on the year in each message's Date header.

So, on my system I run the script from:

/opt/kerio/mailserver/store/mail/mydomain.com/username

You'll end up with something like

2016
├── Deleted Items
├── Inbox
│   ├── foo
│   └── bar
└── Sent Items

2017
├── Deleted Items
├── Inbox
│   ├── foo
│   └── bar
└── Sent Items

...
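
To make the layout above concrete, here is a minimal sketch of how the destination folder is derived. It mirrors the os.path.join(cwd, str(msg_date.year), dirpath) call in the script below; the helper name target_dir_for and the example paths are made up for illustration.

```python
import os


def target_dir_for(base_dir, year, dirpath):
    """Return the archive directory for a message found in dirpath.

    dirpath is the relative path reported by os.walk (for example
    "INBOX/foo"), so the year folder ends up directly under base_dir.
    """
    return os.path.join(base_dir, str(year), dirpath)


# hypothetical mailbox location, only used to show the resulting path
base = os.path.join("store", "mail", "mydomain.com", "username")
print(target_dir_for(base, 2016, os.path.join("INBOX", "foo")))
```

Because the walked path is kept whole under the year, subfolders such as foo and bar keep their place inside each year's tree.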


After the script runs, you will want to re-index the mailbox. You may also want to move the old year folders to another server or a backup drive.
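
For the backup step, one option is to pack a finished year folder into a single tarball before copying it elsewhere. This is only a sketch under assumptions: pack_year and the destination directory are hypothetical names, and nothing here talks to Kerio or deletes the original folder. It avoids the with statement on tarfile so it should also run on old Python 2.6 installs.

```python
import os
import tarfile


def pack_year(mail_dir, year, dest_dir):
    """Pack mail_dir/<year> into dest_dir/<year>.tar.gz and return its path."""
    src = os.path.join(mail_dir, str(year))
    if not os.path.isdir(src):
        raise ValueError("no archive folder for year: %s" % src)

    archive_path = os.path.join(dest_dir, "%s.tar.gz" % year)
    tar = tarfile.open(archive_path, "w:gz")
    try:
        # arcname keeps the paths inside the tarball relative (2016/INBOX/...)
        tar.add(src, arcname=str(year))
    finally:
        tar.close()
    return archive_path
```

Calling pack_year(".", 2016, "/some/backup/dir") from the user's mail folder would write 2016.tar.gz to that directory. Check the tarball before removing anything from the mail store.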

import os
import datetime

import email.utils
import pyzmail

# the script calls movefile(); shutil.move does the actual move
from shutil import move as movefile


def get_date_of_email(filename):
    """
    Return the date parsed from 'Date:' in the message header

    Args:
        filename: the name of the email file, typically .eml extension

    Returns:
        datetime.date object from the message header, or "NONE" if
        none is found or the date is in a non-standard format

    """
    # 'with' closes the file even if parsing raises
    with open(filename) as input_file:
        msg = pyzmail.parse.message_from_file(input_file)

    date_string = msg.get_decoded_header('Date')
    if date_string == '':
        return "NONE"

    # parsedate() returns None for non-RFC 2822 dates like 11/16/2016 3:29:16PM
    date_tuple = email.utils.parsedate(date_string)
    if date_tuple is None:
        print "Date was not in RFC2822 format: ", date_string
        return "NONE"

    # handle yy dates, ie '16' instead of '2016'
    year = date_tuple[0]
    if year < 100:
        year += 2000
    return datetime.date(year, date_tuple[1], date_tuple[2])

def archive_folder(folder_name):
    """
    Crawls the mailbox folder and moves all items to a new folder,
    by the date of the message headers, for each calendar year

    Args:
        folder_name: name of the folder/directory to crawl

    Returns:
        none

    """
    print "Archiving ", folder_name
    if not os.path.isdir(folder_name):
        print "Directory for folder doesn't seem to exist: ", folder_name
        return
    today = datetime.date.today()

    # recursively walk email directory
    for dirpath, dirnames, filenames in os.walk(folder_name):
        for f in filenames:
            # combine the full directory path with the file name
            full_filename = os.path.join(dirpath, f)
            print "* Processing ", full_filename

            # is this an .eml file?
            if os.path.splitext(f)[1].lower() != ".eml":
                print "Not an .eml file. Skipping ", full_filename
            else:
                # is 'noarchive' in the name?
                if any(s in full_filename.lower() for s in no_archive_directives):
                    print "'NO ARCHIVE' is set for ", full_filename
                else:
                    msg_date = get_date_of_email(full_filename)
                    if msg_date == "NONE":
                        print "DATE ERROR with ", full_filename
                    else:
                        # is this message over 1 year old? archive it.
                        diff = today - msg_date
                        if diff.days > 365:
                            # build the new directory starting with the message year
                            # create if it doesn't exist
                            target_dir_name = os.path.join(cwd, str(msg_date.year), dirpath)
                            if not os.path.exists(target_dir_name):
                                print "Need to create the directory ", target_dir_name
                                os.makedirs(target_dir_name)

                            print "moving to ", target_dir_name
                            movefile(full_filename, os.path.join(target_dir_name, f))


# main script begins

# folders to archive
folders_to_archive = ("INBOX", "Sent Items", "Deleted Items")
# ignore folders with this text in name. these will be compared case-insensitive
no_archive_directives = ("noarchive", "no archive")

cwd = os.getcwd()
print "current dir: ", cwd

for f in folders_to_archive:
    archive_folder(f)
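
If you want to check how a particular Date header will be treated before running the script on a real mailbox, the date logic can be tried on its own with only the standard library. This is a rough sketch, not part of the script: parse_header_date and is_older_than are names invented here, and it returns None instead of the script's "NONE" string.

```python
import datetime
import email.utils


def parse_header_date(date_string):
    """Return a datetime.date for an RFC 2822 date string, or None."""
    date_tuple = email.utils.parsedate(date_string)
    if date_tuple is None:
        # not an RFC 2822 date, e.g. 11/16/2016 3:29:16PM
        return None
    try:
        return datetime.date(*date_tuple[:3])
    except ValueError:
        # parsed, but not a real calendar date
        return None


def is_older_than(msg_date, today, days=365):
    """Same test the script uses: strictly more than `days` days old."""
    return (today - msg_date).days > days


d = parse_header_date("Wed, 16 Nov 2016 15:29:16 -0500")
print(d)
print(is_older_than(d, datetime.date.today()))
```

Passing today in as an argument makes the cutoff easy to test with fixed dates.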

[Updated on: Mon, 26 November 2018 13:53]
