Company Reconstructs Its Email History to Prepare for an Unknown Future
When a construction company learned its core, profit-making divisions were to be sold to a new Group, the company’s IT department had to take steps to ensure it could assume legal responsibility for past contracts and be able to respond to any future eDiscovery requests.
For more than a decade, the company had managed the secure retention of its email records using a third-party email archive, HP/Autonomy EAS, which held around 4,000 staff mailboxes as well as many millions of emails captured by the Microsoft Exchange journal service.
The company now had to extract this data, as the EAS archive was to be decommissioned to save costs, but it faced some significant hurdles in ensuring the data would remain protected, manageable and readily discoverable:
- Where would they put it? Although it was clear that users’ emails needed to be extracted from the proprietary EAS archive format, the original Company had yet to establish the exact system it would use to store and search email records going forward. It therefore needed the most neutral and flexible format possible.
- How could they ensure everything was captured? Following an initial analysis, the Company discovered that its journal archive excluded some emails that were found in its mailbox archives. One possible explanation was that, at some point in time, the Exchange journal service had been configured to capture only inbound and outbound traffic, and not internally exchanged emails.
- How could they minimise storage? In order to get the most complete set of email records possible, the Company would need to combine the contents of both its journal and mailbox archives. However, to minimise storage and eDiscovery times, it wanted to preserve the combined mailbox and journal contents in separate ‘user-based silos’ but without any duplicates.
Faced with these challenges and a short window in which to prepare its data for the takeover, the company invited Essential Computing, experts in email archive migration, to establish the optimum solution.
4 Steps for Preservation
This is how the company went about preserving its email archives:
Step 1 – Agree on a Neutral Format Archive: It was determined that PST files would make a good interim file format that could easily be ingested by any archive or email storage system such as Office 365. Moreover, if the PSTs were organised according to individual users, the relevant PSTs could be quickly identified and readily ingested into an eDiscovery platform in the event of an investigation request.
Step 2 – Extract All Mailbox Archives: This step was also straightforward. Essential exported emails from the legacy EAS archives into individual PST files – creating one PST file per archived mailbox or ‘user’.
Step 3 – Extract Individual Journal Archives for all Users: As with most email archives, the journal archive in EAS comprised one archived ‘mailbox’ containing millions of single-instanced emails. Each email was stored alongside a list of all the individual addressees that appeared in the original message header. This included the FROM, TO, and CC addressees as well as anyone included in the BCC field or whose address was part of a distribution list at the time the email was sent.
In order to build individual journal PST files, Essential would need to create a copy of the original email for each sender and recipient. And, to ensure a reliable data set for eDiscovery purposes, it was vital to accurately match the original email addresses found in the archived message header with the right ‘PST owners’.
This step would prove to be challenging, as during analysis, Essential discovered four times more ‘individuals’ in the journal archive than the total number of staff members the company had ever employed (and this excluded anyone outside of its email domain).
This was not a surprise as it’s not unusual for the same user to have more than one ‘moniker’ in an organisation’s email system. Typical reasons include:
- Different surnames resulting from a change in marital status.
- A mix of SMTP-style addresses as well as internal email aliases.
- Changes in addressing conventions over the years (initial_surname, lastname, etc.).
- Different domains as a consequence of historic merger and acquisition activity.
Essential managed the entire process of analysing and rationalising the journal contents, and then subsequently creating an accurate set of individual ‘personal journal’ PSTs.
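The matching problem can be pictured as a lookup from every known address variant to a single canonical owner. The following is a minimal sketch, not Essential's actual tooling: the alias table, the `resolve_owner` and `fan_out` helpers, and the sample addresses are all hypothetical, and it assumes the alias table has already been compiled (for example from directory exports and manual review).

```python
from collections import defaultdict

# Hypothetical alias table: every known address variant -> one canonical owner ID.
ALIASES = {
    "jane.smith@example.com": "jsmith",
    "jane.jones@example.com": "jsmith",   # surname changed
    "j_smith@oldco.example": "jsmith",    # older convention, acquired domain
    "bob.lee@example.com": "blee",
}


def resolve_owner(address, aliases=ALIASES):
    """Return the canonical owner for an address, or None if unknown/external."""
    return aliases.get(address.strip().lower())


def fan_out(journal_items, aliases=ALIASES):
    """Map each owner to the journal message IDs they sent or received.

    journal_items: iterable of (message_id, [addresses]) pairs, where the
    address list holds the sender plus every recorded recipient.
    """
    per_owner = defaultdict(list)
    for message_id, addresses in journal_items:
        # A set gives one copy per owner even if two of their aliases appear.
        owners = {resolve_owner(a) for a in addresses} - {None}
        for owner in sorted(owners):
            per_owner[owner].append(message_id)
    return dict(per_owner)


journal = [
    ("msg-1", ["Jane.Jones@example.com", "bob.lee@example.com"]),
    ("msg-2", ["j_smith@oldco.example", "jane.smith@example.com", "x@external.test"]),
]
result = fan_out(journal)
print(result)
```

In this sketch, addresses that do not resolve to a known owner are simply dropped; in a real migration they would more likely be logged for review, since an unmatched alias means a missing email in someone's PST.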
Step 4 – Exclude Duplicates: Finally, to avoid creating any duplicate emails, Essential simply had to skip any item it had already processed when extracting the corresponding mailbox PST from the archive.
The end result was two PST files per individual:
- A mailbox PST (which included the original folder structure) and
- A ‘personal journal’ PST (which excluded copies of any emails already in the mailbox PST)
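The skip rule in Step 4 amounts to a per-owner set difference. The sketch below assumes each archived item carries a stable identifier shared by its mailbox and journal copies (the `item_id` strings and the `build_personal_journal` helper are hypothetical); how the real archive identifies items is not described in this account.

```python
def build_personal_journal(journal_ids, mailbox_ids):
    """Return the journal items for one owner, skipping any item already
    exported to that owner's mailbox PST. Journal order is preserved."""
    already_exported = set(mailbox_ids)
    kept = []
    for item_id in journal_ids:
        if item_id in already_exported:
            continue  # already written once for this owner, so skip it
        already_exported.add(item_id)  # also guards against repeats in the journal
        kept.append(item_id)
    return kept


mailbox_pst = ["a1", "a2", "a3"]
journal_for_owner = ["a2", "b7", "a3", "b9", "b7"]
personal_journal = build_personal_journal(journal_for_owner, mailbox_pst)
print(personal_journal)  # ['b7', 'b9']
```

Tracking processed items per owner, rather than globally, keeps each pair of PSTs self-contained: the same email can still appear once for every sender and recipient.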
The company now had as complete a record of its legacy emails as possible, together with complete freedom over where its data could be moved to meet its future needs.