Working around slow PST import in Exchange Online

If you’ve tried Exchange Online PST import then you probably know that it’s as slow as molasses in January and sucks in pretty much every way.

  • “PST file is imported to an Office 365 mailbox at a rate of at least 1 GB per hour” is pure fantasy, 0,5GB per hour should be considered excellent throughput and in test runs I achieved only ~0,3 GB/h. Running in one batch seems to import PSTs with limited parallel throughput (almost serially).
  • Security & Compliance Center is just unusably slow.
  • I had to wait 5 days for Mail Import Export role to propagate for Import to activate. Documented 24 hours, you wish.
  • Feedback
  • I’ll just stop here…

I had a dataset to import and I didn’t plan to wait for a month so I looked around a bit. Only hint was in a lost Google result that you should separate imports into separate batches. However GUI is so slow that it’s just infeasible. So I went poking around in the backend.

This blog looked promising and quite helpful  but was concerned with other limitations of GUI import. Nevertheless, you should read it to understand the workflow.

PowerShell access exists and works quite well. There’s talk of “New-o365MailboxImportRequest” CmdLet but that’s just ancient history. New-MailboxImportRequest works fine, just source syntax is different from on-prem version.

Notes:

  • You MUST use generic Azure Blob Storage. Autoprovisioned one ONLY works with GUI. If you try to access it via PowerShell, you just get 403 or 404 error for whatever reason.
  • Generate one batch per PST.
  • Azure blobs are Case Sensitive. Keep that in mind when creating your mapping tables.

So in the end I ran something like that. Script had a lot of additional logic but I cut parts unrelated to the problem at hand.

#base URL for PSTs, your blob storage
$azblobaccount = 'https://blablabla.blob.core.windows.net/blablabla'
#the one like '?sv=...'
$azblobkey = 'yourSASkey'
#I used mapping table just as in Microsoft instructions and adapted my script. My locale uses semicolon as separator
$o365mapping = Import-Csv -Path "C:\Dev\o365mapping.csv" -Encoding Default -Delimiter ';'
ForEach ($account in $o365mapping) {
	#In case you have some soft-deleted mailboxes or other name collisions, get real mailbox name
	$activename= (get-mailbox -identity $account.mailbox).name
	#Name = PST filename
	#CASE SENSITIVE!!!
	$pstfile = ($azblobaccount + '/' + $account.name)
	#Just to differentiate jobs
	$batch = $account.mailbox
	#targetrootfolder and baditemlimit are optional. Batchname might be optional but I left it in just in case
	new-mailboximportrequest -mailbox $activename -AzureBlobStorageAccountUri $pstfile -AzureSharedAccessSignatureToken $azblobkey -targetrootfolder '/' -baditemlimit 50 -batchname $batch
}

So how did it work? Quite well actually. I had 68 PSTs to import (total of ~350GB). Creating all batches took roughly an hour as I hit command throttling. But as created jobs were already running, it didn’t really matter.

 (get-mailboximportrequest|measure).count
68

Exchange Online seems to heavily distribute batches over servers, hugely helping in parallel throughput.

((Get-MailboxImportRequest|Get-MailboxImportRequestStatistics).targetserver|select -unique|measure).count
65

As Exchange Online is quite restricted in resources, expect some imports to always stall.

Get-MailboxImportRequest|Get-MailboxImportRequestStatistics|group statusdetail|ft count,name -auto

Count Name
----- ----
   43 CopyingMessages
   13 Completed
    8 StalledDueToTarget_Processor
    1 StalledDueToTarget_MdbAvailability
    2 StalledDueToTarget_DiskLatency
    1 CreatingFolderHierarchy

And now numbers

((Get-MailboxImportRequest|Get-MailboxImportRequestStatistics).BytesTransferredPerMinute|%{$_.tostring().split('(')[1].split(' ')[0].replace(',','')}|measure -sum).sum / 1GB
1,41345038358122

That’s 1,4GB per minute. That’s like… a hundred times faster. I checked it at a random point when import had been running for a while when some smaller PSTs were already complete. Keep in mind that large PSTs run relatively slower and may still take a while to complete. When processing last and largest PSTs, throughput slowed to ~0,3GB/m but that’s still a lot faster than GUI. Throughput scales with number of parallel batches so probably more jobs would probably result in even better throughput.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.