The case of the fractured firewall: Outlook for Mac causes DoS attacks

Preface: A big thank you to Patrick Fergus for his efforts and this article! He worked diligently with Microsoft support for several months to troubleshoot an issue that was effectively causing an internal Denial of Service (DoS) attack against his company’s firewall. The culprit? Outlook for Mac. And while working with Patrick, Microsoft discovered a second major issue with excessive open files caused by the licensing system in Office for Mac. His work contributed to the recently released Microsoft Office for Mac 2011 14.4.1 Update. Enterprise administrators will find his story very interesting.

William Smith
OfficeforMacHelp.com

Crashing firewalls

In late August 2013 our network administrator sent the following email:

“Over the past few days the firewalls used for outbound Internet access have been crashing frequently.  Yesterday I tracked the cause to Macs generating thousands of connections containing no data to the off-premise Hosted Exchange.  I dug through yesterday’s logs and identified the IP addresses and machine names listed below.”

The network administrator’s original request included a list of seven Macs. Not a huge concern—we were in the middle of a transition from Office 2008 to Office 2011 and from on-premise Exchange 2010 to off-premise Office 365 hosted Exchange and were still trying to discover the new normal.

I wasn’t too worried about the request until I reached the first user’s desk. The user wasn’t there (and hadn’t been in for a week), and the Mac had been untouched during that time.  I logged in and checked Outlook 2011 for any unusual activity. Outlook 2011 was asking whether it should be allowed to be redirected to a new Exchange server (due to DNS SRV-based AutoDiscover), which we knew was part of our new normal with the off-premise migration. I enabled logging in Outlook 2011, waited a few minutes, quit Outlook 2011, and collected the troubleshooting log.

The log included the following lines repeated thousands of times:

2013-08-26 10:09:35.822,0xFFFFFFFF,Outlook Exchange Web Services,Verbose,"EWS: Request data on thread=0x3173694, XML data=
<?xml version=""1.0"" encoding=""UTF-8""?><soap:Envelope xmlns:soap=""http://schemas.xmlsoap.org/soap/envelope/"" xmlns:t=""http://schemas.microsoft.com/exchange/services/2006/types"" xmlns:m=""http://schemas.microsoft.com/exchange/services/2006/messages""><soap:Header><t:RequestServerVersion Version=""Exchange2007_SP1"" /><t:MailboxCulture>en-US</t:MailboxCulture><t:TimeZoneContext><t:TimeZoneDefinition Id=""Central Standard Time""/></t:TimeZoneContext></soap:Header><soap:Body><GetEvents xmlns=""http://schemas.microsoft.com/exchange/services/2006/messages""><SubscriptionId>JgBieTJwcjA2bWIwNzYubmFtcHJkMDYucHJvZC5vdXRsb29rLmNvbRAAAAB0muSH/ckZTI29cX3wO/AbMZQWchxw0Ag=</SubscriptionId><Watermark>AQAAAMfRLmOmyVpBrktQAbvkhYB0WDoDAAAAAAE=</Watermark></GetEvents>"
2013-08-26 10:09:35.829,0xFFFFFFFF,Outlook Exchange Web Services,Info,"EWS: Sending request on connection=0x30086ba8, URL=/EWS/Exchange.asmx, SoapAction=""http://schemas.microsoft.com/exchange/services/2006/messages/GetEvents"""
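To put a number on "repeated thousands of times," the log can be tallied by minute. The following is a minimal sketch, not the method used at the time. It assumes the troubleshooting log is comma-separated with the message as the fifth field and embedded quotes doubled, as the excerpt above suggests. The function name `requests_per_minute` is made up for illustration.

```python
import csv
import io
from collections import Counter

# One record copied from the log excerpt above.
SAMPLE = (
    '2013-08-26 10:09:35.829,0xFFFFFFFF,Outlook Exchange Web Services,Info,'
    '"EWS: Sending request on connection=0x30086ba8, URL=/EWS/Exchange.asmx, '
    'SoapAction=""http://schemas.microsoft.com/exchange/services/2006/messages/GetEvents"""\n'
)


def requests_per_minute(log_text, action="GetEvents"):
    """Count 'Sending request' records for one SOAP action, bucketed by minute.

    Assumption: fields are timestamp, thread, category, level, message.
    """
    counts = Counter()
    # The csv module un-doubles the "" quotes and copes with quoted
    # fields that span several lines (such as the XML request bodies).
    for row in csv.reader(io.StringIO(log_text, newline="")):
        if len(row) < 5:
            continue
        timestamp, message = row[0], row[4]
        if "Sending request" in message and message.rstrip('"').endswith("/" + action):
            counts[timestamp[:16]] += 1  # "YYYY-MM-DD HH:MM"
    return counts


if __name__ == "__main__":
    for minute, n in sorted(requests_per_minute(SAMPLE * 3).items()):
        print(minute, n)
```

A healthy client should show a low, steady count per minute. A count in the hundreds or thousands for the same request is the pattern described here.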

It turned out this wouldn’t be the only report the network administrator would relay regarding this issue. Over the next couple of days between two and ten Macs per day would have Outlook 2011 open tens of thousands of connections with the outbound Internet firewall. The firewall wasn’t set up to handle an internal denial-of-service attack and would crash, adversely affecting the company’s Internet connection.

To enlist Microsoft Support’s help with this issue I submitted a Microsoft Premier Online Support ticket titled “Outlook 2011 Creating DoS Attack on Corporate Firewall” (1). With the assistance of the network administrator I mentioned there were “about 60,000 [firewall connections] per incident”—connections the firewall logged as open in the SYN state, but not being closed in a timely manner. Per the network administrator, “the Macs are generating thousands of unanswered connections…[and] the large volume of connections is causing our outbound Internet firewalls to crash.”

Hits per hour

One Mac running Outlook 2011 opened 4,000-12,000 connections with the firewall every hour in a 4-hour period.

Proving the problem to Microsoft

In response to our Premier Support ticket, Microsoft requested a variety of logs including Outlook 2011 Troubleshooting, firewall, tcpdump, and Fiddler (2).  There were questions regarding our outbound Internet firewall, firewalls on the local Mac, and antivirus software setup.  The most important request was to obtain logs before the issues between Outlook 2011 and the firewall started (more on this below).

We also began keeping track of the problem Macs in hopes of finding a common thread. Factors such as version of OS X, off-premise migration date, Office 2011 installation date, and user account were ruled out as single causes of our issues with Outlook 2011.  The common factors that did come up seemed to include:

  • The user (but only sometimes):
    After we tracked three weeks’ worth of Macs affecting the firewall (66 in total), only a handful turned out to be repeat offenders.  However, over a few months of testing, that same handful would tax the firewall multiple times.
  • The Mac’s uptime (and specifically Outlook 2011’s uptime):
    Practically every Mac reported to be attacking the firewall had had a user logged into OS X for at least a week.  However, some of those Macs had users active from 8 a.m. to 5 p.m. daily, while others had users who logged in and then didn’t touch the Mac for a week.

In early September 2013 we received confirmation from Microsoft that Outlook 2011 was:

  • Making an Exchange request
  • Hearing back from Exchange (i.e., behaving normally)
  • Eventually making an Exchange request and not hearing back from Exchange
  • Repeatedly making the same Exchange request that hadn’t received a response
  • Keeping all the network connections for those repeated requests open until Outlook 2011 successfully received a response from Exchange

But the reason why Outlook 2011 would exhibit this issue was unknown.

Without a solid common thread among the problematic Macs, we couldn’t predict which Mac would next exhibit issues with Outlook 2011.  Microsoft Support understandably wanted logs from before the issue occurred, but knowing on which of our more than 1,000 Macs to enable logging to obtain that “before” log would be very difficult.  Enabling multiple logs on every Mac in the hope of finding the one Mac that would give us the perfect log would be practically impossible.  We set up test Macs with all logs enabled in the hope we might capture the one true log.  However, each attempt to capture these logs took between seven and ten days, adding to the time we spent mitigating the issue.

We took as many mitigation steps as we could brainstorm. We discovered that when Outlook 2011 was attacking the firewall, the user was unaware except on the rare occasions when their mail stopped flowing.  The only people who knew there was an issue with Outlook 2011 on a particular Mac were in IT, and thus any mitigation steps had to start with IT.  Our mitigation strategies included:

  • Setting a limit of 1,000 simultaneous connections to the firewall per IP address
  • Creating an automated notification alerting our Help Desk to request users quit Outlook 2011 when their Macs attempted more than 1,000 connections to the firewall in an hour
  • Upgrading the firewall hardware
  • Attempting to write an AppleScript to switch Outlook 2011’s “Work Offline” setting on and off nightly (however, we determined this did not help the firewall issue)

Firewall connections for Thanksgiving weekend

Firewall connections for Thanksgiving weekend (when IT staff was not contacting users to quit Outlook 2011). Note the dark green “OTHER” at the bottom of the graph—these are the firewall connections for the entire rest of the office.
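The hourly alert rule from the mitigation list can be sketched as a simple tally. This is only an illustration under stated assumptions. The firewall's real log format is not given here, so the input is taken to be already-parsed (time, source IP) pairs. The function name `noisy_hosts` is hypothetical.

```python
from collections import Counter
from datetime import datetime

THRESHOLD = 1000  # connections per hour, matching the alert rule described above


def noisy_hosts(connections, threshold=THRESHOLD):
    """Return {(hour, source_ip): count} for every host over the threshold.

    `connections` is an iterable of (datetime, source_ip) pairs. Extracting
    those pairs from an actual firewall log is deliberately left out,
    because that format is an unknown here.
    """
    per_hour = Counter(
        (when.replace(minute=0, second=0, microsecond=0), ip)
        for when, ip in connections
    )
    return {key: n for key, n in per_hour.items() if n > threshold}
```

Each key in the result identifies one Mac in one clock hour, which is enough for a Help Desk notification to name the machine whose user should quit Outlook 2011.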

A glimmer of hope

This mitigation effort continued through late November, when we received a pair of testing builds of Outlook 2011 from Microsoft Support. One testing build contained enhanced Outlook 2011 logging, while the other contained enhanced logging and a proposed fix for our firewall issue. Microsoft Support requested that we:

  • Log with tcpdump constantly
  • Log with Outlook 2011’s “Turn on logging for troubleshooting” constantly
  • Run lsof against the Outlook 2011 pid when overwhelming firewall connections were observed

I set up one Mac for each of the Outlook 2011 builds and configured new, empty Office 365 Hosted Exchange accounts on each. Since Outlook 2011’s firewall issue could occur at any time and could disappear by the time I reached the Mac, I wrote a short script to record the following every 60 seconds:

  • Date and time
  • Size of the Outlook 2011 troubleshooting log (since I knew the troubleshooting log grew significantly when the firewall issue was occurring I could compare the change of the log size to help find excessive connections to the firewall)
  • lsof against the Outlook 2011 pid
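The original script is not reproduced in this article. As a rough sketch of the same idea, the loop below records the three items once a minute. It assumes `lsof` is on the PATH and that the caller supplies the Outlook process ID and the troubleshooting log path, since neither is specified here. The function names and the output layout are invented for illustration.

```python
import os
import subprocess
import time
from datetime import datetime


def take_sample(pid, log_path, run=subprocess.run):
    """Collect one sample: timestamp, log size in bytes, and lsof output lines."""
    size = os.path.getsize(log_path) if os.path.exists(log_path) else 0
    # `lsof -p PID` lists the files and sockets the process has open.
    result = run(["lsof", "-p", str(pid)], capture_output=True, text=True)
    return {
        "time": datetime.now().isoformat(timespec="seconds"),
        "log_bytes": size,
        "lsof_lines": result.stdout.splitlines(),
    }


def monitor(pid, log_path, out_path, interval=60, samples=None):
    """Append one sample to out_path every `interval` seconds.

    Runs forever when `samples` is None; otherwise stops after that many.
    """
    taken = 0
    while samples is None or taken < samples:
        s = take_sample(pid, log_path)
        with open(out_path, "a") as out:
            # Summary line first, then the raw lsof rows for later analysis.
            out.write(f"{s['time']}\t{s['log_bytes']}\t{len(s['lsof_lines'])}\n")
            out.writelines(line + "\n" for line in s["lsof_lines"])
        taken += 1
        if samples is None or taken < samples:
            time.sleep(interval)
```

Comparing `log_bytes` between consecutive samples gives the log growth rate, which is the signal used below to spot when the excessive connections started.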

I started both testing builds and crossed my fingers.  Five days later the test Mac with the enhanced logging build of Outlook 2011 began to attack the firewall.  I wrote the following email to Microsoft support:

We may have a glimmer of hope.

Checking the log sizes, the enhanced logging build’s log is at 49MB while the proposed fix build’s log is at 31MB.  Checking through my log of Outlook log sizes, I discovered an example of the log size of the enhanced logging build increasing 2MB over the space of two minutes at Fri Nov 29 12:14 a.m. (i.e. overnight).  Looking at the logs (truncated for clarity):

####

2013-11-29 00:14:10.719,0xFFFFFFFF,Outlook Exchange Web Services,Info,"EWS: Response data received on thread=0x1a2e5644, XML data=…<m:SyncFolderHierarchyResponseMessage ResponseClass=""Error""><m:MessageText>The mailbox database is temporarily unavailable.</m:MessageText><m:ResponseCode>ErrorMailboxStoreUnavailable</m:ResponseCode><m:DescriptiveLinkKey>0</m:DescriptiveLinkKey><m:SyncState/><m:IncludesLastFolderInRange>true</m:IncludesLastFolderInRange></m:SyncFolderHierarchyResponseMessage></m:ResponseMessages></m:SyncFolderHierarchyResponse></s:Body></s:Envelope>"

2013-11-29 00:14:10.719,0xFFFFFFFF,Outlook Calendar,Error,Engine[Exchange folder ]: Error during sync:  Error code: -19798 ErrorStr: An unknown error has occurred in Outlook.

####

Searching for the string “The mailbox database is temporarily unavailable” in the proposed fix build’s logs yields the following (again truncated for clarity):

####

2013-11-28 00:12:41.088,0xFFFFFFFF,Outlook Exchange Web Services,Info,"EWS: Response data received on thread=0x14a7b2a4, XML data=…<m:ResponseMessages><m:SubscribeResponseMessage ResponseClass=""Error""><m:MessageText>The mailbox database is temporarily unavailable.</m:MessageText><m:ResponseCode>ErrorMailboxStoreUnavailable</m:ResponseCode><m:DescriptiveLinkKey>0</m:DescriptiveLinkKey></m:SubscribeResponseMessage></m:ResponseMessages></m:SubscribeResponse></s:Body></s:Envelope>"

2013-11-28 00:12:41.088,0xFFFFFFFF,Outlook Exchange Web Services,Warning,CExchangeNotificationMgr::CheckSubscription exception ErrorCode:-19798 Error:

2013-11-28 00:12:41.088,0xFFFFFFFF,Outlook Exchange Web Services,Warning,Sync thread 1-PO – ExchangeNotificationMgr::IncreaseBackoff=2

####

Counting strings in the Outlook logs:

“The mailbox database is temporarily unavailable”
  • At least 2390 times for the enhanced logging build
  • At least 51 times for the proposed fix build

“An unknown error has occurred in Outlook”
  • At least 1800 times for the enhanced logging build
  • At least 35 times for the proposed fix build

“IncreaseBackoff”
  • Zero times for the enhanced logging build
  • 16 times for the proposed fix build

I’m guessing “IncreaseBackoff” is the proposed fix in action?
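Counts like the ones in the email above can be produced by a single pass over each log. The sketch below is one way to do it and is not necessarily how these numbers were obtained. It assumes the log is readable as UTF-8 text, and the function name `count_phrases` is made up.

```python
def count_phrases(log_path, phrases):
    """Count how many times each phrase occurs in a log file.

    Reads line by line so that multi-gigabyte logs do not have to fit in
    memory. Undecodable bytes are replaced rather than raising an error.
    """
    counts = dict.fromkeys(phrases, 0)
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            for phrase in phrases:
                counts[phrase] += line.count(phrase)
    return counts
```

Running it with the three strings from the email against each build's log would give the two columns of numbers being compared.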

It appeared Microsoft had added a governor to Outlook 2011’s Exchange Web Services engine that would cause Outlook 2011 to slow down its connection rate when it discovered it was having issues talking to Exchange. I continued to collect logs and provide them to Microsoft Support over the next few days until I had to set the Outlook 2011 enhanced logging build to “Work Offline” due to the strain Outlook 2011 was placing on the firewall.
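Microsoft's actual implementation is not visible from the logs, so the following is only a generic sketch of what such a governor usually looks like: retry with a delay that doubles after each failure. The exception class, function name, delays, and attempt limit are all assumptions chosen for illustration.

```python
import time


class TemporarilyUnavailable(Exception):
    """Stand-in for a response such as ErrorMailboxStoreUnavailable."""


def call_with_backoff(request, max_attempts=6, base_delay=1.0,
                      max_delay=300.0, sleep=time.sleep):
    """Call request() until it succeeds, waiting longer after each failure.

    The wait doubles each time (1s, 2s, 4s, ...) up to max_delay, so a
    server that is struggling sees fewer requests instead of more.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except TemporarilyUnavailable:
            if attempt == max_attempts:
                raise  # give up and let the caller decide what to do
            sleep(delay)
            delay = min(max_delay, delay * 2)
```

The contrast with the unpatched behavior is the point: without a growing delay, every failed request is retried immediately, and each retry holds another connection open on the firewall.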

Size of Outlook log over time

This graph shows the growth of the troubleshooting log for the enhanced logging build of Outlook 2011. Later in the morning of December 9, the log reached 2.1GB before I set Outlook 2011 to “Work Offline”.

There are too many files open

In mid-December 2013 we received an email from Microsoft Support that mentioned there seemed to be an additional issue (beyond our reported issue between Outlook 2011 and the firewall) where both the enhanced logging and proposed fix builds were running out of “file descriptors” (in this case open sockets for inter-process communication between licensing pieces of Office 2011). Microsoft Support seemed surprised to see this and very interested to track down the cause. I returned to the lsof logs and counted the number of lines for each file:

Outlook 2011 lsof open files over time

As recorded by lsof, the enhanced logging build of Outlook 2011 opened more and more files over time although it was configured with an idle Exchange account.
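Counting lines per lsof capture is straightforward. The sketch below also breaks the count down by lsof's TYPE column, which helps show whether sockets are what is accumulating. It assumes the default `lsof -p PID` column layout, with a header row starting with COMMAND and TYPE as the fifth column. The function names are made up.

```python
from collections import Counter


def _rows(lsof_output):
    """Return the non-empty data rows of one lsof capture, header removed."""
    rows = [line for line in lsof_output.splitlines() if line.strip()]
    if rows and rows[0].split()[0] == "COMMAND":
        rows = rows[1:]
    return rows


def open_file_count(lsof_output):
    """Number of open files (including sockets) in one capture."""
    return len(_rows(lsof_output))


def count_by_type(lsof_output):
    """Tally rows by the TYPE column (e.g. REG, DIR, unix, IPv4)."""
    split_rows = (row.split() for row in _rows(lsof_output))
    return Counter(cols[4] for cols in split_rows if len(cols) > 4)
```

Applying `open_file_count` to each capture in time order yields the kind of series plotted in the graph above.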

This was very unusual.  An idle copy of Outlook 2011 had about 2,400 open files after two weeks and nearly 9,000 open files after running for about two months.  This appeared to be related to another Outlook 2011 issue we had encountered, where some users would randomly see the error message “There are too many files open,” but Outlook 2011 would continue to work normally:

There are too many files open

Users would randomly see an error that never seemed to break anything.

Microsoft informed us the issue with file descriptors affected Outlook 2011’s communication with Office’s licensing system.  On a hunch I checked all the Office applications on 96 in-the-wild Macs.  I removed Macs that didn’t have any Office applications open (leaving 74 of the original 96) and then counted the number of excess open file descriptors:

Excess open Office files on in-the-wild Macs

All Office 2011 applications were opening an excessive number of files. Note the rightmost “Outlook” column is truncated and should actually represent 3,262 open files.

While this was certainly not the issue we originally reported, applications holding large numbers of unnecessary files open could exhibit difficult-to-diagnose problems.

Service Pack 4

Microsoft subsequently provided us with an Office 2011 build with their proposed fixes for the Outlook 2011 DoS and excess open files issues.  We’ve been able to verify both of these fixes appear to function as expected and should be part of the public Office 2011 14.4.1 release.

Thanks to:

  • My management and fellow IT team members for assistance and support in obtaining necessary information
  • Microsoft Support for the assistance and tenacity in getting this fixed

Additional notes

(1) Tickets to Microsoft Premier Support for Office 2011 issues are best submitted via the Premier website. Calling in for support can result in being bounced between support departments, whereas submitting via the website will likely result in a call back from the Office 2011 support group.

(2) All communication between hosted Exchange and Outlook 2011 is encrypted.  Microsoft support requested we use Fiddler as an HTTPS (and HTTP) proxy, whose self-signed certificate allows it to intercept, decode, and log HTTPS traffic. Unfortunately setting up an HTTP and HTTPS proxy and accepting the untrusted HTTPS proxy certificate disturbed Outlook 2011’s connection to hosted Exchange enough to cause the DoS issue to go away.

Patrick Fergus is a Mac support technician for a Midwestern publishing company and can be found on Twitter @foigus.

1 comment to The case of the fractured firewall: Outlook for Mac causes DoS attacks

  • Manny

    Fascinating article! Thank you for writing and posting it.

    “Unfortunately setting up an HTTP and HTTPS proxy and accepting the untrusted HTTPS proxy certificate disturbed Outlook 2011’s connection to hosted Exchange enough to cause the DoS issue to go away.”

    This is a significant clue that the real, underlying issue wasn’t addressed: *Something* in the networking environment was causing the EWS connections to fail.
    The “governor” (IncreaseBackoff) that Microsoft added isn’t fixing the issue, so much as working around the underlying network failures.
    The “Too many open files” issue is indeed a genuine fix, although here the real fix isn’t so much to rely on the good graces of application developers to write good networking code. Rather, the kernel developers at Apple should ensure that the networking code in the OSX kernel handles resource starvation better.