Bealers.com Geeking Out Since 1998

Unable to programmatically download a Onenote file from Skydrive


This is not a how-to; it is more of a how-do-I? The thinking is that if I distill the problem into words I may figure out where I’m going wrong.

Also, there’s always the outside chance that someone else might have the answer and put me out of my misery.

I’ll map out the problem in detail below, with some background for context. If you want to cut to the chase, the summary of the problem is at the end.

The problem

For a side project I’m working on, I am looking to build an online service that connects to a user’s Onenote file on Skydrive, parses it, and then displays a subset of the information on that user’s mobile device.

I’m not trying to write a new Onenote client; there’s no point. Instead I’m looking to add value to the Onenote user’s experience for a specific type of content.

I initially figured I’d play with desktop integration. I know there are COM libraries available that let you interact with Onenote documents, and I have a copy of Visual Studio, so with some useful tutorials and zero experience of coding in C# (or any Microsoft programming language, for that matter) I managed to get a fair way towards pulling some information out of a notebook. However, I soon realised that this is of no use to me: I’m interacting via the desktop application, but I need this to work in a multi-user environment, so I obviously need to go server-side.

Running the Onenote client application on a server seems like a pretty bad idea, but for completeness it is worth a brief investigation. Yes, you can get Onenote to connect to multiple notebooks belonging to different users via the desktop application; I’ve proved this. But performance issues notwithstanding (imagine 300 concurrent user notebooks open in one application!), what about licensing? I imagine each connecting user would need a client licence. Anyway, this is all moot. Spend a few minutes searching online and you’ll find that it’s officially A Bad Idea. Microsoft themselves say:

Microsoft does not currently recommend, and does not support, Automation of Microsoft Office applications from any unattended, non-interactive client application or component (including ASP, ASP.NET, DCOM, and NT Services), because Office may exhibit unstable behavior and/or deadlock when Office is run in this environment.

Running Onenote 2013 server-side is (thankfully) a dead end; clearly I need to go lower level.

Luckily, Microsoft have excellent documentation on all their file formats, and over the few months I’ve been poking at this I’ve learned that Onenote files follow the MS-ONESTORE binary format; I have had some success cracking one open and reading the file headers. I also know that there’s a Skydrive REST API, so my assumption was that it would be simple enough to download the file and then start the actual hard work of implementing the MS-ONESTORE specification.
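For anyone wanting to follow along, here is a minimal sketch of what "reading the file headers" involves. It assumes the Header layout as I read section 2.3.1 of MS-ONESTORE: the file opens with four 16-byte GUIDs (guidFileType, guidFile, guidLegacyFileVersion, guidFileFormat) stored in Microsoft's little-endian GUID layout. The GUID constants are transcribed from the spec as I understand it, and `read_header_guids`/`classify` are hypothetical helpers, so check them against the spec revision you are working from.

```python
import uuid

# Header.guidFileType values as I read them in MS-ONESTORE section 2.3.1
# (assumption: verify against the revision of the spec you are using).
GUID_ONE_SECTION = uuid.UUID("7B5C52E4-D88C-4DA7-AEB1-5378D02996D3")  # .one
GUID_ONETOC2 = uuid.UUID("43FF2FA1-EFD9-4C76-9EE2-10EA5722765F")      # .onetoc2
# Header.guidFileFormat, expected to be the same for both file types.
GUID_FILE_FORMAT = uuid.UUID("109ADD3F-911B-49F5-A5D0-1791EDC8AED8")

HEADER_GUID_FIELDS = ("guidFileType", "guidFile", "guidLegacyFileVersion", "guidFileFormat")
HEADER_GUID_BYTES = 16 * len(HEADER_GUID_FIELDS)


def read_header_guids(data: bytes) -> dict:
    """Decode the four GUIDs at the start of a revision store file.

    GUIDs are stored in Microsoft's mixed-endian layout, which is what
    uuid.UUID(bytes_le=...) expects.
    """
    if len(data) < HEADER_GUID_BYTES:
        raise ValueError(f"need at least {HEADER_GUID_BYTES} bytes, got {len(data)}")
    return {
        name: uuid.UUID(bytes_le=data[i * 16:(i + 1) * 16])
        for i, name in enumerate(HEADER_GUID_FIELDS)
    }


def classify(data: bytes) -> str:
    """Return a rough label for a downloaded blob: section, TOC, or unknown."""
    if len(data) < HEADER_GUID_BYTES:
        return "unknown"
    guids = read_header_guids(data)
    if guids["guidFileFormat"] != GUID_FILE_FORMAT:
        return "unknown"
    if guids["guidFileType"] == GUID_ONE_SECTION:
        return "section (.one)"
    if guids["guidFileType"] == GUID_ONETOC2:
        return "table of contents (.onetoc2)"
    return "unknown"
```

Feeding `classify` the first kilobyte of a .one file should report a section, while an HTML shortcut comes back as unknown, which is a quick way to tell which of the two you have actually downloaded.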

Unfortunately this is where we hit our first problem.

Problem 1: it is not possible to download documents of type notebook (Onenote files) from Skydrive via the API.

This sentence actually hides a lot of my trial and error including:

  • Using WebDAV. This looked promising for pushing and listing files, but for GET it’s pretty much standard HTTP, which doesn’t work: you just get the shortcut file (see below)
  • Using the excellent Python Skydrive API libraries
  • Trying the Live REST API myself from first principles just in case the library had missed something

To be clear, there is a file and you can download it, but it’s just an HTML file linking to the web client. This matches what you see with the Windows Skydrive client: when Skydrive folders are synced to the local file system, the Onenote document is just a shortcut.

Digging through the docs, I found this alluded to in a few places, for example in the Live SDK’s documentation on the file object:

source: The URL to use to download the file from SkyDrive.

Warning: This value is not persistent. Use it immediately after making the request, and avoid caching.
Note: This structure is not available if the file is an Office OneNote notebook.
Screenshot: WebDAV vs. local Skydrive. The local Skydrive version is an HTML file, whilst over WebDAV it’s a folder containing individual .one files.
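In code terms, the documented behaviour boils down to a check like the one below. It is a sketch that assumes a folder listing shaped like the Live REST API's JSON response, with items under a `data` key carrying `name`, `type` and (for downloadable files only) `source`; `split_downloadable` is a hypothetical helper.

```python
def split_downloadable(listing: dict) -> tuple:
    """Split a Live API folder listing into downloadable and non-downloadable items.

    Assumes a shape like {"data": [{"name": ..., "type": ..., "source": ...}, ...]}.
    Per the documentation quoted above, notebooks come back without `source`.
    """
    downloadable, blocked = [], []
    for item in listing.get("data", []):
        if item.get("source"):
            # `source` is not persistent, so a real client would fetch it immediately
            downloadable.append(item["name"])
        else:
            blocked.append((item.get("name"), item.get("type")))
    return downloadable, blocked
```

Run against a real listing, anything of type notebook should land in the second list, which is exactly the dead end described here.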

I’m pretty sure this inability to download directly is due to how Onenote handles synchronisation (important note: if you use Onenote on multiple devices then you really should read up on how its synchronisation works). The HTML document is there to give the user something to click on, whether syncing with the client or viewing via the web, but it’s just a small HTML file containing lots of JavaScript that redirects the user to the web client.

Incidentally, WebDAV is the only way I’ve been able to grab the Onenote files from Skydrive at all. As mentioned above, normally all one sees is the HTML shortcut, but with a WebDAV connection in Windows Explorer I was able to see folders and, within these, individual .one files, just like one sees when creating a local notebook.
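For reference, the folder view Explorer shows can be approximated with a WebDAV PROPFIND request (RFC 4918) at Depth 1, then filtering the multistatus reply for .one files. This is a sketch under assumptions: the URL you pass in (for example somewhere on d.docs.live.net) and the `auth` object are placeholders, and since authentication is exactly the unsolved part, `list_one_files` is a hypothetical helper rather than a tested client.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote

DAV_NS = {"d": "DAV:"}

# Ask only for the properties needed to tell files from folders.
PROPFIND_BODY = """<?xml version="1.0" encoding="utf-8"?>
<d:propfind xmlns:d="DAV:"><d:prop><d:resourcetype/><d:getcontentlength/></d:prop></d:propfind>"""


def parse_multistatus(xml_data) -> list:
    """Return (href, is_collection) pairs from a WebDAV multistatus response."""
    root = ET.fromstring(xml_data)
    entries = []
    for resp in root.findall("d:response", DAV_NS):
        href = unquote(resp.findtext("d:href", default="", namespaces=DAV_NS))
        is_collection = resp.find(".//d:resourcetype/d:collection", DAV_NS) is not None
        entries.append((href, is_collection))
    return entries


def one_files(entries: list) -> list:
    """Keep only non-folder hrefs that look like Onenote section files."""
    return [href for href, is_dir in entries if not is_dir and href.lower().endswith(".one")]


def list_one_files(url: str, auth) -> list:
    """Hypothetical helper: PROPFIND one level below `url` and return the .one hrefs."""
    import requests  # imported here so the parsing helpers work without it

    r = requests.request(
        "PROPFIND", url, data=PROPFIND_BODY,
        headers={"Depth": "1", "Content-Type": "application/xml"}, auth=auth,
    )
    r.raise_for_status()
    return one_files(parse_multistatus(r.content))
```

The parsing is kept separate from the network call so it can be checked against a saved PROPFIND response.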

There is one other way to download the file: MS-FSSHTTP, the File Synchronization via SOAP over HTTP Protocol.

I already know that if I want my vapourware application to work as intended, I’ll have to implement MS-FSSHTTP/B/D to synchronise any changes. This is rather daunting, as it’s another thick specification to get through on top of MS-ONESTORE (and ultimately MS-ONE for the higher-level structure). Initially I had figured I could start read-only and then scale up to read/write; maybe I’m wrong and I have to begin with MS-FSSHTTP.

As I started to test that option (downloading via the sync protocol), I came across the second part of the problem, namely:

Problem 2: the MS-FSSHTTP sync specification doesn’t cover authentication, and nothing else that I can find covers it for Onenote/Skydrive interaction.

From the specification document:

The protocol assumes that authentication has been performed by the underlying protocols. Authorization is dependent on the storage mechanisms of the protocol server and is not defined by this protocol.

I’ve been stalled here for a while, going around in circles: sending the Skydrive API access token to likely-looking endpoints, trying different authentication headers and so on, but ending up in basically the same place. Any HTTP request either gets a 3xx redirect (“moved”, because the protocol server wants to go through some authentication workflow) or a 404, 400 or other standard HTTP error code.
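This kind of trial and error is easier to keep track of with a small harness that sends the same request with different credentials, does not follow redirects, and records what comes back. Incidentally, a genuine 304 means Not Modified; a hand-off to a login page normally arrives as a 301, 302, 303 or 307, so recording the exact status code is worthwhile. This is a sketch: `summarise` and `probe` are hypothetical names, and the Bearer header is a guess, not a documented scheme for storage.live.com.

```python
def summarise(status: int, headers) -> dict:
    """Pull out the parts of a response that matter when debugging auth."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return {
        "status": status,
        "redirect": 300 <= status < 400,
        "location": lowered.get("location"),            # where a 3xx wants us to go
        "challenge": lowered.get("www-authenticate"),   # e.g. the WLID1.0 challenge
    }


def probe(url: str, token: str, extra_variants: dict = None) -> dict:
    """Hypothetical harness: GET `url` once per auth-header variant."""
    import requests  # imported here so summarise() works without requests installed

    variants = {
        "no auth": {},
        "bearer": {"Authorization": f"Bearer {token}"},  # a guess, not a documented scheme
    }
    variants.update(extra_variants or {})
    results = {}
    for label, headers in variants.items():
        # allow_redirects=False so a login redirect is reported rather than silently followed
        r = requests.get(url, headers=headers, allow_redirects=False, timeout=30)
        results[label] = summarise(r.status_code, r.headers)
    return results
```

Running `probe` against each candidate endpoint makes it obvious whether a given header changes the response at all.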

The final thing to note is that I recently discovered the MS-OCPROTO specification. This is a helpful overview of Office client protocols and it discusses many things including Onenote synchronisation as well as some different authentication mechanisms.

Following the workflow mentioned in there I can now see that the Office client application (i.e. OneNote itself) makes an HTTP OPTIONS request to the server to see what it supports. My previous low-level investigations of Skydrive allowed me to understand that the files are accessed via storage.live.com, so a few lines of Python later we can confirm that this server does support MS-FSSHTTP:

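import requests

# Ask storage.live.com which protocols and HTTP methods it advertises
r = requests.options("https://storage.live.com")

print(r.headers)  # Python 3 print; look for the X-MSFSSHTTP header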

Which gives us:

{'x-msfsshttp': '1.2', 'content-length': '0', 'accept-ranges': 'none', 'x-msnserver': 'BY2____4011517', 'ms-author-via': 'DAV', 'public': 'OPTIONS, GET, HEAD, DELETE, PUT, POST, MKCOL, PROPFIND, PROPPATCH, LOCK, UNLOCK', 'ms-storage': '1', 'dav': '1, 2', 'allow': 'OPTIONS, GET, HEAD, DELETE, PUT, POST, MKCOL, PROPFIND, PROPPATCH, LOCK, UNLOCK', 'date': 'Sun, 29 Sep 2013 08:39:22 GMT', 'p3p': 'CP="BUS CUR CONo FIN IVDo ONL OUR PHY SAMo TELo"', 'access-control-allow-origin': 'https://skydrive.live.com'}
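Rather than eyeballing that dictionary, the relevant parts can be pulled out programmatically. The sketch below is a hypothetical helper that reads the X-MSFSSHTTP version, the DAV compliance classes and the allowed methods; the header names are simply the ones in the response above.

```python
def parse_capabilities(headers) -> dict:
    """Summarise what a server advertises in its OPTIONS response headers."""
    h = {k.lower(): v for k, v in headers.items()}

    def split_list(value: str) -> list:
        return [part.strip() for part in value.split(",") if part.strip()]

    return {
        "fsshttp_version": h.get("x-msfsshttp"),   # '1.2' from storage.live.com above
        "dav_classes": split_list(h.get("dav", "")),
        "methods": split_list(h.get("allow", "")),
        "cors_origin": h.get("access-control-allow-origin"),
    }
```

Passing it `r.headers` from the OPTIONS call gives a version of 1.2, DAV classes 1 and 2, and a method list that includes PROPFIND and LOCK.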

Awesome, we’ve at least confirmed something. The problem is that in weeks of hacking I’ve never been able to authenticate against storage.live.com, so any attempt to start synchronising fails. When I try to access resources, the authentication-related headers I get from storage.live.com don’t really match the authentication workflows described in MS-OCPROTO. I get a 3xx ‘moved’ redirect with a link to follow that turns out to be a Live login workflow.

What I really need is a low-down on exactly how a Onenote client is supposed to:

  1. Initially grab the file. Is this WebDAV, or does it (as I suspect) use MS-FSSHTTP to get the file in the first place?
  2. Authenticate. Which endpoint: is it storage.live.com? If so, what auth protocol should I be following? It’s trivial to get a Skydrive auth token, but whenever (and however) I pass it to storage.live.com I get the 3xx redirect and a login link to follow.

So, to summarise:

I get that physical downloading of Onenote files via the Skydrive API is (probably) not possible.

Furthermore, I understand that the Onenote desktop client uses MS-FSSHTTP and MS-FSSHTTPB for synchronisation.

Following the guidance in MS-OCPROTO, I have been able to see that storage.live.com announces its support for the MS-FSSHTTP protocol. My problem is that I can’t figure out how to authenticate against that server. The returned “www-authenticate: WLID1.0 ….” header is not mentioned anywhere except in out-of-date documentation relating to an earlier version of the Live API, and the authentication types listed in MS-OCPROTO do not match the headers I’m being given.
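Since the WLID1.0 challenge is undocumented, a first step is at least to see exactly which parameters it carries. Below is a minimal, generic parser for a single challenge of the form `Scheme key="value", key2=value2`, loosely following the auth-param syntax in the HTTP authentication RFCs. It makes no assumptions about what the WLID1.0 parameters are or mean, and `parse_challenge` is a hypothetical helper.

```python
import re

# One auth-param: key=token or key="quoted string", followed by a comma or the end.
_PARAM = re.compile(r'\s*([^\s=,]+)\s*=\s*("(?:[^"\\]|\\.)*"|[^,]*)\s*(?:,|$)')


def parse_challenge(header: str) -> tuple:
    """Split a single WWW-Authenticate challenge into (scheme, params)."""
    scheme, _, rest = header.strip().partition(" ")
    params = {}
    pos = 0
    while pos < len(rest):
        m = _PARAM.match(rest, pos)
        if not m:
            break  # stop at anything that doesn't look like key=value
        key, value = m.group(1), m.group(2).strip()
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = re.sub(r'\\(.)', r'\1', value[1:-1])  # unescape quoted-pairs
        params[key.lower()] = value
        pos = m.end()
    return scheme, params
```

Printing `parse_challenge(r.headers["WWW-Authenticate"])` for each failed request at least gives a concrete list of parameter names to search for.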

If you are reading this and think you have a solution, or if you work for Microsoft (or, even better, are on the Onenote dev team), I’d be very interested in a solution to my problem. Can you help, or do you know somebody who can?

Writing this all down has been a great cathartic exercise after weeks of chipping away at the problem whenever I’ve had a spare hour. My takeaways are:

  • Next, I should try to get my download client to act as if it’s a web browser, passing a username and password as described in MS-PASS, but this feels so very, very wrong. I already have a nice, neat OAuth-like access token from my interaction with the Skydrive API, and I want to use that (or some similar mechanism), not a username and password each time.
  • Related to the last point, look again at WebDAV. My earlier attempts gave me the HTML file, so I should inspect the headers on d.docs.live.net and see what I’m given; maybe I can use the API access token (though I’m pretty sure I’ve already tried this). Something about the way Windows Explorer accesses that remote location means it gets the source files, so maybe I can replicate this.

If I get anywhere further I’ll post up my results here.

About the author

bealers

Hi, I'm Bealers and I use this blog to share some of the things I learn whilst I'm on my journey. If you're new to the site then you might want to start here. The best place to find me is on Instagram.

By bealers