Cornelia's Weblog

my sporadically shared thoughts on, well, whatever is capturing my attention at the moment.

Category : REST

More on PUT Idempotency

This has been a hot topic in my world this week. I have a couple of colleagues with whom I have been having very good discussions (in the context of two totally separate projects) and I wanted to capture one particularly interesting thread here.

One of those conversations started with a reference to a really good blog post from Alex Scordellis addressing the question of how complete representations sent in a PUT need to be. What exactly does the client need to include in the representation that they PUT – all properties? What about hyperlinks? A simple answer might be that they need to provide “everything”, but this simple answer is just not satisfactory.

When a client provides a representation as a part of a PUT (or a POST used for creation), the server may ignore some of that representation. The simplest case is where a client, intentionally or accidentally, provides a new value for a non-writable property – say something like a server-generated ID for the resource. Further, as Alex pointed out in his post, the client shouldn’t be responsible for determining the application flow, which, per HATEOAS, is provided via hyperlinks in the resource representation; even if the client provides them, the server will likely ignore such links. So, if the server will ignore some of these things anyway, why not allow the client to omit certain pieces? Makes sense to me. But how do we do this “right”?

Alex’s post summarized some conclusions that he and others reached during a healthy debate/discussion on the subject. I really like what he said there:

In response to GET requests, services serve complete representation of the current known state, including business data and available hypermedia controls. Clients PUT complete representations of the parts for which they are responsible.

He called this a “rule of thumb” and I think it is completely correct, even if it leaves plenty of room for misinterpretation. This was the essence of the conversation I had with a colleague this week.

First, there is something in this rule of thumb that is easy to miss but absolutely critical: the word “complete” in the second sentence. If the client were allowed, in a PUT, to provide only a subset of the parts for which they are responsible, then the problem reduces back to the general problem with using PUT for partial updates. That type of a PUT implementation is not idempotent.

But we have to go just a bit further. This only works if every client is responsible for the same parts. You cannot, for example, have one client responsible for only property A and another client responsible for only property B and still have an idempotent PUT (you’d have this example all over again).
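Here is a minimal sketch of the two-client situation, as I read the concern. The function names and the properties A and B are purely illustrative; nothing here comes from Alex’s post or from any particular framework. With merge-style handling, replaying client 1’s PUT after client 2’s intervening PUT does not bring the resource back to the state the first PUT produced; with complete replacement, where every client sends all of the same properties, it does.

```python
def put_merge(state, body):
    """Partial-update PUT (hypothetical): only properties present in the body change."""
    merged = dict(state)
    merged.update(body)
    return merged


def put_replace(state, body):
    """Complete PUT (hypothetical): the body replaces every client-owned property."""
    return dict(body)


initial = {"A": 0, "B": 0}

# Merge semantics: client 1 owns only A, client 2 owns only B.
after_first = put_merge(initial, {"A": 1})        # client 1
after_other = put_merge(after_first, {"B": 2})    # client 2 intervenes
after_replay = put_merge(after_other, {"A": 1})   # client 1 retries the same request
merge_replay_matches = after_replay == after_first

# Replace semantics: every client sends the complete set of properties.
full = {"A": 1, "B": 0}
r_first = put_replace(initial, full)                 # client 1
r_other = put_replace(r_first, {"A": 1, "B": 2})     # client 2 intervenes
r_replay = put_replace(r_other, full)                # client 1 retries the same request
replace_replay_matches = r_replay == r_first

print(merge_replay_matches, replace_replay_matches)
```

Under replacement, the state after a PUT is determined entirely by the request body, which is what makes the retry safe to reason about.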

There are other edge cases as well. What if the resource state were affected by another resource operation? We know that we can have two resources share parts (or all) of their state; for example, the current version of a document might share the same state as version 3 of a document, specifically, when version 3 is the current version. In this case the set of properties a client is responsible for on a current document cannot be different from the ones the client is responsible for when accessing a particular version of a document. And if you have resources that are what I like to call “composite resources” then you have to watch out for the same issues, but then it starts to get very complex.

There are some choices, however, you can make to simplify things a bit. For example, you can choose to make the client responsible for all properties except write-once properties, hyperlinks, and server-computed properties (e.g. a hash value). In other words, as you are deciding what to make the client responsible for, KISS.
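To make that choice concrete, here is a rough sketch of a PUT handler along those lines. Everything in it is an assumption made for illustration: the property names (title, body, id, links, hash), the URL shape, and the use of SHA-256 for the computed value. The handler insists on a complete set of client-owned properties, silently ignores anything the client sends for server-managed ones, and regenerates those itself.

```python
import hashlib
import json

# Hypothetical split of responsibilities for a "document" resource.
CLIENT_OWNED = frozenset({"title", "body"})
SERVER_MANAGED = frozenset({"id", "links", "hash"})


def apply_put(stored, representation):
    """Return the new resource state for a PUT of `representation`."""
    missing = CLIENT_OWNED - set(representation)
    if missing:
        # A partial representation would turn this into a partial update.
        raise ValueError(f"incomplete representation, missing: {sorted(missing)}")

    # Take only the client-owned properties; SERVER_MANAGED values in the
    # request (a different id, stale links, a bogus hash) are never read.
    new_state = {name: representation[name] for name in CLIENT_OWNED}

    # Server-managed parts are carried over or recomputed by the server.
    new_state["id"] = stored["id"]  # write-once
    new_state["links"] = {"self": f"/documents/{stored['id']}"}  # hypermedia
    digest_input = json.dumps(
        {name: new_state[name] for name in sorted(CLIENT_OWNED)}, sort_keys=True
    )
    new_state["hash"] = hashlib.sha256(digest_input.encode("utf-8")).hexdigest()
    return new_state


stored = {"id": "42", "title": "old", "body": "old text"}
request = {"id": "999", "title": "new", "body": "new text", "links": {"self": "/x"}}

once = apply_put(stored, request)
twice = apply_put(once, request)
print(once == twice, once["id"])
```

Because the result depends only on the request body and the write-once id, sending the same PUT again leaves the resource unchanged.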

I recognize, of course, that Alex’s post isn’t a specification and hence I do not offer my remarks here as a criticism of his post; rather, because I’ve already seen various ways of interpreting that post in action, I offer this only as an elaboration. The key is that whatever you do with PUT, make sure that it is idempotent. If you get very clever in how you define your service you will have to be equally careful to make sure you respect that constraint.

CMIS and Atom/AtomPub

As has been widely blogged on by Craig Randall, Chuck Hollis, Andrew Chapman, John Newton, Ethan Gur-esh, David Nuescheler and many analysts including IDC, Burton Group, CMS Watch, 451 Group and more, EMC, IBM and Microsoft today announced that they have worked together to create a draft specification for a web services based (and I use the term “web services” more generically than as SOAP-based services) standard for content management – Content Management Interoperability Services (CMIS). Other than to point out a couple of highlights, I won’t repeat what many others have already said – but those highlights that I must repeat are:

  • This is a programming language agnostic interface. No matter what language you are using to implement your client, chances are there is adequate support for the things that are needed to invoke CMIS services (uh, HTTP – ubiquitous).
  • The spec defines a domain model and services in the abstract (i.e. not mapped to any concrete binding – we agreed on the core semantics first) as well as both SOAP-based and REST-based bindings.
  • Wow – EMC, IBM and Microsoft all agreed on the contents of this draft!! (and there are not lots and lots of optional features specified which would make assessing compliance against the standard a nightmare and the spec much less valuable)
  • The spec was designed to be layered over existing repositories – that is, no reengineering of the repository implementations is required. This presents the real possibility that interoperability can be achieved over the repositories in existence today, not just the repositories of tomorrow.

But the thing I want to say a bit more about is that the REST binding is specified as an extension to Atom and AtomPub. You may notice that more recent posts on my, as of late, slightly less neglected blog touch upon the Atom technologies – my interest in Atom does not stem just from CMIS; I see Atom’s applicability to many other use cases beyond content management. I would say that the relative simplicity, core usefulness and significant uptake of Atom greatly influenced the choice to create the RESTful CMIS binding as an extension of Atom. There is enough in that CMIS binding to generate dozens of interesting dialogs; let me just touch upon a couple of things to start.

First, Atom applicability for content management is a natural. When we started to look at generating bindings for the abstract CMIS model, it was immediately apparent that it was very easy to create Atom Format representations for the core CMIS objects; also, many of the CMIS services deal with sets (I’m intentionally avoiding the term “collection” here because particularly in the context of Atom discussions that term is already heavily overloaded) of these objects. Yeah, we are talking about things that are easily represented as entries and feeds. And from a client perspective, the types of things that we want to do with our corresponding entry and feed representations are similar to what standard Atom clients already do – show the lists of objects and expose some of the attributes for each.

The CMIS domain model has a bit more complexity, for example, the notion of hierarchy. Folders (one of the core CMIS object types) can contain other folders as well as documents (another CMIS object type). There are lots of different ways that hierarchies can be represented, of course: a flat list with pointers to ids/URIs/keys, etc. What the current CMIS draft does is include children of a folder (folder is represented as an Atom entry) as nested entries.
The simple and powerful notion of foreign markup allows for this and there are a number of other ways that CMIS takes advantage of it. The Atom community has talked about nested collections before – CMIS offers an opportunity for a renewed dialog on that subject. Is it proof that entries needn’t be nested or is it a catalyst for inclusion? (In order to keep this initial post a bit on the less-long side I’ll address some of the other foreign markup that CMIS defines in future posts.)

So on to AtomPub – this is where things get really interesting. You’ll notice that the REST binding starts off by defining the resource model for CMIS. It defines the folder, document, relationship and policy resources as well as many collections including children (of a particular folder), descendants (of a particular folder – this is where the hierarchy I talked about above comes in), checked out documents (ooh, now things are getting interesting), and so on. (Okay, so you’ll notice that I use the term “collections” here – I’ll admit that not all of what we call collections in CMIS follow the rules that I am being a stickler about here. It’s a draft – we’re still working on it.) Then, as good disciples of Richardson and Ruby, we define which of the basic HTTP operations are supported against each. It’s pretty straightforward for many of the resources – GET on a document returns the metadata for that document (an Atom entry), GET on the document media URL gets the document contents, GET on the folder children resource returns an Atom feed containing an entry for each document or folder contained therein, … – you get the picture.

So what about the very core content management service of checkout? It’s tempting to think about the document that we want to check out as the resource that we want to manipulate – but then what operation do we apply? It surely ain’t GET or DELETE. PUT is kinda tempting – maybe I can PUT a representation that includes an attribute – true?
If someone is really interested, I can dedicate a whole post to why this isn’t a good idea but the short of it is that it is generally not a good idea to model semantics such as these with a side effect to some state. So for now, believe me that this PUT approach is not good. What about POST? Well, POSTing is usually reserved for adding entries to collection resources and the document I want to check out isn’t a collection. So do we need a new verb – CHECKOUT? Hmmm, what was it that Richardson and Ruby warned us to do when we were tempted to create new verbs? And what was that we were just saying about POST and collections? AH HA – that’s it!! That is where the collection of checked out documents comes in. To check out a document using the CMIS REST, AtomPub-based binding you issue a POST of the document to the checked out documents collection!! Now that is cool. Ah, but I also have to acknowledge that being posted to that new collection changes the state of the document. Gotta think on that a bit.

So as I reflect on what I’ve written so far I realize that I have once again gone quickly down into the technical weeds – sorry, can’t help myself, I find the weeds rather fun. But in the interest of closing out this post while it is still announcement day in some timezones I’ll save some of those technical details for future posts. My aims today really were first to celebrate the milestone that has been reached with CMIS and also to pique interest in the RESTful binding in particular. I’m very interested in feedback from members of the REST and Atom communities – oh, forgot to mention that, as said in the press release, the draft spec will go to OASIS for ratification – watch for announcements from OASIS on when the initial meeting will be. Until the OASIS forums are established I look forward to discussions on existing mailing lists and blogs.
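For the curious, the checkout-by-POST idea described above might look roughly like this from a client’s point of view. This is only a sketch under loud assumptions: the collection URL and document id are made up, the entry carries just a bare Atom id and title, and whatever CMIS-specific markup the draft requires to identify the document is deliberately not shown. The function only builds the request; it does not send anything.

```python
import urllib.request
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("atom", ATOM_NS)


def build_checkout_request(collection_url, document_id, title):
    """Build (but do not send) a POST of a document entry to a collection.

    `collection_url` stands in for the checked out documents collection;
    a real client would discover it from the service rather than hard-code it.
    """
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = document_id
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    body = ET.tostring(entry, encoding="utf-8")

    return urllib.request.Request(
        collection_url,
        data=body,
        method="POST",  # adding an entry to a collection, AtomPub style
        headers={"Content-Type": "application/atom+xml;type=entry"},
    )


# Hypothetical values, for illustration only.
request = build_checkout_request(
    "https://repository.example/checkedout",
    "urn:example:document:1234",
    "Quarterly report",
)
print(request.get_method(), request.full_url)
```

No new verb is needed: the only thing that distinguishes a checkout from any other entry creation is which collection receives the POST.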
(Note: for lack of an automated solution for keeping spam out of my comments I currently moderate all of them – so any comments posted here won’t show up until I approve them. I’m not on vacation so there shouldn’t be too much of a delay.)

To paraphrase John Newton, congratulations to EMC, IBM and Microsoft for putting aside their differences for the benefit of customers and the industry as a whole!