Cornelia's Weblog

my sporadically shared thoughts on, well, whatever is capturing my attention at the moment.

Posts Tagged ‘PUT’

More on PUT Idempotency

This has been a hot topic in my world this week. I have a couple of colleagues with whom I have been having very good discussions (in the context of two totally separate projects) and I wanted to capture one particularly interesting thread here.

One of those conversations was started with a reference to a really good blog post from Alex Scordellis addressing the question of how complete representations sent in a PUT need to be. What exactly does the client need to include in the representation that they PUT – all properties? what about hyperlinks? A simple answer might be that they need to provide “everything”, but this simple answer is just not satisfactory.

When a client provides a representation as part of a PUT (or a POST used for creation), the server may ignore some of that representation. The simplest case is where a client, intentionally or accidentally, provides a new value for a non-writable property, say a server-generated ID for the resource. Further, as Alex pointed out in his post, the client shouldn’t be responsible for determining the application flow, which, per HATEOAS, is provided via hyperlinks in the resource representation; even if the client provides such links, the server will likely ignore them. So, if the server will ignore some of these things anyway, why not allow the client to omit them? Makes sense to me. But how do we do this “right”?

Alex’s post summarized some conclusions that he and others reached during a healthy debate/discussion on the subject. I really like what he said there:

In response to GET requests, services serve complete representation of the current known state, including business data and available hypermedia controls. Clients PUT complete representations of the parts for which they are responsible.

He called this a “rule of thumb” and I think it is completely correct, even if it leaves plenty of room for misinterpretation. This was the essence of the conversation I had with a colleague this week.

First, there is something in this rule of thumb that is easy to miss but absolutely critical: the word “complete” in the second sentence. If the client were allowed, in a PUT, to provide only a subset of the parts for which they are responsible, then the problem reduces to the general problem with using PUT for partial updates. That type of PUT implementation is not idempotent.

But we have to go just a bit further. This only works if every client is responsible for the same parts. You cannot, for example, have one client responsible for only property A and another client responsible for only property B and still have an idempotent PUT (you’d have this example all over again).
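To make the “complete, and the same for every client” requirement concrete, here is a minimal sketch of a completeness check a server might run on each PUT. The property names `title` and `body` and the function `validate_put_payload` are made up for illustration; they do not come from Alex’s post or from any particular framework.

```python
# Hypothetical: the one set of properties that EVERY client is responsible for.
CLIENT_PROPERTIES = frozenset({"title", "body"})


def validate_put_payload(payload: dict) -> dict:
    """Return the client-responsible part of a PUT payload.

    Raises ValueError if any client-responsible property is missing,
    because accepting a subset would turn the PUT into a partial update.
    """
    missing = CLIENT_PROPERTIES - set(payload)
    if missing:
        raise ValueError(f"incomplete PUT, missing: {sorted(missing)}")
    # Anything the client is not responsible for (ids, links, ...) is dropped.
    return {name: payload[name] for name in CLIENT_PROPERTIES}
```

A payload that carries extra, server-controlled properties is accepted with those extras dropped, while a payload that omits `body` is rejected rather than merged into the existing state.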

There are other edge cases as well. What if the resource state were affected by another resource operation? We know that we can have two resources share parts (or all) of their state; for example, the current version of a document might share the same state as version 3 of a document, specifically, when version 3 is the current version. In this case the set of properties a client is responsible for on a current document cannot be different from the ones the client is responsible for when accessing a particular version of a document. And if you have resources that are what I like to call “composite resources” then you have to watch out for the same issues, but then it starts to get very complex.

There are some choices, however, that you can make to simplify things a bit. For example, you can choose to make the client responsible for all properties except write-once properties, hyperlinks, and server-computed properties (e.g. a hash value). In other words, as you are deciding what to make the client responsible for, KISS.
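As a rough sketch of that split, the handler below keeps write-once properties from the stored resource, replaces every client-owned property from the payload, and recomputes the hash and links on the server. The property names, the `/documents/{id}` URL shape, and the function `apply_put` are all assumptions made for this example.

```python
import hashlib
import json

# Hypothetical partition of a document resource's properties.
WRITE_ONCE = ("id", "created")    # set at creation, never changed by a PUT
CLIENT_OWNED = ("title", "body")  # every PUT must carry all of these


def apply_put(stored: dict, payload: dict) -> dict:
    """Build the new resource state from a PUT payload."""
    # Write-once properties always come from the stored resource,
    # so any value the client sends for them is ignored.
    new_state = {name: stored[name] for name in WRITE_ONCE}
    # Client-owned properties are replaced wholesale; a KeyError here
    # means the payload was incomplete.
    for name in CLIENT_OWNED:
        new_state[name] = payload[name]
    # Server-computed properties are derived from the new state only.
    content = json.dumps({n: new_state[n] for n in CLIENT_OWNED}, sort_keys=True)
    new_state["hash"] = hashlib.sha256(content.encode("utf-8")).hexdigest()
    new_state["links"] = {"self": f"/documents/{new_state['id']}"}
    return new_state
```

Because nothing in the result depends on the previous values of the client-owned properties, applying the same payload a second time produces the same state.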

I recognize, of course, that Alex’s post isn’t a specification, and I do not offer my remarks here as a criticism of it. Rather, because I’ve already seen that post interpreted in various ways in practice, I offer this only as an elaboration. The key is that whatever you do with PUT, make sure that it is idempotent. If you get very clever in how you define your service, you will have to be equally careful to make sure you respect that constraint.

Why PUT can’t be used for partial updates

I’ve been discussing PUT vs. PATCH with some colleagues and finally took the time to come up with a concrete example of why PUT absolutely should not be used for partial updates of resources. One colleague pointed out the following points made by Roy Fielding on the AtomPub listserv:

FWIW, PUT does not mean store. I must have repeated that a million times in webdav and related lists. HTTP defines the intended semantics of the communication — the expectations of each party. The protocol does not define how either side fulfills those expectations, and it makes damn sure it doesn’t prevent a server from having absolute authority over its own resources. Also, resources are known to change over time, so if a server accepts an invalid Atom entry via PUT one second and then immediately thereafter decides to change it to a valid entry for later GETs, life is grand.

Roy’s right, of course, but there is a subtlety that is easily overlooked. Sure, the server has the authority to “fix” the resource representation; for example, it might modify part of the representation passed in (e.g. if the client tries to modify the server-controlled id property, the server can ignore that). But there is an important constraint that the server has to follow: PUT needs to be idempotent! This means the server can’t use the current state of the resource to “fix” the resource representation passed in. Have a look at the following simple example:

Let’s say that I have a resource that consists of a couple of properties, A and B, and I want to use PUT to support partial updates of this resource. The payload of the PUT will carry the properties that we wish to update; if a resource property is not present in the payload then that property will remain unchanged in the resource. We’re playing the “the server has control over what they do with the resource property” card. Check out the following sequence of operations then:

Ouch! The PUT is not idempotent! Client 1 does the same PUT twice, leaving the resource in two different states – one of which could very well be inconsistent. PUT semantics, when correctly followed, give the client the ability to control the consistency of the resource they are writing.
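The kind of sequence described here can be replayed in a few lines. This is only a sketch: the starting values, the payloads, and the two helper functions are assumptions chosen to reproduce the effect, with Client 2 updating B between Client 1’s two identical requests.

```python
def partial_put(resource: dict, payload: dict) -> dict:
    """PUT misused for partial updates: absent properties stay unchanged."""
    return {**resource, **payload}


def full_put(resource: dict, payload: dict) -> dict:
    """PUT with replacement semantics: the payload becomes the new state."""
    return dict(payload)


# Partial-update PUT: Client 1 sends the same request twice.
resource = {"A": 1, "B": 1}
resource = partial_put(resource, {"A": 2})  # Client 1
after_first = resource
resource = partial_put(resource, {"B": 3})  # Client 2
resource = partial_put(resource, {"A": 2})  # Client 1 retries
after_retry = resource
print(after_first)  # {'A': 2, 'B': 1}
print(after_retry)  # {'A': 2, 'B': 3}

# Full-replacement PUT: the same request always leaves the same state.
resource = {"A": 1, "B": 1}
full_first = full_put(resource, {"A": 2, "B": 1})           # Client 1
resource = full_put(full_first, {"A": 2, "B": 3})           # Client 2
full_retry = full_put(resource, {"A": 2, "B": 1})           # Client 1 retries
print(full_first == full_retry)  # True
```

With the merge-style handler, Client 1’s retry leaves a state that Client 1 never asked for; with replacement semantics, the resource ends up exactly as Client 1’s representation describes, both times.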

Hence RFC 5789 – PATCH Method for HTTP. PATCH does not have the constraint of being idempotent so the client cannot simply retry requests for which they never received a response. I do think it is interesting that RFC 5789 includes the following:

A PATCH request can be issued in such a way as to be idempotent, which also helps prevent bad outcomes from collisions between two PATCH requests on the same resource in a similar time frame. Collisions from multiple PATCH requests may be more dangerous than PUT collisions because some patch formats need to operate from a known base-point or else they will corrupt the resource. Clients using this kind of patch application SHOULD use a conditional request such that the request will fail if the resource has been updated since the client last accessed the resource. For example, the client can use a strong ETag [RFC2616] in an If-Match header on the PATCH request.

That is, you could make PATCH idempotent if you use the likes of ETags and conditional requests. The same could be said for a PUT that does partial updates, but the operative word here is “could”: PUT is required to be idempotent even without the use of things such as ETags.
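To illustrate the conditional-request idea, here is a minimal server-side sketch. The functions `etag_for` and `conditional_patch` are hypothetical, the ETag is simply a hash of the serialized state, and the patch format is a plain merge; a real service would pick its own ETag scheme and patch format.

```python
import hashlib
import json


def etag_for(resource: dict) -> str:
    """Derive a strong ETag from the full resource state (an assumed scheme)."""
    body = json.dumps(resource, sort_keys=True).encode("utf-8")
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def conditional_patch(resource: dict, patch: dict, if_match: str):
    """Apply a merge-style patch only if If-Match names the current state.

    Returns (status, state): 412 Precondition Failed leaves the state as is.
    """
    if if_match != etag_for(resource):
        return 412, resource
    return 200, {**resource, **patch}


resource = {"A": 1, "B": 1}
tag = etag_for(resource)

# The first attempt matches the current state and is applied.
status_first, resource = conditional_patch(resource, {"A": 2}, tag)
# A blind retry with the same ETag no longer matches and changes nothing.
status_retry, resource = conditional_patch(resource, {"A": 2}, tag)
print(status_first, status_retry, resource)  # 200 412 {'A': 2, 'B': 1}
```

The retry is rejected instead of being applied against a base the client has not seen, which is the protection the RFC passage describes.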

All of that said, while neither RFC 2616 nor RFC 5789 requires the use of ETag-like mechanisms, in most cases it’s a good idea.