Cornelia's Weblog

my sporadically shared thoughts on, well, whatever is capturing my attention at the moment.

Cleaning WordPress Malware

I was hacked. I cleaned it. I was hacked. I cleaned it. I was hacked…

All told, I think it was 4 times before I stopped the attacks. I don’t like to mess with IT-type issues, least of all security issues, so when something like this comes up, I don’t know what I don’t know, and it takes me some time to learn enough to deal with the issue completely. Before I describe what I did to finally fix things once and for all (knock on wood), let me describe my naivety, because I suspect I am not the only one who suffers from it.

First, I never worried too much about security. With three people reading my blog on a semi-regular basis (hi again Dad), I figured “who would want to hack my site?” Bots don’t care about readership. Second, I figured my password was secure; I had a pretty clever password, wasn’t even in English (and see aforementioned comment on readership). Bots are multi-lingual. When I cleaned the site the first time, I figured I’d be safe for a while – how concerned would the hacker be to reinfect my little old site again so quickly? Bots work 24×7. And there are lots of them. No, I didn’t really believe any of these false assumptions, you think about the issue for, oh, maybe a microsecond and it’s clear exactly how naive these assumptions are. So beware – if you have a site on the World Wide Web, you are a target for hackers.

The First Sign of Trouble

I first became aware of the hack from my Dad (see aforementioned comment on readership) when he said that Google had reported to him that my site was compromised. I now believe that my site had been compromised for far longer than I knew – it was only when Google checked it out and started reporting its findings that I was alerted. I’m not sure exactly when Google launched this service but, based on my experience, I am betting it was within the last 6 months or so. After I cleaned my site the first time I used Google Webmaster Tools to request a re-evaluation of my site (you must have a Google account). I got a clean bill of health but alas, a couple of weeks later I was blacklisted again; sure enough, the infection was back. I’ll say more about ALL of the steps I took to finally rid the site of the infection “for good,” but first, kudos to Google – nice service, those Webmaster Tools; easy to use, seemingly complete. Good stuff.

How I fixed it

Fortunately, in the end, the cleaning procedure is very straight-forward (disclaimer: This statement applies to the type of attack that I specifically had. I have no doubt that there are other hacks that are MUCH more difficult to clean up.) You don’t have to download any special software (you will need an ftp client, but you should have that anyway), or generate any long log files that are then posted to some forum where a wizard will then give you five other free software packages to download and run (those things always scare me away). You need to be able to edit files (really simple edits), change some file permissions, and set your password for the site. Oh, and one other comment: security by obscurity is never a good idea (see aforementioned comments on bots) – sites that suggest you remove all mention of “WordPress” from your pages are being naive, even after thinking about it for far longer than a microsecond.
Okay, so on to the steps:

  1. Edit the files. The type of infection I had was really simple – some of my .html files had a line or more of the following form inserted into them:
    <script src="http://some.domain.com/somefilename.js"></script>
    I was lucky that in most cases the lines were at the top of the file, though in one instance the hack was a bit more clever and inserted the line closer to the middle of the file. BTW, there are lots and lots of examples on the web about more clever insertions that are more obfuscated, but if you are editing the files by hand, even those should be relatively easy to spot. So, I edited each of the infected files (see below on the list of infected files I had) and deleted these lines.
  2. Set file permissions. In a WordPress site, the .html and .php files typically have permissions that allow the owner and group read/write access and the world to have read only access. As a result of the hack, the file permissions for the compromised files allowed the world write access. My web site hosting company doesn’t provide any interface that allows me to view or change file permissions so I used FileZilla. You’ll need to set the file permissions numerically: 644 gives the owner read/write access and everyone else read-only access (use 664 if the group also needs write access, as in the typical setup described above).
  3. Make sure you get all infected files. I had files infected both in my WordPress and in my phpMyAdmin directories. The full list of files I cleaned (from the root of my directory structure) is:
    • index.html
    • /blog/index.php
    • /blog/wp-admin/index.php
    • /blog/wp-admin/network/index.php
    • /blog/wp-admin/user/index.php
    • /blog/wp-content/index.php
    • /blog/wp-content/plugins/index.php
    • /blog/wp-content/themes/index.php
    • /blog/wp-content/themes/classic/index.php
    • /blog/wp-content/themes/constructor/index.php
    • /blog/wp-content/themes/constructor/home.php
    • /blog/wp-content/themes/default/index.php
    • /blog/wp-content/themes/pool/index.php
    • /phpMyAdmin/index.php
    • /phpMyAdmin/main.php

    Of course, your mileage may vary, particularly in the themes area – check all of the PHP files in the themes you have installed for files that have permissions other than -rw-rw-r--.

  4. Change your password. I confess (in the hopes that my ineptness will benefit someone else), the first few times I cleaned my site I didn’t do this. I believe that changing my password after the cleanup is what finally thwarted further attacks. Change your password, and change it to something that can’t be found in a dictionary somewhere.
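
If you would rather not eyeball every file, steps 1 through 3 can be roughed out in a few lines of Python. This is only a sketch under some loud assumptions: the regular expression matches nothing but the simple one-line script tag shown in step 1 (obfuscated insertions will slip past it), the function names are made up for illustration, and it only reports what it finds rather than changing anything.

```python
import os
import re
import stat

# Assumption: the injected line is a whole line holding only an external
# <script src="http...js"></script> tag, like the one shown in step 1.
INJECTED = re.compile(
    r'^\s*<script\s+src="https?://[^"]+\.js"\s*>\s*</script>\s*$',
    re.IGNORECASE,
)


def split_injected(text):
    """Return (clean_text, suspect_lines) for the contents of one file."""
    keep, suspect = [], []
    for line in text.splitlines(keepends=True):
        (suspect if INJECTED.match(line) else keep).append(line)
    return "".join(keep), suspect


def is_world_writable(mode):
    """True if the permission bits give 'other' users write access."""
    return bool(mode & stat.S_IWOTH)


def report(root):
    """List .html/.php files under root that look infected or too permissive."""
    findings = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith((".html", ".php")):
                continue
            path = os.path.join(dirpath, name)
            mode = stat.S_IMODE(os.stat(path).st_mode)
            with open(path, encoding="utf-8", errors="replace") as fh:
                _clean, suspect = split_injected(fh.read())
            if suspect or is_world_writable(mode):
                findings.append((path, oct(mode), len(suspect)))
    return findings
```

Run `report()` against a local copy of the site, look over anything it lists by hand, and then fix the permissions (for example with `os.chmod(path, 0o644)`) once you are happy with the contents. A theme can legitimately load an external script, so treat a hit as a prompt to look, not as proof.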

Balisage 2011

I made it to the Balisage conference this year. I’d been eyeing the conference for a couple of years, so I submitted a paper proposal that was accepted and spent the first week of August in Montreal. I confess, aside from a couple of morning runs in the Mount Royal Park, (which is just fantastic) I saw very little of Montreal, but what I did see this week was quite spectacular – the conference itself. Looking at the conference proceedings doesn’t give even a hint as to the uniqueness of this event, the pictures that Tommie Usdin, the conference chair, posted do show a bit of the peculiar (peculiar in a good way) side, but neither of those things bring it all together. This post is my feeble attempt at capturing some of the magic.

The conference is small – about 100 attendees this year – but with luminaries in the XML world like Michael Kay and Norm Walsh in attendance (and Jon Bosak on the advisory board) it’s clearly about quality over quantity. The very eclectic group of participants includes philosophers, computer scientists and librarians, opensource practitioners, folks that produce commercial products, consultants and standards body representatives. The talks were equally diverse ranging from theoretical (TagAl: A tag algebra for document markup), through very practical (Including XSLT stylesheets testing in continuous integration process) to even a game-playing session (Balisage Bluff – an Entertainment). It’s clear that many of the folks involved have known each other and worked together for a great many years, but even as a newbie, by the end of the week I felt a part of the group.

And here’s the punch line (I’ve never been good at holding it to the end) – I really get the sense that this small, quirky conference has significant influence on the shape of the XML landscape. There were several talks that presented work coming out of discussions that happened at last year’s Balisage. For example, Michael Kay did an impromptu session one evening showing us Saxon running in the browser, work that he says came as a direct result of Vojtech Toman’s paper last year on XProc running in the browser. As you might know, browser vendors have not moved to supporting XSLT 2.0 in the browser, severely limiting what developers can do there (XSLT 1.0 is, well, just that, a version 1 spec – v2 is so much more powerful). Saxon-CE (client edition) gets us out of the proverbial chicken-and-egg situation, giving developers a chance to demonstrate the value and perhaps it will lead to native browser support in the future. New stuff this year included a proposal for blending parts of XSLT and XQuery, a language and processing model for sequences of XML items and tag libraries for XSLT and XQuery – I really liked this paper as it provides a means for implementing custom “HTML tags” using XML-based languages instead of Java. And, because not every attempt is successful, there were new looks at old things like Eric van der Vlist’s paper on multi-ended hrefs, something with potential reach well beyond XML. There were two security-related papers (and I actually got them), one showing the risks of XQuery injection and the other showing promise in allowing encrypted XML to still be processed in certain ways – it was really cool! There were SO many ideas flowing, in a very open and collaborative style, that I think everyone left with renewed enthusiasm and a very full queue of things to explore moving forward.

I’ve been to lots of conferences: marketing conferences, developer conferences and, earlier this year, the World Wide Web conference, which is quite academic/research focused (and very large). And while I was just about to say I’ve never seen anything quite like Balisage, come to think of it, the Linked Data on the Web workshop that was co-located with WWW 2011 had a bit of the feel that Balisage has – quirky group of participants, luminaries in the field, work presented that was influenced by previous workshops, lots of excitement. At the time of that workshop I suppose it didn’t make as much of an impression on me because I was an observer in that space rather than a participant. The WS-REST workshop, in which I was a participant, didn’t quite have that same magic, perhaps because it is relatively young. Hmm… have to think about that.

In any case, it was just a fantastic week. I have a ton of stuff I want to work on as a result, and I plan to be back next year to share and see what else has cooked in the meantime.

Paul Maritz’s Keynote At EMC World 2011

Of all of the keynotes and big stage presentations from EMC World 2011, my absolute favorite was the Tuesday morning session from VMWare CEO, Paul Maritz. I tried to tweet a few things but in the end decided that too much would be lost in 140 character snippets.

I remember seeing a VMWare presentation about three years ago, maybe four, probably at an EMC World but I don’t remember exactly. After the presentation, in which they presented a pretty weak view of the future of VMWare, I figured they had peaked. Microsoft was making a hypervisor now and it sure looked like VMWare’s bread and butter was being commoditized. It’s clear now that I couldn’t have been more wrong. VMWare is still on the top. They have great products, they have made some fantastic acquisitions (Spring Source!) and have a very strong vision for the future.

The overall theme of the session was about the transformations we are seeing across the entire information technology landscape. Paul talked about transformations happening at the infrastructure level, at the application level and at the user interface level. My aim with this post isn’t to summarize all that he presented, I couldn’t possibly hope to do justice to Paul’s outstanding talk; rather, I want to share with you a couple of insights that really caught my attention. Sound bites if you will.

The first is the 1,000,000 foot view of the VMWare future: the hypervisor is, in fact, commoditized, and that is perfectly okay with VMWare. They give the hypervisor away for free (something that delights the likes of me when I want to use it for personal use) – so you see, it’s not their bread and butter anymore. What makes VMWare a viable business are the services that are put around virtualized resources. It’s the ability for virtualization to solve the restart problem for long running computation-intensive loads. It’s the ability to move around workloads, even while they are in-flight, that allows server consolidation. It’s the ability to provide fault tolerance with a checkbox. It’s all in the services baby!

I’m not usually a process person, if you know me, you know I am a self-proclaimed “propeller-head,” but something Paul talked about resonated with me. He said that virtualization allows us to measure IT resources in a way that we couldn’t do before. Yeah, cloud-motherhood and apple pie. But the spin he put on it was that this measurability isn’t only used to assess the effectiveness of your IT department but, is also used to hold the IT customer accountable for what they are asking for. Have any of you ever asked IT for something and then never used it? Guilty. If we have to pay for it (which we can only do if it is measurable), then we will surely be more careful with our request, ultimately making the company as a whole function more effectively. Okay, so enough of me dabbling in business concerns. ;-)

Perhaps my favorite sound bite from the session was when Paul moved from talking about infrastructure to talking about applications. He was lamenting the design of modern programming languages, complaining, at first, that languages like Ruby are extraordinarily hard to compile and optimize. But then he acknowledged that with the level of power in the machinery we have today, the time has come for the machines to do the heavy lifting, rather than the developer. Maybe, with these new programming languages we allow the developer to be more productive, or perhaps we open up programming to a broader cross section of the population with a more accessible programming experience. Instead of giving developers a toolset so they can build things designed to execute efficiently on the computer, we give them a toolset designed to make them more efficient and let the computer do the hard work. Yeah! That time has come.

Finally, the last tidbit I want to share starts with Paul recalling that beginning in the 1970s with Xerox Parc, and continuing into the 80s and 90s at places like Microsoft, a fundamental focus was on automating the physical desktop. When they looked at the physical desktop it had things on it like documents, file folders and inboxes and this is why you see these abstractions in computer systems today; heck, the name of my former company, Documentum, makes that very clear. The insight Paul shared is that the tasks that individuals are doing today are decreasingly document-based and are increasingly stream-based. We take in streams, modify them, combine them with other streams and generate new streams. While that insight isn’t earth shattering (though it is, perhaps, a bit subtle), when I think about what we as an industry need to do to adapt to that transformation, well, it’s daunting. When I think about programming models, for example, document abstractions are deeply ingrained (XML D(ocument)OM, HTML DOM, etc.). I remember a time when explaining the folder metaphor to a non-computer user was hard. Decades of training and use have cemented those notions not only into the toolsets but also into our subconscious and now we need to change that. Bring it – I’m up for it!!

More on PUT Idempotency

This has been a hot topic in my world this week. I have a couple of colleagues with whom I have been having very good discussions (in the context of two totally separate projects) and I wanted to capture one particularly interesting thread here.

One of those conversations was started with a reference to a really good blog post from Alex Scordellis addressing the question of how complete representations sent in a PUT need to be. What exactly does the client need to include in the representation that they PUT – all properties? what about hyperlinks? A simple answer might be that they need to provide “everything”, but this simple answer is just not satisfactory.

When a client provides a representation as a part of a PUT (or a POST used for creation), the server may ignore some of that representation. The simplest case is where a client, intentionally or accidentally, provides a new value for a non-writable property – say something like a server generated ID for the resource. Further, as Alex pointed out in his post, the client shouldn’t be responsible for determining the application flow, which per HATEOAS, is provided via hyperlinks in the resource representation; even if the client provides them, the server will likely ignore such links. So, if the server will ignore some of these things anyway, why not allow the client to not provide certain pieces? Makes sense to me. But how do we do this “right”?

Alex’s post summarized some conclusions that he and others reached during a healthy debate/discussion on the subject. I really like what he said there:

In response to GET requests, services serve complete representation of the current known state, including business data and available hypermedia controls. Clients PUT complete representations of the parts for which they are responsible.

He called this a “rule of thumb” and I think it is completely correct, even if it leaves plenty of room for misinterpretation. This was the essence of the conversation I had with a colleague this week.

First, there is something in this rule of thumb that is easy to miss but absolutely critical; the word “complete” in the second sentence. If the client were allowed, in a PUT, to provide only a subset of the parts for which they are responsible then the problem reduces back to the general problem with using PUT for partial updates. That type of a PUT implementation is not idempotent.

But we have to go just a bit further. This only works if every client is responsible for the same parts. You cannot, for example, have one client responsible for only property A and another client responsible for only property B and still have an idempotent PUT (you’d have this example all over again).
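
Here is a small Python sketch of the property I am leaning on. It is a toy model, not a real service: the two functions and the property values are hypothetical, and a "resource" is just a dictionary. With full replacement, the state after the PUT is fixed by the payload alone; with the merge variant, it depends on what some other client did in between.

```python
def put_replace(current, payload):
    """Full replacement: the new state is the payload, whatever came before."""
    return dict(payload)


def put_merge(current, payload):
    """Partial update: properties missing from the payload keep their old values."""
    merged = dict(current)
    merged.update(payload)
    return merged


complete = {"A": "a1", "B": "b1"}  # a complete representation
partial = {"A": "a1"}              # only the part one client "owns"

state_x = {"A": "a0", "B": "b0"}
state_y = {"A": "a0", "B": "b9"}   # another client changed B in between

# Same request, same resulting state, no matter what happened before it.
same_after_replace = put_replace(state_x, complete) == put_replace(state_y, complete)

# Same request, different resulting state: a retry can land somewhere else.
same_after_merge = put_merge(state_x, partial) == put_merge(state_y, partial)

print(same_after_replace, same_after_merge)
```

The second comparison is the one that bites when a client retries a request after a timeout: it can no longer say what state the resource is in just by looking at what it sent.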

There are other edge cases as well. What if the resource state were affected by another resource operation? We know that we can have two resources share parts (or all) of their state; for example, the current version of a document might share the same state as version 3 of a document, specifically, when version 3 is the current version. In this case the set of properties a client is responsible for on a current document cannot be different from the ones the client is responsible for when accessing a particular version of a document. And if you have resources that are what I like to call “composite resources” then you have to watch out for the same issues, but then it starts to get very complex.

There are some choices, however, you can make to simplify things a bit. For example, you can choose to make the client responsible for all properties except write-once properties, hyperlinks, and server computed properties (e.g. a hash value). In other words, as you are deciding what to make the client responsible for, KISS.
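
To make that choice a little more concrete, here is a rough Python sketch of a PUT handler along those lines. Everything named here is an assumption for illustration: the property names `id`, `links` and `hash`, the `/docs/` URI and the `apply_put` function are made up, and the resource is a plain dictionary.

```python
import hashlib
import json

# Hypothetical split: everything not listed here is the client's responsibility.
SERVER_OWNED = {"id", "links", "hash"}


def apply_put(stored, payload):
    """Replace the client-owned part completely; rebuild the server-owned part."""
    # Anything the client sends for server-owned properties is ignored.
    client_part = {k: v for k, v in payload.items() if k not in SERVER_OWNED}

    new_state = dict(client_part)
    new_state["id"] = stored["id"]  # write-once, never taken from the payload
    new_state["hash"] = hashlib.sha256(
        json.dumps(client_part, sort_keys=True).encode("utf-8")
    ).hexdigest()  # computed from the client-owned part only
    new_state["links"] = {"self": "/docs/" + str(stored["id"])}
    return new_state


stored = {"id": 7, "title": "old", "body": "x"}
payload = {"id": 999, "title": "new", "body": "y", "links": {"self": "/elsewhere"}}

once = apply_put(stored, payload)
twice = apply_put(once, payload)
print(once == twice)
```

Nothing from the previous client-owned state survives the PUT, and the server-owned properties are derived only from the write-once id and the payload, so sending the same request again changes nothing.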

I recognize, of course, that Alex’s post isn’t a specification and hence I do not offer my remarks here as a criticism of his post, rather, because I’ve already seen various ways of interpreting that post in action, I offer this only as an elaboration. The key is that whatever you do with PUT, make sure that it is idempotent. If you get very clever on how you define your service you will have to be equally careful to make sure you respect that constraint.

IHE Connectathon – An Outstanding Event

I’m sitting at O’Hare waiting for my flight home after a wonderful visit with some friends from my graduate school days at IU that was preceded by a week of work at the IHE North American Connectathon. As I start writing, it’s in the fourth quarter of the NFC championship game, right here in Chicago, and the mood isn’t that great. Go Bears?

I’d be lying if I said that I had been looking forward to this trip. We in the CTO Office had built an XDS Registry and Repository solution to bring to the Connectathon, and while I was very happy with the solution architecture (I’ll write a separate post on this shortly) and we learned a great deal constructing it, I wasn’t expecting the actual Chicago event to be that interesting or fun. I couldn’t have been more wrong.

The event is held in the lowest level of the Hyatt Regency Hotel that sits right on the lake and river in downtown Chicago. There is nothing fancy about the Connectathon layout, in fact, if you follow the link above to the Connectathon site you’ll see it exactly. There are three very long stretches of tables that seat perhaps 75 people on a side. Most companies participating in the event bring 1-2 solutions (EMC brought 2) with about 3 people supporting each solution. A solution has an assigned location along these tables onto which they set up their system. While we brought (shipped) a couple of racks, one primary and one redundant, and some vendors brought actual medical equipment, most just run their solutions on one or more laptops. Setup begins on Sunday, which is a time you can ensure that your system still checks out okay, can properly connect to the network, etc. and Monday at 10:30 the games begin.

The aim of the Connectathon is very simply to demonstrate the use of the IHE Technical Framework (TF) for IT Infrastructure in facilitating the connectivity between a variety of vendors in order to support operations in a healthcare setting. The IHE TF for IT is literally hundreds of pages spread over a half dozen volumes plus another dozen addendums. They are almost exclusively specified via the application of a multitude of standards such as SOAP, ebXML and HL7, plus significant details specific to the healthcare market such as the definition of a domain for medical records, documents, patients, etc. I worked a bit with these standards more than a year ago, but then got very familiar with them over the last several months as we built out our registry and repository solution. After this development period and some pre-Connectathon testing I was already fairly impressed with the standards; they are certainly thorough, although also extraordinarily complex in places.

What happens when the “games begin” is that each vendor is assigned partners to whom they are to connect within certain scheduled testing periods, and a list of specific tests that they must demonstrate and pass with that partnership configuration. For some tests the IHE provides a tool that dispatches those tests and for all tests the IHE provides a system into which test results are recorded by each of the vendors. As results are recorded they are graded by a team of a couple of dozen monitors or proctors who are also there to answer questions and help interpret the specs. As a registry and repository vendor we had partnerships that tied our registry with third party repositories, consumers (of data) and sources (of data), and similarly tied our repository with third party registries, consumers and sources. We didn’t know ahead of time who our partners would be. The services of a system are all offered (as specified by the IHE) as (I think exclusively HTTP) endpoints so when you connect to a partner you simply configure in their URIs and you should be good to go.

And here’s the thing – IT WORKS! This is my first reflection on the whole Connectathon event. With hundreds of pages of specifications, and probably thousands of developers interpreting them, for the most part, the systems could then interoperate. We had sources submitting documents and medical records, other sources applying lifecycle events such as versioning of existing documents, consumers querying for content, and other consumers retrieving DICOM images, etc. It was really, really cool. No one-off solutions, no custom code to work with vendor A vs. vendor B. It actually all works the way it is supposed to.

My second reflection on the event is that it is a valuable event for the IHE organization because it does expose areas of vagueness in the specifications when one vendor implements things one way and another does things another way. In some cases the verbiage in the spec leaves too much room for interpretation.

And third, what an extraordinary opportunity for us vendors. The five days we spend in that room connecting our systems to other vendors tests our solutions in a way that dozens or even hundreds of days of testing within in-house labs could never do.

In retrospect, I was totally blown away.

Finally, I’d be remiss if I didn’t mention how collaborative the spirit was in the room. Of the 101 vendors at the event, more than 70 of them had essentially competing registry and repository solutions, yet we really stretched to help each other as much as possible, finding and sometimes diagnosing bugs with those “competitors” and burning the midnight oil to resolve something. I’m sure this spirit is largely due to the fact that the vast majority of the folks in the room are technologists, and well, geeks have in common an enthusiasm for technology and a desire to make things work. The days were long, but very fun and rewarding.

Oh, and I’m pleased to report that the EMC solutions both passed all tests receiving a nod of approval from the IHE. This is our ticket into HIMSS so come see us there if you are interested.

Every business trip I go on I ask myself at the conclusion whether it was worth it. This time the answer is a resounding “YES!”

Oh, and GO Jets!! (hi Boss. :-) )

XSLT and the default namespace

A colleague of mine who is relatively new to XSLT was getting bitten by an issue with a “hello world” stylesheet where the elements in the source document were all in a namespace and he wasn’t addressing or even including the namespace in the declarations of the XSLT. I found a great write-up that describes the issue at hand, but there is one point that Steve makes in that explanation which is rather subtle. When he says:

…because in XPath 1.0, an unqualified element in a match pattern only matches elements with no namespace URI at all.

All the info is there in that sentence, but it is, especially for someone fairly new to XML namespaces and XSLT, a pretty dense sentence. So, to be more explicit, when you include the namespace declaration in the XSLT you MUST use a namespace prefix. So, referring back to Steve’s post, the following would NOT work.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns="uri:something">
      ...
     <xsl:template match="Foo/Bar[@baz='bop']">

The stylesheet is totally valid, but the XPath in the “match” attribute says to look for a “Foo” element with no namespace and a “Bar” element with no namespace.
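
You can see the same behavior outside of XSLT. The following is a small sketch using Python's standard `xml.etree.ElementTree`, which supports only a limited XPath subset and is not an XSLT processor; the sample document and the prefix `s` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A source document whose elements all live in the default namespace.
doc = ET.fromstring(
    '<Foo xmlns="uri:something"><Bar baz="bop">hello</Bar></Foo>'
)

# Unqualified name: this asks for a "Bar" in no namespace, so nothing matches.
unqualified = doc.findall("Bar[@baz='bop']")

# Bind a prefix to the namespace URI and use it in the path.
# The prefix itself is arbitrary; only the URI has to agree with the document.
ns = {"s": "uri:something"}
qualified = doc.findall("s:Bar[@baz='bop']", ns)

print(len(unqualified), len(qualified))
```

The fix in the stylesheet is the same idea: bind a prefix to uri:something on the stylesheet element and write the match pattern with that prefix on both element names.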

Make sense?

Jax-RS : Regular Expressions in @Path Annotations

Sometimes I really wish I did something else for a living. Okay, perhaps a bit over-dramatic but I’m feeling a bit tired this evening and wasting even 20 minutes on something totally useless (which, let’s face it, isn’t unusual in computing) has me ready to call it a night. But since googling the error codes I was getting netted nothing (other than this 2 year old thread), perhaps I can be of service to the next person bitten by this.

I’m writing a RESTful web service, using Jax-RS with a CXF runtime (v 2.2.10). I’m defining a resource with a URI something like /foo/bar/… – that is, the first part of my URI will have the literals “foo” and “bar” and then I want everything else on the URI to go into a parameter. So I want /foo/bar/a, /foo/bar/a/b and /foo/bar/a/b/c to all resolve to the same resource/method with a path parameter bound to “a”, “a/b” and “a/b/c”, respectively. So I create the following @Path annotation:

@Path("foo/bar/{therest : .*}")

My initial source for this tidbit was the Safari Books Online copy of Restful Java with Jax-RS, page 47 which showed an example just like this.

When I tried to launch my web service, however, I got an error that indicated something was wrong with the regular expression.

org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'Prototype': Error setting property values;
nested exception is org.springframework.beans.PropertyBatchUpdateException;
nested PropertyAccessExceptions (1) are:PropertyAccessException 1: org.springframework.beans.MethodInvocationException: Property 'serviceBeans' threw exception;
nested exception is java.util.regex.PatternSyntaxException: Illegal repetition near index 27
/foo/bar/{therest : .*}(/.*)?...

The answer was really simple, get rid of the spaces around the ‘:’. Argh! Since when do languages care that much about whitespace?! And Argh again! How can a published book have this error?! Or perhaps the bug is in CXF? I’ll look into that when I have a chance.
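
For what it's worth, here is a rough Python sketch of what I want that annotation to do. It is not JAX-RS or CXF code and says nothing about how CXF builds its patterns internally; the function name `template_to_regex` is made up, the `[^/]+` fallback for parameters without an explicit expression is an assumption, and tolerating whitespace around the colon is a choice made in this sketch.

```python
import re

# Matches "{name}" or "{name : regex}", with optional whitespace around the colon.
PARAM = re.compile(r"\{\s*(\w+)\s*(?::\s*([^{}]+?))?\s*\}")


def template_to_regex(template):
    """Turn a path template like 'foo/bar/{therest:.*}' into a compiled pattern."""
    parts = []
    pos = 0
    for m in PARAM.finditer(template):
        parts.append(re.escape(template[pos:m.start()]))  # literal text
        name = m.group(1)
        expr = m.group(2) or "[^/]+"  # assumed default: a single path segment
        parts.append("(?P<%s>%s)" % (name, expr))
        pos = m.end()
    parts.append(re.escape(template[pos:]))
    return re.compile("^/?" + "".join(parts) + "$")


pattern = template_to_regex("foo/bar/{therest : .*}")
bound = [
    pattern.match(path).group("therest")
    for path in ("/foo/bar/a", "/foo/bar/a/b", "/foo/bar/a/b/c")
]
print(bound)
```

All three URIs hit the same pattern and the named group picks up everything after /foo/bar/, which is the binding described above.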

Team!

I didn’t do sports growing up. I’m not sure exactly how or when I learned the meaning of the term “team”, though at some point I must have, because I now have a tremendous appreciation for the team that I am a part of – the Architecture Group in the EMC CTO office. Not sports, but a fabulous experience nonetheless. But I mention this only to establish that I do understand the concept.

This post is about the team that is the Dos Pueblos Chargers Cross Country team. Our son is a sophomore this year, but it is only his first year on the CC team – last fall he played freshman football, but then some time during the track and field season in the spring he decided to run year round. So we are learning with him, and while today was the first meet of the season, even before today we were getting a strong sense of the nuance of the sport. Sure, for the most part, each athlete is running independently, but, take for example this meet, there were only two individual awards, the fastest boy and the fastest girl, and there were 19 team awards. Most of the team awards are scored by taking the sum of the top finishers, usually the top five, either by time or by place, and the lowest score wins. Every single place is meaningful, every second counts.

So, this meet. In this meet the races ran by age – there were separate races for freshman, sophomore, junior and senior levels. My son’s team took 21 boys to this meet where more than 70 teams competed. The seniors ran first, then the freshmen, then juniors and finally, sophomores – the races were about an hour apart. During the last team meeting last night, after Coach had explained that each of the non-senior groups could sleep in later, one of the sophomores, that is one of the boys who wouldn’t run until after 11am and could hence sleep 3 hours later than the seniors, spoke up and said that he thought everyone should be there for the seniors’ race. Everyone agreed.

So, today. Everyone is there for the start of the seniors’ race, a race that set the tone. Our top guy finished first, setting a new course record; and every other one of our seniors set a PR, finishing 4 in the top 25 (of roughly 200). Hovering around the team tent immediately thereafter, virtually all that I heard was the seniors describing the conditions of the course in detail, for the benefit of the teammates that would follow them. “Hey Wes, listen up, when you go around that last turn over by the beach, be really careful because the footing is really loose.” It goes without saying that every athlete was there for every race in which their teammates ran.

So, the last race. By this time it was fairly late in the day (ok, fairly late in the morning, but it already felt like a whole day). We have a really strong sophomore team, led by a wicked fast guy who, from our perspective, was heavily favored to win the race. And sure enough, at the 2 mile mark he had almost a 10 second lead over #2. He was looking like a sure win. What he said when he passed us at that two mile mark, with one mile to go, brought tears to my eyes. With a sparkle in his eyes he said, “we are 1, 2, 3!” He was as excited about his teammates’ positions as he was about his own. Whoa.

My son knows at 15 what I didn’t get until I was in my 40s.

Chargers at St. John's Bosco Cross Country meet - Sept 11, 2010

Why PUT can’t be used for partial updates

I’ve been discussing PUT vs. PATCH with some colleagues and finally took the time to come up with a concrete example of why PUT absolutely should not be used for partial updates of resources. One colleague pointed out the following points made by Roy Fielding on the AtomPub listserv:

FWIW, PUT does not mean store. I must have repeated that a million times in webdav and related lists. HTTP defines the intended semantics of the communication — the expectations of each party. The protocol does not define how either side fulfills those expectations, and it makes damn sure it doesn’t prevent a server from having absolute authority over its own resources. Also, resources are known to change over time, so if a server accepts an invalid Atom entry via PUT one second and then immediately thereafter decides to change it to a valid entry for later GETs, life is grand.

Roy’s right, of course, but there is a subtlety that is easily overlooked. Sure, the server has the authority to “fix” the resource representation; for example, it might modify part of the representation passed in (e.g. if the client tries to modify the server-controlled id property, the server can ignore that). But there is an important constraint that the server has to follow: PUT needs to be idempotent! This means the server can’t use the current state of the resource to “fix” the resource representation passed in. Have a look at the following simple example:

Let’s say that I have a resource that consists of a couple of properties, A and B, and I want to use PUT to support partial updates of this resource. The payload of the PUT will carry the properties that we wish to update; if a resource property is not present in the payload then that property will remain unchanged in the resource. We’re playing the “the server has control over what they do with the resource property” card. Check out the following sequence of operations then:
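
One way such a sequence could play out is sketched below in Python. This is only an illustration under stated assumptions: the in-memory `resource` dict, the `partial_put` helper, the starting values, and the second client's request are all hypothetical stand-ins, not an actual server implementation.

```python
# Hypothetical in-memory resource with two properties, A and B.
resource = {"A": 0, "B": 0}


def partial_put(payload):
    """Handle PUT as a partial update: properties missing from the
    payload keep their current value. Returns a snapshot of the new state."""
    resource.update(payload)
    return dict(resource)


# Client 1 sends PUT {A: 1}.
state_after_first = partial_put({"A": 1})

# Client 2 sends PUT {B: 2} before client 1 repeats its request.
partial_put({"B": 2})

# Client 1 sends exactly the same PUT {A: 1} again.
state_after_retry = partial_put({"A": 1})

print(state_after_first)  # {'A': 1, 'B': 0}
print(state_after_retry)  # {'A': 1, 'B': 2}
```

If the PUT instead carried the complete representation, say both A and B, each of client 1's identical requests would leave the resource in the same state, whatever client 2 did in between.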

Ouch! The PUT is not idempotent! Client 1 does the same PUT twice, leaving the resource in two different states – one of which could very well be inconsistent. PUT semantics, when correctly followed, give the client the ability to control the consistency of the resource they are writing.

Hence RFC 5789 – PATCH Method for HTTP. PATCH does not have the constraint of being idempotent, so the client cannot simply retry a request for which it never received a response. I do think it is interesting that RFC 5789 includes the following:

A PATCH request can be issued in such a way as to be idempotent, which also helps prevent bad outcomes from collisions between two PATCH requests on the same resource in a similar time frame. Collisions from multiple PATCH requests may be more dangerous than PUT collisions because some patch formats need to operate from a known base-point or else they will corrupt the resource. Clients using this kind of patch application SHOULD use a conditional request such that the request will fail if the resource has been updated since the client last accessed the resource. For example, the client can use a strong ETag [RFC2616] in an If-Match header on the PATCH request.

That is, you could make PATCH idempotent if you use the likes of ETags and conditional invocations. The same could be said for a PUT that does partial updates, but the operative word here is “could” – PUT is required to be idempotent even without the use of things such as ETags.

All of that said, while neither RFC 2616 nor RFC 5789 requires the use of ETag-like mechanisms, in most cases it’s a good idea.
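
To make the conditional-request idea concrete, here is a minimal sketch in Python, again with assumptions spelled out: the `Resource` class, its hash-based `etag` method, and the `if_match` parameter are hypothetical and merely mimic what an If-Match header check might do on a real server.

```python
import hashlib
import json


class Resource:
    """Hypothetical in-memory resource guarded by a strong ETag."""

    def __init__(self, state):
        self.state = dict(state)

    def etag(self):
        # Derive the ETag from the current representation (an assumption;
        # real servers may use version counters or timestamps instead).
        body = json.dumps(self.state, sort_keys=True).encode("utf-8")
        return hashlib.sha256(body).hexdigest()

    def patch(self, changes, if_match):
        """Apply a partial update only if the caller's ETag is current.
        Returns an HTTP-style status code."""
        if if_match != self.etag():
            return 412  # Precondition Failed
        self.state.update(changes)
        return 200


res = Resource({"A": 0, "B": 0})
tag = res.etag()

first = res.patch({"A": 1}, if_match=tag)  # applied, the ETag now changes
retry = res.patch({"A": 1}, if_match=tag)  # stale ETag, so it is rejected
print(first, retry, res.state)
```

Because the repeated request carries a stale ETag, it is refused instead of being applied on top of a state the client has never seen, so a blind retry can no longer change the resource a second time.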

Upgrading WordPress

… or Why I Love this Software

For the handful of you who have actually been reading my blog on occasion (hi Dad) you might have noticed that not only have the posts been scarce, but the site itself was sorely neglected. So recently, when I thought I had more to share, I became more and more bothered by the state of my blogging software. So a few weeks ago I sat down to upgrade.

For a short time I considered using this as a good opportunity to write my own blogging software. I reasoned that since I was building it for only my consumption (the administrative functions, that is) that I wouldn’t have to build all of the bells and whistles into it and might be able to build it fairly quickly. Of course, my plan was to leverage the XMLREST framework that I’ve talked about here and here. But then I started thinking about categories, and tags, and spam filters, and… and I decided that I’d likely get bored long before I created something that was really functional, so I came back to the task of upgrading WordPress.

The WordPress docs point out that the auto upgrade, “which will work for most people”, is the easiest – and I couldn’t agree more, having used it on other occasions – but, well, I’ve often been accused of not being like most people. The auto upgrade option is available from version 2.7 on but, eegahds!!, I was on 2.2.something (I’ve really neglected things!). So I had to do the manual upgrade.

The long story involves:

  • figuring that an upgrade would be very problematic because of how large a gap there was between what I had installed and the latest version (have you ever heard of enterprise software skipping releases and having a decent upgrade experience?!), I set up another blog side by side, using the WordPress export and import functions. This worked fabulously, but would break any permalinks that folks (hi again Dad) might have to the old entries, because my old permalinks had post IDs in them (and the post IDs changed on import).
  • and then when I sat down to implement the redirects I thought, “what the heck, let’s just try upgrading the old installation, if it fails, I’ve already got the new site up and running.”

I did as the WordPress docs advised, downloaded and unzipped the latest version, used my ftp client to upload that on top of my existing deployment (overwriting files) and then accessed the admin page. Indeed, it asked to upgrade the DB, which I did, and voila – my blog is upgraded. I had spent some time setting up a new theme which I brought over from the parallel, temporary blog, only had to tweak a few things by hand and I was golden.

All of this is to say – this is software that is easy to use, has ample documentation and works as advertised. This is something that all technology vendors should strive for! I love WordPress.