Cornelia's Weblog

my sporadically shared thoughts on, well, whatever is capturing my attention at the moment.

Category : XML

Passing parameters to an XQuery or XSLT through an XProc Pipeline

(syndicated from original posting on the EMC Community Network)

I’ve recently been helping someone get started with the XML REST Framework, and they asked about getting query string parameters into the XQueries or XSLTs that were a part of their resource implementation. For those of you who are not familiar, the XML REST Framework leverages Spring MVC for all of the HTTP protocol stuff, and a developer builds a very thin Java shim to an XProc pipeline that implements the RESTful service. The post I link to above and two earlier versions explain this all in more detail. What I am focusing on in this post is how I can get values from the query string into the right places within my pipeline. There are two parts to this: 1) you need to get your hands on the query string argument, and 2) you need to pass that into the XProc pipeline in such a way that it gets to the XQuery or XSLT. It turns out that #2 was already demonstrated in the Patients.java class file in combination with the resourceGET.xpl. Let’s drill in on that a little bit.

If you take a look at the top of the xpl you see the following code – really declarations of the inputs and outputs for the pipeline.

<p:declare-step name="main" xmlns:p="http://www.w3.org/ns/xproc"
     xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0">
   <p:input port='xqueryscript' />
   <p:input port="stylesheet"/>
   <p:input port="stylesheetParameters" kind="parameter"/>
   <p:input port="xqueryParameters" kind="parameter"/>
   <p:output port='result' sequence='true' primary='true' />
   <p:output port='error' sequence="true">
      <p:pipe step='checkXquery' port='error' />
   </p:output>
…

You’ll note two different input ports for parameters, one called stylesheetParameters and the other xqueryParameters; these were names of my choosing (just as parameter names would be in any other language). A parameter input port is named, and its value is a set of key/value pairs rather than a single value (Norm Walsh, XML guru and one of the lead authors on the XProc spec, says “[XProc] parameters suck.” They are kinda weird, and we are thinking about changing things in the next version of XProc). In this example, I built the pipeline to keep the XQuery parameters separate from the XSLT ones; if I wanted to have a key/value pair passed down into both my XQuery and my XSLT, at the Java level I could just add that pair to both parameter ports. So speaking of that, our framework allows you to add parameters to the pipeline input with a call to the addParameter method:

pi.addParameter("stylesheetParameters", new QName("baseURL"), request.getRequestURL().toString());
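To make the "one named port, many key/value pairs" idea concrete, here is a minimal sketch in plain Java. The class ParameterPortsSketch is a hypothetical stand-in and is not the framework's PipelineInputCache; it only mirrors the shape of the addParameter call above. It also assumes javax.xml.namespace.QName, which may not be the QName class the framework uses, and the "lang" key is made up.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.namespace.QName;

// Hypothetical stand-in for the framework's pipeline input. It only models the
// idea that each named parameter port carries its own set of key/value pairs.
class ParameterPortsSketch {
    private final Map<String, Map<QName, String>> ports = new LinkedHashMap<>();

    // Same argument shape as the addParameter call above: port name, key, value.
    void addParameter(String port, QName name, String value) {
        ports.computeIfAbsent(port, p -> new LinkedHashMap<>()).put(name, value);
    }

    // Returns the pairs bound to one port (empty if the port was never used).
    Map<QName, String> parametersFor(String port) {
        return ports.getOrDefault(port, Collections.emptyMap());
    }

    static ParameterPortsSketch demo() {
        ParameterPortsSketch pi = new ParameterPortsSketch();
        QName lang = new QName("lang"); // made-up key, for illustration only
        // The same pair is added to both ports so the XQuery and the XSLT each see it.
        pi.addParameter("xqueryParameters", lang, "en");
        pi.addParameter("stylesheetParameters", lang, "en");
        // This pair is only visible on the stylesheet port.
        pi.addParameter("stylesheetParameters", new QName("baseURL"), "http://example.com/patients");
        return pi;
    }

    public static void main(String[] args) {
        ParameterPortsSketch pi = demo();
        System.out.println(pi.parametersFor("xqueryParameters"));
        System.out.println(pi.parametersFor("stylesheetParameters"));
    }
}
```

The point of keeping the ports separate is that a key added to one port never leaks into the other step; sharing a value is an explicit second call.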

Going back to the xpl then, let’s look at how the input parameters to the pipeline make their way down into the XQuery or XSLT. In each case, we just pass the parameter from the main pipeline down into the appropriate step.  XProc is nice that way and while the current parameter design might feel a bit odd, it does enable this easy, easy approach.

For XQuery:

<p:xquery name="xquery">
   <p:input port='source'>
      <p:document href="xhive:/" />
   </p:input>
   <p:input port="query">
      <p:pipe step="main" port="xqueryscript" />
   </p:input>
   <p:input port="parameters">
      <p:pipe step='main' port='xqueryParameters'/>
   </p:input>
</p:xquery>

And for XSLT:

<p:xslt>
   <p:input port='source'>
      <p:pipe step='xquery' port='result'/>
   </p:input>
   <p:input port='stylesheet'>
      <p:pipe step='main' port='stylesheet'/>
   </p:input>
   <p:input port='parameters'>
      <p:pipe step='main' port='stylesheetParameters'/>
   </p:input>
</p:xslt>
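On the receiving end, a stylesheet picks up a passed-in value through a top-level xsl:param whose name matches the key. The sketch below shows only that receiving side, using the JDK's javax.xml.transform API directly rather than XProc. The stylesheet, the patient document and the class name are invented for illustration and are not part of the sample application.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class StylesheetParameterSketch {
    // Illustrative stylesheet: the top-level xsl:param is what a "baseURL" value binds to.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:param name='baseURL'/>"
      + "<xsl:template match='/patient'>"
      + "<xsl:value-of select=\"concat($baseURL, '/', @id)\"/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Runs the stylesheet over patientXml with baseURL supplied as a parameter.
    static String link(String baseURL, String patientXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            t.setParameter("baseURL", baseURL); // plays the role of the parameters port
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(patientXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // prints http://example.com/patients/12345
        System.out.println(link("http://example.com/patients", "<patient id='12345'/>"));
    }
}
```

An XQuery picks up its values in a similar way, through external variable declarations, though that side is not shown here.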

So then the only thing that remains is how to get your hands on the query string argument (#1 of the two things we needed to do) and because our framework uses Spring MVC for our REST protocol layer, this is in the Java code. The following method could be added to the Patients.java class of the sample application to show this:

@RequestMapping(method = RequestMethod.GET, value="/alt")
@ResponseStatus(HttpStatus.OK)
public String getPatientAlt(HttpServletRequest request, HttpServletResponse response, Model model, @RequestParam("pid") String pid)
 throws XProcException, TransformerException, IOException {

  try {
    PipelineInputCache pi = new PipelineInputCache();
    // pass the patient ID into the pipeline for use in the XQuery (to look up the right patient)
    pi.addParameter("xqueryParameters", new QName("pid"), pid);
    // supply current resource URL as the base URL to craft hyperlinks
    pi.addParameter("stylesheetParameters", new QName("baseURL"), request.getRequestURL().toString());

    PipelineOutput output = m_getPatient.executeOn(pi);

    model.addAttribute("pipelineOutput", output);
    return "pipelineOutput";
  } finally {
    ;
  }
}

The ONLY difference between this and the original getPatient method is where I get the value for the patient ID that I will pass down into the XQuery. The way that Spring MVC does this is with the @RequestParam annotation. You can see that for testing I changed the @RequestMapping path to be /alt so the URL that will invoke this code is something of the form:

http://example.com/XProcPatientServiceMVC/patients/alt?pid=12345

Note that I didn’t even have to create a new XProcXMLProcessingContext because I just reused all of the implementation from the original get patient – I’m just getting the value for the passed in parameter from somewhere else.
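For anyone curious what the @RequestParam binding amounts to, here is a rough plain-Java sketch that pulls pid out of a URL by hand. It is not how Spring MVC is implemented, and the class and method names are made up. In the real service you would let the annotation do this for you.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

class QueryStringSketch {
    // Splits the query string of a URL into decoded key/value pairs.
    static Map<String, String> queryParams(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        String query = URI.create(url).getRawQuery();
        if (query == null) {
            return params; // no "?" part at all
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(key), decode(value));
        }
        return params;
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        String url = "http://example.com/XProcPatientServiceMVC/patients/alt?pid=12345";
        // prints 12345
        System.out.println(queryParams(url).get("pid"));
    }
}
```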

Balisage 2011

I made it to the Balisage conference this year. I’d been eyeing the conference for a couple of years, so I submitted a paper proposal that was accepted and spent the first week of August in Montreal. I confess, aside from a couple of morning runs in the Mount Royal Park (which is just fantastic), I saw very little of Montreal, but what I did see this week was quite spectacular – the conference itself. Looking at the conference proceedings doesn’t give even a hint as to the uniqueness of this event; the pictures that Tommie Usdin, the conference chair, posted do show a bit of the peculiar (peculiar in a good way) side, but neither of those things brings it all together. This post is my feeble attempt at capturing some of the magic.

The conference is small – about 100 attendees this year – but with luminaries in the XML world like Michael Kay and Norm Walsh in attendance (and Jon Bosak on the advisory board) it’s clearly about quality over quantity. The very eclectic group of participants includes philosophers, computer scientists and librarians, open source practitioners, folks that produce commercial products, consultants and standards body representatives. The talks were equally diverse, ranging from theoretical (TagAl: A tag algebra for document markup), through very practical (Including XSLT stylesheets testing in continuous integration process) to even a game-playing session (Balisage Bluff – an Entertainment). It’s clear that many of the folks involved have known each other and worked together for a great many years, but even as a newbie, by the end of the week I felt a part of the group.

And here’s the punch line (I’ve never been good at holding it to the end) – I really get the sense that this small, quirky conference has significant influence on the shape of the XML landscape. There were several talks that presented work coming out of discussions that happened at last year’s Balisage. For example, Michael Kay did an impromptu session one evening showing us Saxon running in the browser, work that he says came as a direct result of Vojtech Toman’s paper last year on XProc running in the browser. As you might know, browser vendors have not moved to supporting XSLT 2.0 in the browser, severely limiting what developers can do there (XSLT 1.0 is, well, just that, a version 1 spec – v2 is so much more powerful). Saxon-CE (client edition) gets us out of the proverbial chicken-egg situation, giving developers a chance to demonstrate the value, and perhaps it will lead to native browser support in the future. New stuff this year included a proposal for blending parts of XSLT and XQuery, a language and processing model for sequences of XML items, and tag libraries for XSLT and XQuery – I really liked this paper as it provides a means for implementing custom “HTML tags” using XML-based languages instead of Java. And, because not every attempt is successful, there were new looks at old things like Eric van der Vlist’s paper on multi-ended hrefs, something with potential reach well beyond XML. There were two security related papers (and I actually got them), one showing the risks of XQuery injection and the other showing promise in allowing encrypted XML to still be processed in certain ways – it was really cool! There were SO many ideas flowing, in a very open and collaborative style, that I think everyone left with renewed enthusiasm and a very full queue of things to explore moving forward.

I’ve been to lots of conferences: marketing conferences, developer conferences and, earlier this year, the World Wide Web conference, which is quite academic/research focused (and very large). And while I was just about to say I’ve never seen anything quite like Balisage, come to think of it, the Linked Data on the Web workshop that was co-located with WWW 2011 had a bit of the feel that Balisage has – quirky group of participants, luminaries in the field, work presented that was influenced by previous workshops, lots of excitement. At the time of that workshop I suppose it didn’t make as much of an impression on me because I was an observer in that space rather than a participant. The WS-REST workshop, in which I was a participant, didn’t quite have that same magic, perhaps because it is relatively young. Hmm… have to think about that.

In any case, it was just a fantastic week. I have a ton of stuff I want to work on as a result, and I plan to be back next year to share and see what else has cooked in the meantime.

XSLT and the default namespace

A colleague of mine who is relatively new to XSLT was getting bitten by an issue with a “hello world” stylesheet where the elements in the source document were all in a namespace and he wasn’t addressing or even including the namespace in the declarations of the XSLT. I found a great write-up that describes the issue at hand, but there is one point that Steve makes in that explanation which is rather subtle. When he says:

…because in XPath 1.0, an unqualified element in a match pattern only matches elements with no namespace URI at all.

All the info is there in that sentence, but it is, especially for someone fairly new to XML namespaces and XSLT, a pretty dense sentence. So, to be more explicit, when you include the namespace declaration in the XSLT you MUST bind it to a prefix and use that prefix in your match patterns and XPath expressions; declaring it as the default namespace is not enough. So, referring back to Steve’s post, the following would NOT work.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns="uri:something">
      ...
     <xsl:template match="Foo/Bar[@baz='bop']">

The stylesheet is totally valid, but the XPath in the “match” attribute says to look for a “Foo” element with no namespace and a “Bar” element with no namespace.
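Here is a small sketch that runs both variants through the XSLT 1.0 processor that ships with the JDK, so you can see the difference for yourself. The source document, the "s" prefix and the class name are made up for illustration. The first stylesheet declares uri:something as the default namespace and uses unprefixed names in the match pattern; the second binds the same URI to a prefix and uses it in the pattern.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class DefaultNamespaceSketch {
    // Made-up source document whose elements all live in the uri:something namespace.
    static final String SOURCE = "<Foo xmlns='uri:something'><Bar baz='bop'>hello</Bar></Foo>";

    // Builds a one-template stylesheet with the given namespace declaration and match pattern.
    static String stylesheet(String nsDecl, String match) {
        return "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform' " + nsDecl + ">"
             + "<xsl:output method='text'/>"
             + "<xsl:template match=\"" + match + "\">matched</xsl:template>"
             + "</xsl:stylesheet>";
    }

    static String run(String stylesheet) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(SOURCE)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // Default namespace declared, unprefixed names in the pattern: the template never fires.
    static String unprefixed() {
        return run(stylesheet("xmlns='uri:something'", "Foo/Bar[@baz='bop']"));
    }

    // Namespace bound to a prefix that is used in the pattern: the template fires.
    static String prefixed() {
        return run(stylesheet("xmlns:s='uri:something'", "s:Foo/s:Bar[@baz='bop']"));
    }

    public static void main(String[] args) {
        System.out.println(unprefixed()); // prints hello (built-in templates copy the text through)
        System.out.println(prefixed());   // prints matched
    }
}
```

In the unprefixed case nothing fails; the template is simply skipped and the built-in rules copy the text, which is why this problem is so easy to miss.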

Make sense?

The JSON and XML debate and what worries me most about JSON…

…my concern is not one of features of JSON – please read on.

Okay, so I’ve spent a fair part of the afternoon today surfing through dozens of articles that address JSON vs. XML. And like any other “vs.” debate, these articles express capabilities of each and there are obviously “things” that one addresses differently than the other. I completely agree with what many have said, that it isn’t a matter of one being better than the other, rather they serve potentially different needs. The choice then on which to use when you are designing a system is dependent on what the usage scenarios are. Specifically in the JSON vs. XML discussion, how is the response message (that will come in JSON or XML or perhaps some other format) to be processed by the client?

There are a few people in the blogosphere who have commented to this effect, pointing out for example that the XML toolset allows for processing beyond deserialization, but I’m afraid that this point is getting lost in the noise. Mind you, I loved Dare Obasanjo’s Updated: XML Has Too Many Architecture Astronauts and while I confess to going too high in the atmosphere at times, I hear the point loud and clear that we cannot argue away something as popular as JSON with an architectural discussion – it’s popular because it enables something that users want (and no, I am not implying that JSON is fundamentally flawed and needs to be argued away). Absolutely right – JSON is seeing tremendous uptake because it makes things easy for a lot of developers. But that does not change the fact that there might be things we want to do with a received message beyond what JSON can support.

I don’t think that there is any argument that JSON, with supporting tools in JavaScript and other languages, makes it really easy to take a serialized data structure and parse it, loading the data into a data structure accessible by the client application. What scares me is precisely the fact that this is what a large percentage of the population most often or even always wants to do. JSON is so popular because it allows developers to stay in their comfort zone.* Again, before you throw flames, I don’t mean to imply that the preferred programming paradigm, which remains largely procedural, is always bad, but I am saying that in certain cases there is something better. (*As an aside, this also scares me in the world of SOAP-based Web Services – WSDL is used to generate client side structures and XML is just used to transport the data over to the client structure.)

I love XSLT. I was telling a colleague recently about some of the things I’ve built with XSLT and he suggested that XSLT was a relatively rare skill set. Boy, I hope that isn’t so. When I first encountered XSLT I was sooo psyched – you see, I am a functional programmer at heart, having studied programming languages at IU under Dan Friedman – we did EVERYTHING in Scheme. I REALLY like the notion of declarative programming. I much prefer to express what I want done over precise details on how to do it. I am more than happy to have some framework (i.e. the Scheme interpreter) just “make it so.” As to the prevalence of XSLT use, I do think it is used a fair bit, AND, coming full circle back to the JSON/XML topic, there are a lot of programs where what I want to do on the client side (I don’t necessarily mean in the browser) is simply, or first, transform content, maybe into HTML for rendering, maybe into some normalized data format, maybe something else. Why do so many developers still insist on procedural programming when there is an alternative? A failing of XML/XSLT? Maybe. In any case, this is what scares me the most about JSON.

Admittedly I am an XML bigot and therefore probably fit the profile of someone who considers it “my precious XML” (as Dare puts it). I will, however, reiterate that I am also pragmatic enough to see the popularity and the value of JSON, and when appropriate it will have its place in my designs. While David Megginson posted In praise of architecture astronauts (an excellent post in many respects) primarily to support his argument that JSON and XML are really not that different, in that they are both tree markup languages, he also shares my viewpoint, stating:

“In various situations, one syntax may have an advantage due to software support — for example, web browsers have built-in support for parsing XML or styling it using CSS, and they can convert JSON directly to JavaScript data structures using the eval() function…”

Exactly the point!

XML Spy bug: predicate evaluation order

The XPath 2.0 specification describes predicates, and specifically predicate ordering, with:

“In the case of multiple adjacent predicates, the predicates are applied from left to right, and the result of applying each predicate serves as the input sequence for the following predicate.”

Unfortunately XML Spy seems to have a bug. Given the following example, again taken from the XPath 2.0 spec:

<bib>
   <book>
      <title>TCP/IP Illustrated</title>
      <author>Stevens</author>
      <publisher>Addison-Wesley</publisher>
   </book>
   <book>
      <title>Advanced Programming in the Unix Environment</title>
      <author>Stevens</author>
      <publisher>Addison-Wesley</publisher>
   </book>
   <book>
      <title>Data on the Web</title>
      <author>Abiteboul</author>
      <author>Buneman</author>
      <author>Suciu</author>
   </book>
</bib>

The XPath expression (evaluated from the root) bib/book/author[. = "Stevens"][1] should evaluate to a single author element with the value “Stevens” – that is, it should, by the definition cited above, evaluate to the same thing as (bib/book/author[. = "Stevens"])[1]. Unfortunately it does not – the former evaluates to two author elements with the value “Stevens”, the latter evaluates to a single author element with the value “Stevens”.

Bummer. I am running XML Spy version 2006, sp2 – Enterprise Edition.
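One way to cross-check the two expressions outside XML Spy is to run them through a different processor. The sketch below uses the XPath engine built into the JDK, which implements XPath 1.0 rather than 2.0, so it is a comparison point and not a ruling on the 2.0 spec. The class name is made up.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PredicateOrderSketch {
    // The bib document from the example above, condensed onto fewer lines.
    static final String BIB =
        "<bib>"
      + "<book><title>TCP/IP Illustrated</title><author>Stevens</author>"
      + "<publisher>Addison-Wesley</publisher></book>"
      + "<book><title>Advanced Programming in the Unix Environment</title><author>Stevens</author>"
      + "<publisher>Addison-Wesley</publisher></book>"
      + "<book><title>Data on the Web</title><author>Abiteboul</author>"
      + "<author>Buneman</author><author>Suciu</author></book>"
      + "</bib>";

    // Evaluates the expression from the document root and returns how many nodes it selects.
    static int count(String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, new InputSource(new StringReader(BIB)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("bib/book/author[. = 'Stevens'][1]"));   // prints 2
        System.out.println(count("(bib/book/author[. = 'Stevens'])[1]")); // prints 1
    }
}
```

With the JDK's engine the first expression selects two nodes and the second selects one, the same counts XML Spy reports. One reading is that both adjacent predicates belong to the author step, and that step is evaluated once per book, so [1] picks the first matching author within each book rather than within the combined result. That is worth checking against the spec wording before concluding either way.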