Monday, June 13, 2011

ISO 8601 date/time/duration manipulation with XSL

I have an XML log full of events (yay). The vendor have chosen to represent events and event durations with two XML variables: eventStart and eventDuration. My challenge: I need to transform the following XML

<eventDescription>Cisco Burger Maker cannot make burgers</eventDescription>

into CSV that can be scraped by another application

7/13/2011,18:00,19:30,"Cisco Burger Maker cannot make burgers"

You might comment "that eventDescription looks normal, but what kind of silly notation are eventStart and eventDuration in?!" It's ISO 8601 which is the standard for interchange of date and times. It's commonly used in XML documents prevent a Abbott and Costello "Who's on first?" ambiguity when representing date, time and duration.

"One second? No, I need you to tell me the duration now!"

The XML contains the eventStart and eventDuration, but no end time. To produce the output I need, I'm going to need to do some date manipulation. This would be easy enough in any other languages: Java and C# have classes that deal with date manipulation. Unfortunately, XSL isn't as flexible. To do the sort of transformations requires to get the output, we'll need to do a bit of string hacking. To ease the string happening, I've expended all my photoshop skills to produce this diagram that shows the character positions.

I installed Photoshop for this?!

Let's get manipulating!

1) How do you convert an ISO 8601 date to DD/MM/YYYY?
We can do this with simple string manipulation. Because ISO 8601 requires padding of date variables (ie. the Queen's birthday is stored as 2011-06-13 and not 2011-6-13), we are guaranteed that that the first four characters are the year, the 6th and 7th are the month, and the 9th and 10th are the days. You can use the substring command to grab the right characters, some / characters to separate them, and the concat command to glue it all together.

<xsl:value-of select="concat(substring(.,9,2)),'/',substring(.,6,2),'/',substring(.,1,4))"/>

Using that operation could result in the output 02/05/2011. What if we want to drop the preceding zero (ie. get 2/5/2011)? The number function does that.

<xsl:value-of select="concat(number(substring(.,9,2))),'/',number(substring(.,6,2)),'/',number(substring(.,1,4)))"/>

2) How do you get the time from an ISO 8601 date?
This can be performed with easy string manipulation. We can use substring to grab all the five characters after the T and stick them into a new variable called start-time.

<xsl:varaible name="start-time" select="substring(substring-after(.,'T'),1,5)"/>

Applying this to 2011-06-13T18:30:00 gives 18:30.

3) How do I convert an ISO 8601 duration into a 24hr duration?
For the purposes of simplicity, I'm going to assume that your periods contain only hours ('H') and minutes ('M') (ie. your period will either be in the form PT30M, PT1H, PT1H30M). No days/weeks/months/years.

To do this, I'll create three variables.
  • duration-dirty will contain the duration in ISO 8601 format, except with the PT and M characters removed. I'm using this variable to reduce the amount of substring and translate functions in the later code.
  • duration-hour will contain the hour digits
  • duration-minute will contain the minute digits
Here's a diagram that shows these variables relation to the original eventDuration element.

To get dirtyDuration, we can use the substring functions to perform some slicing and dicing. To start, we can get rid of the PT and M characters.

<xsl:variable name="duration-dirty" select="translate(translate(eventDuration/text(),'PT',''),'M','')"/> 

Once we've done that, the period will look something like 30 (30 minutes), 1H (1 hour) or 1H30 (1 hour and 30 minutes. We can determine whether the duration contained hours by converting the duration-dirty variable to a number. If the conversion outputs NaN (not a number), we know there were more than 60 minutes in the duration.

<xsl:variable name="duration-hour">
          <!-- If it's not NaN (ie. a valid number), then the hours are 0 -->
          <xsl:when test="not(string(number($duration-dirty)) = 'NaN')">
                    <!-- If it's NaN, there were hours. Use substring-before to grab anything before the H. -->
                    <xsl:value-of select="substring-before($duration-dirty,'H')"/>

Calculating the minutes is same same but different: we check if the duration-dirty element can be converted to a number. If it can, then dirty-duration contained only minutes (so we can use it). If converting it to a number returns an NaN, there were hours so we need to grab everything after the H.

<xsl:variable name="duration-minute">
          <!-- If it's NaN, there are no hours. duration-dirty is good to use. -->
          <xsl:when test="not(string(number($duration-dirty)) = 'NaN')">
               <xsl:value-of select="$duration-dirty"/>
               <!-- If it's not NaN, then there are hours! Grab everything after the H. -->
               <xsl:value-of select="substring-after($duration-dirty,'H')"/>

4) How do I add times together?
Suppose we want to calculate the end time of an event given eventStart and eventDuration. Step 2 will give us 18:00 from 2011-07-13T18:00:00. Step 3 will give us the variables duration-hour and duration-minute (1 and 30 respectively). But how do we add these two?

Start by calculating the end hour. If we add duration-minute to the minute digits in start-time and exceed 60, an hour has passed. And if we have more than 24 hours...go back to zero using the modulo function! The modulo function is sorta like the the math equivalent of word wrap: 22 mod 24 = 22, 23 mod 24 = 23, 24 mod 24 = 0, 25 mod 24 = 1. Perfect for 'resetting' back to 0.

     <xsl:when test="substring($start-time,3,2) + $duration-minute > 60">
          <!-- An hour has passed! Add an extra hour -->
          <xsl:value-of select="(number(substring($start-time,1,2)) + $duration-hour + 1) mod 24"/> 
          <!-- An hour has not passed. Just add the hours together. -->
          <xsl:value-of select="(substring($start-time,1,2) + $duration-hour) mod 24"/> 

Awesome! But...if you have less than 10 hours, your output won't look pretty (ie. we want 09:30 rather than 9:30). We can easily fix this by padding a zero character if the hours are less than 10.

<xsl:if test="((number(substring($start-time,1,2)) + $duration-hour + 1) mod 24) &lt; 10">

Good. Now calculate the end minute. I'm no physics major but if I recall correctly, there are only 60 minutes in an hour. If there are 60 minutes, the hour increments and the minutes reset back to 0.

     <xsl:when test="not(string(number($duration-minute)) = 'NaN')">
          <!-- More than 60 minutes - go back to 0! -->
     <xsl:value-of select="number(substring($start-time,3,2) + $duration-minute) mod 60"/>
          <!-- Less than 60 minutes. Easy. -->
          <xsl:value-of select="substring($start-time,3,3)"/>

Awesome! This code assumes that your event starts and ends during the same day. I'll leave incrementing the day as an exercise for you. Not because I don't know how, but because my '<' key is playing up!

No comments:

Post a Comment