Archive for December, 2008

Making Relative URLs Absolute with Groovy

December 21st, 2008

I’m working on a project where I am using Grails as a content delivery layer for an XML based content management system.  The content management system has the ability to publish XHTML content with inline links and images; however the inline img and a tags reference the content as relative paths.  I need the links to internal pages to be absolute from the server root (ie /sub/content.html rather than sub/content.html) and I am hosting all static content like images at Amazon S3 so I need the image links to be absolute to a different DNS name (ie http://media.mywebsite.com/images/image.jpg rather than images/image.jpg).

Instead of implementing some hacks in the CMS itself, I opted to do the transformation in Grails as the content was rendered allowing greater flexibility.  I created a taglib for rendering the XHTML block that does a Groovy replaceAll with regular expressions.  The tag gets called like this: <g:processXHTML content=”${content.mainBody.text()}”/> .  My GSP has reference to a content variable that is an XMLSlurped xml document with a mainBody tag.  Here is the code for the tag:

/**
 * Given XHTML content will return the same content with any relative a tag hrefs prefixed with a /
 * and any relative images prefixed with the mediaURL
 */

def processIngeniuxXHTML = {attrs ->
  def content = attrs.content
  def rootURL = grailsApplication.config.mediaURL

  if (content) {
    // first look for links within the content that are relative URLS and prefix with a slash so they are
    // resolved from the root rather than the current context
    def regex = /(< \s*a\s+[^>]*href\s*=\s*[\"'])(?!http)([^\"'>]+)[\"'>]/
    def replace = { fullMatch, a, b ->
      if (b[0] != "/"){
        // for some reason it's not outputing the closing quote so I added it at then end explicitly
        "${a}/${b}\""
      } else
        "${fullMatch}"
    }

    content = content.replaceAll(regex, replace)

    // second if there are any images referenced with relative URLs, prefix with the DNS for the S3 bucket
    regex = /(< \s*img\s+[^>]*src\s*=\s*[\"'])(?!http)([^\"'>]+)[\"'>]/
    replace = { fullMatch, a, b ->
      if (b[0] != "/"){
        // for some reason it's not outputing the closing quote so I added it at then end explicitly
        "${a}${rootURL}/${b}\""
      } else
        "${fullMatch}"
    }

    out < < content.replaceAll(regex, replace)
  }
}

And here are the test conditions to demonstrate what’s expected:

void testProcessXHTML() {
  ContentUtilTagLib tl = new ContentUtilTagLib()

  def i = """<div><a href="Explore/test"/>"""
  def o = """<div><a href="/Explore/test"/></div>"""

  tl.processXHTML(content: i)
  assertEquals o, out.toString()
  out.getBuffer().setLength(0)

  i = """<div><a alt="blah" href="Explore/test"/></div>"""
  o = """<div><a alt="blah" href="/Explore/test"/></div>"""

  tl.processXHTML(content: i)
  assertEquals o, out.toString()
  out.getBuffer().setLength(0)

  i = """<div><a href="Explore/test"/><img width="100" src="myImage/a.jpg"/></div>"""
  o = """<div><a href="/Explore/test"/><img width="100" src="http://stub/myImage/a.jpg"/></div>"""

  tl.processXHTML(content: i)
  assertEquals o, out.toString()
  out.getBuffer().setLength(0)

  i = """<div><a href="/Explore/test"/></div>"""
  o = """<div><a href="/Explore/test"/></div>"""

  tl.processXHTML(content: i)
  assertEquals o, out.toString()
  out.getBuffer().setLength(0)

  i = """<div><a href="http://Explore/test"/><img width="100" src="http://anotherhost/myImage/a.jpg"/></div>"""
  o = """<div><a href="http://Explore/test"/><img width="100" src="http://anotherhost/myImage/a.jpg"/></div>"""

  tl.processXHTML(content: i)
  assertEquals o, out.toString()
  out.getBuffer().setLength(0)

}

More related to this project to come…

Powered by Web Design Company Plugins

Switch to our mobile site