Querying XML Columns in SQL Server 2008

We make use of XML columns in a few places on our current project.  XML columns are great for storing “blobs” of data when you want to be able to change the schema of the stored data without having to make database changes (it also provides flexibility in what you store in that column).

Occasionally, we need to query based on parts of the XML in these columns.  Now because these are XML columns it’s not quite as easy to query on parts of the column as an int or nvarchar column.  This is further complicated when the XML being stored uses namespaces. 

Let’s start with the following table:

customer_table

And here’s a sample of the XML stored in the column:

<CustomerInfo xmlns=http://schemas.datacontract.org/2004/07/Sample.Dto
               xmlns:i="http://www.w3.org/2001/XMLSchema-instance"
               xmlns:d="http://custom.domain.example.org/Addresses">
    <FirstName>Jeremy</FirstName>
    <LastName>Wiebe</LastName>
    <d:Address>
    <d:AddressId>82FC06FA-46A9-47CA-81AA-A0B968BCEE49</d:AddressId>
    <d:Line1>123 Main Street</d:Line1>
    <d:City>Toronto</d:City>
    </d:Address>
</CustomerInfo>

Now we could query for this row in the table by first name using the following query:

WITH XMLNAMESPACES ('http://www.w3.org/2001/XMLSchema-instance' as i,
                    'http://schemas.datacontract.org/2004/07/Sample.Dto' as s,
                    'http://custom.domain.example.org/Addresses' as d)
     select CustomerInfo.value('(/s:CustomerInfo/s:FirstName)[1]', 'nvarchar(max)') as FirstName,
            CustomerInfo.value('(/s:CustomerInfo/s:LastName)[1]', 'nvarchar(max)') as LastName,
            CustomerInfo.value('(/s:CustomerInfo/d:Address/d:City)[1]', 'nvarchar(max)') as City,
     from   Customer
     where  MessageXml.value('(/s:CustomerInfo/d:Address/d:AddressId)[1]', 'nvarchar(max)') = '82FC06FA-46A9-47CA-81AA-A0B968BCEE49'

This would list out the FirstName, LastName, and City for any row where the AddressId was ‘82FC06FA-46A9-47CA-81AA-A0B968BCEE49’. There are a two things to note:

Namespaces

Any time you query XML (in any language/platform) that include XML your query must take that into account.  In T-SQL, you define namespaces and their prefixes using the WITH XMLNAMESPACES statement.  This statement defines the namespace prefix and associated namespace URI. 

Once you’ve defined them, you can use the namespace prefix in your XPath queries so that you can properly identify the XML elements you want.  One final note about namespaces here is that we have to define a namespace prefix for the namespace that was the default namespace in the XML sample.  As far as I know there’s no way to define a default XML namespace using the WITH XMLNAMESPACES statement.

XPath Query results

Notice that the XPath queries (the CustomerInfo.value() calls) always index into the results to get the first item in the result.  The .value() function always returns an array of matching XML nodes, so if you only want the first (which is often the case, you need to index into the results.  Also note that with these XPath queries, the results are a 1-based array, not 0-based.  So in this example we’re always taking the first element in the result set.

 

There are other functions you can use to query XML columns in T-SQL and you can find out more about this in the Books Online or via MSDN.   This page is a good starting point for querying XML columns: http://msdn.microsoft.com/en-us/library/ms189075.aspx.

Advertisements