Frequently I need to pass some text along in an XML document that may contain special characters like less-than (<), greater-than (>), or the really annoying one, ampersand (&). Most people fix this lazily by using CDATA nodes. But I hate CDATA nodes with a passion!! I've been using a trick for years and I dunno why I never blogged it. You don't need CDATA, nor do you need to manually scan for and replace these special characters. Microsoft already did the dirty work for you in XmlDocument: set a node's InnerText value, read its InnerXml value back, and the text comes out escaped.
So when I'm generating an XML file, say with a StringBuilder or with repeaters on an .aspx template, this is what I sneak into the code-behind:
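// XmlDocument is not thread-safe, so every access goes through _docLock.
private static readonly object _docLock = new object();
private static System.Xml.XmlDocument _staticDoc = null;
public static string XmlEncode(string str)
{
    if (str == null) return "";
    lock (_docLock)
    {
        // Create the scratch document on first use, inside the lock,
        // so another thread can never see it half-initialized.
        if (_staticDoc == null)
        {
            _staticDoc = new System.Xml.XmlDocument();
            _staticDoc.LoadXml("<text></text>");
        }
        // InnerText stores the raw string; InnerXml hands it back escaped.
        _staticDoc.LastChild.InnerText = str;
        return _staticDoc.LastChild.InnerXml;
    }
}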
Then I can just use:
<%# XmlEncode("Ed & Bob")%>
.. where "Ed & Bob" actually comes from a data object. :) This in turn outputs "Ed &amp; Bob".
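To see the InnerText/InnerXml round trip outside of an .aspx page, here's a minimal console sketch. The class name XmlEncodeDemo and the Main method are just scaffolding I made up for illustration; the XmlEncode body is the same trick as above, only with the scratch document created up front.

```csharp
using System;
using System.Xml;

public static class XmlEncodeDemo
{
    // Hypothetical scaffolding: one shared scratch document plus a lock,
    // because XmlDocument is not safe for concurrent use.
    private static readonly object Sync = new object();
    private static readonly XmlDocument Doc = CreateDoc();

    private static XmlDocument CreateDoc()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<text></text>");
        return doc;
    }

    public static string XmlEncode(string str)
    {
        if (str == null) return "";
        lock (Sync)
        {
            // Store the raw text, then read it back as escaped markup.
            Doc.LastChild.InnerText = str;
            return Doc.LastChild.InnerXml;
        }
    }

    public static void Main()
    {
        Console.WriteLine(XmlEncode("Ed & Bob"));   // Ed &amp; Bob
        Console.WriteLine(XmlEncode("1 < 2 > 0"));  // 1 &lt; 2 &gt; 0
        Console.WriteLine(XmlEncode(null).Length);  // 0
    }
}
```

As far as I can tell, quotes are left alone because the text is escaped as element content, so treat the result as safe for element text rather than for attribute values.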
There's also XmlTextWriter, which I haven't tried yet.
UPDATE: Alright, now I've tried XmlTextWriter. I needed ASCII enforcement on top of the XML encoding, so that weird Unicode characters are converted to their "&#??;" entity replacements. The method isn't so simple anymore, but first tests seem to pass. I'll update this again if I find it to be flawed. Note that I'm putting this, among other things, into a shared XmlUtil class full of handy static methods. This update in particular is important because XmlDocument.Load() was failing on some Unicode characters that are better expressed as XML entities.
using System.IO;
using System.Text;
using System.Xml;

public static class XmlUtil
{
    // Guards the three shared objects below; none of them is thread-safe.
    private static readonly object _sync = new object();
    private static XmlDocument _staticDoc = null;
    private static StringWriter _staticStringWriter = null;
    private static XmlWriter _staticXmlWriter = null;

    /// <summary>Converts Unicode text into ASCII-compliant XML encoded text</summary>
    public static string EncodeText(string str)
    {
        if (str == null) return "";
        lock (_sync)
        {
            // Lazy setup happens inside the lock so no thread sees half-built state.
            if (_staticDoc == null)
            {
                _staticDoc = new XmlDocument();
                _staticDoc.LoadXml("<text></text>");
                _staticStringWriter = new StringWriter();
                XmlWriterSettings settings = new XmlWriterSettings();
                settings.ConformanceLevel = ConformanceLevel.Fragment;
                _staticXmlWriter = XmlWriter.Create(_staticStringWriter, settings);
            }

            // Escape <, > and & by round-tripping through InnerText/InnerXml.
            _staticDoc.LastChild.InnerText = str;
            str = _staticDoc.LastChild.InnerXml;

            // ASCII enforcement
            StringBuilder sb = new StringBuilder();
            StringBuilder entity = _staticStringWriter.GetStringBuilder();
            for (int i = 0; i < str.Length; i++)
            {
                char c = str[i];
                if (c <= 127)
                {
                    sb.Append(c);
                    continue;
                }
                // goes beyond ASCII charset
                if (char.IsHighSurrogate(c) && i + 1 < str.Length && char.IsLowSurrogate(str[i + 1]))
                {
                    // WriteCharEntity rejects surrogates, so write the pair as one entity.
                    _staticXmlWriter.WriteSurrogateCharEntity(str[i + 1], c);
                    i++;
                }
                else
                {
                    _staticXmlWriter.WriteCharEntity(c);
                }
                // Move the entity text from the shared writer into the result.
                _staticXmlWriter.Flush();
                sb.Append(entity.ToString());
                entity.Length = 0;
            }
            return sb.ToString();
        }
    }
}
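If you only care about the ASCII-enforcement step, it can also be sketched without any XmlWriter by building the "&#x..;" references by hand. This is a swapped-in alternative, not the method above: the class AsciiEntitySketch and the helper ToAsciiEntities are hypothetical names, and it assumes the input has already been XML-escaped (for example by the InnerText/InnerXml trick).

```csharp
using System;
using System.Text;

public static class AsciiEntitySketch
{
    // Hypothetical helper: replaces every non-ASCII character in
    // already-escaped XML text with a hexadecimal character reference.
    public static string ToAsciiEntities(string xmlText)
    {
        if (xmlText == null) return "";
        StringBuilder sb = new StringBuilder(xmlText.Length);
        for (int i = 0; i < xmlText.Length; i++)
        {
            char c = xmlText[i];
            if (c <= 127)
            {
                sb.Append(c); // plain ASCII passes through untouched
            }
            else if (char.IsHighSurrogate(c) && i + 1 < xmlText.Length
                     && char.IsLowSurrogate(xmlText[i + 1]))
            {
                // A surrogate pair is one code point, so it gets one reference.
                int codePoint = char.ConvertToUtf32(c, xmlText[i + 1]);
                sb.Append("&#x").Append(codePoint.ToString("X")).Append(';');
                i++; // skip the low surrogate we just consumed
            }
            else if (char.IsSurrogate(c))
            {
                // Assumption: a lone surrogate is not legal XML, so it is
                // swapped for U+FFFD (the replacement character).
                sb.Append("&#xFFFD;");
            }
            else
            {
                sb.Append("&#x").Append(((int)c).ToString("X")).Append(';');
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // prints: caf&#xE9; &amp; cr&#xE8;me
        Console.WriteLine(ToAsciiEntities("caf\u00E9 &amp; cr\u00E8me"));
    }
}
```

The trade-off is that you own the formatting yourself instead of leaning on the framework, but there's no shared writer to lock or flush.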