
Saturday, September 24, 2016

PowerShell v3: Inferring an XSD Schema from an XML File

As part of my research I stumbled across the need for a script to generate an XSD (XML Schema Definition) directly from XML. This is great if you write XML files regularly but haven't had time to write a schema. Plus, if you need to validate XML, this gives you a free, easy-to-use tool without having to download the Windows SDK. Here is the script. I'll try to walk through it to explain the process.
$file = 'C:\Powershell\Projects\PowerShell and XML\types_xsd.xml'
$xsd = 'C:\Powershell\Projects\PowerShell and XML\types_xsd.xsd'
# Remove existing XSD
if (Test-Path $xsd) {
      Remove-Item -Path $xsd
}

# Read xml file
$reader = [System.Xml.XmlReader]::Create($file)

# Instantiate XmlSchemaSet and XmlSchemaInference to process new XSD
$schemaSet = New-Object System.Xml.Schema.XmlSchemaSet
$schema = New-Object System.Xml.Schema.XmlSchemaInference

# Infer schemaSet from XML document in $reader
$schemaSet = $schema.InferSchema($reader);

# Create new output file
$file = New-Object System.IO.FileStream($xsd, [IO.FileMode]::CreateNew)

# Create XmlTextWriter with UTF8 Encoding to write to file
$xwriter = New-Object System.Xml.XmlTextWriter($file, [Text.Encoding]::UTF8)

# Set formatting to indented
$xwriter.Formatting = [System.Xml.Formatting]::Indented

# Parse SchemaSet objects
$schemaSet.Schemas() |
ForEach-Object {
      $_.Write($xwriter)
}

# Close the writer (and its underlying stream) and the reader
$xwriter.Close()
$reader.Close()

The first couple of lines are purely setup: the .xml and .xsd file paths. I then remove the .xsd file if it already exists. Since this is just a proof-of-concept script, you'd obviously handle this differently in production-grade scripts or functions. Here are the main steps:

  1. I create an XmlReader with [System.Xml.XmlReader]::Create($file) to parse the file specified in the variable.
  2. With the XML document open in the $reader object, I then instantiate two new objects: 
    1. System.Xml.Schema.XmlSchemaSet and 
    2. System.Xml.Schema.XmlSchemaInference
  3. Once I have these two objects, I infer the schema with $schema.InferSchema($reader) and store the XmlSchemaSet it returns in $schemaSet, replacing the empty set created in step 2.
  4. I then create a FileStream object, $file = New-Object System.IO.FileStream($xsd, [IO.FileMode]::CreateNew), for the output .xsd file. CreateNew throws if the file already exists, which is why the script removes any existing .xsd first. Note that this reuses the $file variable, which held the .xml path until now.
  5. To write the file, I need an XML writer, so I create one: New-Object System.Xml.XmlTextWriter($file, [Text.Encoding]::UTF8).
  6. To keep the output file readable, I then set the $xwriter's output formatting to Indented, [System.Xml.Formatting]::Indented.
  7. Since an XmlSchemaSet object may contain multiple XmlSchema objects, I pipe the collection returned by the $schemaSet.Schemas() method to a ForEach-Object loop and write each schema to the $xwriter.
  8. Lastly, to release the files, I call the .Close() methods on the writer and the reader:
    1. $xwriter.Close()
    2. $reader.Close()
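
As a rough sketch of how the steps above might be packaged for reuse, the function below wraps them with try/finally so the reader and writer are closed even if something throws. The name New-XsdFromXml and its parameters are hypothetical, not part of the original script. It also swaps FileMode CreateNew for FileMode Create, which overwrites an existing file, so the Remove-Item step is not needed. Pass full paths, since .NET resolves relative paths against the process working directory rather than the current PowerShell location.

```powershell
# Hypothetical helper; the name and parameters are illustrative only.
function New-XsdFromXml {
    param(
        [Parameter(Mandatory)] [string] $XmlPath,
        [Parameter(Mandatory)] [string] $XsdPath
    )

    # Infer the schema set from the XML document
    $reader = [System.Xml.XmlReader]::Create($XmlPath)
    try {
        $inference = New-Object System.Xml.Schema.XmlSchemaInference
        $schemaSet = $inference.InferSchema($reader)
    }
    finally {
        $reader.Close()
    }

    # FileMode Create overwrites an existing file (assumption: overwriting is acceptable)
    $stream = New-Object System.IO.FileStream($XsdPath, [System.IO.FileMode]::Create)
    $xwriter = New-Object System.Xml.XmlTextWriter($stream, [System.Text.Encoding]::UTF8)
    try {
        $xwriter.Formatting = [System.Xml.Formatting]::Indented
        # Write every schema in the set to the same writer, as the original script does
        foreach ($s in $schemaSet.Schemas()) {
            $s.Write($xwriter)
        }
    }
    finally {
        # Closing the XmlTextWriter also closes the underlying FileStream
        $xwriter.Close()
    }
}
```

It could then be called as New-XsdFromXml -XmlPath $file -XsdPath $xsd, with both variables holding full paths.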
I know this is a bit cryptic, but this is a pretty .NET-heavy script. It's a step in the direction of the developer. Nonetheless, being able to parse your own XML files and generate .xsd files gives you a lot of power, and, considering it's PowerShell, makes it very automation-friendly.

Using this script I was able to read a file with the following XML:

<?xml version="1.0"?>
<note>
  <to>...</to>
  <from>...</from>
  <heading>...</heading>
  <body>Don't forget me this weekend!</body>
</note>

And it generated this XSD:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string" />
        <xs:element name="from" type="xs:string" />
        <xs:element name="heading" type="xs:string" />
        <xs:element name="body" type="xs:string" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
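
Since validation was one of the motivations for generating the schema, here is a minimal sketch of how a generated .xsd might be used to validate an XML file with XmlReaderSettings. The function name Test-XmlAgainstXsd and its parameters are hypothetical, and the empty-string namespace assumes a schema with no targetNamespace, like the note schema.

```powershell
# Hypothetical helper; the name and parameters are illustrative only.
function Test-XmlAgainstXsd {
    param(
        [Parameter(Mandatory)] [string] $XmlPath,
        [Parameter(Mandatory)] [string] $XsdPath
    )

    $settings = New-Object System.Xml.XmlReaderSettings
    $settings.ValidationType = [System.Xml.ValidationType]::Schema
    # '' targets a schema with no targetNamespace (assumption: matches the note example)
    [void] $settings.Schemas.Add('', $XsdPath)

    $reader = [System.Xml.XmlReader]::Create($XmlPath, $settings)
    try {
        # Reading to the end triggers validation; with no event handler
        # attached, the first validation error throws
        while ($reader.Read()) { }
        return $true
    }
    catch {
        # Malformed XML lands here too, not only schema violations
        Write-Warning $_.Exception.Message
        return $false
    }
    finally {
        $reader.Close()
    }
}
```

One caveat with the default settings: a root element that the schema does not declare at all only raises a validation warning, which is not reported unless the ReportValidationWarnings flag is added to $settings.ValidationFlags, so such a document would pass this check.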

I know there are probably 500 ways this could be improved upon, so feel free to leave comments, as long as they don't talk about your web hosting company in South Korea. Hint: Spammers, that's directed at you.
