If you are like me, then you hate it when your code is doing guesswork. When I write an XPath or XQuery statement, there is always room for errors that can't be caught by the compiler.
Below I describe a way to create a sufficiently concrete object-oriented wrapper for any language that can be described by an XML schema. The whole process, debugging included, should take 5-15 minutes depending on your skill level, most of it wasted on my superfluous comments.
Attention: If all you want is to parse an OPML file, the code is located in the folder below.
The code is in Visual Basic .NET; however, it's trivial to turn it into C#. Just paste it
here and click convert. Always remember that, for practical purposes, all the .NET languages are equivalent.
Prerequisites: You need a version of Visual Studio, preferably 2005 or newer. You can always download one of the free (as in beer)
Express Editions.
Step 1: Now to the feat itself. First off, you need the XSD grammar of the XML language. There are three options:
You most likely have it, or you can Google it, e.g. try "opml xsd" (which won't fetch anything useful, by the way). If you have the DTD instead (again, Google that), open it in Visual Studio and click XML > Create Schema. If you have nothing, you can still infer the schema from a sample file, provided that the sample is sufficiently representative. Again, use the Create Schema option.
If your initial schema, either DTD or XSD, has an error, Visual Studio will underline it in red for you. Check your file to make sure it's OK.
Rant: In the abhorrent case of OPML, there exists no valid grammar out there. Bizarre, huh? The best you can hope for is an antiquated DTD [here], which isn't even a valid DTD. At some point it declares two (true|false) attributes whose default values are written without quotes. Change this:
isComment (true|false) false
isBreakpoint (true|false) false
so that the false defaults are in double quotes, i.e. each line ends in (true|false) "false".
OPML.org, maintained by Mr Winer, doesn't help much. In fact, the spec doesn't make any mention of URLs, feeds and such. In his defense, OPML wasn't intended for that. However, something called OPML is now the standard for exchanging feed subscriptions, and just like with the history of RSS, he just pushes and forks without resolving the chaos in the community.
That said, good luck parsing OPML 2. The spec is a text document.
Step 2: Now you need to generate the wrapper classes. Lodged deeply in the Visual Studio package there exists a tool called xsd.exe. In the Start Menu, type 'comm' and open the Visual Studio Command Prompt. It should be the first to appear.
Go to the schema file, hold Shift and right-click on it, then select Copy as path.
Go back to the command prompt and write "cd " then right-click, select paste and trim the file's name. Hit enter. This should change the prompt to locate you in the schema's directory. Type in dir /w if you want to make sure you're there.
Type "xsd <filename>.xsd /language:VB /c /eld", substituting the filename with whatever you have. Hit enter. By now there should be a file with the same name as the xsd but with a .vb extension, which is the object wrapper.
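You can change the VB output to CS to get C# files. The /eld switch (short for /enableLinqDataSet) adds LINQ support to generated DataSets; according to the tool's help it applies to /dataset output, so with /c it is most likely optional. Don't neglect the /c switch, which is what tells xsd to generate classes. I won't go into the details here, but xsd is almighty. If you play with it a little you'll solve your problem. If you play with it a lot, you'll lose your sanity.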
Once generated, you can go in and do your code housekeeping or documentation as required. Note that you should never change the names or casing of the generated classes and their fields. Don't change anything unless you know how serialization works. It's safe to add whatever you want, though.
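One cautious way to add your own members is to keep them out of the generated file altogether. xsd.exe emits its classes as Partial, so a second file can extend them and survive regeneration. The sketch below assumes the root class is called opml and sits in no namespace (the default when you don't pass /namespace); the SourcePath field and Describe function are hypothetical additions, not something the tool produces.

```vb
Imports System.Xml.Serialization

' Lives in its own file (say opml.Extensions.vb, a made-up name), so
' re-running xsd.exe does not overwrite it.
Partial Public Class opml

    ' Hypothetical extra state. XmlIgnore keeps it out of the XML,
    ' so the serializer never expects or emits it.
    <XmlIgnore()> Public SourcePath As String

    ' Hypothetical convenience method; methods are never serialized.
    Public Function Describe() As String
        If SourcePath Is Nothing Then
            Return "OPML document (source unknown)"
        End If
        Return "OPML document loaded from " & SourcePath
    End Function
End Class
```

Marking added fields and properties with XmlIgnore is the conservative choice: without it, public members become part of the expected XML.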
Naturally, you might want to postpone changes until you have verified that parsing is OK. That said, xsd
never makes mistakes. If something is wrong, look at your xsd and verify that the XML you are parsing is valid against it.
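If you would rather check validity in code than by eye, a validating XmlReader can do it. This is a minimal sketch, not part of the generated wrapper: the SchemaCheck class and its member names are made up for illustration, and both paths are placeholders for your own files.

```vb
Imports System
Imports System.Xml
Imports System.Xml.Schema

Public Class SchemaCheck
    Private errorCount As Integer

    ' Called once per schema violation or warning found while reading.
    Private Sub OnValidationEvent(ByVal sender As Object, ByVal e As ValidationEventArgs)
        Console.WriteLine("{0}: {1}", e.Severity, e.Message)
        If e.Severity = XmlSeverityType.Error Then errorCount += 1
    End Sub

    ' Returns True when the document at xmlPath validates against xsdPath.
    ' XML that is not even well-formed throws an XmlException instead.
    Public Function IsValid(ByVal xmlPath As String, ByVal xsdPath As String) As Boolean
        errorCount = 0
        Dim settings As New XmlReaderSettings()
        settings.ValidationType = ValidationType.Schema
        ' Also report warnings, e.g. a root element the schema does not declare.
        settings.ValidationFlags = settings.ValidationFlags Or _
            XmlSchemaValidationFlags.ReportValidationWarnings
        ' Nothing = take the target namespace from the schema file itself.
        settings.Schemas.Add(Nothing, xsdPath)
        AddHandler settings.ValidationEventHandler, AddressOf OnValidationEvent

        Using reader As XmlReader = XmlReader.Create(xmlPath, settings)
            ' Validation happens as a side effect of reading to the end.
            While reader.Read()
            End While
        End Using
        Return errorCount = 0
    End Function
End Class
```

Calling New SchemaCheck().IsValid with your sample file and schema prints each problem it finds, which is usually enough to tell whether the schema or the document is at fault.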
Step 3: To the parsing itself now. If you are familiar with XML serialization, then we'll be deserializing objects. If not, just paste the following lines in your code, and the returned class hierarchy will be populated with values from the input file.
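Public Shared Function Parse(ByVal Content As String) As opml
    Dim ss As New System.Xml.Serialization.XmlSerializer(GetType(opml))
    ' Using closes the reader even if Deserialize throws
    Using ts As New System.IO.StringReader(Content)
        ' Deserialize returns Object, so cast it (required under Option Strict On)
        Return DirectCast(ss.Deserialize(ts), opml)
    End Using
End Function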
or, if you just want to load from a file:
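' Same signature as the String version above: keep only one of the two, or rename this one
Public Shared Function Parse(ByVal Path As String) As opml
    Dim ss As New System.Xml.Serialization.XmlSerializer(GetType(opml))
    ' OpenRead asks for read access only; Using closes the stream even on error
    Using instream As System.IO.Stream = System.IO.File.OpenRead(Path)
        Return DirectCast(ss.Deserialize(instream), opml)
    End Using
End Function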
Naturally, you want to replace the opml class with whatever class corresponds to the root element of the XML document you're parsing. You'll find the respective class in the generated wrapper.
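To see what using the result looks like, here is a self-contained sketch. The three classes are hand-written stand-ins for what xsd.exe would generate, heavily simplified; the real generated classes will have more members, and their exact names depend on your schema. The sample document and its feed URL are made up.

```vb
Imports System
Imports System.IO
Imports System.Xml.Serialization

' Stand-ins for the generated wrapper (assumed shape, not xsd.exe output).
<XmlRoot("opml")> _
Public Class opml
    <XmlAttribute()> Public version As String
    Public body As body
End Class

Public Class body
    ' Repeated <outline> children map to an array.
    <XmlElement("outline")> Public outline() As outline
End Class

Public Class outline
    <XmlAttribute()> Public text As String
    <XmlAttribute()> Public xmlUrl As String
End Class

Module Demo
    Sub Main()
        ' A tiny made-up OPML document.
        Dim sample As String = _
            "<opml version=""1.1""><body>" & _
            "<outline text=""Example feed"" xmlUrl=""http://example.com/rss""/>" & _
            "</body></opml>"

        Dim serializer As New XmlSerializer(GetType(opml))
        Dim doc As opml
        Using reader As New StringReader(sample)
            doc = DirectCast(serializer.Deserialize(reader), opml)
        End Using

        ' No XPath strings: the compiler checks every member access.
        For Each item As outline In doc.body.outline
            Console.WriteLine("{0} -> {1}", item.text, item.xmlUrl)
        Next
    End Sub
End Module
```

The point of the exercise is the loop at the end: a misspelled member name is a compile error instead of a query that silently returns nothing.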