What do we mean when we say format?

Rereading a post by Jakob Voss got me revisiting an issue that I've thought about quite a bit since I started working with and building web services: what do we mean when we say format? For me, format is a really thorny issue. Sometimes when we say format we mean a metadata format (a language for describing the thing that is being described); other times what we really mean is serialization. There is a pretty big difference between these two things.

When we talk about a metadata format we expect to know something about what fields will be present, their "structure", and their relationships to one another. Format in this context can be MODS, MARCXML, SKOS, MADS, Atom, or RSS.

When we talk about serialization we're talking about something completely different. The two best examples of serializations out there in web service land are XML and JSON. However, the fact that something is JSON or XML doesn't really tell a developer anything about the fields that will be output. This presents a problem when building web services: if a developer says to me, "Karen, I'd really like that output as JSON," that doesn't necessarily give me enough information, because I don't know what fields they want. A developer saying that they want the Dublin Core fields serialized as JSON is more specific and an easier jumping-off point. One of the biggest challenges with non-XML serializations for library web services is that the library community doesn't seem to have much experience with them. Most library metadata formats are serialized as XML, and to my knowledge no one has taken the time to serialize many library metadata formats as JSON. This creates some interesting challenges when creating new library web services that need JSON output.
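To make the distinction concrete, here is a minimal sketch in Python that takes one set of Dublin Core fields and writes it out in two serializations. The record values, the `record` wrapper element, and the function names are my own assumptions for illustration; this is not an established JSON or XML mapping for Dublin Core.

```python
import json
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical record: the metadata format (Dublin Core) fixes the field names.
record = {
    "title": "What do we mean when we say format?",
    "creator": "Karen Coombs",
    "type": "Text",
}


def to_json(rec):
    """Serialize the Dublin Core fields as a flat JSON object."""
    return json.dumps(rec, sort_keys=True)


def to_xml(rec):
    """Serialize the same fields as namespaced XML elements.

    The <record> wrapper is an assumption made for this sketch.
    """
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")
    for field, value in rec.items():
        child = ET.SubElement(root, "{%s}%s" % (DC_NS, field))
        child.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_json(record))
    print(to_xml(record))
```

Both outputs carry the same fields. Only the serialization differs, which is why "I want JSON" on its own leaves the question of fields unanswered.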

The issue of outputting multiple serializations seems a little less thorny for Google, which in my opinion handles this in a pretty parsimonious way. Basically, you always get Atom as the metadata format from them. However, you can get it back as XML or JSON. Here are a couple of examples using the LITA Event Google Calendar:

Google Calendar as Atom/XML

Google Calendar as Atom/JSON

But what if Google wanted to make an iCal version of this data available too? You can get an iCal of your whole calendar, but looking at the documentation, it doesn't seem like one can retrieve the results of a specific web service query in iCal format.

The other interesting thing about Google's APIs is that they embed their own (GData) metadata format within the Atom. Embedding additional schemas is valid in many metadata formats, which makes it a little challenging to know what you're getting back as a developer. And because you can declare namespaces wherever you want in an XML document, one can't just look at the root element to find out what you're dealing with. Ultimately you need to apply some human smarts and look at the document before you write code to consume it.
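As a rough illustration of why the root element isn't enough, the sketch below walks a whole document and collects every namespace declared anywhere in it. The sample feed is a made-up fragment, not real Google Calendar output, and `namespaces_in` is a hypothetical helper name.

```python
import io
import xml.etree.ElementTree as ET

# Made-up sample: an Atom feed with a second namespace declared deep inside.
sample = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example calendar</title>
  <entry>
    <title>Example event</title>
    <gd:when xmlns:gd="http://schemas.google.com/g/2005" startTime="2009-01-01"/>
  </entry>
</feed>"""


def namespaces_in(xml_text):
    """Return the set of namespace URIs declared anywhere in the document."""
    found = set()
    source = io.BytesIO(xml_text.encode("utf-8"))
    # "start-ns" events fire for every xmlns declaration, not just the root's.
    for _event, (_prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        found.add(uri)
    return found


if __name__ == "__main__":
    print(ET.fromstring(sample).tag)  # the root only reveals the Atom namespace
    print(sorted(namespaces_in(sample)))
```

A scan like this only tells you which vocabularies are present. Working out what the embedded elements mean still takes a person reading the document and the documentation.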

The one helpful thing is that if a web service uses standard metadata formats, then you'll get some consistency of practice. The web services at OCLC that use SRU are all very similar, which gives one a leg up. Each tries to use standard metadata formats where it can. However, some, like Identities and Registry, have their own schemas, which were developed by OCLC because no existing standard matched the needs of these web services.

Every time an organization exposes a new web service, it has to go through the process of thinking about what the appropriate metadata formats and serializations are for that service.

  • Karen Coombs

    Senior Product Analyst