
Navigating JDOM Trees

BAGE 2008. 5. 8. 14:49

Once you’ve parsed a document and formed a Document object, you’ll probably want to search it for the parts your program is interested in. In JDOM, most navigation takes place through the methods of the Element class. The complete content of each Element (child elements, text, comments, and so on) is available as a java.util.List returned by the getContent() method. Just the child elements of each Element are available in a java.util.List returned by the getChildren() method. (Yes, the terminology is a little confusing here. This is a case where JDOM is marching out of step with the rest of the XML world. JDOM uses the word children to refer only to child elements.)
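The difference between the two methods is easiest to see on a small tree built in memory. The following is only a sketch, assuming the JDOM 1.x API (the org.jdom packages used throughout this chapter); the class name ContentVersusChildren and the sample element are made up for illustration.

```java
import org.jdom.Comment;
import org.jdom.Element;

public class ContentVersusChildren {

  // Builds <para>Hello <!--note--><emphasis>world</emphasis></para> in memory.
  // The element names are hypothetical; any mixed content would do.
  public static Element buildSample() {
    Element para = new Element("para");
    para.addContent("Hello ");                 // stored as a Text node
    para.addContent(new Comment("note"));      // stored as a Comment node
    para.addContent(new Element("emphasis").setText("world"));
    return para;
  }

  public static void main(String[] args) {
    Element para = buildSample();
    // getContent() sees the text, the comment, and the element.
    System.out.println("getContent():  " + para.getContent().size());
    // getChildren() sees only the emphasis element.
    System.out.println("getChildren(): " + para.getChildren().size());
  }

}
```

For this sample, getContent() reports three items and getChildren() reports one.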

Because JDOM uses the Java Collections API to manage the tree, it is simultaneously too polymorphic (everything’s an object and must be cast to the right type before you can use it) and not polymorphic enough (there’s no useful generic interface or superclass for navigation such as DOM’s Node interface). Consequently, you’re going to find yourself doing numerous tests with instanceof and casting to the determined type. This is far and away my least favorite part of JDOM’s design. Furthermore, there’s no standard traversal API as there is in DOM to help you avoid reinventing the wheel every time you need to walk a tree or iterate a document. There is a Filter interface that can simplify some of the polymorphism and casting issues a little, but it still won’t let you walk more than one level down the tree at a time.
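To make the Filter remark concrete, here is a minimal sketch that assumes JDOM 1.x, where Element has a getContent(Filter) overload and org.jdom.filter.ElementFilter selects elements by name. The class name FilterDemo and the sample tree are hypothetical. Note that the filter only looks at direct content: the para nested inside note is not returned.

```java
import org.jdom.Comment;
import org.jdom.Element;
import org.jdom.filter.ElementFilter;
import java.util.List;

public class FilterDemo {

  // Builds <sect1>intro<!--draft--><para/><note><para/></note></sect1>.
  public static Element buildSample() {
    Element sect = new Element("sect1");
    sect.addContent("intro");
    sect.addContent(new Comment("draft"));
    sect.addContent(new Element("para"));
    Element note = new Element("note");
    note.addContent(new Element("para"));   // a grandchild of sect1
    sect.addContent(note);
    return sect;
  }

  // Returns the direct content of parent that is an element named "para".
  // Every item in the result is an Element, so no instanceof test is needed,
  // but the filter does not descend into note.
  public static List directParas(Element parent) {
    return parent.getContent(new ElementFilter("para"));
  }

  public static void main(String[] args) {
    List paras = directParas(buildSample());
    System.out.println("Direct para children: " + paras.size());
  }

}
```

To reach the nested para you would still have to recurse yourself, which is what the examples in this section do.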

Let’s begin with Example 14.9, a simple program that reads a document and prints the names of the elements in that document, nicely indented to show the hierarchy. Pay special attention to the listChildren() method. This recursive method is the key to the whole program.

Example 14.9. A JDOM program that lists the elements used in a document

import org.jdom.*;
import org.jdom.input.SAXBuilder;
import java.io.IOException;
import java.util.*;


public class ElementLister {

  public static void main(String[] args) {
  
    if (args.length == 0) {
      System.out.println("Usage: java ElementLister URL"); 
      return;
    } 
      
    SAXBuilder builder = new SAXBuilder();
     
    try {
      Document doc = builder.build(args[0]);
      Element root = doc.getRootElement();
      listChildren(root, 0);      
    }
    // indicates a well-formedness error
    catch (JDOMException e) { 
      System.out.println(args[0] + " is not well-formed.");
      System.out.println(e.getMessage());
    }  
    catch (IOException e) { 
      System.out.println(e);
    }  
  
  }
  
  
  public static void listChildren(Element current, int depth) {
   
    printSpaces(depth);
    System.out.println(current.getName());
    List children = current.getChildren();
    Iterator iterator = children.iterator();
    while (iterator.hasNext()) {
      Element child = (Element) iterator.next();
      listChildren(child, depth+1);
    }
    
  }
  
  private static void printSpaces(int n) {
    
    for (int i = 0; i < n; i++) {
      System.out.print(' '); 
    }
    
  }

}

The main() method simply parses a document and passes its root element to the listChildren() method along with a depth of zero. The listChildren() method indents a number of spaces equal to the depth in the hierarchy. Then it prints the name of the current element. Next it asks for a list of the children of that element by invoking getChildren(). This returns a java.util.List from the Java Collections API. This list is live. That is, any changes you make to it will be reflected in the original Element. However, this program does not take advantage of that. Instead, it retrieves a java.util.Iterator object using the iterator() method. Then it iterates through the list. Since each item in the list is known to be a JDOM Element object, each item returned by next() can be safely cast to Element and passed recursively to listChildren(). Other than the knowledge that each object in the list is an Element, every step is just standard list iteration from the Java Collections API. Internally, JDOM is actually using a special package-private implementation of the List interface, org.jdom.ContentList; but you don’t need to know this. Everything you need to do can be done through the documented java.util.List interface.

Here’s the beginning of the output when this program is run across this chapter’s source code:

D:\books\XMLJAVA\examples\14>java ElementLister file://D/books/XMLJava/jdom.xml
chapter
 title
 caution
  para
 para
 itemizedlist
  listitem
   para
  listitem
   para
  listitem
   para
  listitem
   para
 para
 para
 caution
  para
 sect1
  title
  para
  blockquote
  …
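ElementLister only reads the list returned by getChildren(), but as noted above that list is live. The following sketch, again assuming JDOM 1.x, shows what that means in practice: changes made through the list show up in the element itself. The class name LiveListDemo and the element names are invented for illustration.

```java
import org.jdom.Element;
import java.util.List;

public class LiveListDemo {

  // Builds <root><a/><b/></root>, then edits it only through the live list.
  public static Element editThroughList() {
    Element root = new Element("root");
    root.addContent(new Element("a"));
    root.addContent(new Element("b"));

    List children = root.getChildren();
    children.remove(0);                 // detaches <a/> from root
    children.add(new Element("c"));     // attaches <c/> to root
    return root;
  }

  public static void main(String[] args) {
    Element root = editThroughList();
    // A fresh call to getChildren() reflects both changes.
    System.out.println("Children now: " + root.getChildren().size());
    System.out.println("Still has a? " + (root.getChild("a") != null));
    System.out.println("Has c?       " + (root.getChild("c") != null));
  }

}
```

After the edits the root holds b and c, even though no method of Element other than getChildren() was called to change it.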

The getChildren() method only returns elements. It misses everything else completely. For instance, it doesn’t report comments, processing instructions, or text nodes. To get this material you need to use the getContent() method, which returns everything. However, this makes life a little trickier because you can no longer assume that everything in the list returned is an Element. You’ll probably need to use a big tree of if (o instanceof Element) { … } else if (o instanceof Text) { … in order to choose the processing to perform on each member of the list. Example 14.10 demonstrates with a simple program that recursively lists all the nodes used in the document. Elements are identified by their name. All other items are identified just by their types.

Example 14.10. A JDOM program that lists the nodes used in a document

import org.jdom.*;
import org.jdom.input.SAXBuilder;
import java.io.IOException;
import java.util.*;


public class NodeLister {

  public static void main(String[] args) {
  
    if (args.length == 0) {
      System.out.println("Usage: java NodeLister URL"); 
      return;
    } 
      
    SAXBuilder builder = new SAXBuilder();
     
    try {
      Document doc = builder.build(args[0]);
      listNodes(doc, 0);      
    }
    // indicates a well-formedness error
    catch (JDOMException e) { 
      System.out.println(args[0] + " is not well-formed.");
      System.out.println(e.getMessage());
    }  
    catch (IOException e) { 
      System.out.println(e);
    }  
  
  }
  
  
  public static void listNodes(Object o, int depth) {
   
    printSpaces(depth);
    
    if (o instanceof Element) {
      Element element = (Element) o;
      System.out.println("Element: " + element.getName());
      List children = element.getContent();
      Iterator iterator = children.iterator();
      while (iterator.hasNext()) {
        Object child = iterator.next();
        listNodes(child, depth+1);
      }
    }
    else if (o instanceof Document) {
      System.out.println("Document");
      Document doc = (Document) o;
      List children = doc.getContent();
      Iterator iterator = children.iterator();
      while (iterator.hasNext()) {
        Object child = iterator.next();
        listNodes(child, depth+1);
      }
    }
    else if (o instanceof Comment) {
      System.out.println("Comment");
    }
    // CDATA is a subclass of Text so this test must come
    // before the test for Text.
    else if (o instanceof CDATA) {
      System.out.println("CDATA section");
    }
    else if (o instanceof Text) {
      System.out.println("Text");
    }
    else if (o instanceof EntityRef) {
      System.out.println("Entity reference");
    }
    else if (o instanceof ProcessingInstruction) {
      System.out.println("Processing Instruction");
    }
    else {  // This really shouldn't happen
      System.out.println("Unexpected type: " + o.getClass());
    }
    
  }
  
  private static void printSpaces(int n) {
    
    for (int i = 0; i < n; i++) {
      System.out.print(' '); 
    }
    
  }

}

Here’s the beginning of the output when this program is run across this chapter’s source code:

D:\books\XMLJAVA\examples\14>java NodeLister file://D/books/XMLJava/jdom.xml
Document
 Element: chapter
  Text
  Element: title
   Text
  Text
  Element: caution
   Text
   Element: para
    Text
   Text
  Text
  Element: para
  …

The only pieces that are missing here are the attributes and namespaces associated with each element. These are not included by either getContent() or getChildren(). If you want them, you have to ask for them explicitly using the getAttributes(), getNamespace(), getAdditionalNamespaces(), and related methods of the Element class.
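As a rough illustration of asking for these explicitly, the sketch below summarizes one element’s namespace, its additional namespace declarations, and its attributes. It assumes JDOM 1.x; the class name AttributeLister, the describe() method, the sample element, and the http://example.com/db namespace URI are all hypothetical.

```java
import org.jdom.Attribute;
import org.jdom.Element;
import org.jdom.Namespace;
import java.util.Iterator;

public class AttributeLister {

  // Builds a db:chapter element with one extra namespace declaration
  // and one attribute. The db namespace URI is a placeholder.
  public static Element buildSample() {
    Element chapter = new Element("chapter", "db", "http://example.com/db");
    chapter.addNamespaceDeclaration(
        Namespace.getNamespace("xlink", "http://www.w3.org/1999/xlink"));
    chapter.setAttribute("id", "ch14");
    return chapter;
  }

  // Returns a one-line summary: qualified name, the element's own
  // namespace URI, additional declarations, then attributes.
  public static String describe(Element element) {
    StringBuilder sb = new StringBuilder();
    sb.append(element.getQualifiedName());
    sb.append(" {").append(element.getNamespaceURI()).append("}");

    Iterator namespaces = element.getAdditionalNamespaces().iterator();
    while (namespaces.hasNext()) {
      Namespace ns = (Namespace) namespaces.next();
      sb.append(" xmlns:").append(ns.getPrefix()).append("=").append(ns.getURI());
    }

    Iterator attributes = element.getAttributes().iterator();
    while (attributes.hasNext()) {
      Attribute attribute = (Attribute) attributes.next();
      sb.append(" ").append(attribute.getQualifiedName())
        .append("=").append(attribute.getValue());
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(describe(buildSample()));
  }

}
```

A method like describe() could be called from listNodes() in Example 14.10 in place of the plain element.getName() call if you want this detail in the listing.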

In the next chapter, we’ll look more closely at the classes of objects that appear when you’re navigating a JDOM tree (Element, Text, Comment, etc.) and what you can learn from each one.