xml2txt Documentation

Author: Paul Tremblay
Version: .6
Date: 2006-12-16
Copyright: 2002 Paul Henry Tremblay

Overview

The script xml2txt converts a valid XML document to formatted text. You would use this script if you needed to convert XML to a plain text document and you needed the text formatted in a special way.

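The script offers formatting options for text width, space normalization, indentation, blank lines, borders, boxes, and tables. Each option is described below.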

For the script to work, you must first convert your XML to XML that is valid according to the DTD provided with this script. This first step requires some type of XSLT transformation. When you then run the resulting XML through the script, you will get formatted text.

Installation

  1. Download and install Python, found at www.python.org.

  2. Download and install the PyXml module, found at http://pyxml.sourceforge.net/.

    The PyXml module contains the XML tools for Python, including SAX, on which xml2txt relies heavily. My latest Linux installation included PyXml, so you may not have to download it.

  3. Download xml2txt from https://sourceforge.net/projects/xml2txt/

  4. Unpack the tarball:

    tar -xvzf xml2txt.5.tar.gz
    
  5. Move to the newly created directory:

    cd xml2txt.5
    
  6. Install the way you would with any python module:

    python setup.py install
    

    This command installs around half a dozen modules in the default Python library location, and an executable script, also in a default location. If you don't have write permission, you will need to use the --prefix option.

  7. Test the script by changing to the test_files directory. Type:

    python test.py
    

    You should see no message if everything is installed and running correctly.

Usage

The script xml2txt can read from standard input or from a file. It can write to a file or to standard output. Type:

xml2txt --help

to see the command line options.

If you want to output to a file, use the --output option.

The script xml2txt does a rudimentary check for validity. If the input XML is not valid, it will quit.
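
If you want to call the script from Python, the usage above can be sketched roughly as follows. This is only a sketch: it assumes that xml2txt is on your PATH and that --output takes the output file name as its argument (check xml2txt --help for the exact syntax). The function names build_command and convert are hypothetical and are not part of xml2txt.

```python
import subprocess


def build_command(output_path):
    """Build the argument list (assumes --output takes a file name)."""
    return ["xml2txt", "--output", output_path]


def convert(input_path, output_path):
    """Feed input_path to xml2txt on standard input; return the exit status."""
    with open(input_path, "rb") as source:
        result = subprocess.run(build_command(output_path), stdin=source)
    # Assumption: a nonzero status means the script quit early, for
    # example because the input failed the validity check.
    return result.returncode
```

Calling convert("everything.xml", "everything.txt") would then read the first file and ask the script to write the second.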

Creating a Valid XML Document

In order to transform your XML to formatted text, you must first transform it to a valid XML document according to the DTD found at:

http://xml2txt.sourceforge.net/dtd/text-object.dtd

If you want concrete examples of valid XML documents, look in the directory test_files. The document "everything.xml" contains examples of every type of formatting possible, with examples and explanations interspersed.

A valid document is relatively simple. It contains no namespaces. The root element is <doc>, and there are only 4 possible child elements: <page-specs>, <block>, <box>, and <table>.
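
To make the structure concrete, here is a small sketch that builds a minimal document with Python's standard xml.etree.ElementTree module and checks the top-level children against the four names listed above. The helper names and the sample values are hypothetical, and the check is not a substitute for validating against the DTD.

```python
import xml.etree.ElementTree as ET

# The four child elements of <doc> named in the text above.
ALLOWED_CHILDREN = {"page-specs", "block", "box", "table"}


def build_minimal_doc(text, body_width=60):
    """Return a <doc> element with one <page-specs> and one <block>."""
    doc = ET.Element("doc")
    ET.SubElement(doc, "page-specs", {"body-width": str(body_width)})
    block = ET.SubElement(doc, "block")
    block.text = text
    return doc


def unexpected_children(doc):
    """List the tags of top-level children that are not allowed."""
    return [child.tag for child in doc if child.tag not in ALLOWED_CHILDREN]


doc = build_minimal_doc("Hello, world.")
print(ET.tostring(doc, encoding="unicode"))
print(unexpected_children(doc))
```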

The page-specs element

The page-specs element contains no child elements and can contain two attributes, body-width and normalize-space. This element is optional.

body-width:

Set this attribute to an integer indicating the width of your text in characters. The width can be changed for individual blocks as well.

normalize-space:
 

Set this to "true" or "false". The default is "true" and will result in newlines and extra space being eliminated within the block. Several short lines with new lines at the end will become one long line of the proper length. Blank lines will be eliminated. 2 or more contiguous spaces will become one space.

This option can be changed for local blocks of text.
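
The effect of body-width together with normalize-space="true" can be approximated in a few lines of Python. This is a sketch built on the standard textwrap module, not the code that xml2txt itself uses, and the function names are hypothetical.

```python
import textwrap


def normalize_space(text):
    """Collapse newlines, blank lines and runs of spaces into single spaces."""
    return " ".join(text.split())


def format_block(text, body_width=40):
    """Normalize the space in text, then wrap it to body_width characters."""
    return textwrap.fill(normalize_space(text), width=body_width)


sample = "Several short\nlines   with\n\nextra  space."
print(format_block(sample, body_width=20))
```

The short lines, the blank line, and the doubled spaces in the sample all disappear before the text is wrapped to the requested width.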

The block element

The block element contains no child elements. Text can only appear inside a block element. Formatting of text is determined by the attributes. The following attributes, all optional, are possible:

body-width:

Set this attribute to an integer indicating the width of your text in characters.

normalize-space:
 

Set this to "true" or "false". The default is "true" and will result in newlines and extra space being eliminated within the block. Several short lines with new lines at the end will become one long line of the proper length. Blank lines will be eliminated. 2 or more contiguous spaces will become one space.

This option can be changed for local blocks of text.

literal:

This attribute only takes the value of "true", which indicates that no formatting will take place at all. The block of text will be copied as is.

space-before:

Use an integer to set how many spaces you want before your first line of text.

space-after:

Use an integer to set how many spaces you want after your last line of text.

left-indent:

Use an integer to set how many spaces you want before each line of text on the left side.

right-indent:

Use an integer to set how many spaces you want after each line of text on the right side.

new-lines-before:
 

Use an integer to set how many new lines (blank lines) you want before your block of text. If you set the number to 0, then the block of text will appear next to the preceding block of text, which is sometimes what you want. The default is 2.

new-lines-after:
 

Use an integer to set how many new lines (blank lines) you want after your block of text. If you set the number to 0, then the block of text will appear next to the following block of text, which is sometimes what you want. The default is 2.

first-line-indent:
 

Use an integer to set how much space you want before the first line of text. You can use this attribute in combination with the left-indent option. If left-indent is set to 5, and you set first-line-indent to 6, then the first line will have 11 spaces before it.

You can also set this number to a negative value to get a hanging indent. For example, if you set left-indent to 10 and first-line-indent to -10, then the first line will have no spaces before it, but the following lines will be indented 10 spaces.
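
The arithmetic of left-indent and first-line-indent can be sketched with Python's textwrap module. This is an illustration only: the function name is hypothetical, and it assumes that body-width counts the indent as part of the line, which may not match what xml2txt does.

```python
import textwrap


def indent_block(text, body_width=40, left_indent=0, first_line_indent=0):
    """Wrap text with a left indent and a relative first-line indent."""
    # The first-line indent is added to the left indent, so a negative
    # value pulls the first line back out (a hanging indent).
    first = " " * max(left_indent + first_line_indent, 0)
    rest = " " * left_indent
    return textwrap.fill(
        " ".join(text.split()),
        width=body_width,
        initial_indent=first,
        subsequent_indent=rest,
    )


print(indent_block("one two three four five six", body_width=20,
                   left_indent=10, first_line_indent=-10))
```

With left_indent=10 and first_line_indent=-10, the first line starts at the margin and the remaining lines are indented 10 spaces.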

top-border:

Use text to determine the border you want before your block of text.

top-border-length:
 

This option determines the length of the top border. It can take an integer to determine the number of characters, the word "text" to indicate the length of the first line in the block, or a combination of both.

<block top-border="#" top-border-length="3">Usage</block>

Result is:

###
Usage

<block top-border="#" top-border-length="text">Usage</block>

Result is:

#####
Usage

<!--set the top border 5 characters longer than the length of text-->
<block top-border = "#" top-border-length = "text { + 5}"> Usage</block>

Result is:

##########
Usage

bottom-border:

Use text to determine the border you want after your block of text.

bottom-border-length:
 

This option determines the length of the bottom border. It can take an integer to determine the number of characters, the word "text" to indicate the length of the first line in the block, or a combination of both. See the above example on top-border-length.
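
The three forms of the border length value can be worked through in Python. The sketch below is not taken from the xml2txt source. It assumes that the combined form looks like the "text { + 5}" example above, that a minus sign is also accepted, and that a border string longer than one character is repeated and then cut to length. The function names are hypothetical.

```python
import re


def border_length(spec, first_line):
    """Resolve a value such as "3", "text" or "text { + 5}" to a length."""
    spec = spec.strip()
    if spec.isdigit():
        return int(spec)
    match = re.fullmatch(r"text\s*(?:\{\s*([+-])\s*(\d+)\s*\})?", spec)
    if match is None:
        raise ValueError("unrecognized border length: %r" % spec)
    length = len(first_line)
    if match.group(1):
        offset = int(match.group(2))
        length += offset if match.group(1) == "+" else -offset
    return max(length, 0)


def make_border(border, spec, first_line):
    """Repeat the border string until it reaches the resolved length."""
    length = border_length(spec, first_line)
    if not border or length == 0:
        return ""
    repeats = length // len(border) + 1
    return (border * repeats)[:length]


for spec in ("3", "text", "text { + 5}"):
    print(make_border("#", spec, "Usage"))
```

For the text "Usage", the three values give borders of 3, 5, and 10 characters, which matches the results shown in the top-border-length examples.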

left-border:

Use a text string to determine what will appear as a border on the left of the block.

left-padding:

Use an integer to set how much space should appear between the border and the text.

right-border:

Use a text string to determine what will appear as a border on the right of the block.

right-padding:

Use an integer to set how much space should appear between the border and the text.

The box element

The box should have one or more block elements as children. Use the <box> element to create borders around one or more blocks of text.

The box element uses attributes to determine the type, padding, and length of borders.

top-border:

Use a string of text to determine the type of border for the top of the box.

top-padding:

Use an integer to determine the space between the top border and the first block.

bottom-border:

Use a string of text to determine the type of border for the bottom of the box.

bottom-padding:

Use an integer to determine the space between the bottom border and the last block.

left-border:

Use a string of text to determine the type of border for the left of the box.

left-padding:

Use an integer to determine the space between the left border and the blocks.

right-border:

Use a string of text to determine the type of border for the right of the box.

right-padding:

Use an integer to determine the space between the right border and the blocks.

new-lines-after:

Use an integer to determine the number of new lines after the box.

top-border-length:

This attribute works exactly as it does for the block element.

bottom-border-length:

This attribute works exactly as it does for the block element.
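
To show how the border and padding attributes of a box fit together, here is a rough Python sketch that draws a box around some lines of text. It is an illustration under assumptions, not the layout code of xml2txt: the function name is hypothetical, the top and bottom borders are simply made as wide as the box, and the exact output of the script may differ.

```python
def draw_box(lines, top_border="-", bottom_border="-",
             left_border="|", right_border="|",
             left_padding=1, right_padding=1,
             top_padding=0, bottom_padding=0):
    """Surround lines of text with borders and padding."""
    inner = max((len(line) for line in lines), default=0)
    # Padding above and below the text is modelled as empty lines.
    body = [""] * top_padding + list(lines) + [""] * bottom_padding
    rows = [
        left_border + " " * left_padding + line.ljust(inner)
        + " " * right_padding + right_border
        for line in body
    ]
    total = len(rows[0]) if rows else 0

    def rule(border):
        # Repeat the border string and cut it to the width of the box.
        return (border * total)[:total] if border else ""

    return [rule(top_border)] + rows + [rule(bottom_border)]


print("\n".join(draw_box(["Usage", "text"], top_padding=1)))
```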

The table element

The <table> element can contain only <row> elements. The <table> element must have the attribute columns, whose value is a comma-separated list of the widths of the columns in the table.
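
As a small illustration of the columns value, the following sketch turns a string such as "10,20,30" into a list of integer widths. The function name is hypothetical, and the rule that every width must be positive is an assumption, not something stated by the DTD.

```python
def parse_columns(value):
    """Turn a columns value such as "10,20,30" into a list of widths."""
    widths = [int(part) for part in value.split(",")]
    # Assumption: a zero or negative column width makes no sense.
    if any(width <= 0 for width in widths):
        raise ValueError("column widths must be positive: %r" % value)
    return widths


widths = parse_columns("10,20,30")
print(len(widths), "columns,", sum(widths), "characters in total")
```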

The row element

The <row> element can contain only <cell> elements.

It takes the following optional attributes:

bottom-border:

Can only take a value of "header". The value "header" indicates that the bottom border should consist of "=".

bottom-border-length:

Takes a series of numbers separated by commas. Each number identifies a cell that gets a bottom border. So if you wanted a bottom border only under the second and third cells, you would write: <row bottom-border-length = "2,3">

The cell element

The <cell> element can contain only <block> elements.

It takes the following optional attributes:

add-columns:

Takes a numeric value. If you want the current cell to span one additional column to the right, use a value of "2". (The cell then spans 2 columns.) I know this doesn't agree with the reStructuredText XML, so I may have to change it. (The reStructuredText XML uses a value of "1" where I use "2".)
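
The nesting of <table>, <row>, <cell>, and <block> can be sketched with Python's standard xml.etree.ElementTree module. The helper add_row and the sample text are hypothetical. The attribute names come from the descriptions above, but you should still validate the result against the DTD.

```python
import xml.etree.ElementTree as ET


def add_row(table, cells, header=False):
    """Append a <row>; each cell is a (text, add_columns or None) pair."""
    attrs = {"bottom-border": "header"} if header else {}
    row = ET.SubElement(table, "row", attrs)
    for text, add_columns in cells:
        cell_attrs = {"add-columns": str(add_columns)} if add_columns else {}
        cell = ET.SubElement(row, "cell", cell_attrs)
        # Text may only appear inside a <block>, so wrap it in one.
        block = ET.SubElement(cell, "block")
        block.text = text
    return row


doc = ET.Element("doc")
table = ET.SubElement(doc, "table", {"columns": "10,20"})
add_row(table, [("Name", None), ("Description", None)], header=True)
# One cell with add-columns="2", meant to span both columns.
add_row(table, [("A note that spans the table", 2)])
print(ET.tostring(doc, encoding="unicode"))
```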