F77xml library


News
Introduction
Description
Download
ChangeLog
License
FAQ
Contact

News

2007-02-13: version 1.0 is out. Download it here.
The library is no longer maintained. I have too many things to do unfortunately, and I cannot keep working on the library unless it's really needed.

2005-01-13: version 1.0beta is out. Download it here.
The improvements are too many to list. They include support for Fortran 90, the full DOM2 API and XPath. F77/F90xml is the future reference library for integration and communication of Quantum Chemistry packages under the COST D23 project "A meta-laboratory for code integration in ab-initio methods". See the AbiGrid website for more information.

Documentation for the library is provided in the package. I really need a hand with the cleanup. The tasks are simple: cleaning up the documentation and the... well, website, and providing examples. Some skill in packaging and autotools is useful. I'm not really good with autotools, so I can't guarantee a fully compliant package. If you can do better, you are very welcome.

2004-06-07: After a computing odyssey, version 1.0alpha is out. Check it out.
Yes, I need help. Are you a talented F77 programmer, not scared of using my library and writing a program no longer than 60 lines of code? Then this announcement is for you.
I need a testsuite for the library, now that the API is stable. The test directory already holds various tests for different calls. If you have time to spend and want to contribute, please consider this. The library code is trivial, but the testsuite is the most time-consuming task. Subscribe to the mailing list or email me directly. You are welcome!

2004-05-03: As reported on the mailing list, the project has been forced to stop because my laptop was stolen. I have a backup of the last pre-1.0 release, but it's currently on my digital camera. Since the cable (which was also stolen) seems to be proprietary, I'm forced to wait for a replacement cable from AIPTEK (whom I wish to thank: they'll send me another cable at no cost) and for another laptop (I have problems accessing the camera using older versions of Linux).

So please pardon me for the delay. I'm doing everything possible to release version 1.0 as soon as I can.

As a reminder, take a look at the draft for the 1.0 release.

2004-03-29: A mailing list for the project has been opened. You can subscribe using this web form.

Introduction

These days, XML plays a central role across Internet infrastructure. Grid computing is a reality.
At the same time, a lot of calculations in scientific fields such as physics, chemistry and astronomy are still bound to Fortran (90 in the best cases, 77 or even 66 in the worst ones, and I'm not joking, believe me) for reasons that range from compatibility, history, library availability and efficiency to, last but most important, the human factor.
Lots of scientific developers need to cope with code from the past. Given that they are not (usually) computer scientists, they learn only the one language they need in their daily job: Fortran.

Given these premises, the lack of a Fortran library to handle XML is a problem. (Indeed, input handling in Fortran is conceptually a pain, and only with F90 did the namelist become a standard, language-driven way of reading plain-text parameters.) Moreover, even where input/output wrappers are needed to convert from XML to namelists (or vice versa), the human factor keeps people on Fortran instead of more advanced languages or resources such as Python, Perl or <place your favourite language here>.

All the currently available solutions for reading xml from Fortran are targeted to SAX parsing, using F90 as a (complete) programming language. This produces libraries that are
The first five points are indeed critical. You'll have no hope for advanced features. Fortran has limits in strings, file handling and object management. Period.
The last is more controversial. Most people don't care, since there are lots of professional F90 compilers. Some of them are free (as in beer), but you never know how long they will remain so. Scientific computing is moving to Linux and free software for economic and technical reasons, but the lack of a free (as in speech) Fortran 90 compiler is a problem. Also, most of the codes that exist today are still in F77, and the groups that still develop them usually choose either to stick to pure F77 or to mix F77 and F90. They never port everything to F90.

So we need a library that provides DOM parsing, sticks to F77, is extensible and compiles with GNU g77. In pure F77, this is impossible. In C it becomes more feasible.

Description


NOTE: These notes are outdated. They refer to the 0.x versions. A new architecture is currently being planned and discussed for version 1.0. If you want to participate, read the draft and join the mailing list.

F77xml is a C library designed to provide DOM parsing functionality to Fortran 77.
It acts as a wrapper around the gdome2 library. At the moment the API is very unstable, the code is probably full of memory leaks, and there are some conceptual problems still to be faced, but you can already read an XML file and add elements, text or attributes to it.

The main problem I face in developing this library is sticking to a maximum of 6 characters for each function name. This is a major problem when you need to map, for example, Element::firstChild to something that is still comprehensible in 6 characters. A lot of namespace pollution problems arise as well. How to solve this?
It's not black magic. I introduced two concepts in the usage of the library:
  1. signatures
  2. multiplexers
"Signature" is a well-known term for anyone accustomed to C++, for example, and the meaning here is very close. Consider these functions from gdome2:

GdomeNode*  gdome_el_firstChild  (GdomeElement *self, GdomeException *exc);
GdomeNode* gdome_el_appendChild (GdomeElement *self, GdomeNode *newChild, GdomeException *exc);


Here you can see an analogy. firstChild accepts an element and returns a node and an error. appendChild accepts an element, a node and an error, and returns a node (in fact the same one as newChild, so the return value is redundant and can be neglected). In conclusion, if we consider a node as having a "code" (in C terms, its pointer; in F77 terms, we'll see later), both functions need to handle two codes and one error. They have signature cce (code, code, error). It doesn't matter that in the first case the parameters are (input, output, output) and in the second (input, input, output). They are simply parameters, passed by reference, and can thus act as both input and output.
If you take, for example
void gdome_el_setAttribute (GdomeElement *self, GdomeDOMString *name, GdomeDOMString *value, GdomeException *exc);

the signature here is csse (code, string, string, error), which is different from the previous cases.
The first cases are named "p3" (meaning three parameters), the second "p4" (four parameters). Please note that there isn't always a straightforward match between gdome's choice of parameters and the F77xml signature. In other words, given a gdome function, you cannot derive the F77xml signature a priori, or vice versa.
Also, please note that we still need to distinguish signature "cce" from "sce". Both have three parameters, but of different types. For this reason, the first are "p3t1" signature functions and the second "p3t2". We prepend the letter "x" to obtain the name of the associated multiplexer.
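The naming rule above can be sketched as a small lookup in C. This is only an illustration, not code from the library: the table holds just the three signatures quoted in the text, and the names sig_table and multiplexer_name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One row per signature. The type index separates signatures that share
 * a parameter count. Only the rows quoted in the text are listed. */
struct sig_entry {
    const char *signature;   /* c = code, s = string, e = error */
    int         type_index;  /* the N in "tN" */
};

static const struct sig_entry sig_table[] = {
    { "cce",  1 },   /* p3t1: firstChild, appendChild */
    { "sce",  2 },   /* p3t2 */
    { "csse", 1 },   /* p4t1: setAttribute */
};

/* Writes the multiplexer name (for example "xp3t1") into out.
 * Returns 0 on success, -1 if the signature is unknown or out is too small. */
int multiplexer_name(const char *signature, char *out, size_t out_size)
{
    assert(signature != NULL && out != NULL);
    for (size_t i = 0; i < sizeof sig_table / sizeof sig_table[0]; i++) {
        if (strcmp(sig_table[i].signature, signature) == 0) {
            /* "x" + "p<parameter count>" + "t<type index>" */
            int n = snprintf(out, out_size, "xp%zut%d",
                             strlen(signature), sig_table[i].type_index);
            return (n > 0 && (size_t)n < out_size) ? 0 : -1;
        }
    }
    return -1;
}
```

Calling multiplexer_name("cce", buf, sizeof buf) fills buf with "xp3t1". The type index has to come from a table because nothing in the signature string itself says which of the three-parameter variants was numbered first.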

What is a multiplexer? We still need to solve the problem of calling so many different functions while keeping pollution of the Fortran namespace low. So why not pass the function name as a parameter? This is F77xml's solution. To call, for example, firstChild you need to write
func="Element::firstChild"
call xp3t1(func, ...)

and to call setAttribute, simply
func="Element::setAttribute"
call xp4t1(func, ...)
(Obviously, func is a character*n variable.)
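On the C side, a multiplexer of this kind might look roughly like the sketch below. It is not the library's code. The argument list after func is elided in the calls above, so the (code, code, error) list used here is an assumption taken from the cce signature. The handlers are mock stand-ins for the real gdome2 wrappers, and the error value -1 is made up. The sketch relies on the usual g77 conventions: a trailing underscore on the symbol name, all arguments passed by reference, and the length of each character argument appended as a hidden by-value integer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*p3t1_handler)(int *code1, int *code2, int *err);

/* Mock handlers: the real ones would call gdome2. */
static void mock_first_child(int *self, int *child, int *err)
{
    *child = *self + 1;   /* pretend the child's code follows the parent's */
    *err = 0;
}

static void mock_append_child(int *self, int *new_child, int *err)
{
    (void)self;
    (void)new_child;
    *err = 0;
}

static const struct { const char *name; p3t1_handler fn; } p3t1_table[] = {
    { "Element::firstChild",  mock_first_child  },
    { "Element::appendChild", mock_append_child },
};

/* Fortran strings are blank-padded, not NUL-terminated. */
static size_t trimmed_len(const char *s, int len)
{
    while (len > 0 && s[len - 1] == ' ')
        len--;
    return (size_t)len;
}

/* Fortran side (hypothetical): call xp3t1(func, code1, code2, err) */
void xp3t1_(const char *func, int *code1, int *code2, int *err, int func_len)
{
    assert(func != NULL && code1 != NULL && code2 != NULL && err != NULL);
    size_t n = trimmed_len(func, func_len);
    for (size_t i = 0; i < sizeof p3t1_table / sizeof p3t1_table[0]; i++) {
        const char *name = p3t1_table[i].name;
        if (strlen(name) == n && memcmp(name, func, n) == 0) {
            p3t1_table[i].fn(code1, code2, err);
            return;
        }
    }
    *err = -1;   /* unknown name: this can only be detected at run time */
}
```

With this layout the Fortran namespace gains a single six-character symbol per signature, however many DOM methods share that signature.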

OK, this is not so user-friendly, because you need to look up the multiplexer subroutine (xp3t1, xp4t1) every time. Besides knowing the number and order of the parameters, as you would anyway, you also need to look up their types, which is annoying. Also, assigning the function name to func as a string every time is very error-prone (if you misspell the name of the multiplexed function, the multiplexer complains only at execution time, so the program will apparently compile correctly) and requires unneeded typing. For this reason, another tool comes in handy: the f77xml preprocessor.
Currently still in development, the f77xml preprocessor reads your sources and does the lookup work for you, replacing the more user-friendly syntax
call f77xml::Element::firstChild(...)

with the more compiler friendly syntax
func="Element::firstChild"
call xp3t1(func, ...)

At the time of writing, the f77xml preprocessor is partially written, so you need to be compiler friendly, but the whole architecture works.
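The substitution itself is simple enough to sketch. The actual preprocessor is written in python. The version below is in C only to match the other examples, and it is not the author's implementation: the function name rewrite_line and the two-entry lookup table are hypothetical. It assumes that only blanks precede "call" on the line and that the call has at least one argument, and it does not handle the 72-column limit of fixed-form source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Method name -> multiplexer, limited to the pairs shown in the text. */
static const struct { const char *method; const char *mux; } method_table[] = {
    { "Element::firstChild",   "xp3t1" },
    { "Element::setAttribute", "xp4t1" },
};

/* Rewrites one source line into out.
 * Returns 1 if the line was rewritten, 0 if it contains no f77xml call
 * (out is left untouched), -1 on error (no opening parenthesis, unknown
 * method, or out too small). */
int rewrite_line(const char *line, char *out, size_t out_size)
{
    static const char marker[] = "call f77xml::";
    assert(line != NULL && out != NULL);

    const char *call = strstr(line, marker);
    if (call == NULL)
        return 0;
    const char *method = call + strlen(marker);
    const char *paren = strchr(method, '(');
    if (paren == NULL)
        return -1;

    int indent = (int)(call - line);          /* reuse the line's indentation */
    size_t method_len = (size_t)(paren - method);
    for (size_t i = 0; i < sizeof method_table / sizeof method_table[0]; i++) {
        const char *name = method_table[i].method;
        if (strlen(name) == method_len &&
            strncmp(name, method, method_len) == 0) {
            /* Emit the assignment, then the multiplexer call with func
             * prepended to the original argument list. */
            int n = snprintf(out, out_size,
                             "%.*sfunc=\"%s\"\n%.*scall %s(func, %s",
                             indent, line, name,
                             indent, line, method_table[i].mux, paren + 1);
            return (n > 0 && (size_t)n < out_size) ? 1 : -1;
        }
    }
    return -1;
}
```

Because the lookup happens before compilation, a misspelled method name is reported by the preprocessor (here as a -1 return) instead of surfacing only when the program runs.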

Download

The current version (1.0 alpha) is here.
Version 0.4.0 is here.
Version 0.3.1 is here.
You also need:
  1. gdome2-0.8.0
  2. glib-1.2.10
  3. libxml2-2.5.11
  4. python 2.3
  5. g77+gcc
Please note that the reported version numbers are only indicative. F77xml probably compiles against older or more recent versions of these libraries, but I have not tested this.
You need python because:
  1. Lots of the API-related code for the multiplexers is very regular and can be generated automatically from an API description (so you need python to compile f77xml)
  2. The preprocessor is written in python (so you need python to develop using f77xml). This will change if a C preprocessor is needed

ChangeLog

version 0.4.0:

* added Element::parentNode
* Cache now does an assert on the introduction of the same pointer
with addPointer. This is a quick fix (not particularly elegant,
I know) for always giving back the same object code when the
same object is referenced. The Cache clients _must_ check
if the pointer is already present before feeding it to the cache
and, if necessary, free the allocated resource. A future release will
include reference counting, thus freeing the clients from this check
and enforcing a transparent usage.
* Simplified PointerType. There's no need for child of Type_GdomeNode,
since we can use the virtualized destructor of Node for each child
* f77xml_Cache_query marked as deprecated in favour of queryPointer
* added f77xml_Cache_queryCode
* defined NullCode
* added test5
* added autotest script
* Added autoconf/make/libtool stuff
* moved src directory to libf77xml
* fixed offset in xp4t1
* moved signature for xp4t1 from cose to the correct csoe
* implemented el_getAttribute under p5t1
* added test4 to get and set attributes
* added Document::createComment
* first implementation of preprocessor parser in directory fpp
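
The Cache entry above describes a check-then-add discipline for mapping C pointers to the integer codes seen from Fortran. A minimal sketch of that idea follows. It is not the library's Cache: every name is hypothetical (the real functions are f77xml_Cache_queryPointer and friends, whose signatures are not shown here), the fixed capacity is arbitrary, and the freeing of duplicate references mentioned in the entry is left out.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_CAPACITY 64
#define NULL_CODE 0          /* hypothetical stand-in for NullCode */

static void *cache_slots[CACHE_CAPACITY];   /* slot i holds code i + 1 */
static int cache_used = 0;

/* Returns the code for ptr, or NULL_CODE if it is not cached. */
int cache_query_code(const void *ptr)
{
    for (int i = 0; i < cache_used; i++)
        if (cache_slots[i] == ptr)
            return i + 1;
    return NULL_CODE;
}

/* Returns the pointer for code, or NULL if the code is not in use. */
void *cache_query_pointer(int code)
{
    if (code < 1 || code > cache_used)
        return NULL;
    return cache_slots[code - 1];
}

/* Stores a new pointer and returns its code. Callers must check with
 * cache_query_code first: adding the same pointer twice trips the assert. */
int cache_add_pointer(void *ptr)
{
    assert(ptr != NULL);
    assert(cache_query_code(ptr) == NULL_CODE);
    assert(cache_used < CACHE_CAPACITY);
    cache_slots[cache_used++] = ptr;
    return cache_used;
}

/* The check-then-add pattern that clients are asked to follow. */
int cache_code_for(void *ptr)
{
    int code = cache_query_code(ptr);
    return code != NULL_CODE ? code : cache_add_pointer(ptr);
}
```

Going through cache_code_for, the same object always comes back with the same code, which is the property the quick fix is meant to guarantee.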


License

The license is the LGPL.

FAQ

Q: Where are the previous versions?
A: In my CVS. I named the first public release 0.3.1 just for consistency between my tags and the version available on the net. If you need previous versions, drop me a line, but don't take the changelog as a guide for these versions. It isn't one.

Q: Is the API stable?
A: Absolutely not. I won't manage to have a stable API until version 1.0 rolls out.

Q: If I insert new nodes, my file is not modified. Why?
A: The file is not modified because there is no API for writing files yet. Currently, the modified document is written to a file named "pippo" (a popular Italian root password, and the Italian name for Goofy). Check this file to see your changes.

Contact

Contact me at m u n e h i r o @ f e r r a r a . l i n u x . i t (remove the spaces, which are just a small precaution against spam). Collaboration, suggestions and usage reports are much appreciated.