Raosoft, Inc., Seattle, WA 1-800-787-8755 raosoft@vovici.com
This document is a technical description of EFS. A non-technical whitepaper is also available.
InterForm and EZSurvey are based on EFS. EZSurvey adds a number of FORM attributes, mostly related to translating the form into other formats (HTML, text, Palm, etc.).
| Date | 31 January 2000 |
| Version | 1.0 |
| XML DTD location | http://www.raosoft.com/xml/efs.dtd |
| File name extension | .form |
The goal of EFS is to define a basic, extensible file format for forms and surveys. This standard is geared toward electronic forms and web browsers, but an EFS form could conceivably be printed out on paper, faxed, compiled on a disk, or placed into a telephone answering system. While EFS is about human interface to databases, EFS does not specify schemes for storing or presenting form data.
An application is said to be "EFS 1.0 Compliant" if it recognizes the
specifications in EFS Level 1 and uses those specifications in Level 2 and
Level 3 which are applicable. The application need not handle all the types
of input and data which could be included in a .form file, so
long as it only modifies that portion of the file that it understands.
EFS is an "Open" standard, meaning that anyone may suggest changes and improvements. The authors encourage public commentary. Please send suggestions to Shanti Rao.
Design goals for EFS are:
Since EFS was designed with internet forms in mind, EFS borrows heavily from HTML 4.0. While a web browser has a fighting chance of displaying most of the information in an EFS file, you should use a program to generate HTML. This does not have to be a one-way translation. It is possible to convert much of the form information in an HTML file into EFS. We are working on an open-source program to do both translations.
An EFS Level 1 file describes the data collected in an electronic form. The form usually includes a number of inputs. Inputs are also referred to as fields or questions. In EFS, each field or question corresponds to one and only one input. Here is a simple example, containing one question.
File simple.form
<?xml version="1.0" ?>
<FORM ACTION="http://www.raosoft.com/cgi-bin/raosoft/preview.cgi">
<INPUT DATA="TEXT" TYPE="Text" NAME="email" TITLE="Email address">
<TEXT>What is your email address?</TEXT>
</INPUT>
</FORM>
There are fundamentally only three types of inputs: text, single choice,
and multiple choice. The DATA attribute of an INPUT should be
one of TEXT, SINGLE, or MULTIPLE. If
the DATA attribute is blank, then the input doesn't actually
refer to a data field, and should be ignored by data analysis programs.
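As a rough illustration of how a data analysis program might read a Level 1 file, the sketch below parses a one-question form with Python's standard xml.etree.ElementTree module and skips inputs whose DATA attribute is blank. It assumes the file is well-formed XML with quoted attribute values. The function name list_inputs and the shape of the returned dictionaries are hypothetical and not part of EFS.

```python
import xml.etree.ElementTree as ET

# A well-formed one-question form (quoted attribute values are an assumption).
SIMPLE_FORM = """<FORM ACTION="http://www.raosoft.com/cgi-bin/raosoft/preview.cgi">
  <INPUT DATA="TEXT" TYPE="Text" NAME="email" TITLE="Email address">
    <TEXT>What is your email address?</TEXT>
  </INPUT>
</FORM>"""


def list_inputs(form_xml):
    """Return one dict per INPUT that refers to a data field (hypothetical helper)."""
    root = ET.fromstring(form_xml)
    inputs = []
    for node in root.iter("INPUT"):
        data = node.get("DATA", "").upper()
        if not data:
            # A blank DATA attribute means the input holds no data, so skip it.
            continue
        inputs.append({
            "name": node.get("NAME"),
            "data": data,
            "type": node.get("TYPE"),
            "title": node.get("TITLE"),
            "text": (node.findtext("TEXT") or "").strip(),
        })
    return inputs


if __name__ == "__main__":
    for item in list_inputs(SIMPLE_FORM):
        print(item["name"], item["data"], item["text"])
```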
There are many more ways of displaying input. The TYPE
attribute may be one of many input types or their abbreviations. They are:
| Type | Abbreviation | Description | DATA | Size | Options |
| text | (blank) | Write-in text. | TEXT | * | |
| hidden | (none) | Text data which is not shown on a data entry screen. | TEXT | * | |
| number | N | Write-in number. | TEXT | * | |
| password | | A password. | TEXT | * | |
| date | D | A date. EFS does not define a format for date data. An unambiguous way of representing dates to people is 31-JAN-2000. | TEXT | | |
| time |
A feature common to many data entry systems is validation -- making sure data is in a proper, recognizable format. A data entry system may choose to implement validation attributes as part of an input tag.
| Attribute | Comments |
| REQUIRED=0|1 | Indicates that a response is required. |
| CAPS=0|1 | Indicates that a response must be all caps. Numbers and punctuation are permitted. |
| RANGEMAX | Maximum value for a numeric input. |
| RANGEMIN | Minimum value for a numeric input. |
| scheme:MASK | A validation string against which data may be compared. Data entry systems may implement validation strings differently, so the validation tag should be prefixed by a scheme name. Recognized schemes are regxp, pdx (for Paradox), js (for JavaScript code which operates on the variable value and returns true or false), and py (for Python code, which behaves the same way as the js scheme). |
| VALUE | If a default value is appropriate, it may be specified with a VALUE attribute. If this value contains quotes or other characters which may interfere with an XML parser, it should be URL-encoded. |
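To make the validation attributes concrete, here is a minimal sketch of a checker a data entry system might run on a single response. The function name validate, the dictionary of attributes, and the wording of the messages are hypothetical. Reading the mask from an attribute literally named regxp:MASK and treating it as a Python regular expression is also an assumption, since EFS leaves the implementation of each scheme to the data entry system.

```python
import re


def validate(value, attrs):
    """Check one response string against EFS-style validation attributes.

    `attrs` maps attribute names to strings, as read from an INPUT tag.
    Returns a list of problems; an empty list means the value passed.
    """
    problems = []
    if attrs.get("REQUIRED") == "1" and value.strip() == "":
        problems.append("a response is required")
    # Digits and punctuation are unchanged by upper(), so they are permitted.
    if attrs.get("CAPS") == "1" and value != value.upper():
        problems.append("response must be all caps")
    if value.strip() and ("RANGEMIN" in attrs or "RANGEMAX" in attrs):
        try:
            number = float(value)
        except ValueError:
            problems.append("response must be numeric")
        else:
            if "RANGEMIN" in attrs and number < float(attrs["RANGEMIN"]):
                problems.append("response is below RANGEMIN")
            if "RANGEMAX" in attrs and number > float(attrs["RANGEMAX"]):
                problems.append("response is above RANGEMAX")
    # Assumption: the regxp scheme is stored as a "regxp:MASK" attribute
    # and its value is usable as a Python regular expression.
    mask = attrs.get("regxp:MASK")
    if mask and value and re.fullmatch(mask, value) is None:
        problems.append("response does not match the mask")
    return problems


if __name__ == "__main__":
    print(validate("11", {"REQUIRED": "1", "RANGEMIN": "1", "RANGEMAX": "10"}))
```

Returning a list of problems rather than a single true or false value lets a data entry screen report every failed rule at once.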
Here is an EFS Level 1 file that shows the use of response options.
File options.form
<?xml version="1.0" ?>
<FORM ACTION="http://www.raosoft.com/cgi-bin/raosoft/preview.cgi">
<INPUT TYPE="checkbox" NAME="fruit">
<TEXT>What types of fruit do you like?</TEXT>
<OPTION VALUE="A">Apple</OPTION>
<OPTION VALUE="B">Banana</OPTION>
<OPTION VALUE="C">Cherry</OPTION>
<OPTION VALUE="D">Durian</OPTION>
</INPUT>
</FORM>
All input types must include these:
Name | attribute | Field name |
Data | attribute | The type of data. |
Type | attribute | The type of input. Abbreviations may be used. |
Title | attribute | A short description of the input ("Email Address") |
Text | tag | Full text associated with the input ("What is your email address?") |
Input types which, in the table above, have a star in the Size column must include these:
Size | attribute | Maximum response length |
Input types which, in the table above, have a star in the Options column must include these:
Option | tags | Response options. |
Checkbox (and listmulti) questions are often implemented through separate
database fields for each response. Therefore, it is understood that database
field names for these types of questions may be of the form
field_value, with data values of 1 or 0. Alternatively, the
option values may be stored as a comma-separated list in a single field.
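The two storage conventions for multiple-choice responses can be sketched as follows. The function names are hypothetical, and the example assumes the option values A to D of a question named fruit.

```python
def to_flag_fields(field, selected, option_values):
    """Spread a multiple-choice response over one 1/0 field per option."""
    return {f"{field}_{value}": 1 if value in selected else 0
            for value in option_values}


def to_list_field(selected, option_values):
    """Store the same response as a comma-separated list in a single field."""
    return ",".join(value for value in option_values if value in selected)


def from_list_field(text):
    """Split a comma-separated field back into a list of option values."""
    return [value for value in text.split(",") if value]


if __name__ == "__main__":
    options = ["A", "B", "C", "D"]
    print(to_flag_fields("fruit", ["A", "C"], options))
    print(to_list_field(["A", "C"], options))
```

Iterating over option_values rather than over the selection keeps the stored order stable regardless of the order in which the respondent ticked the boxes.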
HTML 4.0 defined <fieldset> ... </fieldset> tags to
be used to group related inputs. Although no web browsers implemented it, we
think that it was a good idea. In EFS, fieldset tags may be placed
around groups of input tags to denote questions that are
conceptually related. Although a fieldset is intended to tell data
analysis programs about related inputs, it commonly means that the inputs
it includes should appear on the data entry screen together.
Vendors may extend the input definition with features that pertain to data entry and presentation of the question. A program which reads and writes EFS files must accept and preserve tags and attributes associated with inputs that are created by other programs. Some of these extensions are included for convenience in the EFS DTD.
Some recognized extensions to EFS are:
A copyright applies to each input, not to the entire form, because a form may be used as a question library. If a form designer mixes and matches questions from several libraries, then the file format will automatically keep track of the ownership of the questions.
In case two vendors happen to use the
same name for different extensions, we recommend (but do not require) that
proprietary tags be prefixed by the company name. For example,
<raosoft:POPUP> and <pdc:AUTOJUMP> are good
tag names.
We agree with the W3C's policy of separating form from function, but we also recognize that formatting is an important part of electronic forms. We therefore propose a simple, yet robust, set of rules for describing the appearance of an electronic form.
The best way to explain this is through an example. On top is the EFS form, and below it an appropriate rendering into HTML.
File style.form
<?xml version="1.0" ?>
<FORM NAME="preferences">
<STYLE CONTEXT="HTML" NAME="question" FONT.FACE="sans-serif" FONT.SIZE="2" />
<STYLE CONTEXT="HTML" NAME="choice" PARENT="question" DIV.ALIGN="CENTER" />
<INPUT NAME="email" STYLE="question" FONT.COLOR="blue">
<TEXT>What is your email address?</TEXT>
</INPUT>
<INPUT TYPE="checkbox" NAME="fruit" STYLE="choice">
<TEXT>What types of fruit do you like?</TEXT>
<OPTION VALUE="A">Apple</OPTION>
<OPTION VALUE="B">Banana</OPTION>
<OPTION VALUE="C">Cherry</OPTION>
<OPTION VALUE="D">Durian</OPTION>
</INPUT>
</FORM>
This should be an obvious implementation of cascading style sheets. To prevent circular references, parent styles must appear in the EFS file prior to child styles. The recognized contexts and their associated style attributes are:
| Context | Attributes | Comments |
| HTML | Font.*, Div.*, Body.* | Font and Div apply to any input. Body applies when the implementation of the form calls for each input to appear on a separate web page, or to a fieldset which groups several questions onto a single web page. |
| Paper | Font.*, Div.* | Font and Div apply to any input. We recommend that paper printouts of forms deal gracefully with markup tags which may appear in input text, such as <b> and <i>. |
EFS encourages vendors to add new style attributes.
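One way a rendering program might resolve the style cascade is sketched below. Each STYLE starts from a copy of its parent's attributes and then applies its own, which is why a parent has to be defined earlier in the file. Letting style attributes written directly on an INPUT override the named style is an assumption made for this sketch, as are the function names and the quoted DIV.ALIGN value.

```python
import xml.etree.ElementTree as ET

STYLE_FORM = """<FORM NAME="preferences">
  <STYLE CONTEXT="HTML" NAME="question" FONT.FACE="sans-serif" FONT.SIZE="2" />
  <STYLE CONTEXT="HTML" NAME="choice" PARENT="question" DIV.ALIGN="CENTER" />
  <INPUT NAME="email" STYLE="question" FONT.COLOR="blue">
    <TEXT>What is your email address?</TEXT>
  </INPUT>
  <INPUT TYPE="checkbox" NAME="fruit" STYLE="choice">
    <TEXT>What types of fruit do you like?</TEXT>
  </INPUT>
</FORM>"""

STYLE_PREFIXES = ("FONT.", "DIV.", "BODY.")


def style_attrs(node):
    """Pick out the attributes of a tag that describe appearance."""
    return {key: value for key, value in node.attrib.items()
            if key.upper().startswith(STYLE_PREFIXES)}


def resolve_styles(form_xml, context="HTML"):
    """Return the effective style attributes of every INPUT, keyed by NAME."""
    root = ET.fromstring(form_xml)
    styles = {}
    for style in root.iter("STYLE"):  # document order, so parents come first
        if style.get("CONTEXT") != context:
            continue
        parent = style.get("PARENT")
        if parent is not None and parent not in styles:
            raise ValueError(f"style {style.get('NAME')!r} uses undefined parent {parent!r}")
        merged = dict(styles.get(parent, {}))
        merged.update(style_attrs(style))
        styles[style.get("NAME")] = merged
    resolved = {}
    for node in root.iter("INPUT"):
        merged = dict(styles.get(node.get("STYLE"), {}))
        merged.update(style_attrs(node))  # assumption: inline attributes win
        resolved[node.get("NAME")] = merged
    return resolved


if __name__ == "__main__":
    for name, attrs in resolve_styles(STYLE_FORM).items():
        print(name, attrs)
```

Because a style can only inherit from one that has already been read, a circular chain of parents is rejected as an undefined parent instead of causing an endless loop.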
To include the DTD for EFS Level 1+2, include the following XML statement in your form:
<!DOCTYPE EFS SYSTEM "http://www.raosoft.com/xml/efs.dtd">
Level 3 concerns extensions which do not affect the form itself. These are
XML tags which appear outside the <FORM> ... </FORM>
section of the .form file. An application may create new Level 3
extensions to EFS. The extensions listed here are the ones we feel will be
useful.
In cases where an extension object refers to another object in the same file, precede the object name with the # character. For most of these, a DTD is inappropriate, but the top-level tags are included in the EFS DTD.
A .form file may define many queries. EFS does not specify a
format for queries -- they may be in any language that a program supports.
Two defined query types are SQL and Raosoft.
<QUERY NAME="Commercial" TYPE="Raosoft" FORM="#preferences">
email.endswith ".com"
</QUERY>
A data analysis program may bundle summary information about the data in a
form in a <SUMMARY> section. There may be any number of these
in a .form file. The purpose of this is to allow comparison
between data sets. For instance, a research firm could sell an employee
survey along with industry-standard results for comparison, or one could
compare summary results of a customer satisfaction survey from year to year.
Here is an example:
<SUMMARY CREATOR="EZSURVEY" QUERY="#Commercial" DATE="31-JAN-2000">
<fruit count="100">
<A count="91" />
<B count="42" />
<C count="73" />
<D count="3" />
</fruit>
<age count="100" sum="3402" sum2="8315642" />
</SUMMARY>
For multiple-choice questions, the summary includes the number of responses in the data set for each of the options. The summary could be extended to include crosstab information.
For a numeric question, the summary includes the number of responses and the moments of the data. This way, statistics such as standard deviation, standard error, skewness, and kurtosis may be generated from combined summary sets.
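The reason for storing moments instead of finished statistics is that moments from separate data sets can simply be added. The sketch below assumes that sum holds the sum of the responses and sum2 the sum of their squares, and it uses the sample (n - 1) form of the variance. The function names are hypothetical. Skewness and kurtosis would additionally need the sums of third and fourth powers, which are not covered here.

```python
import math


def combine(*summaries):
    """Add the counts and power sums of several summaries of the same question."""
    return {key: sum(summary[key] for summary in summaries)
            for key in ("count", "sum", "sum2")}


def describe(summary):
    """Derive mean, standard deviation, and standard error from the moments."""
    n, s1, s2 = summary["count"], summary["sum"], summary["sum2"]
    mean = s1 / n
    variance = (s2 - s1 * s1 / n) / (n - 1)  # sample variance
    std_dev = math.sqrt(variance)
    return {"mean": mean, "std_dev": std_dev, "std_error": std_dev / math.sqrt(n)}


if __name__ == "__main__":
    # Two small made-up data sets: responses [1, 2, 3] and [4, 5].
    first = {"count": 3, "sum": 6, "sum2": 14}
    second = {"count": 2, "sum": 9, "sum2": 41}
    print(describe(combine(first, second)))
```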
A SUMMARY tag may include a copyright
statement if it contains proprietary information.
An EFS file may bundle a form and data in the same file. Data should appear
outside of the <FORM> ... </FORM> section of the file, in
a separate <DATA> ... </DATA> section. There may be more than
one DATA section, corresponding to multiple records. The XML data
should look like...
<DATA>
<email>shanti@raosoft.com</email>
<fruit>A,B,C</fruit>
</DATA>
Since the layout of the DATA section depends on the FORM section, there is no DTD for EFS Level 3. Therefore, in order to make parsing straightforward, we require that field tags in the DATA section not be nested. If a relational database is to be represented in the DATA section, then the tags belonging to each record should be enclosed in tags of the form
<table name="table_name"> ... </table>
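Because field tags are not nested, reading the records takes only a flat loop. The sketch below is one possible reader. It assumes the sections of the file are wrapped in a single root element, here called EFS after the DOCTYPE name, and it needs to be told which fields hold comma-separated option values. The function name read_records is hypothetical, and table wrappers for relational data are not handled.

```python
import xml.etree.ElementTree as ET

# Assumption: the sections of a .form file share one root element named EFS.
SAMPLE_FILE = """<EFS>
  <DATA>
    <email>shanti@raosoft.com</email>
    <fruit>A,B,C</fruit>
  </DATA>
</EFS>"""


def read_records(file_xml, multiple_fields=()):
    """Return one dict per DATA section, mapping field names to values."""
    root = ET.fromstring(file_xml)
    records = []
    for data in root.iter("DATA"):
        record = {}
        for field in data:  # field tags are flat, so one level is enough
            text = (field.text or "").strip()
            if field.tag in multiple_fields:
                # Multiple-choice data stored as a comma-separated list.
                record[field.tag] = [value for value in text.split(",") if value]
            else:
                record[field.tag] = text
        records.append(record)
    return records


if __name__ == "__main__":
    print(read_records(SAMPLE_FILE, multiple_fields=("fruit",)))
```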
A .form file may contain a <REPORT> ...
</REPORT> section. Since there are more reporting languages than
there are software vendors, a program should set NAME and
TYPE attributes in the REPORT tag to identify the
report. The report section will then be ignored by programs which cannot
interpret it.
A suitable behavior for working with a foreign REPORT section
is to display it in a text editor. Standardization is a matter for a future
revision of EFS.
A .form file may contain a <WORKFLOW> ...
</WORKFLOW> section. Since there is no standard workflow system, a
program should set NAME and TYPE attributes in the
WORKFLOW tag to identify the actions that involve the form. The
workflow section will then be ignored by programs which cannot interpret it.
A suitable behavior for working with a foreign WORKFLOW
section is to display it in a text editor. Standardization is a matter for a
future revision of EFS.