Electronic Form System

This document is a technical description of EFS. A non-technical whitepaper is also available.

InterForm and EZSurvey are based on EFS. EZSurvey adds a number of FORM attributes, mostly related to translating the form into other formats (HTML, text, Palm, etc.).

Summary information

Date: 31 January 2000
Version: 1.0
XML DTD location: http://www.raosoft.com/xml/efs.dtd
File name extension: .form

The goal of EFS is to define a basic, extensible file format for forms and surveys. This standard is geared toward electronic forms and web browsers, but an EFS form could conceivably be printed on paper, faxed, compiled on a disk, or placed into a telephone answering system. Although EFS concerns the human interface to databases, it does not specify schemes for storing or presenting form data.

An application is said to be "EFS 1.0 Compliant" if it recognizes the specifications in EFS Level 1 and uses whichever specifications in Level 2 and Level 3 are applicable to it. The application need not handle every type of input and data that could be included in a .form file, so long as it modifies only the portion of the file that it understands.

EFS is an "Open" standard, meaning that anyone may suggest changes and improvements. The authors encourage public commentary. Please send suggestions to Shanti Rao.

Design goals for EFS are:

  1. Make information about the form, the fields and data types, commonly available to form design, data entry, and data analysis programs.
  2. Include information about presentation and formatting for the benefit of data entry programs, which may be ignored by data analysis programs.
  3. Allow for ad-hoc extensions, since electronic form and survey software from different vendors may have different features.
  4. Allow for protection of intellectual property of surveys and comparison data compiled by third parties.
  5. EFS does not attempt to do everything for everybody. Rather, it includes the basic features of an electronic form that you are likely to need.

Since EFS was designed with internet forms in mind, it borrows heavily from HTML 4.0. While a web browser has a fighting chance of displaying most of the information in an EFS file, you should use a program to generate HTML. This does not have to be a one-way translation: much of the form information in an HTML file can be converted into EFS. We are working on an open-source program to do both translations.

EFS Level 1

An EFS Level 1 file describes the data collected in an electronic form. The form usually includes a number of inputs, which are also referred to as fields or questions. In EFS, each field or question corresponds to exactly one input. Here is a simple example containing one question.

File simple.form
<?xml version="1.0" ?>
<FORM ACTION="http://www.raosoft.com/cgi-bin/raosoft/preview.cgi">
  <INPUT DATA="TEXT" TYPE="Text" NAME="email" TITLE="Email address">
  <TEXT>What is your email address?</TEXT>
  </INPUT>
</FORM>

There are fundamentally only three types of inputs: text, single choice, and multiple choice. The DATA attribute of an INPUT should be one of TEXT, SINGLE, or MULTIPLE. If the DATA attribute is blank, then the input doesn't actually refer to a data field, and should be ignored by data analysis programs.

There are many more ways of displaying an input. The TYPE attribute may be any of the following input types or their abbreviations:

Type | Abbreviation | Description | DATA | Size | Options
text | (blank) | Write-in text. | TEXT | * |
hidden | (none) | Text data which is not shown on a data entry screen. | TEXT | * |
number | N | Write-in number. | TEXT | * |
password | (none) | A password. | TEXT | * |
date | D | A date. EFS does not define a format for date data. An unambiguous way of representing dates to people is 31-JAN-2000. | TEXT | |
time | T | A time. EFS does not define a format for time data, although HH:MM:SS is traditionally used. | TEXT | |
radio | R | A multiple-choice question, presented as radio buttons. At most one answer may be selected. | SINGLE | | *
checkbox | C | A multiple-choice question, presented as check boxes. Many answers may be selected. | MULTIPLE | | *
weighted | W | A multiple-choice question, usually presented as radio buttons, in which each choice has an associated numerical value. For instance, "On a scale of 1 to 10, how do you rate this movie?" We recommend, but do not require, that the high end of the scale indicate a favorable response. | SINGLE | | *
listsingle | LS | A multiple-choice question, presented as a drop-down list. At most one answer may be selected. This is equivalent to SELECT in HTML. | SINGLE | | *
listmulti | LM | A multiple-choice question, presented as a list. Many answers may be selected. This is equivalent to SELECT MULTIPLE in HTML. | MULTIPLE | | *
listcombo | LC | Write-in text, with suggestions presented in a list. | TEXT | * | *
listrank | LR | "Rank the following choices in order of importance." | MULTIPLE | | *
plaintext | PT | Plain-text instructions for filling out the form. | (none) | |
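A program that reads .form files has to accept both full type names and their abbreviations. The Python sketch below encodes the table as a lookup and returns the canonical type together with the DATA value the table lists for it. The names TYPES, ABBREVIATIONS, and normalize_type are hypothetical, and matching case-insensitively is an assumption, since the examples in this document write both TYPE="Text" and TYPE="checkbox".

```python
# Hypothetical lookup built from the input-type table:
# canonical name -> (abbreviation or None, DATA value).
# DATA "" stands for "(none)": the input is not a data field.
TYPES = {
    "text": ("", "TEXT"),          # a blank abbreviation means "text"
    "hidden": (None, "TEXT"),
    "number": ("N", "TEXT"),
    "password": (None, "TEXT"),
    "date": ("D", "TEXT"),
    "time": ("T", "TEXT"),
    "radio": ("R", "SINGLE"),
    "checkbox": ("C", "MULTIPLE"),
    "weighted": ("W", "SINGLE"),
    "listsingle": ("LS", "SINGLE"),
    "listmulti": ("LM", "MULTIPLE"),
    "listcombo": ("LC", "TEXT"),
    "listrank": ("LR", "MULTIPLE"),
    "plaintext": ("PT", ""),
}

# Reverse lookup from abbreviation to canonical name; types with no
# abbreviation (None) are skipped.
ABBREVIATIONS = {abbr: name for name, (abbr, _) in TYPES.items() if abbr is not None}


def normalize_type(type_attr: str) -> tuple[str, str]:
    """Return (canonical type name, expected DATA value) for a TYPE attribute.

    Full names are matched case-insensitively, then abbreviations.
    """
    value = type_attr.strip()
    if value.lower() in TYPES:
        name = value.lower()
    elif value.upper() in ABBREVIATIONS:
        name = ABBREVIATIONS[value.upper()]
    else:
        raise ValueError(f"unknown EFS input type: {type_attr!r}")
    return name, TYPES[name][1]
```

For example, normalize_type("LM") yields ("listmulti", "MULTIPLE"), which a data analysis program could compare against the DATA attribute of the same input.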

Validation

A feature common to many data entry systems is validation: making sure data is in a proper, recognizable format. A data entry system may choose to implement validation through the following attributes of an INPUT tag.

Attribute | Comments
REQUIRED (0 or 1) | Indicates that a response is required.
CAPS (0 or 1) | Indicates that a response must be in all capital letters. Numbers and punctuation are permitted.
RANGEMAX | Maximum value for a numeric input.
RANGEMIN | Minimum value for a numeric input.
scheme:MASK | A validation string against which data may be compared. Data entry systems may implement validation strings differently, so the attribute name should be prefixed by a scheme name. Recognized schemes are regxp (regular expressions), pdx (Paradox), js (JavaScript code which operates on the variable value and returns true or false), and py (Python code which behaves the same way as the js scheme).
VALUE | If a default value is appropriate, it may be specified with a VALUE attribute. If this value contains quotes or other characters which may interfere with an XML parser, it should be URL-encoded.
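A data entry program might apply these attributes roughly as in the Python sketch below. It handles REQUIRED, CAPS, RANGEMIN, RANGEMAX, and the regxp scheme only; pdx, js, py, and VALUE are ignored. The function name validate is hypothetical, the attributes are assumed to arrive as a plain dictionary keyed by attribute name, and treating a regxp mask as a Python regular expression that must match the whole response is an assumption, because EFS leaves mask syntax to each scheme.

```python
import re


def validate(value: str, attrs: dict[str, str]) -> list[str]:
    """Check one response against EFS validation attributes.

    Returns a list of error messages; an empty list means the response passed.
    """
    errors = []
    if not value:
        if attrs.get("REQUIRED") == "1":
            errors.append("a response is required")
        return errors  # nothing else to check on an empty response
    # CAPS: letters must be upper case; digits and punctuation are unaffected.
    if attrs.get("CAPS") == "1" and value != value.upper():
        errors.append("response must be in capital letters")
    # RANGEMIN / RANGEMAX imply a numeric response.
    if "RANGEMIN" in attrs or "RANGEMAX" in attrs:
        try:
            number = float(value)
        except ValueError:
            errors.append("response must be a number")
        else:
            if "RANGEMIN" in attrs and number < float(attrs["RANGEMIN"]):
                errors.append(f"response must be at least {attrs['RANGEMIN']}")
            if "RANGEMAX" in attrs and number > float(attrs["RANGEMAX"]):
                errors.append(f"response must be at most {attrs['RANGEMAX']}")
    # regxp scheme: assumed to be a Python regex matched against the whole value.
    mask = attrs.get("regxp:MASK")
    if mask is not None and re.fullmatch(mask, value) is None:
        errors.append("response does not match the validation mask")
    return errors
```

Returning every failure, rather than stopping at the first, lets a data entry screen show all problems with a response at once.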

Here is an EFS Level 1 file that shows the use of response options.

File options.form
<?xml version="1.0" ?>
<FORM ACTION="http://www.raosoft.com/cgi-bin/raosoft/preview.cgi">
  <INPUT DATA="MULTIPLE" TYPE="checkbox" NAME="fruit">
    <TEXT>What types of fruit do you like?</TEXT>
    <OPTION VALUE="A">Apple</OPTION>
    <OPTION VALUE="B">Banana</OPTION>
    <OPTION VALUE="C">Cherry</OPTION>
    <OPTION VALUE="D">Durian</OPTION>
  </INPUT>
</FORM>

Input parameters

All input types must include these:

Parameter | Kind | Description
NAME | attribute | Field name.
DATA | attribute | The type of data.
TYPE | attribute | The type of input. Abbreviations may be used.
TITLE | attribute | A short description of the input ("Email address").
TEXT | tag | Full text associated with the input ("What is your email address?").

Input types which, in the table above, have a star in the Size column must also include:

SIZE | attribute | Maximum response length.

Input types which, in the table above, have a star in the Options column must also include:

OPTION | tags | Response options.
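The parameter rules above can be checked mechanically. The Python sketch below, using the standard library's xml.etree.ElementTree, reports which required parameters an INPUT element lacks. The function and constant names are hypothetical, and it assumes element and attribute names are written in upper case, as in the examples in this document.

```python
import xml.etree.ElementTree as ET

# Abbreviation -> canonical type; a blank TYPE means "text".
ABBREVIATIONS = {"": "text", "N": "number", "D": "date", "T": "time",
                 "R": "radio", "C": "checkbox", "W": "weighted",
                 "LS": "listsingle", "LM": "listmulti", "LC": "listcombo",
                 "LR": "listrank", "PT": "plaintext"}
# Types with a star in the Size and Options columns of the input-type table.
NEEDS_SIZE = {"text", "hidden", "number", "password", "listcombo"}
NEEDS_OPTIONS = {"radio", "checkbox", "weighted", "listsingle",
                 "listmulti", "listcombo", "listrank"}


def missing_parameters(inp: ET.Element) -> list[str]:
    """List the required parameters that an INPUT element lacks."""
    # Parameters every input type must include.
    missing = [a for a in ("NAME", "DATA", "TYPE", "TITLE") if a not in inp.attrib]
    if inp.find("TEXT") is None:
        missing.append("TEXT")
    # Expand an abbreviation, otherwise use the lower-cased full name.
    raw = inp.get("TYPE", "")
    kind = ABBREVIATIONS.get(raw.upper(), raw.lower())
    if kind in NEEDS_SIZE and "SIZE" not in inp.attrib:
        missing.append("SIZE")
    if kind in NEEDS_OPTIONS and inp.find("OPTION") is None:
        missing.append("OPTION")
    return missing
```

A form editor could run this over every INPUT in a FORM before saving, warning about each non-empty result rather than refusing to save.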

Multiple selection inputs

Checkbox (and listmulti) questions are often implemented with a separate database field for each response option. Therefore, it is understood that database field names for these types of questions may be of the form field_value, with data values of 1 or 0. Alternatively, the selected option values may be stored as a comma-separated list in a single field.
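The two storage layouts for a multiple-selection answer can be sketched in Python as below. The function names are hypothetical; the field_value naming, the 1/0 values, and the comma-separated alternative are taken from the paragraph above, and the single-field form assumes option values never contain commas.

```python
def to_separate_fields(field: str, options: list[str], selected: set[str]) -> dict[str, int]:
    """One 0/1 database field per option, named field_value."""
    return {f"{field}_{value}": int(value in selected) for value in options}


def from_separate_fields(field: str, options: list[str], row: dict[str, int]) -> set[str]:
    """Inverse of to_separate_fields."""
    return {value for value in options if row.get(f"{field}_{value}") == 1}


def to_single_field(field: str, options: list[str], selected: set[str]) -> dict[str, str]:
    """All selected option values, in option order, as one comma-separated field."""
    return {field: ",".join(value for value in options if value in selected)}


def from_single_field(field: str, row: dict[str, str]) -> set[str]:
    """Inverse of to_single_field; an empty field means nothing was selected."""
    text = row.get(field, "")
    return set(text.split(",")) if text else set()
```

For the fruit question, a respondent choosing Apple and Cherry becomes fruit_A=1, fruit_B=0, fruit_C=1, fruit_D=0 in the first layout and fruit="A,C" in the second.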

Field sets

HTML 4.0 defined <fieldset> ... </fieldset> tags for grouping related inputs. Although no web browsers implemented them, we think they were a good idea. In EFS, fieldset tags may be placed around groups of INPUT tags to denote questions that are conceptually related. Although a fieldset is intended to tell data analysis programs about related inputs, it commonly also means that the inputs it contains should appear together on the data entry screen.

Extending Level 1

Vendors may extend the input definition with features that pertain to data entry and presentation of the question. A program which reads and writes EFS files must accept and preserve tags and attributes associated with inputs that are created by other programs. Some of these extensions are included for convenience in the EFS DTD.
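One way to meet the preservation requirement is to edit the parsed document tree in place instead of regenerating the file from the parts a program understands. The Python sketch below changes one input's TITLE with xml.etree.ElementTree; elements and attributes it does not interpret (here the HELP extension and a hypothetical AUTOJUMP attribute) are written back unchanged. The function name retitle is hypothetical. Note that ElementTree's default parser discards XML comments and the XML declaration, so a production tool may need a parser that keeps them.

```python
import xml.etree.ElementTree as ET


def retitle(form_xml: str, name: str, title: str) -> str:
    """Change the TITLE of the input called `name`, keeping everything else.

    Unknown tags and attributes stay in the tree, so they survive the round trip.
    """
    form = ET.fromstring(form_xml)
    for inp in form.iter("INPUT"):
        if inp.get("NAME") == name:
            inp.set("TITLE", title)
    return ET.tostring(form, encoding="unicode")
```

Because the tree is only modified where the program has something to say, another vendor's extensions need no special handling.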

Some recognized extensions to EFS are:

Extension | Applies to | Meaning
ACTION | An attribute of the FORM tag. | URL of a CGI script which handles the submission of data.
METHOD | An attribute of the FORM tag. | Method to use when submitting data, usually GET or POST.
<HELP> ... </HELP> | A tag belonging to an input. | Help text that explains a question in detail.
<COPYRIGHT> ... </COPYRIGHT> | A tag belonging to an input. | Surveys and questions are often copyrighted. This applies to any question whose use in a survey is copyrighted. The copyright is free-form text.
OWNER | An attribute of COPYRIGHT. | A URL that one may use to find the owner of a copyrighted question.

A copyright applies to each input, not to the entire form, because a form may be used as a question library. If a form designer mixes and matches questions from several libraries, the file format automatically keeps track of the ownership of each question.

In case two vendors happen to use the same name for different extensions, we recommend (but do not require) that proprietary tags be prefixed by the company name. For example, <raosoft:POPUP> and <pdc:AUTOJUMP> are good tag names.
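Under the W3C Namespaces in XML recommendation, a colon in a tag name marks a namespace prefix, so a namespace-aware parser such as Python's xml.etree.ElementTree rejects <raosoft:POPUP> as an unbound prefix unless the prefix is declared with an xmlns attribute. The sketch below declares one and then lists a vendor's extension tags. The namespace URI and the function name vendor_extensions are hypothetical; a vendor would use a URI it controls.

```python
import xml.etree.ElementTree as ET

# The xmlns:raosoft declaration binds the prefix; the URI here is hypothetical.
FORM_XML = """<FORM xmlns:raosoft="http://example.com/ns/raosoft">
  <INPUT NAME="email" TYPE="text">
    <TEXT>What is your email address?</TEXT>
    <raosoft:POPUP>Enter the address you check most often.</raosoft:POPUP>
  </INPUT>
</FORM>"""


def vendor_extensions(form_xml: str, namespace: str) -> list[str]:
    """Return the local names of all elements in the given vendor namespace.

    ElementTree reports a namespaced tag as "{uri}localname".
    """
    form = ET.fromstring(form_xml)
    prefix = "{" + namespace + "}"
    return [el.tag[len(prefix):] for el in form.iter() if el.tag.startswith(prefix)]
```

Matching on the namespace URI rather than the prefix text means two files that bind different prefixes to the same vendor URI are treated alike.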

EFS Level 2

We agree with the W3C's policy of separating form from function, but we recognize that formatting is an important part of electronic forms. We therefore propose a simple, yet robust, set of rules for describing the appearance of an electronic form.

The best way to explain this is through an example. On top is the EFS form, and below it an appropriate rendering into HTML.

File style.form
<?xml version="1.0" ?>
<FORM NAME="preferences">
  <STYLE CONTEXT="HTML" NAME="question" FONT.FACE="sans-serif" FONT.SIZE="2" />
  <STYLE CONTEXT="HTML" NAME="choice" PARENT="question" DIV.ALIGN="CENTER" />

  <INPUT NAME="email" STYLE="question" FONT.COLOR="blue">
    <TEXT>What is your email address?</TEXT>
  </INPUT>
  <INPUT TYPE="checkbox" NAME="fruit" STYLE="choice">
    <TEXT>What types of fruit do you like?</TEXT>
    <OPTION VALUE="A">Apple</OPTION>
    <OPTION VALUE="B">Banana</OPTION>
    <OPTION VALUE="C">Cherry</OPTION>
    <OPTION VALUE="D">Durian</OPTION>
  </INPUT>
</FORM>
What is your email address?
What types of fruit do you like? Apple Banana Cherry Durian

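This is modeled on cascading style sheets: a STYLE may name a PARENT style whose attributes it inherits, and style attributes set directly on an INPUT (such as FONT.COLOR above) override those of its STYLE. To prevent circular references, parent styles must appear in the EFS file before their child styles. The recognized contexts and their associated style attributes are: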
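Because parents must come first, a program can resolve the whole cascade in a single pass over the STYLE tags. The Python sketch below does this for one context and then applies an input's own style attributes on top. The function names are hypothetical, and the assumptions that a PARENT must belong to the same context and that only FONT.*, DIV.*, and BODY.* attributes on an INPUT count as style overrides are ours, not part of EFS.

```python
import xml.etree.ElementTree as ET

STYLE_PREFIXES = ("FONT.", "DIV.", "BODY.")  # attribute families from the context table


def resolve_styles(form: ET.Element, context: str) -> dict[str, dict[str, str]]:
    """Compute the full attribute set of every STYLE in one context.

    Styles are processed in document order, so each PARENT is already
    resolved when its child is reached; a parent that has not appeared
    earlier in the same context raises KeyError.
    """
    resolved: dict[str, dict[str, str]] = {}
    for style in form.findall("STYLE"):
        if style.get("CONTEXT") != context:
            continue
        parent = style.get("PARENT")
        attrs = dict(resolved[parent]) if parent else {}  # inherit from parent
        attrs.update({k: v for k, v in style.attrib.items()
                      if k not in ("CONTEXT", "NAME", "PARENT")})
        resolved[style.get("NAME")] = attrs
    return resolved


def input_style(inp: ET.Element, styles: dict[str, dict[str, str]]) -> dict[str, str]:
    """Style attributes set directly on an INPUT override those of its STYLE."""
    attrs = dict(styles.get(inp.get("STYLE", ""), {}))
    attrs.update({k: v for k, v in inp.attrib.items()
                  if k.upper().startswith(STYLE_PREFIXES)})
    return attrs
```

Applied to style.form, the email question resolves to a blue sans-serif font of size 2, and the choice style adds DIV.ALIGN="CENTER" to the fonts inherited from question.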

Context | Attributes | Comments
HTML | Font.*, Div.*, Body.* | Font and Div apply to any input. Body applies when the implementation of the form calls for each input to appear on a separate web page, or to a fieldset which groups several questions onto a single web page.
Paper | Font.*, Div.* | Font and Div apply to any input. We recommend that paper printouts of forms deal gracefully with markup tags which may appear in input text, such as <b> and <i>.

EFS encourages vendors to add new style attributes.

Document definition

To include the DTD for EFS Levels 1 and 2, add the following document type declaration to your form:

<!DOCTYPE EFS SYSTEM "http://www.raosoft.com/xml/efs.dtd">

EFS Level 3

Level 3 concerns extensions which do not affect the form itself. These are XML tags which appear outside the <FORM> ... </FORM> section of the .form file. An application may create new Level 3 extensions to EFS. The extensions listed here are the ones we feel will be useful.

Where an extension object refers to another object in the same file, precede the object name with the # character (for example, FORM="#preferences"). For most of these extensions a DTD is inappropriate, but the top-level tags are included in the EFS DTD.
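Resolving such a reference amounts to finding the top-level object whose NAME matches. XML allows only one root element, so the Python sketch below assumes the FORM and the Level 3 sections are wrapped in a single root (called EFS in the test, matching the DOCTYPE name above; this wrapper is an assumption). The function name find_object is hypothetical.

```python
import xml.etree.ElementTree as ET


def find_object(root: ET.Element, ref: str) -> ET.Element:
    """Resolve a reference such as "#preferences" to the top-level element
    (FORM, QUERY, SUMMARY, ...) whose NAME attribute matches."""
    if not ref.startswith("#"):
        raise ValueError(f"not a same-file reference: {ref!r}")
    name = ref[1:]
    for child in root:  # only the top-level objects are searched
        if child.get("NAME") == name:
            return child
    raise KeyError(name)
```

A program handling the QUERY example below would call find_object(root, query.get("FORM")) to locate the form the query applies to.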

Queries

A .form file may define many queries. EFS does not specify a format for queries -- they may be in any language that a program supports. Two defined query types are SQL and Raosoft.

<QUERY NAME="Commercial" TYPE="Raosoft" FORM="#preferences">
  email.endswith ".com"
</QUERY>

Summary data

A data analysis program may bundle summary information about the data in a form in a <SUMMARY> section. There may be any number of these in a .form file. The purpose of this is to allow comparison between data sets. For instance, a research firm could sell an employee survey along with industry-standard results for comparison, or one could compare summary results of a customer satisfaction survey from year to year.

Here is an example:
<SUMMARY CREATOR="EZSURVEY" QUERY="#Commercial" DATE="31-JAN-2000">
 <fruit count="100">
   <A count="91" />
   <B count="42" />
   <C count="73" />
   <D count="3" />
 </fruit>
 <age count="100" sum="3402" sum2="8315642" />
</SUMMARY>

For multiple-choice questions, the summary includes the number of responses in the data set for each of the options. The summary could be extended to include crosstab information.
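Building a multiple-choice summary entry from raw responses is a simple tally. In the Python sketch below, each response is the list of option values one respondent selected, count is taken to be the number of responses, and a value selected twice within one response is counted once. The function name summarize_choice is hypothetical. Since option values become element names in this layout, it only produces valid XML when every option value is a valid XML name (A through D are; 1 through 10 are not).

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_choice(field: str, responses: list[list[str]], options: list[str]) -> ET.Element:
    """Build a SUMMARY entry such as <fruit count="3"><A count="2" />...</fruit>."""
    # set() removes duplicate selections within a single response.
    tally = Counter(value for response in responses for value in set(response))
    entry = ET.Element(field, count=str(len(responses)))
    for option in options:
        # Counter returns 0 for options nobody selected.
        ET.SubElement(entry, option, count=str(tally[option]))
    return entry
```

The resulting element can be appended to a SUMMARY element and serialized with ET.tostring.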

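For a numeric question, the summary includes the number of responses and the sums of powers of the data: in the example above, sum is the sum of the values and sum2 the sum of their squares. Because these sums can simply be added across data sets, statistics such as the mean, standard deviation, and standard error may be generated from combined summary sets. Skewness and kurtosis additionally require the sums of the third and fourth powers.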
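The Python sketch below shows why additive sums are convenient: two summaries combine by adding count, sum, and sum2, and the usual sample formulas then apply, s = sqrt((sum2 - sum*sum/n) / (n - 1)) and SE = s / sqrt(n). The class and method names are hypothetical, and reading sum2 as the sum of squared values is an assumption based on the attribute name.

```python
import math
from dataclasses import dataclass


@dataclass
class NumericSummary:
    """count, sum and sum2 from a SUMMARY entry such as <age count=... sum=... sum2=... />."""
    count: int
    total: float     # the "sum" attribute: sum of the values
    total_sq: float  # the "sum2" attribute: sum of squared values (assumed meaning)

    def __add__(self, other: "NumericSummary") -> "NumericSummary":
        # Power sums are additive, so data sets combine by plain addition.
        return NumericSummary(self.count + other.count,
                              self.total + other.total,
                              self.total_sq + other.total_sq)

    def mean(self) -> float:
        return self.total / self.count

    def std_dev(self) -> float:
        # Sample standard deviation; needs count >= 2.
        n = self.count
        return math.sqrt((self.total_sq - self.total ** 2 / n) / (n - 1))

    def std_error(self) -> float:
        return self.std_dev() / math.sqrt(self.count)
```

For example, the values 1, 2, 3 give NumericSummary(3, 6, 14) and the values 4, 5 give NumericSummary(2, 9, 41); their sum, NumericSummary(5, 15, 55), has mean 3 and variance 2.5, exactly as if the five values had been pooled. This one-pass formula can lose precision when the values are large relative to their spread.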

A SUMMARY tag may include a copyright statement if it contains proprietary information.

Data bundling

An EFS file may bundle a form and data in the same file. Data should appear outside the <FORM> ... </FORM> section of the file, in a separate <DATA> ... </DATA> section. There may be more than one DATA section, corresponding to multiple records. The XML data should look like this:

<DATA>
  <email>shanti@raosoft.com</email>
  <fruit>A,B,C</fruit>
</DATA>

Since the layout of the DATA section depends on the FORM section, there is no DTD for EFS Level 3. Therefore, to keep parsing straightforward, we require that field tags in the DATA section not be nested. If a relational database is to be represented in the DATA section, the tags belonging to each record should be enclosed in tags of the form

<table name=table_name> ... </table>
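Because field tags are never nested, each DATA section can be read with a single loop over its children. The Python sketch below turns every DATA section into a dictionary and splits the values of MULTIPLE fields on commas, matching the fruit example above. The function name read_records is hypothetical; it assumes a single wrapper root element (XML allows only one root), option values without commas, and no <table> grouping, which it does not handle.

```python
import xml.etree.ElementTree as ET


def read_records(efs_xml: str, multiple_fields: set[str]) -> list[dict]:
    """Return one dict per DATA section, mapping field name to value.

    Fields listed in multiple_fields (the MULTIPLE inputs of the FORM) are
    split on commas into lists; all other values stay strings.
    """
    root = ET.fromstring(efs_xml)  # assumes one wrapper root element
    records = []
    for data in root.findall("DATA"):
        record = {}
        for field in data:  # field tags are never nested, so children suffice
            text = (field.text or "").strip()
            if field.tag in multiple_fields:
                record[field.tag] = text.split(",") if text else []
            else:
                record[field.tag] = text
        records.append(record)
    return records
```

The set of MULTIPLE fields would normally come from the DATA attributes of the INPUT tags in the bundled FORM.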

Reports

A .form file may contain a <REPORT> ... </REPORT> section. Since there are more reporting languages than there are software vendors, a program should set NAME and TYPE attributes in the REPORT tag to identify the report. Programs which cannot interpret the report will then ignore the section.

A suitable behavior for working with a foreign REPORT section is to display it in a text editor. Standardization is a matter for a future revision of EFS.

Workflow

A .form file may contain a <WORKFLOW> ... </WORKFLOW> section. Since there is no standard workflow system, a program should set NAME and TYPE attributes in the WORKFLOW tag to identify the actions that involve the form. Programs which cannot interpret the workflow will then ignore the section.

A suitable behavior for working with a foreign WORKFLOW section is to display it in a text editor. Standardization is a matter for a future revision of EFS.