# $Revision: 10444 $
# $Author: saulius $
# $Date: 2023-02-07 09:04:05 +0000 (Tue, 07 Feb 2023) $

Guidelines for writing projects of the CLI programs
===================================================

Saulius Gražulis
Vilnius, 2023

This document lays down general rules for how projects (specifications) for CLI programs should be written.

1. The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [1].

2. The project should have all necessary metadata for a published text: the metadata SHOULD name the author; they MUST contain the document date (or publication year), revision or version number, and document location or URL. SVN keywords are an acceptable method to insert such metadata.

3. The program project MUST give a brief but clear and unambiguous description of the program function(s) that must be implemented. If necessary, mathematical formulae SHOULD be given to define quantities, preferably in machine-readable form. This description MUST be unambiguous enough to write working code from it; i.e. two programmers who independently implement programs from the specification MUST be able to produce compatible (drop-in replacement) code.

4. If the program performs calculations on a physical model, the project MUST specify all physical quantities that the program will read, process and output, insofar as these quantities are not already specified by the input and output formats. It must specify what units will be used for each quantity, unless these units are mandated by file formats or can be specified at run-time according to these formats.

5. The program project MUST specify the program name.

   Rationale: the program name is part of the program's interface in CLI use. If the program name changes, old scripts or pipelines that use it will no longer work.

6. The project SHOULD specify in what languages and systems the program will be implemented.

7.
The project SHOULD specify the operating environment for the program: operating systems, operating system versions, interpreter versions with which the program "MUST" operate.

8. The project MUST specify formats and required data for all input and output data streams.

   Rationale: input and output formats and adherence to standards are essential for program interoperability. Only when standard or otherwise well-defined formats are used can we easily compose Unix commands into efficient pipelines.

9. The program SHOULD use standard formats whenever possible (PDB, PDBx, CIF, JSON, XML, CML, SDF, YAML, PNM, CSV, etc.).

   Rationale: developing and adequately describing your own format is a large and complicated task and should not be undertaken lightly. Using your own format is only justified when:

   -- your program's output is so simple (e.g. one number) and so easy to describe formally that using a complicated carrier format is overkill;

   -- *none* of the existing formats allows you to capture all information that you need to process and output (it is very unlikely, though, that this will happen...);

   -- the whole point of the exercise is developing a new format :).

10. The program MAY use its own ad-hoc format. It SHOULD only be used when standard formats, as mentioned in paragraph 9, are not suitable. If the program uses its own format for inputs and/or outputs, then the project MUST:

   -- specify precisely what the format is, by providing a machine-readable grammar of the format, or by providing the exact layout of bits in the file for binary formats.

11. If the specified format is a generic carrier format, like XML, JSON, CSV or similar, then the required data items (data elements) MUST be specified in the project. Ideally, the project SHOULD provide a dictionary, schema or similar tool to validate the input and output data streams.

12.
Unless the task at hand requires some specific design (behaviour) pattern (e.g. that of the 'cp', 'mv' or 'tag' programs), the project SHOULD mandate that the program behaves like a Unix filter, i.e. the specification SHOULD say that: "the program MUST accept one or several input file names on the command line and process all of them; the output MUST be written to STDOUT by default. If input file names are not provided, the program MUST read from STDIN."

13. Required arguments that are not file names SHOULD be at the beginning of the program argument list (e.g. in 'grep' the first mandatory argument is a regular expression, and the remaining arguments are file names).

14. The project MUST specify what environment variables the program must recognise. When appropriate, the project SHOULD mandate the use of the "well-known" environment variables such as TMPDIR, PAGER, PATH, LD_LIBRARY_PATH and so on.

15. The project MUST specify all options that the program must accept.

16. Options SHOULD be optional. Default values SHOULD be mandated in the project for all options that can be omitted.

17. The program SHOULD NOT use options (like '-i' and/or '--input') to specify input file names. The program MAY use the '-o' ('--output') option to specify output file name(s). An exception to this rule is justified when the program needs several distinct files for its operation and these files play different roles in the process (e.g. dictionaries, configuration tables, reference data, etc.). In that case special options SHOULD be used to specify file names, e.g.:

       command --config-file input.cfg --covalent-radii-table covalent.csv

18. Options SHOULD be mandated to follow the GNU convention: short options start with a single dash (e.g. "-h"); long options start with two dashes (e.g. "--help"). For binary options, the pair of "--option" and "--no-option" SHOULD be used, or similar "--enable-.../--disable-..." or "--with-.../--without-..." pairs SHOULD be mandated.

19.
It SHOULD be possible to abbreviate options to their unique prefix, i.e. to write '--out' instead of '--output'.

20. If the program will have to output multiple files, there MUST be a mechanism to specify their names on the command line; this mechanism MUST include the possibility to specify the output directory.

21. The project MUST specify how the program will be documented; at least a comment at the beginning of the program describing the program's function, input files, output files/streams and their formats MUST be mandated. A short USAGE example MUST be mandated. For simple ad-hoc text formats, a short example of the input and output format MUST be mandated.

22. If options processing is implemented, the project MUST require the implementation of the "--help" option that prints a short usage message to STDOUT and exits.

23. If options processing is implemented, the project MUST require the implementation of the "--version" option that outputs the program version to STDOUT and exits.

24. The project SHOULD specify that Semantic Versioning [2] is used for program versions.

25. The project MUST specify what status codes should be returned by the program on completion. The project MUST mandate that a status code of 0 (zero) is returned on successful completion. In case additional conditions are indicated by the return status of the program, the project SHOULD specify the values that can be returned and their meaning (semantics). The project MAY specify that certain return status codes are reserved for future use. The project MAY specify that additional, implementation-dependent status codes can be returned; in that case the project MUST specify the allowed range of such codes.

26. The project MUST specify the format of the error messages. It MUST insist that error messages are output to STDERR. The syntax of the error messages MUST be specified. The project MAY mandate an error message format that is native to the implementation system.
For instance, in Perl, the native 'warn' and 'die' functions MAY be mandated. For Ada programs, the default layout of the Ada GNAT runtime exception message MAY be mandated. Otherwise, a well-documented error message format SHOULD be used, for example the GNU error message format [3] or the 'cod-tools' error message format [4,5]. A program-specific error message format MAY be used. In that case, the project MUST precisely define the error message syntax. The syntax MUST be defined in a machine-readable form, e.g. by giving the EBNF [6,7] syntax of the error messages or a regular expression (ERE [8], PCRE [9]) that the error messages MUST conform to. It is RECOMMENDED that each error message occupies one line of text.

27. The project SHOULD specify which error conditions "SHOULD" or "MUST" be detected by the program.

28. The project SHOULD specify which error conditions are fatal and "SHOULD" be reported as "ERRORS", after which the program execution "MUST" terminate with a non-zero exit status, and which error conditions "SHOULD" be reported as "WARNING" or "NOTE", after which the program execution can continue. If WARNINGs were present in the program run, the project MAY mandate non-zero or zero exit status. If NOTEs were present in the program run, the project MUST mandate the zero (0) exit status. The error severity level ("ERROR", "WARNING" or "NOTE") SHOULD be selected based on the validity of the program output. The project MUST mandate the "ERROR" severity level for all conditions that do not allow the correct and correctly formatted result to be computed. The "WARNING" severity level SHOULD be mandated for those conditions in which the output is physically, logically and syntactically correct but might be unusual or not expected by the user (e.g. empty or degenerate output when such output is permitted by the task and the output format).
The "NOTE" level SHOULD be mandated for conditions that do not interfere with the generation of correct and expected results, but might be of interest to the user when interpreting the output data.

29. The project MAY suggest options that change the status of certain or all errors, e.g. for converting "WARNINGS" to "ERRORS" or the other way round, for continuing on errors or stopping on warnings. Conversion of "ERROR" to "WARNING" levels SHOULD NOT violate the assumptions of physically and syntactically correct output.

30. The project SHOULD permit detection of implementation-dependent error conditions and their reporting with adequate severity levels.

31. The project MUST specify the minimum amount of information that "MUST" be present in error messages. As a bare minimum, the project should mandate the output of the program name and the properly quoted name of the file that is being processed (with STDIN designated as an unquoted word "STDIN" or a quoted single hyphen ('-'); in the latter case a file whose name is a lone dash MUST be reported as './-', i.e. with the path). The project SHOULD mandate reporting of the input text line and character position (column) if text input is processed. For binary files, a binary element identifier must be quoted (e.g. a tag for TIFF or PNG files). The project SHOULD mandate that a short text fragment enclosing the error position, or the value of a binary file element, is cited, with appropriate quoting signs, in the error message.

32. The project MAY mandate output of error messages in a structured format, like XML, JSON or similar. The default format, however, SHOULD be human-readable text. If a structured format is mandated, an XML, JSON or other corresponding schema describing the error message structure SHOULD be provided.

References
==========

1. S. Bradner. "Key words for use in RFCs to Indicate Requirement Levels", RFC 2119
   URL: https://tools.ietf.org/html/rfc2119

2. Semantic Versioning, ver.
2.0.0 (2023) URL: https://semver.org/ [accessed 2023-02-02T19:58+02:00]

3. The GNU Project. Formatting Error Messages (1996)
   URL: https://www.math.utah.edu/docs/info/standards_14.html [accessed 2023-02-03T10:37+02:00]

4. The COD Tools Developers. The 'cod-tools' package (2023)
   URL: https://github.com/cod-developers/cod-tools [accessed 2023-02-03T10:39+02:00]

5. The COD Tools Developers. Error Message Conventions, rev. 9474 (2022)
   URL: https://github.com/cod-developers/cod-tools/blob/master/doc/error-messages-conventions.txt [accessed 2023-02-03T10:44+02:00]

6. ISO/IEC 14977:1996(E), "Information technology -- Syntactic metalanguage -- Extended BNF"
   URL: http://www.cl.cam.ac.uk/~mgk25/iso-14977.pdf

7. Wikipedia. Extended Backus–Naur form (2022)
   URL: https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form [accessed 2023-02-03T10:53+02:00; permalink: https://en.wikipedia.org/w/index.php?title=Extended_Backus%E2%80%93Naur_form&oldid=1123418738]

8. Wikipedia. Regular expression (2023)
   URL: https://en.wikipedia.org/wiki/Regular_expression [accessed 2023-02-03T10:51+02:00; permalink: https://en.wikipedia.org/w/index.php?title=Regular_expression&oldid=1136661034]

9. Wikipedia. Perl Compatible Regular Expressions (2023)
   URL: https://en.wikipedia.org/wiki/Perl_Compatible_Regular_Expressions [accessed 2023-02-03T10:55+02:00; permalink: https://en.wikipedia.org/w/index.php?title=Perl_Compatible_Regular_Expressions&oldid=1132903483]

Colophon
========

$Id: guidelines-for-writing-CLI-program-projects.txt 10444 2023-02-07 09:04:05Z saulius $
$URL: file:///home/saulius/svn-repositories/paskaitos/VU/bioinformatika-III/u%C5%BEduotys-praktikai/projekto-gair%C4%97s/guidelines-for-writing-CLI-program-projects.txt $
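
Appendix: illustrative filter skeleton
======================================

Several of the rules above (filter behaviour in paragraph 12, '--help' and '--version' in paragraphs 22 and 23, exit status in paragraphs 25 and 28, error messages on STDERR in paragraphs 26 and 31) can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not a reference implementation: the program name 'sumcol', its task (summing one number per line), the version string and the error message layout are all hypothetical and were chosen purely for illustration; a real project would have to specify each of them.

```python
import argparse
import sys

PROGRAM = "sumcol"   # hypothetical program name (assumption)
VERSION = "0.1.0"    # hypothetical version, Semantic Versioning style


def format_message(filename, line_no, severity, text):
    """Return a one-line message in an assumed, illustrative layout."""
    # STDIN is shown as an unquoted word; real file names are quoted.
    shown = "STDIN" if filename is None else f"'{filename}'"
    return f"{PROGRAM}: {shown}({line_no}): {severity}, {text}"


def sum_stream(stream, filename, err):
    """Sum one number per line; return (total, failed)."""
    total = 0.0
    for line_no, line in enumerate(stream, start=1):
        field = line.strip()
        if not field:          # blank lines are skipped silently
            continue
        try:
            total += float(field)
        except ValueError:
            # No correct result can be computed, so this is an ERROR.
            message = format_message(filename, line_no, "ERROR",
                                     f"not a number: '{field}'")
            print(message, file=err)
            return total, True
    return total, False


def main(argv, stdin=None, stdout=None, stderr=None):
    """Process every named file, or STDIN if none; return the exit status."""
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    stderr = sys.stderr if stderr is None else stderr
    # argparse supplies --help (printed to STDOUT) and, by default,
    # accepts unique prefixes of long options.
    parser = argparse.ArgumentParser(
        prog=PROGRAM, description="Sum numbers given one per line.")
    parser.add_argument("--version", action="version",
                        version=f"{PROGRAM} {VERSION}")
    parser.add_argument("files", nargs="*",
                        help="input files; STDIN is read if none are given")
    args = parser.parse_args(argv)

    for name in args.files or [None]:   # None stands for STDIN
        try:
            if name is None:
                total, failed = sum_stream(stdin, None, stderr)
            else:
                with open(name, encoding="utf-8") as handle:
                    total, failed = sum_stream(handle, name, stderr)
        except OSError as exc:
            print(f"{PROGRAM}: '{name}': ERROR, {exc.strerror or exc}",
                  file=stderr)
            return 1
        if failed:
            return 1                    # stop with non-zero status on ERROR
        print(total, file=stdout)       # results go to STDOUT
    return 0
```

In a script, the last line would typically be 'sys.exit(main(sys.argv[1:]))'. With that line in place, feeding the two lines '1' and '2.5' on STDIN prints 3.5 and exits with status 0, while a non-numeric line produces a single message on STDERR and exit status 1.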