Design of computation stream program (filter)
=============================================

INTRO
=====

The words MUST, MAY, SHALL, SHOULD, written in all capitals, should be
interpreted as specified in RFC 2119 [1].

PROGRAM
=======

Write a Perl program to perform an assigned arithmetic or algebraic
operation on an algebra system that was offered to you (vectors,
hypercomplex numbers, etc.).

The program MUST be named as specified in your specific assignment. The
program name MUST NOT contain any extension (such as .pl, .py or .sh).

Your program MUST behave as a Unix filter, i.e. it MUST read data from
the files specified by the file names on the command line, or, if there
are no command line arguments, it MUST read from STDIN. The program
SHOULD support the '-' special file name to indicate STDIN among the
specified file names on the command line (use "./-" to read from the
file actually named "-"). Output data MUST be written to STDOUT.

Programs MUST work from any working directory and from any installation
directory; in fact, it MUST be possible to invoke the program as a Unix
command using the PATH environment variable.

DOCUMENTATION AND COMMENTS
==========================

Brief comments at the beginning of the program should specify:

- The intended function of the program and the mathematical computation
  it implements. If the algorithm is non-trivial, a reference or
  references to relevant publications describing that algorithm MUST be
  given;
- An example of program invocation, with possible typical arguments and
  argument combinations, MUST be given. Avoid using ad-hoc metasymbols
  (such as './progname [+number]'); instead, always give a working
  example that can be cut and pasted into a command line;
- Description of the input and output formats. For formats that are
  standardised and described in separate documents, references to these
  documents MUST be given.
  For formats specific to the program, a description must be given
  that is detailed enough to use the program's outputs and to generate
  its inputs; it should be possible to write further programs to deal
  with these inputs and outputs, based on the comments provided;
- If the formats are not described elsewhere, or if they are simple and
  text-based, provide short examples of the relevant inputs and/or
  outputs;
- When using CSV, TAB or similar tabular formats, column names and
  semantics (physical meaning, measurement units, constraints) MUST be
  specified.

INPUT AND OUTPUT FORMATS
========================

When writing a program, always keep in mind that its output will not
only be viewed by humans but also processed by other programs! Make your
output easy for such processing. CSV, TAB or space-separated columns in
text lines should be preferred when such a data representation is
sufficient (this will always be the case in our exercises). More
complicated CIF, XML or JSON formats SHOULD only be used for complex
structured data, when a line-oriented format is not possible (and when
they are used, references to the corresponding XML/JSON schemas or CIF
dictionaries MUST be provided).

For hypercomplex numbers, two simple formats can be used [2,3]. Use them
if specified in your assignment.

The program MUST NOT output any textual "decorations" (such as
non-commented separator lines like "========", extra keywords, ASCII art
or such) into the output stream.

If the input of a program requires two vectors, or two numbers, one of
the possibilities is to assume that these two data items are written on
one line, one after another; this SHOULD be the same structure as
created by the Unix 'paste' utility from two valid input files.

IMPLEMENTATION
==============

The program MUST NOT contain "sealed" file names, i.e. file names
explicitly specified in the program text.
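A filter that contains no file names in its text can rely on Perl's
'<>' operator, which reads the files named on the command line, falls
back to STDIN when there are none, and treats '-' as STDIN. The sketch
below is only an illustration under stated assumptions, not a reference
solution: the operation (adding two vectors written side by side on one
line, as 'paste' would produce them), the helper name 'add_halves', the
whitespace-separated columns and the '#' comment convention are all
hypothetical choices.

```perl
#!/usr/bin/perl
# Sketch: add two vectors written one after another on each input line.
# Example (assuming the program is installed as 'vector-add'):
#   printf '1 2 3 10 20 30\n' | vector-add
use strict;
use warnings;

# Hypothetical helper: add the first half of the list to the second half.
sub add_halves
{
    my @c = @_;
    my $n = @c / 2;
    return map { $c[$_] + $c[$_ + $n] } 0 .. $n - 1;
}

# '<>' takes its input from the files in @ARGV, or from STDIN if @ARGV
# is empty; one line is held in memory at a time (no "slurping").
while( my $line = <> ) {
    chomp $line;
    next if $line =~ /^\s*(#|$)/;   # assumption: '#' starts a comment
    my @c = split ' ', $line;
    if( @c % 2 != 0 ) {
        # Quote only the start of the offending line to keep the
        # message short.
        my $quote = length( $line ) > 20 ?
            substr( $line, 0, 20 ) . '...' : $line;
        die sprintf( "%s: \"%s\", line %d: odd number of components: " .
                     "\"%s\"\n", $0, $ARGV, $., $quote );
    }
    # Non-numeric components are reported by Perl under 'use warnings'.
    print join( ' ', add_halves( @c ) ), "\n";
} continue {
    close ARGV if eof;   # restart the line counter '$.' for each file
}
```

With the input line '1 2 3 10 20 30' the sketch prints '11 22 33'. The
current input file name is taken from '$ARGV' and the line number from
'$.', so the program text itself never names a file.
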
Temporary files, if they are needed, MUST be named in such a way that
several simultaneous processes running in the same directory do not
interfere with each other; also, bear in mind that the current working
directory where the program is run may not be writable (or even
readable!) for you; your program MUST work regardless of this.

The program SHOULD NOT "slurp" data, i.e. read all its input data into
RAM, if stream processing is possible, i.e. if it is possible to process
a single line or record from the input, write the result to the output
and re-use the RAM. Storing all data in RAM is only permitted for a)
small, fixed-size amounts of data; b) data that has to be used multiple
times during processing. A good example of such a RAM storage candidate
is a matrix from the first file by which all subsequent stream vectors
or matrices must be multiplied: the multiplier matrix is stored in RAM,
but then each multiplicand is read one at a time, the product is
computed and written to STDOUT, and the RAM for the multiplicand is
reused.

ERROR MESSAGES
==============

Error messages MUST be polite, informative and concise. Error messages
MUST be output to STDERR. The error message format provided by the Perl
die() and warn() functions MAY be used. Error messages MUST indicate a)
the quoted name of the input file; b) the number of the erroneous line;
and c) a quotation of the erroneous line, either the whole line or the
erroneous part. The total length of the error message, however, SHOULD
NOT exceed 80 characters (not counting the newline characters).

In general, a fatal error SHOULD be issued when a program encounters a
situation in which further computations and production of syntactically
correct output are impossible. The Perl 'die()' subroutine MUST be
invoked in such a case, and the return status set to a non-zero value.

Warnings SHOULD be issued when further computations and production of a
syntactically correct output file are possible, but the results are very
likely not the ones which the user would expect.
The Perl 'warn()' subroutine SHOULD be invoked in such cases.

The vector/hypercomplex number processing programs MUST diagnose the
following circumstances:

- wrong number of input components;
- numbers that do not correspond to the Perl number syntax (diagnostics
  emitted by Perl under "use warnings" are OK);
- if column names are provided, the program MUST check whether the
  column names correspond to the expected ones and output a warning if
  they do not.

REFERENCES
==========

1. S. Bradner (1997) "Key words for use in RFCs to Indicate Requirement
   Levels". URL: https://tools.ietf.org/html/rfc2119

2. Saulius Gražulis (2020) Hiperkompleksinių skaičių vektorinis formatas
   [Vector format for hypercomplex numbers].
   https://saulius.grazulis.lt/~saulius/paskaitos/VU/bioinformatika-III/užduotys-praktikai/1-užduotis/formatai/hiperkompleksinių-skaičių-vektorinis-formatas.txt
   [last accessed: 2021-02-09 07:30:28 EET]

3. Saulius Gražulis (2020) Hiperkompleksinių skaičių simbolinis formatas
   [Symbolic format for hypercomplex numbers].
   https://saulius.grazulis.lt/~saulius/paskaitos/VU/bioinformatika-III/užduotys-praktikai/1-užduotis/formatai/hiperkompleksinių-skaičių-simbolinis-formatas.txt
   [last accessed: 2021-02-09 07:30:28 EET]