Table

table submodule

This submodule contains the File class and its closely related Layout class.

Layout

class rumydata.table.Layout(_definition: Dict[str, Field], **kwargs)

Table Layout Class

This class contains a collection of Field objects which form a tabular data set. The initialized object is then used to check a row or header for validity.

There are two main ways that this class is intended to be used:
  1. to generate technical documentation for the tabular data set

  2. to use in concert with a File object to validate the contents

Parameters:
  • _definition – a dictionary mapping each column name to its Field definition

  • skip_header – (optional) a boolean control to skip validation of the header in the file. Defaults to False.

  • no_header – (optional) a boolean control that indicates that there is no header in the layout, and that every row should be treated as data. Defaults to False.

  • header_mode – (optional) sets the mode for checking the header. Defaults to ‘exact’, but also accepts ‘startswith’ and ‘contains’. This allows optional partial matching of header values in a layout to the values obtained in tabular data.

  • empty_row_ok – (optional) a boolean control to skip validation of any row that is completely empty (i.e. every field is blank). Defaults to False.

  • title – (optional) a brief name for the layout, which is included in the technical digest.

  • use_excel_cell_format – (optional) a boolean control to specify whether to use Excel-style cell naming (e.g. A1 to represent rix=1, cix=1, AA20 to represent rix=20, cix=27, etc.) when reporting validation errors.
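The Excel-style naming described for use_excel_cell_format can be illustrated with a short, self-contained sketch. The function name excel_cell is hypothetical and this is not the library's own implementation; it only shows how a 1-based row index (rix) and column index (cix) map onto names such as A1 and AA20.

```python
def excel_cell(rix: int, cix: int) -> str:
    """Convert 1-based row and column indexes to an Excel-style cell name.

    Hypothetical helper for illustration; not part of rumydata.
    """
    if rix < 1 or cix < 1:
        raise ValueError("rix and cix must both be 1 or greater")
    letters = ""
    while cix > 0:
        # Column letters form a base-26 system with no zero digit,
        # so shift by one before taking the remainder.
        cix, remainder = divmod(cix - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return f"{letters}{rix}"


print(excel_cell(1, 1))    # A1
print(excel_cell(20, 27))  # AA20
```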

documentation(doc_type='md')

Technical documentation

Generates detailed specification of the defined layout.

Parameters:
  • doc_type – format of the returned technical documentation. Defaults to ‘md’.

Returns:
  a Markdown formatted string describing the layout

check_header(row: List[str], rix=0)

Header Rule assertion

Perform an assertion of the provided row against the header rules defined for this layout. If the row fails the check for any of the rules, the assertion will raise a detailed exception message.

Parameters:
  • row – a list of strings which make up the header row

  • rix – row index number. Used to report position of the header row in the file. Defaults to 0.
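The three header_mode values accepted by Layout can be pictured with the following sketch. It assumes that the column name in the layout acts as the expected value, and that the value found in the file is tested for equality, as a prefix match, or as a substring match against it. The function names are hypothetical and the library's actual comparison may differ.

```python
from typing import List


def header_matches(expected: str, actual: str, mode: str = "exact") -> bool:
    """Compare one header value from a file against a layout column name.

    Hypothetical helper; the direction of the comparison is an assumption.
    """
    if mode == "exact":
        return actual == expected
    if mode == "startswith":
        return actual.startswith(expected)
    if mode == "contains":
        return expected in actual
    raise ValueError(f"unknown header_mode: {mode!r}")


def mismatched_positions(expected: List[str], actual: List[str],
                         mode: str = "exact") -> List[int]:
    """Return the 1-based positions of header values that fail to match."""
    return [
        cix
        for cix, (exp, act) in enumerate(zip(expected, actual), start=1)
        if not header_matches(exp, act, mode)
    ]


layout_columns = ["id", "amount"]
file_header = ["id", "amount (USD)"]

print(mismatched_positions(layout_columns, file_header, "exact"))
print(mismatched_positions(layout_columns, file_header, "startswith"))
```

Under these assumptions, the second column fails in ‘exact’ mode but passes in ‘startswith’ mode, which is the kind of partial matching the parameter description refers to.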

check_row(row: List[str], rix=-1)

Row Rule assertion

Perform an assertion of the provided row against the row rules defined for this layout. If the row fails the check for any of the rules, the assertion will raise a detailed exception message.

Parameters:
  • row – a list of strings which make up the row

  • rix – row index number. Used to report position of the row in the file. Defaults to -1.

Files

class rumydata.table.CsvFile(layout: Layout | dict, skip_rows=0, max_errors=100, **kwargs)

CSV File class

This class provides a way to validate the contents of a file against a Layout, and report any rule violations that exist. This is the primary means of using this package.

Parameters:
  • layout – a Layout object, or a dictionary of column definitions, which defines the fields that make up the data set, along with the various rules that should be applied to each one.

  • skip_rows – (optional) the number of rows to skip before starting evaluation. Defaults to 0.

  • max_errors – (optional) the maximum number of row errors to be collected before halting validation of rows and raising a FileError. This is used to prevent overly verbose (and mostly useless) validation reports from being generated. Defaults to 100. The error limit can be overridden (set to unlimited) by providing a value of -1.

  • dialect – (optional) Controls csv dialect parsing.

  • delimiter – (optional) Controls csv delimiter parsing.

  • quotechar – (optional) Controls csv quote character parsing.
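The effect of skip_rows, delimiter and quotechar can be previewed with Python's standard csv module alone. This sketch assumes that the keyword arguments behave like the same-named csv.reader options. The read_rows helper and the sample data are hypothetical and are not part of rumydata.

```python
import csv
import io

# Sample content: one preamble line, then a pipe-delimited header and data
# rows whose quoted values contain the delimiter itself.
raw = (
    "exported 2024-01-01\n"
    "id|name\n"
    "1|'Smith| Jane'\n"
    "2|'Doe| John'\n"
)


def read_rows(text, skip_rows=0, **csv_kwargs):
    """Parse text with csv.reader and drop the first skip_rows rows."""
    reader = csv.reader(io.StringIO(text), **csv_kwargs)
    return list(reader)[skip_rows:]


rows = read_rows(raw, skip_rows=1, delimiter="|", quotechar="'")
header, data = rows[0], rows[1:]

print(header)
print(data)
```

Running a quick check like this on a sample of the file is one way to confirm the parsing options before handing the same values to CsvFile.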

check(file_path: str | Path, doc_type: str = None)

File check method

Perform a check of the layout in this object against a file.

Parameters:
  • file_path – a file path provided as a string, or a pathlib Path object.

  • doc_type – (optional) if provided, the exception details are returned in this format instead of an exception being raised. Valid options are ‘md’ and ‘html’. Defaults to None.

class rumydata.table.ExcelFile(layout: Layout | Dict, skip_rows=0, max_errors=100, **kwargs)

Excel File class

This class provides a way to validate the contents of an Excel file against a Layout, and report any rule violations that exist. This is the primary means of using this package.

Parameters:
  • layout – a Layout object which defines the fields that make up the data set, along with the various rules that should be applied to each one.

  • skip_rows – (optional) the number of rows to skip before starting evaluation. Defaults to 0.

  • max_errors – (optional) the maximum number of row errors to be collected before halting validation of rows and raising a FileError. This is used to prevent overly verbose (and mostly useless) validation reports from being generated. Defaults to 100. The error limit can be set to unlimited by providing a value of -1.
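The max_errors behaviour, including the -1 "unlimited" setting, can be sketched as a simple capped collection loop. Everything here is hypothetical (the collect_row_errors and digits_only names, and the use of ValueError); it illustrates the idea of halting once the limit is reached, not the library's internals.

```python
from typing import Callable, Iterable, List


def collect_row_errors(rows: Iterable[List[str]],
                       check_row: Callable[[List[str], int], None],
                       max_errors: int = 100) -> List[str]:
    """Run check_row on every row, stopping once max_errors is reached.

    A max_errors value of -1 disables the limit.
    """
    errors: List[str] = []
    for rix, row in enumerate(rows, start=1):
        try:
            check_row(row, rix)
        except ValueError as exc:
            errors.append(f"row {rix}: {exc}")
            if max_errors != -1 and len(errors) >= max_errors:
                break  # halt validation to keep the report readable
    return errors


def digits_only(row: List[str], rix: int) -> None:
    """Hypothetical row rule: the first value must be a whole number."""
    if not row[0].isdigit():
        raise ValueError(f"{row[0]!r} is not a whole number")


sample = [["1"], ["x"], ["2"], ["y"], ["z"]]
limited = collect_row_errors(sample, digits_only, max_errors=2)
unlimited = collect_row_errors(sample, digits_only, max_errors=-1)

print(limited)
print(len(unlimited))
```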

check(file_path: str | Path, doc_type: str = None)

File check method

Perform a check of the layout in this object against a file.

Parameters:
  • file_path – a file path provided as a string, or a pathlib Path object.

  • doc_type – (optional) if provided, the exception details are returned in this format instead of an exception being raised. Valid options are ‘md’ and ‘html’. Defaults to None.