Rules
data validation rules
This submodule contains the various rules that are used to validate the contents of a file, to determine if it meets the specifications of a layout.
The rules are divided into the following types:
- cell: the most basic rule type. A cell rule generally receives a single value, but can also perform comparisons to other values that exist in the same row.
- column: a more complex rule type which requires knowledge of the entire list of values present in a single column in order to determine validity.
- row: rules that are applied to an entire row. These are not generally meant to be extended by users of this package.
- header: rules that extend the concept of row rules, but are meant to apply specifically to the header row of the file.
- file: rules that apply to the file, including whether the file exists, or matches a particular naming convention.
Out of the Box
This package contains a number of field types which are already configured with rules to support that particular type of data. For example, the Text field includes rules for maximum length, minimum length (optional), and nullability.
However, this may not be enough for your purposes. Perhaps you need to ensure that your text field only includes ASCII characters. Fortunately, a rule for this already exists in the package, and the Field classes contain a convenient parameter for applying additional rules to a field:
from rumydata.rules import cell
from rumydata.field import Text
my_field = Text(
    max_length=10, min_length=5, nullable=False, rules=[cell.AsciiChar()]
)
my_field.check_cell('ABCDE')
With this new rule applied, any data that is validated against this field will be checked for minimum and maximum length, nullability, and whether the characters are all ASCII. You can always test these fields using the check_cell or check_column methods, depending upon the kind of rule that you’re trying to test.
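As a rough mental model, a field can be thought of as a list of rules, each pairing a predicate with an assertion message; checking a cell runs every predicate and collects the assertions that fail. The sketch below is hypothetical plain Python, not rumydata's implementation: the names text_rules and check_cell_sketch are invented, and treating an empty string as null is an assumption.

```python
from typing import Callable, List, Tuple

# Hypothetical rule representation: a predicate plus the assertion it enforces.
RuleSketch = Tuple[Callable[[str], bool], str]


def text_rules(max_length: int, min_length: int) -> List[RuleSketch]:
    """Length and ASCII checks similar in spirit to the Text field example above."""
    return [
        (lambda x: len(x) <= max_length, f"must have no more than {max_length} characters"),
        (lambda x: len(x) >= min_length, f"must have at least {min_length} characters"),
        (lambda x: x.isascii(), "must contain only ASCII characters"),  # Python 3.7+
    ]


def check_cell_sketch(value: str, rules: List[RuleSketch], nullable: bool = False) -> List[str]:
    """Return the assertions the value fails; an empty list means the value is valid."""
    # Assumption: an empty string stands in for a null value.
    if value == "":
        return [] if nullable else ["must not be null"]
    return [assertion for predicate, assertion in rules if not predicate(value)]


field_rules = text_rules(max_length=10, min_length=5)
print(check_cell_sketch("ABCDE", field_rules))  # []
print(check_cell_sketch("ABC", field_rules))    # ['must have at least 5 characters']
print(check_cell_sketch("ABCDÉ", field_rules))  # ['must contain only ASCII characters']
```

Adding a rule to this model is just appending another (predicate, assertion) pair, which mirrors how the rules parameter extends a field's built-in checks.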
Extension
In the example above, we added a check for ASCII characters only. But what if we need a rule that doesn’t exist in the package? Let’s say that we cannot allow any vowels (A, E, I, O, U) in the cell that we are checking. This package makes it easy to develop custom rules and apply them to your fields:
from rumydata import field
from rumydata import rules
vowel_rule = rules.cell.make_static_cell_rule(
    lambda x: all([c.lower() not in ['a', 'e', 'i', 'o', 'u'] for c in x]),
    "must not have any vowels"
)
my_field = field.Text(
    max_length=10, min_length=5, nullable=False,
    rules=[rules.cell.AsciiChar(), vowel_rule]
)
# 'ABCDE' contains the vowels A and E, so vowel_rule is expected to flag it
my_field.check_cell('ABCDE')
With our custom vowel_rule, we will now identify any cells that contain vowels and call this out during validation.
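Because the function given to make_static_cell_rule is ordinary Python, it can be checked in isolation before it is wrapped in a rule. The sketch below restates the vowel predicate as a named function (no_vowels is a hypothetical name) and confirms that 'ABCDE' from the example above fails it.

```python
def no_vowels(x: str) -> bool:
    # Same logic as the lambda passed to make_static_cell_rule above.
    return all(c.lower() not in ('a', 'e', 'i', 'o', 'u') for c in x)


for value in ["ABCDE", "BCDFG", ""]:
    print(repr(value), no_vowels(value))
# 'ABCDE' False
# 'BCDFG' True
# '' True
```

The empty string passes because all() over an empty sequence is True, so the predicate by itself does not reject blank cells; rejecting blanks is presumably the job of the field's nullable setting.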
Reference
Cell
cell validation rules
These rules make up the heart of what most users of the rumydata package will be interested in when attempting to extend the out-of-the-box behavior. Rules based on the Rule class are generally applied to a single value, while rules based on the ColumnComparisonRule class compare the value in a cell to another value in the same row.
These rules are intended to be used by adding them directly to the rules argument in the constructor of the classes in the field submodule.
rumydata.rules.cell.make_static_cell_rule(func, assertion) → rumydata.rules.cell.Rule
    Static cell rule factory
    Return a factory-generated Rule class. The function used by the rule must directly evaluate a single positional argument (i.e. x, but not x and y). Because the Rule cannot be passed a value on initialization, neither the evaluator nor the explain method in the returned class can be dynamic.
    Parameters:
        - func – a function which takes a single positional argument
        - assertion – a string describing the condition which must be met in order for the function to return True
    Returns: a rumydata.rules.cell.Rule class
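The restriction on dynamic behavior follows from the factory returning a class rather than an instance: the function and assertion are fixed when the class is built, and the class takes no constructor arguments. The sketch below is a simplified, hypothetical illustration of that idea in plain Python; StaticRuleSketch and make_static_rule_sketch are invented names, and rumydata.rules.cell.Rule may be structured differently.

```python
from typing import Callable, Type


class StaticRuleSketch:
    """Hypothetical stand-in for a rule base class; not rumydata's Rule."""

    def _evaluator(self) -> Callable[[str], bool]:
        raise NotImplementedError

    def _explain(self) -> str:
        raise NotImplementedError

    def evaluate(self, value: str) -> bool:
        return self._evaluator()(value)

    def explain(self) -> str:
        return self._explain()


def make_static_rule_sketch(func: Callable[[str], bool], assertion: str) -> Type[StaticRuleSketch]:
    """Build a rule class whose evaluator and explanation are fixed at creation time."""

    class GeneratedRule(StaticRuleSketch):
        def _evaluator(self):
            return func  # the single-argument function, captured by closure

        def _explain(self):
            return assertion  # the fixed assertion text

    return GeneratedRule


NoDigits = make_static_rule_sketch(lambda x: not any(c.isdigit() for c in x), "must not contain digits")
rule = NoDigits()  # no constructor arguments, so nothing can vary per instance
print(rule.evaluate("abc"), rule.evaluate("a1c"), rule.explain())
# True False must not contain digits
```

A rule that needs a parameter (such as a maximum length) would instead be written as a class that accepts that value in its constructor, like the MaxChar(max_length) entry below.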
class rumydata.rules.cell.NotNull
    Bases: rumydata.rules.cell.Rule
    Cell not null Rule
class rumydata.rules.cell.ExactChar(exact_length)
    Bases: rumydata.rules.cell.Rule
    Cell exact character length Rule
class rumydata.rules.cell.MinChar(min_length)
    Bases: rumydata.rules.cell.Rule
    Cell minimum character length Rule
class rumydata.rules.cell.MaxChar(max_length)
    Bases: rumydata.rules.cell.Rule
    Cell maximum character length Rule
class rumydata.rules.cell.AsciiChar
    Bases: rumydata.rules.cell.Rule
    Cell contains only ASCII characters Rule
class rumydata.rules.cell.NonTrim
    Bases: rumydata.rules.cell.Rule
    Cell does not have whitespace characters at beginning or end
class rumydata.rules.cell.Choice(choices: List[str], case_insensitive=False)
    Bases: rumydata.rules.cell.Rule
    Cell choice Rule
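Several of the simpler cell rules correspond to standard Python string operations. The functions below are a hypothetical sketch of the semantics suggested by the AsciiChar, NonTrim, and Choice descriptions, not the package's source; in particular, lowercasing both sides for case_insensitive matching is an assumption.

```python
from typing import List


def only_ascii(value: str) -> bool:
    # AsciiChar-style check: every character is 7-bit ASCII (str.isascii, Python 3.7+).
    return value.isascii()


def is_trimmed(value: str) -> bool:
    # NonTrim-style check: no whitespace at the beginning or end.
    return value == value.strip()


def in_choices(value: str, choices: List[str], case_insensitive: bool = False) -> bool:
    # Choice-style check; case-insensitive matching lowercases both sides (an assumption).
    if case_insensitive:
        return value.lower() in {c.lower() for c in choices}
    return value in choices


print(only_ascii("cafe"), only_ascii("café"))  # True False
print(is_trimmed("a b"), is_trimmed(" a"))     # True False
print(in_choices("yes", ["Yes", "No"]), in_choices("yes", ["Yes", "No"], case_insensitive=True))  # False True
```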
class rumydata.rules.cell.MinDigit(min_length)
    Bases: rumydata.rules.cell.Rule
    Cell minimum digit character Rule
    Check that the count of characters, after removing all non-digits, meets or exceeds the specified minimum. Used to evaluate the length of significant digits in numeric strings that might contain formatting.
class rumydata.rules.cell.MaxDigit(max_length)
    Bases: rumydata.rules.cell.Rule
    Cell maximum digit character Rule
    Check that the count of characters, after removing all non-digits, is less than or equal to the specified maximum. Used to evaluate the length of significant digits in numeric strings that might contain formatting.
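The MinDigit and MaxDigit descriptions amount to counting digits after discarding formatting characters. A minimal sketch, assuming "digits" means the ASCII characters 0 through 9 (digit_count is a hypothetical helper):

```python
import string


def digit_count(value: str) -> int:
    """Count the characters left after removing every non-digit."""
    return sum(1 for c in value if c in string.digits)


phone = "(555) 123-4567"
print(digit_count(phone))  # 10
# A minimum of 10 is met; a maximum of 9 would be exceeded.
print(digit_count(phone) >= 10, digit_count(phone) <= 9)  # True False
```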
class rumydata.rules.cell.OnlyNumbers
    Bases: rumydata.rules.cell.Rule
    Cell only digit characters Rule
class rumydata.rules.cell.NoLeadingZero
    Bases: rumydata.rules.cell.Rule
    Cell no leading zero digit Rule
    Ensure that there is no leading zero after removing all non-digit characters. A lone zero (0) will not raise an error.
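The NoLeadingZero description can be read as the check below. This is a sketch of the documented behavior under the assumption that "a lone zero" means the digits-only string is exactly "0"; no_leading_zero is a hypothetical name.

```python
import string


def no_leading_zero(value: str) -> bool:
    """Strip non-digits, then reject a leading '0' unless the result is exactly '0'."""
    digits = "".join(c for c in value if c in string.digits)
    return digits == "0" or not digits.startswith("0")


for value in ["0", "0123", "$1,000", "00"]:
    print(repr(value), no_leading_zero(value))
# '0' True
# '0123' False
# '$1,000' True
# '00' False
```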
class rumydata.rules.cell.CanBeFloat
    Bases: rumydata.rules.cell.Rule
    Cell can be float Rule
class rumydata.rules.cell.CanBeInteger
    Bases: rumydata.rules.cell.Rule
    Cell can be integer Rule
class rumydata.rules.cell.NumericDecimals(max_decimals=2)
    Bases: rumydata.rules.cell.Rule
    Cell has maximum decimals Rule
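One plausible reading of CanBeFloat, CanBeInteger, and NumericDecimals is sketched below. It assumes that "can be float/integer" means Python's float() and int() accept the string, and that decimals are the characters after the "." in plain (non-exponent) notation; the helper names are hypothetical.

```python
from typing import Callable


def can_convert(converter: Callable[[str], object], value: str) -> bool:
    """True if converter(value) succeeds, e.g. converter=float or converter=int."""
    try:
        converter(value)
    except ValueError:
        return False
    return True


def decimal_places(value: str) -> int:
    """Number of characters after the decimal point; 0 when there is no point."""
    _, point, fraction = value.partition(".")
    return len(fraction) if point else 0


def within_max_decimals(value: str, max_decimals: int = 2) -> bool:
    return decimal_places(value) <= max_decimals


print(can_convert(float, "3.14"), can_convert(int, "3.14"), can_convert(int, "42"))  # True False True
print(within_max_decimals("12.34"), within_max_decimals("12.345"))  # True False
```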
class rumydata.rules.cell.LengthComparison(comparison_value)
    Bases: rumydata.rules.cell.Rule
    Base length comparison Rule
class rumydata.rules.cell.LengthGT(comparison_value)
    Bases: rumydata.rules.cell.LengthComparison
    Length greater than comparison Rule
class rumydata.rules.cell.LengthGTE(comparison_value)
    Bases: rumydata.rules.cell.LengthComparison
    Length greater than or equal to comparison Rule
class rumydata.rules.cell.LengthET(comparison_value)
    Bases: rumydata.rules.cell.LengthComparison
    Length equal to comparison Rule
class rumydata.rules.cell.LengthLTE(comparison_value)
    Bases: rumydata.rules.cell.LengthComparison
    Length less than or equal to comparison Rule
class rumydata.rules.cell.LengthLT(comparison_value)
    Bases: rumydata.rules.cell.LengthComparison
    Length less than comparison Rule
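The five length classes differ only in the comparison operator applied to the value's length. A compact way to express that pattern in plain Python is shown below; it is an illustration rather than the package's implementation, and the suffix-to-operator table is an assumption based on the class names.

```python
import operator

# Hypothetical mapping from the class-name suffixes to comparison operators.
COMPARISONS = {
    "GT": operator.gt,
    "GTE": operator.ge,
    "ET": operator.eq,
    "LTE": operator.le,
    "LT": operator.lt,
}


def length_compare(value: str, suffix: str, comparison_value: int) -> bool:
    """Compare len(value) against comparison_value using the named operator."""
    return COMPARISONS[suffix](len(value), comparison_value)


print(length_compare("abcd", "GT", 3), length_compare("abc", "GT", 3))  # True False
print(length_compare("abc", "GTE", 3), length_compare("abc", "LT", 3))  # True False
```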
class rumydata.rules.cell.NumericComparison(comparison_value)
    Bases: rumydata.rules.cell.Rule
    Numeric comparison base Rule
    Base float value comparison class. Requires that the value can be coerced to a float value.
class rumydata.rules.cell.NumericGT(comparison_value)
    Bases: rumydata.rules.cell.NumericComparison
    Numeric greater than comparison Rule
class rumydata.rules.cell.NumericGTE(comparison_value)
    Bases: rumydata.rules.cell.NumericComparison
    Numeric greater than or equal to comparison Rule
class rumydata.rules.cell.NumericET(comparison_value)
    Bases: rumydata.rules.cell.NumericComparison
    Numeric equal to comparison Rule
class rumydata.rules.cell.NumericLTE(comparison_value)
    Bases: rumydata.rules.cell.NumericComparison
    Numeric less than or equal to comparison Rule
class rumydata.rules.cell.NumericLT(comparison_value)
    Bases: rumydata.rules.cell.NumericComparison
    Numeric less than comparison Rule
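Coercing to float matters because cell values arrive as strings, and string comparison is lexicographic. The sketch below is hypothetical (numeric_compare is an invented helper), and treating a value that cannot be coerced as a failed check, rather than an exception, is an assumption.

```python
import operator
from typing import Callable


def numeric_compare(value: str, op: Callable[[float, float], bool], comparison_value: float) -> bool:
    """Coerce the cell to float, then compare; non-numeric values fail the check."""
    try:
        number = float(value)
    except ValueError:
        return False
    return op(number, comparison_value)


print("9" > "10")                               # True: string comparison is lexicographic
print(numeric_compare("9", operator.gt, 10))    # False: 9.0 is not greater than 10
print(numeric_compare("10", operator.ge, 10))   # True
print(numeric_compare("abc", operator.lt, 10))  # False: cannot be coerced
```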
class rumydata.rules.cell.DateRule(**kwargs)
    Bases: rumydata.rules.cell.Rule
    Base date Rule
class rumydata.rules.cell.CanBeDateIso(**kwargs)
    Bases: rumydata.rules.cell.DateRule
    Can be ISO-8601 date Rule
class rumydata.rules.cell.DateGT(comparison_value, date_format='%Y-%m-%d', **kwargs)
    Bases: rumydata.rules.cell.DateComparisonRule
    Date greater than comparison Rule
class rumydata.rules.cell.DateGTE(comparison_value, date_format='%Y-%m-%d', **kwargs)
    Bases: rumydata.rules.cell.DateComparisonRule
    Date greater than or equal to comparison Rule
class rumydata.rules.cell.DateET(comparison_value, date_format='%Y-%m-%d', **kwargs)
    Bases: rumydata.rules.cell.DateComparisonRule
    Date equal to comparison Rule
class rumydata.rules.cell.DateLTE(comparison_value, date_format='%Y-%m-%d', **kwargs)
    Bases: rumydata.rules.cell.DateComparisonRule
    Date less than or equal to comparison Rule
class rumydata.rules.cell.DateLT(comparison_value, date_format='%Y-%m-%d', **kwargs)
    Bases: rumydata.rules.cell.DateComparisonRule
    Date less than comparison Rule
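The date rules depend on parsing rather than string comparison. String comparison happens to order '%Y-%m-%d' values correctly, but not formats such as '%m/%d/%Y'. The sketch below assumes comparison_value is a date string in the same date_format as the cell; can_be_date_iso and date_gt are hypothetical helpers, not the package's code.

```python
from datetime import date, datetime


def can_be_date_iso(value: str) -> bool:
    """True if date.fromisoformat() can parse the value, e.g. 'YYYY-MM-DD'."""
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def date_gt(value: str, comparison_value: str, date_format: str = "%Y-%m-%d") -> bool:
    """Parse both strings with date_format and test value > comparison_value."""
    parsed = datetime.strptime(value, date_format)
    other = datetime.strptime(comparison_value, date_format)
    return parsed > other


print(can_be_date_iso("2024-01-31"), can_be_date_iso("2024-02-30"))  # True False
print("12/01/2023" > "01/01/2024")                     # True: plain string comparison misleads
print(date_gt("12/01/2023", "01/01/2024", "%m/%d/%Y"))  # False
```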
class rumydata.rules.cell.OtherCantExist(compare_to: Union[str, List[str]])
    Bases: rumydata.rules.cell.ColumnComparisonRule
class rumydata.rules.cell.OtherMustExist(compare_to: Union[str, List[str]])
    Bases: rumydata.rules.cell.ColumnComparisonRule
class rumydata.rules.cell.GreaterThanColumn(compare_to: Union[str, List[str]])
    Bases: rumydata.rules.cell.ColumnComparisonRule
    Greater than compared column Rule
class rumydata.rules.cell.GreaterThanOrEqualColumn(compare_to: Union[str, List[str]])
    Bases: rumydata.rules.cell.ColumnComparisonRule
    Greater than or equal to compared column Rule
class rumydata.rules.cell.LessThanColumn(compare_to: Union[str, List[str]])
    Bases: rumydata.rules.cell.ColumnComparisonRule
    Less than compared column Rule
class rumydata.rules.cell.LessThanOrEqualColumn(compare_to: Union[str, List[str]])
    Bases: rumydata.rules.cell.ColumnComparisonRule
    Less than or equal to compared column Rule
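ColumnComparisonRule subclasses compare a cell with other cells in the same row, and compare_to may name one column or a list of columns. The sketch below models a row as a dict of column name to string value. Treating the values as numbers, and requiring the comparison to hold against every listed column, are assumptions made for illustration; greater_than_columns is a hypothetical helper.

```python
from typing import Dict, List, Union


def greater_than_columns(row: Dict[str, str], column: str,
                         compare_to: Union[str, List[str]]) -> bool:
    """True if row[column] is numerically greater than each compared cell in the row."""
    others = [compare_to] if isinstance(compare_to, str) else compare_to
    value = float(row[column])
    return all(value > float(row[other]) for other in others)


row = {"start": "5", "end": "10", "limit": "12"}
print(greater_than_columns(row, "end", "start"))             # True
print(greater_than_columns(row, "end", ["start", "limit"]))  # False
```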
class rumydata.rules.cell.NotNullIfCompare(compare_to: [str, List])
    Bases: rumydata.rules.cell.ColumnComparisonRule
class rumydata.rules.cell.NotNullIfOtherEquals(compare_to: str, values: Union[str, List[str]])
    Bases: rumydata.rules.cell.NotNullIfCompare
    Cell cannot be null if other has specified value(s)
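NotNullIfOtherEquals makes a cell conditionally required. A hypothetical sketch of that condition, assuming empty strings represent nulls and rows are dicts keyed by column name (not_null_if_other_equals is an invented helper):

```python
from typing import Dict, List, Union


def not_null_if_other_equals(row: Dict[str, str], column: str, compare_to: str,
                             values: Union[str, List[str]]) -> bool:
    """The cell may be empty unless the compare_to cell holds one of the trigger values."""
    triggers = [values] if isinstance(values, str) else values
    if row[compare_to] in triggers:
        return row[column] != ""
    return True


print(not_null_if_other_equals({"status": "closed", "closed_on": ""}, "closed_on", "status", "closed"))  # False
print(not_null_if_other_equals({"status": "open", "closed_on": ""}, "closed_on", "status", "closed"))    # True
```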
Column
column validation rules
These rules capture a common, but much more complex use case for data validation, when it is necessary to compare the values of a single column across multiple rows. The most intuitive example of this is the Unique rule, which requires that every value in a column (excepting blanks) be unique/distinct.
These rules are intended to be used by adding them directly to the rules argument in the constructor of the classes in the field submodule.
Users of this package should be aware that adding a column rule can dramatically increase the resources required to perform validation. If there are no column validation rules present in a Layout, then each row will be discarded from memory after validation is complete. However, each field that has one or more column rules will require the entire column to be available for validation. In small data sets the impact will be minor, but in larger data sets it can noticeably affect performance.
class rumydata.rules.column.Unique
    Bases: rumydata.rules.column.Rule
    Column values unique Rule
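A uniqueness check shows why column rules cost more than cell rules: even a streaming version must remember every distinct non-blank value it has already seen. The sketch below is a hypothetical illustration (find_duplicates is an invented helper) that ignores blanks, as the Unique description above specifies.

```python
from typing import Iterable, List


def find_duplicates(values: Iterable[str]) -> List[str]:
    """Return each non-blank value that appears more than once, in order of first repeat."""
    seen = set()      # grows with the number of distinct values in the column
    reported = set()
    duplicates = []
    for value in values:
        if value == "":
            continue  # blanks are exempt from the uniqueness requirement
        if value in seen:
            if value not in reported:
                duplicates.append(value)
                reported.add(value)
        else:
            seen.add(value)
    return duplicates


print(find_duplicates(["a", "b", "", "a", "c", "", "b", "a"]))  # ['a', 'b']
```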