Character Patterns

The Character Patterns analytic test identifies whether items such as telephone numbers or postal codes are in the expected pattern. The pattern is written as a regular expression (regex) and must follow Python regular expression syntax.

This analytic test can be used to help validate data integrity. For example, checking phone numbers, email addresses, or postal codes against expected patterns can help to identify potential errors in the data.
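
As an illustration of what such patterns can look like, the sketch below checks a few values with Python's re module. The two patterns and the follows_pattern helper are hypothetical examples, not patterns shipped with the test, and real phone and postal formats vary by country. The sketch also assumes that the whole value must match the pattern (re.fullmatch); this documentation does not state whether the test matches the whole value or only part of it.

```python
import re

# Hypothetical patterns for illustration only.
PHONE_PATTERN = r"\d{3}-\d{3}-\d{4}"        # e.g. 555-123-4567
POSTAL_PATTERN = r"[A-Z]\d[A-Z] \d[A-Z]\d"  # e.g. K1A 0B1


def follows_pattern(value: str, pattern: str) -> bool:
    """Return True when the entire value matches the pattern."""
    # Assumption: a full match is required, not a partial one.
    return re.fullmatch(pattern, value) is not None


print(follows_pattern("555-123-4567", PHONE_PATTERN))  # True
print(follows_pattern("555-1234", PHONE_PATTERN))      # False
print(follows_pattern("K1A 0B1", POSTAL_PATTERN))      # True
```

If partial matches should count, re.search would be the call to use instead of re.fullmatch.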

Fields used for analysis

The following fields are required for this analysis:

  • Reference field(s) - Unique field(s) that are used to create a unique transaction ID, such as the Entry ID field for the general ledger dataset. These fields are not part of the result but are used to identify the transactions that are part of the result. This field is already defined in the test and cannot be modified.

  • Core field(s) - One or more fields with string values that are used to run the test. If multiple fields are selected, the test searches for the same character pattern in each field.

Parameters

The following parameters must be set to run this test:

  • Include or exclude pattern - Select whether to identify values that follow the specified pattern or values that do not follow it.

  • Character pattern - Specify the Python regular expression to use to analyze the selected fields.
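
The way these two parameters interact can be sketched as follows. The function select_values, its follow_pattern flag, and the email pattern are hypothetical names and values chosen for illustration. The sketch assumes a full match of each value. Compiling the pattern first is one way to confirm that it is valid Python regex syntax, since re.compile raises re.error for a malformed pattern.

```python
import re


def select_values(values, pattern, follow_pattern=True):
    """Return the values that follow (or do not follow) the pattern."""
    # re.compile raises re.error if the pattern is not valid Python regex.
    compiled = re.compile(pattern)
    selected = []
    for value in values:
        # Assumption: the entire value must match the pattern.
        matches = compiled.fullmatch(value) is not None
        if matches == follow_pattern:
            selected.append(value)
    return selected


# Deliberately simple, hypothetical email pattern.
EMAIL_PATTERN = r"[^@\s]+@[^@\s]+\.[^@\s]+"
emails = ["ana@example.com", "not-an-email", "bob@example.org"]

print(select_values(emails, EMAIL_PATTERN, follow_pattern=True))
print(select_values(emails, EMAIL_PATTERN, follow_pattern=False))
```

The first call lists the values that follow the pattern, and the second lists the values that do not.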

Test configurations

The following configuration is available for this test:

  • Character pattern - User-defined character pattern written as a Python regular expression.

Technical specifications

When you run the Character Patterns analytic test, the following steps are performed:

  1. If needed, apply filters to the data so that only a subset is used for the analysis. If no filter is applied, the analysis runs on the entire data file. This step can also be performed as the last step instead of the first. Note that the ability to set filters is not currently available and will be available in a later version of the test.

  2. Validate that the necessary reference fields have been selected. If no reference fields have been selected, a unique reference field is created. This step is only performed if specific fields have been selected. If all the fields are available, this step is not necessary.

  3. Validate that one or more character fields have been selected for the analysis. If multiple fields have been selected, the same regular expression is applied to each one.

  4. Validate that the user has indicated whether they are looking for lines that follow the pattern or lines that do not follow it.

  5. Obtain the regular expression. It must be written in Python regular expression syntax. See the Python documentation for the re module (Regular expression operations) to learn more.

  6. Depending on whether the user has chosen to identify values that follow or do not follow the pattern, extract the values that meet that criterion based on the regular expression supplied.

    1. If there are multiple fields and the user has selected to identify values that do not follow the pattern, then the transaction line is extracted if at least one of the fields does not follow the pattern.

    2. If there are multiple fields and the user has selected to identify values that do follow the pattern, the transaction line is extracted only if all of the fields follow the pattern.

  7. Extract the result fields selected by the user. All fields are extracted by default. Note that the ability to select result fields is not currently available and will be available in later versions of the test.
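
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the test's actual implementation. The function run_character_patterns, its parameters, the generated _row_id field, and the sample data are all hypothetical. Each value is assumed to need a full match, and filtering (step 1) and result field selection (step 7) are omitted because they are not yet available.

```python
import re


def run_character_patterns(rows, core_fields, pattern, follow_pattern,
                           reference_fields=None):
    """Extract rows whose core fields follow, or do not follow, the pattern."""
    # Step 2: if no reference field is given, add a generated row number.
    # "_row_id" is a hypothetical name for that generated field.
    if not reference_fields:
        rows = [{**row, "_row_id": i} for i, row in enumerate(rows, start=1)]
    # Step 3: at least one core field is required.
    if not core_fields:
        raise ValueError("Select at least one core field.")
    # Step 4: the include or exclude choice must be explicit.
    if not isinstance(follow_pattern, bool):
        raise TypeError("follow_pattern must be True or False.")
    # Step 5: compile the pattern; re.error is raised if it is invalid.
    compiled = re.compile(pattern)
    # Step 6: a row "follows" only if every core field matches, so a row
    # "does not follow" as soon as one core field fails to match.
    extracted = []
    for row in rows:
        all_follow = all(
            compiled.fullmatch(str(row[field])) is not None
            for field in core_fields
        )
        if all_follow == follow_pattern:
            extracted.append(row)
    # Step 7: all fields are returned by default.
    return extracted


ledger = [
    {"entry_id": 1, "phone": "555-123-4567", "fax": "555-765-4321"},
    {"entry_id": 2, "phone": "555-123-4567", "fax": "n/a"},
    {"entry_id": 3, "phone": "5551234567", "fax": "unknown"},
]
PHONE_PATTERN = r"\d{3}-\d{3}-\d{4}"  # hypothetical pattern

following = run_character_patterns(
    ledger, ["phone", "fax"], PHONE_PATTERN, True, ["entry_id"])
not_following = run_character_patterns(
    ledger, ["phone", "fax"], PHONE_PATTERN, False, ["entry_id"])

print([row["entry_id"] for row in following])      # [1]
print([row["entry_id"] for row in not_following])  # [2, 3]
```

Because a line is extracted as following the pattern only when all of its fields match, and as not following when at least one field fails, the two selections are complementary: every line lands in exactly one of them.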