utils module

Convenience functions for the CrossVA module, which help to provide a more user-friendly experience with inputs and error messages across different files.

utils.detect_format(output_format, data)[source]

Detects the format of the input data by determining the closest match.

Parameters:
  • output_format (string) – The output format, needed for loading the configuration files used to test each candidate input format
  • data (Pandas DataFrame) – The data being processed where we wish to determine the most likely format
Returns:

the best matching format for the input data

Return type:

str

Examples
Can determine the format of a data file:
>>> detect_format("InSilicoVA", flexible_read("resources/sample_data/2016WHO_mock_data_1.csv"))
'2016WHOv141'
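The idea of a "closest match" can be illustrated with a small sketch. This is not the CrossVA implementation: it assumes, purely for illustration, that each candidate format is described by a set of expected column names and that the best match is the format with the largest share of its expected columns present in the data. The function name detect_format_sketch, the format "hypothetical_format", and its column names are all made up for this example.

```python
def detect_format_sketch(data_columns, expected_columns_by_format):
    """Return the format whose expected columns best overlap the data.

    Assumption: a format is scored by the fraction of its expected
    columns that appear in data_columns. The real detect_format may
    score candidates differently.
    """
    columns = set(data_columns)
    best_format, best_score = None, -1.0
    for name, expected in expected_columns_by_format.items():
        expected = set(expected)
        # Fraction of this format's expected columns found in the data.
        score = len(columns & expected) / len(expected) if expected else 0.0
        if score > best_score:
            best_format, best_score = name, score
    return best_format


# Hypothetical candidate formats; the column sets are illustrative only.
candidates = {
    "2016WHOv141": {"-Id10004", "-Id10019", "-Id10059", "-Id10077"},
    "hypothetical_format": {"age", "sex", "-Id10004"},
}
observed = ["ID", "-Id10004", "-Id10019", "-Id10059", "-Id10077"]
best = detect_format_sketch(observed, candidates)
```

With a DataFrame, the first argument could be data.columns.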

utils.english_relationship(rel)[source]

Returns the abbreviated relationship as a full English phrase.

Parameters:rel (str) – a string with the relationship being translated, e.g., “gt”
Returns:
a string with the relationship as a longer English phrase, e.g.,
“is greater than”. If the relationship is not defined in the dict english, this method returns rel without modification.
Return type:str

Examples
>>> english_relationship("gt")
'is greater than'
>>> english_relationship("unknown")
'unknown'
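The fallback behaviour described above (return rel unchanged when it is not in the dict) matches a dictionary lookup with a default. A minimal sketch, assuming a lookup table named ENGLISH; only the "gt" entry is taken from the example above, and the other entries and the function name are hypothetical.

```python
# Only "gt" comes from the documented example; "lt" and "eq" are
# hypothetical entries added for illustration.
ENGLISH = {
    "gt": "is greater than",
    "lt": "is less than",
    "eq": "is equal to",
}


def english_relationship_sketch(rel):
    """Translate an abbreviation, or return it unchanged if unknown."""
    # dict.get returns the second argument when the key is missing.
    return ENGLISH.get(rel, rel)
```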
utils.flexible_read(path_or_df)[source]

Takes either a path or a Pandas DataFrame; if given a path, reads the file into a Pandas DataFrame. Convenience method that adds input flexibility to the main transform method.

Parameters:path_or_df (string or Pandas DataFrame) – Either a string representing a path to the file containing the data, or a dataframe that has already been read into Python.
Returns:
either the data at the given path as read by pandas,
or the result of applying the DataFrame constructor to the path_or_df argument
Return type:Pandas DataFrame

Examples
Can return a dataframe from a string:
>>> flexible_read("resources/sample_data/2016WHO_mock_data_1.csv").iloc[:5,:5]
   ID -Id10004 -Id10019 -Id10059 -Id10077
0   0      wet       dk  married       dk
1   1      wet   female      NaN       dk
2   2      dry     male       dk      NaN
3   3       dk       dk       dk       dk
4   4      dry      NaN  married       dk

Or apply the pandas dataframe constructor to the input:
>>> flexible_read(np.arange(9).reshape(3,3))
   0  1  2
0  0  1  2
1  3  4  5
2  6  7  8
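The path-or-DataFrame dispatch can be sketched in a few lines. This is an illustration rather than the CrossVA source: it assumes the path points to a CSV file readable by pandas.read_csv, and the name flexible_read_sketch is hypothetical.

```python
import numpy as np
import pandas as pd


def flexible_read_sketch(path_or_df):
    """Return a DataFrame from either a file path or in-memory data."""
    if isinstance(path_or_df, str):
        # Assumption: the file is a CSV; the real function may handle
        # other file types or raise friendlier errors.
        return pd.read_csv(path_or_df)
    # Anything else is handed to the DataFrame constructor, which also
    # accepts an existing DataFrame or a NumPy array.
    return pd.DataFrame(path_or_df)


frame = flexible_read_sketch(np.arange(9).reshape(3, 3))
```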

utils.report_list(alist, limit=10, paren=True)[source]

Converts alist into a user-friendly string for clearer error messages. Each element is reported in single quotes and separated by commas, with the last element preceded by “and”. When limit is shorter than the list, cuts the list at the limit, omits the “and”, and ends with “etc” to indicate incompleteness.

Parameters:
  • alist (list) – The list of items to report.
  • limit (int) – The maximum number of items to report. If the list has more than limit items, it is reported without the conjunction and ends with “etc”. Defaults to 10.
  • paren (boolean) – Encloses string in parentheses if true. Defaults to True.
Returns:

human-friendly sentence describing the items in alist

Return type:

str

Examples
>>> report_list(["A","B","C"])
"('A', 'B', and 'C')"
>>> report_list(["A","B","C"], limit=2)
"('A', 'B', etc)"
>>> report_list(["A","B","C"], limit=2, paren=False)
"'A', 'B', etc"
>>> report_list([])
''
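The formatting rules above can be reproduced with a short sketch that matches the four documented examples. It is not the CrossVA source; the name report_list_sketch is hypothetical, and the handling of one-item and two-item lists is an assumption, since the examples do not cover those cases.

```python
def report_list_sketch(alist, limit=10, paren=True):
    """Format a list as a quoted, comma-separated phrase."""
    quoted = [f"'{item}'" for item in alist]
    if not quoted:
        # An empty list yields an empty string, with no parentheses.
        return ""
    if len(quoted) > limit:
        # Truncated: no conjunction, trailing "etc".
        text = ", ".join(quoted[:limit] + ["etc"])
    elif len(quoted) == 1:
        # Assumption: a single item is reported on its own.
        text = quoted[0]
    elif len(quoted) == 2:
        # Assumption: two items are joined by "and" without a comma.
        text = " and ".join(quoted)
    else:
        text = ", ".join(quoted[:-1]) + ", and " + quoted[-1]
    return f"({text})" if paren else text
```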