utils module
Convenience functions for the CrossVA module, which help to provide a more user-friendly experience with inputs and error messages across different files.
-
utils.detect_format(output_format, data)
Detects the format of the input data, determining the closest match.
Parameters: - output_format (string) – The output format, needed for loading the configuration files used to test each candidate input format
- data (Pandas DataFrame) – The data being processed, whose most likely format is to be determined
Returns: the best matching format for the input data
Return type: str
Examples: Can determine the format of a data file:
>>> detect_format("InSilicoVA", flexible_read("resources/sample_data/2016WHO_mock_data_1.csv"))
'2016WHOv141'
-
utils.english_relationship(rel)
Returns an abbreviated relationship as a full English phrase.
Parameters: rel (str) – a string with the relationship being translated, e.g., "gt"
Returns: a string with the relationship as a longer English phrase, e.g., "is greater than". If the relationship is not defined in the dict english, then this method returns rel without modification.
Return type: str
Raises: TODO
Examples:
>>> english_relationship("gt")
'is greater than'
>>> english_relationship("unknown")
'unknown'
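The lookup-with-fallback behaviour described above can be sketched as a plain dictionary lookup. This is only an illustration, not the module's actual code: the name english_relationship_sketch and the dictionary ENGLISH are hypothetical, and only the "gt" entry comes from the documented example; the other entries are assumed for the sake of the sketch.

```python
# Hypothetical mapping; only "gt" is taken from the documented example.
ENGLISH = {
    "gt": "is greater than",
    "lt": "is less than",   # assumed entry, for illustration only
    "eq": "is equal to",    # assumed entry, for illustration only
}


def english_relationship_sketch(rel):
    """Return the long English phrase for rel, or rel itself if unknown."""
    # dict.get returns the second argument when the key is missing,
    # which gives the documented "return rel without modification" fallback.
    return ENGLISH.get(rel, rel)


print(english_relationship_sketch("gt"))
print(english_relationship_sketch("unknown"))
```

Using dict.get with the input as its own default keeps the fallback in one place, so an unrecognised abbreviation passes through unchanged instead of raising a KeyError.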
-
utils.flexible_read(path_or_df)
Takes either a path or a Pandas DataFrame; if given a path, reads the file in as a Pandas DataFrame. Convenience method to add input flexibility for the main transform method.
Parameters: path_or_df (string or Pandas DataFrame) – Either a string representing a path to the file containing the data, or a dataframe that has already been read into Python.
Returns: either the data at the given path as read by pandas, or the result of applying the DataFrame constructor to the path_or_df argument
Return type: Pandas DataFrame
Examples: Can return a dataframe from a string:
>>> flexible_read("resources/sample_data/2016WHO_mock_data_1.csv").iloc[:5,:5]
   ID -Id10004 -Id10019 -Id10059 -Id10077
0   0      wet       dk  married       dk
1   1      wet   female      NaN       dk
2   2      dry     male       dk      NaN
3   3       dk       dk       dk       dk
4   4      dry      NaN  married       dk
Or apply the pandas dataframe constructor to the input:
>>> flexible_read(np.arange(9).reshape(3,3))
   0  1  2
0  0  1  2
1  3  4  5
2  6  7  8
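The path-or-DataFrame dispatch described above might look roughly like the sketch below. It is not the module's implementation: the name flexible_read_sketch is hypothetical, and the sketch assumes that a string always points to a CSV file readable by pandas.read_csv, which the documentation does not state explicitly.

```python
import pandas as pd


def flexible_read_sketch(path_or_df):
    """Return a DataFrame from either a file path or DataFrame-like input."""
    if isinstance(path_or_df, str):
        # Assumption: a string is a path to a CSV file.
        return pd.read_csv(path_or_df)
    # Anything else is handed to the DataFrame constructor, which accepts
    # existing DataFrames, nested lists, NumPy arrays, and similar inputs.
    return pd.DataFrame(path_or_df)


# Non-string input goes through the constructor.
frame = flexible_read_sketch([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
print(frame.shape)
```

Checking only for str keeps the function permissive: callers can pass data they have already loaded without a second code path.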
-
utils.report_list(alist, limit=10, paren=True)
Converts alist into a user-friendly string for clearer error messages. Each element is reported in single quotes and separated by commas, with the last element preceded by " and ". When limit is shorter than the list, cuts the list at the limit, omits the 'and', and ends with 'etc' to indicate incompleteness.
Parameters: - alist (list) – The list of items to report.
- limit (int) – The maximum number of items to report. If more than limit, the list is reported without conjunction and ends with “etc.” Defaults to 10.
- paren (boolean) – Encloses string in parentheses if true. Defaults to True.
Returns: human-friendly sentence describing the items in alist
Return type: str
Examples:
>>> report_list(["A","B","C"])
"('A', 'B', and 'C')"
>>> report_list(["A","B","C"], limit=2)
"('A', 'B', etc)"
>>> report_list(["A","B","C"], limit=2, paren=False)
"'A', 'B', etc"
>>> report_list([])
''
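The formatting rules described for report_list can be reproduced with a short sketch that matches the four documented examples. The name report_list_sketch is hypothetical and this is not the module's code; in particular, how one-element lists are rendered is not documented, so the single-item branch below is an assumption.

```python
def report_list_sketch(alist, limit=10, paren=True):
    """Format alist as a quoted, comma-separated string for error messages."""
    if not alist:
        return ""
    quoted = ["'{}'".format(item) for item in alist]
    if len(quoted) > limit:
        # Truncated list: no conjunction, trailing "etc".
        body = ", ".join(quoted[:limit] + ["etc"])
    elif len(quoted) == 1:
        # Assumption: a single item is reported on its own.
        body = quoted[0]
    else:
        # Full list: "and" before the last element.
        body = ", ".join(quoted[:-1]) + ", and " + quoted[-1]
    return "({})".format(body) if paren else body


print(report_list_sketch(["A", "B", "C"]))
print(report_list_sketch(["A", "B", "C"], limit=2, paren=False))
```

Returning an empty string for an empty list, without parentheses, lets callers append the result to an error message without leaving a stray "()" behind.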