configuration module
Structure for Configuration class
class configuration.Configuration(config_data, verbose=1, process_strings=True)
Bases: object
The Configuration class describes the relationship between a set of input data and output data. It is composed of MapConditions that transform an input data source (2012 WHO, 2016 WHO 141, 2016 WHO 151, PHMRC SHORT) into a different data form (PHMRC SHORT, InSilicoVA, InterVA4, InterVA5, or Tariff2) for verbal autopsy.
Variables:
- given_columns (Pandas Series) – columns of mapping dataframe.
- required_columns (Pandas Series) – required columns in mapping data.
- main_columns (list) – the four main columns required in config_data.
- valid_relationships (Pandas Series) – contains list of valid relationships to use in comparisons. Relationships should be an attr of Pandas Series object, or be defined as a subclass of MapCondition.
- config_data (Pandas DataFrame) – dataframe containing mapping relationships written out.
- given_prereq (Pandas Series) – lists pre-requisites referenced in config data.
- new_columns (Pandas Series) – lists the new columns to be created with config data.
- source_columns (Pandas Series) – lists the source columns required in the raw input data.
- verbose (int) – controls default verbosity of printing to console.
- process_strings (boolean) – whether or not to remove whitespace and non-alphanumeric characters from strings in condition field and in raw_data during mapping.
- validation (Validation) – a validation object containing the validation checks made
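The process_strings option above describes removing whitespace and non-alphanumeric characters from condition strings. A minimal sketch of that kind of cleanup in plain Python follows; `normalize_token` is a hypothetical helper, and the exact rules (lowercasing, ASCII-only matching) are assumptions rather than the library's implementation.

```python
import re


def normalize_token(value: str) -> str:
    """Hypothetical helper: lowercase, then drop anything that is not a letter or digit."""
    # Assumption: only ASCII letters and digits are kept; the library may differ.
    return re.sub(r"[^a-z0-9]", "", value.strip().lower())


print(normalize_token("  Yes "))      # yes
print(normalize_token("Don't Know"))  # dontknow
```

Normalizing both the condition and the raw value in the same way lets an entry such as " Yes " match a condition written as "yes".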
describe()
Prints the mapping relationships in the Configuration object to console.
Parameters: None
Returns: None
Examples
>>> MAP_PATH = "resources/mapping_configuration_files/"
>>> EX_MAP_1 = pd.read_csv(MAP_PATH + "example_config_1.csv")
>>> Configuration(EX_MAP_1).describe()
MAPPING STATS
<BLANKLINE>
- 16 new columns produced ('AB_POSIT', 'AB_SIZE', 'AC_BRL', 'AC_CONV', 'AC_COUGH', etc)
- 12 source columns required ('Id10403', 'Id10362', 'Id10169', 'Id10221', 'Id10154', etc)
- 7 relationships invoked ('eq', 'lt', 'between', 'ge', 'contains', etc)
- 13 conditions listed ('yes', '14', '10', '21', '15 to 49', etc)
- 1 prerequisites checked ('FEMALE')
list_conditions()
Lists the final mapping conditions contained in the Configuration object.
Returns: list of MapConditions, where each MapCondition is created from a row of processed mapping data.
Return type: list
Examples
>>> MAP_PATH = "resources/mapping_configuration_files/"
>>> EX_MAP_1 = pd.read_csv(MAP_PATH + "example_config_1.csv")
>>> c = Configuration(EX_MAP_1)
>>> c.list_conditions()[:5]
[<StrMapCondition: AB_POSIT = [column Id10403].eq(yes)>,
 <StrMapCondition: AB_SIZE = [column Id10362].eq(yes)>,
 <NumMapCondition: AC_BRL = [column Id10169].lt(14.0)>,
 <NumMapCondition: AC_CONV = [column Id10221].lt(10.0)>,
 <NumMapCondition: AC_COUGH = [column Id10154].lt(21.0)>]
main_columns = ['New Column Name', 'Source Column ID', 'Relationship', 'Condition']
required_columns =
0                New Column Name
1       New Column Documentation
2               Source Column ID
3    Source Column Documentation
4                   Relationship
5                      Condition
6                   Prerequisite
Name: expected columns, dtype: object
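A mapping file can be checked against the required columns before a Configuration is built. The sketch below uses only the standard library; `missing_columns` is a hypothetical helper and the sample header is made up for illustration, so this is not the library's own validation code.

```python
import csv
import io

# Column names copied from the required_columns entry above.
REQUIRED_COLUMNS = [
    "New Column Name",
    "New Column Documentation",
    "Source Column ID",
    "Source Column Documentation",
    "Relationship",
    "Condition",
    "Prerequisite",
]


def missing_columns(csv_text: str) -> list:
    """Hypothetical helper: return required columns absent from the CSV header row."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    present = {name.strip() for name in header}
    return [name for name in REQUIRED_COLUMNS if name not in present]


# Made-up header for illustration; it lacks the two documentation columns.
sample = "New Column Name,Source Column ID,Relationship,Condition,Prerequisite\n"
print(missing_columns(sample))
# ['New Column Documentation', 'Source Column Documentation']
```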
valid_relationships =
0          gt
1          ge
2          lt
3          le
4     between
5          eq
6          ne
7    contains
Name: valid relationships, dtype: object
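To make the meaning of each relationship name concrete, here is a plain-Python sketch that evaluates a single value against a relationship and condition. It uses the standard `operator` module instead of pandas; `apply_relationship` is a hypothetical helper, and the "low to high" format for `between` (with inclusive bounds) and the substring semantics for `contains` are assumptions based on the example conditions shown elsewhere on this page.

```python
import operator

# Plain-Python stand-ins for the comparison relationships listed above.
COMPARISONS = {
    "gt": operator.gt,
    "ge": operator.ge,
    "lt": operator.lt,
    "le": operator.le,
    "eq": operator.eq,
    "ne": operator.ne,
}


def apply_relationship(value, relationship, condition):
    """Hypothetical helper: evaluate one value against one relationship/condition pair."""
    if relationship == "between":
        # Assumption: condition is written "low to high"; both bounds are inclusive.
        low, high = (float(part) for part in condition.split(" to "))
        return low <= float(value) <= high
    if relationship == "contains":
        # Assumption: substring match on the string form of the value.
        return str(condition) in str(value)
    # Any name outside the table raises KeyError, mirroring an invalid relationship.
    return COMPARISONS[relationship](value, condition)


print(apply_relationship(12.0, "lt", 14.0))           # True
print(apply_relationship(30, "between", "15 to 49"))  # True
print(apply_relationship("yes", "eq", "yes"))         # True
print(apply_relationship(50, "between", "15 to 49"))  # False
```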
validate(verbose=None)
Prepares and validates the Configuration object’s mapping conditions. Validation fails if there are any errors that cannot be fixed in place. Problems that can be fixed in place are processed and flagged as warnings.
Parameters: verbose (int) – controls print output; should be in the range 0-5, where each level includes the messages of every level below it:
- 0: nothing is printed to console
- 1: print only errors
- 2: also print warnings
- 3: also print suggestions and status checks
- 4: also print passing validation checks
- 5: also print a description of the configuration conditions
Defaults to None; if None, the self.verbose attribute is used.
Returns: boolean indicating whether the configuration passed validation; False if any errors prevent validation, as in the example below.
Return type: Boolean
Examples
>>> MAP_PATH = "resources/mapping_configuration_files/"
>>> EX_MAP_2 = pd.read_csv(MAP_PATH + "example_config_2.csv")
>>> c = Configuration(EX_MAP_2)
>>> c.validate(verbose=4)
Validating Mapping Configuration . . .
<BLANKLINE>
CHECKS PASSED
[X] All expected columns ('New Column Name', 'New Column Documentation', 'Source Column ID', 'Source Column Documentation', 'Relationship', 'Condition', and 'Prerequisite') accounted for in configuration file.
[X] No leading/trailing spaces column New Column Name detected.
[X] No leading/trailing spaces column Relationship detected.
[X] No leading/trailing spaces column Prerequisite detected.
[X] No leading/trailing spaces column Condition detected.
[X] No whitespace in column Condition detected.
[X] No upper case value(s) in column Relationship detected.
[X] No upper case value(s) in column Condition detected.
[X] No non-alphanumeric value(s) in column Source Column ID detected.
[X] No non-alphanumeric value(s) in column Relationship detected.
[X] No non-alphanumeric value(s) in column Condition detected.
[X] No new column(s) listed but not defined in Mapping Configuration detected.
[X] No NA's in column New Column Name detected.
[X] No NA's in column Source Column ID detected.
<BLANKLINE>
ERRORS
[!] 3 values in Relationship column were invalid ('eqqqq', 'another fake', and 'gee'). These must be a valid method of pd.Series, e.g. ('gt', 'ge', 'lt', 'le', 'between', 'eq', 'ne', and 'contains') to be valid.
[!] 2 row(s) containing a numerical relationship with non-number condition detected in row(s) #8, and #9.
[!] 2 values in Prerequisite column were invalid ('ABDOMM', and 'Placeholder here'). These must be defined in the 'new column name' column of the config file to be valid.
<BLANKLINE>
WARNINGS
[?] 2 whitespace in column New Column Name detected in row(s) #6, and #8. Whitespace will be converted to '_'
[?] 1 whitespace in column Relationship detected in row(s) #4. Whitespace will be converted to '_'
[?] 1 whitespace in column Prerequisite detected in row(s) #9. Whitespace will be converted to '_'
[?] 1 non-alphanumeric value(s) in column New Column Name detected in row(s) #6. This text should be alphanumeric. Non-alphanumeric characters will be removed.
[?] 2 duplicate row(s) detected in row(s) #1, and #14. Duplicates will be dropped.
[?] 1 NA's in column Relationship detected in row(s) #3.
[?] 1 NA's in column Condition detected in row(s) #6.
False
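The cumulative verbosity levels described for validate can be pictured as a threshold per message kind. The sketch below only illustrates that rule; `MESSAGE_LEVELS` and `visible_kinds` are hypothetical names, and the library's internal structure may differ.

```python
# Minimum verbose level at which each kind of message is printed,
# following the verbose parameter description for validate.
MESSAGE_LEVELS = {
    "error": 1,
    "warning": 2,
    "suggestion": 3,
    "status": 3,
    "passing": 4,
    "description": 5,
}


def visible_kinds(verbose: int) -> list:
    """Hypothetical helper: list the message kinds shown at a given verbosity."""
    return [kind for kind, level in MESSAGE_LEVELS.items() if verbose >= level]


print(visible_kinds(0))  # []
print(visible_kinds(2))  # ['error', 'warning']
print(visible_kinds(4))  # ['error', 'warning', 'suggestion', 'status', 'passing']
```

This matches the validate example, where verbose=4 prints passing checks, errors, and warnings but no description of the conditions.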
class configuration.CrossVA(raw_data, mapping_config, na_values=['dk', 'ref', ''], verbose=2)
Bases: object
Class representing raw VA data, and how to map it to an algorithm
Variables:
- mapping (Configuration) – a validated Configuration object that details how to transform the type of data in raw_data to the desired output.
- data (Pandas DataFrame) – a Pandas DataFrame containing the raw VA data
- prepared_data (Pandas DataFrame) – a Pandas DataFrame containing a prepared form of the VA data to use with the Configuration object.
- validation (Validation) – Validation object containing the validation checks that have been made on the raw data and between the raw data and mapping Configuration.
- verbose (int) – Controls verbosity of printing to console, 0-5 where 0 is silent.
process()
Applies the given configuration object’s mappings to the given raw data.
Parameters: None
Returns: a dataframe in which the specified transformations have been applied to the raw data
Return type: Pandas DataFrame
validate(verbose=None)
Validates that the CrossVA object’s raw input data and its mapping configuration object are compatible, and prepares the input data for use.
Parameters: verbose (int) – int from 0 to 5, representing verbosity of printing to console. Defaults to None; if None, replaced with the self.verbose attribute.
Returns: True if valid, False if not.
Return type: boolean
Examples
>>> MAP_PATH = "resources/mapping_configuration_files/"
>>> EX_MAP_1 = pd.read_csv(MAP_PATH + "example_config_1.csv")
>>> EX_DATA_1 = pd.read_csv("resources/sample_data/mock_data_2016WHO151.csv")
>>> CrossVA(EX_DATA_1, Configuration(EX_MAP_1)).validate(verbose=0)
True