configuration module

Structure for Configuration class

class configuration.Configuration(config_data, verbose=1, process_strings=True)

Bases: object

The Configuration class describes the relationship between a set of input data and output data. It is composed of MapConditions that transform an input data source (2012 WHO, 2016 WHO 141, 2016 WHO 151, PHMRC SHORT) into a different data form (PHMRC SHORT, InSilicoVA, InterVA4, InterVA5, or Tariff2) for verbal autopsy.

  • given_columns (Pandas Series) – columns of mapping dataframe.
  • required_columns (Pandas Series) – required columns in mapping data.
  • main_columns (list) – the four main columns required in config_data.
  • valid_relationships (Pandas Series) – the valid relationships to use in comparisons. Each relationship should be an attribute of the Pandas Series object, or be defined as a subclass of MapCondition.
  • config_data (Pandas DataFrame) – dataframe containing mapping relationships written out.
  • given_prereq (Pandas Series) – lists pre-requisites referenced in config data.
  • new_columns (Pandas Series) – lists the new columns to be created with config data.
  • source_columns (Pandas Series) – lists the source columns required in the raw input data.
  • verbose (int) – controls default verbosity of printing to console.
  • process_strings (boolean) – whether or not to remove whitespace and non-alphanumeric characters from strings in condition field and in raw_data during mapping.
  • validation (Validation) – a Validation object containing the validation checks that have been made.
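
As a rough illustration of what the process_strings option describes, the sketch below strips whitespace and non-alphanumeric characters from strings using only the standard library. The helper name strip_non_alphanumeric is hypothetical and is not part of the configuration module; the module's own implementation may differ, for example in how it treats case or non-ASCII characters.

```python
import re


def strip_non_alphanumeric(text):
    """Hypothetical helper: drop whitespace and non-alphanumeric characters."""
    # Assumption: "alphanumeric" means ASCII letters and digits only.
    return re.sub(r"[^A-Za-z0-9]", "", text)


# Illustrative inputs, not taken from any real mapping file.
cleaned = [strip_non_alphanumeric(s) for s in [" yes ", "don't know", "Id-10403"]]
print(cleaned)
```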

describe()

Prints the mapping relationships in the Configuration object to the console.

>>> MAP_PATH = "resources/mapping_configuration_files/"
>>> EX_MAP_1 = pd.read_csv(MAP_PATH + "example_config_1.csv")
>>> Configuration(EX_MAP_1).describe()
 -   16 new columns produced ('AB_POSIT', 'AB_SIZE', 'AC_BRL', 'AC_CONV', 'AC_COUGH', etc)
 -   12 source columns required ('Id10403', 'Id10362', 'Id10169', 'Id10221', 'Id10154', etc)
 -   7 relationships invoked ('eq', 'lt', 'between', 'ge', 'contains', etc)
 -   13 conditions listed ('yes', '14', '10', '21', '15 to 49', etc)
 -   1 prerequisites checked ('FEMALE')
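
The summary lines above follow a simple pattern: a count of unique values, a label, and a preview of the first five values. A minimal sketch of that pattern is given below. The function summarize and the sample list are hypothetical, chosen only to reproduce the shape of the output; this is not the module's own code.

```python
def summarize(values, label, limit=5):
    """Hypothetical helper: count unique values and preview the first few."""
    unique = list(dict.fromkeys(values))  # de-duplicate, keeping first-seen order
    preview = ", ".join(repr(v) for v in unique[:limit])
    if len(unique) > limit:
        preview += ", etc"
    return f" -   {len(unique)} {label} ({preview})"


# Illustrative values only; the real list comes from the mapping file.
relationships = ["eq", "eq", "lt", "lt", "lt", "between", "ge", "contains", "gt", "ne"]
line = summarize(relationships, "relationships invoked")
print(line)
```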

list_conditions()

Lists the final mapping conditions contained in the Configuration object.

Returns: list of MapConditions, where each MapCondition is created from a row of processed mapping data.
Return type: list


>>> MAP_PATH = "resources/mapping_configuration_files/"
>>> EX_MAP_1 = pd.read_csv(MAP_PATH + "example_config_1.csv")
>>> c = Configuration(EX_MAP_1)
>>> c.list_conditions()[:5]
[<StrMapCondition:     AB_POSIT = [column Id10403].eq(yes)>,
 <StrMapCondition:     AB_SIZE = [column Id10362].eq(yes)>,
 <NumMapCondition:     AC_BRL = [column Id10169].lt(14.0)>,
 <NumMapCondition:     AC_CONV = [column Id10221].lt(10.0)>,
 <NumMapCondition:     AC_COUGH = [column Id10154].lt(21.0)>]

main_columns = ['New Column Name', 'Source Column ID', 'Relationship', 'Condition']

required_columns =
0                New Column Name
1       New Column Documentation
2               Source Column ID
3    Source Column Documentation
4                   Relationship
5                      Condition
6                   Prerequisite
Name: expected columns, dtype: object

valid_relationships =
0          gt
1          ge
2          lt
3          le
4     between
5          eq
6          ne
7    contains
Name: valid relationships, dtype: object
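
To make the eight relationship names concrete, the sketch below evaluates each one against a single value. It is a stdlib-only stand-in: the module itself relies on Pandas Series methods, whereas this example uses the operator module and two small lambdas. The names RELATIONSHIPS and evaluate are hypothetical, and the semantics chosen for "contains" and "between" are assumptions.

```python
import operator

# Stand-ins for the eight relationship names listed in valid_relationships.
RELATIONSHIPS = {
    "gt": operator.gt,
    "ge": operator.ge,
    "lt": operator.lt,
    "le": operator.le,
    "eq": operator.eq,
    "ne": operator.ne,
    # Assumption: "contains" is a substring test on the string form of the value.
    "contains": lambda value, condition: str(condition) in str(value),
    # Assumption: "between" includes both endpoints.
    "between": lambda value, bounds: bounds[0] <= value <= bounds[1],
}


def evaluate(relationship, value, condition):
    """Hypothetical helper: apply a named relationship to a single value."""
    if relationship not in RELATIONSHIPS:
        raise ValueError(f"invalid relationship: {relationship!r}")
    return RELATIONSHIPS[relationship](value, condition)


results = [
    evaluate("lt", 10.0, 14.0),
    evaluate("eq", "yes", "yes"),
    evaluate("between", 30, (15, 49)),
    evaluate("contains", "fever cough", "cough"),
]
print(results)
```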

validate(verbose=None)

Prepares and validates the Configuration object’s mapping conditions. Validation fails if any error is found that prevents the configuration from being used. Problems that can be fixed in place are processed and flagged as warnings.

Parameters: verbose (int) – controls print output; should be in the range 0-5, where each level includes the messages of every level below it. Defaults to None; if None, the self.verbose attribute is used.
  • verbose = 0: nothing is printed to console.
  • verbose = 1: print only errors to console.
  • verbose = 2: also print warnings.
  • verbose = 3: also print suggestions and status checks.
  • verbose = 4: also print passing validation checks.
  • verbose = 5: also print a description of the configuration conditions.
Returns: boolean representing whether there are any errors that prevent validation.
Return type: boolean
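
The cumulative verbosity levels can be pictured as a threshold filter: each kind of message has a minimum level, and a message is printed when the chosen verbosity reaches it. The sketch below assumes that model; LEVELS, visible_messages, and the default_verbose argument (standing in for self.verbose) are hypothetical names, not the module's API.

```python
# Message kinds and the lowest verbose setting at which each is printed,
# following the levels listed for the verbose parameter.
LEVELS = {"error": 1, "warning": 2, "suggestion": 3, "passing": 4, "description": 5}


def visible_messages(messages, verbose=None, default_verbose=1):
    """Hypothetical helper: keep the messages a given verbosity would print."""
    if verbose is None:
        # Mirrors "if None, the self.verbose attribute is used".
        verbose = default_verbose
    return [text for kind, text in messages if LEVELS[kind] <= verbose]


# Illustrative messages only.
messages = [
    ("error", "invalid relationship"),
    ("warning", "whitespace converted"),
    ("passing", "no NA's detected"),
]
silent = visible_messages(messages, verbose=0)
with_warnings = visible_messages(messages, verbose=2)
default = visible_messages(messages)
print(silent, with_warnings, default)
```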


>>> MAP_PATH = "resources/mapping_configuration_files/"
>>> EX_MAP_2 = pd.read_csv(MAP_PATH + "example_config_2.csv")
>>> c = Configuration(EX_MAP_2)
>>> c.validate(verbose=4)
Validating Mapping Configuration . . .
[X]          All expected columns ('New Column Name', 'New Column Documentation', 'Source Column ID', 'Source Column Documentation', 'Relationship', 'Condition', and 'Prerequisite') accounted for in configuration file.
[X]          No leading/trailing spaces column New Column Name detected.
[X]          No leading/trailing spaces column Relationship detected.
[X]          No leading/trailing spaces column Prerequisite detected.
[X]          No leading/trailing spaces column Condition detected.
[X]          No whitespace in column Condition detected.
[X]          No upper case value(s) in column Relationship detected.
[X]          No upper case value(s) in column Condition detected.
[X]          No non-alphanumeric value(s) in column Source Column ID detected.
[X]          No non-alphanumeric value(s) in column Relationship detected.
[X]          No non-alphanumeric value(s) in column Condition detected.
[X]          No new column(s) listed but not defined in Mapping Configuration detected.
[X]          No NA's in column New Column Name detected.
[X]          No NA's in column Source Column ID detected.
[!]          3 values in Relationship column were invalid ('eqqqq', 'another fake', and 'gee'). These must be a valid method of pd.Series, e.g. ('gt', 'ge', 'lt', 'le', 'between', 'eq', 'ne', and 'contains') to be valid.
[!]          2 row(s) containing a numerical relationship with non-number condition detected in row(s) #8, and #9.
[!]          2 values in Prerequisite column were invalid ('ABDOMM', and 'Placeholder here'). These must be defined in the 'new column name' column of the config file to be valid.
[?]          2 whitespace in column New Column Name detected in row(s) #6, and #8. Whitespace will be converted to '_'
[?]          1 whitespace in column Relationship detected in row(s) #4. Whitespace will be converted to '_'
[?]          1 whitespace in column Prerequisite detected in row(s) #9. Whitespace will be converted to '_'
[?]          1 non-alphanumeric value(s) in column New Column Name detected in row(s) #6. This text should be alphanumeric. Non-alphanumeric characters will be removed.
[?]          2 duplicate row(s) detected in row(s) #1, and #14. Duplicates will be dropped.
[?]          1 NA's in column Relationship detected in row(s) #3.
[?]          1 NA's in column Condition detected in row(s) #6.
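
Two of the errors reported above (invalid relationships and invalid prerequisites) amount to checking a column's values against an allowed set. A minimal sketch of such a check follows. The helper find_invalid and the sample column values are hypothetical; the real checks operate on the mapping dataframe and may handle missing values differently.

```python
def find_invalid(values, allowed):
    """Hypothetical helper: unique values not in the allowed set, skipping missing entries."""
    allowed = set(allowed)
    invalid = []
    for value in values:
        if value is not None and value not in allowed and value not in invalid:
            invalid.append(value)
    return invalid


# Illustrative values: prerequisites must match an entry in the new column names.
new_columns = ["ABDOM", "FEMALE", "AC_COUGH"]
prerequisites = [None, "FEMALE", "ABDOMM", None, "Placeholder here"]
invalid = find_invalid(prerequisites, new_columns)
print(invalid)
```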

class configuration.CrossVA(raw_data, mapping_config, na_values=['dk', 'ref', ''], verbose=2)

Bases: object

Class representing raw VA data, and how to map it to an algorithm

  • mapping (Configuration) – a validated Configuration object that details how to transform the type of data in raw_data to the desired output.
  • data (Pandas DataFrame) – a Pandas DataFrame containing the raw VA data
  • prepared_data (Pandas DataFrame) – a Pandas DataFrame containing a prepared form of the VA data to use with the Configuration object.
  • validation (Validation) – Validation object containing the validation checks that have been made on the raw data and between the raw data and mapping Configuration.
  • verbose (int) – Controls verbosity of printing to console, 0-5 where 0 is silent.

Applies the given configuration object’s mappings to the given raw data.

Args: None

Returns: a dataframe in which the specified transformations have been applied to the raw data.
Return type: Pandas DataFrame
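
Conceptually, applying the mappings means deriving one new column per mapping condition from its source column. The sketch below shows that idea on plain dictionaries rather than a Pandas DataFrame. The function apply_mapping and its tuple layout are hypothetical, and treating na_values entries as missing is an assumption about how the class uses that argument.

```python
import operator


def apply_mapping(rows, conditions, na_values=("dk", "ref", "")):
    """Hypothetical helper: build one new column per mapping condition."""
    output = []
    for row in rows:
        new_row = {}
        for new_column, source_column, relation, condition in conditions:
            value = row.get(source_column)
            if value is None or value in na_values:
                # Assumption: values listed in na_values count as missing.
                new_row[new_column] = None
            else:
                new_row[new_column] = relation(value, condition)
        output.append(new_row)
    return output


# Two conditions modelled on the list_conditions() example output.
conditions = [
    ("AB_POSIT", "Id10403", operator.eq, "yes"),
    ("AC_BRL", "Id10169", operator.lt, 14.0),
]
# Illustrative raw rows, not real verbal autopsy data.
raw_rows = [
    {"Id10403": "yes", "Id10169": 20.0},
    {"Id10403": "dk", "Id10169": 3.0},
]
mapped = apply_mapping(raw_rows, conditions)
print(mapped)
```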

validate(verbose=None)

Validates that the CrossVA object’s raw input data and its mapping configuration object are compatible, and prepares the input data for use.

Parameters: verbose (int) – int from 0 to 5, representing verbosity of printing to console. Defaults to None; if None, replaced with the self.verbose attribute.
Returns: True if valid, False if not.
Return type: boolean


>>> MAP_PATH = "resources/mapping_configuration_files/"
>>> EX_MAP_1 = pd.read_csv(MAP_PATH + "example_config_1.csv")
>>> EX_DATA_1 = pd.read_csv("resources/sample_data/mock_data_2016WHO151.csv")
>>> CrossVA(EX_DATA_1, Configuration(EX_MAP_1)).validate(verbose=0)
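
One compatibility check implied by the description is that every source column the mapping requires is present in the raw data. A minimal sketch of that check is shown below. The helper missing_source_columns and the sample column lists are hypothetical; the actual validation performs further checks that are not reproduced here.

```python
def missing_source_columns(required, available):
    """Hypothetical helper: source columns the mapping needs but the data lacks."""
    available = set(available)
    return [column for column in required if column not in available]


# Illustrative column names in the style of the examples above.
required = ["Id10403", "Id10362", "Id10169"]
data_columns = ["Id10403", "Id10169", "Id10221"]
missing = missing_source_columns(required, data_columns)
print(missing)
```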