configuration module

Structure for the Configuration class

class configuration.Configuration(config_data, verbose=1, process_strings=True)

Bases: object

The Configuration class describes the relationship between a set of input data and output data for verbal autopsy (VA). It is composed of MapCondition objects that transform an input data source (2012 WHO, 2016 WHO 141, 2016 WHO 151, or PHMRC SHORT) into a different data form (PHMRC SHORT, InSilicoVA, InterVA4, InterVA5, or Tariff2).

Variables:
  • given_columns (Pandas Series) – columns of mapping dataframe.
  • required_columns (Pandas Series) – required columns in mapping data.
  • main_columns (list) – the four main columns required in config_data.
  • valid_relationships (Pandas Series) – list of valid relationships to use in comparisons. Each relationship should be an attribute of the pandas Series object or be defined as a subclass of MapCondition.
  • config_data (Pandas DataFrame) – dataframe containing mapping relationships written out.
  • given_prereq (Pandas Series) – lists pre-requisites referenced in config data.
  • new_columns (Pandas Series) – lists the new columns to be created with config data.
  • source_columns (Pandas Series) – lists the source columns required in the raw input data.
  • verbose (int) – controls default verbosity of printing to console.
  • process_strings (boolean) – whether or not to remove whitespace and non-alphanumeric characters from strings in condition field and in raw_data during mapping.
  • validation (Validation) – a Validation object containing the validation checks made.
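For illustration, a minimal config_data DataFrame with the seven expected columns (the values of required_columns) could be built as in the sketch below. The two rows are modeled on the list_conditions() example output; leaving the documentation columns empty and using None for "no prerequisite" are assumptions.

```python
import pandas as pd

# The seven columns listed in Configuration.required_columns.
EXPECTED_COLUMNS = [
    "New Column Name", "New Column Documentation",
    "Source Column ID", "Source Column Documentation",
    "Relationship", "Condition", "Prerequisite",
]

# Illustrative mapping rows: each row defines one new column from one source column.
config_data = pd.DataFrame(
    [
        ["AB_POSIT", "", "Id10403", "", "eq", "yes", None],
        ["AC_BRL", "", "Id10169", "", "lt", "14", None],
    ],
    columns=EXPECTED_COLUMNS,
)

# Mirrors the "All expected columns ... accounted for" check reported by validate().
missing = [col for col in EXPECTED_COLUMNS if col not in config_data.columns]
print(missing)  # []
```

A DataFrame like this would then be passed to the constructor as Configuration(config_data).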

describe()

Prints the mapping relationships in the Configuration object to console.

Parameters: None
Returns: None

Examples

>>> import pandas as pd
>>> MAP_PATH = "resources/mapping_configuration_files/"
>>> EX_MAP_1 = pd.read_csv(MAP_PATH + "example_config_1.csv")
>>> Configuration(EX_MAP_1).describe()
MAPPING STATS
<BLANKLINE>
 -   16 new columns produced ('AB_POSIT', 'AB_SIZE', 'AC_BRL', 'AC_CONV', 'AC_COUGH', etc)
 -   12 source columns required ('Id10403', 'Id10362', 'Id10169', 'Id10221', 'Id10154', etc)
 -   7 relationships invoked ('eq', 'lt', 'between', 'ge', 'contains', etc)
 -   13 conditions listed ('yes', '14', '10', '21', '15 to 49', etc)
 -   1 prerequisites checked ('FEMALE')
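The MAPPING STATS counts look like numbers of distinct values in each mapping column. The sketch below computes comparable counts with pandas under that interpretation. describe()'s own implementation is not shown in this reference, and mapping_stats is a hypothetical helper.

```python
import pandas as pd

def mapping_stats(config_data: pd.DataFrame) -> dict:
    """Count distinct non-missing values per mapping column (hypothetical helper)."""
    columns = {
        "new columns produced": "New Column Name",
        "source columns required": "Source Column ID",
        "relationships invoked": "Relationship",
        "conditions listed": "Condition",
        "prerequisites checked": "Prerequisite",
    }
    # Series.nunique() skips missing values (None/NaN) by default.
    return {label: config_data[col].nunique() for label, col in columns.items()}

example = pd.DataFrame({
    "New Column Name": ["AB_POSIT", "AB_SIZE", "PREG"],
    "Source Column ID": ["Id10403", "Id10362", "Id10403"],
    "Relationship": ["eq", "eq", "eq"],
    "Condition": ["yes", "yes", "yes"],
    "Prerequisite": [None, None, "FEMALE"],
})
print(mapping_stats(example))
```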

list_conditions()

Lists the final mapping conditions contained in the Configuration object.

Returns: list of MapCondition objects, each created from one row of the processed mapping data.
Return type: list

Examples

>>> import pandas as pd
>>> MAP_PATH = "resources/mapping_configuration_files/"
>>> EX_MAP_1 = pd.read_csv(MAP_PATH + "example_config_1.csv")
>>> c = Configuration(EX_MAP_1)
>>> c.list_conditions()[:5]
[<StrMapCondition:     AB_POSIT = [column Id10403].eq(yes)>,
 <StrMapCondition:     AB_SIZE = [column Id10362].eq(yes)>,
 <NumMapCondition:     AC_BRL = [column Id10169].lt(14.0)>,
 <NumMapCondition:     AC_CONV = [column Id10221].lt(10.0)>,
 <NumMapCondition:     AC_COUGH = [column Id10154].lt(21.0)>]
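The output above distinguishes StrMapCondition from NumMapCondition and shows numeric conditions parsed to floats (for example "14" becomes 14.0). One plausible way to make that choice is sketched below. It assumes the relationship decides the condition type, as the "numerical relationship with non-number condition" error from validate() suggests. It also assumes "between" conditions are written as "<low> to <high>". classify_condition is hypothetical.

```python
NUMERIC_RELATIONSHIPS = {"gt", "ge", "lt", "le", "between"}

def classify_condition(relationship: str, condition: str):
    """Return (kind, parsed_condition) for one mapping row (hypothetical helper)."""
    if relationship in NUMERIC_RELATIONSHIPS:
        if relationship == "between":
            # Assumed range format "<low> to <high>", e.g. "15 to 49";
            # float() tolerates the surrounding spaces left by split().
            low, high = (float(part) for part in condition.split("to"))
            return "num", (low, high)
        # Raises ValueError for a non-number condition such as "yes".
        return "num", float(condition)
    return "str", condition

print(classify_condition("lt", "14"))  # ('num', 14.0)
```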

main_columns = ['New Column Name', 'Source Column ID', 'Relationship', 'Condition']

required_columns =
0                New Column Name
1       New Column Documentation
2               Source Column ID
3    Source Column Documentation
4                   Relationship
5                      Condition
6                   Prerequisite
Name: expected columns, dtype: object

valid_relationships =
0          gt
1          ge
2          lt
3          le
4     between
5          eq
6          ne
7    contains
Name: valid relationships, dtype: object
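The Variables description says a relationship should be a pandas Series attribute or a MapCondition subclass. The sketch below applies a relationship by name with getattr. pd.Series has no contains method of its own; substring matching is provided by its .str accessor. The sketch therefore special-cases "contains", and it also special-cases "between" because that method takes two bounds. Both special cases are assumptions about how the library handles them, and apply_relationship is hypothetical.

```python
import pandas as pd

def apply_relationship(series: pd.Series, relationship: str, condition):
    """Evaluate `series <relationship> condition` element-wise (hypothetical helper)."""
    if relationship == "contains":
        # Substring matching lives on the .str accessor, not on pd.Series itself.
        return series.str.contains(condition, regex=False)
    if relationship == "between":
        low, high = condition
        return series.between(low, high)  # inclusive on both ends by default
    # gt, ge, lt, le, eq and ne are comparison methods of pd.Series.
    return getattr(series, relationship)(condition)

ages = pd.Series([10, 20, 30])
print(apply_relationship(ages, "lt", 14.0).tolist())           # [True, False, False]
print(apply_relationship(ages, "between", (15, 49)).tolist())  # [False, True, True]
```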

validate(verbose=None)

Prepares and validates the Configuration object’s mapping conditions. Validation fails if there are any errors that make the mapping unusable. Problems that can be fixed in place are corrected and flagged as warnings.

Parameters: verbose (int) – controls print output and should be in the range 0-5. Each higher level includes the messages of every level below it:
  • 0: print nothing to console
  • 1: print errors only
  • 2: also print warnings
  • 3: also print suggestions and status checks
  • 4: also print passing validation checks
  • 5: also print a description of the configuration conditions
Defaults to None; if None, the self.verbose attribute is used.
Returns: True if the configuration passes validation, False if any errors prevent validation
Return type: boolean
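Because each verbosity level includes everything below it, the levels act as a threshold: a message prints when its level is at or below verbose. The sketch below shows that filter, including the fallback to the instance's verbose attribute when None is passed. The Reporter class and the category names are hypothetical. The level numbers come from the parameter description above.

```python
# Level at which each message category starts being printed.
MESSAGE_LEVELS = {
    "error": 1,
    "warning": 2,
    "suggestion": 3,
    "passing": 4,
    "description": 5,
}

class Reporter:
    """Hypothetical stand-in for an object with a default verbosity."""

    def __init__(self, verbose=1):
        self.verbose = verbose

    def visible(self, messages, verbose=None):
        """Return the texts of (category, text) pairs that would be printed."""
        if verbose is None:
            verbose = self.verbose  # same fallback the docstring describes
        return [text for category, text in messages if MESSAGE_LEVELS[category] <= verbose]

msgs = [("error", "bad relationship"), ("warning", "whitespace"), ("passing", "columns ok")]
print(Reporter(verbose=2).visible(msgs))  # ['bad relationship', 'whitespace']
```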

Examples

>>> import pandas as pd
>>> MAP_PATH = "resources/mapping_configuration_files/"
>>> EX_MAP_2 = pd.read_csv(MAP_PATH + "example_config_2.csv")
>>> c = Configuration(EX_MAP_2)
>>> c.validate(verbose=4)
Validating Mapping Configuration . . .
<BLANKLINE>
 CHECKS PASSED
[X]          All expected columns ('New Column Name', 'New Column Documentation', 'Source Column ID', 'Source Column Documentation', 'Relationship', 'Condition', and 'Prerequisite') accounted for in configuration file.
[X]          No leading/trailing spaces column New Column Name detected.
[X]          No leading/trailing spaces column Relationship detected.
[X]          No leading/trailing spaces column Prerequisite detected.
[X]          No leading/trailing spaces column Condition detected.
[X]          No whitespace in column Condition detected.
[X]          No upper case value(s) in column Relationship detected.
[X]          No upper case value(s) in column Condition detected.
[X]          No non-alphanumeric value(s) in column Source Column ID detected.
[X]          No non-alphanumeric value(s) in column Relationship detected.
[X]          No non-alphanumeric value(s) in column Condition detected.
[X]          No new column(s) listed but not defined in Mapping Configuration detected.
[X]          No NA's in column New Column Name detected.
[X]          No NA's in column Source Column ID detected.
<BLANKLINE>
 ERRORS
[!]          3 values in Relationship column were invalid ('eqqqq', 'another fake', and 'gee'). These must be a valid method of pd.Series, e.g. ('gt', 'ge', 'lt', 'le', 'between', 'eq', 'ne', and 'contains') to be valid.
[!]          2 row(s) containing a numerical relationship with non-number condition detected in row(s) #8, and #9.
[!]          2 values in Prerequisite column were invalid ('ABDOMM', and 'Placeholder here'). These must be defined in the 'new column name' column of the config file to be valid.
<BLANKLINE>
 WARNINGS
[?]          2 whitespace in column New Column Name detected in row(s) #6, and #8. Whitespace will be converted to '_'
[?]          1 whitespace in column Relationship detected in row(s) #4. Whitespace will be converted to '_'
[?]          1 whitespace in column Prerequisite detected in row(s) #9. Whitespace will be converted to '_'
[?]          1 non-alphanumeric value(s) in column New Column Name detected in row(s) #6. This text should be alphanumeric. Non-alphanumeric characters will be removed.
[?]          2 duplicate row(s) detected in row(s) #1, and #14. Duplicates will be dropped.
[?]          1 NA's in column Relationship detected in row(s) #3.
[?]          1 NA's in column Condition detected in row(s) #6.
False
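The warnings above describe fixes that are applied in place: whitespace converted to '_', non-alphanumeric characters removed, and duplicate rows dropped. The sketch below performs comparable operations with pandas on the New Column Name column. The exact regular expressions the library uses are not documented here. Keeping '_' when stripping non-alphanumeric characters is an assumption, made so that the whitespace replacement survives. clean_new_column_names is hypothetical.

```python
import pandas as pd

def clean_new_column_names(config_data: pd.DataFrame) -> pd.DataFrame:
    """Apply warning-level fixes to a mapping DataFrame (hypothetical helper)."""
    cleaned = config_data.copy()
    # Trim leading/trailing spaces, then turn each remaining whitespace character into '_'.
    names = cleaned["New Column Name"].str.strip().str.replace(r"\s", "_", regex=True)
    # Remove characters other than letters, digits and '_' (keeping '_' is an assumption).
    cleaned["New Column Name"] = names.str.replace(r"[^A-Za-z0-9_]", "", regex=True)
    # Drop exact duplicate rows, keeping the first occurrence.
    return cleaned.drop_duplicates().reset_index(drop=True)

raw = pd.DataFrame({
    "New Column Name": ["AB POSIT", "AC-BRL!", "AC-BRL!"],
    "Relationship": ["eq", "lt", "lt"],
})
print(clean_new_column_names(raw)["New Column Name"].tolist())  # ['AB_POSIT', 'ACBRL']
```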

class configuration.CrossVA(raw_data, mapping_config, na_values=['dk', 'ref', ''], verbose=2)

Bases: object

Class representing raw VA data and how to map it to the input format of a VA algorithm.

Variables:
  • mapping (Configuration) – a validated Configuration object that details how to transform the type of data in raw_data to the desired output.
  • data (Pandas DataFrame) – a Pandas DataFrame containing the raw VA data.
  • prepared_data (Pandas DataFrame) – a Pandas DataFrame containing a prepared form of the VA data to use with the Configuration object.
  • validation (Validation) – Validation object containing the validation checks that have been made on the raw data and between the raw data and mapping Configuration.
  • verbose (int) – Controls verbosity of printing to console, 0-5 where 0 is silent.
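CrossVA's default na_values=['dk', 'ref', ''] lists answer codes to treat as missing. These are most likely "don't know" and "refused" responses. The sketch below shows one way to convert such codes to NaN with pandas. It assumes matching is case-insensitive and ignores surrounding whitespace. Neither behaviour is confirmed by this reference, and mark_missing is hypothetical.

```python
import numpy as np
import pandas as pd

def mark_missing(raw_data: pd.DataFrame, na_values=("dk", "ref", "")) -> pd.DataFrame:
    """Replace the given answer codes with NaN (hypothetical helper)."""
    na_set = {str(v).strip().lower() for v in na_values}

    def normalize(value):
        # Only strings are compared; numbers and existing NaNs pass through unchanged.
        if isinstance(value, str) and value.strip().lower() in na_set:
            return np.nan
        return value

    # Map every cell column by column; returns a new DataFrame.
    return raw_data.apply(lambda col: col.map(normalize))

df = pd.DataFrame({"Id10403": ["yes", "DK", "ref"], "age": [10, 20, 30]})
print(mark_missing(df)["Id10403"].isna().tolist())  # [False, True, True]
```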

process()

Applies the given configuration object’s mappings to the given raw data.

Parameters: None

Returns: a DataFrame containing the result of applying the configuration’s transformations to the raw data
Return type: Pandas DataFrame
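Conceptually, applying the mappings means evaluating each condition against its source column and storing the boolean result under the new column name. The sketch below makes several assumptions, none of them taken from this reference. Each mapping row is a plain tuple rather than a MapCondition. A prerequisite restricts the new column to rows where the prerequisite column is also True. Rows are ordered so that a prerequisite is computed before it is used. apply_mappings and the column names are hypothetical.

```python
import pandas as pd

def apply_mappings(prepared: pd.DataFrame, rows) -> pd.DataFrame:
    """Build output columns from mapping rows (hypothetical helper).

    Each row is (new_column, source_column, relationship, condition, prerequisite).
    """
    output = pd.DataFrame(index=prepared.index)
    for new_col, source_col, relationship, condition, prereq in rows:
        # gt/ge/lt/le/eq/ne are pd.Series comparison methods.
        result = getattr(prepared[source_col], relationship)(condition)
        if prereq is not None:
            # Assumed semantics: True only where the prerequisite column is also True.
            result = result & output[prereq]
        output[new_col] = result
    return output

data = pd.DataFrame({"sex": ["female", "male"], "pregnant": ["yes", "yes"]})
rows = [
    ("FEMALE", "sex", "eq", "female", None),
    ("PREGNANT", "pregnant", "eq", "yes", "FEMALE"),
]
print(apply_mappings(data, rows))
```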

validate(verbose=None)

Validates that the CrossVA object’s raw input data and its mapping Configuration object are compatible, and prepares the input data for use.

Parameters: verbose (int) – int from 0 to 5, representing verbosity of printing to console. Defaults to None; if None, replaced with the self.verbose attribute.
Returns: True if valid, False if not.
Return type: boolean

Examples

>>> import pandas as pd
>>> MAP_PATH = "resources/mapping_configuration_files/"
>>> EX_MAP_1 = pd.read_csv(MAP_PATH + "example_config_1.csv")
>>> EX_DATA_1 = pd.read_csv("resources/sample_data/mock_data_2016WHO151.csv")
>>> CrossVA(EX_DATA_1, Configuration(EX_MAP_1)).validate(verbose=0)
True
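A likely compatibility check is confirming that every source column the Configuration requires (its source_columns) exists in the raw data. The set-difference sketch below shows that check. missing_source_columns is a hypothetical helper and is not claimed to be what validate() runs internally.

```python
import pandas as pd

def missing_source_columns(raw_data: pd.DataFrame, source_columns) -> list:
    """Return required source columns absent from raw_data, sorted (hypothetical helper)."""
    return sorted(set(source_columns) - set(raw_data.columns))

raw = pd.DataFrame(columns=["Id10403", "Id10362"])
print(missing_source_columns(raw, ["Id10403", "Id10169"]))  # ['Id10169']
```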