mappings module

Defines the MapCondition class and its subclasses, each of which represents a single condition that uses a relationship to transform raw data into a boolean column while preserving NA values.

class mappings.BetweenCondition(condition_row)[source]

Bases: mappings.NumMapCondition

Subclass of NumMapCondition that overrides __init__ and .check() methods for the between relationship

Variables:
  • low (float) – a float representing the lowest acceptable value (incl)
  • high (float) – a float representing the highest acceptable value (incl)
possible_values()[source]

generate a non-exhaustive list of possible values implied by the condition

Args: None

Returns:a list of whole-number floats from self.low - 1 through self.high + 1, i.e. range(self.low - 1, self.high + 2)
Return type:list

Examples

>>> BetweenCondition({"Condition" : "3 to 5",  "New Column Name" : "test new column name",  "Relationship" : "between",  "Prerequisite" : None,  "Source Column ID" : "source_test_2"}  ).possible_values()
[2.0, 3.0, 4.0, 5.0, 6.0]
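
The range in the example above can be reproduced with a short sketch. This is not the module's implementation: the helper names parse_between and between_possible_values are hypothetical, and splitting the condition text on " to " is an assumption based only on the "3 to 5" input shown.

```python
def parse_between(condition):
    """Split a condition such as "3 to 5" into (low, high) floats.

    Assumption: the two bounds are separated by the literal text " to ".
    """
    low_text, high_text = condition.split(" to ")
    return float(low_text), float(high_text)


def between_possible_values(low, high):
    """Return whole-number floats from low - 1 through high + 1."""
    # range() excludes its stop value, so high + 2 makes high + 1 the last item.
    return [float(v) for v in range(int(low) - 1, int(high) + 2)]


low, high = parse_between("3 to 5")
values = between_possible_values(low, high)
print(values)  # [2.0, 3.0, 4.0, 5.0, 6.0]
```

The list deliberately includes one value on each side of the inclusive bounds, so it covers both passing and failing inputs.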
class mappings.ContainsCondition(condition_row)[source]

Bases: mappings.StrMapCondition

Subclass of StrMapCondition that overrides ._run_check() method for the contains relationship

class mappings.MapCondition(condition_row)[source]

Bases: abc.ABC

Abstract class representing a single mapped condition in the mapping data, which gives instructions to transform the raw input data into the form needed for a VA instrument. The main configuration class is composed of these.

Variables:
  • name (str) – the name of the new column to be created
  • relationship (str) – the relationship of the input data to the condition. Should be one of “ge” (greater than or equal to), “gt” (greater than), “le” (less than or equal to), “lt” (less than), “eq” (equal to), “ne” (not equal to), “contains” (if string contains) or “between” (between the two numbers, inclusive).
  • preq_column (str or None) – name of the pre-requisite column if it exists, or None if no pre-requisite
  • source (str) – the name of the column to be checked
check(prepared_data)[source]

Checks the condition against the dataframe. NA values are not checked; they are set aside and added back to the result afterward.

Parameters:prepared_data (Pandas DataFrame) – a dataframe containing a created column with the name specified in self.source_dtype
Returns:a float array with 1.0 where the condition is met, 0.0 where it is not, and NaN where the source value is NA
Return type:Array

Examples

>>> test_df = pd.DataFrame({"source_test_str": ["test condition", "test condition 2", np.nan], "source_test_num": [4, 5, np.nan]})
>>> StrMapCondition({"Condition" : "test condition", "New Column Name" : "test new column name", "Relationship" : "eq", "Prerequisite" : None, "Source Column ID" : "source_test"}).check(test_df)
array([ 1., 0., nan])
>>> NumMapCondition({"Condition" : 4.5, "New Column Name" : "test new column name", "Relationship" : "ge", "Prerequisite" : None, "Source Column ID" : "source_test"}).check(test_df)
array([ 0., 1., nan])
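
The NA-preserving behaviour shown above can be sketched as follows. This is a minimal illustration, not the module's code: check_preserving_na is a hypothetical helper, and passing the comparison in as a predicate is an assumption made to keep the example short.

```python
import numpy as np
import pandas as pd


def check_preserving_na(column, predicate):
    """Apply predicate to the non-NA values only; NA rows stay NaN."""
    result = np.full(len(column), np.nan)   # start with every row as NaN
    not_na = column.notna().to_numpy()      # rows that can actually be checked
    # Evaluate the comparison on the non-NA rows and store it as 1.0 / 0.0.
    result[not_na] = predicate(column[not_na]).to_numpy().astype(float)
    return result


test_df = pd.DataFrame({
    "source_test_str": ["test condition", "test condition 2", np.nan],
    "source_test_num": [4, 5, np.nan],
})

str_result = check_preserving_na(test_df["source_test_str"],
                                 lambda s: s == "test condition")
num_result = check_preserving_na(test_df["source_test_num"],
                                 lambda s: s >= 4.5)
```

Returning floats rather than booleans is what allows NaN to sit in the same array as the 1.0 and 0.0 results.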
check_prereq(transformed_data)[source]

checks the status of the pre-req column; if there is no pre-req, returns True, otherwise looks up the values of the pre-req column in transformed_data

Parameters:transformed_data (Pandas DataFrame) – the new dataframe being created, which contains any pre-req columns
Returns:True (1) if there is no pre-req; otherwise the pre-req column from transformed_data, representing whether the pre-req is satisfied
Return type:boolean or boolean pd.Series

Examples

>>> test_df = pd.DataFrame({"preq_one": np.repeat(True,5),  "preq_two": np.repeat(False, 5)})

If there is no pre-req, simply returns True (1). Pandas can interpret this in boolean indexing.

>>> NumMapCondition({"Condition" : 4.5,  "New Column Name" : "test new column name",  "Relationship" : "ge",  "Prerequisite" : None,  "Source Column ID" : "source_test"}  ).check_prereq(test_df)
1

If there is a pre-req, returns that column of transformed_data.

>>> NumMapCondition({"Condition" : 4.5,  "New Column Name" : "test new column name",  "Relationship" : "ge",  "Prerequisite" : "preq_one",  "Source Column ID" : "source_test"}  ).check_prereq(test_df)
0    True
1    True
2    True
3    True
4    True
Name: preq_one, dtype: bool
>>> NumMapCondition({"Condition" : 4.5,  "New Column Name" : "test new column name",  "Relationship" : "ge",  "Prerequisite" : "preq_two",  "Source Column ID" : "source_test"}  ).check_prereq(test_df)
0    False
1    False
2    False
3    False
4    False
Name: preq_two, dtype: bool
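
A rough sketch of why a bare True works when there is no pre-req: pandas applies a scalar boolean to every row when it is combined with a boolean Series. The function check_prereq_sketch and the condition_met Series are hypothetical stand-ins, not part of the module.

```python
import numpy as np
import pandas as pd


def check_prereq_sketch(preq_column, transformed_data):
    """Return True when there is no pre-req, else the pre-req column."""
    if preq_column is None:
        return True
    return transformed_data[preq_column]


test_df = pd.DataFrame({"preq_one": np.repeat(True, 5),
                        "preq_two": np.repeat(False, 5)})

# Hypothetical result of checking some condition on five rows.
condition_met = pd.Series([True, False, True, False, True])

# A scalar True leaves the condition unchanged ...
no_prereq = condition_met & check_prereq_sketch(None, test_df)
# ... while an all-False pre-req column masks every row.
with_prereq = condition_met & check_prereq_sketch("preq_two", test_df)
```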
describe()[source]

just a wrapper for the __str__ function

factory(relationship, condition='')[source]

static class factory method, which determines which subclass to return

Parameters:
  • relationship (str) – a relationship in (gt, ge, lt, le, ne, eq, contains, between) that represents a comparison to be made to the raw data
  • condition (str or int) – the condition being matched. If relationship is ambiguous, this determines whether the condition is numerical or string. Defaults to empty string.
Returns:the specific subclass that corresponds to the given relationship
Return type:MapCondition subclass (the class itself, not an instance)

Examples

>>> MapCondition.factory("ge") #doctest: +ELLIPSIS
<class '...NumMapCondition'>
>>> MapCondition.factory("eq", 0) #doctest: +ELLIPSIS
<class '...NumMapCondition'>
>>> MapCondition.factory("eq") #doctest: +ELLIPSIS
<class '...StrMapCondition'>
>>> MapCondition.factory("contains") #doctest: +ELLIPSIS
<class '...ContainsCondition'>
>>> MapCondition.factory("between") #doctest: +ELLIPSIS
<class '...BetweenCondition'>
>>> MapCondition.factory("eqq") #doctest: +ELLIPSIS
Traceback (most recent call last):
AssertionError: No defined Condition class for eqq type
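
The dispatch shown in these examples can be approximated with the sketch below. The classes are empty stand-ins, factory_sketch is a hypothetical name, and the rule used for the ambiguous “eq” and “ne” cases (numeric if the condition is a number, string otherwise) is an assumption inferred from the examples, not the module's confirmed logic.

```python
from numbers import Real


class MapCondition:
    """Empty stand-in for the abstract base class."""


class NumMapCondition(MapCondition):
    pass


class StrMapCondition(MapCondition):
    pass


class ContainsCondition(StrMapCondition):
    pass


class BetweenCondition(NumMapCondition):
    pass


def factory_sketch(relationship, condition=""):
    """Pick a condition class from the relationship (and condition type)."""
    if relationship == "between":
        return BetweenCondition
    if relationship == "contains":
        return ContainsCondition
    if relationship in ("gt", "ge", "lt", "le"):
        return NumMapCondition
    if relationship in ("eq", "ne"):
        # Assumption: a numeric condition selects the numeric class;
        # anything else, including the default "", selects the string class.
        is_number = isinstance(condition, Real) and not isinstance(condition, bool)
        return NumMapCondition if is_number else StrMapCondition
    raise AssertionError(f"No defined Condition class for {relationship} type")


print(factory_sketch("eq", 0).__name__)  # NumMapCondition
print(factory_sketch("eq").__name__)     # StrMapCondition
```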
possible_values

Abstract method stub. Generates a non-exhaustive list of possible values implied by the condition.

prepare_data(raw_data)[source]

prepares raw_data by ensuring dtypes are correct for each comparison

Parameters:raw_data (dataframe) – a data frame containing raw data, including the column given in self.source_name.
Returns:the column in raw_data named in self.source_name, with the attribute self.prep_func applied to it.
Return type:Pandas Series
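
As an illustration of the preparation step, the sketch below applies two candidate prep functions to a raw column. The names prep_num, prep_str and prepare_sketch are hypothetical; the behaviour follows the prep_func descriptions in this module (numeric coercion of non-numbers to NaN, and string casting that preserves nulls), but the exact implementation is assumed.

```python
import numpy as np
import pandas as pd


def prep_num(column):
    """Coerce to numbers; anything that is not a number becomes NaN."""
    return pd.to_numeric(column, errors="coerce")


def prep_str(column):
    """Cast non-null values to str while leaving nulls untouched."""
    # where() keeps the original value where the mask is True (the nulls)
    # and takes the str-cast value everywhere else.
    return column.where(column.isna(), column.astype(str))


def prepare_sketch(raw_data, source_name, prep_func):
    """Return the source column with prep_func applied to it."""
    return prep_func(raw_data[source_name])


raw = pd.DataFrame({"source_test": pd.Series([4, "five", np.nan], dtype=object)})

as_num = prepare_sketch(raw, "source_test", prep_num)
as_str = prepare_sketch(raw, "source_test", prep_str)
```

Here as_num holds 4.0 followed by two NaN values, while as_str holds "4", "five" and a preserved null.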
class mappings.NumMapCondition(condition_row, cast_cond=True)[source]

Bases: mappings.MapCondition

class representing a numerical condition, inherits from MapCondition

Variables:
  • source_dtype (str) – a copy of the instance attribute self.source_name with “_num” appended, to represent the expected dtype
  • prep_func (function) – class attribute, a function to apply before making a numerical-based comparison. pd.to_numeric() coerces non-number data to NaN.
possible_values()[source]

generate a non-exhaustive list of possible values implied by condition

Args: None

Returns:
list containing a range of possible values. For a greater-than relationship, the list includes whole numbers from self.condition + 1 up to, but not including, self.condition*2. For a less-than relationship, it includes values from 0 up to, but not including, self.condition. If the relationship includes “equal to”, self.condition is appended as well. Values are returned as floats.
Return type:list

Examples

>>> NumMapCondition({"Condition" : 3,  "New Column Name" : "test new name",  "Relationship" : "ge",  "Prerequisite" : None,  "Source Column ID" : "source_test"}).possible_values()
[4.0, 5.0, 3.0]
>>> NumMapCondition({"Condition" : 3,  "New Column Name" : "test new name",  "Relationship" : "lt",  "Prerequisite" : None,  "Source Column ID" : "source_test"}).possible_values()
[0.0, 1.0, 2.0]
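
The two outputs above are consistent with the following sketch. It is a guess at the logic rather than the module's code: num_possible_values is a hypothetical name, and truncating the condition with int() to build the range is an assumption.

```python
def num_possible_values(relationship, condition):
    """Build a non-exhaustive list of floats implied by a numeric condition."""
    base = int(condition)  # assumption: the range is built from a whole number
    values = []
    if relationship in ("gt", "ge"):
        # Whole numbers above the condition, stopping before condition * 2.
        values += [float(v) for v in range(base + 1, base * 2)]
    if relationship in ("lt", "le"):
        # Whole numbers from 0 up to, but not including, the condition.
        values += [float(v) for v in range(0, base)]
    if relationship in ("ge", "le", "eq"):
        # "Equal to" relationships also include the condition itself.
        values.append(float(condition))
    return values


print(num_possible_values("ge", 3))  # [4.0, 5.0, 3.0]
print(num_possible_values("lt", 3))  # [0.0, 1.0, 2.0]
```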
class mappings.StrMapCondition(condition_row)[source]

Bases: mappings.MapCondition

class representing a str condition, inherits from MapCondition

Variables:
  • source_dtype (str) – instance attribute, a copy of the instance attribute self.source_name with “_str” appended, to represent the expected dtype
  • prep_func (function) – class attribute, a function to apply before making a string-based comparison. It preserves null values but changes all else to str.
possible_values()[source]

generate a non-exhaustive list of possible values implied by condition

Args: None

Returns:
list containing the empty string, NA, None and the self.condition attribute, followed by the common responses 'yes', 'no', 'dk' and 'ref', all of which might be expected by this condition
Return type:list

Examples

>>> StrMapCondition({"Condition" : "test condition",  "New Column Name" : "test new column name",  "Relationship" : "eq",  "Prerequisite" : None,  "Source Column ID" : "source_test"}  ).possible_values()
['', nan, None, 'test condition', 'yes', 'no', 'dk', 'ref']