medcat.utils.training_utils

Functions:

cheating_component

Creates and uses a cheating component within the pipe.

This component will "predict" entities as per the predictor it is given.

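Parameters:

cat (CAT): The model pack.
comp_type (CoreComponentType): The component type (generally NER or linker).
predictor (Callable[[MutableDocument], list[MutableEntity]]): The predictor to use.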

Source code in medcat-v2/medcat/utils/training_utils.py
@contextmanager
def cheating_component(
        cat: CAT,
        comp_type: CoreComponentType,
        predictor: Callable[[MutableDocument], list[MutableEntity]]):
    """Creates and uses a cheating component within the pipe.

    This component will "predict" entities as per the predictor it is given.

    Args:
        cat (CAT): The model pack.
        comp_type (CoreComponentType): The component type (generally NER or linker).
        predictor (Callable[[MutableDocument], list[MutableEntity]]):
            The predictor to use.
    """
    comps_list = cat.pipe._components
    # find original index
    original_comp = cat.pipe.get_component(comp_type)
    replace_index = comps_list.index(original_comp)
    # create and replace
    cheater = _CheatingComponent(comp_type, predictor)
    comps_list[replace_index] = cheater
    try:
        yield
    finally:
        # restore original component
        comps_list[replace_index] = original_comp
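
The swap-and-restore pattern used above can be tried in isolation. The following is a minimal sketch that does not import medcat: `ToyComponent`, `swap_component`, and the list-based `pipe` are hypothetical stand-ins for illustration only, not part of the library. It shows that the replacement is active only inside the `with` block and that the original component is put back afterwards.

```python
from contextlib import contextmanager
from typing import Callable


class ToyComponent:
    """Hypothetical stand-in for a pipeline component (not a medcat class)."""

    def __init__(self, name: str, fn: Callable[[str], list[str]]):
        self.name = name
        self.fn = fn

    def __call__(self, text: str) -> list[str]:
        return self.fn(text)


@contextmanager
def swap_component(components: list, index: int, replacement):
    """Temporarily place `replacement` at `index`; restore the original on exit."""
    original = components[index]
    components[index] = replacement
    try:
        yield
    finally:
        # Runs on normal exit and also when the body raises.
        components[index] = original


# A "real" component that finds nothing, and a cheater that returns fixed answers.
real_ner = ToyComponent("ner", lambda text: [])
cheater = ToyComponent("ner", lambda text: ["fever", "cough"])
pipe = [real_ner]

with swap_component(pipe, 0, cheater):
    inside = pipe[0]("patient has fever and cough")  # answered by the cheater

after = pipe[0]("patient has fever and cough")  # answered by the original again
```

Because the restore step sits in a `finally` clause, the original component is reinstated even if the code inside the `with` block raises.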

dataset_aware_component

Creates and uses a dataset aware component within the pipe.

This simplifies training and evaluating one component at a time by swapping out the other component for one that has perfect performance, since it knows the dataset.

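Parameters:

cat (CAT): The model pack.
comp_type (CoreComponentType): The component type.
dataset (MedCATTrainerExport): The dataset in question.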

Source code in medcat-v2/medcat/utils/training_utils.py
@contextmanager
def dataset_aware_component(
        cat: CAT,
        comp_type: CoreComponentType,
        dataset: MedCATTrainerExport):
    """Creates and uses a dataset aware component within the pipe.

    This simplifies training and evaluating one component at
    a time by swapping out the other component for one that has
    perfect performance since it knows the dataset.

    Args:
        cat (CAT): The model pack.
        comp_type (CoreComponentType): The component type.
        dataset (MedCATTrainerExport): The dataset in question.
    """
    _check_dataset(dataset)
    tokens2entity = cat.pipe.tokenizer.entity_from_tokens_in_doc
    predictor = _create_predictor(comp_type, dataset, tokens2entity)
    with cheating_component(cat, comp_type, predictor):
        yield
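
The internals of `_create_predictor` are not shown on this page. As a rough illustration of what "knowing the dataset" can mean, the sketch below builds a predictor that simply looks up gold annotations by document text. Everything in it is an assumption made for the example: the simplified `toy_dataset` layout, the `make_gold_predictor` helper, the placeholder concept ID, and the tuple-based output are not the library's actual types or data format.

```python
from typing import Callable

# Assumed, simplified dataset layout for illustration only;
# it is not the real MedCATTrainerExport structure.
toy_dataset = {
    "documents": [
        {
            "text": "Patient has fever.",
            "annotations": [
                # "CUI_FEVER" is a placeholder concept ID.
                {"start": 12, "end": 17, "cui": "CUI_FEVER"},
            ],
        },
    ]
}

Span = tuple[int, int, str]


def make_gold_predictor(dataset: dict) -> Callable[[str], list[Span]]:
    """Build a predictor that returns the gold spans for texts it has seen."""
    gold: dict[str, list[Span]] = {
        doc["text"]: [(a["start"], a["end"], a["cui"]) for a in doc["annotations"]]
        for doc in dataset["documents"]
    }

    def predict(text: str) -> list[Span]:
        # Texts outside the dataset yield no entities.
        return list(gold.get(text, []))

    return predict


predict = make_gold_predictor(toy_dataset)
spans = predict("Patient has fever.")
unknown = predict("Unseen text.")
```

A predictor of this kind is only "perfect" on documents contained in the dataset it was built from, which is why it suits isolating one component during training or evaluation rather than general inference.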