get_lr_penalty(X, y, alpha=1)

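Takes a matrix of predictors and a vector of binary outcomes and returns the lower and upper bounds of the inverse regularization strength C for a scikit-learn logistic regression model. The bounds are the reciprocals of the lambda.max and lambda.min values that glmnet uses for the regularization parameter lambda.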

The function is based on the paper Regularization Paths for Generalized Linear Models via Coordinate Descent by Friedman, Hastie, and Tibshirani.

Parameters:
  • X

    the data matrix

  • y

    the binary target variable, coded as 0 and 1

  • alpha

    the elastic net mixing parameter. Defaults to 1. Values below 0.000001 are raised to 0.000001 for numeric stability

Returns:
  • a tuple (1 / lambda_max, 1 / lambda_min): the minimum and maximum values of C, the inverse of the regularization parameter lambda.

Source code in postpacu/utils.py
def get_lr_penalty(X, y, alpha=1):
    """
    The function takes in a matrix of predictors and a vector of outcomes, and returns the minimum and
    maximum values of the regularization parameter lambda that should be used in a logistic regression
    model.

    The function is based on the paper [Regularization Paths for Generalized Linear Models via
    Coordinate Descent](http://www.jstatsoft.org/v33/i01/paper) by Friedman, Hastie, and Tibshirani.

    Args:
      X: the data matrix
      y: the target variable
      alpha: the elastic net mixing parameter. Defaults to 1

    Returns:
      a tuple (1 / lambda_max, 1 / lambda_min): the minimum and maximum values of sklearn's C.
    """
    # recreate lambda.max from R glmnet according to paper description
    # http://www.jstatsoft.org/v33/i01/paper
    # lambda.min in glmnet does not work as paper describes;
    # however, I'm sticking with the paper's description here for lambda_min
    # returns C_min, C_max for use in sklearn logistic regression

    # ensure numeric stability -- matching what glmnet does
    if alpha < 0.000001:
        alpha = 0.000001

    # paper says scale y, but that's for regression
    # I can only recreate glmnet binomial lambda.max result by setting proportions of classes
    # see discussion:
    # https://stackoverflow.com/questions/25257780/how-does-glmnet-compute-the-maximal-lambda-value
    counter = Counter(y)
    negative_prop = counter[0] / (counter[1] + counter[0])
    positive_prop = 1 - negative_prop

    # convert first so that lists and pandas Series also expose .ndim
    y = np.asarray(y)
    if y.ndim == 1:
        y = y[:, np.newaxis]

    if X.shape[0] < X.shape[1]:
        lambda_min_ratio = 0.01
    else:
        lambda_min_ratio = 1e-04

    sdX = np.apply_along_axis(mysd, 0, X)
    sX = scale(X, with_std=False) / sdX[None, :]
    sy = np.where(y == 0, -1 * positive_prop, negative_prop)[:]
    lambda_max = max(abs(np.sum(sX * sy, axis=0))) / (len(sy) * alpha)
    lambda_min = lambda_min_ratio * lambda_max

    # sklearn "C" and glmnet "lambda" are inverse of one another
    return 1 / lambda_max, 1 / lambda_min
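
Because the function returns `(1 / lambda_max, 1 / lambda_min)`, the pair can be expanded into a log-spaced grid of candidate C values for a hyperparameter search. The following is a minimal sketch using only the standard library. The helper name `log_spaced_grid` and the example bounds are assumptions for illustration and are not part of postpacu; `np.logspace` would give an equivalent grid.

```python
import math


def log_spaced_grid(c_min, c_max, num=5):
    """Return `num` values spaced evenly on a log10 scale from c_min to c_max."""
    if num < 2:
        raise ValueError("num must be at least 2")
    log_min, log_max = math.log10(c_min), math.log10(c_max)
    step = (log_max - log_min) / (num - 1)
    return [10 ** (log_min + i * step) for i in range(num)]


# Hypothetical bounds, standing in for: c_min, c_max = get_lr_penalty(X, y)
# Their ratio of 1e4 mirrors lambda_min_ratio = 1e-04 in the listing above.
c_min, c_max = 0.5, 5000.0
c_grid = log_spaced_grid(c_min, c_max, num=5)
```

Each value in `c_grid` could then be passed as the `C` argument when fitting a logistic regression model.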

load_dict(filepath)

Load a dictionary from a JSON file.

Parameters:
  • filepath

    The filepath to the JSON file.

Returns:
  • A dictionary.

Source code in postpacu/utils.py
def load_dict(filepath):
    """
    Load a dictionary from a JSON's filepath.

    Args:
      filepath: The filepath to the JSON file.

    Returns:
      A dictionary.
    """

    with open(filepath, "r") as fp:
        d = json.load(fp)
    return d
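
One detail worth knowing when reading dictionaries back from JSON: object keys are always strings, so non-string keys do not survive a round trip unchanged. A small sketch, assuming a temporary file with the hypothetical name `params.json`:

```python
import json
import os
import tempfile


def load_dict(filepath):
    """Same logic as the listing above: parse a JSON file into a dict."""
    with open(filepath, "r") as fp:
        return json.load(fp)


# Write a small JSON file to a temporary directory, then read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "params.json")  # hypothetical file name
    with open(path, "w") as fp:
        json.dump({1: "a", "lr": 0.1}, fp)
    loaded = load_dict(path)

# The integer key 1 was written as the string "1" and stays a string on load.
```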

load_json_from_url(url)

Opens a URL, reads the response body, and parses it as JSON.

Parameters:
  • url

    the URL of an API endpoint that returns JSON

Returns:
  • The parsed JSON data, typically a dictionary

Source code in postpacu/utils.py
def load_json_from_url(url):
    """
    It takes a URL as input, opens the URL, reads the data, loads the data into a JSON object, and
    returns the JSON object

    Args:
      url: the url of the API endpoint

    Returns:
      A dictionary
    """
    data = json.loads(urlopen(url).read())
    return data
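
The listing above leaves the response open and sets no timeout. A possible variant is sketched below; the `timeout` parameter is an assumption added for illustration and is not part of the postpacu function. To keep the example runnable without network access, it reads a temporary local file through a `file://` URL.

```python
import json
import tempfile
from pathlib import Path
from urllib.request import urlopen


def load_json_from_url(url, timeout=10):
    """Variant of the listing above; `timeout` is a hypothetical addition."""
    # The context manager closes the response even if parsing fails.
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read())


# Demonstrate without network access by reading a local file via a file:// URL.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "payload.json"  # hypothetical file name
    path.write_text(json.dumps({"status": "ok", "items": [1, 2, 3]}))
    data = load_json_from_url(path.as_uri())
```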

mysd(y)

Returns the population standard deviation of the vector y: the sum of squared deviations from the mean is divided by n, not by n - 1.

Parameters:
  • y

    the array of values

Returns:
  • The population standard deviation of y.

Source code in postpacu/utils.py
def mysd(y):
    """
    The function mysd takes a vector y as input and returns the standard deviation of y.

    Args:
      y: the array of values

    Returns:
      The standard deviation of the data set.
    """

    return np.sqrt(np.sum(np.square((y - np.mean(y)))) / len(y))
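
Since the formula above divides by `len(y)`, it yields the population standard deviation, which matches `np.std` with its default `ddof=0` and is smaller than the sample standard deviation. The sketch below illustrates the difference in pure Python; `population_sd` is a hypothetical stand-in for `mysd` so that the example runs without NumPy.

```python
import math
import statistics


def population_sd(values):
    """Pure-Python equivalent of mysd: divide by n, not n - 1."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


values = [2, 4, 4, 4, 5, 5, 7, 9]
pop_sd = population_sd(values)        # divides by n
sample_sd = statistics.stdev(values)  # divides by n - 1, so it is larger
```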

print_experiment_info(experiment)

This function prints the name, experiment ID, artifact location, and lifecycle stage of an experiment.

Parameters:
  • experiment

    The experiment object that contains the mlflow experiment information.

Source code in postpacu/utils.py
def print_experiment_info(experiment):
    """
    This function prints the name, experiment ID, artifact location, and lifecycle stage of an
    experiment.

    Args:
      experiment: The experiment object that contains the mlflow experiment information.
    """

    print("Name: {}".format(experiment.name))
    print("Experiment ID: {}".format(experiment.experiment_id))
    print("Artifact Location: {}".format(experiment.artifact_location))
    print("Lifecycle_stage: {}".format(experiment.lifecycle_stage))

print_run_info(run)

Prints the run_id, experiment_id, params, artifact_uri, and status of a run.

Parameters:
  • run

    The run object that contains the mlflow run information.

Source code in postpacu/utils.py
def print_run_info(run):
    """
    It prints out the run_id, experiment_id, params, artifact_uri, and status of a run

    Args:
      run: The run object that contains the mlflow run information.
    """

    print("run_id: {}".format(run.info.run_id))
    print("experiment_id: {}".format(run.info.experiment_id))
    print("params: {}".format(run.data.params))
    print("artifact_uri: {}".format(run.info.artifact_uri))
    print("status: {}".format(run.info.status))
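
The function only reads attributes under `run.info` and `run.data`, so its output can be previewed without a tracking server. The sketch below uses a `SimpleNamespace` object as a hypothetical stand-in for an mlflow run; all attribute values are invented for illustration, and the function body is restated with f-strings.

```python
from types import SimpleNamespace


def print_run_info(run):
    """Same output as the listing above, written with f-strings."""
    print(f"run_id: {run.info.run_id}")
    print(f"experiment_id: {run.info.experiment_id}")
    print(f"params: {run.data.params}")
    print(f"artifact_uri: {run.info.artifact_uri}")
    print(f"status: {run.info.status}")


# Hypothetical stand-in for an mlflow run; every value here is made up.
fake_run = SimpleNamespace(
    info=SimpleNamespace(
        run_id="abc123",
        experiment_id="0",
        artifact_uri="file:///tmp/mlruns/0/abc123/artifacts",
        status="FINISHED",
    ),
    data=SimpleNamespace(params={"C": "1.0"}),
)
print_run_info(fake_run)
```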

save_dict(d, filepath, cls=None, sortkeys=False)

Saves a dictionary as JSON, indented by two spaces, to the given file path.

Parameters:
  • d

    The dictionary to save.

  • filepath

    The path to the file to save the dictionary to.

  • cls

    A custom JSONEncoder subclass. If specified, the object will use this encoder instead of the default.

  • sortkeys

    If True, the keys of the dictionary are sorted before writing. Defaults to False

Source code in postpacu/utils.py
def save_dict(d, filepath, cls=None, sortkeys=False):
    """
    It saves a dictionary to a specific location.

    Args:
      d: The dictionary to save.
      filepath: The path to the file to save the dictionary to.
      cls: A custom JSONEncoder subclass. If specified, the object will use this encoder instead of the
    default.
      sortkeys: If True, the keys of the dictionary are sorted before writing. Defaults to False
    """

    with open(filepath, "w") as fp:
        json.dump(d, indent=2, fp=fp, cls=cls, sort_keys=sortkeys)
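
The `cls` argument matters when the dictionary holds values that the default encoder cannot serialize. The sketch below shows one way it might be used; `DateEncoder`, the file name, and the record contents are assumptions made up for illustration.

```python
import json
import os
import tempfile
from datetime import date


class DateEncoder(json.JSONEncoder):
    """Hypothetical encoder that serializes date objects as ISO strings."""

    def default(self, o):
        if isinstance(o, date):
            return o.isoformat()
        return super().default(o)


def save_dict(d, filepath, cls=None, sortkeys=False):
    """Same logic as the listing above."""
    with open(filepath, "w") as fp:
        json.dump(d, indent=2, fp=fp, cls=cls, sort_keys=sortkeys)


# Illustrative record; the default encoder would raise TypeError on the date.
record = {"trained_on": date(2023, 1, 15), "auc": 0.91}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "metrics.json")  # hypothetical file name
    save_dict(record, path, cls=DateEncoder, sortkeys=True)
    with open(path) as fp:
        text = fp.read()
```

With `sortkeys=True`, the key `"auc"` is written before `"trained_on"` regardless of insertion order.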

set_seeds(seed=42)

Seeds both NumPy's and Python's built-in random number generators so that results are reproducible.

Parameters:
  • seed

    The seed for the random number generator. Defaults to 42

Source code in postpacu/utils.py
def set_seeds(seed=42):
    """
    `set_seeds` sets the seed for reproducibility

    Args:
      seed: The seed for the random number generator. Defaults to 42
    """

    # Set seeds
    np.random.seed(seed)
    random.seed(seed)
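
The effect of seeding can be checked by drawing the same numbers twice. The sketch below uses only Python's `random` module so that it runs without NumPy; `np.random.seed` resets NumPy's global generator in the same way. The helper `draw` is hypothetical and not part of postpacu.

```python
import random


def draw(seed, n=3):
    """Seed Python's global generator, then draw n floats from it."""
    random.seed(seed)
    return [random.random() for _ in range(n)]


first = draw(42)
second = draw(42)    # same seed, so the same sequence as `first`
different = draw(7)  # different seed, so a different sequence
```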