Evaluation Handlers
This module provides three classes:
• BaseEvaluateHandler: Contains the core evaluation methods (data loading, model evaluation, metric calculation, bootstrap confidence interval computation, confusion matrix plotting).
• StreamlitEvaluateHandler: Inherits from BaseEvaluateHandler and integrates the evaluation workflow with a Streamlit UI.
• FastAPIEvaluateHandler: Inherits from BaseEvaluateHandler and exposes a FastAPI-friendly evaluation method.
Note
This module assumes that the underlying evaluation components (DataLoader, Evaluator, compute_bootstrap_confidence_intervals, etc.) are available.
BaseEvaluateHandler
Provides core evaluation functionality.
Source code in LabeLMaker/evaluate_handler.py
compare_methods(df, ground_truth_col, selected_methods)
Compare prediction methods (e.g., Zero Shot, Few Shot, Many Shot) by evaluating the predictions stored in multiple columns. Returns a tuple: (common_df, results, confusion_matrices).
Source code in LabeLMaker/evaluate_handler.py
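A minimal usage sketch, assuming the handler can be constructed without arguments and that each prediction column is named after the method it came from; the column names, the no-argument constructor, and the shape of the returned objects are assumptions, not confirmed by the source:

```python
import pandas as pd

from LabeLMaker.evaluate_handler import BaseEvaluateHandler

# Illustrative DataFrame: one ground-truth column plus one column per method.
df = pd.DataFrame(
    {
        "label": ["spam", "ham", "spam", "ham"],
        "Zero Shot": ["spam", "ham", "ham", "ham"],
        "Few Shot": ["spam", "ham", "spam", "ham"],
    }
)

handler = BaseEvaluateHandler()  # assumed no-argument constructor
common_df, results, confusion_matrices = handler.compare_methods(
    df,
    ground_truth_col="label",
    selected_methods=["Zero Shot", "Few Shot"],
)

# Assumed: `results` maps each method name to its metrics and
# `confusion_matrices` holds the corresponding figures.
for method, metrics in results.items():
    print(method, metrics)
```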
evaluate_model(df, pred_col, ground_truth_col, n_bootstraps=1000, alpha=0.05)
Evaluate model predictions and compute the associated metrics. Returns a tuple: (metrics_df, classification_report_df, bootstrap_df, confusion_matrix_fig).
Source code in LabeLMaker/evaluate_handler.py
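A hedged sketch of evaluating a single prediction column. The call follows the documented signature above, but the no-argument constructor, the column names, and the assumption that the confusion matrix is returned as a Matplotlib figure are illustrative:

```python
import pandas as pd

from LabeLMaker.evaluate_handler import BaseEvaluateHandler

df = pd.DataFrame(
    {
        "label": ["cat", "dog", "cat", "dog", "cat"],
        "prediction": ["cat", "dog", "dog", "dog", "cat"],
    }
)

handler = BaseEvaluateHandler()  # assumed no-argument constructor
metrics_df, report_df, bootstrap_df, cm_fig = handler.evaluate_model(
    df,
    pred_col="prediction",
    ground_truth_col="label",
    n_bootstraps=500,  # fewer resamples for a quick run; the default is 1000
    alpha=0.05,        # 95% bootstrap confidence intervals
)

print(metrics_df)
print(bootstrap_df)
cm_fig.savefig("confusion_matrix.png")  # assumes a Matplotlib figure
```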
FastAPIEvaluateHandler
Bases: BaseEvaluateHandler
Provides a FastAPI-friendly evaluation method.
Source code in LabeLMaker/evaluate_handler.py
fastapi_evaluate(data, request)
Execute the evaluation and return the results as a JSON-serializable dictionary. Expects that request defines:
• ground_truth_column
• pred_column
• Optional: n_bootstraps, alpha
Source code in LabeLMaker/evaluate_handler.py
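A sketch of wiring the handler into a FastAPI endpoint. The exact types of `data` and `request` are not specified here, so the sketch assumes `data` is a pandas DataFrame built from uploaded records and that `request` is a small Pydantic model (`EvaluateRequest` and its `records` field are hypothetical) carrying the fields listed above:

```python
from typing import Dict, List, Optional

import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel

from LabeLMaker.evaluate_handler import FastAPIEvaluateHandler

app = FastAPI()
handler = FastAPIEvaluateHandler()  # assumed no-argument constructor


class EvaluateRequest(BaseModel):
    # Hypothetical request model; field names follow the attributes
    # the handler is documented to read from `request`.
    records: List[Dict[str, str]]  # assumed: rows to evaluate, one dict per row
    ground_truth_column: str
    pred_column: str
    n_bootstraps: Optional[int] = 1000
    alpha: Optional[float] = 0.05


@app.post("/evaluate")
def evaluate(request: EvaluateRequest) -> dict:
    df = pd.DataFrame(request.records)
    # Returns a JSON-serializable dictionary of metrics.
    return handler.fastapi_evaluate(df, request)
```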
StreamlitEvaluateHandler
Bases: BaseEvaluateHandler
Integrates the evaluation workflow with a Streamlit UI.
Source code in LabeLMaker/evaluate_handler.py
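A sketch of how this handler might sit inside a Streamlit page; the no-argument constructor and the UI layout are assumptions, and only the inherited evaluate_model call follows the base-class signature documented above:

```python
import pandas as pd
import streamlit as st

from LabeLMaker.evaluate_handler import StreamlitEvaluateHandler

handler = StreamlitEvaluateHandler()  # assumed no-argument constructor

uploaded = st.file_uploader("Upload a CSV with predictions and ground truth", type="csv")
if uploaded is not None:
    df = pd.read_csv(uploaded)
    ground_truth_col = st.selectbox("Ground truth column", df.columns)
    pred_col = st.selectbox("Prediction column", df.columns)

    if st.button("Evaluate"):
        metrics_df, report_df, bootstrap_df, cm_fig = handler.evaluate_model(
            df, pred_col=pred_col, ground_truth_col=ground_truth_col
        )
        st.dataframe(metrics_df)
        st.dataframe(bootstrap_df)
        st.pyplot(cm_fig)  # assumes the confusion matrix is a Matplotlib figure
```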