eland.ml.MLModel.import_model
- classmethod MLModel.import_model(es_client: Union[str, List[str], Tuple[str, ...], Elasticsearch], model_id: str, model: Union[DecisionTreeClassifier, DecisionTreeRegressor, RandomForestRegressor, RandomForestClassifier, XGBClassifier, XGBRanker, XGBRegressor, LGBMRegressor, LGBMClassifier], feature_names: List[str], classification_labels: Optional[List[str]] = None, classification_weights: Optional[List[float]] = None, es_if_exists: Optional[str] = None, es_compress_model_definition: bool = True) -> MLModel
Transform and serialize a trained third-party model into Elasticsearch. The model can then be used for inference in the Elastic Stack.
Parameters
- es_client: Elasticsearch client argument(s)
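Either elasticsearch-py connection parameters (a host URL string, or a list or tuple of such strings) or an existing elasticsearch-py Elasticsearch instance.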
- model_id: str
The unique identifier of the trained inference model in Elasticsearch.
- model: An instance of a supported Python model. The following model types are supported:
- sklearn.tree.DecisionTreeClassifier
- NOTE: When calculating the probabilities of a given classification label, Elasticsearch uses softmax, whereas scikit-learn normalizes the results. We try to account for this during model serialization, but the predicted probabilities may differ slightly.
- sklearn.tree.DecisionTreeRegressor
- sklearn.ensemble.RandomForestRegressor
- sklearn.ensemble.RandomForestClassifier
- lightgbm.LGBMRegressor
- Categorical fields are expected to already be processed.
- Only the following objectives are supported:
“regression”
“regression_l1”
“huber”
“fair”
“quantile”
“mape”
- lightgbm.LGBMClassifier
- Categorical fields are expected to already be processed.
- Only the following objectives are supported:
“binary”
“multiclass”
“multiclassova”
- xgboost.XGBClassifier
- Only the following objectives are supported:
“binary:logistic”
“multi:softmax”
“multi:softprob”
- xgboost.XGBRanker
- Only the following objectives are supported:
“rank:map”
“rank:ndcg”
“rank:pairwise”
- xgboost.XGBRegressor
- Only the following objectives are supported:
“reg:squarederror”
“reg:linear”
“reg:squaredlogerror”
“reg:logistic”
“reg:pseudohubererror”
- feature_names: List[str]
Names of the features (required)
- classification_labels: List[str]
Labels of the classification targets
- classification_weights: List[float]
Weights of the classification targets
- es_if_exists: {‘fail’, ‘replace’} default ‘fail’
How to behave if the model already exists:
- fail: Raise a ValueError
- replace: Overwrite the existing model
- es_compress_model_definition: bool
If True, the model is sent as ‘compressed_definition’, which uses gzipped JSON instead of raw JSON to reduce the amount of data sent over the wire in HTTP requests. Defaults to True.
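As a rough illustration of what es_compress_model_definition trades off, the sketch below gzips a JSON document and compares its size with the raw form. The `toy_definition` dictionary is a made-up placeholder, not the real Elasticsearch model schema, and the Base64 step is an assumption about how gzipped bytes would be carried as a string inside a JSON request body; eland's own serialization code may differ.

```python
import base64
import gzip
import json

# Hypothetical stand-in for a serialized tree model (NOT the real schema).
toy_definition = {
    "feature_names": ["f0", "f1", "f2", "f3", "f4"],
    "nodes": [
        {
            "node_index": i,
            "split_feature": i % 5,
            "threshold": 0.5,
            "left_child": 2 * i + 1,
            "right_child": 2 * i + 2,
        }
        for i in range(200)
    ],
}

# Raw form: plain JSON bytes.
raw_json = json.dumps(toy_definition).encode("utf-8")

# Compressed form: gzip the JSON, then Base64-encode it so that the
# result is a plain ASCII string (assumption, see the text above).
compressed = base64.b64encode(gzip.compress(raw_json)).decode("ascii")

# Round trip: decoding and decompressing recovers the original document.
restored = json.loads(gzip.decompress(base64.b64decode(compressed)))

print(f"raw: {len(raw_json)} bytes, compressed: {len(compressed)} characters")
```

Tree models serialize to highly repetitive JSON, so the compressed string is usually much shorter than the raw document, which is consistent with the option defaulting to True.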
Examples
>>> from sklearn import datasets
>>> from sklearn.tree import DecisionTreeClassifier
>>> from eland.ml import MLModel

>>> # Train model
>>> training_data = datasets.make_classification(n_features=5, random_state=0)
>>> test_data = [[-50.1, 0.2, 0.3, -0.5, 1.0], [1.6, 2.1, -10, 50, -1.0]]
>>> classifier = DecisionTreeClassifier()
>>> classifier = classifier.fit(training_data[0], training_data[1])

>>> # Get some test results
>>> classifier.predict(test_data)
array([0, 1])

>>> # Serialise the model to Elasticsearch
>>> feature_names = ["f0", "f1", "f2", "f3", "f4"]
>>> model_id = "test_decision_tree_classifier"
>>> es_model = MLModel.import_model(
...     'http://localhost:9200',
...     model_id=model_id,
...     model=classifier,
...     feature_names=feature_names,
...     es_if_exists='replace'
... )

>>> # Get some test results from Elasticsearch model
>>> es_model.predict(test_data)
array([0, 1])

>>> # Delete model from Elasticsearch
>>> es_model.delete_model()
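The sklearn.tree.DecisionTreeClassifier note in the parameter list contrasts softmax with scikit-learn's normalization. The sketch below uses only the standard library and made-up leaf counts to show how the two mappings differ on the same values while agreeing on the predicted class. The last step, feeding log-probabilities to softmax, is shown purely as a mathematical identity; it is an assumption for illustration and not a description of what eland does during serialization.

```python
import math


def normalize(values):
    """Scale non-negative values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def softmax(values):
    """Numerically stable softmax: exp(v_i) / sum_j exp(v_j)."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical per-class sample counts in one leaf of a decision tree.
leaf_counts = [30.0, 10.0]

normalized = normalize(leaf_counts)  # [0.75, 0.25]
squashed = softmax(normalized)       # closer to uniform than `normalized`

# Both mappings preserve the ordering, so the predicted class is the same.
predicted_normalized = normalized.index(max(normalized))
predicted_softmax = squashed.index(max(squashed))

# Identity: softmax(log(p)) equals p when p is strictly positive and sums to 1.
recovered = softmax([math.log(p) for p in normalized])

print(normalized, [round(p, 4) for p in squashed], [round(p, 4) for p in recovered])
```

The identity breaks down for classes with zero probability, since log(0) is undefined; that is one reason probabilities from the two systems may not match exactly.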