eland.ml.MLModel.import_model#

classmethod MLModel.import_model(es_client: Union[str, List[str], Tuple[str, ...], Elasticsearch], model_id: str, model: Union[DecisionTreeClassifier, DecisionTreeRegressor, RandomForestRegressor, RandomForestClassifier, XGBClassifier, XGBRanker, XGBRegressor, LGBMRegressor, LGBMClassifier], feature_names: List[str], classification_labels: Optional[List[str]] = None, classification_weights: Optional[List[float]] = None, es_if_exists: Optional[str] = None, es_compress_model_definition: bool = True) → MLModel#

Transform and serialize a trained third-party model into Elasticsearch. The model can then be used for inference in the Elastic Stack.

Parameters#

es_client: Elasticsearch client argument(s)
  • elasticsearch-py connection parameters, or

  • an elasticsearch-py Elasticsearch instance

model_id: str

The unique identifier of the trained inference model in Elasticsearch.

model: An instance of a supported Python model. We support the following model types:
  • sklearn.tree.DecisionTreeClassifier
    • NOTE: When calculating the probabilities of a given classification label, Elasticsearch utilizes softmax, whereas scikit-learn normalizes the results. We try to account for this during model serialization, but probabilities may differ slightly in the predictions.

  • sklearn.tree.DecisionTreeRegressor

  • sklearn.ensemble.RandomForestRegressor

  • sklearn.ensemble.RandomForestClassifier

  • lightgbm.LGBMRegressor
    • Categorical fields are expected to already be processed

    • Only the following objectives are supported:
      • “regression”

      • “regression_l1”

      • “huber”

      • “fair”

      • “quantile”

      • “mape”

  • lightgbm.LGBMClassifier
    • Categorical fields are expected to already be processed

    • Only the following objectives are supported:
      • “binary”

      • “multiclass”

      • “multiclassova”

  • xgboost.XGBClassifier
    • Only the following objectives are supported:
      • “binary:logistic”

      • “multi:softmax”

      • “multi:softprob”

  • xgboost.XGBRanker
    • Only the following objectives are supported:
      • “rank:map”

      • “rank:ndcg”

      • “rank:pairwise”

  • xgboost.XGBRegressor
    • Only the following objectives are supported:
      • “reg:squarederror”

      • “reg:linear”

      • “reg:squaredlogerror”

      • “reg:logistic”

      • “reg:pseudohubererror”
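The softmax note under sklearn.tree.DecisionTreeClassifier above can be illustrated with plain Python. This is a rough sketch, not eland's actual serialization code: normalized leaf-class counts and a softmax over the same raw counts generally disagree, while a log transform of the normalized probabilities lets softmax recover them exactly, which is one way a serializer can account for the difference.

```python
import math

def softmax(values):
    """Standard softmax: exponentiate each value, then normalize."""
    exps = [math.exp(v) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def normalize(counts):
    """scikit-learn-style leaf probabilities: counts divided by their sum."""
    total = sum(counts)
    return [c / total for c in counts]

leaf_counts = [3.0, 1.0]            # hypothetical class counts at one leaf
sk_probs = normalize(leaf_counts)   # [0.75, 0.25]
es_probs = softmax(leaf_counts)     # ~[0.881, 0.119] -- not the same

# Applying softmax to log-probabilities recovers the normalized values,
# since softmax(log(p)) == p / sum(p) == p when the p values sum to 1.
recovered = softmax([math.log(p) for p in sk_probs])
```

Note that a zero class count has no finite logarithm, which is one reason a serialized model's probabilities can still differ slightly from scikit-learn's.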

feature_names: List[str]

Names of the features (required)

classification_labels: List[str]

Labels of the classification targets

classification_weights: List[float]

Weights of the classification targets

es_if_exists: {‘fail’, ‘replace’}, default ‘fail’

How to behave if the model already exists

  • fail: Raise a ValueError

  • replace: Overwrite the existing model

es_compress_model_definition: bool

If True, uses ‘compressed_definition’, which sends gzipped JSON instead of raw JSON to reduce the amount of data transferred over the wire in HTTP requests. Defaults to True.
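To see why compression is the default, here is a self-contained sketch; the payload below is a toy stand-in, not a real eland model definition. Serialized tree models are deeply nested, repetitive JSON, which compresses very well under gzip, so even after base64 encoding the compressed form is far smaller than the raw JSON.

```python
import base64
import gzip
import json

# Toy stand-in for a serialized tree; a real model definition is similar in
# spirit (repetitive, nested JSON) but much larger.
definition = {
    "trained_model": {
        "tree": {
            "feature_names": ["f0", "f1", "f2", "f3", "f4"],
            "tree_structure": [
                {"node_index": i, "leaf_value": 0.5} for i in range(200)
            ],
        }
    }
}

raw = json.dumps(definition).encode("utf-8")
compressed = base64.b64encode(gzip.compress(raw))

# Even after base64 expansion (~4/3), the gzipped payload is far smaller
# than the raw JSON, and the original bytes round-trip losslessly.
```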

Examples#

>>> from sklearn import datasets
>>> from sklearn.tree import DecisionTreeClassifier
>>> from eland.ml import MLModel
>>> # Train model
>>> training_data = datasets.make_classification(n_features=5, random_state=0)
>>> test_data = [[-50.1, 0.2, 0.3, -0.5, 1.0], [1.6, 2.1, -10, 50, -1.0]]
>>> classifier = DecisionTreeClassifier()
>>> classifier = classifier.fit(training_data[0], training_data[1])
>>> # Get some test results
>>> classifier.predict(test_data)
array([0, 1])
>>> # Serialize the model to Elasticsearch
>>> feature_names = ["f0", "f1", "f2", "f3", "f4"]
>>> model_id = "test_decision_tree_classifier"
>>> es_model = MLModel.import_model(
...   'http://localhost:9200',
...   model_id=model_id,
...   model=classifier,
...   feature_names=feature_names,
...   es_if_exists='replace'
... )
>>> # Get some test results from Elasticsearch model
>>> es_model.predict(test_data)
array([0, 1])
>>> # Delete model from Elasticsearch
>>> es_model.delete_model()