eland.pandas_to_eland

eland.pandas_to_eland(pd_df, es_client, es_dest_index, es_if_exists='fail', es_refresh=False, es_dropna=False, es_geo_points=None, chunksize=None)

Append a pandas DataFrame to an Elasticsearch index. Mainly used in testing. Modifies the Elasticsearch destination index.

Parameters:
pd_df: pandas.DataFrame

pandas DataFrame to append to the Elasticsearch index

es_client: Elasticsearch client argument(s)
  • elasticsearch-py parameters or
  • elasticsearch-py instance or
  • eland.Client instance
es_dest_index: str

Name of Elasticsearch index to be appended to

es_if_exists : {‘fail’, ‘replace’, ‘append’}, default ‘fail’

How to behave if the index already exists.

  • fail: Raise a ValueError.
  • replace: Delete the index before inserting new values.
  • append: Insert new values into the existing index. Create the index if it does not exist.
es_refresh: bool, default ‘False’

Refresh es_dest_index after bulk index

es_dropna: bool, default ‘False’
  • True: Remove missing values (see pandas.Series.dropna)
  • False: Include missing values (this may cause the bulk request to fail)
es_geo_points: list, default None

List of columns to map to geo_point data type

chunksize: int, default None

Number of pandas.DataFrame rows to read before each bulk index into Elasticsearch
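
The es_if_exists and chunksize descriptions above can be modeled without a running cluster. The following is a minimal sketch, not eland's implementation: plan_index_action and iter_chunks are hypothetical helper names introduced only for illustration, and the returned action strings are assumptions chosen for readability.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

VALID_MODES = {"fail", "replace", "append"}


def plan_index_action(index_exists: bool, es_if_exists: str = "fail") -> str:
    """Hypothetical helper: decide what to do with the destination index.

    Mirrors the documented es_if_exists semantics; it is not part of eland.
    """
    if es_if_exists not in VALID_MODES:
        raise ValueError(f"es_if_exists must be one of {sorted(VALID_MODES)}")
    if not index_exists:
        # Nothing to clash with, so every mode starts from a new index.
        return "create"
    if es_if_exists == "fail":
        # Documented behavior: raise a ValueError if the index exists.
        raise ValueError("destination index already exists")
    if es_if_exists == "replace":
        # Documented behavior: delete the index before inserting new values.
        return "delete_then_create"
    # Remaining mode is "append": insert into the existing index.
    return "append"


def iter_chunks(rows: Sequence[T], chunksize: int) -> Iterator[List[T]]:
    """Hypothetical helper: yield batches of at most chunksize rows.

    Each batch stands in for one bulk index request.
    """
    if chunksize <= 0:
        raise ValueError("chunksize must be a positive integer")
    for start in range(0, len(rows), chunksize):
        yield list(rows[start:start + chunksize])
```

With five rows and chunksize=2, iter_chunks yields three batches, the last holding a single row. The sketch deliberately leaves out the chunksize=None default, since the behavior for that case is not stated here.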

Returns:
eland.DataFrame

eland.DataFrame referencing data in es_dest_index

See also

eland.read_es
Create an eland.DataFrame from an Elasticsearch index
eland.eland_to_pandas
Create a pandas.DataFrame from an eland.DataFrame

Examples

>>> import pandas as pd
>>> import eland as ed
>>> pd_df = pd.DataFrame(data={'A': 3.141,
...                            'B': 1,
...                            'C': 'foo',
...                            'D': pd.Timestamp('20190102'),
...                            'E': [1.0, 2.0, 3.0],
...                            'F': False,
...                            'G': [1, 2, 3]},
...                      index=['0', '1', '2'])
>>> type(pd_df)
<class 'pandas.core.frame.DataFrame'>
>>> pd_df
       A  B  ...      F  G
0  3.141  1  ...  False  1
1  3.141  1  ...  False  2
2  3.141  1  ...  False  3
<BLANKLINE>
[3 rows x 7 columns]
>>> pd_df.dtypes
A           float64
B             int64
C            object
D    datetime64[ns]
E           float64
F              bool
G             int64
dtype: object

Convert the pandas.DataFrame to an eland.DataFrame. This creates an Elasticsearch index called pandas_to_eland. Passing es_if_exists="replace" overwrites the index if it already exists, and es_refresh=True refreshes the index so that it is readable on return.

>>> ed_df = ed.pandas_to_eland(pd_df,
...                            'localhost',
...                            'pandas_to_eland',
...                            es_if_exists="replace",
...                            es_refresh=True)
>>> type(ed_df)
<class 'eland.dataframe.DataFrame'>
>>> ed_df
       A  B  ...      F  G
0  3.141  1  ...  False  1
1  3.141  1  ...  False  2
2  3.141  1  ...  False  3
<BLANKLINE>
[3 rows x 7 columns]
>>> ed_df.dtypes
A           float64
B             int64
C            object
D    datetime64[ns]
E           float64
F              bool
G             int64
dtype: object
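
The es_dropna parameter refers to pandas.Series.dropna, so its effect can be previewed with pandas alone. The sketch below is an illustration under assumptions: row_documents is a hypothetical helper, not an eland function, and the sample frame is invented for this example. It shows which fields each row would carry as a document when missing values are dropped or kept.

```python
import pandas as pd

# Invented sample data with one missing value in each column.
sample_df = pd.DataFrame(
    {"A": [1.0, float("nan"), 3.0], "C": ["foo", "bar", None]},
    index=["0", "1", "2"],
)


def row_documents(df: pd.DataFrame, es_dropna: bool = False) -> list:
    """Hypothetical helper: turn each row into a dict, one per document."""
    docs = []
    for _, row in df.iterrows():
        if es_dropna:
            # pandas.Series.dropna removes the NaN/None entries of this row.
            row = row.dropna()
        docs.append(row.to_dict())
    return docs


for doc in row_documents(sample_df, es_dropna=True):
    print(doc)
```

With es_dropna=True each document contains only the fields that have values. With es_dropna=False every document keeps both fields, including the missing ones, which is the case the parameter description warns may cause the bulk operation to fail.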