eland.eland_to_pandas

eland.eland_to_pandas(ed_df, show_progress=False)

Convert an eland.DataFrame to a pandas.DataFrame.

Note: this loads the entire Elasticsearch index into in-core pandas.DataFrame structures. For large indices, this can place significant load on the Elasticsearch cluster and require significant memory.
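For a rough sense of scale, the sketch below multiplies rows by columns by an assumed 8 bytes per value, which is the size of a 64-bit numeric dtype. The 13059-row and 27-column figures come from the flights examples on this page. The function name estimate_bytes is hypothetical and is not part of eland. Actual usage depends on dtypes: bool columns take less, while string columns stored as Python objects take considerably more. After conversion, pandas.DataFrame.memory_usage(deep=True) reports the real figure.

```python
def estimate_bytes(n_rows: int, n_cols: int, bytes_per_value: int = 8) -> int:
    """Hypothetical helper: order-of-magnitude size of an in-core DataFrame.

    Assumes every value occupies `bytes_per_value` bytes (8 for int64/float64).
    This ignores the index, per-dtype differences and Python object overhead.
    """
    return n_rows * n_cols * bytes_per_value


# Figures taken from the 'flights' examples: 13059 rows, 27 columns.
flights_bytes = estimate_bytes(13059, 27)
print(f"{flights_bytes} bytes (~{flights_bytes / 2**20:.1f} MiB)")
# prints: 2820744 bytes (~2.7 MiB)
```

The estimate grows linearly with both rows and columns, so an index with millions of documents quickly reaches gigabytes.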

Parameters:
ed_df: eland.DataFrame

The source eland.DataFrame referencing the Elasticsearch index.

show_progress: bool

If True, output the progress of the operation to stdout. By default False.

Returns:
pandas.DataFrame

A pandas.DataFrame containing all rows and columns of the eland.DataFrame.

See also

eland.read_es
Create an eland.DataFrame from an Elasticsearch index
eland.pandas_to_eland
Create an eland.DataFrame from a pandas.DataFrame

Examples

>>> ed_df = ed.DataFrame('localhost', 'flights').head()
>>> type(ed_df)
<class 'eland.dataframe.DataFrame'>
>>> ed_df
   AvgTicketPrice  Cancelled  ... dayOfWeek           timestamp
0      841.265642      False  ...         0 2018-01-01 00:00:00
1      882.982662      False  ...         0 2018-01-01 18:27:00
2      190.636904      False  ...         0 2018-01-01 17:11:14
3      181.694216       True  ...         0 2018-01-01 10:33:28
4      730.041778      False  ...         0 2018-01-01 05:13:00
<BLANKLINE>
[5 rows x 27 columns]

Convert the eland.DataFrame to a pandas.DataFrame (Note: this loads the entire Elasticsearch index into core memory)

>>> pd_df = ed.eland_to_pandas(ed_df)
>>> type(pd_df)
<class 'pandas.core.frame.DataFrame'>
>>> pd_df
   AvgTicketPrice  Cancelled  ... dayOfWeek           timestamp
0      841.265642      False  ...         0 2018-01-01 00:00:00
1      882.982662      False  ...         0 2018-01-01 18:27:00
2      190.636904      False  ...         0 2018-01-01 17:11:14
3      181.694216       True  ...         0 2018-01-01 10:33:28
4      730.041778      False  ...         0 2018-01-01 05:13:00
<BLANKLINE>
[5 rows x 27 columns]

Convert an eland.DataFrame to a pandas.DataFrame and show progress every 10000 rows

>>> pd_df = ed.eland_to_pandas(ed.DataFrame('localhost', 'flights'), show_progress=True) # doctest: +SKIP
2020-01-29 12:43:36.572395: read 10000 rows
2020-01-29 12:43:37.309031: read 13059 rows
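The progress output above follows a simple pattern: one line each time the running row count reaches a multiple of 10000, plus a final line with the total. The sketch below reproduces that reporting pattern over any iterable of rows. It is an illustration only, not eland's implementation; the name iter_with_progress and its every and stream parameters are hypothetical.

```python
import io
import sys
from datetime import datetime
from typing import Iterable, Iterator, Optional, TextIO, TypeVar

T = TypeVar("T")


def iter_with_progress(
    rows: Iterable[T],
    every: int = 10000,
    show_progress: bool = False,
    stream: Optional[TextIO] = None,
) -> Iterator[T]:
    """Hypothetical helper: yield rows unchanged while logging a running count.

    Writes "<timestamp>: read N rows" whenever the count reaches a multiple
    of `every`, plus one final line if the last batch was partial.
    """
    out = stream if stream is not None else sys.stdout
    count = 0
    for row in rows:
        count += 1
        if show_progress and count % every == 0:
            print(f"{datetime.now()}: read {count} rows", file=out)
        yield row
    # Report the final total unless it was already printed at a batch boundary.
    if show_progress and count % every != 0:
        print(f"{datetime.now()}: read {count} rows", file=out)


# Usage: capture the log in a buffer so it can be inspected or printed later.
buf = io.StringIO()
rows = list(iter_with_progress(range(13059), every=10000, show_progress=True, stream=buf))
print(buf.getvalue(), end="")
```

With 13059 rows and every=10000, this writes two lines, ending in "read 10000 rows" and "read 13059 rows", matching the shape of the output above. The rows themselves pass through unchanged, so progress reporting adds no copies of the data.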