eland.eland_to_pandas
Convert an eland.DataFrame to a pandas.DataFrame.
Note: this loads the entire Elasticsearch index into in-core pandas.DataFrame structures. For large indices this can place significant load on the Elasticsearch cluster and require significant memory.
Parameters
ed_df: The source eland.DataFrame referencing the Elasticsearch index.
show_progress: Whether to print progress of the operation to stdout. Defaults to False.
Returns
A pandas.DataFrame containing all rows and columns of the eland.DataFrame.
See also
eland.pandas_to_eland
Create an eland.DataFrame from a pandas.DataFrame
Examples
>>> import eland as ed
>>> ed_df = ed.DataFrame('localhost', 'flights').head()
>>> type(ed_df)
<class 'eland.dataframe.DataFrame'>
>>> ed_df
   AvgTicketPrice  Cancelled  ... dayOfWeek           timestamp
0      841.265642      False  ...         0 2018-01-01 00:00:00
1      882.982662      False  ...         0 2018-01-01 18:27:00
2      190.636904      False  ...         0 2018-01-01 17:11:14
3      181.694216       True  ...         0 2018-01-01 10:33:28
4      730.041778      False  ...         0 2018-01-01 05:13:00
<BLANKLINE>
[5 rows x 27 columns]
Convert eland.DataFrame to pandas.DataFrame (note: this loads the entire Elasticsearch index into memory)
>>> pd_df = ed.eland_to_pandas(ed_df)
>>> type(pd_df)
<class 'pandas.core.frame.DataFrame'>
>>> pd_df
   AvgTicketPrice  Cancelled  ... dayOfWeek           timestamp
0      841.265642      False  ...         0 2018-01-01 00:00:00
1      882.982662      False  ...         0 2018-01-01 18:27:00
2      190.636904      False  ...         0 2018-01-01 17:11:14
3      181.694216       True  ...         0 2018-01-01 10:33:28
4      730.041778      False  ...         0 2018-01-01 05:13:00
<BLANKLINE>
[5 rows x 27 columns]
Convert eland.DataFrame to pandas.DataFrame and show progress every 10000 rows
>>> pd_df = ed.eland_to_pandas(ed.DataFrame('localhost', 'flights'), show_progress=True) # doctest: +SKIP
2020-01-29 12:43:36.572395: read 10000 rows
2020-01-29 12:43:37.309031: read 13059 rows
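Guard against accidentally loading a very large index

Because the conversion pulls every row into memory, it can be useful to check the row count before converting. The following is only a minimal sketch and is not part of eland: guarded_to_pandas and max_rows are hypothetical names, and the helper assumes the frame exposes a pandas-style shape attribute, as eland.DataFrame does. The conversion function is passed in as an argument, so the helper itself does not need a running cluster.

```python
from typing import Any, Callable


def guarded_to_pandas(
    ed_df: Any,
    convert: Callable[[Any], Any],
    max_rows: int = 100_000,
) -> Any:
    """Hypothetical helper: convert only if the frame is small enough.

    ed_df    -- any object with a pandas-style ``shape`` tuple (rows, columns)
    convert  -- the conversion function, e.g. ``eland.eland_to_pandas``
    max_rows -- illustrative threshold; choose one that fits available memory
    """
    n_rows = ed_df.shape[0]  # first element of shape is the row count
    if n_rows > max_rows:
        raise ValueError(
            f"refusing to convert {n_rows} rows (limit is {max_rows}); "
            "filter or select columns first, or raise max_rows"
        )
    return convert(ed_df)


# Intended usage with eland (needs a reachable cluster, so shown as a comment):
#   import eland as ed
#   pd_df = guarded_to_pandas(ed.DataFrame('localhost', 'flights'),
#                             ed.eland_to_pandas, max_rows=50_000)
```

If the check fails, narrowing the eland.DataFrame first (for example with head(), as in the first example) reduces the amount of data transferred before the conversion runs.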