eland.DataFrame

class eland.DataFrame(es_client: Union[str, List[str], Tuple[str, ...], Elasticsearch, None] = None, es_index_pattern: Optional[str] = None, columns: Optional[List[str]] = None, es_index_field: Optional[str] = None, _query_compiler: Optional[QueryCompiler] = None)

Two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns) referencing data stored in Elasticsearch indices. Where possible, the API mirrors the pandas.DataFrame API. The underlying data stays in Elasticsearch rather than being loaded into local memory.

Parameters
es_client: Elasticsearch client argument(s) (e.g. ‘localhost:9200’), given as either:
  • elasticsearch-py parameters, or
  • an elasticsearch-py Elasticsearch instance

es_index_pattern: str

Elasticsearch index pattern (e.g. ‘flights’). The pattern may contain wildcards.

columns: list of str, optional

List of DataFrame columns, given as a subset of the Elasticsearch index’s fields.

es_index_field: str, optional

The Elasticsearch index field to use as the DataFrame index. If None (the default), _id is used.
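The es_index_pattern parameter may contain wildcards. Elasticsearch resolves the pattern on the server side; the standalone sketch below only illustrates which index names a `*` wildcard would select. The helper `match_index_pattern` and the index names are hypothetical, and the sketch models only `*`, not the rest of Elasticsearch's multi-target syntax.

```python
import re
from typing import Iterable, List


def match_index_pattern(pattern: str, index_names: Iterable[str]) -> List[str]:
    """Return the names in index_names matched by pattern.

    Hypothetical helper: only '*' is treated as a wildcard (matching any
    run of characters, including none); every other character is literal.
    This is an illustration, not how Elasticsearch or eland resolve patterns.
    """
    # Escape each literal segment, then join the segments with '.*'
    # so that every '*' in the pattern becomes "any characters".
    regex = re.compile(".*".join(re.escape(part) for part in pattern.split("*")))
    # fullmatch requires the whole index name to match, not just a prefix.
    return [name for name in index_names if regex.fullmatch(name)]


# Hypothetical index names, for illustration only.
indices = ["flights", "flights-2018", "flights-2019", "ecommerce"]

print(match_index_pattern("flights*", indices))  # ['flights', 'flights-2018', 'flights-2019']
print(match_index_pattern("flights", indices))   # ['flights']
print(match_index_pattern("*-2019", indices))    # ['flights-2019']
```

In practice the pattern is simply passed as es_index_pattern and Elasticsearch decides which indices it covers.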

See also

pandas.DataFrame

Examples

Constructing a DataFrame from Elasticsearch configuration arguments and an Elasticsearch index

>>> import eland as ed
>>> df = ed.DataFrame('localhost:9200', 'flights')
>>> df.head()
   AvgTicketPrice  Cancelled  ... dayOfWeek           timestamp
0      841.265642      False  ...         0 2018-01-01 00:00:00
1      882.982662      False  ...         0 2018-01-01 18:27:00
2      190.636904      False  ...         0 2018-01-01 17:11:14
3      181.694216       True  ...         0 2018-01-01 10:33:28
4      730.041778      False  ...         0 2018-01-01 05:13:00
<BLANKLINE>
[5 rows x 27 columns]

Constructing a DataFrame from an Elasticsearch client and an Elasticsearch index, selecting a subset of columns

>>> from elasticsearch import Elasticsearch
>>> es = Elasticsearch("localhost:9200")
>>> df = ed.DataFrame(es_client=es, es_index_pattern='flights', columns=['AvgTicketPrice', 'Cancelled'])
>>> df.head()
   AvgTicketPrice  Cancelled
0      841.265642      False
1      882.982662      False
2      190.636904      False
3      181.694216       True
4      730.041778      False
<BLANKLINE>
[5 rows x 2 columns]

Constructing a DataFrame from Elasticsearch configuration arguments and an Elasticsearch index, with ‘timestamp’ as the DataFrame index field. (TODO: currently, if es_index_field is not _id, it must also be listed in columns.)

>>> df = ed.DataFrame(
...     es_client='localhost',
...     es_index_pattern='flights',
...     columns=['AvgTicketPrice', 'timestamp'],
...     es_index_field='timestamp'
... )
>>> df.head()
                     AvgTicketPrice           timestamp
2018-01-01T00:00:00      841.265642 2018-01-01 00:00:00
2018-01-01T00:02:06      772.100846 2018-01-01 00:02:06
2018-01-01T00:06:27      159.990962 2018-01-01 00:06:27
2018-01-01T00:33:31      800.217104 2018-01-01 00:33:31
2018-01-01T00:36:51      803.015200 2018-01-01 00:36:51
<BLANKLINE>
[5 rows x 2 columns]