eland.DataFrame
- class eland.DataFrame(es_client: Optional[Union[str, List[str], Tuple[str, ...], Elasticsearch]] = None, es_index_pattern: Optional[str] = None, columns: Optional[List[str]] = None, es_index_field: Optional[str] = None, _query_compiler: Optional[QueryCompiler] = None)
Two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns) that references data stored in Elasticsearch indices. Where possible, the APIs mirror the pandas.DataFrame APIs. The underlying data is stored in Elasticsearch rather than in core memory.
Parameters
- es_client: Elasticsearch client argument(s) (e.g. ‘http://localhost:9200’)
Either elasticsearch-py parameters or an elasticsearch-py instance.
- es_index_pattern: str
Elasticsearch index pattern. This can contain wildcards. (e.g. ‘flights’)
- columns: list of str, optional
List of DataFrame columns. A subset of the Elasticsearch index’s fields.
- es_index_field: str, optional
The Elasticsearch index field to use as the DataFrame index. Defaults to _id if None is used.
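The defaulting rules described for columns and es_index_field can be sketched in plain Python. This is an illustration only, not eland's implementation: the helper names resolve_index_field and select_columns are hypothetical, the KeyError is an assumed way of rejecting unknown columns, and treating columns=None as "all fields" is an assumption drawn from the first example below, which returns every column when none are requested.

```python
from typing import List, Optional, Sequence


def resolve_index_field(es_index_field: Optional[str] = None) -> str:
    """Mirror the documented default: fall back to ``_id`` when None is given."""
    return "_id" if es_index_field is None else es_index_field


def select_columns(
    index_fields: Sequence[str], columns: Optional[List[str]] = None
) -> List[str]:
    """Return the requested subset of the index's fields.

    Assumption: ``columns=None`` means "use every field in the index".
    """
    if columns is None:
        return list(index_fields)
    # Hypothetical validation: every requested column must be an index field.
    missing = [name for name in columns if name not in index_fields]
    if missing:
        raise KeyError(f"columns not present in index fields: {missing}")
    return list(columns)
```

For instance, select_columns(['AvgTicketPrice', 'Cancelled', 'timestamp'], ['AvgTicketPrice', 'timestamp']) keeps the two requested fields in the order given, and resolve_index_field() yields '_id'.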
See Also
Examples
Constructing a DataFrame from Elasticsearch configuration arguments and an Elasticsearch index
>>> import eland as ed
>>> df = ed.DataFrame('http://localhost:9200', 'flights')
>>> df.head()
   AvgTicketPrice  Cancelled  ... dayOfWeek           timestamp
0      841.265642      False  ...         0 2018-01-01 00:00:00
1      882.982662      False  ...         0 2018-01-01 18:27:00
2      190.636904      False  ...         0 2018-01-01 17:11:14
3      181.694216       True  ...         0 2018-01-01 10:33:28
4      730.041778      False  ...         0 2018-01-01 05:13:00

[5 rows x 27 columns]
Constructing a DataFrame from an Elasticsearch client and an Elasticsearch index, restricted to a subset of columns
>>> from elasticsearch import Elasticsearch
>>> es = Elasticsearch("http://localhost:9200")
>>> df = ed.DataFrame(es_client=es, es_index_pattern='flights', columns=['AvgTicketPrice', 'Cancelled'])
>>> df.head()
   AvgTicketPrice  Cancelled
0      841.265642      False
1      882.982662      False
2      190.636904      False
3      181.694216       True
4      730.041778      False

[5 rows x 2 columns]
Constructing a DataFrame from Elasticsearch client arguments and an Elasticsearch index, with ‘timestamp’ as the DataFrame index field (TODO: currently, if es_index_field is not _id, it must also be a field)
>>> df = ed.DataFrame(
...     es_client='http://localhost:9200',
...     es_index_pattern='flights',
...     columns=['AvgTicketPrice', 'timestamp'],
...     es_index_field='timestamp'
... )
>>> df.head()
                     AvgTicketPrice           timestamp
2018-01-01T00:00:00      841.265642 2018-01-01 00:00:00
2018-01-01T00:02:06      772.100846 2018-01-01 00:02:06
2018-01-01T00:06:27      159.990962 2018-01-01 00:06:27
2018-01-01T00:33:31      800.217104 2018-01-01 00:33:31
2018-01-01T00:36:51      803.015200 2018-01-01 00:36:51

[5 rows x 2 columns]
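The constructor calls above differ only in which keyword arguments they pass. As a minimal sketch, the wrapper below collects just the arguments that were set, so the constructor's own defaults apply to the rest. The names frame_kwargs and open_frame are hypothetical helpers, not part of eland; open_frame imports eland lazily and needs a reachable Elasticsearch cluster to run.

```python
from typing import Any, Dict, List, Optional


def frame_kwargs(
    es_client: Any,
    es_index_pattern: str,
    columns: Optional[List[str]] = None,
    es_index_field: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect only the arguments that were set, leaving the rest to defaults."""
    kwargs: Dict[str, Any] = {
        "es_client": es_client,
        "es_index_pattern": es_index_pattern,
    }
    if columns is not None:
        kwargs["columns"] = columns
    if es_index_field is not None:
        kwargs["es_index_field"] = es_index_field
    return kwargs


def open_frame(**kwargs: Any):
    """Build an eland.DataFrame from the collected keyword arguments.

    Requires eland to be installed and a reachable Elasticsearch cluster.
    """
    import eland as ed  # lazy import: frame_kwargs works without eland installed

    return ed.DataFrame(**kwargs)
```

With a cluster running at the URL used in the examples, open_frame(**frame_kwargs('http://localhost:9200', 'flights', columns=['AvgTicketPrice', 'timestamp'], es_index_field='timestamp')) would correspond to the last constructor call above.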