eland.DataFrame.count

DataFrame.count() → pandas.core.series.Series

Count non-NA cells for each column.

Counts are computed with exists queries against Elasticsearch (ES).
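The exists-query strategy can be sketched in plain Python. This is a minimal sketch, not eland's implementation: it builds one count request body per field and passes each to a caller-supplied function. The names build_exists_body, count_non_missing and send_count are hypothetical. In practice send_count might wrap the Elasticsearch client's count API for the target index; here an in-memory stand-in is used so the sketch runs without a cluster.

```python
from typing import Any, Callable, Dict, List


def build_exists_body(field: str) -> dict:
    # An `exists` query matches documents that have a non-null value for `field`.
    return {"query": {"exists": {"field": field}}}


def count_non_missing(
    fields: List[str], send_count: Callable[[dict], int]
) -> Dict[str, int]:
    # One request per field: N fields means N round trips to the cluster.
    return {field: send_count(build_exists_body(field)) for field in fields}


# --- Usage with a hypothetical in-memory stand-in for Elasticsearch ---
docs: List[Dict[str, Any]] = [
    {"customer_first_name": "Eddie", "geoip.city_name": "Cairo"},
    {"customer_first_name": "Mary"},
]
requests_sent: List[dict] = []


def fake_send_count(body: dict) -> int:
    # Roughly mimics exists semantics: null and empty lists do not count.
    requests_sent.append(body)
    field = body["query"]["exists"]["field"]
    return sum(1 for d in docs if d.get(field) not in (None, []))


counts = count_non_missing(["customer_first_name", "geoip.city_name"], fake_send_count)
print(counts)  # {'customer_first_name': 2, 'geoip.city_name': 1}
```

Converting the resulting dict with pandas.Series(counts) would give a result shaped like the one count() returns.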

This is inefficient, as it issues N queries (where N is the number of fields). An alternative approach is to use value_count aggregations. However, these have two drawbacks:

- They can only be used with aggregatable fields (e.g. keyword, not text).
- For list fields they return multiple counts: e.g. tags=['elastic', 'ml'] returns value_count=2 for a single document.
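The list-field drawback can be seen by comparing the two semantics on a few in-memory documents. This is a plain-Python simulation, not a call to Elasticsearch: exists_count (a hypothetical name) counts documents that have the field, which is what count() reports, while value_count (also hypothetical) counts every value, so a multi-valued field is counted once per element. VALUE_COUNT_AGG only shows the shape of a value_count aggregation request body; the aggregation name tags_count is arbitrary.

```python
from typing import Any, Dict, List

# Shape of a value_count aggregation request on `tags`
# (size 0: only the aggregation result is needed, not the hits).
VALUE_COUNT_AGG = {
    "size": 0,
    "aggs": {"tags_count": {"value_count": {"field": "tags"}}},
}


def _values(doc: Dict[str, Any], field: str) -> List[Any]:
    # Normalise a single value or a list into a list of non-null values.
    value = doc.get(field)
    if value is None:
        return []
    items = value if isinstance(value, list) else [value]
    return [v for v in items if v is not None]


def exists_count(docs: List[Dict[str, Any]], field: str) -> int:
    # Number of documents with at least one non-null value.
    return sum(1 for d in docs if _values(d, field))


def value_count(docs: List[Dict[str, Any]], field: str) -> int:
    # Total number of values across all documents.
    return sum(len(_values(d, field)) for d in docs)


docs = [{"tags": ["elastic", "ml"]}, {"tags": "search"}, {}]
print(exists_count(docs, "tags"))  # 2
print(value_count(docs, "tags"))   # 3
```

The single document {"tags": ["elastic", "ml"]} alone gives an exists count of 1 but a value count of 2, which is the mismatch described above.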

TODO - add additional pandas.DataFrame.count features

Returns

pandas.Series: Summary of column counts

See also

pandas.DataFrame.count

Examples

>>> import eland as ed
>>> df = ed.DataFrame('http://localhost:9200', 'ecommerce', columns=['customer_first_name', 'geoip.city_name'])
>>> df.count()
customer_first_name    4675
geoip.city_name        4094
dtype: int64
