eland.DataFrame.count
DataFrame.count() → pandas.core.series.Series
Count non-NA cells for each column.
Counts are computed with exists queries against Elasticsearch.
This is inefficient, because it issues N queries, where N is the number of fields. An alternative approach is to use value_count aggregations. However, these have two drawbacks:
- They can only be used with aggregatable fields (e.g. keyword, not text).
- For list fields they return multiple counts. E.g. tags=['elastic', 'ml'] returns value_count=2 for a single document.
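The trade-off described above can be sketched without a running cluster. The request bodies below use the standard Elasticsearch `exists` query and `value_count` aggregation syntax. The helper names, the sample documents, and the in-memory counting functions are hypothetical and only illustrate the behaviour; this is not eland's actual implementation.

```python
from typing import Any, Dict, List


def exists_count_body(field: str) -> Dict[str, Any]:
    """Body for one count request: documents where `field` has a value."""
    return {"query": {"exists": {"field": field}}}


def value_count_body(fields: List[str]) -> Dict[str, Any]:
    """Body for a single search request with one value_count aggregation per field."""
    return {
        "size": 0,  # no hits needed, only the aggregation results
        "aggs": {f"{f}_count": {"value_count": {"field": f}} for f in fields},
    }


def _values(doc: Dict[str, Any], field: str) -> List[Any]:
    """Return the non-null values of `field` in `doc` as a flat list."""
    value = doc.get(field)
    if value is None:
        return []
    values = value if isinstance(value, list) else [value]
    return [v for v in values if v is not None]


def simulated_exists_count(docs: List[Dict[str, Any]], field: str) -> int:
    """Count documents that have at least one value for `field`."""
    return sum(1 for d in docs if _values(d, field))


def simulated_value_count(docs: List[Dict[str, Any]], field: str) -> int:
    """Count individual values of `field` across all documents."""
    return sum(len(_values(d, field)) for d in docs)


# Hypothetical sample documents (assumption, for illustration only).
docs = [
    {"tags": ["elastic", "ml"]},
    {"tags": ["elastic"]},
    {"name": "document without tags"},
]

print(exists_count_body("tags"))
print(value_count_body(["tags", "name"]))
print(simulated_exists_count(docs, "tags"))  # documents that have tags
print(simulated_value_count(docs, "tags"))   # individual tag values
```

With these sample documents the exists-style count is per document, while the value_count-style count is per value, so the two differ as soon as any document holds a list.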
TODO - add additional pandas.DataFrame.count features
Returns
pandas.Series: Summary of column counts
See also
Examples
>>> import eland as ed
>>> df = ed.DataFrame('localhost', 'ecommerce', columns=['customer_first_name', 'geoip.city_name'])
>>> df.count()
customer_first_name    4675
geoip.city_name        4094
dtype: int64