DataFrame

Constructor

DataFrame([es_client, es_index_pattern, ...])

Two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns) referencing data stored in Elasticsearch indices.
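A minimal sketch of constructing a DataFrame. It assumes `eland` is installed and a cluster is reachable at `http://localhost:9200`. The helper name `open_frame` and the index name `flights` are hypothetical. `es_client` can be a host URL string or an `elasticsearch.Elasticsearch` instance.

```python
def open_frame(es_client="http://localhost:9200", es_index_pattern="flights"):
    """Hypothetical helper: return an eland.DataFrame over `es_index_pattern`.

    Assumes eland is installed and a cluster is reachable via `es_client`.
    """
    # Imported lazily so that defining this helper needs neither eland nor a cluster.
    import eland as ed

    # es_client may be a host URL string or an elasticsearch.Elasticsearch instance.
    # The returned object references the index; the documents stay in Elasticsearch.
    return ed.DataFrame(es_client=es_client, es_index_pattern=es_index_pattern)
```

Calling `open_frame().head()` would then fetch only the first few rows for display.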

Attributes and Underlying Data

`DataFrame.index`	Return the eland index referencing the Elasticsearch field used to index a DataFrame/Series.
`DataFrame.columns`	The column labels of the DataFrame.
`DataFrame.dtypes`	Return the pandas dtypes in the DataFrame.
`DataFrame.select_dtypes`([include, exclude])	Return a subset of the DataFrame's columns based on the column dtypes.
`DataFrame.values`	Not implemented.
`DataFrame.empty`	Determines if the DataFrame is empty.
`DataFrame.shape`	Return a tuple representing the dimensionality of the DataFrame.
`DataFrame.ndim`	Return 2, by definition of a DataFrame.
`DataFrame.size`	Return an int representing the number of elements in this object.
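The attributes above can be collected in one place. The sketch below is duck-typed, so it accepts an eland or a pandas DataFrame. For eland, reading `shape` or `size` needs the row count, and that count has to come from Elasticsearch. The helper name is hypothetical.

```python
def frame_summary(df) -> dict:
    """Hypothetical helper: gather basic attributes of a DataFrame-like object."""
    rows, cols = df.shape  # for eland, the row count is obtained from Elasticsearch
    return {
        "rows": rows,
        "columns": cols,
        "ndim": df.ndim,  # 2 for any DataFrame
        "size": df.size,  # rows * columns
        "empty": df.empty,
        # select_dtypes filters on the pandas dtypes reported by DataFrame.dtypes
        "numeric": list(df.select_dtypes(include="number").columns),
    }
```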

Indexing, Iteration

`DataFrame.head`([n])	Return the first n rows.
`DataFrame.keys`()	Return the column labels.
`DataFrame.tail`([n])	Return the last n rows.
`DataFrame.get`(key[, default])	Get item from object for given key (ex: DataFrame column).
`DataFrame.query`(expr)	Query the columns of a DataFrame with a boolean expression.
`DataFrame.sample`([n, frac, random_state])	Return n randomly sampled rows or the specified fraction of rows.
`DataFrame.iterrows`()	Iterate over eland.DataFrame rows as (index, pandas.Series) pairs.
`DataFrame.itertuples`([index, name])	Iterate over eland.DataFrame rows as namedtuples.
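The selection and iteration methods compose. This minimal sketch assumes a numeric `AvgTicketPrice` column, as in the Kibana sample flights data. Treat the column and helper names as hypothetical. `query` filters rows, `head` bounds how many are fetched, and `itertuples` yields namedtuples whose first field is `Index`.

```python
def expensive_flights(df, threshold=900.0, limit=5):
    """Hypothetical helper: (index, price) pairs for up to `limit` rows above `threshold`."""
    # query() takes a boolean expression over column names
    filtered = df.query(f"AvgTicketPrice > {threshold}")
    # head() bounds the number of rows fetched before iterating locally
    rows = filtered.head(limit).itertuples(index=True, name="Flight")
    return [(row.Index, row.AvgTicketPrice) for row in rows]
```

Iterating over an unbounded frame with `iterrows` or `itertuples` pulls every matching document. Bounding the frame first with `head` or a query is usually the cheaper choice.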

Function Application, GroupBy & Window

Note

Elasticsearch cardinality (count) aggregations are close approximations computed with the HyperLogLog++ algorithm, so their results may not be exact.
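For reference, this note concerns Elasticsearch's `cardinality` aggregation. The sketch below writes a raw request body for it by hand, for example to send with the `elasticsearch-py` client. It does not show how eland builds its own requests, and the field name is hypothetical. `precision_threshold` trades memory for accuracy. Elasticsearch documents that counts below the threshold are expected to be close to accurate. The default threshold is 3000 and the maximum effective value is 40000.

```python
def cardinality_body(field: str, precision_threshold: int = 3000) -> dict:
    """Build a search body that returns only an approximate distinct count for `field`."""
    # Values above 40000 behave like 40000 in Elasticsearch, so clamp here.
    threshold = min(precision_threshold, 40000)
    return {
        "size": 0,  # skip hits; only the aggregation result is needed
        "aggs": {
            "distinct_count": {
                "cardinality": {"field": field, "precision_threshold": threshold}
            }
        },
    }
```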

`DataFrame.agg`(func[, axis, numeric_only])	Aggregate using one or more operations over the specified axis.
`DataFrame.aggregate`(func[, axis, numeric_only])	Aggregate using one or more operations over the specified axis.
`DataFrame.groupby`([by, dropna])	Group the DataFrame by one or more columns to perform groupby operations.

`DataFrameGroupBy`(by, query_compiler[, dropna])	Holds all the groupby methods for `eland.DataFrame.groupby()`.
`DataFrameGroupBy.agg`(func[, numeric_only])	Group by and aggregate using one or more operations.
`DataFrameGroupBy.aggregate`(func[, numeric_only])	Group by and aggregate using one or more operations.
`DataFrameGroupBy.count`()	Compute the count value for each group.
`DataFrameGroupBy.mad`([numeric_only])	Compute the median absolute deviation value for each group.
`DataFrameGroupBy.max`([numeric_only])	Compute the max value for each group.
`DataFrameGroupBy.mean`([numeric_only])	Compute the mean value for each group.
`DataFrameGroupBy.median`([numeric_only])	Compute the median value for each group.
`DataFrameGroupBy.min`([numeric_only])	Compute the min value for each group.
`DataFrameGroupBy.nunique`()	Compute the number of unique values for each group.
`DataFrameGroupBy.std`([numeric_only])	Compute the standard deviation value for each group.
`DataFrameGroupBy.sum`([numeric_only])	Compute the sum value for each group.
`DataFrameGroupBy.var`([numeric_only])	Compute the variance value for each group.
`DataFrameGroupBy.quantile`([q])	Group by and calculate the quantile(s) q for each group.
`GroupBy`(by, query_compiler[, dropna])	Base class for calls to `eland.DataFrame.groupby()`
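This minimal groupby sketch assumes a `Carrier` column to group by and a numeric `AvgTicketPrice` column to aggregate. Both column names and the helper name are hypothetical. Selecting the columns first keeps non-numeric fields out of the aggregation. `agg` then takes a list of aggregation names.

```python
def price_stats_by_carrier(df):
    """Hypothetical helper: per-carrier min/mean/max of AvgTicketPrice."""
    # Keep only the grouping column and the numeric column to aggregate
    subset = df[["Carrier", "AvgTicketPrice"]]
    # dropna=True (the default) leaves out documents with no Carrier value
    return subset.groupby(by="Carrier", dropna=True).agg(["min", "mean", "max"])
```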

Computations / Descriptive Stats

`DataFrame.count`()	Count non-NA cells for each column.
`DataFrame.describe`()	Generate descriptive statistics that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.
`DataFrame.info`([verbose, buf, max_cols, ...])	Print a concise summary of a DataFrame.
`DataFrame.max`([numeric_only])	Return the maximum value for each numeric column.
`DataFrame.mean`([numeric_only])	Return the mean value for each numeric column.
`DataFrame.min`([numeric_only])	Return the minimum value for each numeric column.
`DataFrame.median`([numeric_only])	Return the median value for each numeric column.
`DataFrame.mad`([numeric_only])	Return the median absolute deviation for each numeric column.
`DataFrame.std`([numeric_only])	Return the standard deviation for each numeric column.
`DataFrame.var`([numeric_only])	Return the variance for each numeric column.
`DataFrame.sum`([numeric_only])	Return the sum for each numeric column.
`DataFrame.nunique`()	Return the cardinality (number of distinct values) of each field.
`DataFrame.mode`([numeric_only, dropna, es_size])	Calculate the mode of a DataFrame.
`DataFrame.quantile`([q, numeric_only])	Calculate quantiles for a given DataFrame.
`DataFrame.idxmax`([axis])	Return index of first occurrence of maximum over requested axis.
`DataFrame.idxmin`([axis])	Return index of first occurrence of minimum over requested axis.
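`mad` is the median absolute deviation: the median of each value's absolute distance from the median. It is not the standard deviation. Elasticsearch computes it with its `median_absolute_deviation` aggregation, which is approximate. The exact stdlib version below only demonstrates the definition on a small in-memory list, contrasted with the population standard deviation.

```python
from statistics import median, pstdev


def median_absolute_deviation(values):
    """Exact MAD: the median of |x - median(values)| over all values."""
    center = median(values)
    return median(abs(x - center) for x in values)


data = [1, 2, 3, 4, 100]
mad = median_absolute_deviation(data)  # deviations from 3 are [2, 1, 0, 1, 97] -> median 1
spread = pstdev(data)  # the single outlier inflates this to about 39.0
```

The outlier barely moves the MAD but dominates the standard deviation. This robustness is the usual reason to reach for `mad`.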

Reindexing / Selection / Label Manipulation

`DataFrame.drop`([labels, axis, index, ...])	Return new object with labels in requested axis removed.
`DataFrame.filter`([items, like, regex, axis])	Subset the DataFrame rows or columns according to the specified index labels.
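This minimal sketch combines the two methods. The column and helper names are hypothetical. `drop` removes labels and returns a new object. `filter(like=...)` then keeps the columns whose label contains a substring.

```python
def price_columns(df, unwanted=("FlightNum",)):
    """Hypothetical helper: drop `unwanted` columns, then keep labels containing 'Price'."""
    trimmed = df.drop(columns=list(unwanted))  # returns a new object; df is unchanged
    return trimmed.filter(like="Price", axis="columns")
```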

Plotting

DataFrame.hist([column, by, grid, ...])

Make a histogram of the DataFrame's columns.
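This minimal plotting sketch assumes matplotlib is installed and a numeric `AvgTicketPrice` column exists. The column and helper names are hypothetical. As in pandas, `bins` sets the number of histogram bins.

```python
def plot_price_histogram(df, bins=20):
    """Hypothetical helper: histogram of one numeric column; needs matplotlib."""
    # column selects what to plot; grid=False turns off the background grid
    return df.hist(column="AvgTicketPrice", bins=bins, grid=False)
```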

Elasticsearch Functions

`DataFrame.es_info`()	A debug summary of an eland DataFrame's internals.
`DataFrame.es_match`(text, *[, columns, ...])	Filters data with an Elasticsearch `match`, `match_phrase`, or `multi_match` query depending on the given parameters and columns.
`DataFrame.es_query`(query)	Applies an Elasticsearch DSL query to the current DataFrame.
`DataFrame.es_dtypes`	Return the Elasticsearch dtypes in the index.
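This minimal sketch uses the two query methods. The field names, text column, and values are hypothetical. `es_query` takes a DSL query as a Python dict. `es_match` takes the text to search for plus the columns to search in. The query-building function is plain Python, so the DSL can be inspected before it is applied.

```python
def price_and_date_filter(min_price: float, since: str) -> dict:
    """Build a bool query with two range filters (field names are hypothetical)."""
    return {
        "bool": {
            "filter": [
                {"range": {"AvgTicketPrice": {"gte": min_price}}},
                {"range": {"timestamp": {"gte": since}}},
            ]
        }
    }


def narrow(df, min_price=900.0, since="2018-01-01"):
    """Apply the DSL filter, then a full-text match on a hypothetical text column."""
    filtered = df.es_query(price_and_date_filter(min_price, since))
    # es_match chooses match, match_phrase or multi_match from its arguments
    return filtered.es_match("delayed", columns=["description"])
```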

Serialization / IO / Conversion

`DataFrame.info`([verbose, buf, max_cols, ...])	Print a concise summary of a DataFrame.
`DataFrame.to_numpy`()	Not implemented.
`DataFrame.to_csv`([path_or_buf, sep, na_rep, ...])	Write Elasticsearch data to a comma-separated values (csv) file.
`DataFrame.to_html`([buf, columns, col_space, ...])	Render Elasticsearch data as an HTML table.
`DataFrame.to_string`([buf, columns, ...])	Render a DataFrame to a console-friendly tabular output.
`DataFrame.to_pandas`([show_progress])	Convert an eland.DataFrame to a pandas.DataFrame.
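`to_pandas` brings every document referenced by the DataFrame into local memory, so narrowing the frame first is usually worthwhile. In this minimal sketch the helper name and output path are hypothetical. It takes the first `n` rows, converts them, and writes a CSV with pandas.

```python
def export_head(df, path="sample.csv", n=100):
    """Hypothetical helper: materialize the first n rows locally and save them as CSV."""
    local = df.head(n).to_pandas(show_progress=False)  # a pandas.DataFrame in memory
    local.to_csv(path)  # pandas' own to_csv on the local copy
    return local
```

To write straight from Elasticsearch without an explicit pandas step, `DataFrame.to_csv` from the table above is the alternative.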
