DataFrame

Constructor

DataFrame(es_client, …)

Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns) referencing data stored in Elasticsearch indices.
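A minimal sketch of opening an index as a DataFrame. The cluster URL `http://localhost:9200`, the index name `flights` and the helper `open_frame` are hypothetical. `es_index_pattern` and `columns` are keyword arguments of the `eland.DataFrame` constructor. eland is imported inside the function so the module loads without it.

```python
from typing import List, Optional

ES_URL = "http://localhost:9200"  # hypothetical cluster address
INDEX = "flights"                 # hypothetical index name


def open_frame(columns: Optional[List[str]] = None):
    """Return an eland.DataFrame over INDEX, optionally limited to `columns`."""
    import eland as ed  # deferred import: only needed when a frame is created

    # es_client accepts a URL string (or an Elasticsearch client instance);
    # columns=None exposes every mapped field of the index as a column.
    return ed.DataFrame(ES_URL, es_index_pattern=INDEX, columns=columns)


# Usage (requires a running cluster):
# df = open_frame(["Carrier", "AvgTicketPrice"])
# print(df.shape)
```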

Attributes and Underlying Data

`DataFrame.index`	Return the eland index, which references the Elasticsearch field used to index a DataFrame/Series.
`DataFrame.columns`	The column labels of the DataFrame.
`DataFrame.dtypes`	Return the pandas dtypes in the DataFrame.
`DataFrame.select_dtypes`([include, exclude])	Return a subset of the DataFrame’s columns based on the column dtypes.
`DataFrame.values`	Not implemented.
`DataFrame.empty`	Determines if the DataFrame is empty.
`DataFrame.shape`	Return a tuple representing the dimensionality of the DataFrame.
`DataFrame.ndim`	Return 2, by definition of a DataFrame.
`DataFrame.size`	Return an int representing the number of elements in this object.

Indexing, Iteration

`DataFrame.head`(n)	Return the first n rows.
`DataFrame.keys`()	Return the columns.
`DataFrame.tail`(n)	Return the last n rows.
`DataFrame.get`(key, default)	Get item from object for given key (ex: DataFrame column).
`DataFrame.query`(expr)	Query the columns of a DataFrame with a boolean expression.
`DataFrame.sample`(n, frac, random_state)	Return n randomly sampled rows or the specified fraction of rows.
`DataFrame.iterrows`()	Iterate over eland.DataFrame rows as (index, pandas.Series) pairs.
`DataFrame.itertuples`(index, name)	Iterate over eland.DataFrame rows as namedtuples.
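A short sketch combining `query`, `head` and `itertuples`. The column names `AvgTicketPrice` and `Carrier` and the helper `expensive_carriers` are hypothetical. `df` is assumed to be an `eland.DataFrame`.

```python
from typing import List


def expensive_carriers(df, min_price: float, n: int = 10) -> List[str]:
    """Return the Carrier of up to n rows whose AvgTicketPrice exceeds min_price."""
    # query() takes a pandas-style boolean expression string; min_price is
    # coerced to float before being formatted into the expression.
    subset = df.query(f"AvgTicketPrice > {float(min_price)}")
    # head(n) bounds the rows before iterating; itertuples() yields
    # namedtuples whose fields are the column labels.
    return [row.Carrier for row in subset.head(n).itertuples()]
```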

Function Application, GroupBy & Window

Note

Elasticsearch cardinality (count) aggregations are accurate approximations computed with the HyperLogLog++ algorithm, so results may not be exact.
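To illustrate why cardinality results can differ from the exact count, here is a simplified, classic HyperLogLog estimator in pure Python. It is an illustration only: Elasticsearch implements HyperLogLog++, a refined variant of this basic scheme. The function name `hll_estimate` and the `precision` (register bits) default are choices made for this sketch.

```python
import hashlib
import math
from typing import Iterable


def hll_estimate(values: Iterable[object], precision: int = 10) -> float:
    """Estimate the number of distinct items with classic HyperLogLog.

    Uses m = 2**precision registers (precision >= 7 for the alpha constant below);
    the standard error is roughly 1.04 / sqrt(m).
    """
    m = 1 << precision
    registers = [0] * m
    width = 64 - precision
    for v in values:
        # 64-bit hash from the first 8 bytes of SHA-1 of the value's string form.
        h = int.from_bytes(hashlib.sha1(str(v).encode("utf-8")).digest()[:8], "big")
        idx = h >> width                      # top `precision` bits pick a register
        rest = h & ((1 << width) - 1)         # remaining bits
        rank = width - rest.bit_length() + 1  # leading zeros in `rest`, plus one
        if rank > registers[idx]:
            registers[idx] = rank

    alpha = 0.7213 / (1 + 1.079 / m)  # bias constant for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        # Small-range correction: fall back to linear counting.
        return m * math.log(m / zeros)
    return raw
```

Comparing `hll_estimate(data)` with `len(set(data))` on the same input shows an estimate close to, but usually not equal to, the exact count; duplicates do not inflate it.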

`DataFrame.agg`(func, axis, …)	Aggregate using one or more operations over the specified axis.
`DataFrame.aggregate`(func, axis, …)	Aggregate using one or more operations over the specified axis.
`DataFrame.groupby`(by, …)	Used to perform groupby operations.

`DataFrameGroupBy`(by, query_compiler, dropna)	This holds all the groupby methods for `eland.DataFrame.groupby()`
`DataFrameGroupBy.agg`(func, …)	Used to groupby and aggregate.
`DataFrameGroupBy.aggregate`(func, …)	Used to groupby and aggregate.
`DataFrameGroupBy.count`()	Compute the count value for each group.
`DataFrameGroupBy.mad`(numeric_only)	Compute the median absolute deviation value for each group.
`DataFrameGroupBy.max`(numeric_only)	Compute the max value for each group.
`DataFrameGroupBy.mean`(numeric_only)	Compute the mean value for each group.
`DataFrameGroupBy.median`(numeric_only)	Compute the median value for each group.
`DataFrameGroupBy.min`(numeric_only)	Compute the min value for each group.
`DataFrameGroupBy.nunique`()	Compute the nunique value for each group.
`DataFrameGroupBy.std`(numeric_only)	Compute the standard deviation value for each group.
`DataFrameGroupBy.sum`(numeric_only)	Compute the sum value for each group.
`DataFrameGroupBy.var`(numeric_only)	Compute the variance value for each group.
`DataFrameGroupBy.quantile`(q, …)	Used to groupby and calculate quantiles for a given DataFrame.
`GroupBy`(by, query_compiler, dropna)	Base class for calls to `eland.DataFrame.groupby()`
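A sketch of a grouped aggregation using `groupby(...).agg(...)` with a list of function names, as in the signatures above. The field names `Carrier` and `AvgTicketPrice` and the helper `price_stats_by_carrier` are hypothetical. `df` is assumed to be an `eland.DataFrame`.

```python
def price_stats_by_carrier(df):
    """Min, max and mean AvgTicketPrice per Carrier for an eland.DataFrame `df`."""
    # Select only the grouping key and the value column, then group and
    # aggregate; eland computes the aggregations in Elasticsearch.
    return df[["Carrier", "AvgTicketPrice"]].groupby("Carrier").agg(["min", "max", "mean"])
```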

Computations / Descriptive Stats

`DataFrame.count`()	Count non-NA cells for each column.
`DataFrame.describe`()	Generate descriptive statistics that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.
`DataFrame.info`([verbose, buf, max_cols, …])	Print a concise summary of a DataFrame.
`DataFrame.max`(numeric_only)	Return the maximum value for each numeric column.
`DataFrame.mean`(numeric_only)	Return the mean value for each numeric column.
`DataFrame.min`(numeric_only)	Return the minimum value for each numeric column.
`DataFrame.median`(numeric_only)	Return the median value for each numeric column.
`DataFrame.mad`(numeric_only)	Return the median absolute deviation for each numeric column.
`DataFrame.std`(numeric_only)	Return the standard deviation for each numeric column.
`DataFrame.var`(numeric_only)	Return the variance for each numeric column.
`DataFrame.sum`(numeric_only)	Return the sum for each numeric column.
`DataFrame.nunique`()	Return the cardinality of each field.
`DataFrame.mode`(numeric_only, dropna, es_size)	Calculate the mode of a DataFrame.
`DataFrame.quantile`(q, …)	Used to calculate quantiles for a given DataFrame.
`DataFrame.idxmax`(axis)	Return index of first occurrence of maximum over requested axis.
`DataFrame.idxmin`(axis)	Return index of first occurrence of minimum over requested axis.
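A sketch that gathers several of these statistics in one place. The helper `numeric_summary` is hypothetical and `df` is assumed to be an `eland.DataFrame`. Because `nunique` relies on a cardinality aggregation, its values may be approximate.

```python
def numeric_summary(df):
    """Collect a few per-column statistics from an eland.DataFrame `df`."""
    return {
        "mean": df.mean(numeric_only=True),
        "median": df.median(numeric_only=True),
        # A list of quantiles returns one row per requested quantile.
        "quartiles": df.quantile([0.25, 0.5, 0.75]),
        # Backed by a cardinality aggregation, so counts may be approximate.
        "distinct": df.nunique(),
    }
```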

Reindexing / Selection / Label Manipulation

`DataFrame.drop`([labels, axis, index, …])	Return new object with labels in requested axis removed.
`DataFrame.filter`(items, like, regex, axis, …)	Subset the DataFrame rows or columns according to the specified index labels.

Plotting

`DataFrame.hist`([column, by, grid, …])	Make a histogram of the DataFrame’s columns.

Elasticsearch Functions

`DataFrame.es_info`()	A debug summary of an eland DataFrame’s internals.
`DataFrame.es_match`(text, *, columns, …)	Filters data with an Elasticsearch `match`, `match_phrase`, or `multi_match` query depending on the given parameters and columns.
`DataFrame.es_query`(query)	Applies an Elasticsearch DSL query to the current DataFrame.
`DataFrame.es_dtypes`	Return the Elasticsearch dtypes in the index.
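`es_query` takes a query written in the Elasticsearch query DSL as a Python dict. A minimal sketch that builds a `bool` query with a `term` and a `range` filter and applies it. The field names `Carrier` and `AvgTicketPrice` and both helper names are hypothetical, and the `term` clause assumes `Carrier` is mapped as a `keyword` field.

```python
from typing import Any, Dict


def carrier_price_query(carrier: str, min_price: float) -> Dict[str, Any]:
    """Build a bool/filter query: exact Carrier match and AvgTicketPrice >= min_price."""
    return {
        "bool": {
            "filter": [
                {"term": {"Carrier": carrier}},  # exact match; assumes a keyword field
                {"range": {"AvgTicketPrice": {"gte": min_price}}},
            ]
        }
    }


def filter_frame(df, carrier: str, min_price: float):
    """Apply the query to an eland.DataFrame; the filtering runs in Elasticsearch."""
    return df.es_query(carrier_price_query(carrier, min_price))
```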

Serialization / IO / Conversion

`DataFrame.info`([verbose, buf, max_cols, …])	Print a concise summary of a DataFrame.
`DataFrame.to_numpy`()	Not implemented.
`DataFrame.to_csv`([path_or_buf, sep, na_rep, …])	Write Elasticsearch data to a comma-separated values (csv) file.
`DataFrame.to_html`([buf, columns, col_space, …])	Render Elasticsearch data as an HTML table.
`DataFrame.to_string`([buf, columns, …])	Render a DataFrame to a console-friendly tabular output.
`DataFrame.to_pandas`(show_progress)	Utility method to convert an eland.DataFrame to a pandas.DataFrame.
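`to_pandas` loads every row of the DataFrame into local memory, so it is often preceded by column selection and a row bound. A minimal sketch; the helper `sample_to_pandas` and its default `limit` are hypothetical, and `df` is assumed to be an `eland.DataFrame`.

```python
from typing import List


def sample_to_pandas(df, columns: List[str], limit: int = 1000):
    """Return at most `limit` rows of `columns` from eland.DataFrame `df` as pandas."""
    # Narrow the columns and bound the rows on the eland side first, so that
    # to_pandas() should only need to retrieve the selected fields and rows.
    return df[columns].head(limit).to_pandas()
```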
