DataFrame

Constructor

DataFrame(es_client, List[str], Tuple[str, …)

Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns) referencing data stored in Elasticsearch indices.
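A minimal usage sketch of the constructor follows. It assumes a reachable Elasticsearch cluster and that the index keyword is named es_index_pattern, which the truncated signature above does not show. The helper name preview_index and its arguments are hypothetical.

```python
def preview_index(es_url, index_pattern, n=5):
    """Return the first n rows of an Elasticsearch index as a pandas.DataFrame."""
    # Imported lazily so this snippet loads even where eland is not installed.
    import eland as ed

    # Assumption: the second constructor argument is named es_index_pattern.
    df = ed.DataFrame(es_client=es_url, es_index_pattern=index_pattern)

    # head(n) restricts the rows on the eland side; to_pandas() then
    # materializes that selection as an in-memory pandas.DataFrame.
    return df.head(n).to_pandas()
```

A call might look like preview_index("http://localhost:9200", "flights"), where both the URL and the index name are placeholders.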

Attributes and Underlying Data

DataFrame.index

Return the eland index referencing the Elasticsearch field used to index a DataFrame/Series.

DataFrame.columns

The column labels of the DataFrame.

DataFrame.dtypes

Return the pandas dtypes in the DataFrame.

DataFrame.select_dtypes([include, exclude])

Return a subset of the DataFrame’s columns based on the column dtypes.

DataFrame.values

Not implemented.

DataFrame.empty

Determines if the DataFrame is empty.

DataFrame.shape

Return a tuple representing the dimensionality of the DataFrame.

DataFrame.ndim

Returns 2 by definition of a DataFrame

DataFrame.size

Return an int representing the number of elements in this object.

Indexing, Iteration

DataFrame.head(n)

Return the first n rows.

DataFrame.keys()

Return columns

DataFrame.tail(n)

Return the last n rows.

DataFrame.get(key, default)

Get item from object for given key (ex: DataFrame column).

DataFrame.query(expr)

Query the columns of a DataFrame with a boolean expression.

DataFrame.sample(n, frac, random_state)

Return n randomly sampled rows or the specified fraction of rows.

DataFrame.iterrows()

Iterate over eland.DataFrame rows as (index, pandas.Series) pairs.

DataFrame.itertuples(index, name)

Iterate over eland.DataFrame rows as namedtuples.

Function Application, GroupBy & Window

Note

Elasticsearch aggregations that use cardinality (distinct counts) are computed with the HyperLogLog++ algorithm. The results are close approximations and may not be exact.
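To illustrate why such counts are approximate, here is a simplified, standard-library sketch of plain HyperLogLog. It is not the HyperLogLog++ variant Elasticsearch uses and not eland code; the function name hll_estimate and the precision parameter p are illustrative assumptions.

```python
import hashlib
import math


def hll_estimate(values, p=10):
    """Estimate the number of distinct values using 2**p registers."""
    m = 1 << p
    registers = [0] * m
    for value in values:
        # Derive a 64-bit hash from the value's string form.
        digest = hashlib.sha1(str(value).encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        index = h >> (64 - p)                # first p bits select a register
        rest = h & ((1 << (64 - p)) - 1)     # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - p) - rest.bit_length() + 1
        registers[index] = max(registers[index], rank)

    alpha = 0.7213 / (1 + 1.079 / m)         # bias correction, valid for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)

    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        # Small-range correction: fall back to linear counting.
        return m * math.log(m / zeros)
    return raw


true_count = 10_000
estimate = hll_estimate(range(true_count))
print(f"true={true_count} estimate={estimate:.0f}")
```

With 1024 registers the expected relative error is roughly 3%, so the estimate lands near the true count without matching it exactly.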

DataFrame.agg(func, List[str]], axis, …)

Aggregate using one or more operations over the specified axis.

DataFrame.aggregate(func, List[str]], axis, …)

Aggregate using one or more operations over the specified axis.

DataFrame.groupby(by, List[str], …)

Used to perform groupby operations

DataFrameGroupBy(by, query_compiler, dropna)

This holds all the groupby methods for eland.DataFrame.groupby()

DataFrameGroupBy.agg(func, List[str]], …)

Used to groupby and aggregate

DataFrameGroupBy.aggregate(func, List[str]], …)

Used to groupby and aggregate

DataFrameGroupBy.count()

Compute the count value for each group.

DataFrameGroupBy.mad(numeric_only)

Compute the median absolute deviation value for each group.

DataFrameGroupBy.max(numeric_only)

Compute the max value for each group.

DataFrameGroupBy.mean(numeric_only)

Compute the mean value for each group.

DataFrameGroupBy.median(numeric_only)

Compute the median value for each group.

DataFrameGroupBy.min(numeric_only)

Compute the min value for each group.

DataFrameGroupBy.nunique()

Compute the number of unique values for each group.

DataFrameGroupBy.std(numeric_only)

Compute the standard deviation value for each group.

DataFrameGroupBy.sum(numeric_only)

Compute the sum value for each group.

DataFrameGroupBy.var(numeric_only)

Compute the variance value for each group.

DataFrameGroupBy.quantile(q, float, …)

Used to groupby and calculate quantile for a given DataFrame.

GroupBy(by, query_compiler, dropna)

Base class for calls to eland.DataFrame.groupby()
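As a rough illustration of what a per-group aggregation such as DataFrameGroupBy.mad returns, the following pure-Python sketch groups rows by a key and computes the median absolute deviation within each group. It is not how eland evaluates the operation (that work is delegated to Elasticsearch, which may approximate the result); the function name group_mad and the sample rows are made up for the example.

```python
from collections import defaultdict
from statistics import median


def group_mad(rows, by, column):
    """Median absolute deviation of `column` within each group of `by`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row[column])

    result = {}
    for key, values in groups.items():
        center = median(values)
        # Median of the absolute distances from the group's median.
        result[key] = median(abs(v - center) for v in values)
    return result


# Hypothetical sample data.
rows = [
    {"carrier": "A", "delay": 1},
    {"carrier": "A", "delay": 2},
    {"carrier": "A", "delay": 9},
    {"carrier": "B", "delay": 4},
    {"carrier": "B", "delay": 4},
]
print(group_mad(rows, "carrier", "delay"))
```

The other group-wise statistics (max, mean, median, min, std, sum, var) follow the same pattern with a different reduction applied to each group's values.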

Computations / Descriptive Stats

DataFrame.count()

Count non-NA cells for each column.

DataFrame.describe()

Generate descriptive statistics that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.

DataFrame.info([verbose, buf, max_cols, …])

Print a concise summary of a DataFrame.

DataFrame.max(numeric_only)

Return the maximum value for each numeric column

DataFrame.mean(numeric_only)

Return mean value for each numeric column

DataFrame.min(numeric_only)

Return the minimum value for each numeric column

DataFrame.median(numeric_only)

Return the median value for each numeric column

DataFrame.mad(numeric_only)

Return median absolute deviation for each numeric column

DataFrame.std(numeric_only)

Return standard deviation for each numeric column

DataFrame.var(numeric_only)

Return variance for each numeric column

DataFrame.sum(numeric_only)

Return sum for each numeric column

DataFrame.nunique()

Return cardinality of each field.

DataFrame.mode(numeric_only, dropna, es_size)

Calculate mode of a DataFrame

DataFrame.quantile(q, float, List[int], …)

Used to calculate quantile for a given DataFrame.

DataFrame.idxmax(axis)

Return index of first occurrence of maximum over requested axis.

DataFrame.idxmin(axis)

Return index of first occurrence of minimum over requested axis.

Reindexing / Selection / Label Manipulation

DataFrame.drop([labels, axis, index, …])

Return new object with labels in requested axis removed.

DataFrame.filter(items, like, regex, axis, …)

Subset the dataframe rows or columns according to the specified index labels.

Plotting

DataFrame.hist([column, by, grid, …])

Make a histogram of the DataFrame’s columns.

Elasticsearch Functions

DataFrame.es_info()

A debug summary of an eland DataFrame’s internals.

DataFrame.es_match(text, *, columns, …)

Filters data with an Elasticsearch match, match_phrase, or multi_match query depending on the given parameters and columns.

DataFrame.es_query(query)

Applies an Elasticsearch DSL query to the current DataFrame.

DataFrame.es_dtypes

Return the Elasticsearch dtypes in the index

Serialization / IO / Conversion

DataFrame.info([verbose, buf, max_cols, …])

Print a concise summary of a DataFrame.

DataFrame.to_numpy()

Not implemented.

DataFrame.to_csv([path_or_buf, sep, na_rep, …])

Write Elasticsearch data to a comma-separated values (csv) file.

DataFrame.to_html([buf, columns, col_space, …])

Render Elasticsearch data as an HTML table.

DataFrame.to_string([buf, columns, …])

Render a DataFrame to a console-friendly tabular output.

DataFrame.to_pandas(show_progress)

Utility method to convert eland.DataFrame to pandas.DataFrame