DataFrame

Constructor

DataFrame([es_client, es_index_pattern, ...])

Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns) referencing data stored in Elasticsearch indices.

Attributes and Underlying Data

DataFrame.index

Return the eland index, which references the Elasticsearch field used to index a DataFrame/Series.

DataFrame.columns

The column labels of the DataFrame.

DataFrame.dtypes

Return the pandas dtypes in the DataFrame.

DataFrame.select_dtypes([include, exclude])

Return a subset of the DataFrame's columns based on the column dtypes.
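
A minimal pure-Python sketch of the include/exclude semantics, assuming the dtypes are given as a mapping of column name to dtype string (as DataFrame.dtypes reports them). `select_columns_by_dtype` is a hypothetical helper, not part of eland, and unlike pandas it matches exact dtype strings only, not generic kinds such as "number".

```python
from typing import Dict, Iterable, List, Optional


def select_columns_by_dtype(
    dtypes: Dict[str, str],
    include: Optional[Iterable[str]] = None,
    exclude: Optional[Iterable[str]] = None,
) -> List[str]:
    """Hypothetical helper: keep columns whose dtype passes include/exclude."""
    if include is None and exclude is None:
        raise ValueError("at least one of include or exclude must be given")
    include_set = set(include) if include is not None else None
    exclude_set = set(exclude) if exclude is not None else set()
    # A dtype cannot be both requested and rejected.
    overlap = (include_set or set()) & exclude_set
    if overlap:
        raise ValueError(f"include and exclude overlap on {sorted(overlap)}")
    return [
        column
        for column, dtype in dtypes.items()
        if (include_set is None or dtype in include_set) and dtype not in exclude_set
    ]


# Illustrative column -> dtype mapping (assumed, not read from a real index).
example_dtypes = {
    "AvgTicketPrice": "float64",
    "Carrier": "object",
    "Cancelled": "bool",
    "FlightDelayMin": "int64",
}
print(select_columns_by_dtype(example_dtypes, include=["float64", "int64"]))
# ['AvgTicketPrice', 'FlightDelayMin']
```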

DataFrame.values

Not implemented.

DataFrame.empty

Determines if the DataFrame is empty.

DataFrame.shape

Return a tuple representing the dimensionality of the DataFrame.

DataFrame.ndim

Return 2, since a DataFrame is two-dimensional by definition.

DataFrame.size

Return an int representing the number of elements in this object.

Indexing, Iteration

DataFrame.head([n])

Return the first n rows.

DataFrame.keys()

Return the column labels (equivalent to DataFrame.columns).

DataFrame.tail([n])

Return the last n rows.

DataFrame.get(key[, default])

Get item from object for given key (ex: DataFrame column).

DataFrame.query(expr)

Query the columns of a DataFrame with a boolean expression.
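
Eland evaluates query filters in Elasticsearch rather than locally. As a rough illustration only, the hypothetical translator below turns simple `column op number` comparisons joined by `and` into Elasticsearch `range`, `term`, and `bool` clauses. It is not eland's parser and supports far less syntax than DataFrame.query.

```python
import re
from typing import Any, Dict

# Comparison operators that map onto an Elasticsearch range query.
_RANGE_OPS = {">": "gt", ">=": "gte", "<": "lt", "<=": "lte"}
# "field op number"; two-character operators are tried before one-character ones.
_COMPARISON = re.compile(r"^\s*(\w+)\s*(>=|<=|==|!=|>|<)\s*(-?\d+(?:\.\d+)?)\s*$")


def comparison_to_dsl(expr: str) -> Dict[str, Any]:
    """Hypothetical: translate one numeric comparison into a DSL clause."""
    match = _COMPARISON.match(expr)
    if match is None:
        raise ValueError(f"unsupported expression: {expr!r}")
    field, op, raw = match.groups()
    value = float(raw) if "." in raw else int(raw)
    if op in _RANGE_OPS:
        return {"range": {field: {_RANGE_OPS[op]: value}}}
    term = {"term": {field: value}}
    if op == "==":
        return term
    return {"bool": {"must_not": [term]}}  # op == "!="


def query_to_dsl(expr: str) -> Dict[str, Any]:
    """Hypothetical: AND together comparisons as non-scoring filter clauses."""
    clauses = [comparison_to_dsl(part) for part in re.split(r"\s+and\s+", expr)]
    return clauses[0] if len(clauses) == 1 else {"bool": {"filter": clauses}}


print(query_to_dsl("AvgTicketPrice > 900 and FlightDelayMin <= 30"))
```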

DataFrame.sample([n, frac, random_state])

Return n randomly sampled rows, or the specified fraction of rows.
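
One way to draw a random sample server-side is an Elasticsearch `function_score` query with a `random_score` function. The sketch below assumes that approach (eland's own implementation may differ) and uses two hypothetical helpers: one resolves `n` or `frac` into a row count by rounding `frac * total` as pandas does, the other builds the search body. Passing `random_state` as the seed, together with a field, makes the ordering reproducible.

```python
from typing import Any, Dict, Optional


def resolve_sample_size(total: int, n: Optional[int] = None, frac: Optional[float] = None) -> int:
    """Hypothetical: turn exactly one of n or frac into a row count."""
    if (n is None) == (frac is None):
        raise ValueError("specify exactly one of n or frac")
    if frac is not None:
        if not 0 <= frac <= 1:
            raise ValueError("frac must be between 0 and 1 when sampling without replacement")
        return round(frac * total)
    if n < 0:
        raise ValueError("n must be non-negative")
    return n


def random_sample_body(size: int, random_state: Optional[int] = None) -> Dict[str, Any]:
    """Hypothetical: score every document randomly and keep the top `size` hits."""
    random_score: Dict[str, Any] = {}
    if random_state is not None:
        # A seeded random_score needs a field to be reproducible; _seq_no is a common choice.
        random_score = {"seed": random_state, "field": "_seq_no"}
    return {
        "size": size,
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "random_score": random_score,
            }
        },
    }


print(random_sample_body(resolve_sample_size(1000, frac=0.25), random_state=42))
```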

DataFrame.iterrows()

Iterate over eland.DataFrame rows as (index, pandas.Series) pairs.

DataFrame.itertuples([index, name])

Iterate over eland.DataFrame rows as namedtuples.

Function Application, GroupBy & Window

Note

Elasticsearch aggregations using cardinality (count) are close approximations computed with the HyperLogLog++ algorithm, so results may not be exact.
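
To see why such counts are estimates, here is a minimal sketch of a simplified HyperLogLog estimator. It omits HyperLogLog++'s empirical bias correction and sparse representation, so it illustrates the idea rather than Elasticsearch's implementation: each item is hashed, the first `p` bits pick one of `2**p` registers, and each register remembers the longest run of leading zeros it has seen.

```python
import hashlib
import math
from typing import Iterable


def hyperloglog_estimate(items: Iterable[str], p: int = 10) -> float:
    """Estimate the number of distinct items using 2**p registers (p >= 7)."""
    m = 1 << p
    registers = [0] * m
    for item in items:
        # 64-bit hash taken from the first 8 bytes of SHA-1.
        h = int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")
        index = h >> (64 - p)  # first p bits pick a register
        rest = h & ((1 << (64 - p)) - 1)  # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in `rest`.
        rank = (64 - p) - rest.bit_length() + 1
        registers[index] = max(registers[index], rank)
    alpha = 0.7213 / (1 + 1.079 / m)  # bias constant, valid for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        # Small-range correction: linear counting on the empty registers.
        return m * math.log(m / zeros)
    return raw


users = [f"user-{i}" for i in range(10_000)]
print(round(hyperloglog_estimate(users)))  # close to, but usually not exactly, 10000
```

Duplicates never change the registers, which is what makes the sketch a distinct count. With `p=10` the standard error is about 1.04 / sqrt(1024), roughly 3%. Elasticsearch's `cardinality` aggregation exposes a `precision_threshold` option that trades memory for accuracy; counts below the threshold are expected to be close to exact.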

DataFrame.agg(func[, axis, numeric_only])

Aggregate using one or more operations over the specified axis.

DataFrame.aggregate(func[, axis, numeric_only])

Aggregate using one or more operations over the specified axis.

DataFrame.groupby([by, dropna])

Group the DataFrame by one or more columns to perform groupby operations.

DataFrameGroupBy(by, query_compiler[, dropna])

Holds all the groupby methods for eland.DataFrame.groupby().

DataFrameGroupBy.agg(func[, numeric_only])

Aggregate each group using one or more operations.

DataFrameGroupBy.aggregate(func[, numeric_only])

Aggregate each group using one or more operations.
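
Grouped aggregations can be expressed in Elasticsearch as a `composite` aggregation with one `terms` source per group-by column and a metric sub-aggregation per bucket. The hypothetical builder below sketches a request for something like a grouped mean of one column; it assumes this composite approach rather than reproducing eland's exact request. The `dropna=False` case maps to the composite source's `missing_bucket` option, and a full client would page through results using the response's `after_key`.

```python
from typing import Any, Dict, List


def groupby_agg_body(
    by: List[str],
    field: str,
    metric: str = "avg",
    dropna: bool = True,
    page_size: int = 1000,
) -> Dict[str, Any]:
    """Hypothetical builder: one composite bucket per group, one metric per bucket."""
    sources = []
    for column in by:
        terms: Dict[str, Any] = {"field": column}
        if not dropna:
            # Keep documents with no value for this column as a null-keyed group.
            terms["missing_bucket"] = True
        sources.append({column: {"terms": terms}})
    return {
        "size": 0,  # only aggregation results are needed, not hits
        "aggs": {
            "groups": {
                "composite": {"size": page_size, "sources": sources},
                "aggs": {f"{metric}_{field}": {metric: {"field": field}}},
            }
        },
    }


print(groupby_agg_body(["DestCountry"], "AvgTicketPrice"))
```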

DataFrameGroupBy.count()

Compute the count value for each group.

DataFrameGroupBy.mad([numeric_only])

Compute the median absolute deviation value for each group.

DataFrameGroupBy.max([numeric_only])

Compute the max value for each group.

DataFrameGroupBy.mean([numeric_only])

Compute the mean value for each group.

DataFrameGroupBy.median([numeric_only])

Compute the median value for each group.

DataFrameGroupBy.min([numeric_only])

Compute the min value for each group.

DataFrameGroupBy.nunique()

Compute the number of unique values for each group.

DataFrameGroupBy.std([numeric_only])

Compute the standard deviation value for each group.

DataFrameGroupBy.sum([numeric_only])

Compute the sum value for each group.

DataFrameGroupBy.var([numeric_only])

Compute the variance value for each group.

DataFrameGroupBy.quantile([q])

Group the DataFrame and calculate quantiles for each group.

GroupBy(by, query_compiler[, dropna])

Base class for calls to eland.DataFrame.groupby()

Computations / Descriptive Stats

DataFrame.count()

Count non-NA cells for each column.

DataFrame.describe()

Generate descriptive statistics that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.

DataFrame.info([verbose, buf, max_cols, ...])

Print a concise summary of a DataFrame.

DataFrame.max([numeric_only])

Return the maximum value for each numeric column.

DataFrame.mean([numeric_only])

Return the mean value for each numeric column.

DataFrame.min([numeric_only])

Return the minimum value for each numeric column.

DataFrame.median([numeric_only])

Return the median value for each numeric column.

DataFrame.mad([numeric_only])

Return the median absolute deviation for each numeric column.
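
For reference, the median absolute deviation is the median of each value's absolute distance from the median. It differs from pandas' former `DataFrame.mad`, which computed the mean absolute deviation. The exact stdlib computation below is a sketch for checking small examples; assuming the value comes from Elasticsearch's `median_absolute_deviation` aggregation, which approximates the statistic, results on large indices may differ slightly.

```python
import statistics
from typing import Sequence


def median_absolute_deviation(values: Sequence[float]) -> float:
    """Exact MAD: median(|x - median(x)|)."""
    center = statistics.median(values)
    return statistics.median(abs(value - center) for value in values)


# median = 2; absolute deviations 1, 1, 0, 0, 2, 4, 7 have median 1
print(median_absolute_deviation([1, 1, 2, 2, 4, 6, 9]))  # 1
```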

DataFrame.std([numeric_only])

Return the standard deviation for each numeric column.

DataFrame.var([numeric_only])

Return the variance for each numeric column.

DataFrame.sum([numeric_only])

Return the sum for each numeric column.

DataFrame.nunique()

Return cardinality of each field.

DataFrame.mode([numeric_only, dropna, es_size])

Calculate the mode(s) of a DataFrame.

DataFrame.quantile([q, numeric_only])

Calculate quantiles for a given DataFrame.
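
Quantiles in Elasticsearch are usually computed with the `percentiles` aggregation, which takes percents (0 to 100) rather than fractions and returns approximate values. The hypothetical builder below converts pandas-style `q` values into such a request, assuming one `percentiles` aggregation per column; eland's actual request may be structured differently.

```python
from typing import Any, Dict, List, Sequence, Union


def quantile_body(fields: List[str], q: Union[float, Sequence[float]] = 0.5) -> Dict[str, Any]:
    """Hypothetical: build a percentiles request for each numeric field."""
    qs = [q] if isinstance(q, (int, float)) else list(q)
    if any(not 0 <= value <= 1 for value in qs):
        raise ValueError("quantiles must be between 0 and 1")
    percents = [value * 100 for value in qs]  # q=0.25 -> 25th percentile
    return {
        "size": 0,
        "aggs": {
            f"{field}_percentiles": {"percentiles": {"field": field, "percents": percents}}
            for field in fields
        },
    }


print(quantile_body(["AvgTicketPrice"], [0.25, 0.75]))
```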

DataFrame.idxmax([axis])

Return index of first occurrence of maximum over requested axis.

DataFrame.idxmin([axis])

Return index of first occurrence of minimum over requested axis.

Reindexing / Selection / Label Manipulation

DataFrame.drop([labels, axis, index, ...])

Return new object with labels in requested axis removed.

DataFrame.filter([items, like, regex, axis])

Subset the DataFrame rows or columns according to the specified index labels.
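
The three selection modes follow pandas semantics: `items` keeps the listed labels that exist, `like` keeps labels containing a substring, and `regex` keeps labels matching a pattern; exactly one of them may be given. A minimal sketch over a plain list of column labels, with `filter_labels` as a hypothetical helper rather than eland code:

```python
import re
from typing import Iterable, List, Optional


def filter_labels(
    labels: List[str],
    items: Optional[Iterable[str]] = None,
    like: Optional[str] = None,
    regex: Optional[str] = None,
) -> List[str]:
    """Hypothetical: select labels the way DataFrame.filter selects columns."""
    if sum(arg is not None for arg in (items, like, regex)) != 1:
        raise TypeError("pass exactly one of items, like, or regex")
    if items is not None:
        present = set(labels)
        # Keep the order in which items were requested, skipping unknown labels.
        return [item for item in items if item in present]
    if like is not None:
        return [label for label in labels if like in label]
    pattern = re.compile(regex)
    return [label for label in labels if pattern.search(label)]


columns = ["AvgTicketPrice", "Carrier", "DestCity", "DestCountry"]
print(filter_labels(columns, like="Dest"))  # ['DestCity', 'DestCountry']
```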

Plotting

DataFrame.hist([column, by, grid, ...])

Make a histogram of the DataFrame's columns.

Elasticsearch Functions

DataFrame.es_info()

Return a debug summary of an eland DataFrame's internals.

DataFrame.es_match(text, *[, columns, ...])

Filters data with an Elasticsearch match, match_phrase, or multi_match query depending on the given parameters and columns.
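
The hypothetical helper below sketches how the query type could depend on the columns: a single column yields a `match` (or `match_phrase`) query, several columns a `multi_match` query whose `type` switches between `best_fields` and `phrase`. The helper name and its `phrase` flag are illustrative assumptions, not eland's signature or implementation.

```python
from typing import Any, Dict, Sequence


def build_match_query(text: str, columns: Sequence[str], phrase: bool = False) -> Dict[str, Any]:
    """Hypothetical: pick match, match_phrase, or multi_match from the columns."""
    if not columns:
        raise ValueError("at least one column is required")
    if len(columns) == 1:
        kind = "match_phrase" if phrase else "match"
        return {kind: {columns[0]: {"query": text}}}
    return {
        "multi_match": {
            "query": text,
            "fields": list(columns),
            "type": "phrase" if phrase else "best_fields",
        }
    }


print(build_match_query("rome", ["DestCity", "OriginCity"]))
```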

DataFrame.es_query(query)

Applies an Elasticsearch DSL query to the current DataFrame.
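
The query argument is an Elasticsearch DSL query expressed as a Python dict. The sketch below builds a `bool` query whose `filter` clauses combine a `term` and a `range` condition. The field names come from Kibana's sample flight data and are assumptions, `all_of` is a hypothetical helper, and the `df.es_query(...)` call appears only as a comment because it needs a live cluster.

```python
import json
from typing import Any, Dict


def all_of(*clauses: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical: require every clause to match (filter context, no scoring)."""
    return {"bool": {"filter": list(clauses)}}


query = all_of(
    {"term": {"Carrier": "Kibana Airlines"}},
    {"range": {"AvgTicketPrice": {"gte": 500, "lt": 1000}}},
)
# With a connected eland DataFrame `df` (assumed): filtered = df.es_query(query)
print(json.dumps(query, indent=2))
```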

DataFrame.es_dtypes

Return the Elasticsearch field types in the index.
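
Each Elasticsearch field type corresponds to a pandas dtype reported by DataFrame.dtypes. The table below is an illustrative subset assumed for this sketch, with unknown types falling back to `object`; eland's own mapping covers more types and may differ in detail.

```python
from typing import Dict

# Assumed subset of Elasticsearch field type -> pandas dtype.
ES_TO_PANDAS: Dict[str, str] = {
    "text": "object",
    "keyword": "object",
    "long": "int64",
    "integer": "int64",
    "short": "int64",
    "byte": "int64",
    "double": "float64",
    "float": "float64",
    "half_float": "float64",
    "scaled_float": "float64",
    "date": "datetime64[ns]",
    "date_nanos": "datetime64[ns]",
    "boolean": "bool",
}


def to_pandas_dtype(es_dtype: str) -> str:
    """Hypothetical: look up the pandas dtype, defaulting to object."""
    return ES_TO_PANDAS.get(es_dtype, "object")


print({field: to_pandas_dtype(t) for field, t in {"Carrier": "keyword", "FlightDelayMin": "integer"}.items()})
```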

Serialization / IO / Conversion

DataFrame.info([verbose, buf, max_cols, ...])

Print a concise summary of a DataFrame.

DataFrame.to_numpy()

Not implemented.

DataFrame.to_csv([path_or_buf, sep, na_rep, ...])

Write Elasticsearch data to a comma-separated values (csv) file.

DataFrame.to_html([buf, columns, col_space, ...])

Render Elasticsearch data as an HTML table.

DataFrame.to_string([buf, columns, ...])

Render a DataFrame to a console-friendly tabular output.

DataFrame.to_pandas([show_progress])

Convert an eland.DataFrame to a pandas.DataFrame.