DataFrame

Constructor

DataFrame([es_client, es_index_pattern, …])

Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns) referencing data stored in Elasticsearch indices.

Attributes and underlying data

DataFrame.index

Return the eland index, which references the Elasticsearch field used to index a DataFrame or Series.

DataFrame.columns

The column labels of the DataFrame.

DataFrame.dtypes

Return the pandas dtypes in the DataFrame.

DataFrame.select_dtypes([include, exclude])

Return a subset of the DataFrame’s columns based on the column dtypes.

DataFrame.values

Not implemented.

DataFrame.empty

Determines if the DataFrame is empty.

DataFrame.shape

Return a tuple representing the dimensionality of the DataFrame.

DataFrame.ndim

Return 2, the number of dimensions of a DataFrame by definition.

DataFrame.size

Return an int representing the number of elements in this object.
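
The attributes above mirror their pandas counterparts, so a small helper can gather them into one summary. The sketch below is illustrative only: `summarize_frame` is a hypothetical name, and the commented usage assumes an Elasticsearch cluster at `http://localhost:9200` with an index named `flights`.

```python
def summarize_frame(df):
    """Collect basic shape metadata from a DataFrame-like object.

    Reads only attributes listed above: shape, columns, empty, ndim, size.
    """
    rows, cols = df.shape  # (number of rows, number of columns)
    return {
        "rows": rows,
        "cols": cols,
        "columns": list(df.columns),
        "empty": df.empty,
        "ndim": df.ndim,
        "size": df.size,
    }


# Usage against a live cluster (assumed URL and index name, not executed here):
#   import eland as ed
#   df = ed.DataFrame("http://localhost:9200", "flights")
#   print(summarize_frame(df))
```

Because the helper touches attributes only, it should work the same way on a pandas DataFrame, which can be handy for testing without a cluster.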

Indexing, iteration

DataFrame.head(n)

Return the first n rows.

DataFrame.keys()

Return columns

DataFrame.tail(n)

Return the last n rows.

DataFrame.get(key[, default])

Get item from object for given key (e.g. a DataFrame column).

DataFrame.query(expr)

Query the columns of a DataFrame with a boolean expression.

DataFrame.sample(n, frac, random_state)

Return n randomly sampled rows or the specified fraction of rows.

Function application, GroupBy & window

DataFrame.agg(func[, axis])

Aggregate using one or more operations over the specified axis.

DataFrame.aggregate(func[, axis])

Aggregate using one or more operations over the specified axis.

Computations / descriptive stats

DataFrame.count()

Count non-NA cells for each column.

DataFrame.describe()

Generate descriptive statistics that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.

DataFrame.info([verbose, buf, max_cols, …])

Print a concise summary of a DataFrame.

DataFrame.max(numeric_only)

Return the maximum value for each numeric column

DataFrame.mean(numeric_only)

Return mean value for each numeric column

DataFrame.min(numeric_only)

Return the minimum value for each numeric column

DataFrame.median(numeric_only)

Return the median value for each numeric column

DataFrame.mad(numeric_only)

Return median absolute deviation for each numeric column
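
For orientation, here is a plain-Python sketch of the median absolute deviation, the statistic that `mad` is assumed to report through Elasticsearch's `median_absolute_deviation` aggregation. The function name below is illustrative, and the computation is exact, whereas the Elasticsearch aggregation may return an approximation on large data.

```python
from statistics import median


def median_absolute_deviation(values):
    """Return the median of absolute deviations from the median."""
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    center = median(values)
    # Distance of every value from the median, then the median of those.
    return median(abs(v - center) for v in values)
```

Unlike the standard deviation, this statistic is barely affected by a single extreme value, which is why the two can differ widely on skewed columns.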

DataFrame.std(numeric_only)

Return standard deviation for each numeric column

DataFrame.var(numeric_only)

Return variance for each numeric column

DataFrame.sum(numeric_only)

Return sum for each numeric column

DataFrame.nunique()

Return cardinality of each field.

Reindexing / selection / label manipulation

DataFrame.drop([labels, axis, index, …])

Return new object with labels in requested axis removed.

DataFrame.filter(items, like, regex, axis, …)

Subset the dataframe rows or columns according to the specified index labels.

Plotting

DataFrame.hist([column, by, grid, …])

Make a histogram of the DataFrame’s columns.

Elasticsearch Functions

DataFrame.es_info()

A debug summary of an eland DataFrame’s internals.

DataFrame.es_query(query)

Applies an Elasticsearch DSL query to the current DataFrame.
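
As a sketch of what such a query can look like, the helper below builds a query DSL dictionary that combines a term filter with an optional range filter. The helper name `build_filter_query`, the field names, and the commented connection details are assumptions for illustration; only `DataFrame.es_query` comes from the listing above.

```python
def build_filter_query(term_field, term_value, range_field, gte=None, lte=None):
    """Build an Elasticsearch bool/filter query as a plain dict."""
    bounds = {}
    if gte is not None:
        bounds["gte"] = gte
    if lte is not None:
        bounds["lte"] = lte

    filters = [{"term": {term_field: term_value}}]
    if bounds:
        # Only add the range clause when at least one bound is given.
        filters.append({"range": {range_field: bounds}})
    return {"bool": {"filter": filters}}


# Field names here are assumed to exist in the target index.
query = build_filter_query("Cancelled", False, "AvgTicketPrice", gte=500)

# Usage against a live cluster (assumed URL and index name, not executed here):
#   import eland as ed
#   df = ed.DataFrame("http://localhost:9200", "flights")
#   filtered = df.es_query(query)
```

Building the dictionary in a separate function keeps the query testable without a running cluster.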

Serialization / IO / conversion

DataFrame.info([verbose, buf, max_cols, …])

Print a concise summary of a DataFrame.

DataFrame.to_numpy()

Not implemented.

DataFrame.to_csv([path_or_buf, sep, na_rep, …])

Write Elasticsearch data to a comma-separated values (csv) file.

DataFrame.to_html([buf, columns, col_space, …])

Render Elasticsearch data as an HTML table.

DataFrame.to_string([buf, columns, …])

Render a DataFrame to a console-friendly tabular output.

DataFrame.to_pandas(show_progress)

Utility method to convert an eland.DataFrame to a pandas.DataFrame.
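
Converting to pandas pulls the selected rows into local memory, so it can be worth checking the row count first. The guard below is a minimal sketch; `to_pandas_if_small` and its `max_rows` default are hypothetical and not part of eland.

```python
def to_pandas_if_small(df, max_rows=10_000):
    """Call df.to_pandas() only when the row count is within max_rows."""
    rows = df.shape[0]
    if rows > max_rows:
        raise ValueError(
            f"refusing to load {rows} rows into memory (limit is {max_rows})"
        )
    return df.to_pandas()


# Usage (assumed, not executed here), narrowing the data before converting:
#   local_df = to_pandas_if_small(df.head(1000))
```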