eland.DataFrame.aggregate

DataFrame.aggregate(func: Union[str, List[str]], axis: int = 0, numeric_only: Optional[bool] = None, *args, **kwargs) → Union[pandas.core.series.Series, pandas.core.frame.DataFrame]

Aggregate using one or more operations over the specified axis.

Parameters
func: function, str, list or dict

Function to use for aggregating the data. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply.

Accepted combinations are:

  • function

  • string function name

  • list of functions and/or function names, e.g. [np.sum, 'mean']

  • dict of axis labels -> functions, function names or list of such.

Currently, only the following function names are supported: ['count', 'mad', 'max', 'mean', 'median', 'min', 'mode', 'quantile', 'rank', 'sem', 'skew', 'sum', 'std', 'var']

axis: int

Currently, we only support axis=0 (index)

numeric_only: {True, False, None} Default is None

Which datatypes are returned:

  • True: returns all values as float64; NaN/NaT are ignored.

  • False: returns all values as float64.

  • None: returns all values with their default datatype.

*args

Positional arguments to pass to func

**kwargs

Keyword arguments to pass to func
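Because func is limited to the string names listed above, it can be useful to check the requested names before calling aggregate. The following is a minimal sketch only: check_aggs and SUPPORTED_AGGS are hypothetical names introduced for illustration and are not part of eland; the tuple is copied from the supported list in this section.

```python
from typing import List, Union

# Names copied from the supported list in this section.
# SUPPORTED_AGGS and check_aggs are hypothetical helpers, not eland API.
SUPPORTED_AGGS = (
    "count", "mad", "max", "mean", "median", "min", "mode",
    "quantile", "rank", "sem", "skew", "sum", "std", "var",
)


def check_aggs(func: Union[str, List[str]]) -> List[str]:
    """Return func as a list of names, raising ValueError on unsupported ones."""
    names = [func] if isinstance(func, str) else list(func)
    unsupported = [name for name in names if name not in SUPPORTED_AGGS]
    if unsupported:
        raise ValueError(f"unsupported aggregations: {unsupported}")
    return names


# A single name and a list of names are both normalised to a list.
print(check_aggs("sum"))
print(check_aggs(["sum", "min", "std"]))
```

The returned list could then be passed as func, for example df.aggregate(check_aggs(['sum', 'min', 'std'])).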

Returns
DataFrame, Series or scalar

  • If DataFrame.agg is called with a single function, returns a Series.

  • If DataFrame.agg is called with several functions, returns a DataFrame.

  • If Series.agg is called with a single function, returns a scalar.

  • If Series.agg is called with several functions, returns a Series.
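The return-type rules above are the same ones pandas applies, so they can be illustrated without an Elasticsearch cluster. The sketch below uses a plain pandas DataFrame with made-up values as a stand-in; it assumes that an eland DataFrame follows the same shape rules, as this section states, and does not exercise eland itself.

```python
import pandas as pd

# In-memory stand-in with made-up values (not the 'flights' index).
pdf = pd.DataFrame(
    {
        "AvgTicketPrice": [100.0, 250.0, 400.0],
        "DistanceKilometers": [0.0, 1200.0, 8000.0],
    }
)

# DataFrame + one function name -> Series indexed by column name.
single = pdf.agg("min")

# DataFrame + list of names -> DataFrame with one row per function.
several = pdf.agg(["sum", "min"])

# Series + one function name -> scalar.
scalar = pdf["AvgTicketPrice"].agg("min")

# Series + list of names -> Series indexed by function name.
series = pdf["AvgTicketPrice"].agg(["sum", "min"])

print(type(single).__name__, type(several).__name__, type(series).__name__)
```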

Examples

>>> import eland as ed
>>> df = ed.DataFrame('localhost', 'flights', columns=['AvgTicketPrice', 'DistanceKilometers', 'timestamp', 'DestCountry'])
>>> df.aggregate(['sum', 'min', 'std'], numeric_only=True).astype(int)
     AvgTicketPrice  DistanceKilometers
sum         8204364            92616288
min             100                   0
std             266                4578
>>> df.aggregate(['sum', 'min', 'std'], numeric_only=True)
     AvgTicketPrice  DistanceKilometers
sum    8.204365e+06        9.261629e+07
min    1.000205e+02        0.000000e+00
std    2.664071e+02        4.578614e+03
>>> df.aggregate(['sum', 'min', 'std'], numeric_only=False)
     AvgTicketPrice  DistanceKilometers  timestamp  DestCountry
sum    8.204365e+06        9.261629e+07        NaT          NaN
min    1.000205e+02        0.000000e+00 2018-01-01          NaN
std    2.664071e+02        4.578614e+03        NaT          NaN
>>> df.aggregate(['sum', 'min', 'std'], numeric_only=None)
     AvgTicketPrice  DistanceKilometers  timestamp  DestCountry
sum    8.204365e+06        9.261629e+07        NaT          NaN
min    1.000205e+02        0.000000e+00 2018-01-01          NaN
std    2.664071e+02        4.578614e+03        NaT          NaN