pyspark.pandas.DataFrame.median
- DataFrame.median(axis=None, skipna=True, numeric_only=None, accuracy=10000)
- Return the median of the values for the requested axis.
- Note
- Unlike pandas, the median in pandas-on-Spark is an approximate median based on approximate percentile computation, because computing the exact median across a large dataset is extremely expensive.
- Parameters
- axis: {index (0), columns (1)}
- Axis for the function to be applied on. 
- skipna: bool, default True
- Exclude NA/null values when computing the result.
- Changed in version 3.4.0: Added support for including NA/null values.
- numeric_only: bool, default None
- Include only float, int, boolean columns. False is not supported. This parameter is mainly for pandas compatibility. 
- accuracy: int, optional
- Accuracy of the approximation (default 10000). A larger value gives better accuracy. The relative error is 1.0 / accuracy.
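The relative error of 1.0 / accuracy can be made concrete with a small sketch. It assumes the error is a rank error, as Spark documents for `DataFrame.approxQuantile` (the returned value's rank lies between `floor((p - err) * N)` and `ceil((p + err) * N)`). Whether `median` gives exactly this guarantee is an assumption, and `rank_window` is a hypothetical helper that is not part of pyspark.

```python
import math
from fractions import Fraction
from typing import Tuple


def rank_window(n_rows: int, accuracy: int = 10000) -> Tuple[int, int]:
    """Hypothetical helper: range of ranks an approximate median may fall in.

    Assumes the relative error 1.0 / accuracy bounds the rank of the
    returned value, as Spark documents for approxQuantile.
    """
    if n_rows <= 0 or accuracy <= 0:
        raise ValueError("n_rows and accuracy must be positive")
    err = Fraction(1, accuracy)  # relative error, kept exact to avoid float rounding
    p = Fraction(1, 2)           # the median is the 0.5 quantile
    lowest = math.floor((p - err) * n_rows)
    highest = math.ceil((p + err) * n_rows)
    # Clamp to the valid rank range of the column.
    return max(lowest, 0), min(highest, n_rows)


print(rank_window(1_000_000))       # default accuracy=10000
print(rank_window(1_000_000, 100))  # coarser approximation, wider window
```

Under this assumption, lowering `accuracy` widens the window of ranks the result may come from, which is the trade-off the parameter controls.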
 
- Returns
- median: scalar or Series
 
- Examples

>>> import pandas as pd
>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({
...     'a': [24., 21., 25., 33., 26.], 'b': [1, 2, 3, 4, 5]}, columns=['a', 'b'])
>>> df
      a  b
0  24.0  1
1  21.0  2
2  25.0  3
3  33.0  4
4  26.0  5

- On a DataFrame:

>>> df.median()
a    25.0
b     3.0
dtype: float64

- On a Series:

>>> df['a'].median()
25.0

>>> (df['b'] + 100).median()
103.0

- For multi-index columns,

>>> df.columns = pd.MultiIndex.from_tuples([('x', 'a'), ('y', 'b')])
>>> df
      x  y
      a  b
0  24.0  1
1  21.0  2
2  25.0  3
3  33.0  4
4  26.0  5

- On a DataFrame:

>>> df.median()
x  a    25.0
y  b     3.0
dtype: float64

>>> df.median(axis=1)
0    12.5
1    11.5
2    14.0
3    18.5
4    15.5
dtype: float64

- On a Series:

>>> df[('x', 'a')].median()
25.0

>>> (df[('y', 'b')] + 100).median()
103.0
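
Every column in the examples has an odd number of rows, so the middle value is unambiguous. With an even number of rows, a percentile-based median may differ from pandas. Spark's `percentile_approx` is documented as returning a value taken from the column, whereas pandas averages the two middle values. The sketch below illustrates that difference using only the standard library. It assumes the median is computed with `percentile_approx`, and `statistics.median_low` is an analogy for the element-valued behaviour, not the pandas-on-Spark implementation.

```python
import statistics

values = [1, 2, 3, 4]  # even number of rows: the two middle values are 2 and 3

# pandas-style exact median: the average of the two middle values.
interpolated = statistics.median(values)

# Element-valued median: the lower of the two middle values. This stands in
# (as an assumption) for a percentile-based median that returns an element
# of the data instead of interpolating.
element_valued = statistics.median_low(values)

print(interpolated, element_valued)
```

If results must match pandas exactly on small data, it may be worth checking even-length columns separately.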