Keep a column with a categorical variable in Pandas with groupby and mean()

素材狗 2022-01-01 20:29:29 Category: Python questions

Problem description

Is there a way to keep a categorical variable after groupby and mean()? For example, given the DataFrame df:

              ratio    Metadata_A      Metadata_B   treatment
0      54265.937500           B10               1  AB_cmpd_01
11    107364.750000           B10               2  AB_cmpd_01
22     95766.500000           B10               3  AB_cmpd_01
24     64346.250000           B10               4  AB_cmpd_01
25     52726.333333           B10               5  AB_cmpd_01
30     65056.600000           B11               1          UT
41     78409.600000           B11               2          UT
52    133533.000000           B11               3          UT
54    102433.571429           B11               4          UT
55     82217.588235           B11               5          UT
60     89843.600000            B2               1          UT
71     98544.000000            B2               2          UT
82    179330.000000            B2               3          UT
84    107132.400000            B2               4          UT
85     73096.909091            B2               5          UT

I need to take the mean of ratio within each Metadata_A group, but keep the treatment column at the end.

In theory, something like:

df.groupby(by='Metadata_A').mean().reset_index()

              ratio    Metadata_A      Metadata_B   treatment
 0     54265.937500           B10             2.5  AB_cmpd_01
 1     78409.600000           B11             2.5          UT
 2    107132.400000            B2             2.5          UT

However, the treatment column disappears after averaging.
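The behaviour described above can be reproduced with a small sketch. The DataFrame is rebuilt by hand from the table in the question, and numeric_only=True is an assumption added here so the call behaves the same across pandas versions. To my knowledge, pandas 2.0 and later raise a TypeError for a string column when mean() is called without it, whereas older versions silently dropped the column.

```python
import pandas as pd

# Data copied from the question (original row labels preserved).
df = pd.DataFrame(
    {
        "ratio": [54265.9375, 107364.75, 95766.5, 64346.25, 52726.333333,
                  65056.6, 78409.6, 133533.0, 102433.571429, 82217.588235,
                  89843.6, 98544.0, 179330.0, 107132.4, 73096.909091],
        "Metadata_A": ["B10"] * 5 + ["B11"] * 5 + ["B2"] * 5,
        "Metadata_B": [1, 2, 3, 4, 5] * 3,
        "treatment": ["AB_cmpd_01"] * 5 + ["UT"] * 10,
    },
    index=[0, 11, 22, 24, 25, 30, 41, 52, 54, 55, 60, 71, 82, 84, 85],
)

# Only numeric columns are averaged, so the string column "treatment"
# has no place in the result.
numeric_means = df.groupby(by="Metadata_A").mean(numeric_only=True).reset_index()
print(numeric_means.columns.tolist())
```

A mean is not defined for string labels, so the column has to be carried through some other way, either as a grouping key or with a non-numeric aggregation.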

Recommended answer

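You can use groupby together with agg. Grouping by both Metadata_A and treatment keeps treatment in the result, because grouping keys are returned as columns when as_index=False; agg then applies one function per remaining column. Note that the call below takes the first ratio in each group; pass 'mean' for ratio to average it instead.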

df.groupby(['Metadata_A','treatment'],as_index=False).agg({'Metadata_B':'mean','ratio':'first'})
Out[358]: 
  Metadata_A   treatment  Metadata_B       ratio
0        B10  AB_cmpd_01           3  54265.9375
1        B11          UT           3  65056.6000
2         B2          UT           3  89843.6000
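Since the question asks for the mean of ratio, here is a minimal sketch of two variants that average it while keeping treatment. The DataFrame is rebuilt by hand from the question's table, and the names by_both and by_a are illustrative only. Both variants assume each Metadata_A value maps to exactly one treatment, which holds for the sample data but is not guaranteed in general.

```python
import pandas as pd

# Data copied from the question (original row labels preserved).
df = pd.DataFrame(
    {
        "ratio": [54265.9375, 107364.75, 95766.5, 64346.25, 52726.333333,
                  65056.6, 78409.6, 133533.0, 102433.571429, 82217.588235,
                  89843.6, 98544.0, 179330.0, 107132.4, 73096.909091],
        "Metadata_A": ["B10"] * 5 + ["B11"] * 5 + ["B2"] * 5,
        "Metadata_B": [1, 2, 3, 4, 5] * 3,
        "treatment": ["AB_cmpd_01"] * 5 + ["UT"] * 10,
    },
    index=[0, 11, 22, 24, 25, 30, 41, 52, 54, 55, 60, 71, 82, 84, 85],
)

# Variant 1: treatment is a grouping key, so it is returned as a column.
by_both = df.groupby(["Metadata_A", "treatment"], as_index=False).agg(
    {"ratio": "mean", "Metadata_B": "mean"}
)

# Variant 2: group by Metadata_A only and carry the first treatment label
# of each group alongside the numeric means.
by_a = df.groupby("Metadata_A", as_index=False).agg(
    {"ratio": "mean", "Metadata_B": "mean", "treatment": "first"}
)

print(by_both)
print(by_a)
```

If a Metadata_A group could contain more than one treatment, variant 1 would produce one row per (Metadata_A, treatment) pair, while variant 2 would silently keep only the first label, so variant 1 is the safer default.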
