Ten Pandas Exercises to Master the Common Methods

news 2025/7/24 5:19:04

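Contents

Pandas Analysis Exercises
- 1. Getting and Understanding the Data
- 2. Filtering and Sorting
- 3. Grouping
- 4. The Apply Function
- 5. Merging
- 6. Statistics
- 7. Visualization
- 8. Creating DataFrames
- 9. Time Series
- 10. Deleting Data

All code was run in Jupyter Notebook.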

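Pandas Analysis Exercises

The datasets can be downloaded here:

Link: https://pan.baidu.com/s/1YGwh3pqxW4OlrQXt-5wgFg?pwd=3znx Extraction code: 3znx

Description	Dataset
1. Chipotle fast-food orders	chipotle.tsv
2. Euro 2012 statistics	Euro2012_stats.csv
3. Alcohol consumption	drinks.csv
4. US crime rates, 1960 - 2014	US_Crime_Rates_1960_2014.csv
5. Made-up name data	built inside the exercise
6. Wind speed data	wind.data
7. Titanic disaster data	train.csv
8. Pokemon data	entered by hand inside the exercise
9. Apple stock prices	Apple_stock.csv
10. Iris flower data	iris.csv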

1. Getting and Understanding the Data

import pandas as pd
csv_path='./pandas_data/chipotle.tsv'
#1. Load the data (the file is tab-separated)
chipo=pd.read_csv(csv_path,sep='\t')
#2. Show the first 10 rows
print(chipo.head(10))
print('----------1----------')
#3. How many columns are there?
print(chipo.shape[1])
print('----------2----------')
#4. Print all column names
print(chipo.columns)
print('----------3----------')
#5. Show the index of the dataset
print(chipo.index)
print('----------4----------')
#6. Which item was ordered in the largest total quantity?
c = chipo[['item_name', 'quantity']].groupby(['item_name'], as_index=False).agg({'quantity': 'sum'})
c.sort_values(by='quantity',ascending=False,inplace=True)
print(c.head(1))
print('----------5----------')
#7. How many distinct items are there? c is already grouped by item_name, so counting its rows is enough
print(c['quantity'].count())
#7.1 Method 2
print(chipo['item_name'].nunique())
print('----------6----------')
#8. Which choice_description was ordered most often?
print(chipo['choice_description'].value_counts().head(1))
print('----------7----------')
#9. Total number of items ordered
print(chipo['quantity'].sum())
#10. Convert item_price to float (drop the leading '$' first)
d=lambda x: float(x[1:])
chipo['item_price']=chipo['item_price'].apply(d)
print(chipo['item_price'].dtype)
print('----------8----------')
#11. Compute the total revenue
chipo['sub_total']=chipo['item_price']*chipo['quantity']
print(chipo['sub_total'].sum())
print('----------9----------')
#12. Number of orders
print(chipo['order_id'].nunique())

   order_id  quantity                              item_name  \
0         1         1           Chips and Fresh Tomato Salsa   
1         1         1                                   Izze   
2         1         1                       Nantucket Nectar   
3         1         1  Chips and Tomatillo-Green Chili Salsa   
4         2         2                           Chicken Bowl   
5         3         1                           Chicken Bowl   
6         3         1                          Side of Chips   
7         4         1                          Steak Burrito   
8         4         1                       Steak Soft Tacos   
9         5         1                          Steak Burrito   

                                   choice_description item_price  
0                                                NaN     $2.39   
1                                       [Clementine]     $3.39   
2                                            [Apple]     $3.39   
3                                                NaN     $2.39   
4  [Tomatillo-Red Chili Salsa (Hot), [Black Beans...    $16.98   
5  [Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou...    $10.98   
6                                                NaN     $1.69   
7  [Tomatillo Red Chili Salsa, [Fajita Vegetables...    $11.75   
8  [Tomatillo Green Chili Salsa, [Pinto Beans, Ch...     $9.25   
9  [Fresh Tomato Salsa, [Rice, Black Beans, Pinto...     $9.25   
----------1----------
5
----------2----------
Index(['order_id', 'quantity', 'item_name', 'choice_description','item_price'],dtype='object')
----------3----------
RangeIndex(start=0, stop=4622, step=1)
----------4----------
       item_name  quantity
17  Chicken Bowl       761
----------5----------
50
50
----------6----------
[Diet Coke]    134
Name: choice_description, dtype: int64
----------7----------
4972
float64
----------8----------
39237.02
----------9----------
1834
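
Step 10 above converts the price with a Python lambda applied to each value. A vectorized alternative is sketched below, under the assumption that the prices may also contain thousands separators. The sample values are made up for illustration (the third one is hypothetical and not taken from chipotle.tsv).

```python
import pandas as pd

# Toy values that mimic the item_price column; "$1,234.50" is a hypothetical
# entry added only to show the thousands-separator case.
prices = pd.Series(["$2.39", "$16.98", "$1,234.50"])

# Strip the currency symbol and the separators, then cast the column to float.
clean = (
    prices.str.replace("$", "", regex=False)
          .str.replace(",", "", regex=False)
          .astype(float)
)

print(clean.dtype)
print(clean.sum())
```

On a column that only holds values like `$2.39`, this should give the same result as the lambda, without calling a Python function per row.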

2. Filtering and Sorting

csv_path2="./pandas_data/Euro2012_stats.csv"
#1. Load the data
euro=pd.read_csv(csv_path2)
print(euro.head())
print('----------1----------')
#2. Select the Goals column
print(euro['Goals'])
print('----------2----------')
#3. Count the teams (one row per team)
print(euro.shape[0])
print('----------3----------')
#4. Show dataset info (info() prints the summary itself and returns None, hence the "None" in the output)
print(euro.info())
print('----------4----------')
#5. Store Team, Yellow Cards and Red Cards in a separate dataset
subset=euro[['Team','Yellow Cards','Red Cards']]
print(subset.head())
print('----------5----------')
#6. Sort the dataset from step 5 by Red Cards, then Yellow Cards, both descending
sorted_subset=subset.sort_values(['Red Cards','Yellow Cards'],ascending=False)
print(sorted_subset)
print('----------6----------')
#7. Mean number of yellow cards, rounded to the nearest integer
print(round(subset['Yellow Cards'].mean()))
print('----------7----------')
#8. Find the teams that scored more than 6 goals
print(euro[euro['Goals']>6][['Team','Goals']])
print('----------8----------')
#9. Select the teams whose name starts with G
#Method 1: str.contains with a regular expression
print(euro[euro['Team'].str.contains('^G')]['Team'])
#Method 2: str.startswith
print(euro[euro.Team.str.startswith('G')]['Team'])
print('----------9----------')
#10. Select the first 7 columns
print(euro.iloc[:,0:7])
#11. Select all columns except the last 3
print(euro.iloc[:,:-3])
#12. Find the Shooting Accuracy of England, Italy and Russia
print(euro.loc[euro['Team'].isin(['England', 'Italy', 'Russia']),['Team', 'Shooting Accuracy']])

             Team  Goals  Shots on target  Shots off target Shooting Accuracy  \
0         Croatia      4               13                12             51.9%   
1  Czech Republic      4               13                18             41.9%   
2         Denmark      4               10                10             50.0%   
3         England      5               11                18             50.0%   
4          France      3               22                24             37.9%   

  % Goals-to-shots  Total shots (inc. Blocked)  Hit Woodwork  Penalty goals  \
0            16.0%                          32             0              0   
1            12.9%                          39             0              0   
2            20.0%                          27             1              0   
3            17.2%                          40             0              0   
4             6.5%                          65             1              0   

   Penalties not scored  ...  Saves made  Saves-to-shots ratio  Fouls Won  \
0                     0  ...          13                 81.3%         41   
1                     0  ...           9                 60.1%         53   
2                     0  ...          10                 66.7%         25   
3                     0  ...          22                 88.1%         43   
4                     0  ...           6                 54.6%         36   

   Fouls Conceded  Offsides  Yellow Cards  Red Cards  Subs on  Subs off  \
0             62         2             9          0        9         9   
1             73         8             7          0       11        11   
2             38         8             4          0        7         7   
3             45         6             5          0       11        11   
4               51         5             6          0       11        11   

   Players Used  
0            16  
1            19  
2            15  
3            16  
4            19  

[5 rows x 35 columns]
----------1----------
0      4
1      4
2      4
3      5
4      3
5     10
6      5
7      6
8      2
9      2
10     6
11     1
12     5
13    12
14     5
15     2
Name: Goals, dtype: int64
----------2----------
16
----------3----------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16 entries, 0 to 15
Data columns (total 35 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   Team                        16 non-null     object 
 1   Goals                       16 non-null     int64  
 2   Shots on target             16 non-null     int64  
 3   Shots off target            16 non-null     int64  
 4   Shooting Accuracy           16 non-null     object 
 5   % Goals-to-shots            16 non-null     object 
 6   Total shots (inc. Blocked)  16 non-null     int64  
 7   Hit Woodwork                16 non-null     int64  
 8   Penalty goals               16 non-null     int64  
 9   Penalties not scored        16 non-null     int64  
 10  Headed goals                16 non-null     int64  
 11  Passes                      16 non-null     int64  
 12  Passes completed            16 non-null     int64  
 13  Passing Accuracy            16 non-null     object 
 14  Touches                     16 non-null     int64  
 15  Crosses                     16 non-null     int64  
 16  Dribbles                    16 non-null     int64  
 17  Corners Taken               16 non-null     int64  
 18  Tackles                     16 non-null     int64  
 19  Clearances                  16 non-null     int64  
 20  Interceptions               16 non-null     int64  
 21  Clearances off line         15 non-null     float64
 22  Clean Sheets                16 non-null     int64  
 23  Blocks                      16 non-null     int64  
 24  Goals conceded              16 non-null     int64  
 25  Saves made                  16 non-null     int64  
 26  Saves-to-shots ratio        16 non-null     object 
 27  Fouls Won                   16 non-null     int64  
 28  Fouls Conceded              16 non-null     int64  
 29  Offsides                    16 non-null     int64  
 30  Yellow Cards                16 non-null     int64  
 31  Red Cards                   16 non-null     int64  
 32  Subs on                     16 non-null     int64  
 33  Subs off                    16 non-null     int64  
 34  Players Used                16 non-null     int64  
dtypes: float64(1), int64(29), object(5)
memory usage: 4.5+ KB
None
----------4----------
             Team  Yellow Cards  Red Cards
0         Croatia             9          0
1  Czech Republic             7          0
2         Denmark             4          0
3         England             5          0
4          France             6          0
----------5----------
                   Team  Yellow Cards  Red Cards
6                Greece             9          1
9                Poland             7          1
11  Republic of Ireland             6          1
7                 Italy            16          0
10             Portugal            12          0
13                Spain            11          0
0               Croatia             9          0
1        Czech Republic             7          0
14               Sweden             7          0
4                France             6          0
12               Russia             6          0
3               England             5          0
8           Netherlands             5          0
15              Ukraine             5          0
2               Denmark             4          0
5               Germany             4          0
----------6----------
7
----------7----------
       Team  Goals
5   Germany     10
13    Spain     12
----------8----------
5    Germany
6     Greece
Name: Team, dtype: object
5    Germany
6     Greece
Name: Team, dtype: object
----------9----------
                   Team  Goals  Shots on target  Shots off target  \
0               Croatia      4               13                12   
1        Czech Republic      4               13                18   
2               Denmark      4               10                10   
3               England      5               11                18   
4                France      3               22                24   
5               Germany     10               32                32   
6                Greece      5                8                18   
7                 Italy      6               34                45   
8           Netherlands      2               12                36   
9                Poland      2               15                23   
10             Portugal      6               22                42   
11  Republic of Ireland      1                7                12   
12               Russia      5                9                31   
13                Spain     12               42                33   
14               Sweden      5               17                19   
15              Ukraine      2                7                26   

   Shooting Accuracy % Goals-to-shots  Total shots (inc. Blocked)  
0              51.9%            16.0%                          32  
1              41.9%            12.9%                          39  
2              50.0%            20.0%                          27  
3              50.0%            17.2%                          40  
4              37.9%             6.5%                          65  
5              47.8%            15.6%                          80  
6              30.7%            19.2%                          32  
7              43.0%             7.5%                         110  
8              25.0%             4.1%                          60  
9              39.4%             5.2%                          48  
10             34.3%             9.3%                          82  
11             36.8%             5.2%                          28  
12             22.5%            12.5%                          59  
13             55.9%            16.0%                         100  
14             47.2%            13.8%                          39  
15             21.2%             6.0%                          38  

                   Team  Goals  Shots on target  Shots off target  \
0               Croatia      4               13                12   
1        Czech Republic      4               13                18   
2               Denmark      4               10                10   
3               England      5               11                18   
4                France      3               22                24   
5               Germany     10               32                32   
6                Greece      5                8                18   
7                 Italy      6               34                45   
8           Netherlands      2               12                36   
9                Poland      2               15                23   
10             Portugal      6               22                42   
11  Republic of Ireland      1                7                12   
12               Russia      5                9                31   
13                Spain     12               42                33   
14               Sweden      5               17                19   
15              Ukraine      2                7                26   

   Shooting Accuracy % Goals-to-shots  Total shots (inc. Blocked)  \
0              51.9%            16.0%                          32   
1              41.9%            12.9%                          39   
2              50.0%            20.0%                          27   
3              50.0%            17.2%                          40   
4              37.9%             6.5%                          65   
5              47.8%            15.6%                          80   
6              30.7%            19.2%                          32   
7              43.0%             7.5%                         110   
8              25.0%             4.1%                          60   
9              39.4%             5.2%                          48   
10             34.3%             9.3%                          82   
11             36.8%             5.2%                          28   
12             22.5%            12.5%                          59   
13             55.9%            16.0%                         100   
14             47.2%            13.8%                          39   
15             21.2%             6.0%                          38   

    Hit Woodwork  Penalty goals  Penalties not scored  ...  Clean Sheets  \
0              0              0                     0  ...             0   
1              0              0                     0  ...             1   
2              1              0                     0  ...             1   
3              0              0                     0  ...             2   
4              1              0                     0  ...             1   
5              2              1                     0  ...             1   
6              1              1                     1  ...             1   
7              2              0                     0  ...             2   
8              2              0                     0  ...             0   
9              0              0                     0  ...             0   
10             6              0                     0  ...             2   
11             0              0                     0  ...             0   
12             2              0                     0  ...             0   
13             0              1                     0  ...             5   
14             3              0                     0  ...             1   
15             0              0                     0  ...             0   

    Blocks  Goals conceded  Saves made  Saves-to-shots ratio  Fouls Won  \
0       10               3         13                 81.3%         41   
1       10               6          9                 60.1%         53   
2       10               5         10                 66.7%         25   
3       29               3         22                 88.1%         43   
4        7               5          6                 54.6%         36   
5       11               6         10                 62.6%         63   
6       23               7         13                 65.1%         67   
7       18               7         20                 74.1%        101   
8        9               5         12                 70.6%         35   
9        8               3          6                 66.7%         48   
10      11               4         10                 71.5%         73   
11      23               9         17                 65.4%         43   
12       8               3         10                 77.0%         34   
13       8               1         15                 93.8%        102   
14      12               5          8                 61.6%         35   
15       4               4         13                 76.5%         48   

    Fouls Conceded  Offsides  Yellow Cards  Red Cards  
0               62         2             9          0  
1               73         8             7          0  
2               38         8             4          0  
3               45         6             5          0  
4               51         5             6          0  
5               49        12             4          0  
6               48        12             9          1  
7               89        16            16          0  
8               30         3             5          0  
9               56         3             7          1  
10              90        10            12          0  
11              51        11             6          1  
12              43         4             6          0  
13              83        19            11          0  
14              51         7             7          0  
15              31         4             5          0  

[16 rows x 32 columns]
       Team Shooting Accuracy
3   England             50.0%
7     Italy             43.0%
12   Russia             22.5%
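
The info() output above lists Shooting Accuracy as an object column, because its values are strings such as "50.0%". Filtering or sorting on it numerically needs a conversion first. A minimal sketch, using three rows copied from the result above; the column name accuracy_pct is a hypothetical name chosen for this example.

```python
import pandas as pd

# Three rows taken from the step 12 output above.
teams = pd.DataFrame({
    "Team": ["England", "Italy", "Russia"],
    "Shooting Accuracy": ["50.0%", "43.0%", "22.5%"],
})

# Remove the trailing percent sign and convert the strings to floats.
# 'accuracy_pct' is a hypothetical helper column, not part of the dataset.
teams["accuracy_pct"] = teams["Shooting Accuracy"].str.rstrip("%").astype(float)

# Numeric comparisons now behave as expected.
accurate = teams.loc[teams["accuracy_pct"] > 40, "Team"].tolist()
print(accurate)
```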

3. Grouping

csv_path3="./pandas_data/drinks.csv"
#1. Load the data
drinks=pd.read_csv(csv_path3)
print(drinks)
print('----------1----------')
#2. Mean beer consumption per continent
print(drinks.groupby('continent')['beer_servings'].mean())
print('----------2----------')
#3. Mean wine consumption per continent
print(drinks.groupby('continent')['wine_servings'].mean())
print('----------3----------')
#4. Mean consumption of each drink type per continent
#   Select several columns with a list; the tuple-style ['a','b','c'] used when the output
#   below was captured triggers the FutureWarning shown at its end and fails in pandas 2.x
print(drinks.groupby('continent')[['beer_servings','spirit_servings','wine_servings']].mean())
print('----------4----------')
#5. Median consumption of each drink type per continent
print(drinks.groupby('continent')[['beer_servings','spirit_servings','wine_servings']].median())
print('----------5----------')
#6. Mean, minimum and maximum spirit consumption per continent
print(drinks.groupby('continent')['spirit_servings'].agg(['mean', 'min', 'max']))

         country  beer_servings  spirit_servings  wine_servings  \
0    Afghanistan              0                0              0   
1        Albania             89              132             54   
2        Algeria             25                0             14   
3        Andorra            245              138            312   
4         Angola            217               57             45   
..           ...            ...              ...            ...   
188    Venezuela            333              100              3   
189      Vietnam            111                2              1   
190        Yemen              6                0              0   
191       Zambia             32               19              4   
192     Zimbabwe             64               18              4   

     total_litres_of_pure_alcohol continent  
0                             0.0        AS  
1                             4.9        EU  
2                             0.7        AF  
3                            12.4        EU  
4                             5.9        AF  
..                            ...       ...  
188                           7.7        SA  
189                           2.0        AS  
190                           0.1        AS  
191                           2.5        AF  
192                           4.7        AF  

[193 rows x 6 columns]
----------1----------
continent
AF     61.471698
AS     37.045455
EU    193.777778
OC     89.687500
SA    175.083333
Name: beer_servings, dtype: float64
----------2----------
continent
AF     16.264151
AS      9.068182
EU    142.222222
OC     35.625000
SA     62.416667
Name: wine_servings, dtype: float64
----------3----------
           beer_servings  spirit_servings  wine_servings
continent                                               
AF             61.471698        16.339623      16.264151
AS             37.045455        60.840909       9.068182
EU            193.777778       132.555556     142.222222
OC             89.687500        58.437500      35.625000
SA            175.083333       114.750000      62.416667
----------4----------
           beer_servings  spirit_servings  wine_servings
continent                                               
AF                  32.0              3.0            2.0
AS                  17.5             16.0            1.0
EU                 219.0            122.0          128.0
OC                  52.5             37.0            8.5
SA                 162.5            108.5           12.0
----------5----------
                 mean  min  max
continent                      
AF          16.339623    0  152
AS          60.840909    0  326
EU         132.555556    0  373
OC          58.437500    0  254
SA         114.750000   25  302
/var/folders/cr/2fpn8__12377w89ml3mv5ksw0000gn/T/ipykernel_74870/3785898223.py:13: FutureWarning: Indexing with multiple keys (implicitly converted to a tuple of keys) will be deprecated, use a list instead.
  print(drinks.groupby('continent')['beer_servings','spirit_servings','wine_servings'].mean())
/var/folders/cr/2fpn8__12377w89ml3mv5ksw0000gn/T/ipykernel_74870/3785898223.py:16: FutureWarning: Indexing with multiple keys (implicitly converted to a tuple of keys) will be deprecated, use a list instead.
  print(drinks.groupby('continent')['beer_servings','spirit_servings','wine_servings'].median())
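
The grouped results above list five continent codes (AF, AS, EU, OC, SA) and none for North America. One likely cause, assuming drinks.csv codes North America as NA, is that read_csv treats the string "NA" as a missing value by default, and groupby then drops those rows. The sketch below reproduces the effect on a two-row in-memory CSV whose values are illustrative only.

```python
import io
import pandas as pd

# A tiny stand-in for drinks.csv; the numbers are illustrative.
csv_text = "country,beer_servings,continent\nCanada,240,NA\nFrance,127,EU\n"

# Default parsing: the string "NA" becomes NaN.
default = pd.read_csv(io.StringIO(csv_text))
print(default["continent"].isna().sum())

# keep_default_na=False keeps "NA" as an ordinary string.
kept = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
print(kept.groupby("continent")["beer_servings"].mean())
```

Note that keep_default_na=False disables the default missing-value markers for every column, so on a file with genuinely empty cells it may be preferable to pass an explicit na_values list alongside it.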

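4. The Apply Function

Note: in Pandas, the pd.to_datetime function converts a column that holds date or time information to the datetime64 dtype.
pd.to_datetime takes dates, times, strings, or similar objects and converts them to Pandas datetime values (datetime64[ns] for a Series). Its main parameters are described below.
Syntax (as in pandas 1.x; later releases deprecate some of these arguments):
pd.to_datetime(arg, errors='raise', dayfirst=False, yearfirst=False, utc=None, format=None, exact=True, unit=None, infer_datetime_format=False, origin='unix', cache=True)
Main parameters:
arg: the date, time, string, or similar object to convert.
errors: what to do when a value cannot be converted: 'raise' (default, raise an exception), 'coerce' (set unparseable values to NaT), or 'ignore' (return the input unchanged).
dayfirst: if True, ambiguous strings are parsed with the day before the month, so '10/11/12' becomes 2012-11-10. Default False.
yearfirst: if True, ambiguous strings are parsed with the year first, so '10/11/12' becomes 2010-11-12. Default False.
utc: if True, the result is timezone-aware and expressed in UTC. Default None.
format: the strftime-style format of the input strings, for example '%Y'. Giving it avoids guessing and can speed up parsing; if it is omitted, Pandas tries to infer the format.
exact: if True (default), the whole string must match format exactly; if False, format may match anywhere in the string.
unit: the unit of a numeric arg, one of 'D' (days), 's' (seconds), 'ms' (milliseconds), 'us' (microseconds), 'ns' (nanoseconds). The numbers are counted from origin.
infer_datetime_format: if True, try to infer the format of the strings to speed up parsing. Default False; deprecated since pandas 2.0, where consistent format inference is the default behavior.
origin: the reference date for numeric input: 'unix' (default, 1970-01-01), 'julian', or a specific date given as a string or Timestamp.
cache: if True, cache the unique converted dates, which speeds up inputs with many duplicates. Default True.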
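
The parameters errors, format, and unit are the ones that come up most often in practice. A minimal sketch on made-up input (the "n/a" entry is a hypothetical bad value, not from the crime dataset):

```python
import pandas as pd

# Year strings like the Year column used later, plus one unparseable entry.
raw = pd.Series(["1960", "1961", "n/a"])

# errors='coerce' turns the unparseable entry into NaT instead of raising.
years = pd.to_datetime(raw, format="%Y", errors="coerce")
print(years)

# For numeric input, unit says how to read the number:
# here 86400 seconds after the default origin (the Unix epoch).
stamp = pd.to_datetime(86400, unit="s")
print(stamp)
```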

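set_index is the Pandas method for setting the index of a DataFrame. It turns one or more columns into the index; the drop parameter controls whether those columns are also removed from the regular columns.
Purpose: build a new index from the given column or columns.
Syntax:
DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)
Main parameters:
keys: the column name or list of column names to use as the index.
drop: if True (default), the columns used as the index are removed from the DataFrame's columns; if False, they are kept as columns as well.
append: if True, the new index levels are appended to the existing index, forming a MultiIndex. Default False.
inplace: if True, modify the DataFrame in place; otherwise return a new DataFrame. Default False.
verify_integrity: if True, check that the new index is unique and raise ValueError if it contains duplicates. Default False.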
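
A short sketch of drop and verify_integrity on a three-row frame. The first two rows reuse Murder figures from the crime data shown later; the third row, with its duplicated year, is invented to trigger the integrity check.

```python
import pandas as pd

# The duplicated 1961 row is made up for this illustration.
df = pd.DataFrame({"year": [1960, 1961, 1961], "murders": [9110, 8740, 8530]})

# drop=False keeps 'year' as a regular column in addition to using it as the index.
kept = df.set_index("year", drop=False)
print(list(kept.columns))

# verify_integrity=True rejects an index that contains duplicates.
duplicate_error = None
try:
    df.set_index("year", verify_integrity=True)
except ValueError as exc:
    duplicate_error = exc
    print("duplicate index rejected:", exc)
```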

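resample is the Pandas tool for resampling time series data. It groups the data into bins of a given time frequency so that the bins can be aggregated, transformed, or filled.
Main uses:
Aggregation and summarizing: group the time series by the given frequency, then aggregate each group, for example with a sum or a mean.
Transformation: apply transformations to the series, such as interpolation or filling missing values.
Downsampling and upsampling: downsampling aggregates high-frequency data to a lower frequency; upsampling converts low-frequency data to a higher frequency.
Syntax (from an old pandas release; how, fill_method, limit, loffset, and base no longer exist in pandas 2.x, where the aggregation is a method call on the result, as in resample('D').sum()):
DataFrame.resample(rule, how=None, axis=0, fill_method=None, closed=None, label=None, convention='start', kind=None, loffset=None, limit=None, base=0, on=None, level=None)
Main parameters:
rule: the target frequency, given as an offset alias string (such as 'D' for day or 'M' for month) or a Timedelta-like object.
'D': daily
'W': weekly
'M': month end (spelled 'ME' from pandas 2.2)
'Q': quarter end (spelled 'QE' from pandas 2.2)
'A': year end (spelled 'YE' from pandas 2.2)
'AS': year start (Annual Start; spelled 'YS' from pandas 2.2)
how: the aggregation function, such as 'sum' or 'mean'. In current pandas, call .sum(), .mean(), and so on on the result of resample instead.
axis: the axis to resample along. Default 0.
fill_method: the fill method used when upsampling, such as 'ffill' (forward fill) or 'bfill' (backward fill). In current pandas, call .ffill() or .bfill() on the result instead.
closed: which side of each bin interval is closed, 'right' or 'left'. The default None resolves to 'left' for most frequencies and to 'right' for end-anchored ones such as 'M', 'Q', 'A', and 'W'.
label: which bin edge is used to label the result, 'left' or 'right'. The default None is resolved the same way as closed.
convention: for a PeriodIndex only, whether the 'start' (default) or the 'end' of each period is used when converting.
kind: 'timestamp' to get a DatetimeIndex in the result or 'period' to get a PeriodIndex. The default None keeps the kind of the input.
loffset: a time offset applied to the labels of the result.
limit: the maximum number of consecutive missing values to fill when upsampling.
base: shifts the origin of the bins for frequencies that evenly subdivide one day.
on: for a DataFrame, the datetime-like column to resample on instead of the index.
level: for a MultiIndex, the datetime-like level to resample on.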
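
A minimal sketch of downsampling and upsampling in the current method-chaining style, on a made-up six-day series. It uses only the 'D' alias, which is assumed here to be spelled the same across pandas versions.

```python
import pandas as pd

# Six daily observations (made-up values).
idx = pd.date_range("2024-01-01", periods=6, freq="D")
s = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

# Downsample: 2-day bins, aggregated by calling a method on the Resampler.
two_day = s.resample("2D").sum()
print(two_day)

# Upsample back to a daily grid; the new slots are filled by forward fill.
daily_again = two_day.resample("D").ffill()
print(daily_again)
```

The aggregation and the fill are method calls on the object that resample returns, which is why the how and fill_method arguments are not needed.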

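idxmax() returns the index label (not the integer position) of the maximum value in a Series or, for a DataFrame, of the maximum along the chosen axis. If the maximum occurs more than once, the first occurrence is returned.
Purpose: find where the maximum is.
Syntax:
Series.idxmax(axis=0, skipna=True, *args, **kwargs)
axis: the axis to search along. For a Series it can only be 0; for a DataFrame it can be 0 or 1. Default 0.
skipna: whether to ignore NaN values. Default True.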
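
A small sketch of idxmax on a Series and on a DataFrame. The Series reuses the first three Murder figures from the crime data shown later; the DataFrame is made up.

```python
import pandas as pd

# For a Series, idxmax returns the label of the largest value.
murders = pd.Series([9110, 8740, 8530], index=[1960, 1961, 1962])
print(murders.idxmax())

# For a DataFrame, axis=0 answers per column and axis=1 answers per row.
df = pd.DataFrame({"a": [1, 5, 3], "b": [9, 2, 4]}, index=["x", "y", "z"])
per_column = df.idxmax()
per_row = df.idxmax(axis=1)
print(per_column)
print(per_row)
```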

csv_path4="./pandas_data/US_Crime_Rates_1960_2014.csv"
#1. Load the data
crime=pd.read_csv(csv_path4)
print(crime.head())
print('----------1----------')
#2. Show dataset info
print(crime.info())
print('----------2----------')
#3. Convert the Year column to datetime64
print(crime['Year'].dtype)
crime['Year']=pd.to_datetime(crime['Year'],format='%Y')
print(crime['Year'].dtype)
print('----------3----------')
#4. Use Year as the index of the dataset
crime=crime.set_index('Year',drop=True)
print(crime.head())
print('----------4----------')
#5. Delete the Total column
#Method 1
crime.drop('Total',axis=1,inplace=True)
#Method 2
#del crime['Total']
print(crime.head())
print('----------5----------')
#6. Sum the data by decade ('10AS' means 10-year bins anchored at the start of the year; pandas 2.2+ spells it '10YS')
crimes=crime.resample('10AS').sum()
# Population is not additive across years, so use the maximum of each decade instead of the sum
population = crime['Population'].resample('10AS').max()
crimes['Population'] = population
print(crimes)
print('----------6----------')
#7. Print the most dangerous time in history: the year in which each column peaked
print(crime.idxmax())

   Year  Population    Total  Violent  Property  Murder  Forcible_Rape  \
0  1960   179323175  3384200   288460   3095700    9110          17190   
1  1961   182992000  3488000   289390   3198600    8740          17220   
2  1962   185771000  3752200   301510   3450700    8530          17550   
3  1963   188483000  4109500   316970   3792500    8640          17650   
4  1964   191141000  4564600   364220   4200400    9360          21420   

   Robbery  Aggravated_assault  Burglary  Larceny_Theft  Vehicle_Theft  
0   107840              154320    912100        1855400         328200  
1   106670              156760    949600        1913000         336000  
2   110860              164570    994300        2089600         366800  
3   116470              174210   1086400        2297800         408300  
4   130390              203050   1213200        2514400         472800  
----------1----------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 55 entries, 0 to 54
Data columns (total 12 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   Year                55 non-null     int64
 1   Population          55 non-null     int64
 2   Total               55 non-null     int64
 3   Violent             55 non-null     int64
 4   Property            55 non-null     int64
 5   Murder              55 non-null     int64
 6   Forcible_Rape       55 non-null     int64
 7   Robbery             55 non-null     int64
 8   Aggravated_assault  55 non-null     int64
 9   Burglary            55 non-null     int64
 10  Larceny_Theft       55 non-null     int64
 11  Vehicle_Theft       55 non-null     int64
dtypes: int64(12)
memory usage: 5.3 KB
None
----------2----------
int64
datetime64[ns]
----------3----------
            Population    Total  Violent  Property  Murder  Forcible_Rape  \
Year                                                                        
1960-01-01   179323175  3384200   288460   3095700    9110          17190   
1961-01-01   182992000  3488000   289390   3198600    8740          17220   
1962-01-01   185771000  3752200   301510   3450700    8530          17550   
1963-01-01   188483000  4109500   316970   3792500    8640          17650   
1964-01-01   191141000  4564600   364220   4200400    9360          21420   

            Robbery  Aggravated_assault  Burglary  Larceny_Theft  \
Year                                                               
1960-01-01   107840              154320    912100        1855400   
1961-01-01   106670              156760    949600        1913000   
1962-01-01   110860              164570    994300        2089600   
1963-01-01   116470              174210   1086400        2297800   
1964-01-01   130390              203050   1213200        2514400   

            Vehicle_Theft  
Year                       
1960-01-01         328200  
1961-01-01         336000  
1962-01-01         366800  
1963-01-01         408300  
1964-01-01         472800  
----------4----------
            Population  Violent  Property  Murder  Forcible_Rape  Robbery  \
Year                                                                        
1960-01-01   179323175   288460   3095700    9110          17190   107840   
1961-01-01   182992000   289390   3198600    8740          17220   106670   
1962-01-01   185771000   301510   3450700    8530          17550   110860   
1963-01-01   188483000   316970   3792500    8640          17650   116470   
1964-01-01   191141000   364220   4200400    9360          21420   130390   

            Aggravated_assault  Burglary  Larceny_Theft  Vehicle_Theft  
Year                                                                    
1960-01-01              154320    912100        1855400         328200  
1961-01-01              156760    949600        1913000         336000  
1962-01-01              164570    994300        2089600         366800  
1963-01-01              174210   1086400        2297800         408300  
1964-01-01              203050   1213200        2514400         472800  
----------5----------
            Population   Violent   Property  Murder  Forcible_Rape  Robbery  \
Year                                                                          
1960-01-01   201385000   4134930   45160900  106180         236720  1633510   
1970-01-01   220099000   9607930   91383800  192230         554570  4159020   
1980-01-01   248239000  14074328  117048900  206439         865639  5383109   
1990-01-01   272690813  17527048  119053499  211664         998827  5748930   
2000-01-01   307006550  13968056  100944369  163068         922499  4230366   
2010-01-01   318857056   6072017   44095950   72867         421059  1749809   

            Aggravated_assault  Burglary  Larceny_Theft  Vehicle_Theft  
Year                                                                    
1960-01-01             2158520  13321100       26547700        5292100  
1970-01-01             4702120  28486000       53157800        9739900  
1980-01-01             7619130  33073494       72040253       11935411  
1990-01-01            10568963  26750015       77679366       14624418  
2000-01-01             8652124  21565176       67970291       11412834  
2010-01-01             3764142  10125170       30401698        3569080  
----------6----------
Population           2014-01-01
Violent              1992-01-01
Property             1991-01-01
Murder               1991-01-01
Forcible_Rape        1992-01-01
Robbery              1991-01-01
Aggravated_assault   1993-01-01
Burglary             1980-01-01
Larceny_Theft        1991-01-01
Vehicle_Theft        1991-01-01
dtype: datetime64[ns]

5. Merging Data

# 1. Build the test data
raw_data_1 = {
    'subject_id': ['1', '2', '3', '4', '5'],
    'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'last_name': ['Anderson', 'Ackerman', 'Ali', 'Aoni', 'Atiches']}
raw_data_2 = {
    'subject_id': ['4', '5', '6', '7', '8'],
    'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'last_name': ['Bonder', 'Black', 'Balwner', 'Brice', 'Btisan']}
raw_data_3 = {
    'subject_id': ['1', '2', '3', '4', '5', '7', '8', '9', '10', '11'],
    'test_id': [51, 15, 15, 61, 16, 14, 15, 1, 61, 16]}
print('----------1----------')
# 2. Load the data into DataFrames
data1=pd.DataFrame(raw_data_1,columns=['subject_id', 'first_name', 'last_name'])
print(data1)
data2=pd.DataFrame(raw_data_2,columns=['subject_id', 'first_name', 'last_name'])
print('---------------------')
print(data2)
data3 = pd.DataFrame(raw_data_3, columns=['subject_id', 'test_id'])
print('---------------------')
print(data3)
print('----------2----------')
# 3. Concatenate data1 and data2 along rows (stack them vertically)
all_data=pd.concat([data1,data2])
print(all_data)
print('----------3----------')
# 4. Concatenate data1 and data2 along columns (place them side by side)
all_data2=pd.concat([data1,data2],axis=1)
print(all_data2)
print('----------4----------')
# 5. Merge all_data and data3 on subject_id (default inner join)
print(pd.merge(all_data, data3, on='subject_id'))
print('----------5----------')
# 6. Inner-join data1 and data2 on subject_id: keep only the ids present in both
print(pd.merge(data1,data2,on='subject_id',how='inner'))
print('----------6----------')
# 7. Outer-join data1 and data2 on subject_id: keep every id from either side
print(pd.merge(data1, data2, on='subject_id', how='outer'))

----------1----------
  subject_id first_name last_name
0          1       Alex  Anderson
1          2        Amy  Ackerman
2          3      Allen       Ali
3          4      Alice      Aoni
4          5     Ayoung   Atiches
---------------------
  subject_id first_name last_name
0          4      Billy    Bonder
1          5      Brian     Black
2          6       Bran   Balwner
3          7      Bryce     Brice
4          8      Betty    Btisan
---------------------
  subject_id  test_id
0          1       51
1          2       15
2          3       15
3          4       61
4          5       16
5          7       14
6          8       15
7          9        1
8         10       61
9         11       16
----------2----------
  subject_id first_name last_name
0          1       Alex  Anderson
1          2        Amy  Ackerman
2          3      Allen       Ali
3          4      Alice      Aoni
4          5     Ayoung   Atiches
0          4      Billy    Bonder
1          5      Brian     Black
2          6       Bran   Balwner
3          7      Bryce     Brice
4          8      Betty    Btisan
----------3----------
  subject_id first_name last_name subject_id first_name last_name
0          1       Alex  Anderson          4      Billy    Bonder
1          2        Amy  Ackerman          5      Brian     Black
2          3      Allen       Ali          6       Bran   Balwner
3          4      Alice      Aoni          7      Bryce     Brice
4          5     Ayoung   Atiches          8      Betty    Btisan
----------4----------
  subject_id first_name last_name  test_id
0          1       Alex  Anderson       51
1          2        Amy  Ackerman       15
2          3      Allen       Ali       15
3          4      Alice      Aoni       61
4          4      Billy    Bonder       61
5          5     Ayoung   Atiches       16
6          5      Brian     Black       16
7          7      Bryce     Brice       14
8          8      Betty    Btisan       15
----------5----------
  subject_id first_name_x last_name_x first_name_y last_name_y
0          4        Alice        Aoni        Billy      Bonder
1          5       Ayoung     Atiches        Brian       Black
----------6----------
  subject_id first_name_x last_name_x first_name_y last_name_y
0          1         Alex    Anderson          NaN         NaN
1          2          Amy    Ackerman          NaN         NaN
2          3        Allen         Ali          NaN         NaN
3          4        Alice        Aoni        Billy      Bonder
4          5       Ayoung     Atiches        Brian       Black
5          6          NaN         NaN         Bran     Balwner
6          7          NaN         NaN        Bryce       Brice
7          8          NaN         NaN        Betty      Btisan
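In the outer-join output it can be hard to tell at a glance which rows exist on only one side. A minimal sketch of one way to make that explicit, using the `indicator=True` option of `pd.merge` on a trimmed-down, hypothetical version of data1 and data2 (the names `left`, `right` and the suffixes are illustrative and not part of the exercise):

```python
import pandas as pd

# Hypothetical, shortened stand-ins for data1 and data2
left = pd.DataFrame({"subject_id": ["1", "4", "5"],
                     "first_name": ["Alex", "Alice", "Ayoung"]})
right = pd.DataFrame({"subject_id": ["4", "5", "6"],
                      "first_name": ["Billy", "Brian", "Bran"]})

# indicator=True adds a "_merge" column: left_only, right_only, or both
merged = pd.merge(left, right, on="subject_id", how="outer",
                  suffixes=("_left", "_right"), indicator=True)
print(merged)

# Rows that appear on exactly one side
one_sided = merged[merged["_merge"] != "both"]
print(one_sided["subject_id"].tolist())
```

Custom `suffixes` replace the default `_x` / `_y` column endings seen in the output above, which tends to make the merged columns easier to read.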

6. Data Statistics

pd.read_table is the pandas function for reading delimited text files. It loads the file contents into a DataFrame, ready for further analysis and processing.
Syntax (abbreviated; the full parameter list depends on the pandas version):
pd.read_table(filepath_or_buffer, sep='\t', delimiter=None, header='infer', names=None, index_col=None, usecols=None, dtype=None, parse_dates=False, ...)
Main parameters:
filepath_or_buffer: path to the text file, or a file-like object, to read from.
sep: the column separator; defaults to the tab character \t.
delimiter: an alias for sep.
header: which row to use as the column names; the default 'infer' detects it automatically.
names: a list of column names to use.
index_col: which column to use as the row index, given as a column name or a column position.
usecols: which columns to read, given as column names or column positions.
parse_dates: which columns to parse as dates, given as column names, column positions, or a list of them.
dtype: the data type for each column.
The remaining parameters control file format, encoding, missing-value handling, and so on.
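As a small illustration of the parameters listed above, here is a sketch that reads an in-memory, tab-separated snippet instead of a file. The sample text and its column names are made up for the example:

```python
import io
import pandas as pd

# Hypothetical tab-separated content standing in for a file on disk
text = "subject_id\tfirst_name\ttest_id\n1\tAlex\t51\n2\tAmy\t15\n"

df = pd.read_table(
    io.StringIO(text),                    # any file-like object works
    usecols=["subject_id", "test_id"],    # read only these columns
    index_col="subject_id",               # use this column as the row index
    dtype={"test_id": "float64"},         # force the column type
)
print(df)
print(df.dtypes)
```

Because `sep` defaults to a tab, it does not need to be passed here; for whitespace-aligned files such as wind.data a regular expression like `r"\s+"` is passed instead.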

import datetime
csv_path6="./pandas_data/wind.data"
# 1. Load the data. r"\s+" splits on one or more whitespace characters, and
#    parse_dates=[[0, 1, 2]] combines columns 0, 1 and 2 into a single datetime column.
#    Note: this nested-list form of parse_dates is deprecated in newer pandas versions (2.2+).
data = pd.read_table(csv_path6, sep=r"\s+", parse_dates=[[0, 1, 2]])
print(data.head())
print('----------1----------')
# 2. Fix the wrong years created in step 1: two-digit years such as 61 were parsed as 2061
def fix_year(x):
    # Any year after 1989 cannot be right for this dataset, so move it back a century
    year=x.year-100 if x.year > 1989 else x.year
    return datetime.date(year,x.month,x.day)
data['Yr_Mo_Dy']=data['Yr_Mo_Dy'].apply(fix_year)
print(data.head())
print('----------2----------')
# 3. Convert Yr_Mo_Dy to datetime64[ns] and set it as the index
data['Yr_Mo_Dy']=pd.to_datetime(data['Yr_Mo_Dy'])
data.set_index('Yr_Mo_Dy',inplace=True)
print(data)
print('----------3----------')
# 4. Count the missing values for each location (each column)
print(data.isnull().sum())
print('----------4----------')
# 5. Count the non-missing values for each location. data.isnull() is a boolean frame marking
#    missing cells, and data.isnull().sum() sums it per column to give the missing count.
print(data.shape[0]-data.isnull().sum())
print('----------5----------')
# 6. Compute the overall mean
# data.mean() gives the mean of each column; data.mean().mean() then averages those
# per-column means into a single number
print(data.mean().mean())
print('----------6----------')
# 7. Build a DataFrame with the min, max, mean and standard deviation of each location
loc_stats=pd.DataFrame()
loc_stats['min']=data.min()
loc_stats['max']=data.max()
loc_stats['mean'] = data.mean()
loc_stats['std'] = data.std()
print(loc_stats)
print('----------7----------')
# 8. Build a DataFrame with the min, max, mean and standard deviation across all locations for each day
day_stats = pd.DataFrame()
day_stats['min'] = data.min(axis=1)
day_stats['max'] = data.max(axis=1)
day_stats['mean'] = data.mean(axis=1)
day_stats['std'] = data.std(axis=1)
print(day_stats.head())

    Yr_Mo_Dy    RPT    VAL    ROS    KIL    SHA   BIR    DUB    CLA    MUL  \
0 2061-01-01  15.04  14.96  13.17   9.29    NaN  9.87  13.67  10.25  10.83   
1 2061-01-02  14.71    NaN  10.83   6.50  12.62  7.67  11.50  10.04   9.79   
2 2061-01-03  18.50  16.88  12.33  10.13  11.17  6.17  11.25    NaN   8.50   
3 2061-01-04  10.58   6.63  11.75   4.58   4.54  2.88   8.63   1.79   5.83   
4 2061-01-05  13.33  13.25  11.42   6.17  10.71  8.21  11.92   6.54  10.92   

     CLO    BEL    MAL  
0  12.58  18.50  15.04  
1   9.67  17.54  13.83  
2   7.67  12.75  12.71  
3   5.88   5.46  10.88  
4  10.34  12.92  11.83  
----------1----------
     Yr_Mo_Dy    RPT    VAL    ROS    KIL    SHA   BIR    DUB    CLA    MUL  \
0  1961-01-01  15.04  14.96  13.17   9.29    NaN  9.87  13.67  10.25  10.83   
1  1961-01-02  14.71    NaN  10.83   6.50  12.62  7.67  11.50  10.04   9.79   
2  1961-01-03  18.50  16.88  12.33  10.13  11.17  6.17  11.25    NaN   8.50   
3  1961-01-04  10.58   6.63  11.75   4.58   4.54  2.88   8.63   1.79   5.83   
4  1961-01-05  13.33  13.25  11.42   6.17  10.71  8.21  11.92   6.54  10.92   

     CLO    BEL    MAL  
0  12.58  18.50  15.04  
1   9.67  17.54  13.83  
2   7.67  12.75  12.71  
3   5.88   5.46  10.88  
4  10.34  12.92  11.83  
----------2----------
              RPT    VAL    ROS    KIL    SHA    BIR    DUB    CLA    MUL  \
Yr_Mo_Dy                                                                    
1961-01-01  15.04  14.96  13.17   9.29    NaN   9.87  13.67  10.25  10.83   
1961-01-02  14.71    NaN  10.83   6.50  12.62   7.67  11.50  10.04   9.79   
1961-01-03  18.50  16.88  12.33  10.13  11.17   6.17  11.25    NaN   8.50   
1961-01-04  10.58   6.63  11.75   4.58   4.54   2.88   8.63   1.79   5.83   
1961-01-05  13.33  13.25  11.42   6.17  10.71   8.21  11.92   6.54  10.92   
...           ...    ...    ...    ...    ...    ...    ...    ...    ...   
1978-12-27  17.58  16.96  17.62   8.08  13.21  11.67  14.46  15.59  14.04   
1978-12-28  13.21   5.46  13.46   5.00   8.12   9.42  14.33  16.25  15.25   
1978-12-29  14.00  10.29  14.42   8.71   9.71  10.54  19.17  12.46  14.50   
1978-12-30  18.50  14.04  21.29   9.13  12.75   9.71  18.08  12.87  12.46   
1978-12-31  20.33  17.41  27.29   9.59  12.08  10.13  19.25  11.63  11.58   

              CLO    BEL    MAL  
Yr_Mo_Dy                         
1961-01-01  12.58  18.50  15.04  
1961-01-02   9.67  17.54  13.83  
1961-01-03   7.67  12.75  12.71  
1961-01-04   5.88   5.46  10.88  
1961-01-05  10.34  12.92  11.83  
...           ...    ...    ...  
1978-12-27  14.00  17.21  40.08  
1978-12-28  18.05  21.79  41.46  
1978-12-29  16.42  18.88  29.58  
1978-12-30  12.12  14.67  28.79  
1978-12-31  11.38  12.08  22.08  

[6574 rows x 12 columns]
----------3----------
RPT    6
VAL    3
ROS    2
KIL    5
SHA    2
BIR    0
DUB    3
CLA    2
MUL    3
CLO    1
BEL    0
MAL    4
dtype: int64
----------4----------
RPT    6568
VAL    6571
ROS    6572
KIL    6569
SHA    6572
BIR    6574
DUB    6571
CLA    6572
MUL    6571
CLO    6573
BEL    6574
MAL    6570
dtype: int64
----------5----------
10.227982360836924
----------6----------
      min    max       mean       std
RPT  0.67  35.80  12.362987  5.618413
VAL  0.21  33.37  10.644314  5.267356
ROS  1.50  33.84  11.660526  5.008450
KIL  0.00  28.46   6.306468  3.605811
SHA  0.13  37.54  10.455834  4.936125
BIR  0.00  26.16   7.092254  3.968683
DUB  0.00  30.37   9.797343  4.977555
CLA  0.00  31.08   8.495053  4.499449
MUL  0.00  25.88   8.493590  4.166872
CLO  0.04  28.21   8.707332  4.503954
BEL  0.13  42.38  13.121007  5.835037
MAL  0.67  42.54  15.599079  6.699794
----------7----------
             min    max       mean       std
Yr_Mo_Dy                                    
1961-01-01  9.29  18.50  13.018182  2.808875
1961-01-02  6.50  17.54  11.336364  3.188994
1961-01-03  6.17  18.50  11.641818  3.681912
1961-01-04  1.79  11.75   6.619167  3.198126
1961-01-05  6.17  13.33  10.630000  2.445356
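Step 1 of this section lets `parse_dates=[[0, 1, 2]]` combine the three date columns and then repairs the century afterwards. An alternative sketch is to read the columns as plain integers and assemble the date explicitly, which avoids both the 2061 problem and the nested `parse_dates` form. The three-row sample below is hypothetical, and it assumes the file's date columns are named `Yr`, `Mo` and `Dy` with two-digit years from the 1900s:

```python
import io
import pandas as pd

# Hypothetical sample in the same whitespace-separated layout as wind.data
sample = io.StringIO(
    "Yr Mo Dy   RPT   VAL\n"
    "61  1  1 15.04 14.96\n"
    "61  1  2 14.71   NaN\n"
    "78 12 31 20.33 17.41\n"
)
df = pd.read_csv(sample, sep=r"\s+")

# pd.to_datetime accepts a frame with year/month/day columns
parts = pd.DataFrame({"year": 1900 + df["Yr"],   # assumption: all years are 19xx
                      "month": df["Mo"],
                      "day": df["Dy"]})
df.index = pd.DatetimeIndex(pd.to_datetime(parts), name="Yr_Mo_Dy")
df = df.drop(columns=["Yr", "Mo", "Dy"])
print(df)
```

The resulting frame already has a `datetime64[ns]` index, so steps 2 and 3 would not be needed with this approach.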

7. Data Visualization

import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
csv_path7="./pandas_data/train.csv"
# 1. Load the data
titanic=pd.read_csv(csv_path7)
print(titanic.head())
print('----------1----------')
# 2. Set PassengerId as the index
titanic.set_index('PassengerId',inplace=True)
print(titanic.head())
print('----------2----------')
# 3. Count male and female passengers
male_sum=(titanic['Sex']=='male').sum()
female_sum=(titanic['Sex']=='female').sum()
print(male_sum,female_sum)
print('----------3----------')
# 4. Scatter plot of fare against age, colored by sex.
#    hue='Sex' draws each sex in its own color; fit_reg=False hides the regression line
lm=sns.lmplot(x='Age',y='Fare',data=titanic,hue='Sex',fit_reg=False)
lm.set(title='Fare x Age')
# Get the axes objects of the figure
axes=lm.axes
# Set the lower limit of the y-axis (Fare) to -5
axes[0,0].set_ylim(-5,)
# Set the x-axis (Age) range to -5 through 85
axes[0,0].set_xlim(-5,85)
plt.show()
print('----------4----------')
# 5. Count the survivors
print(titanic['Survived'].sum())
print('----------5----------')
# 6. Histogram of the fares
data=titanic['Fare'].sort_values(ascending=False)
print(data)
binsVal=np.arange(0,600,10)
plt.hist(data,bins=binsVal)
plt.xlabel('Fare')
# The y-axis is the number of data points whose fare falls in each bin
plt.ylabel('Frequency')
plt.title('Fare Paid Histogram')
plt.show()
print('----------6----------')

   PassengerId  Survived  Pclass  \
0            1         0       3   
1            2         1       1   
2            3         1       3   
3            4         1       1   
4            5         0       3   

                                               Name     Sex   Age  SibSp  \
0                            Braund, Mr. Owen Harris    male  22.0      1   
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
2                             Heikkinen, Miss. Laina  female  26.0      0   
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
4                           Allen, Mr. William Henry    male  35.0      0   

   Parch            Ticket     Fare Cabin Embarked  
0      0         A/5 21171   7.2500   NaN        S  
1      0          PC 17599  71.2833   C85        C  
2      0  STON/O2. 3101282   7.9250   NaN        S  
3      0            113803  53.1000  C123        S  
4      0            373450   8.0500   NaN        S  
----------1----------
             Survived  Pclass  \
PassengerId                     
1                   0       3   
2                   1       1   
3                   1       3   
4                   1       1   
5                   0       3   

                                                          Name     Sex   Age  \
PassengerId                                                                    
1                                      Braund, Mr. Owen Harris    male  22.0   
2            Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0   
3                                       Heikkinen, Miss. Laina  female  26.0   
4                 Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0   
5                                     Allen, Mr. William Henry    male  35.0   

             SibSp  Parch            Ticket     Fare Cabin Embarked  
PassengerId                                                          
1                1      0         A/5 21171   7.2500   NaN        S  
2                1      0          PC 17599  71.2833   C85        C  
3                0      0  STON/O2. 3101282   7.9250   NaN        S  
4                1      0            113803  53.1000  C123        S  
5                0      0            373450   8.0500   NaN        S  
----------2----------
577 314
----------3----------

[Figure: scatter plot "Fare x Age", Fare against Age with points colored by Sex]

----------4----------
342
----------5----------
PassengerId
259    512.3292
738    512.3292
680    512.3292
89     263.0000
28     263.0000
         ...   
634      0.0000
414      0.0000
823      0.0000
733      0.0000
675      0.0000
Name: Fare, Length: 891, dtype: float64

[Figure: histogram of Fare (x-axis: Fare, y-axis: Frequency)]

----------6----------

The histogram shows that most fares fall in the 0-100 price range.
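The same conclusion can be checked numerically rather than read off the plot. The sketch below uses `np.histogram` with the same bin edges as the histogram step, applied to a handful of fare values copied from the outputs above; it is only a hypothetical sample, not the full `Fare` column:

```python
import numpy as np
import pandas as pd

# A few fares taken from the printed output (illustrative sample only)
fares = pd.Series([7.25, 71.2833, 7.925, 53.1, 8.05, 512.3292, 263.0])

# Same bin edges as the plt.hist call: 0, 10, 20, ..., 590
bins = np.arange(0, 600, 10)
counts, edges = np.histogram(fares, bins=bins)
print(counts[:10])            # counts for the bins covering 0-100

# Share of fares below 100
share_under_100 = (fares < 100).mean()
print(f"{share_under_100:.0%} of the sample fares are below 100")
```

On the real data, replacing `fares` with `titanic['Fare']` (or `titantic['Fare']`, depending on the variable name in use) would give the actual share.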

8. Creating DataFrames

# 1. Build the data
raw_data = {"name": ['Bulbasaur', 'Charmander','Squirtle','Caterpie'],"evolution": ['Ivysaur','Charmeleon','Wartortle','Metapod'],"type": ['grass', 'fire', 'water', 'bug'],"hp": [45, 39, 44, 45],"pokedex": ['yes', 'no','yes','no']}
pokemon = pd.DataFrame(raw_data)
print(pokemon.head())
print('----------1----------')
# 2. Change the column order
pokemon=pokemon[['name','type','hp','evolution','pokedex']]
print(pokemon)
print('----------2----------')
# 3. Add a place column
pokemon['place']=['park','street','lake','forest']
print(pokemon)
print('----------3----------')
# 4. Check the data type of each column
# Method 1
print(pokemon.dtypes)
# Method 2: info() prints its report itself and returns None,
# which is why a trailing "None" shows up when it is wrapped in print()
print(pokemon.info())

         name   evolution   type  hp pokedex
0   Bulbasaur     Ivysaur  grass  45     yes
1  Charmander  Charmeleon   fire  39      no
2    Squirtle   Wartortle  water  44     yes
3    Caterpie     Metapod    bug  45      no
----------1----------
         name   type  hp   evolution pokedex
0   Bulbasaur  grass  45     Ivysaur     yes
1  Charmander   fire  39  Charmeleon      no
2    Squirtle  water  44   Wartortle     yes
3    Caterpie    bug  45     Metapod      no
----------2----------
         name   type  hp   evolution pokedex   place
0   Bulbasaur  grass  45     Ivysaur     yes    park
1  Charmander   fire  39  Charmeleon      no  street
2    Squirtle  water  44   Wartortle     yes    lake
3    Caterpie    bug  45     Metapod      no  forest
----------3----------
name         object
type         object
hp            int64
evolution    object
pokedex      object
place        object
dtype: object
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   name       4 non-null      object
 1   type       4 non-null      object
 2   hp         4 non-null      int64 
 3   evolution  4 non-null      object
 4   pokedex    4 non-null      object
 5   place      4 non-null      object
dtypes: int64(1), object(5)
memory usage: 320.0+ bytes
None
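Reordering columns and adding a column can also be done without modifying the original frame. The following sketch, on a two-row hypothetical subset of the Pokemon data, uses `reindex(columns=...)` and `assign`, both of which return new DataFrames:

```python
import pandas as pd

# Two-row subset of the exercise data, for illustration only
pokemon = pd.DataFrame({
    "name": ["Bulbasaur", "Charmander"],
    "evolution": ["Ivysaur", "Charmeleon"],
    "hp": [45, 39],
})

# reindex(columns=...) returns a copy with the requested column order
reordered = pokemon.reindex(columns=["name", "hp", "evolution"])

# assign(...) returns a copy with the extra column; pokemon itself is unchanged
with_place = reordered.assign(place=["park", "street"])
print(with_place)
print(list(pokemon.columns))
```

This style is convenient when the original frame is still needed later, at the cost of holding an extra copy in memory.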

9. Time Series

is_unique is an attribute of a pandas Series (and of an Index) that reports whether all of its values are unique:
It returns True if every value is unique.
It returns False if any value is repeated.
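A short sketch of how `is_unique` behaves on a date index, and one possible way to drop repeated dates when it returns False. The dates and prices are made up for the example:

```python
import pandas as pd

# Hypothetical prices with one repeated date
idx = pd.to_datetime(["2014-07-01", "2014-07-02", "2014-07-02"])
prices = pd.Series([93.52, 93.48, 93.50], index=idx)

print(prices.index.is_unique)        # the repeated 2014-07-02 makes this False

# duplicated() marks every repeat after the first occurrence
dup_mask = prices.index.duplicated(keep="first")
deduped = prices[~dup_mask]
print(deduped.index.is_unique)
```

Whether to keep the first or the last occurrence of a repeated date depends on the data source, so `keep="first"` here is only an illustrative choice.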

csv_path9="./pandas_data/Apple_stock.csv"
# 1. Load the data
apple =pd.read_csv(csv_path9)
print(apple.head())
print('----------1----------')
# 2. Check the data type of each column
print(apple.dtypes)
print('----------2----------')
# 3. Convert Date to a datetime type
apple['Date']=pd.to_datetime(apple['Date'])
print(apple['Date'].dtype)
print('----------3----------')
# 4. Set Date as the index
apple.set_index('Date',inplace=True)
print(apple.head())
print('----------4----------')
# 5. Check whether any dates are duplicated
print(apple.index.is_unique)
print('----------5----------')
# 6. Sort the index in ascending order (sort_index returns a new DataFrame, so assign the result)
apple = apple.sort_index(ascending=True)
print('----------6----------')
# 7. Get the last trading day of each month
# 'BM' means business month end: B = business day, M = month end
# (newer pandas versions, 2.2+, spell these aliases 'BME' and 'ME')
# last() is the aggregation: it picks the last data point in each time window
apple_month = apple.resample('BM').last()
print(apple_month.head())
print('----------7----------')
# 8. Number of days between the earliest and the latest date in the dataset
print((apple.index.max()-apple.index.min()).days)
print('----------8----------')
# 9. Number of months covered by the dataset
months_count = apple.resample('M').count()
# Method 1
print(months_count.shape[0])
# Method 2
print(len(months_count))
print('----------9----------')
# 10. Plot Adj Close over time (a line chart of Apple's adjusted closing price)
appl_open = apple['Adj Close'].plot(title = "Apple Stock")
# Get the Figure object that holds the line chart
fig = appl_open.get_figure()
fig.set_size_inches(13.5, 9)
plt.show()

         Date   Open   High    Low  Close    Volume  Adj Close
0  2014-07-08  96.27  96.80  93.92  95.35  65130000      95.35
1  2014-07-07  94.14  95.99  94.10  95.97  56305400      95.97
2  2014-07-03  93.67  94.10  93.20  94.03  22891800      94.03
3  2014-07-02  93.87  94.06  93.09  93.48  28420900      93.48
4  2014-07-01  93.52  94.07  93.13  93.52  38170200      93.52
----------1----------
Date          object
Open         float64
High         float64
Low          float64
Close        float64
Volume         int64
Adj Close    float64
dtype: object
----------2----------
datetime64[ns]
----------3----------
             Open   High    Low  Close    Volume  Adj Close
Date                                                       
2014-07-08  96.27  96.80  93.92  95.35  65130000      95.35
2014-07-07  94.14  95.99  94.10  95.97  56305400      95.97
2014-07-03  93.67  94.10  93.20  94.03  22891800      94.03
2014-07-02  93.87  94.06  93.09  93.48  28420900      93.48
2014-07-01  93.52  94.07  93.13  93.52  38170200      93.52
----------4----------
True
----------5----------
----------6----------
             Open   High    Low  Close    Volume  Adj Close
Date                                                       
1980-12-31  34.25  34.25  34.13  34.13   8937600       0.53
1981-01-30  28.50  28.50  28.25  28.25  11547200       0.44
1981-02-27  26.50  26.75  26.50  26.50   3690400       0.41
1981-03-31  24.75  24.75  24.50  24.50   3998400       0.38
1981-04-30  28.38  28.62  28.38  28.38   3152800       0.44
----------7----------
12261
----------8----------
404
404
----------9----------

[Figure: line chart "Apple Stock", Adj Close over time]
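The month-end and month-count steps rely on resample frequency aliases, whose spelling differs between pandas versions. A version-independent sketch is to group by the monthly period of the index instead. The four-row series below is hypothetical (two of its prices are borrowed from the output above, the June values are made up):

```python
import pandas as pd

# Hypothetical closing prices around a month boundary
idx = pd.to_datetime(["2014-07-08", "2014-06-27", "2014-07-01", "2014-06-30"])
close = pd.Series([95.35, 91.98, 93.52, 92.93], index=idx).sort_index()

# Group by calendar month and take the last observation in each month
last_per_month = close.groupby(close.index.to_period("M")).last()
print(last_per_month)

# Number of months that actually contain data
n_months = len(last_per_month)

# Days between the earliest and latest date
span_days = (close.index.max() - close.index.min()).days
print(n_months, span_days)
```

One difference to keep in mind: `resample` also emits rows for months with no data at all, while this `groupby` only counts months that have at least one observation.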

10. Deleting Data

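csv_path10="./pandas_data/iris.csv"
# 1. Load the data (the file has no header row, so the first record ends up as the column names)
iris =pd.read_csv(csv_path10)
print(iris.head())
print('----------1----------')
# 2. Add column names by re-reading the file with names=
iris = pd.read_csv(csv_path10, names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class'])
print(iris.head())
print('----------2----------')
# 3. Check for missing values
print(iris.isnull().sum())
print('----------3----------')
# 4. Set rows 10 through 19 of the petal_length column to NaN
iris.iloc[10:20,2:3]=np.nan
print(iris.head(20))
print('----------4----------')
# 5. Replace the missing values with 1.0
# (assigning the result back avoids the chained inplace call, which newer pandas versions warn about)
iris['petal_length']=iris['petal_length'].fillna(1)
print(iris.head(20))
print('----------5----------')
# 6. Drop the class column
# Method 1
iris.drop('class',axis=1,inplace=True)
print(iris.head())
# Method 2
# del iris['class']
# print(iris.head())
print('----------6----------')
# 7. Set the first three rows of the dataset to NaN
iris.iloc[0:3,:]=np.nan
print(iris.head())
print('----------7----------')
# 8. Drop the rows that contain NaN
iris=iris.dropna(how='any')
print('----------8----------')
# 9. Reset the index
# reset_index returns a new DataFrame, so the result has to be assigned back.
# The recorded output below came from a run without the assignment, so its index still starts at 3.
iris=iris.reset_index(drop=True)
print(iris.head())
print('----------9----------')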

   5.1  3.5  1.4  0.2  Iris-setosa
0  4.9  3.0  1.4  0.2  Iris-setosa
1  4.7  3.2  1.3  0.2  Iris-setosa
2  4.6  3.1  1.5  0.2  Iris-setosa
3  5.0  3.6  1.4  0.2  Iris-setosa
4  5.4  3.9  1.7  0.4  Iris-setosa
----------1----------
   sepal_length  sepal_width  petal_length  petal_width        class
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa
----------2----------
sepal_length    0
sepal_width     0
petal_length    0
petal_width     0
class           0
dtype: int64
----------3----------
    sepal_length  sepal_width  petal_length  petal_width        class
0            5.1          3.5           1.4          0.2  Iris-setosa
1            4.9          3.0           1.4          0.2  Iris-setosa
2            4.7          3.2           1.3          0.2  Iris-setosa
3            4.6          3.1           1.5          0.2  Iris-setosa
4            5.0          3.6           1.4          0.2  Iris-setosa
5            5.4          3.9           1.7          0.4  Iris-setosa
6            4.6          3.4           1.4          0.3  Iris-setosa
7            5.0          3.4           1.5          0.2  Iris-setosa
8            4.4          2.9           1.4          0.2  Iris-setosa
9            4.9          3.1           1.5          0.1  Iris-setosa
10           5.4          3.7           NaN          0.2  Iris-setosa
11           4.8          3.4           NaN          0.2  Iris-setosa
12           4.8          3.0           NaN          0.1  Iris-setosa
13           4.3          3.0           NaN          0.1  Iris-setosa
14           5.8          4.0           NaN          0.2  Iris-setosa
15           5.7          4.4           NaN          0.4  Iris-setosa
16           5.4          3.9           NaN          0.4  Iris-setosa
17           5.1          3.5           NaN          0.3  Iris-setosa
18           5.7          3.8           NaN          0.3  Iris-setosa
19           5.1          3.8           NaN          0.3  Iris-setosa
----------4----------
    sepal_length  sepal_width  petal_length  petal_width        class
0            5.1          3.5           1.4          0.2  Iris-setosa
1            4.9          3.0           1.4          0.2  Iris-setosa
2            4.7          3.2           1.3          0.2  Iris-setosa
3            4.6          3.1           1.5          0.2  Iris-setosa
4            5.0          3.6           1.4          0.2  Iris-setosa
5            5.4          3.9           1.7          0.4  Iris-setosa
6            4.6          3.4           1.4          0.3  Iris-setosa
7            5.0          3.4           1.5          0.2  Iris-setosa
8            4.4          2.9           1.4          0.2  Iris-setosa
9            4.9          3.1           1.5          0.1  Iris-setosa
10           5.4          3.7           1.0          0.2  Iris-setosa
11           4.8          3.4           1.0          0.2  Iris-setosa
12           4.8          3.0           1.0          0.1  Iris-setosa
13           4.3          3.0           1.0          0.1  Iris-setosa
14           5.8          4.0           1.0          0.2  Iris-setosa
15           5.7          4.4           1.0          0.4  Iris-setosa
16           5.4          3.9           1.0          0.4  Iris-setosa
17           5.1          3.5           1.0          0.3  Iris-setosa
18           5.7          3.8           1.0          0.3  Iris-setosa
19           5.1          3.8           1.0          0.3  Iris-setosa
----------5----------
   sepal_length  sepal_width  petal_length  petal_width
0           5.1          3.5           1.4          0.2
1           4.9          3.0           1.4          0.2
2           4.7          3.2           1.3          0.2
3           4.6          3.1           1.5          0.2
4           5.0          3.6           1.4          0.2
----------6----------
   sepal_length  sepal_width  petal_length  petal_width
0           NaN          NaN           NaN          NaN
1           NaN          NaN           NaN          NaN
2           NaN          NaN           NaN          NaN
3           4.6          3.1           1.5          0.2
4           5.0          3.6           1.4          0.2
----------7----------
----------8----------
   sepal_length  sepal_width  petal_length  petal_width
3           4.6          3.1           1.5          0.2
4           5.0          3.6           1.4          0.2
5           5.4          3.9           1.7          0.4
6           4.6          3.4           1.4          0.3
7           5.0          3.4           1.5          0.2
----------9----------
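Most of the cleaning methods in this section (`fillna`, `dropna`, `reset_index`) return a new object unless `inplace=True` is passed, so forgetting to assign the result silently leaves the data unchanged. A minimal sketch on a small made-up frame (the column names `a` and `b` are placeholders):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a missing value in each column
df = pd.DataFrame({"a": [np.nan, 1.0, 2.0],
                   "b": [0.1, np.nan, 0.3]})

# Fill one column and assign the result back
df["b"] = df["b"].fillna(1.0)

# Without assignment nothing changes: df still has 3 rows afterwards
df.dropna(how="any")
rows_before = len(df)

# With assignment, the NaN row is dropped and the index is renumbered from 0
cleaned = df.dropna(how="any").reset_index(drop=True)
print(cleaned)
```

Chaining the calls and assigning once, as in the last step, keeps each intermediate result from being lost.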
