
Learning Big Data, Day 59: Full and Incremental Extraction in Practice

Contents

Requirements workflow:

Requirements analysis and conventions

Assignment

Assignment 2


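Requirements workflow:

Full extraction and incremental extraction - DataX, Kettle, Sqoop ...
Scenario: a colleague from a business department, or someone on the client side, brings a new requirement to our department manager and to you.
Process: make contact => hold a meeting => confirm the requirement => implement
Requirements document (exactly what is needed)
Prototype document (a mock-up of the report, drawn with pen and paper or a drawing tool)
Report 1, the summary report - a decision report for the big boss; a summary table where each run produces the record for a single day. Required metrics:
- Total members, new members
- Valid members, share of valid members
- Churned members: members with no purchase record in the past year (inclusive), counted backwards
- Net increase in valid members
- Number of members per spending level (A >= 2000; B >= 1000 and < 2000; C >= 500 and < 1000; D >= 100 and < 500; E < 100)
- Total member spending
- Total number of member purchases
- Average spend per purchase
- Newly churned members
- 60-day member repurchase rate
- 180-day member repurchase rate
- 365-day member repurchase rate
Report 2, for marketing - a detail report (ordinary report) for colleagues in the marketing department
- Select members with more than 30 orders or total spending above 1500 as target users for telemarketing
- Fields: name, mobile number, total spending, number of purchases, city, store, payment preference (mobile, card, cash ...), diseases of interest
- The member's monthly order count and amount for the last 3 months: m1_total m1_sale m2_total m2_sale
Report 3, for marketing - the top 20 members by spending for each month from 2022-01 to 2023-12, 24 x 20 = 480 rows - for the marketing manager
- T+1 (monthly), month in yyyy-mm format
- Member name, contact details ..., spending level, order count and total for the last 30 days
- The member's spending in the 30, 60 and 90 days before the month (the hard part)
- Sort order: by total spending descending (default) or by number of purchases descending
- The report shows January 2021 by default; any month within the 2 years can be selected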
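The spending-level thresholds listed for Report 1 can be written down as a small function. This is only a sketch of the rule as stated in the requirement; the function names `spending_level` and `count_by_level` are hypothetical, and in the real pipeline the classification would more likely be a `case when` in Hive SQL.

```python
def spending_level(total_spend: float) -> str:
    """Map a member's total spending to level A-E using the Report 1 thresholds."""
    if total_spend >= 2000:
        return "A"
    if total_spend >= 1000:
        return "B"
    if total_spend >= 500:
        return "C"
    if total_spend >= 100:
        return "D"
    return "E"


def count_by_level(totals):
    """Count members per level; `totals` is an iterable of per-member totals."""
    counts = {level: 0 for level in "ABCDE"}
    for total in totals:
        counts[spending_level(total)] += 1
    return counts


# Made-up totals, for illustration only.
print(count_by_level([2500, 1000, 50, 50]))
```

Checking the boundaries from the top down means each level only needs its lower bound, which keeps the ranges from overlapping or leaving gaps.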

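Requirements analysis and conventions

The manager puts together something like a wide-table document, which makes later detail queries and metric calculations easier.
It determines which tables we need to extract:
crm.user_base_info_his    customer information table
erp.u_memcard_reg    membership card table
erp.u_sale_m    order table, 19 million rows
erp.u_sale_pay    order payment table, 12 million rows
erp.c_memcard_class_group    member group table
erp.u_memcard_reg_c    diseases-of-interest table
his.chronic_patient_info_new    test (detection) table
erp.c_org_busi    store table
# Extra code-value table, processed from a file
erp.c_code_value
# 7 full-extraction tables
Naming convention: <system name prefix>_<table name>_(full|inc)
crm.user_base_info_his    full => ods_lijinquan.crm_user_base_info_his_full
erp.u_memcard_reg    full => ods_lijinquan.erp_u_memcard_reg_full
erp.c_memcard_class_group    full => ods_lijinquan.erp_c_memcard_class_group_full
erp.u_memcard_reg_c    full => ods_lijinquan.erp_u_memcard_reg_c_full
his.chronic_patient_info_new    full => ods_lijinquan.his_chronic_patient_info_new_full
erp.c_org_busi    full => ods_lijinquan.erp_c_org_busi_full
erp.c_code_value    full, processed from a file => ods_lijinquan.c_code_value_full
# Incremental tables
erp.u_sale_m    one-off full extraction first, then a daily incremental extraction => ods_lijinquan.erp_u_sale_m_inc
erp.u_sale_pay    same as above, incremental => ods_lijinquan.erp_u_sale_pay_inc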
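The naming convention above (system prefix, table name, then `full` or `inc`) is easy to encode so that every script derives the ODS name the same way. A minimal sketch; the helper name `ods_table_name` and its `database` default are assumptions made for illustration.

```python
def ods_table_name(system: str, table: str, mode: str,
                   database: str = "ods_lijinquan") -> str:
    """Build '<database>.<system>_<table>_<mode>' where mode is 'full' or 'inc'."""
    if mode not in ("full", "inc"):
        raise ValueError(f"mode must be 'full' or 'inc', got {mode!r}")
    return f"{database}.{system}_{table}_{mode}"


print(ods_table_name("crm", "user_base_info_his", "full"))
print(ods_table_name("erp", "u_sale_m", "inc"))
```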

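Assignment

Extract the 7 full tables and deploy them to the scheduling platform as 7 scheduled jobs.
For every table, the final row count must be compared with the source table and must match.
Full-table processing
Upgraded helper script
- Automatically reads the table's column information and generates the DataX JSON file and the Hive DDL file
full.py: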
#!/bin/python3
# Generate the DataX job JSON and the Hive DDL for a full extraction of one table.
import sys

import pymysql

if len(sys.argv) != 3:
    print("Usage: python3 full.py <database name> <table name>")
    sys.exit(1)

sys_name = sys.argv[1]
table_name = sys.argv[2]

# Read the column names and types of the source table from information_schema.
db = pymysql.connect(
    host='zhiyun.pub',
    port=23306,
    user='zhiyun',
    password='zhiyun',
    database='information_schema'
)
cursor = db.cursor()
cursor.execute(
    "select column_name, data_type from information_schema.columns "
    "where table_schema=%s and table_name=%s order by ordinal_position",
    (sys_name, table_name)
)
data = cursor.fetchall()
db.close()

# Convert each MySQL type to a Hive type; anything not listed becomes string.
fields = []
for field_name, field_type in data:
    field_hive_type = "string"
    if field_type in ("int", "tinyint", "bigint"):
        field_hive_type = "int"
    if field_type in ("float", "double"):
        field_hive_type = "float"
    fields.append([field_name, field_hive_type])

print("=============== Configuring DataX ===============")
file_path = f"/zhiyun/shihaihong/jobs/{sys_name}_{table_name}_full.json"
template_path = "/zhiyun/shihaihong/jobs/template.json"
with open(template_path, "r", encoding="utf-8") as f:
    template_content = f.read()
new_content = template_content.replace("#sys_name#", sys_name)
new_content = new_content.replace("#table_name#", table_name)
# Replace the quoted column placeholder with one JSON object per column.
lines = ['{"name":"' + name + '","type":"' + hive_type + '"}' for name, hive_type in fields]
columns = ",\n".join(lines)
new_content = new_content.replace("\"#columns#\"", columns)
# Write the new job file.
with open(file_path, "w", encoding="utf-8") as ff:
    ff.write(new_content)
print("DataX job file written")

print("=============== Configuring Hive ===============")
file_path = f"/zhiyun/shihaihong/sql/{sys_name}_{table_name}_full.sql"
template_path = "/zhiyun/shihaihong/sql/template.sql"
with open(template_path, "r", encoding="utf-8") as f:
    template_content = f.read()
new_content = template_content.replace("#sys_name#", sys_name)
new_content = new_content.replace("#table_name#", table_name)
# Replace the column placeholder with one "name type" line per column.
lines = [f"{name} {hive_type}" for name, hive_type in fields]
columns = ",\n".join(lines)
new_content = new_content.replace("#columns#", columns)
# Write the new DDL file.
with open(file_path, "w", encoding="utf-8") as ff:
    ff.write(new_content)
print("Hive DDL file written")
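The type conversion in full.py sends `bigint` to Hive `int` and `double` to `float`, which can overflow or lose precision on large values. A possible variation is a lookup table that keeps the wider types. This is a sketch under the assumption that the installed DataX hdfswriter and the Hive version accept `bigint` and `double` as column types; `MYSQL_TO_HIVE` and `to_hive_type` are hypothetical names, not part of the original script.

```python
# Hypothetical mapping; extend it to match the source schema.
MYSQL_TO_HIVE = {
    "tinyint": "int",
    "smallint": "int",
    "int": "int",
    "bigint": "bigint",   # kept wide instead of being narrowed to int
    "float": "float",
    "double": "double",   # kept wide instead of being narrowed to float
}


def to_hive_type(mysql_type: str) -> str:
    """Return the Hive type for a MySQL data_type; unknown types fall back to string."""
    return MYSQL_TO_HIVE.get(mysql_type.lower(), "string")


for t in ("int", "bigint", "double", "varchar", "datetime"):
    print(t, "->", to_hive_type(t))
```

Falling back to `string` mirrors what the script already does for dates, decimals and text columns.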

JSON template:

template.json:

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "column": ["*"],
                        "connection": [
                            {
                                "jdbcUrl": [
                                    "jdbc:mysql://zhiyun.pub:23306/#sys_name#?useSSL=false"
                                ],
                                "table": [
                                    "#table_name#"
                                ]
                            }
                        ],
                        "password": "zhiyun",
                        "username": "zhiyun"
                    }
                },
                "writer": {
                    "name": "hdfswriter",
                    "parameter": {
                        "column": [
                            "#columns#"
                        ],
                        "defaultFS": "hdfs://cdh02:8020",
                        "fieldDelimiter": "\t",
                        "fileName": "#sys_name#_#table_name#_full.data",
                        "fileType": "orc",
                        "path": "/zhiyun/shihaihong/ods/#sys_name#_#table_name#_full",
                        "writeMode": "truncate"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "3"
            }
        }
    }
}
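The placeholder substitution that full.py performs on this template can be tried out in isolation. The sketch below uses a trimmed-down, hypothetical template with the same placeholders and parses the result with `json.loads`, which is a cheap way to catch a malformed job file before handing it to DataX. The function name `render` and the two sample columns are assumptions for illustration.

```python
import json

# A trimmed-down, hypothetical template using the same placeholders.
TEMPLATE = '''{
  "reader": {"jdbcUrl": "jdbc:mysql://zhiyun.pub:23306/#sys_name#?useSSL=false",
             "table": ["#table_name#"]},
  "writer": {"column": ["#columns#"],
             "path": "/zhiyun/shihaihong/ods/#sys_name#_#table_name#_full"}
}'''


def render(template, sys_name, table_name, fields):
    """Fill the placeholders and return the parsed job configuration."""
    columns = ",\n".join(json.dumps({"name": n, "type": t}) for n, t in fields)
    text = template.replace("#sys_name#", sys_name).replace("#table_name#", table_name)
    # The surrounding quotes are replaced as well, so the array ends up
    # holding JSON objects rather than one long string.
    text = text.replace('"#columns#"', columns)
    return json.loads(text)  # raises ValueError if the result is not valid JSON


job = render(TEMPLATE, "crm", "user_base_info_his", [("id", "int"), ("name", "string")])
print(json.dumps(job, indent=2))
```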
SQL template file:
create database if not exists ods_shihaihong location "/zhiyun/shihaihong/ods";
-- full table
create external table if not exists ods_shihaihong.#table_name#_full(
#columns#
)
row format delimited fields terminated by "\t"
lines terminated by "\n"
stored as orc
location "/zhiyun/shihaihong/ods/#sys_name#_#table_name#_full";
Test:
python3 python/full.py crm user_base_info_his
This generates the JSON and SQL files:
Based on the JSON and SQL files, write the shell script that fetches the data:
crm_user_base_info_his_full.sh
#!/bin/bash
echo "Generating the full-extraction job file"
mkdir -p /zhiyun/shihaihong/jobs
echo '{
"job": {
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"column": ["*"],
"connection": [
{
"jdbcUrl": [
"jdbc:mysql://zhiyun.pub:23306/crm?useSSL=false"
],
"table": [
"user_base_info_his"
]
}
],
"password": "zhiyun",
"username": "zhiyun"
}
},
"writer": {
"name": "hdfswriter",
"parameter": {
"column": [
{"name":"id","type":"int"},
{"name":"user_id","type":"string"},
{"name":"user_type","type":"string"},
{"name":"source","type":"string"},
{"name":"erp_code","type":"string"},
{"name":"active_time","type":"string"},
{"name":"name","type":"string"},
{"name":"sex","type":"string"},
{"name":"education","type":"string"},
{"name":"job","type":"string"},
{"name":"email","type":"string"},
{"name":"wechat","type":"string"},
{"name":"webo","type":"string"},
{"name":"birthday","type":"string"},
{"name":"age","type":"int"},
{"name":"id_card_no","type":"string"},
{"name":"social_insurance_no","type":"string"},
{"name":"address","type":"string"},
{"name":"last_subscribe_time","type":"int"}
],
"defaultFS": "hdfs://cdh02:8020",
"fieldDelimiter": "\t",
"fileName":
"crm_user_base_info_his_full.data",
"fileType": "orc",
"path":
"/zhiyun/shihaihong/ods/crm_user_base_info_his_full",
"writeMode": "truncate"
}
}
}
],
"setting": {
"speed": {
"channel": "3"
}
}
}
}' > /zhiyun/shihaihong/jobs/crm_user_base_info_his_full.json
echo "Starting extraction"
hadoop fs -mkdir -p /zhiyun/shihaihong/ods/crm_user_base_info_his_full
python /opt/datax/bin/datax.py /zhiyun/shihaihong/jobs/crm_user_base_info_his_full.json
echo "Creating the Hive table"
beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e '
create database if not exists ods_shihaihong location "/zhiyun/shihaihong/ods";
-- full table
create external table if not exists ods_shihaihong.user_base_info_his_full(
id int,
user_id string,
user_type string,
source string,
erp_code string,
active_time string,
name string,
sex string,
education string,
job string,
email string,
wechat string,
webo string,
birthday string,
age int,
id_card_no string,
social_insurance_no string,
address string,
last_subscribe_time int
)
row format delimited fields terminated by "\t"
lines terminated by "\n"
stored as orc
location "/zhiyun/shihaihong/ods/crm_user_base_info_his_full";
'
# echo "Loading data"
# beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e "
# load data inpath \"/zhiyun/shihaihong/tmp/crm_user_base_info_his_full/*\" overwrite into table ods_shihaihong.crm_user_base_info_his_full partition(createtime='$day');
# "
echo "Verifying data"
beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e "
select count(1) from ods_shihaihong.user_base_info_his_full;
"
echo "Extraction finished"
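Test run:
Check in my own database:
Set up the job in the production scheduling center:
After pasting the sh file into the GLUE IDE, run it once:
The run succeeded.
The other six tables are handled in the same way.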
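Since the remaining tables follow the same steps, the generator can be driven from a list rather than typed six times. A minimal sketch that only builds the command lines; the table list is taken from the requirements section, while the name `generator_commands` and the idea of printing (rather than running) the commands are assumptions for illustration.

```python
# The six remaining full-extraction tables as (system, table) pairs.
OTHER_FULL_TABLES = [
    ("erp", "u_memcard_reg"),
    ("erp", "c_memcard_class_group"),
    ("erp", "u_memcard_reg_c"),
    ("his", "chronic_patient_info_new"),
    ("erp", "c_org_busi"),
    ("erp", "c_code_value"),
]


def generator_commands(tables):
    """Return one full.py invocation per (system, table) pair."""
    return [f"python3 python/full.py {system} {table}" for system, table in tables]


for command in generator_commands(OTHER_FULL_TABLES):
    print(command)
```

Each printed line can be run by hand, or the list can be passed to `subprocess.run` once the output has been reviewed.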
Shell scripts for the other six tables:
erp_u_memcard_reg_full.sh:
#!/bin/bash
echo "Generating the full-extraction job file"
mkdir -p /zhiyun/shihaihong/jobs
echo '{
"job": {
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"column": ["*"],
"connection": [
{
"jdbcUrl": [
"jdbc:mysql://zhiyun.pub:23306/erp?useSSL=false"
],
"table": [
"u_memcard_reg"
]
}
],
"password": "zhiyun",
"username": "zhiyun"
}
},
"writer": {
"name": "hdfswriter",
"parameter": {
"column": [
{"name":"id","type":"int"},
{"name":"memcardno","type":"string"},
{"name":"busno","type":"string"},
{"name":"introducer","type":"string"},
{"name":"cardtype","type":"int"},
{"name":"cardlevel","type":"int"},
{"name":"cardpass","type":"string"},
{"name":"cardstatus","type":"int"},
{"name":"saleamount","type":"string"},
{"name":"realamount","type":"string"},
{"name":"puramount","type":"string"},
{"name":"integral","type":"string"},
{"name":"integrala","type":"string"},
{"name":"integralflag","type":"int"},
{"name":"cardholder","type":"string"},
{"name":"cardaddress","type":"string"},
{"name":"sex","type":"string"},
{"name":"tel","type":"string"},
{"name":"handset","type":"string"},
{"name":"fax","type":"string"},
{"name":"createuser","type":"string"},
{"name":"createtime","type":"string"},
{"name":"tstatus","type":"int"},
{"name":"notes","type":"string"},
{"name":"stamp","type":"string"},
{"name":"idcard","type":"string"},
{"name":"birthday","type":"string"},
{"name":"allowintegral","type":"int"},
{"name":"apptype","type":"string"},
{"name":"applytime","type":"string"},
{"name":"invalidate","type":"string"},
{"name":"lastdate","type":"string"},
{"name":"bak1","type":"string"},
{"name":"scrm_userid","type":"string"}
],
"defaultFS": "hdfs://cdh02:8020",
"fieldDelimiter": "\t",
"fileName":
"erp_u_memcard_reg_full.data",
"fileType": "orc",
"path":
"/zhiyun/shihaihong/ods/erp_u_memcard_reg_full",
"writeMode": "truncate"
}
}
}
],
"setting": {
"speed": {
"channel": "3"
}
}
}
}' > /zhiyun/shihaihong/jobs/erp_u_memcard_reg_full.json
echo "Starting extraction"
hadoop fs -mkdir -p /zhiyun/shihaihong/ods/erp_u_memcard_reg_full
python /opt/datax/bin/datax.py /zhiyun/shihaihong/jobs/erp_u_memcard_reg_full.json
echo "Creating the Hive table"
beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e '
create database if not exists ods_shihaihong location "/zhiyun/shihaihong/ods";
-- full table
create external table if not exists ods_shihaihong.u_memcard_reg_full(
id int,
memcardno string,
busno string,
introducer string,
cardtype int,
cardlevel int,
cardpass string,
cardstatus int,
saleamount string,
realamount string,
puramount string,
integral string,
integrala string,
integralflag int,
cardholder string,
cardaddress string,
sex string,
tel string,
handset string,
fax string,
createuser string,
createtime string,
tstatus int,
notes string,
stamp string,
idcard string,
birthday string,
allowintegral int,
apptype string,
applytime string,
invalidate string,
lastdate string,
bak1 string,
scrm_userid string
)
row format delimited fields terminated by "\t"
lines terminated by "\n"
stored as orc
location "/zhiyun/shihaihong/ods/erp_u_memcard_reg_full";
'
# echo "Loading data"
# beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e "
# load data inpath \"/zhiyun/shihaihong/tmp/erp_u_memcard_reg_full/*\" overwrite into table ods_shihaihong.erp_u_memcard_reg_full partition(createtime='$day');
# "
echo "Verifying data"
beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e "
select count(1) from ods_shihaihong.u_memcard_reg_full;
"
echo "Extraction finished"
erp.c_memcard_class_group:
#!/bin/bash
echo "Generating the full-extraction job file"
mkdir -p /zhiyun/shihaihong/jobs
echo '{
"job": {
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"column": ["*"],
"connection": [
{
"jdbcUrl": [
"jdbc:mysql://zhiyun.pub:23306/erp?useSSL=false"
],
"table": [
"c_memcard_class_group"
]
}
],
"password": "zhiyun",
"username": "zhiyun"
}
},
"writer": {
"name": "hdfswriter",
"parameter": {
"column": [
{"name":"createtime","type":"string"},
{"name":"createuser","type":"string"},
{"name":"groupid","type":"int"},
{"name":"groupname","type":"string"},
{"name":"notes","type":"string"},
{"name":"stamp","type":"int"}
],
"defaultFS": "hdfs://cdh02:8020",
"fieldDelimiter": "\t",
"fileName":
"erp_c_memcard_class_group_full.data",
"fileType": "orc",
"path":
"/zhiyun/shihaihong/ods/erp_c_memcard_class_group_full",
"writeMode": "truncate"
}
}
}
],
"setting": {
"speed": {
"channel": "3"
}
}
}
}' > /zhiyun/shihaihong/jobs/erp_c_memcard_class_group_full.json
echo "Starting extraction"
hadoop fs -mkdir -p /zhiyun/shihaihong/ods/erp_c_memcard_class_group_full
python /opt/datax/bin/datax.py /zhiyun/shihaihong/jobs/erp_c_memcard_class_group_full.json
echo "Creating the Hive table"
beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e '
create database if not exists ods_shihaihong location "/zhiyun/shihaihong/ods";
-- full table
create external table if not exists ods_shihaihong.c_memcard_class_group_full(
createtime string,
createuser string,
groupid int,
groupname string,
notes string,
stamp int
)
row format delimited fields terminated by "\t"
lines terminated by "\n"
stored as orc
location
"/zhiyun/shihaihong/ods/erp_c_memcard_class_group_full";
'
# echo "Loading data"
# beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e "
# load data inpath \"/zhiyun/shihaihong/tmp/erp_c_memcard_class_group_full/*\" overwrite into table ods_shihaihong.erp_c_memcard_class_group_full partition(createtime='$day');
# "
echo "Verifying data"
beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e "
select count(1) from ods_shihaihong.c_memcard_class_group_full;
"
echo "Extraction finished"
erp.u_memcard_reg_c:
#!/bin/bash
echo "Generating the full-extraction job file"
mkdir -p /zhiyun/shihaihong/jobs
echo '{
"job": {
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"column": ["*"],
"connection": [
{
"jdbcUrl": [
"jdbc:mysql://zhiyun.pub:23306/erp?useSSL=false"
],
"table": [
"u_memcard_reg_c"
]
}
],
"password": "zhiyun",
"username": "zhiyun"
}
},
"writer": {
"name": "hdfswriter",
"parameter": {
"column": [
{"name":"id","type":"int"},
{"name":"memcardno","type":"string"},
{"name":"sickness","type":"string"},
{"name":"status","type":"string"}
],
"defaultFS": "hdfs://cdh02:8020",
"fieldDelimiter": "\t",
"fileName":
"erp_u_memcard_reg_c_full.data",
"fileType": "orc",
"path":
"/zhiyun/shihaihong/ods/erp_u_memcard_reg_c_full",
"writeMode": "truncate"
}
}
}
],
"setting": {
"speed": {
"channel": "3"
}
}
}
}' > /zhiyun/shihaihong/jobs/erp_u_memcard_reg_c_full.json
echo "Starting extraction"
hadoop fs -mkdir -p /zhiyun/shihaihong/ods/erp_u_memcard_reg_c_full
python /opt/datax/bin/datax.py /zhiyun/shihaihong/jobs/erp_u_memcard_reg_c_full.json
echo "Creating the Hive table"
beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e '
create database if not exists ods_shihaihong location "/zhiyun/shihaihong/ods";
-- full table
create external table if not exists ods_shihaihong.u_memcard_reg_c_full(
id int,
memcardno string,
sickness string,
status string
)
row format delimited fields terminated by "\t"
lines terminated by "\n"
stored as orc
location "/zhiyun/shihaihong/ods/erp_u_memcard_reg_c_full";
'
# echo "Loading data"
# beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e "
# load data inpath \"/zhiyun/shihaihong/tmp/erp_u_memcard_reg_c_full/*\" overwrite into table ods_shihaihong.erp_u_memcard_reg_c_full partition(createtime='$day');
# "
echo "Verifying data"
beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e "
select count(1) from ods_shihaihong.u_memcard_reg_c_full;
"
echo "Extraction finished"
his.chronic_patient_info_new:
#!/bin/bash
echo "Generating the full-extraction job file"
mkdir -p /zhiyun/shihaihong/jobs
echo '{
"job": {
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"column": ["*"],
"connection": [
{
"jdbcUrl": [
"jdbc:mysql://zhiyun.pub:23306/his?useSSL=false"
],
"table": [
"chronic_patient_info_new"
]
}
],
"password": "zhiyun",
"username": "zhiyun"
}
},
"writer": {
"name": "hdfswriter",
"parameter": {
"column": [
{"name":"id","type":"int"},
{"name":"member_id","type":"string"},
{"name":"erp_code","type":"string"},
{"name":"extend","type":"string"},
{"name":"detect_time","type":"string"},
{"name":"bec_chr_mbr_date","type":"string"}
],
"defaultFS": "hdfs://cdh02:8020",
"fieldDelimiter": "\t",
"fileName":
"his_chronic_patient_info_new_full.data",
"fileType": "orc",
"path":
"/zhiyun/shihaihong/ods/his_chronic_patient_info_new_full",
"writeMode": "truncate"
}
}
}
],
"setting": {
"speed": {
"channel": "3"
}
}
}
}' > /zhiyun/shihaihong/jobs/his_chronic_patient_info_new_full.json
echo "Starting extraction"
hadoop fs -mkdir -p /zhiyun/shihaihong/ods/his_chronic_patient_info_new_full
python /opt/datax/bin/datax.py /zhiyun/shihaihong/jobs/his_chronic_patient_info_new_full.json
echo "Creating the Hive table"
beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e '
create database if not exists ods_shihaihong location "/zhiyun/shihaihong/ods";
-- full table
create external table if not exists ods_shihaihong.chronic_patient_info_new_full(
id int,
member_id string,
erp_code string,
extend string,
detect_time string,
bec_chr_mbr_date string
)
row format delimited fields terminated by "\t"
lines terminated by "\n"
stored as orc
location
"/zhiyun/shihaihong/ods/his_chronic_patient_info_new_full";
'
# echo "Loading data"
# beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e "
# load data inpath \"/zhiyun/shihaihong/tmp/his_chronic_patient_info_new_full/*\" overwrite into table ods_shihaihong.his_chronic_patient_info_new_full partition(createtime='$day');
# "
echo "Verifying data"
beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e "
select count(1) from ods_shihaihong.chronic_patient_info_new_full;
"
echo "Extraction finished"
erp.c_org_busi :
#!/bin/bash
echo "Generating the full-extraction job file"
mkdir -p /zhiyun/shihaihong/jobs
echo '{
"job": {
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"column": ["*"],
"connection": [
{
"jdbcUrl": [
"jdbc:mysql://zhiyun.pub:23306/erp?useSSL=false"
],
"table": [
"c_org_busi"
]
}
],
"password": "zhiyun",
"username": "zhiyun"
}
},
"writer": {
"name": "hdfswriter",
"parameter": {
"column": [
{"name":"id","type":"int"},
{"name":"busno","type":"string"},
{"name":"orgname","type":"string"},
{"name":"orgsubno","type":"string"},
{"name":"orgtype","type":"string"},
{"name":"salegroup","type":"string"},
{"name":"org_tran_code","type":"string"},
{"name":"accno","type":"string"},
{"name":"sendtype","type":"string"},
{"name":"sendday","type":"string"},
{"name":"maxday","type":"string"},
{"name":"minday","type":"string"},
{"name":"notes","type":"string"},
{"name":"stamp","type":"string"},
{"name":"status","type":"string"},
{"name":"customid","type":"string"},
{"name":"whl_vendorno","type":"string"},
{"name":"whlgroup","type":"string"},
{"name":"rate","type":"string"},
{"name":"creditamt","type":"string"},
{"name":"creditday","type":"string"},
{"name":"peoples","type":"string"},
{"name":"area","type":"string"},
{"name":"abc","type":"string"},
{"name":"address","type":"string"},
{"name":"tel","type":"string"},
{"name":"principal","type":"string"},
{"name":"identity_card","type":"string"},
{"name":"mobil","type":"string"},
{"name":"corporation","type":"string"},
{"name":"saler","type":"string"},
{"name":"createtime","type":"string"},
{"name":"bank","type":"string"},
{"name":"bankno","type":"string"},
{"name":"bak1","type":"string"},
{"name":"bak2","type":"string"},
{"name":"a_bak1","type":"string"},
{"name":"aa_bak1","type":"string"},
{"name":"b_bak1","type":"string"},
{"name":"bb_bak1","type":"string"},
{"name":"y_bak1","type":"string"},
{"name":"t_bak1","type":"string"},
{"name":"ym_bak1","type":"string"},
{"name":"tm_bak1","type":"string"},
{"name":"supervise_code","type":"string"},
{"name":"monthrent","type":"string"},
{"name":"wms_warehid","type":"string"},
{"name":"settlement_cycle","type":"string"},
{"name":"apply_cycle","type":"string"},
{"name":"applydate","type":"string"},
{"name":"accounttype","type":"string"},
{"name":"applydate_last","type":"string"},
{"name":"paymode","type":"string"},
{"name":"yaolian_flag","type":"string"},
{"name":"org_longitude","type":"string"},
{"name":"org_latitude","type":"string"},
{"name":"org_province","type":"string"},
{"name":"org_city","type":"string"},
{"name":"org_area","type":"string"},
{"name":"business_time","type":"string"},
{"name":"yaolian_group","type":"string"},
{"name":"pacard_storeid","type":"string"},
{"name":"opening_time","type":"string"},
{"name":"ret_ent_id","type":"string"},
{"name":"ent_id","type":"string"}
],
"defaultFS": "hdfs://cdh02:8020",
"fieldDelimiter": "\t",
"fileName": "erp_c_org_busi_full.data",
"fileType": "orc",
"path":
"/zhiyun/shihaihong/ods/erp_c_org_busi_full",
"writeMode": "truncate"
}
}
}
],
"setting": {
"speed": {
"channel": "3"
}
}
}
}' > /zhiyun/shihaihong/jobs/erp_c_org_busi_full.json
echo "Starting extraction"
hadoop fs -mkdir -p /zhiyun/shihaihong/ods/erp_c_org_busi_full
python /opt/datax/bin/datax.py /zhiyun/shihaihong/jobs/erp_c_org_busi_full.json
echo "Creating the Hive table"
beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e '
create database if not exists ods_shihaihong location "/zhiyun/shihaihong/ods";
-- full table
create external table if not exists ods_shihaihong.c_org_busi_full(
id int,
busno string,
orgname string,
orgsubno string,
orgtype string,
salegroup string,
org_tran_code string,
accno string,
sendtype string,
sendday string,
maxday string,
minday string,
notes string,
stamp string,
status string,
customid string,
whl_vendorno string,
whlgroup string,
rate string,
creditamt string,
creditday string,
peoples string,
area string,
abc string,
address string,
tel string,
principal string,
identity_card string,
mobil string,
corporation string,
saler string,
createtime string,
bank string,
bankno string,
bak1 string,
bak2 string,
a_bak1 string,
aa_bak1 string,
b_bak1 string,
bb_bak1 string,
y_bak1 string,
t_bak1 string,
ym_bak1 string,
tm_bak1 string,
supervise_code string,
monthrent string,
wms_warehid string,
settlement_cycle string,
apply_cycle string,
applydate string,
accounttype string,
applydate_last string,
paymode string,
yaolian_flag string,
org_longitude string,
org_latitude string,
org_province string,
org_city string,
org_area string,
business_time string,
yaolian_group string,
pacard_storeid string,
opening_time string,
ret_ent_id string,
ent_id string
)
row format delimited fields terminated by "\t"
lines terminated by "\n"
stored as orc
location "/zhiyun/shihaihong/ods/erp_c_org_busi_full";
'
# echo "Loading data"
# beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e "
# load data inpath \"/zhiyun/shihaihong/tmp/erp_c_org_busi_full/*\" overwrite into table ods_shihaihong.erp_c_org_busi_full partition(createtime='$day');
# "
echo "Verifying data"
beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e "
select count(1) from ods_shihaihong.c_org_busi_full;
"
echo "Extraction finished"
erp.c_code_value:
#!/bin/bash
echo "Generating the full-extraction job file"
mkdir -p /zhiyun/shihaihong/jobs
echo '{
"job": {
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"column": ["*"],
"connection": [
{
"jdbcUrl": [
"jdbc:mysql://zhiyun.pub:23306/erp?useSSL=false"
],
"table": [
"c_code_value"
]
}
],
"password": "zhiyun",
"username": "zhiyun"
}
},
"writer": {
"name": "hdfswriter",
"parameter": {
"column": [
{"name":"id","type":"int"},
{"name":"cat_name","type":"string"},
{"name":"cat_code","type":"string"},
{"name":"val_name","type":"string"},
{"name":"var_desc","type":"string"}
],
"defaultFS": "hdfs://cdh02:8020",
"fieldDelimiter": "\t",
"fileName": "erp_c_code_value_full.data",
"fileType": "orc",
"path": "/zhiyun/shihaihong/ods/erp_c_code_value_full",
"writeMode": "truncate"
}
}
}
],
"setting": {
"speed": {
"channel": "3"
}
}
}
}' > /zhiyun/shihaihong/jobs/erp_c_code_value_full.json
echo "Starting extraction"
hadoop fs -mkdir -p /zhiyun/shihaihong/ods/erp_c_code_value_full
python /opt/datax/bin/datax.py /zhiyun/shihaihong/jobs/erp_c_code_value_full.json
echo "Creating the Hive table"
beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e '
create database if not exists ods_shihaihong location "/zhiyun/shihaihong/ods";
-- full table
create external table if not exists ods_shihaihong.c_code_value_full(
id int,
cat_name string,
cat_code string,
val_name string,
var_desc string
)
row format delimited fields terminated by "\t"
lines terminated by "\n"
stored as orc
location "/zhiyun/shihaihong/ods/erp_c_code_value_full";
'
# echo "Loading data"
# beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e "
# load data inpath \"/zhiyun/shihaihong/tmp/erp_c_code_value_full/*\" overwrite into table ods_shihaihong.erp_c_code_value_full partition(createtime='$day');
# "
echo "Verifying data"
beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e "
select count(1) from ods_shihaihong.c_code_value_full;
"
echo "Extraction finished"
Job scheduling:
erp.u_memcard_reg:
erp.c_memcard_class_group:
erp.u_memcard_reg_c:
his.chronic_patient_info_new:
erp.c_org_busi :
erp.c_code_value :
Upload the code-value spreadsheet to the data directory and write a data-cleaning script in Python:
#!/bin/python3
# Flatten the code-value workbook into a pipe-delimited text file and upload it to HDFS.
import os

import pandas as pd
from openpyxl import load_workbook

xlsx_path = '/zhiyun/shihaihong/data/12.码值表.xlsx'
lst = []
# Read the Excel file with pandas. sheet_name=None reads every worksheet;
# header=2 means the header is on the 3rd row (indexing starts at 0).
dfs = pd.read_excel(xlsx_path, sheet_name=None, header=2)
sheet_names = list(dfs.keys())
# Open the workbook once so that cell A2 of each sheet can be read directly.
wb = load_workbook(filename=xlsx_path)
# The first two sheets are skipped; each remaining sheet is one code category.
for sheet_name in sheet_names[2:]:
    # Cell A2 holds "<cat_name>-<cat_code>".
    str_head = wb[sheet_name]['A2'].value
    cat_name, cat_code = str_head.split('-')[0], str_head.split('-')[1]
    data = dfs[sheet_name]
    # The first two columns hold the value name and its description.
    for _, row in data.iterrows():
        val_name, var_desc = row.iloc[0], row.iloc[1]
        if pd.isna(val_name) and pd.isna(var_desc):
            continue  # skip completely empty rows
        lst.append(f"{cat_name}|{cat_code}|{val_name}|{var_desc}")

print("Writing data to the data directory")
template_path = "/zhiyun/shihaihong/data/code_value.txt"
with open(template_path, "w", encoding="utf-8") as f:
    f.write("\n".join(lst))

print("Uploading the data file to HDFS")
os.system("hdfs dfs -mkdir -p /zhiyun/shihaihong/filetxt/")
os.system(f"hdfs dfs -put {template_path} /zhiyun/shihaihong/filetxt/")
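The core of the cleaning step is turning one worksheet (a header cell of the form `cat_name-cat_code` plus two value columns) into pipe-delimited lines that match the four-column Hive table. The sketch below isolates that transformation without pandas so it can be checked by hand; the function name `flatten_sheet` and the sample header and rows are hypothetical, not taken from the real spreadsheet.

```python
def flatten_sheet(header, rows):
    """Return 'cat_name|cat_code|val_name|var_desc' lines for one sheet.

    header: text of cell A2, assumed to look like '<cat_name>-<cat_code>'
    rows:   iterable of (val_name, var_desc) pairs
    """
    parts = header.split("-")
    cat_name, cat_code = parts[0], parts[1]
    return [f"{cat_name}|{cat_code}|{val_name}|{var_desc}" for val_name, var_desc in rows]


# Hypothetical sheet content, for illustration only.
lines = flatten_sheet("sex-SEX", [("1", "male"), ("2", "female")])
print("\n".join(lines))
```

Using `|` as the separator only works if no value contains that character, which is worth checking against the real data before loading.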
#!/bin/bash
# Purpose: run the whole process, from writing the configuration to verifying the data
# Must be runnable on any node
# Create my own directories
mkdir -p /zhiyun/shihaihong/data /zhiyun/shihaihong/jobs \
    /zhiyun/shihaihong/python /zhiyun/shihaihong/shell \
    /zhiyun/shihaihong/sql
echo "Creating the Hive table"
beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e '
create database if not exists ods_shihaihong location "/zhiyun/shihaihong/ods";
create external table if not exists ods_shihaihong.c_code_value_full(
cat_name string,
cat_code string,
val_name string,
var_desc string
)
row format delimited fields terminated by "|"
lines terminated by "\n"
stored as textfile
location "/zhiyun/shihaihong/filetxt";'
echo "Hive table created"
echo "Verifying data"
beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e "
select count(1) from ods_shihaihong.c_code_value_full;
"
echo "Verification finished"

After running it, perform the extraction with the shell script.

Job scheduling:

Assignment 2

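Process the 2 incremental tables: a historical-data job plus an incremental job for each, 4 scheduled jobs in total.
For every table, the final row count must be compared with the source table and must match.
Incremental tables to extract:
erp.u_sale_m    one-off full extraction first, then a daily incremental extraction => ods_lijinquan.erp_u_sale_m_inc
erp.u_sale_pay    same as above, incremental => ods_lijinquan.erp_u_sale_pay_inc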
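The row-count check required above can be expressed as a small comparison once the counts have been collected with `select count(1)` on both sides. This is a sketch only: the function names `reconcile` and `incremental_total` are hypothetical, the numbers are made up, and it assumes the one-off full load and the daily partitions do not contain the same rows twice.

```python
def incremental_total(full_count, partition_counts):
    """ODS row count for an incremental table: the one-off full load plus every daily partition."""
    return full_count + sum(partition_counts.values())


def reconcile(source_counts, ods_counts):
    """Return {table: (source, ods)} for every table whose counts differ."""
    mismatches = {}
    for table, expected in source_counts.items():
        actual = ods_counts.get(table)
        if actual != expected:
            mismatches[table] = (expected, actual)
    return mismatches


# Made-up numbers, for illustration only.
source = {"erp.u_sale_m": 1000, "erp.u_sale_pay": 800}
ods = {
    "erp.u_sale_m": incremental_total(990, {"2024-01-01": 6, "2024-01-02": 4}),
    "erp.u_sale_pay": incremental_total(790, {"2024-01-01": 5}),
}
print(reconcile(source, ods))
```

An empty result means every table matches; anything else lists the source count next to the ODS count for the tables that need a second look.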
The first extraction is a full extraction:
erp_u_sale_m_full.sh:
#!/bin/bash
echo "Generating the full-extraction job file"
mkdir -p /zhiyun/shihaihong/jobs
echo '{
"job": {
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"column": ["*"],
"connection": [
{
"jdbcUrl": [
"jdbc:mysql://zhiyun.pub:23306/erp?useSSL=false"
],
"table": [
"u_sale_m"
]
}
],
"password": "zhiyun",
"username": "zhiyun"
}
},
"writer": {
"name": "hdfswriter",
"parameter": {
"column": [
{"name":"id","type":"int"},
{"name":"saleno","type":"string"},
{"name":"busno","type":"string"},
{"name":"posno","type":"string"},
{"name":"extno","type":"string"},
{"name":"extsource","type":"string"},
{"name":"o2o_trade_from","type":"string"},
{"name":"channel","type":"int"},
{"name":"starttime","type":"string"},
{"name":"finaltime","type":"string"},
{"name":"payee","type":"string"},
{"name":"discounter","type":"string"},
{"name":"crediter","type":"string"},
{"name":"returner","type":"string"},
{"name":"warranter1","type":"string"},
{"name":"warranter2","type":"string"},
{"name":"stdsum","type":"string"},
{"name":"netsum","type":"string"},
{"name":"loss","type":"string"},
{"name":"discount","type":"float"},
{"name":"member","type":"string"},
{"name":"precash","type":"string"},
{"name":"stamp","type":"string"},
{"name":"shiftid","type":"string"},
{"name":"shiftdate","type":"string"},
{"name":"yb_saleno","type":"string"}
],
"defaultFS": "hdfs://cdh02:8020",
"fieldDelimiter": "\t",
"fileName": "erp_u_sale_m_full.data",
"fileType": "orc",
"path":
"/zhiyun/shihaihong/ods/erp_u_sale_m_full",
"writeMode": "truncate"
}
}
}
],
"setting": {
"speed": {
"channel": "3"
}
}
}
}' > /zhiyun/shihaihong/jobs/erp_u_sale_m_full.json
echo "Starting extraction"
hadoop fs -mkdir -p /zhiyun/shihaihong/ods/erp_u_sale_m_full
python /opt/datax/bin/datax.py /zhiyun/shihaihong/jobs/erp_u_sale_m_full.json
echo "Creating the Hive table"
beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e '
create database if not exists ods_shihaihong location "/zhiyun/shihaihong/ods";
-- full table
create external table if not exists ods_shihaihong.u_sale_m_full(
id int,
saleno string,
busno string,
posno string,
extno string,
extsource string,
o2o_trade_from string,
channel int,
starttime string,
finaltime string,
payee string,
discounter string,
crediter string,
returner string,
warranter1 string,
warranter2 string,
stdsum string,
netsum string,
loss string,
discount float,
member string,
precash string,
stamp string,
shiftid string,
shiftdate string,
yb_saleno string
)
row format delimited fields terminated by "\t"
lines terminated by "\n"
stored as orc
location "/zhiyun/shihaihong/ods/erp_u_sale_m_full";
'
# echo "Loading data"
# beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e "
# load data inpath \"/zhiyun/shihaihong/tmp/erp_u_sale_m_full/*\" overwrite into table ods_shihaihong.erp_u_sale_m_full partition(createtime='$day');
# "
echo "Verifying data"
beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e "
select count(1) from ods_shihaihong.u_sale_m_full;
"
echo "Extraction finished"

Add it to the job scheduling platform:

Copy the contents of the sh file into the GLUE IDE and run it once:
After that, use incremental extraction:
#!/bin/bash
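# Default to yesterday; if a date is passed as the first argument, extract the day before it.
day=$(date -d "yesterday" +%Y-%m-%d)
if [ -n "$1" ]; then
day=$(date -d "$1 -1 day" +%Y-%m-%d)
fi
echo "Extraction date: $day"
echo "Generating the incremental job file"
mkdir -p /zhiyun/shihaihong/jobs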
echo '
{
"job": {
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"connection": [
{
"jdbcUrl": [
"jdbc:mysql://zhiyun.pub:23306/erp?useSSL=false"
],
"querySql": [
"select * from u_sale_m where stamp between '\'"$day 00:00:00"\'' and '\'"$day 23:59:59"\'' and id>0"
]
}
],
"password": "zhiyun",
"username": "zhiyun"
}
},
"writer": {
"name": "hdfswriter",
"parameter": {
"column": [
{"name":"id","type":"int"},
{"name":"saleno","type":"string"},
{"name":"busno","type":"string"},
{"name":"posno","type":"string"},
{"name":"extno","type":"string"},
{"name":"extsource","type":"string"},
{"name":"o2o_trade_from","type":"string"},
{"name":"channel","type":"int"},
{"name":"starttime","type":"string"},
{"name":"finaltime","type":"string"},
{"name":"payee","type":"string"},
{"name":"discounter","type":"string"},
{"name":"crediter","type":"string"},
{"name":"returner","type":"string"},
{"name":"warranter1","type":"string"},
{"name":"warranter2","type":"string"},
{"name":"stdsum","type":"string"},
{"name":"netsum","type":"string"},
{"name":"loss","type":"string"},
{"name":"discount","type":"float"},
{"name":"member","type":"string"},
{"name":"precash","type":"string"},
{"name":"stamp","type":"string"},
{"name":"shiftid","type":"string"},
{"name":"shiftdate","type":"string"},
{"name":"yb_saleno","type":"string"}
],
"defaultFS": "hdfs://cdh02:8020",
"fieldDelimiter": "\t",
"fileName": "erp_u_sale_m_inc.data",
"fileType": "orc",
"path":
"/zhiyun/shihaihong/tmp/erp_u_sale_m_inc",
"writeMode": "truncate"
}
}
}
],
"setting": {
"speed": {
"channel": 2
}
}
}
}' > /zhiyun/shihaihong/jobs/erp_u_sale_m_inc.json
echo "Starting extraction"
hadoop fs -mkdir -p /zhiyun/shihaihong/tmp/erp_u_sale_m_inc
python /opt/datax/bin/datax.py /zhiyun/shihaihong/jobs/erp_u_sale_m_inc.json
echo "Creating the Hive table"
beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e '
create database if not exists ods_shihaihong location "/zhiyun/shihaihong/ods";
create external table if not exists ods_shihaihong.erp_u_sale_m_inc(
id int,
saleno string,
busno string,
posno string,
extno string,
extsource string,
o2o_trade_from string,
channel int,
starttime string,
finaltime string,
payee string,
discounter string,
crediter string,
returner string,
warranter1 string,
warranter2 string,
stdsum string,
netsum string,
loss string,
discount float,
member string,
precash string,
shiftid string,
shiftdate string,
yb_saleno string) partitioned by (stamp string)
row format delimited fields terminated by "\t"
lines terminated by "\n"
stored as orc
location "/zhiyun/shihaihong/ods/erp_u_sale_m_inc";
'
echo "Loading data"
beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e "
load data inpath '/zhiyun/shihaihong/tmp/erp_u_sale_m_inc/*'
overwrite into table ods_shihaihong.erp_u_sale_m_inc
partition(stamp='"$day"');
"
echo "Validating data"
beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e "
show partitions ods_shihaihong.erp_u_sale_m_inc;
select count(1) from ods_shihaihong.erp_u_sale_m_inc where stamp = '$day';
select * from ods_shihaihong.erp_u_sale_m_inc where stamp = '$day' limit 5;
"
echo "Extraction finished"
Job scheduling:
Run the job once, supplying the date as the input parameter.
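The script above assembles the DataX job by echoing a hand-quoted JSON string, which is easy to break with a stray line wrap or quote. The following is a minimal sketch, assuming Python 3 is available on the scheduler host, of building the same kind of incremental job with the standard json module. The helper build_inc_job and its arguments are hypothetical names introduced here, and the column list is shortened for illustration.

```python
import json


def build_inc_job(day, table, columns, tmp_path, file_name):
    """Build a DataX job dict that extracts one day of rows from `table`.

    build_inc_job and its arguments are illustrative names, not part of DataX.
    """
    query = (
        f"select * from {table} "
        f"where stamp between '{day} 00:00:00' and '{day} 23:59:59' and id>0"
    )
    reader = {
        "name": "mysqlreader",
        "parameter": {
            "connection": [
                {
                    "jdbcUrl": ["jdbc:mysql://zhiyun.pub:23306/erp?useSSL=false"],
                    "querySql": [query],
                }
            ],
            "username": "zhiyun",
            "password": "zhiyun",
        },
    }
    writer = {
        "name": "hdfswriter",
        "parameter": {
            "column": [{"name": n, "type": t} for n, t in columns],
            "defaultFS": "hdfs://cdh02:8020",
            "fieldDelimiter": "\t",
            "fileName": file_name,
            "fileType": "orc",
            "path": tmp_path,
            "writeMode": "truncate",
        },
    }
    return {
        "job": {
            "content": [{"reader": reader, "writer": writer}],
            "setting": {"speed": {"channel": 2}},
        }
    }


# Shortened column list for illustration; the real table has 26 columns.
columns = [("id", "int"), ("saleno", "string"), ("stamp", "string")]
job = build_inc_job(
    "2024-01-01",
    "u_sale_m",
    columns,
    "/zhiyun/shihaihong/tmp/erp_u_sale_m_inc",
    "erp_u_sale_m_inc.data",
)
job_text = json.dumps(job, indent=2)
print(job_text)
```

Writing job_text to the jobs directory would replace the echo step; json.dumps takes care of escaping, so the date only has to be interpolated into the query string.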
erp_u_sale_pay_inc
Full extraction:
#!/bin/bash
echo "Generating full-load job config"
mkdir -p /zhiyun/shihaihong/jobs
echo '{
"job": {
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"column": ["*"],
"connection": [
{
"jdbcUrl": [
"jdbc:mysql://zhiyun.pub:23306/erp?useSSL=false"
],
"table": [
"u_sale_pay"
]
}
],
"password": "zhiyun",
"username": "zhiyun"
}
},
"writer": {
"name": "hdfswriter",
"parameter": {
"column": [
{"name":"id","type":"int"},
{"name":"saleno","type":"string"},
{"name":"cardno","type":"string"},
{"name":"netsum","type":"string"},
{"name":"paytype","type":"string"},
{"name":"bak1","type":"string"}
],
"defaultFS": "hdfs://cdh02:8020",
"fieldDelimiter": "\t",
"fileName": "erp_u_sale_pay_full.data",
"fileType": "orc",
"path": "/zhiyun/shihaihong/ods/erp_u_sale_pay_full",
"writeMode": "truncate"
}
}
}
],
"setting": {
"speed": {
"channel": "3"
}
}
}
}' > /zhiyun/shihaihong/jobs/erp_u_sale_pay_full.json
echo "Starting extraction"
hadoop fs -mkdir -p /zhiyun/shihaihong/ods/erp_u_sale_pay_full
python /opt/datax/bin/datax.py /zhiyun/shihaihong/jobs/erp_u_sale_pay_full.json
echo "Creating Hive table"
beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e '
create database if not exists ods_shihaihong location
"/zhiyun/shihaihong/ods";
-- full-load table
create external table if not exists
ods_shihaihong.erp_u_sale_pay_full(
id int,
saleno string,
cardno string,
netsum string,
paytype string,
bak1 string
)
row format delimited fields terminated by "\t"
lines terminated by "\n"
stored as orc
location "/zhiyun/shihaihong/ods/erp_u_sale_pay_full";
'
# No "load data" step is needed here: DataX writes straight into the table location.
echo "Validating data"
beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e "
select count(1) from ods_shihaihong.erp_u_sale_pay_full;
"
echo "Extraction finished"
Job scheduling:
Edit the script in GLUE IDE and run it once.
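These scripts repeat each column list by hand, once for the DataX writer and once for the Hive DDL, so the two can drift apart. The assignment asks for a helper that generates both from the table's field information. The following is a minimal sketch of the DDL half, assuming the columns are kept as (name, type) pairs. hive_ddl is a hypothetical helper name, and the generated statement is a simplified form of the DDL used in this post.

```python
def hive_ddl(db, table, columns, location, partition=None):
    """Return a create-table statement for an external ORC table.

    hive_ddl is an illustrative helper. `columns` is a list of (name, type)
    pairs, and `partition` names a column to declare as the partition key
    instead of a regular column.
    """
    regular = [(name, typ) for name, typ in columns if name != partition]
    col_lines = ",\n".join(f"{name} {typ}" for name, typ in regular)
    partition_clause = f"partitioned by ({partition} string)\n" if partition else ""
    return (
        f"create external table if not exists {db}.{table}(\n"
        f"{col_lines}\n"
        f")\n"
        f"{partition_clause}"
        f"stored as orc\n"
        f'location "{location}";'
    )


pay_columns = [
    ("id", "int"), ("saleno", "string"), ("cardno", "string"),
    ("netsum", "string"), ("paytype", "string"), ("bak1", "string"),
]
full_ddl = hive_ddl(
    "ods_shihaihong", "erp_u_sale_pay_full", pay_columns,
    "/zhiyun/shihaihong/ods/erp_u_sale_pay_full",
)
inc_ddl = hive_ddl(
    "ods_shihaihong", "erp_u_sale_pay_inc", pay_columns + [("stamp", "string")],
    "/zhiyun/shihaihong/ods/erp_u_sale_pay_inc", partition="stamp",
)
print(full_ddl)
print(inc_ddl)
```

Deriving the table name, the location, and the column list from one place keeps the full and incremental variants consistent with the system-prefix naming rule described earlier.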
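Incremental extraction:
#!/bin/bash
# Default to yesterday; if a date is passed as $1, extract the day before it.
day=$(date -d "yesterday" +%Y-%m-%d)
if [ -n "$1" ]; then
day=$(date -d "$1 -1 day" +%Y-%m-%d)
fi
echo "Extraction date: $day"
echo "Generating incremental job config"
mkdir -p /zhiyun/shihaihong/jobs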
echo '
{
"job": {
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"connection": [
{
"jdbcUrl": [
"jdbc:mysql://zhiyun.pub:23306/erp?useSSL=false"
],
"querySql": [
"select u_sale_pay.*,stamp from u_sale_pay left join u_sale_m on u_sale_pay.saleno=u_sale_m.saleno where stamp between '\'"$day 00:00:00"\'' and '\'"$day 23:59:59"\'' and u_sale_pay.id>0"
]
}
],
"password": "zhiyun",
"username": "zhiyun"
}
},
"writer": {
"name": "hdfswriter",
"parameter": {
"column": [
{"name":"id","type":"int"},
{"name":"saleno","type":"string"},
{"name":"cardno","type":"string"},
{"name":"netsum","type":"string"},
{"name":"paytype","type":"string"},
{"name":"bak1","type":"string"},
{"name":"stamp","type":"string"}
],
"defaultFS": "hdfs://cdh02:8020",
"fieldDelimiter": "\t",
"fileName": "erp_u_sale_pay_inc.data",
"fileType": "orc",
"path":
"/zhiyun/shihaihong/tmp/erp_u_sale_pay_inc",
"writeMode": "truncate"
}
}
}
],
"setting": {
"speed": {
"channel": "3"
}
}
}
}
' > /zhiyun/shihaihong/jobs/erp_u_sale_pay_inc.json
echo "Starting extraction"
hadoop fs -mkdir -p /zhiyun/shihaihong/tmp/erp_u_sale_pay_inc
python /opt/datax/bin/datax.py /zhiyun/shihaihong/jobs/erp_u_sale_pay_inc.json
echo "Creating Hive table"
beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e '
create database if not exists ods_shihaihong location
"/zhiyun/shihaihong/ods";
create external table if not exists
ods_shihaihong.erp_u_sale_pay_inc(
id int,
saleno string,
cardno string,
netsum string,
paytype string,
bak1 string
) partitioned by (stamp string)
row format delimited fields terminated by "\t"
lines terminated by "\n"
stored as orc
location "/zhiyun/shihaihong/ods/erp_u_sale_pay_inc";
'
echo "Loading data"
beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e "
load data inpath '/zhiyun/shihaihong/tmp/erp_u_sale_pay_inc/*'
overwrite into table ods_shihaihong.erp_u_sale_pay_inc
partition(stamp='$day');
"
echo "Validating data"
beeline -u jdbc:hive2://localhost:10000 -n root -p 123 -e "
show partitions ods_shihaihong.erp_u_sale_pay_inc;
select count(1) from ods_shihaihong.erp_u_sale_pay_inc where stamp = '$day';
select * from ods_shihaihong.erp_u_sale_pay_inc where stamp = '$day' limit 5;
"
echo "Extraction finished"
Job scheduling:
Run the job once.
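The incremental scripts pick the extraction day from an optional date argument and then filter on that day's first and last second. The following is a minimal sketch of that date logic in Python, assuming the argument is given as YYYY-MM-DD. extraction_day and day_window are hypothetical names introduced here, and the today parameter exists only to make the default case easy to check.

```python
from datetime import date, timedelta


def extraction_day(arg="", today=None):
    """Return the day to extract as YYYY-MM-DD.

    With no argument this is yesterday; with a date argument it is the day
    before that date, matching `date -d "$1 -1 day"` in the shell script.
    """
    base = date.fromisoformat(arg) if arg else (today or date.today())
    return (base - timedelta(days=1)).isoformat()


def day_window(day):
    """Return the first and last second of `day` as strings for the where clause."""
    return f"{day} 00:00:00", f"{day} 23:59:59"


day = extraction_day("2024-03-01")
start, end = day_window(day)
print(day, start, end)
```

Passing an explicit date makes it possible to re-run a missed day from the scheduler, while the empty default covers the daily run.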
