当前位置: 首页 > news >正文

DataX,MongoDB数据导入hdfs与mysql

  1. 【尚硅谷】Alibaba开源数据同步工具DataX技术教程
  2. 【尚硅谷】Alibaba开源数据同步工具DataX技术教程_哔哩哔哩_bilibili

目录

1、MongoDB

1.1、MongoDB介绍

1.2、MongoDB基本概念解析

1.3、MongoDB中的数据存储结构

1.4、MongoDB启动服务

1.5、MongoDB小案例

2、DataX导入导出案例

2.1、读取MongoDB的数据导入到HDFS

2.1.1、mongodb2hdfs.json

2.2.2、mongodb数据

2.2.3、运行命令

2.2、读取 MongoDB 的数据导入 MySQL

2.2.1、mongodb2mysql.json

2.2.2、MySQL数据

2.2.3、运行命令


1、MongoDB

1.1、MongoDB介绍

  1. MongoDB:应用程序数据平台 | MongoDB

1.2、MongoDB基本概念解析

1.3、MongoDB中的数据存储结构

通过下图实例,我们也可以更直观的了解 Mongo 中的一些概念:

1.4、MongoDB启动服务

1、启动 MongoDB 服务

[atguigu@node001 mongodb-5.0.2]$ pwd
/opt/module/mongoDB/mongodb-5.0.2
[atguigu@node001 mongodb-5.0.2]$ bin/mongod --bind_ip 0.0.0.0

2、进入 shell 页面

[atguigu@node001 bin]$ pwd
/opt/module/mongoDB/mongodb-5.0.2/bin
[atguigu@node001 bin]$ /opt/module/mongoDB/mongodb-5.0.2/bin/mongo

5)启动 MongoDB 服务
[atguigu@hadoop102 mongodb]$ bin/mongod
6)进入 shell 页面
[atguigu@hadoop102 mongodb]$ bin/mongo
[atguigu@node001 mongodb-5.0.2]$ pwd
/opt/module/mongoDB/mongodb-5.0.2
[atguigu@node001 mongodb-5.0.2]$ sudo mkdir -p /data/db/
[atguigu@node001 mongodb-5.0.2]$ sudo mkdir -p /opt/module/mongoDB/mongodb-5.0.2/data/db/
[atguigu@node001 mongodb-5.0.2]$ sudo chmod 777 -R /data/db/
[atguigu@node001 mongodb-5.0.2]$ sudo chmod 777 -R /opt/module/mongoDB/mongodb-5.0.2/data/db/
[atguigu@node001 mongodb-5.0.2]$ bin/mongod
[atguigu@node001 bin]$ /opt/module/mongoDB/mongodb-5.0.2/bin/mongo
MongoDB shell version v5.0.2
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("b9e6b776-c67d-4a8d-8e31-c0350850065e") }
MongoDB server version: 5.0.2
================
Warning: the "mongo" shell has been superseded by "mongosh",
which delivers improved usability and compatibility.The "mongo" shell has been deprecated and will be removed in
an upcoming release.
We recommend you begin using "mongosh".
For installation instructions, see
https://docs.mongodb.com/mongodb-shell/install/
================
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, seehttps://docs.mongodb.com/
Questions? Try the MongoDB Developer Community Forumshttps://community.mongodb.com
---
The server generated these startup warnings when booting: 2024-04-09T15:39:39.728+08:00: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine. See http://dochub.mongodb.org/core/prodnotes-filesystem2024-04-09T15:39:40.744+08:00: Access control is not enabled for the database. Read and write access to data and configuration is unrestricted2024-04-09T15:39:40.744+08:00: This server is bound to localhost. Remote systems will be unable to connect to this server. Start the server with --bind_ip <address> to specify which IP addresses it should serve responses from, or with --bind_ip_all to bind to all interfaces. If this behavior is desired, start the server with --bind_ip 127.0.0.1 to disable this warning2024-04-09T15:39:40.748+08:00: /sys/kernel/mm/transparent_hugepage/enabled is 'always'. We suggest setting it to 'never'2024-04-09T15:39:40.749+08:00: /sys/kernel/mm/transparent_hugepage/defrag is 'always'. We suggest setting it to 'never'
---
---Enable MongoDB's free cloud-based monitoring service, which will then receive and displaymetrics about your deployment (disk utilization, CPU, operation statistics, etc).The monitoring data will be available on a MongoDB website with a unique URL accessible to youand anyone you share the URL with. MongoDB may use this information to make productimprovements and to suggest MongoDB products and deployment options to you.To enable free monitoring, run the following command: db.enableFreeMonitoring()To permanently disable this reminder, run the following command: db.disableFreeMonitoring()
---
> 

1.5、MongoDB小案例

> show databases;
admin   0.000GB
config  0.000GB
local   0.000GB
> show tables;
> db
test
> use test
switched to db test
> db.create
db.createCollection(  db.createRole(        db.createUser(        db.createView(
> db.createCollection("atguigu")
{ "ok" : 1 }
> show tables;
atguigu
> db.atguigu.insert({"name":"atguigu","url":"www.atguigu.com"})
WriteResult({ "nInserted" : 1 })
>  db.atguigu.find()
{ "_id" : ObjectId("6614f3467fc519a5008ef01a"), "name" : "atguigu", "url" : "www.atguigu.com" }
> db.createCollection("mycol",{ capped : true,autoIndexId : true,size : 6142800, max :
... 1000})
{"note" : "The autoIndexId option is deprecated and will be removed in a future release","ok" : 1
}
> show tables;
atguigu
mycol
> db.mycol2.insert({"name":"atguigu"})
WriteResult({ "nInserted" : 1 })
> show collections
atguigu
mycol
mycol2
> 

2、DataX导入导出案例

2.1、读取MongoDB的数据导入到HDFS

2.1.1、mongodb2hdfs.json

[atguigu@node001 ~]$ cd /opt/module/datax
[atguigu@node001 datax]$ bin/datax.py -r mongodbreader -w hdfswriterDataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.Please refer to the mongodbreader document:https://github.com/alibaba/DataX/blob/master/mongodbreader/doc/mongodbreader.md Please refer to the hdfswriter document:https://github.com/alibaba/DataX/blob/master/hdfswriter/doc/hdfswriter.md Please save the following configuration as a json file and  usepython {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json 
to run the job.{"job": {"content": [{"reader": {"name": "mongodbreader", "parameter": {"address": [], "collectionName": "", "column": [], "dbName": "", "userName": "", "userPassword": ""}}, "writer": {"name": "hdfswriter", "parameter": {"column": [], "compress": "", "defaultFS": "", "fieldDelimiter": "", "fileName": "", "fileType": "", "path": "", "writeMode": ""}}}], "setting": {"speed": {"channel": ""}}}
}
[atguigu@node001 datax]$ 
{"job": {"content": [{"reader": {"name": "mongodbreader","parameter": {"address": ["node001:27017"],"collectionName": "atguigu","column": [{"name": "name","type": "string"},{"name": "url","type": "string"}],"dbName": "test","userName": "","userPassword": ""}},"writer": {"name": "hdfswriter","parameter": {"column": [{"name": "name","type": "string"},{"name": "url","type": "string"}],"compress": "","defaultFS": "hdfs://node001:8020","fieldDelimiter": "\t","fileName": "mongoDB001.txt","fileType": "text","path": "/mongoDB","writeMode": "append"}}}],"setting": {"speed": {"channel": "1"}}}
}

2.2.2、mongodb数据

> db.atguigu.find()
{ "_id" : ObjectId("6614f3467fc519a5008ef01a"), "name" : "atguigu", "url" : "www.atguigu.com" }
> db.atguigu.insert({"name":"atguigu001","url":"www.atguigu.com001"})
WriteResult({ "nInserted" : 1 })
> db.atguigu.insert({"name":"atguigu002","url":"www.atguigu.com002"})
WriteResult({ "nInserted" : 1 })
> db.atguigu.insert({"name":"atguigu003","url":"www.atguigu.com003"})
WriteResult({ "nInserted" : 1 })
> db.atguigu.find()
{ "_id" : ObjectId("6614f3467fc519a5008ef01a"), "name" : "atguigu", "url" : "www.atguigu.com" }
{ "_id" : ObjectId("6614fb187fc519a5008ef01c"), "name" : "atguigu001", "url" : "www.atguigu.com001" }
{ "_id" : ObjectId("6614fb207fc519a5008ef01d"), "name" : "atguigu002", "url" : "www.atguigu.com002" }
{ "_id" : ObjectId("6614fb297fc519a5008ef01e"), "name" : "atguigu003", "url" : "www.atguigu.com003" }
> 

2.2.3、运行命令

[atguigu@node001 mongodb-5.0.2]$ bin/mongod --bind_ip 0.0.0.0
{"t":{"$date":"2024-04-09T16:11:21.173+08:00"},"s":"I",  "c":"NETWORK",  "id":4915701, "ctx":"-","msg":"Initialized wire specification","attr":{"spec":{"incomingExternalClient":{"minWireVersion":0,"maxWireVersion":13},"incomingInternalClient":{"minWireVersion":0,"maxWireVersion":13},"outgoing":{"minWireVersion":0,"maxWireVersion":13},"isInternalClient":true}}}
[atguigu@node001 datax]$ bin/datax.py job/mongodb/mongdb2hdfs.json DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.2024-04-09 16:13:00.914 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2024-04-09 16:13:00.930 [main] INFO  Engine - the machine info  => osInfo: Red Hat, Inc. 1.8 25.372-b07jvmInfo:        Linux amd64 3.10.0-862.el7.x86_64cpu num:        4totalPhysicalMemory:    -0.00GfreePhysicalMemory:     -0.00GmaxFileDescriptorCount: -1currentOpenFileDescriptorCount: -1GC Names        [PS MarkSweep, PS Scavenge]MEMORY_NAME                    | allocation_size                | init_size                      PS Eden Space                  | 256.00MB                       | 256.00MB                       Code Cache                     | 240.00MB                       | 2.44MB                         Compressed Class Space         | 1,024.00MB                     | 0.00MB                         PS Survivor Space              | 42.50MB                        | 42.50MB                        PS Old Gen                     | 683.00MB                       | 683.00MB                       Metaspace                      | -0.00MB                        | 0.00MB                         2024-04-09 16:13:00.959 [main] INFO  Engine - 
{"content":[{"reader":{"name":"mongodbreader","parameter":{"address":["node001:27017"],"collectionName":"atguigu","column":[{"name":"name","type":"string"},{"name":"url","type":"string"}],"dbName":"test","userName":"","userPassword":""}},"writer":{"name":"hdfswriter","parameter":{"column":[{"name":"name","type":"string"},{"name":"url","type":"string"}],"compress":"","defaultFS":"hdfs://node001:8020","fieldDelimiter":"\t","fileName":"mongo.txt","fileType":"text","path":"/","writeMode":"append"}}}],"setting":{"speed":{"channel":"1"}}
}2024-04-09 16:13:00.986 [main] WARN  Engine - prioriy set to 0, because NumberFormatException, the value is: null
2024-04-09 16:13:00.990 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2024-04-09 16:13:00.990 [main] INFO  JobContainer - DataX jobContainer starts job.
2024-04-09 16:13:00.993 [main] INFO  JobContainer - Set jobId = 0
2024-04-09 16:13:01.175 [job-0] INFO  cluster - Cluster created with settings {hosts=[node001:27017], mode=MULTIPLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500}
2024-04-09 16:13:01.175 [job-0] INFO  cluster - Adding discovered server node001:27017 to client view of cluster
2024-04-09 16:13:01.405 [cluster-ClusterId{value='6614f88d21b10d1b9122f208', description='null'}-node001:27017] INFO  connection - Opened connection [connectionId{localValue:1, serverValue:2}] to node001:27017
2024-04-09 16:13:01.408 [cluster-ClusterId{value='6614f88d21b10d1b9122f208', description='null'}-node001:27017] INFO  cluster - Monitor thread successfully connected to server with description ServerDescription{address=node001:27017, type=STANDALONE, state=CONNECTED, ok=true, version=ServerVersion{versionList=[5, 0, 2]}, minWireVersion=0, maxWireVersion=13, maxDocumentSize=16777216, roundTripTimeNanos=1825826}
2024-04-09 16:13:01.410 [cluster-ClusterId{value='6614f88d21b10d1b9122f208', description='null'}-node001:27017] INFO  cluster - Discovered cluster type of STANDALONE
四月 09, 2024 4:13:02 下午 org.apache.hadoop.util.NativeCodeLoader <clinit>
警告: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2024-04-09 16:13:02.883 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2024-04-09 16:13:02.886 [job-0] INFO  JobContainer - DataX Reader.Job [mongodbreader] do prepare work .
2024-04-09 16:13:02.887 [job-0] INFO  JobContainer - DataX Writer.Job [hdfswriter] do prepare work .
2024-04-09 16:13:03.426 [job-0] INFO  HdfsWriter$Job - 由于您配置了writeMode append, 写入前不做清理工作, [/] 目录下写入相应文件名前缀  [mongo.txt] 的文件
2024-04-09 16:13:03.429 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2024-04-09 16:13:03.430 [job-0] INFO  JobContainer - Job set Channel-Number to 1 channels.
2024-04-09 16:13:03.565 [job-0] INFO  connection - Opened connection [connectionId{localValue:2, serverValue:3}] to node001:27017
2024-04-09 16:13:03.591 [job-0] INFO  JobContainer - DataX Reader.Job [mongodbreader] splits to [1] tasks.
2024-04-09 16:13:03.592 [job-0] INFO  HdfsWriter$Job - begin do split...
2024-04-09 16:13:03.628 [job-0] INFO  HdfsWriter$Job - splited write file name:[hdfs://node001:8020/__8ea8878b_9c2e_4b6a_95b7_dcacb6f05d49/mongo.txt__355376ce_ce5d_40be_911e_e884dfac15bd]
2024-04-09 16:13:03.628 [job-0] INFO  HdfsWriter$Job - end do split.
2024-04-09 16:13:03.629 [job-0] INFO  JobContainer - DataX Writer.Job [hdfswriter] splits to [1] tasks.
2024-04-09 16:13:03.709 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2024-04-09 16:13:03.731 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2024-04-09 16:13:03.741 [job-0] INFO  JobContainer - Running by standalone Mode.
2024-04-09 16:13:03.774 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2024-04-09 16:13:03.823 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2024-04-09 16:13:03.824 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2024-04-09 16:13:03.942 [0-0-0-reader] INFO  cluster - Cluster created with settings {hosts=[node001:27017], mode=MULTIPLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500}
2024-04-09 16:13:03.942 [0-0-0-reader] INFO  cluster - Adding discovered server node001:27017 to client view of cluster
2024-04-09 16:13:03.943 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2024-04-09 16:13:03.986 [cluster-ClusterId{value='6614f88f21b10d1b9122f209', description='null'}-node001:27017] INFO  connection - Opened connection [connectionId{localValue:3, serverValue:4}] to node001:27017
2024-04-09 16:13:03.988 [cluster-ClusterId{value='6614f88f21b10d1b9122f209', description='null'}-node001:27017] INFO  cluster - Monitor thread successfully connected to server with description ServerDescription{address=node001:27017, type=STANDALONE, state=CONNECTED, ok=true, version=ServerVersion{versionList=[5, 0, 2]}, minWireVersion=0, maxWireVersion=13, maxDocumentSize=16777216, roundTripTimeNanos=1293606}
2024-04-09 16:13:03.988 [cluster-ClusterId{value='6614f88f21b10d1b9122f209', description='null'}-node001:27017] INFO  cluster - Discovered cluster type of STANDALONE
2024-04-09 16:13:04.029 [0-0-0-reader] INFO  connection - Opened connection [connectionId{localValue:4, serverValue:5}] to node001:27017
2024-04-09 16:13:04.061 [0-0-0-writer] INFO  HdfsWriter$Task - begin do write...
2024-04-09 16:13:04.062 [0-0-0-writer] INFO  HdfsWriter$Task - write to file : [hdfs://node001:8020/__8ea8878b_9c2e_4b6a_95b7_dcacb6f05d49/mongo.txt__355376ce_ce5d_40be_911e_e884dfac15bd]
2024-04-09 16:13:08.157 [0-0-0-writer] INFO  HdfsWriter$Task - end do write
2024-04-09 16:13:08.185 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[4268]ms
2024-04-09 16:13:08.185 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2024-04-09 16:13:13.866 [job-0] INFO  StandAloneJobContainerCommunicator - Total 1 records, 22 bytes | Speed 2B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2024-04-09 16:13:13.866 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2024-04-09 16:13:13.866 [job-0] INFO  JobContainer - DataX Writer.Job [hdfswriter] do post work.
2024-04-09 16:13:13.867 [job-0] INFO  HdfsWriter$Job - start rename file [hdfs://node001:8020/__8ea8878b_9c2e_4b6a_95b7_dcacb6f05d49/mongo.txt__355376ce_ce5d_40be_911e_e884dfac15bd] to file [hdfs://node001:8020/mongo.txt__355376ce_ce5d_40be_911e_e884dfac15bd].
2024-04-09 16:13:13.964 [job-0] INFO  HdfsWriter$Job - finish rename file [hdfs://node001:8020/__8ea8878b_9c2e_4b6a_95b7_dcacb6f05d49/mongo.txt__355376ce_ce5d_40be_911e_e884dfac15bd] to file [hdfs://node001:8020/mongo.txt__355376ce_ce5d_40be_911e_e884dfac15bd].
2024-04-09 16:13:13.965 [job-0] INFO  HdfsWriter$Job - start delete tmp dir [hdfs://node001:8020/__8ea8878b_9c2e_4b6a_95b7_dcacb6f05d49] .
2024-04-09 16:13:13.995 [job-0] INFO  HdfsWriter$Job - finish delete tmp dir [hdfs://node001:8020/__8ea8878b_9c2e_4b6a_95b7_dcacb6f05d49] .
2024-04-09 16:13:13.996 [job-0] INFO  JobContainer - DataX Reader.Job [mongodbreader] do post work.
2024-04-09 16:13:13.996 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2024-04-09 16:13:13.997 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: /opt/module/datax/hook
2024-04-09 16:13:14.102 [job-0] INFO  JobContainer - [total cpu info] => averageCpu                     | maxDeltaCpu                    | minDeltaCpu                    -1.00%                         | -1.00%                         | -1.00%[total gc info] => NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime     PS MarkSweep         | 1                  | 1                  | 1                  | 0.042s             | 0.042s             | 0.042s             PS Scavenge          | 1                  | 1                  | 1                  | 0.020s             | 0.020s             | 0.020s             2024-04-09 16:13:14.102 [job-0] INFO  JobContainer - PerfTrace not enable!
2024-04-09 16:13:14.103 [job-0] INFO  StandAloneJobContainerCommunicator - Total 1 records, 22 bytes | Speed 2B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2024-04-09 16:13:14.111 [job-0] INFO  JobContainer - 
任务启动时刻                    : 2024-04-09 16:13:00
任务结束时刻                    : 2024-04-09 16:13:14
任务总计耗时                    :                 13s
任务平均流量                    :                2B/s
记录写入速度                    :              0rec/s
读出记录总数                    :                   1
读写失败总数                    :                   0[atguigu@node001 datax]$ 

2.2、读取 MongoDB 的数据导入 MySQL

2.2.1、mongodb2mysql.json

[atguigu@node001 datax]$ bin/datax.py -r mongodbreader -w mysqlwriterDataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.Please refer to the mongodbreader document:https://github.com/alibaba/DataX/blob/master/mongodbreader/doc/mongodbreader.md Please refer to the mysqlwriter document:https://github.com/alibaba/DataX/blob/master/mysqlwriter/doc/mysqlwriter.md Please save the following configuration as a json file and  usepython {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json 
to run the job.{"job": {"content": [{"reader": {"name": "mongodbreader", "parameter": {"address": [], "collectionName": "", "column": [], "dbName": "", "userName": "", "userPassword": ""}}, "writer": {"name": "mysqlwriter", "parameter": {"column": [], "connection": [{"jdbcUrl": "", "table": []}], "password": "", "preSql": [], "session": [], "username": "", "writeMode": ""}}}], "setting": {"speed": {"channel": ""}}}
}
[atguigu@node001 datax]$ 
{"job": {"content": [{"reader": {"name": "mongodbreader","parameter": {"address": ["node001:27017"],"collectionName": "atguigu","column": [{"name": "name","type": "string"},{"name": "url","type": "string"}],"dbName": "test","userName": "","userPassword": ""}},"writer": {"name": "mysqlwriter","parameter": {"column": ["*"],"connection": [{"jdbcUrl": "jdbc:mysql://node001:3306/datax","table": ["mongodb"]}],"username": "root","password": "123456","preSql": [],"session": [],"writeMode": "insert"}}}],"setting": {"speed": {"channel": "1"}}}
}

2.2.2、MySQL数据

/*Navicat Premium Data TransferSource Server         : localhostSource Server Type    : MySQLSource Server Version : 50735 (5.7.35-log)Source Host           : localhost:3306Source Schema         : dataxTarget Server Type    : MySQLTarget Server Version : 50735 (5.7.35-log)File Encoding         : 65001Date: 09/04/2024 16:40:39
*/SET NAMES utf8mb4;
SET FOREIGN_KEY_CHECKS = 0;-- ----------------------------
-- Table structure for atguigu
-- ----------------------------
DROP TABLE IF EXISTS `atguigu`;
CREATE TABLE `atguigu`  (`name` varchar(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,`url` varchar(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL
) ENGINE = InnoDB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci ROW_FORMAT = Dynamic;-- ----------------------------
-- Records of atguigu
-- ----------------------------
INSERT INTO `atguigu` VALUES ('111', '111');
INSERT INTO `atguigu` VALUES ('名字', '地址');-- ----------------------------
-- Table structure for mongodb
-- ----------------------------
DROP TABLE IF EXISTS `mongodb`;
CREATE TABLE `mongodb`  (`name` varchar(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,`url` varchar(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL
) ENGINE = InnoDB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci ROW_FORMAT = Dynamic;-- ----------------------------
-- Records of mongodb
-- ----------------------------
INSERT INTO `mongodb` VALUES ('111', '111');
INSERT INTO `mongodb` VALUES ('aaa', 'bbb');
INSERT INTO `mongodb` VALUES ('名字', '地址');SET FOREIGN_KEY_CHECKS = 1;

2.2.3、运行命令

[atguigu@node001 datax]$ bin/datax.py job/mongodb/mongodb2mysql.json DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.2024-04-09 16:42:09.299 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2024-04-09 16:42:09.319 [main] INFO  Engine - the machine info  => osInfo: Red Hat, Inc. 1.8 25.372-b07jvmInfo:        Linux amd64 3.10.0-862.el7.x86_64cpu num:        4totalPhysicalMemory:    -0.00GfreePhysicalMemory:     -0.00GmaxFileDescriptorCount: -1currentOpenFileDescriptorCount: -1GC Names        [PS MarkSweep, PS Scavenge]MEMORY_NAME                    | allocation_size                | init_size                      PS Eden Space                  | 256.00MB                       | 256.00MB                       Code Cache                     | 240.00MB                       | 2.44MB                         Compressed Class Space         | 1,024.00MB                     | 0.00MB                         PS Survivor Space              | 42.50MB                        | 42.50MB                        PS Old Gen                     | 683.00MB                       | 683.00MB                       Metaspace                      | -0.00MB                        | 0.00MB                         2024-04-09 16:42:09.366 [main] INFO  Engine - 
{"content":[{"reader":{"name":"mongodbreader","parameter":{"address":["node001:27017"],"collectionName":"atguigu","column":[{"name":"name","type":"string"},{"name":"url","type":"string"}],"dbName":"test","userName":"","userPassword":""}},"writer":{"name":"mysqlwriter","parameter":{"column":["*"],"connection":[{"jdbcUrl":"jdbc:mysql://node001:3306/datax","table":["mongodb"]}],"password":"******","preSql":[],"session":[],"username":"root","writeMode":"insert"}}}],"setting":{"speed":{"channel":"1"}}
}2024-04-09 16:42:09.401 [main] WARN  Engine - prioriy set to 0, because NumberFormatException, the value is: null
2024-04-09 16:42:09.405 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
2024-04-09 16:42:09.405 [main] INFO  JobContainer - DataX jobContainer starts job.
2024-04-09 16:42:09.411 [main] INFO  JobContainer - Set jobId = 0
2024-04-09 16:42:09.681 [job-0] INFO  cluster - Cluster created with settings {hosts=[node001:27017], mode=MULTIPLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500}
2024-04-09 16:42:09.681 [job-0] INFO  cluster - Adding discovered server node001:27017 to client view of cluster
2024-04-09 16:42:09.951 [cluster-ClusterId{value='6614ff6121b10d1da8148cac', description='null'}-node001:27017] INFO  connection - Opened connection [connectionId{localValue:1, serverValue:17}] to node001:27017
2024-04-09 16:42:09.954 [cluster-ClusterId{value='6614ff6121b10d1da8148cac', description='null'}-node001:27017] INFO  cluster - Monitor thread successfully connected to server with description ServerDescription{address=node001:27017, type=STANDALONE, state=CONNECTED, ok=true, version=ServerVersion{versionList=[5, 0, 2]}, minWireVersion=0, maxWireVersion=13, maxDocumentSize=16777216, roundTripTimeNanos=1877048}
2024-04-09 16:42:09.960 [cluster-ClusterId{value='6614ff6121b10d1da8148cac', description='null'}-node001:27017] INFO  cluster - Discovered cluster type of STANDALONE
2024-04-09 16:42:11.020 [job-0] INFO  OriginalConfPretreatmentUtil - table:[mongodb] all columns:[
name,url
].
2024-04-09 16:42:11.021 [job-0] WARN  OriginalConfPretreatmentUtil - 您的配置文件中的列配置信息存在风险. 因为您配置的写入数据库表的列为*,当您的表字段个数、类型有变动时,可能影响任务正确性甚至会运行出错。请检查您的配置并作出修改.
2024-04-09 16:42:11.024 [job-0] INFO  OriginalConfPretreatmentUtil - Write data [
insert INTO %s (name,url) VALUES(?,?)
], which jdbcUrl like:[jdbc:mysql://node001:3306/datax?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true]
2024-04-09 16:42:11.025 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2024-04-09 16:42:11.027 [job-0] INFO  JobContainer - DataX Reader.Job [mongodbreader] do prepare work .
2024-04-09 16:42:11.031 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do prepare work .
2024-04-09 16:42:11.032 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2024-04-09 16:42:11.032 [job-0] INFO  JobContainer - Job set Channel-Number to 1 channels.
2024-04-09 16:42:11.094 [job-0] INFO  connection - Opened connection [connectionId{localValue:2, serverValue:18}] to node001:27017
2024-04-09 16:42:11.123 [job-0] INFO  JobContainer - DataX Reader.Job [mongodbreader] splits to [1] tasks.
2024-04-09 16:42:11.125 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] splits to [1] tasks.
2024-04-09 16:42:11.172 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2024-04-09 16:42:11.183 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2024-04-09 16:42:11.189 [job-0] INFO  JobContainer - Running by standalone Mode.
2024-04-09 16:42:11.204 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2024-04-09 16:42:11.217 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2024-04-09 16:42:11.218 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2024-04-09 16:42:11.244 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2024-04-09 16:42:11.246 [0-0-0-reader] INFO  cluster - Cluster created with settings {hosts=[node001:27017], mode=MULTIPLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500}
2024-04-09 16:42:11.247 [0-0-0-reader] INFO  cluster - Adding discovered server node001:27017 to client view of cluster
2024-04-09 16:42:11.292 [cluster-ClusterId{value='6614ff6321b10d1da8148cad', description='null'}-node001:27017] INFO  connection - Opened connection [connectionId{localValue:3, serverValue:19}] to node001:27017
2024-04-09 16:42:11.295 [cluster-ClusterId{value='6614ff6321b10d1da8148cad', description='null'}-node001:27017] INFO  cluster - Monitor thread successfully connected to server with description ServerDescription{address=node001:27017, type=STANDALONE, state=CONNECTED, ok=true, version=ServerVersion{versionList=[5, 0, 2]}, minWireVersion=0, maxWireVersion=13, maxDocumentSize=16777216, roundTripTimeNanos=1154664}
2024-04-09 16:42:11.295 [cluster-ClusterId{value='6614ff6321b10d1da8148cad', description='null'}-node001:27017] INFO  cluster - Discovered cluster type of STANDALONE
2024-04-09 16:42:11.311 [0-0-0-reader] INFO  connection - Opened connection [connectionId{localValue:4, serverValue:20}] to node001:27017
2024-04-09 16:42:11.447 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[207]ms
2024-04-09 16:42:11.449 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2024-04-09 16:42:21.236 [job-0] INFO  StandAloneJobContainerCommunicator - Total 4 records, 106 bytes | Speed 10B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2024-04-09 16:42:21.236 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2024-04-09 16:42:21.237 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do post work.
2024-04-09 16:42:21.238 [job-0] INFO  JobContainer - DataX Reader.Job [mongodbreader] do post work.
2024-04-09 16:42:21.241 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2024-04-09 16:42:21.242 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: /opt/module/datax/hook
2024-04-09 16:42:21.244 [job-0] INFO  JobContainer - [total cpu info] => averageCpu                     | maxDeltaCpu                    | minDeltaCpu                    -1.00%                         | -1.00%                         | -1.00%[total gc info] => NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime     PS MarkSweep         | 1                  | 1                  | 1                  | 0.057s             | 0.057s             | 0.057s             PS Scavenge          | 1                  | 1                  | 1                  | 0.021s             | 0.021s             | 0.021s             2024-04-09 16:42:21.244 [job-0] INFO  JobContainer - PerfTrace not enable!
2024-04-09 16:42:21.245 [job-0] INFO  StandAloneJobContainerCommunicator - Total 4 records, 106 bytes | Speed 10B/s, 0 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2024-04-09 16:42:21.248 [job-0] INFO  JobContainer - 
任务启动时刻                    : 2024-04-09 16:42:09
任务结束时刻                    : 2024-04-09 16:42:21
任务总计耗时                    :                 11s
任务平均流量                    :               10B/s
记录写入速度                    :              0rec/s
读出记录总数                    :                   4
读写失败总数                    :                   0[atguigu@node001 datax]$ 

相关文章:

DataX,MongoDB数据导入hdfs与mysql

【尚硅谷】Alibaba开源数据同步工具DataX技术教程【尚硅谷】Alibaba开源数据同步工具DataX技术教程_哔哩哔哩_bilibili 目录 1、MongoDB 1.1、MongoDB介绍 1.2、MongoDB基本概念解析 1.3、MongoDB中的数据存储结构 1.4、MongoDB启动服务 1.5、MongoDB小案例 2、DataX导入…...

【OpenCV-颜色空间】

OpenCV-颜色空间 ■ RGB■ BGR■ HSV■ HSL■ HUE■ YUV ■ RGB ■ BGR BGR 就是RGB R和B调换位置。 OpenCV 默认使用BGR ■ HSV ■ HSL ■ HUE ■ YUV...

电脑硬盘分区表的两种格式:MBR 和 GPT

电脑硬盘分区表的两种格式&#xff1a;MBR 和 GPT 段子手168 2024-4-5 电脑硬盘分区表有两种格式&#xff1a;MBR 和 GPT&#xff1a; 一、MBR 分区表 1.MBR 是主引导记录 (Master Boot Record) 的英文缩写 在传统&#xff08;Legacy&#xff09;硬盘分区模式中&#xff0c…...

kafka 常用非基础的核心设置项

在测试的过程中&#xff0c;心血来潮&#xff0c;想要测试下新topic中还没被消费的消息。专门查了下ai&#xff0c;奈何一本正经的胡说八道&#xff0c;浪费了点时间。现在记录下&#xff1a; 解决topic缺失时项目无法启动 &#xff0c; 报错&#xff1a; Topic(s) [……] is/a…...

杂谈 EV之我见

每周至少更新一片博文&#xff0c;没有目的的看代码是没有效率的&#xff0c;带着目的去看代码才会有所得&#xff0c; 目前车载行业火爆&#xff0c;得益于EV和AI技术的发展&#xff0c;汽车从一个传统工业产品&#xff0c;摇身一变成为了前沿科技产品。 小米su7的发布会我看…...

白色磨砂质感html5页源码

白色磨砂质感html5页源码&#xff0c;简约的基础上加上了团队成员&#xff0c;自动打字特效音乐播放器存活时间 源码下载 https://www.qqmu.com/2980.html...

sqlite建立数据库

在做一些简单的实验项目的时候&#xff0c;sqlite比较好用&#xff08;MacOS验视环境&#xff09;。相关包下载网页&#xff1a;https://www.sqlite.org/download.html 1 创建数据文件目录 cd /<project_path> mkdir database cd /database2 创建数据库 在当前目录&am…...

HTML5标签(网页编程)

一、常用标签 HTML5&#xff08;或HTML&#xff09;中有很多常用的标签&#xff0c;这些标签用于构建网页的结构和内容。以下是一些常用的HTML5标签&#xff1a; 1. 标题标签 <h1> 到 <h6>&#xff1a;定义六个级别的标题&#xff0c;<h1> 级别最高&#…...

RabbitMQ小记

参考书籍&#xff1a;朱忠华的《RabbitMQ实战指南》 一、基础概念 1.Exchange 1.1 创建方法的参数&#xff0c;exchangeDeclare() exchange&#xff1a;交换器的名称type&#xff1a;交换器的类型durable&#xff1a;是否持久化&#xff0c;true代表持久化。&#xff08;持…...

【备忘录】docker-maven-plugin 使用

在使用docker-maven-plugin 插件时&#xff0c;经常会碰到一些奇怪的问题&#xff1a; 比如&#xff1a; 1、docker远程访问时&#xff0c;认证安全问题&#xff1f; 2、dockerHost 访问地址准确性&#xff1f; 3、需要多个tag时如何处理&#xff1f; 4、push 到仓库时&#xf…...

一起学习python——基础篇(10)

前言&#xff0c;Python 是一种面向对象的编程语言。以前大学读书的时候经常开玩笑说的一句话“如果没有对象&#xff0c;就new一个”。起因就是编程老师上课时经常说一句“首先&#xff0c;我们new一个对象”。 今天讲一下python的类和对象。 类是什么&#xff1f;它是一种用…...

LoRa自组网络设计 6

1 深入了解LoRaWan 1.1 LoRaWan概述 LoRaWAN采用星型无线拓扑 End Nodes 节点 Gateway 网关 Network Server 网络服务器 Application Server 应用服务器 LoRa联盟是2015年3月Semtech牵头成立的一个开放的、非盈利的组织&#xff0c;发起成员还有法国Actility&#xff0c;中国…...

C++手撕红黑树

文章目录 红黑树概念性质&#xff08;条件限制&#xff09;节点的定义红黑树的结构红黑树的插入cur为红&#xff0c;p为红&#xff0c;g为黑&#xff0c;u存在且为红cur为红&#xff0c;p为红&#xff0c;g为黑&#xff0c;u不存在或u为黑&#xff0c;插入到p对应的一边cur为红…...

计算机中,逻辑端口

计算机中,端口是什么 在计算机领域中,端口(Port)是一个逻辑概念,用于标识计算机与外部设备或另一台计算机通信时的出入口。它是计算机与外部通信的途径,分为物理端口和逻辑端口两种。 物理端口:物理端口也被称为接口,是计算机主板上或其他设备上的硬件接口,如USB接口…...

SV学习笔记(一)

SV&#xff1a;SystemVerilog 开启SV之路 数据类型 內建数据类型 四状态与双状态 &#xff1a; 四状态指0、1、X、Z&#xff0c;包括logic、integer、 reg、 wire。双状态指0、1&#xff0c;包括bit、byte、 shortint、int、longint。 有符号与无符号 &#xff1a; 有符号&am…...

大型商业银行基础设施的用户安全管理创新与实践

文丨中国工商银行数据中心安全运营部 陈茜倩 王喆 基础设施的用户安全一直是银行业信息系统稳定运行的重要保障&#xff0c;近年来&#xff0c;随着科技的快速发展&#xff0c;针对用户安全的攻击方式层出不穷&#xff0c;如网络钓鱼、恶意软件、身份盗用、弱密码攻击、社交工程…...

数据库入门-----SQL基础知识

目录 &#x1f4d6;前言&#xff1a; &#x1f4d1;SQL概述&&通用语法&#xff1a; &#x1f433;DDL&#xff1a; &#x1f43b;操作数据库&#xff1a; &#x1f41e;数据类型&#xff1a; &#x1f989;操作表&#xff1a; &#x1f9a6;DML: 语法规则&#x…...

本地代码第一次提交到远程仓库gitee

1.在gitee新建仓库 2.新建一个空文件夹 打开黑窗口,执行命令 克隆仓库地址 执行命令 git clone https://gitee.com/llncomms/test.git打开隐藏的项目 复制全部内容到需要提交的代码中 3.在提交的代码中执行命令 $ git add .git commit -m 首次提交$ git push提交成功...

蓝桥杯刷题 深度优先搜索-[178]全球变暖(C++)

题目描述 你有一张某海域 NN 像素的照片&#xff0c;”.”表示海洋、”#”表示陆地&#xff0c;如下所示&#xff1a; … .##… .##… …##. …####. …###. … 其中”上下左右”四个方向上连在一起的一片陆地组成一座岛屿&#xff0c;例如上图就有 2 座岛屿。 由于全球变暖…...

C语言-函数指针-快速排序算法(书籍示例-入门)

概述 使用C语言&#xff0c;实现结构体多元素&#xff0c;排序算法&#xff08;冒泡排序&#xff09;&#xff0c;这里使用示例&#xff1a;书籍示例讲解 函数简介 函数声明 void qsort(void *base, size_t nitems, size_t size, int (*compar)(const void *, const void*)) 参…...

谷歌浏览器插件

项目中有时候会用到插件 sync-cookie-extension1.0.0&#xff1a;开发环境同步测试 cookie 至 localhost&#xff0c;便于本地请求服务携带 cookie 参考地址&#xff1a;https://juejin.cn/post/7139354571712757767 里面有源码下载下来&#xff0c;加在到扩展即可使用FeHelp…...

阿里云ACP云计算备考笔记 (5)——弹性伸缩

目录 第一章 概述 第二章 弹性伸缩简介 1、弹性伸缩 2、垂直伸缩 3、优势 4、应用场景 ① 无规律的业务量波动 ② 有规律的业务量波动 ③ 无明显业务量波动 ④ 混合型业务 ⑤ 消息通知 ⑥ 生命周期挂钩 ⑦ 自定义方式 ⑧ 滚的升级 5、使用限制 第三章 主要定义 …...

以下是对华为 HarmonyOS NETX 5属性动画(ArkTS)文档的结构化整理,通过层级标题、表格和代码块提升可读性:

一、属性动画概述NETX 作用&#xff1a;实现组件通用属性的渐变过渡效果&#xff0c;提升用户体验。支持属性&#xff1a;width、height、backgroundColor、opacity、scale、rotate、translate等。注意事项&#xff1a; 布局类属性&#xff08;如宽高&#xff09;变化时&#…...

基于ASP.NET+ SQL Server实现(Web)医院信息管理系统

医院信息管理系统 1. 课程设计内容 在 visual studio 2017 平台上&#xff0c;开发一个“医院信息管理系统”Web 程序。 2. 课程设计目的 综合运用 c#.net 知识&#xff0c;在 vs 2017 平台上&#xff0c;进行 ASP.NET 应用程序和简易网站的开发&#xff1b;初步熟悉开发一…...

Java 8 Stream API 入门到实践详解

一、告别 for 循环&#xff01; 传统痛点&#xff1a; Java 8 之前&#xff0c;集合操作离不开冗长的 for 循环和匿名类。例如&#xff0c;过滤列表中的偶数&#xff1a; List<Integer> list Arrays.asList(1, 2, 3, 4, 5); List<Integer> evens new ArrayList…...

基于uniapp+WebSocket实现聊天对话、消息监听、消息推送、聊天室等功能,多端兼容

基于 ​UniApp + WebSocket​实现多端兼容的实时通讯系统,涵盖WebSocket连接建立、消息收发机制、多端兼容性配置、消息实时监听等功能,适配​微信小程序、H5、Android、iOS等终端 目录 技术选型分析WebSocket协议优势UniApp跨平台特性WebSocket 基础实现连接管理消息收发连接…...

高频面试之3Zookeeper

高频面试之3Zookeeper 文章目录 高频面试之3Zookeeper3.1 常用命令3.2 选举机制3.3 Zookeeper符合法则中哪两个&#xff1f;3.4 Zookeeper脑裂3.5 Zookeeper用来干嘛了 3.1 常用命令 ls、get、create、delete、deleteall3.2 选举机制 半数机制&#xff08;过半机制&#xff0…...

CMake 从 GitHub 下载第三方库并使用

有时我们希望直接使用 GitHub 上的开源库,而不想手动下载、编译和安装。 可以利用 CMake 提供的 FetchContent 模块来实现自动下载、构建和链接第三方库。 FetchContent 命令官方文档✅ 示例代码 我们将以 fmt 这个流行的格式化库为例,演示如何: 使用 FetchContent 从 GitH…...

如何理解 IP 数据报中的 TTL?

目录 前言理解 前言 面试灵魂一问&#xff1a;说说对 IP 数据报中 TTL 的理解&#xff1f;我们都知道&#xff0c;IP 数据报由首部和数据两部分组成&#xff0c;首部又分为两部分&#xff1a;固定部分和可变部分&#xff0c;共占 20 字节&#xff0c;而即将讨论的 TTL 就位于首…...

rnn判断string中第一次出现a的下标

# coding:utf8 import torch import torch.nn as nn import numpy as np import random import json""" 基于pytorch的网络编写 实现一个RNN网络完成多分类任务 判断字符 a 第一次出现在字符串中的位置 """class TorchModel(nn.Module):def __in…...