ChunJun is a distributed data integration framework currently based on Apache Flink. It was originally named FlinkX and was renamed ChunJun on February 22, 2022. It supports data synchronization and computation across heterogeneous data sources. ChunJun has been deployed and is running stably at thousands of companies.
Official website of ChunJun: https://dtstack.github.io/chunjun/
ChunJun abstracts different databases into reader/source plugins, writer/sink plugins, and lookup plugins.
Use git to clone the ChunJun source code:
git clone https://github.com/DTStack/chunjun.git
Run the following command in the project directory:
./mvnw clean package
Or run:
sh build/build.sh
A common problem during the build is the following error message:
[ERROR]Failed to execute goal com.diffplug.spotless:spotless-maven-plugin:2.4.2:check(spotless-check)on project chunjun-core:
Execution spotless-check of goal com.diffplug.spotless:spotless-maven-plugin:2.4.2:check failed:Unable to resolve dependencies:
Failed to collect dependencies at com.google.googlejavaformat:google-java-format:jar:1.7->com.google.errorprone:javac-shaded:jar:9+181-r4173-1:
Failed to read artifact descriptor for com.google.errorprone:javac-shaded:jar:9+181-r4173-1:Could not transfer artifact
com.google.errorprone:javac-shaded:pom:9+181-r4173-1 from/to aliyunmaven(https://maven.aliyun.com/repository/public):
Access denied to:https://maven.aliyun.com/repository/public/com/google/errorprone/javac-shaded/9+181-r4173-1/javac-shaded-9+181-r4173-1.pom -> [Help 1]
Solution: Download 'javac-shaded-9+181-r4173-1.jar' from 'https://repo1.maven.org/maven2/com/google/errorprone/javac-shaded/9+181-r4173-1/javac-shaded-9+181-r4173-1.jar' and install it into the local Maven repository with the command below:
mvn install:install-file -DgroupId=com.google.errorprone -DartifactId=javac-shaded -Dversion=9+181-r4173-1 -Dpackaging=jar -Dfile=./jars/javac-shaded-9+181-r4173-1.jar
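The download and install steps above can be sketched as a small helper that only prints the commands, so they can be reviewed before anything runs. The function name `print_install_steps`, the use of `curl`, and the `./jars` directory layout are assumptions for illustration, not part of the ChunJun build.

```shell
#!/bin/sh
# Coordinates taken from the error message and solution above.
VERSION="9+181-r4173-1"
JAR="javac-shaded-$VERSION.jar"
URL="https://repo1.maven.org/maven2/com/google/errorprone/javac-shaded/$VERSION/$JAR"

# Hypothetical helper: print the steps instead of running them.
# Pipe the output to `sh` to execute once the commands look right.
print_install_steps() {
    echo "mkdir -p ./jars"
    echo "curl -fL -o ./jars/$JAR $URL"
    echo "mvn install:install-file -DgroupId=com.google.errorprone -DartifactId=javac-shaded -Dversion=$VERSION -Dpackaging=jar -Dfile=./jars/$JAR"
}
```

Calling `print_install_steps | sh` from the project directory would run the three steps in order; this assumes `curl` and `mvn` are on the PATH.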
The following table shows the correspondence between ChunJun branches and Flink versions. If the versions are not aligned, problems such as 'Serialization Exceptions', 'NoSuchMethod Exception', etc. may occur in tasks.
| Branch | Flink version |
|---|---|
| master | 1.16.1 |
| 1.12_release | 1.12.7 |
| 1.10_release | 1.10.1 |
| 1.8_release | 1.8.3 |
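As a minimal sketch, the table can be encoded as a lookup so that a mismatch is caught before a task is submitted. The function names `flink_version_for_branch` and `check_flink_version` are hypothetical helpers, and how the installed Flink version is obtained is left to the caller.

```shell
#!/bin/sh
# Map a ChunJun branch name to the Flink version listed in the table.
flink_version_for_branch() {
    case "$1" in
        master)       echo "1.16.1" ;;
        1.12_release) echo "1.12.7" ;;
        1.10_release) echo "1.10.1" ;;
        1.8_release)  echo "1.8.3" ;;
        *)            echo "unknown branch: $1" >&2; return 1 ;;
    esac
}

# Compare the version expected for a branch with the installed version.
# $1 = ChunJun branch, $2 = installed Flink version (supplied by the caller).
check_flink_version() {
    branch="$1"; installed="$2"
    expected="$(flink_version_for_branch "$branch")" || return 1
    if [ "$expected" = "$installed" ]; then
        echo "ok"
    else
        echo "mismatch: branch $branch expects Flink $expected, found $installed"
        return 1
    fi
}
```

For example, `check_flink_version 1.12_release 1.12.7` prints `ok`, while passing `1.16.1` for the same branch reports a mismatch and returns a non-zero status.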
ChunJun supports running tasks in multiple modes. Different modes depend on different environments and require different steps.
Local mode does not depend on a Flink or Hadoop environment; it starts a JVM process in the local environment to run the task.
Go to the 'chunjun-dist' directory and run the command below:
sh bin/chunjun-local.sh -job $SCRIPT_PATH
The "$SCRIPT_PATH" parameter is the path to the task script. After running the command, the task runs locally.
Note:
If you package on Windows and run the scripts with sh on Linux, run the command sed -i "s/\r//g" bin/*.sh to remove the '\r' (carriage return) characters from the scripts.
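A minimal sketch of the same fix, applied only to files that actually contain carriage returns. The function name `fix_crlf` is hypothetical, and the in-place `sed -i` form assumes GNU sed, as found on most Linux systems.

```shell
#!/bin/sh
# Strip carriage returns from the given scripts, skipping files that
# already use Unix line endings.
fix_crlf() {
    cr="$(printf '\r')"   # a literal carriage return character
    for f in "$@"; do
        if grep -q "$cr" "$f"; then
            sed -i "s/\r//g" "$f"   # same substitution as in the note above
            echo "fixed: $f"
        fi
    done
}
```

Running `fix_crlf bin/*.sh` from the 'chunjun-dist' directory would print one `fixed:` line per script that was changed.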
Standalone mode depends on a Flink Standalone environment and does not depend on a Hadoop environment.
Locate the jar directory: if you built the project with Maven, the directory is named 'chunjun-dist'; if you downloaded the tar.gz file from the release page, the directory after decompression is named like 'chunjun-assembly-${revision}-chunjun-dist'.
Copy the jars to the Flink lib directory, for example:
cp -r chunjun-dist $FLINK_HOME/lib
Note: this operation must be performed on every machine in the Flink cluster; otherwise some jobs will fail with ClassNotFoundException.
sh $FLINK_HOME/bin/start-cluster.sh
After the cluster starts successfully, the Flink web UI listens on port 8081 by default; the port can be changed in 'flink-conf.yaml'. Open port 8081 on the current machine to reach the Flink web UI of the standalone cluster.
Go to the 'chunjun-dist' directory and run the command below:
sh bin/chunjun-standalone.sh -job chunjun-examples/json/stream/stream.json
After the command executes successfully, you can observe the task status in the Flink web UI.
Yarn Session mode depends on the Flink jars and a Hadoop environment, and the yarn-session must be started before the task is submitted.
Set $HADOOP_HOME and $FLINK_HOME in advance, and upload 'chunjun-dist' with the yarn-session '-t' parameter.
cd $FLINK_HOME/bin
./yarn-session.sh -t $CHUNJUN_HOME -d
Get the application id $SESSION_APPLICATION_ID of the yarn-session from the YARN web UI, then enter the 'chunjun-dist' directory and run the command below:
sh ./bin/chunjun-yarn-session.sh -job chunjun-examples/json/stream/stream.json -confProp {\"yarn.application.id\":\"SESSION_APPLICATION_ID\"}
'yarn.application.id' can also be set in 'flink-conf.yaml'. After the submission succeeds, the task status can be observed in the YARN web UI.
Yarn Per-Job mode depends on Flink and Hadoop environments. Set $HADOOP_HOME and $FLINK_HOME in advance.
Once the configuration is correct, the Yarn Per-Job task can be submitted. Enter the 'chunjun-dist' directory and run the command below:
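The escaped braces and quotes in the '-confProp' argument are easy to get wrong when the application id comes from a variable. As a sketch, a small helper can build the JSON string instead; `build_conf_prop` is a hypothetical name, and the application id in the usage note is a made-up placeholder.

```shell
#!/bin/sh
# Build the JSON value for -confProp from a YARN application id.
# $1 = application id of the running yarn-session
build_conf_prop() {
    printf '{"yarn.application.id":"%s"}' "$1"
}
```

With the id stored in a variable, the submission could then be written as `sh ./bin/chunjun-yarn-session.sh -job chunjun-examples/json/stream/stream.json -confProp "$(build_conf_prop "$SESSION_APPLICATION_ID")"`. The double quotes around the command substitution keep the JSON as a single argument.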
sh ./bin/chunjun-yarn-perjob.sh -job chunjun-examples/json/stream/stream.json
After the submission succeeds, the task status can be observed in the YARN web UI.
For details, please visit: https://dtstack.github.io/chunjun/documents/
Thanks to all contributors! We are very happy to have you contribute to ChunJun.
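Since each mode has its own launcher script under 'bin', a thin wrapper can select the script from a mode name. The sketch below only prints the command; the mode labels (`local`, `standalone`, `yarn-session`, `yarn-perjob`) and both function names are assumptions for illustration, while the script paths are the ones used in the sections above.

```shell
#!/bin/sh
# Map a mode label to the launcher script used in that mode's section.
chunjun_script_for_mode() {
    case "$1" in
        local)        echo "bin/chunjun-local.sh" ;;
        standalone)   echo "bin/chunjun-standalone.sh" ;;
        yarn-session) echo "bin/chunjun-yarn-session.sh" ;;
        yarn-perjob)  echo "bin/chunjun-yarn-perjob.sh" ;;
        *)            echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Print the submit command for a mode and job file without running it.
# $1 = mode label, $2 = path to the job script
chunjun_submit_command() {
    script="$(chunjun_script_for_mode "$1")" || return 1
    echo "sh $script -job $2"
}
```

For instance, `chunjun_submit_command yarn-perjob chunjun-examples/json/stream/stream.json` prints the same command shown for Yarn Per-Job mode, minus the leading `./`. Yarn Session mode still needs the additional '-confProp' argument described earlier.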
ChunJun is under the Apache 2.0 license. Please visit LICENSE for details.
Join the ChunJun Slack: https://join.slack.com/t/chunjun/shared_invite/zt-1hzmvh0o3-qZ726NXmhClmLFRMpEDHYw