2 Star 17 Fork 4

MARK LIU / paper_checking_system

加入 Gitee
与超过 1200万 开发者一起发现、参与优秀开源项目,私有仓库也完全免费 :)
免费加入
克隆/下载
贡献代码
同步代码
取消
提示: 由于 Git 不支持空文件夾,创建文件夹后会生成空的 .keep 文件
Loading...
README
GPL-2.0

简体中文论文查重系统

衍生软件及SDK

目前团队开发了JAVA版查重SDK,可深度集成到您的私有项目中
使用示例:https://github.com/tianlian0/duplicate-check-sample
目前已有多个商用查重系统基于此SDK开发上线。欢迎大家试用、反馈,希望它能帮助大家开发出自己应用场景下的查重系统。
基于本项目,我们开发了付费衍生产品“标书查重”,标书查重软件基于标书查重场景提供更高的可靠性及相关技术支持
试用链接:https://xincheck.com/?id=20

安装使用教程

1、clone源代码
2、使用vs打开、编译(vs需安装.NET开发组件)
3、运行paper_checking.exe文件即可
兼容性要求:
windows 7及以上,可用内存1.5GB以上。开发工具:vs2017及以上,需安装vc2015运行库和.NET Framework4.6。其它版本需自行测试。
报错排除:
1、如果Spire报错,可能是因为项目刚拉下来相关第三方包还未下载完毕,可等待vs自动下载或打开NuGet手动下载缺少的包。
2、本项目已不提供32位操作系统支持,如您使用32位操作系统将无法使用本系统。
3、为什么查重的结果永远是0?①这个系统是使用本地比对库进行纵向查重的,查重之前需要先添加本地比对库,不添加结果肯定是0。各位正在写论文或者作业的同学们,想给自己的论文查重请自行去各大查重平台购买,这个系统不适用没有本地比对库的用户。②添加比对库的时候必须使用比对库管理选项卡下的“添加到比对库”按钮进行添加,自己向文件夹中直接复制文件是不生效的,相当于没有添加比对库。

项目中包含了很多二进制DLL文件的说明

项目中使用了第三方开源组件PDFBOX和IKVM:
IKVM已于2017年停止维护,现在使用NuGet可以引用的是5年前的IKVM,存在一些BUG,因此本项目没有通过NuGet直接引用IKVM,而是对IKVM的一些bug进行了修复以后,编译成了DLL文件提供到本项目中,这样大家就不用自己编译了,我修改后的IKVM的源码在我的GitHub主页也可以看到,需要源码的同学可以自行Clone。
PdfBox只有Java版,C#中想要使用需要转成DLL,因此我是直接转好以后把二进制放到项目中的。有代码洁癖的用户可以自行去PdfBox官网下载jar包进行转换。

项目介绍与功能说明

该系统目前支持对简体中文的文件进行横向查重和纵向查重。
两个核心功能点说明如下:
1、纵向查重
选择一批待查文件后,将该批文件和比对库中的文件比对。通常用以检查该批文件有没有复制比对库中的文本。
2、横向查重
选择一批待查文件后,在该批次文件之间进行比对,用以检查该批次文件有没有互相复制的情况。这个功能是目前主流查重平台支持度较差的。

软件中涉及的其它功能点及名词说明如下:
1、添加到比对库
将文件添加到比对库中。这个功能是服务于纵向查重的,将文件通过这个功能添加到比对库中,在使用纵向查重功能的就可与比对库中的文件进行比对。需要选中存放文件的文件夹,然后点击“添加到比对库”按钮,按钮会变灰,等待按钮恢复正常即添加完毕。该过程需要耗费一定时间。
2、查重阈值的设置
该值的设置决定了待查文件连续多少个字与其它文件相同即判定为抄袭。推荐值为10至16,用户亦可根据实际情况自行设定为1至99之间的任意整数值。
3、保存查重报告的文件夹
选择一个文件夹,待查重完毕后查重报告将会输出到这个文件夹中。
4、生成统计表
查重结束后是否生成csv格式的统计表。
5、中断恢复
选中复选框,软件将不会清除上一次查重的结果继续查重。该复选框适用于应用程序中途意外退出,想从退出时的进度继续查重。
6、关键词过滤
将一些可能影响重复率的关键词添加到关键词过滤功能中,在查重时会将待查文本中出现的关键词全部删除,以避免它们对重复率的影响。这类关键词通常可能为学校名、机构名等。
7、查重进程数
该值的设置影响查重速度,默认为当前机器CPU逻辑核心数减2。
8、格式转换线程数
文件在查重前会进行格式转换,该值的设置影响文件格式转换的速度,默认为当前机器CPU逻辑核心数减2。

注意事项:
如果待查文件在比对库中存在,文件名相同会自动跳过比对,若文件名不相同,则会导致查重重复率高于90%的情况(相当于是两篇相同的文本进行比对,至于为什么不是100%后续有说明)。

查重原理说明

每两篇文件之间比对连续相同的字符串,超过查重阈值即认为这些字是重复的。但是,如果单篇文本重复率在0.25%以下或重复字数在30字以下,将不判定为重复。同时,如果一篇文本从其它文本复制了某句话很多次,只认为其中的一次为重复。
文件格式转换使用了pdfbox和spire word free。
文本格式化时会去掉文本中的摘要、目录、参考文献及非中文字符。
生成的查重报告为rtf格式。

已知bug

极少量正常的pdf、word文件转换失败,如您发现此类文件,希望您通过issuse进行反馈并提供下载链接,十分感谢。

其它说明

1、现在我们已经开发了一套全新的支持多语言的web版查重系统,暂无开源计划,如有相关需求可以通过[商业合作]联系进行试用/定制开发。该系统中使用的核心查重模块已以SDK的形式开放使用(https://xincheck.com/?id=16 )。
2、该开源版本为一个简化版本,拥有查重所需的所有核心功能并输出一个较为简易的查重报告,有其它定制需求见[商业合作]。

项目截图

image
image
image
项目运行演示视频:链接:https://pan.baidu.com/s/1VM9g4CT4nAwlZXOHePkoXQ 提取码:t059
PS:截图的查重报告为商用版,开源版的报告内容没这个丰富。商用版和开源版的安装包均可在右侧的Release中下载。

反馈交流群&捐助

image
欢迎各位免费、付费用户进入反馈交流QQ群:778041438;欢迎捐助本项目。

商业合作

本项目遵守GPL2协议,同时本项目已申请三项软件著作权
本项目可以提供c#/java版本的技术支持。欢迎各企业、高校、机构合作。商业合作专用微信:Newton_USA,QQ:415310449,电话:18910760331。添加好友烦请备注公司名称;个人项目合作可备注您的姓氏。非商业合作以及无购买需求的用户可加入上方的反馈交流QQ群。
可合作应用场景:高校论文查重、标书查重、辅助防串标、项目申报书查重、企业内部文档查重、舆情数据去重、学生作业查重等

 GNU GENERAL PUBLIC LICENSE Version 2, June 1991 Copyright (C) 1989, 1991 Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Preamble The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Lesser General Public License instead.) You can apply it to your programs, too. When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all. The precise terms and conditions for copying, distribution and modification follow. GNU GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you". Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does. 1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. c) If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program. In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following: a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. 4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it. 6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License. 7. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 8. If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License. 9. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation. 10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally. NO WARRANTY 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS How to Apply These Terms to Your New Programs If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. <one line to give the program's name and a brief idea of what it does.> Copyright (C) <year> <name of author> This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. Also add information on how to contact you by electronic and paper mail. If the program is interactive, make it output a short notice like this when it starts in an interactive mode: Gnomovision version 69, Copyright (C) year name of author Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. This is free software, and you are welcome to redistribute it under certain conditions; type `show c' for details. The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items--whatever suits your program. You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here is a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the program `Gnomovision' (which makes passes at compilers) written by James Hacker. <signature of Ty Coon>, 1 April 1989 Ty Coon, President of Vice This General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Lesser General Public License instead of this License. ----------------------------------------------------------------------------- NOTICE Copyright (C) 2018-2020 mark liu This software is provided 'as-is', without any express or implied warranty. In no event will the authors be held liable for any damages arising from the use of this software. Permission is granted to anyone to use this software for any purpose, including commercial applications, and to alter it and redistribute it freely, subject to the following restrictions: 1. The origin of this software must not be misrepresented; you must not claim that you wrote the original software. If you use this software in a product, an acknowledgment in the product documentation would be appreciated but is not required. 2. Altered source versions must be plainly marked as such, and must not be misrepresented as being the original software. 3. This notice may not be removed or altered from any source distribution. mark liu liu13269455505@126.com

简介

基于C#和C++开发的文本查重/论文查重系统,一亿字次级论文库秒级查重。关联:查重算法、数据去重、文档查重、文本去重、标书查重、辅助防串标。该项目从本人的github仓库同步,可能会出现更新不及时的情况,可访问原地址:https://github.com/tianlian0/paper_checking_system 展开 收起
C#
GPL-2.0
取消

贡献者

全部

近期动态

加载更多
不能加载更多了
C#
1
https://gitee.com/mark_liu_admin/paper_checking_system.git
git@gitee.com:mark_liu_admin/paper_checking_system.git
mark_liu_admin
paper_checking_system
paper_checking_system
master

搜索帮助

53164aa7 5694891 3bd8fe86 5694891