RuiJi.Net is a distributed crawl framework written in C#.
RuiJi.Net is a self-hosted Web API built on Microsoft.Owin. Major features include a distributed crawler, a distributed extractor, and managed cookies.
RuiJi.Net supports IP polling across the server's public network addresses and proxy servers.
Building
http://www.ruijihg.com/archives/ruijinet/getting-started
The project is under development.
Crawler feature | Support |
---|---|
web header | custom |
method | get/post |
auto redirection | supported |
cookie | managed/custom |
service point IP | auto/custom bind |
encoding | auto detect/specified |
response | raw/string |
proxy | planned |
Extractor feature | Support |
---|---|
selector | css/xpath/regex/json/text range/exclude text/clear |
extract structure | block/tile/meta |
jsonconvert | extractblock |
// Download the page.
var crawler = new IPCrawler();
var request = new Request("http://www.ruijihg.com/%e5%bc%80%e5%8f%91/");
var response = crawler.Request(request);
var content = response.Data.ToString();

// The block selects the page region to extract from.
var block = new ExtractBlock();
block.Selectors = new List<ISelector>
{
    new CssSelector(".entry-content", CssTypeEnum.InnerHtml)
};

// The tile selector matches each repeated item inside the block.
block.TileSelector = new ExtractTile
{
    Selectors = new List<ISelector>
    {
        new CssSelector(".pt-cv-content-item", CssTypeEnum.InnerHtml)
    }
};

// Metas are named fields extracted from each tile.
block.TileSelector.Metas.AddMeta("title", new List<ISelector> {
    new CssSelector(".pt-cv-title")
});
block.TileSelector.Metas.AddMeta("url", new List<ISelector> {
    new CssSelector(".pt-cv-readmore", "href")
});

// Run the extraction locally against the downloaded content.
var ext = new RuiJiExtracter();
var r = ext.Extract(content, block);
Download ZooKeeper from an Apache mirror: http://mirrors.hust.edu.cn/apache/zookeeper/zookeeper-3.4.12/
In the conf folder, copy zoo_sample.cfg, rename the copy to zoo.cfg, and change dataDir to your own data directory.
Confirm that a Java runtime environment is installed.
Run bin/zkServer.cmd in your ZooKeeper folder.
Run RuiJi.cmd.exe.
If you see the following output, the services are running:
Server Start At http://x.x.x.x:x
proxy x.x.x.x:x ready to startup!
try connect to zookeeper server : x.x.x.x:2181
zookeeper server connected!
the service startup is complete!
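For reference, the zoo.cfg edit in the setup steps above might end up looking like the fragment below. This is a sketch, assuming the default values that ship in zoo_sample.cfg for ZooKeeper 3.4.x; the dataDir path is a placeholder and should be replaced with a folder that exists on your machine. The clientPort value matches the port 2181 shown in the startup output.

```
# conf/zoo.cfg: a copy of conf/zoo_sample.cfg with dataDir changed
tickTime=2000
initLimit=10
syncLimit=5
# Placeholder path (an assumption): point this at your own data folder
dataDir=C:/zookeeper/data
# Port the RuiJi.Net nodes connect to
clientPort=2181
```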
// Start the RuiJi.Net nodes before sending requests.
Common.StartupNodes();

// Download the page through the running crawler service.
var request = new Request("http://www.ruijihg.com/%e5%bc%80%e5%8f%91/");
var response = Crawler.Request(request);

// Stop here if the download failed.
if (response.StatusCode != System.Net.HttpStatusCode.OK)
    return;

var content = response.Data.ToString();

// The block selects the page region to extract from.
var block = new ExtractBlock();
block.Selectors = new List<ISelector>
{
    new CssSelector(".entry-content", CssTypeEnum.InnerHtml)
};

// The tile selector matches each repeated item inside the block.
block.TileSelector = new ExtractTile
{
    Selectors = new List<ISelector>
    {
        new CssSelector(".pt-cv-content-item", CssTypeEnum.InnerHtml)
    }
};

// Metas are named fields extracted from each tile.
block.TileSelector.Metas.AddMeta("title", new List<ISelector> {
    new CssSelector(".pt-cv-title")
});
block.TileSelector.Metas.AddMeta("url", new List<ISelector> {
    new CssSelector(".pt-cv-readmore", "href")
});

// Send the content and the block to the extractor service.
var r = Extracter.Extract(new ExtractRequest {
    Block = block,
    Content = content
});
Please contact me with any suggestions.
My website: www.ruijihg.com