自风 / Spiderman2

How to save the content under an XPath, including its HTML tags

Done
Opened this issue 2017-02-05 20:27

For example, for a field under a model, I want to save the field's entire content, HTML tags included. How do I do that?

Comments (3)

cgnq created the task

Just set the field attribute [isSerialize=true].

One thing to note, though: the field's xpath should stop at the node itself. Don't append /text(), and don't set attr. Here's an example:

1. If you set the crawling rules in code:
model.addField("xml")
    .set("isSerialize", true)
    .set("xpath", "//div[@class='head-info-list']");
2. If you set the crawling rules in a configuration file:
<field name="xml" isSerialize="true" xpath="//div[@class='head-info-list']" />
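Spiderman2's own serialization code isn't shown in this thread, so the following is only a sketch, built on the JDK's standard `javax.xml.xpath` and `javax.xml.transform` APIs rather than on Spiderman2, of why the xpath should stop at the node. Selecting the element and serializing it keeps the markup (the behavior `isSerialize=true` is described as giving), while appending `/text()` yields plain text only: nothing at all for a `div` with no direct text, and just the words for a `span`. The class name `SerializeVsText` and the sample markup are hypothetical, and the sample is deliberately well-formed; real pages are usually loose HTML that needs a lenient parser.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class SerializeVsText {
    // Hypothetical, well-formed sample page (not real QFang markup).
    static final String PAGE =
        "<html><body><div class='head-info-list'>"
        + "<ul><li><span>建筑年代</span><p>2010年</p></li></ul>"
        + "</div></body></html>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    // xpath stops at the element: serialize the element itself, tags and all.
    static String serializeNode(Document doc, String xpath) {
        try {
            Node node = (Node) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, doc, XPathConstants.NODE);
            if (node == null) {
                return ""; // xpath matched nothing
            }
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // String evaluation of a /text() path: the first direct text node only, no markup.
    static String evaluateText(Document doc, String xpath) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(xpath, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(PAGE);
        String div = "//div[@class='head-info-list']";
        System.out.println(serializeNode(doc, div));                          // <div ...><ul>...</ul></div>
        System.out.println("[" + evaluateText(doc, div + "/text()") + "]");   // empty: the div has no direct text
        System.out.println(evaluateText(doc, div + "//span/text()"));         // text survives, tags are gone
    }
}
```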

I ran an example against the QFang site. The output is as follows:

<ul>
  <li class="head-info-item clearfix">
    <span class="field fl">建筑年代</span>  
    <p class="place-area clearfix">
      <span class="link">2010年</span>
    </p>
  </li>  
  <li class="head-info-item clearfix">
    <span class="field fl">
      <em>停</em>
      <em>车</em>
      <em>位</em>
    </span>  
    <p class="counterpart-schools clearfix">2700个</p>
  </li>  
  <li class="head-info-item clearfix">
    <span class="field fl">停车费用</span>  
    <p class="fl">210.0</p>
  </li>  
  <li class="head-info-item clearfix">
    <span class="field fl">
      <em>容</em>
      <em>积</em>
      <em>率</em>
    </span>  
    <p class="fl">2.55</p>
  </li>  
  <li class="head-info-item clearfix">
    <span class="field fl">
      <em>绿</em>
      <em>化</em>
      <em>率</em>
    </span>  
    <p class="fl">32%</p>
  </li>  
  <li class="head-info-item clearfix">
    <span class="field fl">
      <em>物</em>
      <em>业</em>
      <em>费</em>
    </span>  
    <p class="fl">1.44元/平米・月</p>
  </li>  
  <li class="head-info-item clearfix">
    <span class="field fl">物业公司</span>  
    <p class="fl">保利广州物业管理有限公司</p>
  </li>  
  <li class="head-info-item clearfix">
    <span class="field fl">开 发 商</span>  
    <p class="fl">浙江保利房地产开发有限公司</p>
  </li>
</ul>
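Keeping the tags matters here because labels such as 停车位 are split into one `<em>` per character. As a sketch only (it is not part of Spiderman2, the class name `HeadInfoParser` is hypothetical, and it assumes the saved markup is well-formed XML, which this particular output happens to be, whereas arbitrary HTML would need an HTML parser), the label/value pairs can be recovered from the saved value with the JDK's DOM API:

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class HeadInfoParser {
    // Turn a serialized <ul> of <li><span>label</span><p>value</p></li> items into ordered pairs.
    static Map<String, String> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Map<String, String> result = new LinkedHashMap<>();
            NodeList items = doc.getElementsByTagName("li");
            for (int i = 0; i < items.getLength(); i++) {
                Element li = (Element) items.item(i);
                // getTextContent() joins the <em> children; stripping \s removes the
                // indentation and the ASCII spaces in labels like "开 发 商".
                String label = li.getElementsByTagName("span").item(0)
                    .getTextContent().replaceAll("\\s+", "");
                String value = li.getElementsByTagName("p").item(0).getTextContent().trim();
                result.put(label, value);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("markup is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // Two items copied from the output above, reformatted as a Java string.
        String sample = "<ul>"
            + "<li class=\"head-info-item clearfix\"><span class=\"field fl\">\n"
            + "  <em>停</em>\n  <em>车</em>\n  <em>位</em>\n</span>"
            + "<p class=\"counterpart-schools clearfix\">2700个</p></li>"
            + "<li class=\"head-info-item clearfix\"><span class=\"field fl\">开 发 商</span>"
            + "<p class=\"fl\">浙江保利房地产开发有限公司</p></li>"
            + "</ul>";
        System.out.println(parse(sample)); // {停车位=2700个, 开发商=浙江保利房地产开发有限公司}
    }
}
```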
自风 added label question

Thanks, that works great! A UI and documentation would make it even better.

cgnq closed the task
