This repository does not specify a license. Without the author's permission, this code may be used only for learning and not for any other purpose.
README.md

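A detailed walkthrough of the code is on my blog, in the post "I Scraped Data on One Million Zhihu Users with Python" (我用python爬了知乎一百万用户的数据).

This is a multithreaded program that crawls Zhihu user data.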
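To give a feel for what "multithreaded crawling" can look like, here is a minimal sketch. It is not the code in get_user.py: the follow graph `FAKE_GRAPH`, the `fetch_followees` helper, and the `crawl` function are hypothetical stand-ins, and the sketch keeps its queue and visited set in memory rather than in Redis so that it runs without any services.

```python
import queue
import threading

# Stand-in for the network: maps a user to the users they follow (made-up data).
FAKE_GRAPH = {
    "alice": ["bob", "carol"],
    "bob": ["carol", "dave"],
    "carol": ["alice"],
    "dave": [],
}


def fetch_followees(user):
    """Hypothetical fetcher; a real crawler would issue an HTTP request here."""
    return FAKE_GRAPH.get(user, [])


def crawl(seed, num_threads=4):
    """Crawl outward from seed using a shared work queue and a visited set."""
    todo = queue.Queue()
    seen = {seed}
    seen_lock = threading.Lock()
    todo.put(seed)

    def worker():
        while True:
            user = todo.get()
            if user is None:  # sentinel: shut this worker down
                todo.task_done()
                return
            try:
                for other in fetch_followees(user):
                    # Check-and-add must be atomic so no user is queued twice.
                    with seen_lock:
                        if other in seen:
                            continue
                        seen.add(other)
                    todo.put(other)
            finally:
                todo.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    todo.join()  # wait until every queued user has been processed
    for _ in threads:
        todo.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return seen


users = crawl("alice")
print(sorted(users))
```

The lock around the visited set is the important part: without it, two threads could both decide that the same user is new and queue that user twice.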

Requirements

Required packages: beautifulsoup4 html5lib image requests redis PyMySQL

Install all dependencies with pip:

pip install \
Image \
requests \
beautifulsoup4 \
html5lib \
redis \
PyMySQL

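The runtime environment must support Chinese characters.

Tested on Python 3.5; other environments are not guaranteed to work flawlessly.

MySQL and Redis must be installed.

Edit config.ini: configure the MySQL and Redis connections and fill in your Zhihu account.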
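As an illustration of how a config.ini like this might be consumed, the sketch below parses INI text with the standard-library `configparser`. The section names (`db`, `redis`, `zhihu`), the key names, and every value are assumptions made for the example; check the config.ini shipped with the repository for the real layout.

```python
import configparser

# Hypothetical config.ini contents; the real file's sections and keys may differ.
SAMPLE_CONFIG = """
[db]
host = 127.0.0.1
port = 3306
user = root
password = secret
database = zhihu

[redis]
host = 127.0.0.1
port = 6379

[zhihu]
account = you@example.com
password = your-password
"""


def load_settings(text):
    """Parse INI text and return MySQL, Redis and account settings as dicts."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    mysql = dict(parser["db"])
    mysql["port"] = parser.getint("db", "port")  # ports must be integers
    redis_cfg = {
        "host": parser.get("redis", "host"),
        "port": parser.getint("redis", "port"),
    }
    account = dict(parser["zhihu"])
    return mysql, redis_cfg, account


mysql_cfg, redis_cfg, account_cfg = load_settings(SAMPLE_CONFIG)
print(mysql_cfg["host"], mysql_cfg["port"], redis_cfg["port"])
```

With keys named as above, the resulting dicts could be passed on as `pymysql.connect(**mysql_cfg)` and `redis.Redis(**redis_cfg)`. To read the file from disk instead of a string, use `parser.read("config.ini", encoding="utf-8")`.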

Import init.sql into the database.

Run

Start crawling: python get_user.py

Check how many users have been crawled: python check_redis.py
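For readers curious what a progress check against Redis might involve, here is a rough sketch. It does not reproduce check_redis.py: the key names `crawled_users` and `pending_users`, and the choice of a set plus a list, are assumptions. `InMemoryStore` is a tiny stand-in that mimics the two redis-py methods used (`scard` and `llen`), so the example runs without a Redis server.

```python
class InMemoryStore:
    """Tiny stand-in exposing the two redis-py calls used below (scard, llen)."""

    def __init__(self, sets=None, lists=None):
        self._sets = sets or {}
        self._lists = lists or {}

    def scard(self, name):
        # Number of members in the set stored at `name` (0 if missing).
        return len(self._sets.get(name, ()))

    def llen(self, name):
        # Length of the list stored at `name` (0 if missing).
        return len(self._lists.get(name, ()))


def crawl_progress(client, done_key="crawled_users", queue_key="pending_users"):
    """Return (crawled, pending) counts; both key names are hypothetical."""
    return client.scard(done_key), client.llen(queue_key)


store = InMemoryStore(
    sets={"crawled_users": {"alice", "bob", "carol"}},
    lists={"pending_users": ["dave", "erin"]},
)
crawled, pending = crawl_progress(store)
print("crawled: %d, pending: %d" % (crawled, pending))
```

Against a live server, a `redis.Redis(host=..., port=...)` client could be passed in place of `store`, provided the keys really are a set and a list.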

Results

Screenshot 1, Screenshot 2

Docker

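If you would rather skip the manual setup, you can use the basic environment I put together with Docker as a reference. MySQL and Redis both use the official images: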

docker run --name mysql -itd mysql:latest
docker run --name redis -itd redis:latest

Then run the Python image with docker-compose. My docker-compose.yml for the Python container:

python:
    container_name: python
    build: .
    ports:
      - "84:80"
    external_links:
      - memcache:memcache
      - mysql:mysql
      - redis:redis
    volumes:
      - /docker_containers/python/www:/var/www/html
    tty: true
    stdin_open: true
    extra_hosts:
      - "python:192.168.102.140"
    environment:
      PYTHONIOENCODING: utf-8

My Dockerfile:

FROM kong36088/zhihu-spider:latest
