Scrapy Basics in Python (Syntax and Selectors)

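In a newly created spider file, the way we extract the values we need usually depends on the format of the response the spider receives.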

I. The response is JSON

When the response is JSON, it can be parsed in the spider file like this:

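import json

# Item is assumed to be the project's scrapy.Item subclass
# that declares the 'code' and 'data' fields.
def parse(self, response):
    items = Item()
    # Decode the JSON response body into a Python dict
    res = json.loads(response.text)
    items['code'] = res['code']
    items['data'] = res['data']
    print(res, items['data'])
    yield items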
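A real API does not always return well-formed JSON or include every key, so the direct `res['code']` lookups above can raise. The following is a minimal sketch of a more defensive variant that uses only the standard library. The helper name `extract_fields` and the sample body are hypothetical, chosen for illustration; they are not part of Scrapy.

```python
import json


def extract_fields(body_text):
    """Parse a JSON response body and pull out 'code' and 'data'.

    Returns None when the body is not valid JSON or is not a JSON object.
    """
    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    # .get() avoids a KeyError when the API omits a field
    return {"code": payload.get("code"), "data": payload.get("data")}


# Hypothetical response body standing in for response.text
sample_body = '{"code": 200, "data": {"name": "scrapy"}}'
result = extract_fields(sample_body)
print(result)
```

Inside a spider, `parse` could call `extract_fields(response.text)` and skip the response when it returns None. Scrapy 2.2 and later also provide `response.json()` on text responses, which can stand in for `json.loads(response.text)`.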

II. The response is HTML

1. XPath selectors

When the response is HTML, values can be extracted with XPath:

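# Item is assumed to be the project's scrapy.Item subclass with a 'title' field.
def parse(self, response):
    items = Item()
    # .get() returns the first match as a string (None if nothing matches)
    title = response.xpath('/html/head/title/text()').get()
    items['title'] = title
    print(title)
    yield items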

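Expression	Description
nodename	Selects the child elements of the current node that are named nodename
/	Selects from the root node
//	Selects matching nodes at any depth below the current node, regardless of their position
.	Selects the current node
..	Selects the parent of the current node
@	Selects attributes
*	Matches any element node
@*	Matches any attribute node
node()	Matches a node of any type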

2. CSS selectors

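# Item is assumed to be the project's scrapy.Item subclass with a 'title' field.
def parse(self, response):
    items = Item()
    # ::text selects the text inside <title>; .get() returns the first match
    title = response.css('title::text').get()
    items['title'] = title
    print(title)
    yield items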

For details on specific methods, see the official Scrapy documentation in Chinese.
