爬虫系列（5）更简便Requests请求库使用介绍

新闻公告

< 返回新闻公共列表

爬虫系列（5）更简便Requests请求库使用介绍

发布时间：2022-06-02 00:50:58

先介绍一个网站“Requests：让HTTP服务人类”
http://cn.python-requests.org/zh_CN/latest/

1. 介绍

对了解一些爬虫的基本理念，掌握爬虫爬取的流程有所帮助。入门之后，我们就需要学习一些更加高级的内容和工具来方便我们的爬取。那么这一节来简单介绍一下 requests 库的基本用法

2. 安装

利用 pip 安装

pip install requests

复制

3. 基本请求

req = requests.get("http://www.baidu.com")req = requests.post("http://www.baidu.com")req = requests.put("http://www.baidu.com")req = requests.delete("http://www.baidu.com")req = requests.head("http://www.baidu.com")req = requests.options("http://www.baidu.com")

复制

3.1 get请求

参数是字典，我们也可以传递json类型的参数：

import requests



url = "http://www.baidu.com/s"params = {'wd': 'Python爬虫'}response = requests.get(url, params=params)print(response.url)response.encoding = 'utf-8'html = response.text

# print(html)

复制

3.2 post请求

参数是字典，我们也可以传递json类型的参数：

url = "http://www.sxt.cn/index/login/login.html"formdata = {

 "user": "17703181473",

 "password": "123456"}response = requests.post(url, data=formdata)response.encoding = 'utf-8'html = response.text

# print(html)

复制

3.3 自定义请求头部

伪装请求头部是采集时经常用的，我们可以用这个方法来隐藏：

headers = {'User-Agent': 'python'}r = requests.get('http://www.zhidaow.com', headers = headers)print(r.request.headers['User-Agent'])

复制

3.4 设置超时时间

可以通过timeout属性设置超时时间，一旦超过这个时间还没获得响应内容，就会提示错误。

requests.get('http://github.com', timeout=0.001)

复制

3.5 代理访问

采集时为避免被封IP，经常会使用代理。requests也有相应的proxies属性：

import requests



proxies = {

 "http": "http://10.10.1.10:3128",

 "https": "https://10.10.1.10:1080",}requests.get("http://www.zhidaow.com", proxies=proxies)

复制

如果代理需要账户和密码，则需这样：

proxies = {

 "http": "http://user:pass@10.10.1.10:3128/",}

复制

3.6 session自动保存cookies

seesion的意思是保持一个会话，比如登陆后继续操作(记录身份信息) 而requests是单次请求的请求，身份信息不会被记录。

# 创建一个session对象 

s = requests.Session() # 用session对象发出get请求，设置cookies 

s.get('http://httpbin.org/cookies/set/sessioncookie/123456789')

复制

3.7 ssl验证

# 禁用安全请求警告

requests.packages.urllib3.disable_warnings()resp = requests.get(url, verify=False, headers=headers)

复制

4 获取响应信息

代码
含义
resp.json()
获取响应内容（以json字符串）
resp.text
获取响应内容
 (以字符串)
resp.content
获取响应内容（以字节的方式）
resp.headers
获取响应头内容
resp.url
获取访问地址
resp.encoding
获取网页编码
resp.request.headers
请求头内容
resp.cookie
获取cookie

关于我们

新闻公告

爬虫系列（5）更简便Requests请求库使用介绍

壹站邦云服务中心

行业解决方案

帮助与支持

代理产品

管理控制中心

代码	含义
resp.json()	获取响应内容（以json字符串）
resp.text	获取响应内容 (以字符串)
resp.content	获取响应内容（以字节的方式）
resp.headers	获取响应头内容
resp.url	获取访问地址
resp.encoding	获取网页编码
resp.request.headers	请求头内容
resp.cookie	获取cookie