Injecting JS with mitmproxy to stop Chrome/Chromium from being identified by anti-crawler checks on window.navigator.webdriver

Author: 魯智深 Category: Python Published: 2020-01-21 16:37

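Headed mode

A crawler built on Selenium WebDriver and chromedriver is not invincible either; it can be detected and blocked like any other crawler.

In the browser launched by the automation script, enter the following JS in the console: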

window.navigator.webdriver

The result is true.

A browser opened normally returns undefined.

Any programmer familiar with JS can therefore easily tell whether the browser was opened normally or driven by automation.
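
The same check can be scripted from Python instead of typed into the console. This is a minimal sketch, assuming Selenium is installed and a matching chromedriver is on the PATH. `classify_webdriver_flag`, `read_webdriver_flag`, and `main` are hypothetical helper names, not part of Selenium. Selenium's `execute_script` hands back Python `True` for JavaScript `true` and `None` for `undefined`.

```python
def classify_webdriver_flag(value):
    """Interpret window.navigator.webdriver as it arrives in Python.

    Selenium maps JavaScript true to True and undefined/null to None.
    """
    if value is True:
        return "automated"  # what a Selenium-driven Chrome reports
    if value is None or value is False:
        # undefined in the article's tests; some Chrome builds report false
        return "normal"
    return "unknown"


def read_webdriver_flag(driver):
    """Ask the current page for navigator.webdriver through a live driver."""
    return driver.execute_script("return window.navigator.webdriver")


def main():
    # Imported here so the helpers above can be used without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get("about:blank")
        print(classify_webdriver_flag(read_webdriver_flag(driver)))
    finally:
        driver.quit()
```

Calling `main()` launches a real Chrome window; the two helpers can be reused with any driver object that offers `execute_script`.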

Solution:

Before starting chromedriver, put Chrome into what is often called "developer mode" by excluding the enable-automation switch:

option.add_experimental_option('excludeSwitches', ['enable-automation'])

Query window.navigator.webdriver again in the Console tab of the developer tools, and the value has now become undefined.
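
For context, the single option line above sits inside a setup roughly like the following. This is a sketch under assumptions, not the author's full script: `exclude_automation_switch` and `main` are hypothetical names, and `option` in the article corresponds to the `ChromeOptions` object created here.

```python
AUTOMATION_SWITCHES = ["enable-automation"]


def exclude_automation_switch(options):
    """Apply the excludeSwitches setting to a ChromeOptions-like object."""
    options.add_experimental_option("excludeSwitches", AUTOMATION_SWITCHES)
    return options


def main():
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver

    options = exclude_automation_switch(webdriver.ChromeOptions())
    driver = webdriver.Chrome(options=options)
    try:
        driver.get("about:blank")
        # None in Python corresponds to undefined in the browser console.
        print(driver.execute_script("return window.navigator.webdriver"))
    finally:
        driver.quit()
```

Printing the flag right after startup gives a quick way to confirm whether the option actually took effect for the installed Chrome and chromedriver versions.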

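Note: after enabling developer mode, always test whether the value really has become undefined. From ChromeDriver 79 onward the option may no longer work in headed mode and the value stays true, because the release notes below treat the old undefined result as a bug that has been fixed (issue 3133):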

---------- ChromeDriver 79.0.3945.36 (2019-11-18) ----------
Supports Chrome version 79
Resolved issue 2117: Chromedriver locks when an alert()(js) is raised while taking a screenshot [Pri-2]
Resolved issue 2435: Chrome driver reports platform and platformName as XP on Win10 machine [Pri-2]
Resolved issue 2487: “Element is not clickable” when using headless [Pri-]
Resolved issue 3005: WPT test in element_clear “test_not_editable_inputs[hidden]” does not pass [Pri-3]
Resolved issue 3073: Alerts coming from backend response cause ChromeDriver in W3C mode disconnect from browser – unable to interact with Chrome anymore – java.net.ConnectException: Failed to connect to localhost/0:0:0:0:0:0:0:1:38699 [Pri-2]
Resolved issue 3133: window.navigator.webdriver is undefined when “enable-automation” is excluded in non-headless mode (should be true) [Pri-2]
Resolved issue 3148: ChromeDriver always ignores certificate errors [Pri-2]
Resolved issue 3205: Chrome driver 78 moveToElement action sometimes moves to wrong y coordinate [Pri-1]

Headless mode

In headless mode, however, this setting has no effect, so here I use mitmproxy to inject JS code instead.

Install mitmproxy

https://github.com/luzhisheng/mklearn/tree/master/spider/crawler_learn

modify_response.py

import mitmproxy.http

# JS prepended to each script so that navigator.webdriver reads as false
t0 = 'Object.defineProperties(navigator,{webdriver:{get:() => false}});'


class Tb(object):
    def response(self, flow: mitmproxy.http.HTTPFlow):
        # Matches every URL containing '.js', which already covers um.js
        if '.js' in flow.request.url:
            flow.response.text = t0 + flow.response.text
            print('Injection succeeded')


addons = [
    Tb()
]
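
One possible refinement, offered as a sketch rather than as the author's method: choose responses by their Content-Type header instead of by URL, since a substring match on '.js' also hits URLs such as those ending in .json. The snippet is also wrapped in try/catch with `configurable: true`, on the assumption that several scripts on one page will each carry it and a repeated definition should not throw. The names `PATCH`, `is_javascript`, and `InjectWebdriverPatch` are hypothetical; `flow.response.headers` and `flow.response.text` are the mitmproxy attributes already relied on in modify_response.py.

```python
# JavaScript prepended to every script response. configurable:true and the
# try/catch are assumptions added so that repeated execution does not throw.
PATCH = (
    "try{Object.defineProperty(navigator,'webdriver',"
    "{get:()=>false,configurable:true});}catch(e){}\n"
)


def is_javascript(content_type):
    """Return True when a Content-Type header value denotes JavaScript."""
    return "javascript" in content_type.lower()


class InjectWebdriverPatch:
    """mitmproxy addon: prepend PATCH to JavaScript responses, once each."""

    def response(self, flow):
        content_type = flow.response.headers.get("content-type", "")
        if not is_javascript(content_type):
            return
        body = flow.response.text
        # Skip empty bodies and bodies that already start with the patch.
        if body is None or body.startswith(PATCH):
            return
        flow.response.text = PATCH + body


addons = [InjectWebdriverPatch()]
```

Saved as its own file, this would be loaded with `mitmdump -s` in the same way as modify_response.py.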

Start mitmdump

mitmdump -p 7777 -s modify_response.py

Configure the proxy in Selenium

self.chromeOptions.add_argument("--proxy-server=http://127.0.0.1:7777")

Run the script test.py
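
test.py itself is not reproduced in this post. A hypothetical minimal version, assuming mitmdump is already listening on port 7777 as started above, might look like the sketch below. `chrome_arguments` and `main` are illustrative names, and the `--ignore-certificate-errors` flag is an added assumption so that HTTPS pages load through the proxy without installing mitmproxy's CA certificate.

```python
PROXY = "http://127.0.0.1:7777"  # the mitmdump listener from the step above


def chrome_arguments(proxy=PROXY, headless=True):
    """Build the Chrome command-line arguments for a proxied session."""
    args = ["--proxy-server=" + proxy]
    if headless:
        args.append("--headless")
    # Assumption: accept certificates issued by mitmproxy's own CA rather than
    # installing that CA; only sensible for a throwaway test profile.
    args.append("--ignore-certificate-errors")
    return args


def main(url):
    # Imported here so chrome_arguments can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in chrome_arguments():
        options.add_argument(arg)
    driver = webdriver.Chrome(options=options)
    try:
        # url should be a page that loads at least one .js file, because the
        # injected snippet travels inside script responses.
        driver.get(url)
        print(driver.execute_script("return window.navigator.webdriver"))
    finally:
        driver.quit()
```

If the injection worked, the printed value should no longer be `True`; mitmdump's console output shows which responses were modified.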
