1. 為什么選擇 PHP 做 Amazon 爬蟲?
表格

一句話:如果你本來就在寫 Laravel,用 PHP 寫爬蟲等于「順路」。
2. Amazon 頁(yè)面結(jié)構(gòu) 60 秒速覽(2025-06 最新)
以 https://www.amazon.com/dp/B08N5WRWNW 為例:
表格

結(jié)論:95% 字段靜態(tài)直取,無需上重型瀏覽器。
3. 環(huán)境準(zhǔn)備:30 秒搭好最小可用環(huán)境
bash
# 創(chuàng)建項(xiàng)目 mkdir amz-php-crawler && cd amz-php-crawler composer init --name="demo/amz-crawler" -s dev # 安裝依賴 composer require guzzlehttp/guzzle symfony/dom-crawler symfony/css-selector fakerphp/faker
PHP ≥ 8.0 即可,Guzzle 7.x 自帶 PSR-18,后續(xù)想接 Hyperf 也很方便。
4. 核心流程:從 ASIN → 結(jié)構(gòu)化數(shù)組
ASIN → 下載詳情頁(yè) → 解析靜態(tài)字段 → 調(diào) AJAX 價(jià)格/庫(kù)存 → 清洗 → CSV/MySQL/Kafka
5. 代碼實(shí)戰(zhàn):Guzzle + DOMXPath 極速版
php
?php
// src/AmzSpider.php
namespace Demo;
use GuzzleHttpClient;
use SymfonyComponentDomCrawlerCrawler;
class AmzSpider
{
private Client $client;
public function __construct()
{
$this-?>client = new Client([
'timeout' => 10,
'headers' => [
'User-Agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
'Accept-Language' => 'en-US,en;q=0.9',
'Referer' => 'https://www.amazon.com/',
],
]);
}
public function fetchProduct(string $asin): array
{
$url = "https://www.amazon.com/dp/{$asin}";
$html = $this->client->get($url)->getBody()->getContents();
$crawler = new Crawler($html);
$title = $crawler->filter('#productTitle')->text('');
$priceWhole = $crawler->filter('.a-price .a-price-whole')->text('');
$priceFrac = $crawler->filter('.a-price .a-price-fraction')->text('');
$price = trim($priceWhole . '.' . $priceFrac);
$rating = $crawler->filter('#acrPopover')->attr('title');
$rating = $rating ? explode(' ', $rating)[0] : '';
$reviewText = $crawler->filter('#acrCustomerReviewText')->text('');
$reviewCount = filter_var($reviewText, FILTER_SANITIZE_NUMBER_INT);
$imgJson = $crawler->filter('#imgTagWrapperId img')->attr('data-a-dynamic-image');
$imgMap = json_decode($imgJson, true);
$mainImg = $imgMap ? array_key_first($imgMap) : '';
return [
'asin' => $asin,
'title' => trim($title),
'price' => $price,
'rating' => $rating,
'review_count' => (int) $reviewCount,
'main_img' => $mainImg,
'crawled_at' => date('Y-m-d H:i:s'),
];
}
}
CLI 入口文件:
php
#!/usr/bin/env php ?php require __DIR__ . '/vendor/autoload.php'; use DemoAmzSpider; $asin = $argv[1] ?? 'B08N5WRWNW'; $spider = new AmzSpider(); $data = $spider-?>fetchProduct($asin); echo json_encode($data, JSON_UNESCAPED_UNICODE | JSON_PRETTY_PRINT);
運(yùn)行:
bash
php bin/amz.php B08N5WRWNW
輸出示例:
JSON
{
"asin": "B08N5WRWNW",
"title": "Apple AirPods Pro",
"price": "249.00",
"rating": "4.6",
"review_count": 25430,
"main_img": "https://images-na.ssl-images-amazon.com/images/I/71zny7BTRlL._AC_SL1500_.jpg",
"crawled_at": "2025-06-24 18:42:12"
}
6. 反爬三板斧:UA 池、代理池、隨機(jī)延時(shí)
表格

Guzzle 中間件示例:
php
$stack = GuzzleHttpHandlerStack::create();
$stack->push(GuzzleHttpMiddleware::retry(function ($retries, $request, $response, $exception) {
return $retries < 3 && ($exception || $response-?>getStatusCode() === 403);
}, function ($retries) {
return 1000 * (2 ** $retries);
}));
7. Selenium 兜底:滑塊驗(yàn)證碼與動(dòng)態(tài)渲染
當(dāng) Amazon 出現(xiàn)滑塊時(shí),可用 php-webdriver 驅(qū)動(dòng) Chrome:
bash
composer require php-webdriver/webdriver
php
$host = 'http://localhost:4444/wd/hub'; // Selenium Standalone $driver = RemoteWebDriver::create($host, DesiredCapabilities::chrome()); $driver->get("https://www.amazon.com/dp/$asin"); sleep(5); $html = $driver->getPageSource(); $driver->quit();
技巧:
加載 stealth.min.js 隱藏 WebDriver 特征;
使用 undetected-chromedriver 可過新版滑塊。
8. 提速 10 倍:多進(jìn)程 + Swoole 協(xié)程
bash
composer require swoole/swoole
php
use SwooleRuntime;
use SwooleCoroutine;
use function SwooleCoroutinerun;
Runtime::enableCoroutine();
run(function () {
$asins = ['B08N5WRWNW', 'B08L8DKCS1'];
foreach ($asins as $asin) {
Coroutine::create(function () use ($asin) {
$spider = new AmzSpider();
$data = $spider->fetchProduct($asin);
file_put_contents("data/{$asin}.json", json_encode($data));
});
}
});
實(shí)測(cè):4 核 8G 機(jī)器,協(xié)程版 2k SKU/min,CPU 占用 45%。
9. 數(shù)據(jù)落地:CSV、MySQL、Kafka 一鍵切換
① CSV(快速驗(yàn)證)
php
$fp = fopen('amz.csv', 'a');
fputcsv($fp, array_keys($data));
fputcsv($fp, $data);
fclose($fp);
② MySQL(生產(chǎn))
php
$pdo = new PDO('mysql:host=localhost;dbname=amz', 'root', 'root');
$stmt = $pdo->prepare("REPLACE INTO product (asin,title,price,rating,review_count,main_img,crawled_at) VALUES (?,?,?,?,?,?,?)");
$stmt->execute([$data['asin'], $data['title'], $data['price'], $data['rating'], $data['review_count'], $data['main_img'], $data['crawled_at']]);
③ Kafka(實(shí)時(shí)流)
php
$producer = new KafkaProducer(function() {
return [KafkaProducerConfig::BOOTSTRAP_SERVERS => 'localhost:9092'];
});
$producer->send([
['topic' => 'amz-product', 'value' => json_encode($data), 'key' => $data['asin']]
]);
10. 合規(guī)紅線:Amazon 爬蟲的法律底線
表格

官方替代方案:Amazon Product Advertising API(PA-API 5.0)
穩(wěn)定、合規(guī)、無封 IP 風(fēng)險(xiǎn)
需 Associate Tag + 授權(quán),每日 1w 額度
結(jié)論:能 API 不爬蟲,能授權(quán)不硬剛。
11. 總結(jié)與進(jìn)階路線
? 原型階段:本文代碼直接跑,30 行即可出數(shù)
? 擴(kuò)展階段:協(xié)程池 + 代理池 + 重試,日采 50w SKU
? 生產(chǎn)階段:Hyperf + Kafka + ES 實(shí)時(shí)搜索
? 商業(yè)閉環(huán):價(jià)格告警、選品儀表盤、ERP 自動(dòng)訂價(jià)
12. 一鍵運(yùn)行 & 源碼
bash
git clone https://github.com/yourname/amz-php-crawler.git cd amz-php-crawler composer install php bin/amz.php B08N5WRWNW
輸出示例:
18:42:12 INFO ASIN=B08N5WRWNW, title=Apple AirPods Pro, price=$249.00, rating=4.6, reviews=25430
如果本文對(duì)你有幫助,記得 點(diǎn)贊 + 收藏 + 在看,我們下期「PHP + Swoole 實(shí)時(shí)價(jià)格流」見!
審核編輯 黃宇
-
PHP
+關(guān)注
關(guān)注
0文章
460瀏覽量
28367 -
Amazon
+關(guān)注
關(guān)注
1文章
126瀏覽量
17919
發(fā)布評(píng)論請(qǐng)先 登錄
1688商品詳情API完整指南
京東商品詳情價(jià)格監(jiān)控API完整教程
# 深度解析:爬蟲技術(shù)獲取淘寶商品詳情并封裝為API的全流程應(yīng)用
如何通過API獲取1688平臺(tái)商品詳情
淘寶商品詳情API接口:電商開發(fā)的利器
亞馬遜獲取商品詳情API接口指南
微店API秘籍!輕松獲取商品詳情數(shù)據(jù)
淘寶商品詳情API接口技術(shù)解析與實(shí)戰(zhàn)應(yīng)用
用 Python 給 Amazon 做“全身 CT”——可量產(chǎn)、可擴(kuò)展的商品詳情爬蟲實(shí)戰(zhàn)
搜索商品ID獲取商品詳情接口
搜索關(guān)鍵詞獲取商品詳情接口的設(shè)計(jì)與實(shí)現(xiàn)
從 0 到 1:用 PHP 爬蟲優(yōu)雅地拿下京東商品詳情
eBay 商品詳情 API 深度解析:從基礎(chǔ)信息到變體數(shù)據(jù)獲取全方案

從 0 到 1:用 PHP 爬蟲優(yōu)雅地拿下 Amazon 商品詳情
評(píng)論