This guide is written for programming learners and web-scraping beginners. It explains in detail how to design and write a general-purpose Taobao/Tmall product monitoring script. By taking the script apart module by module, you will learn:
The full script source is at the bottom of this article; the body walks through it module by module.
We are going to build a command-line tool that can:
```
┌────────────────────────┐
│   Argument parsing     │  (argparse)
└───────────┬────────────┘
            ▼
┌────────────────────────┐
│   Cookie loading       │
└───────────┬────────────┘
            ▼
┌────────────────────────┐
│   Network request      │  (requests)
└───────────┬────────────┘
            ▼
┌────────────────────────┐
│   HTML parsing         │  (regex + json)
└───────────┬────────────┘
            ▼
┌────────────────────────┐
│   Data extraction      │  (safe_get, price computation)
└───────────┬────────────┘
            ▼
┌────────────────────────┐
│   CSV storage          │
└────────────────────────┘
```
We use the Python standard library argparse to provide the command-line interface:
```python
parser = argparse.ArgumentParser(description="Unified product monitoring script")
parser.add_argument("--url", "-u", required=True, help="Product URL")
parser.add_argument("--cookie", "-c", default="cookies.txt", help="Cookie file")
parser.add_argument("--output", "-o", default="item_full_data.csv", help="Output CSV")
parser.add_argument("--interval", "-t", type=int, default=0, help="Monitoring interval (seconds)")
```
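You can exercise this parser without touching the real command line, because parse_args accepts an explicit argv list. A minimal sketch (the URL here is a made-up example):

```python
import argparse

# A trimmed-down copy of the script's parser (help texts shortened for the demo)
parser = argparse.ArgumentParser(description="Unified product monitoring script")
parser.add_argument("--url", "-u", required=True, help="Product URL")
parser.add_argument("--interval", "-t", type=int, default=0, help="Interval in seconds")

# Passing an argv list lets you test the CLI in code
args = parser.parse_args(["-u", "https://item.taobao.com/item.htm?id=123", "-t", "60"])
```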
Teaching points:
- required=True makes the argument mandatory.
- type=int converts the value to the right type automatically.

Much of Taobao/Tmall's data (member prices, some SKU information) is only available after login, so the script reads a Cookie string from a cookies.txt file.
```python
def load_cookie_from_file(file_path):
    with open(file_path, 'r', encoding='utf-8') as f:
        cookie_str = f.read().strip()
    cookie_dict = {}
    for item in cookie_str.split(';'):
        if '=' not in item:
            continue
        name, value = item.strip().split('=', 1)
        cookie_dict[name] = value
    return cookie_dict
```
Design highlights:
- The Cookie file holds the raw string copied from the browser (format: name1=value1; name2=value2).
- The resulting dict is passed straight to requests.get(cookies=cookie_dict).
- Splitting on ';' first and then on the first '=' keeps values that themselves contain '=' intact.
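The parsing loop can be tried standalone; note how split('=', 1) keeps '=' characters inside a value intact:

```python
cookie_str = "name1=value1; name2=value2; token=abc=xyz"

cookie_dict = {}
for item in cookie_str.split(';'):
    item = item.strip()
    if '=' not in item:
        continue
    # Split on the FIRST '=' only, so values containing '=' survive
    name, value = item.split('=', 1)
    cookie_dict[name] = value
```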
Request headers differ slightly between the two platforms (Tmall, for example, needs a referer). The script picks them automatically by parsing the URL's domain:
```python
def parse_url_and_params(url):
    parsed = urlparse(url)
    base_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
    query_params = parse_qs(parsed.query)
    params = {k: v[0] for k, v in query_params.items()}
    if "taobao.com" in parsed.netloc:
        platform = "taobao"
    elif "tmall.com" in parsed.netloc:
        platform = "tmall"
    else:
        platform = "unknown"
    return platform, base_url, params
```
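Running urlparse and parse_qs by hand on a sample link (the item id here is made up) shows what each piece contains:

```python
from urllib.parse import urlparse, parse_qs

url = "https://detail.tmall.com/item.htm?id=1234567&spm=a21n57.1"
parsed = urlparse(url)

# Rebuild the URL without the query string, same as the script does
base_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
# parse_qs returns lists; take the first value of each parameter
params = {k: v[0] for k, v in parse_qs(parsed.query).items()}
```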
Teaching points:
- The urllib.parse module decomposes a URL into its parts.
- The original query parameters (id, spm, and so on) are preserved, because they can affect how the page renders.
- Headers are selected from the per-platform PLATFORM_HEADERS dictionary.

The request itself goes through requests.get, with a timeout and exception handling:
```python
def fetch_page(url, params, headers, cookies):
    resp = requests.get(url, headers=headers, cookies=cookies, params=params, timeout=15)
    resp.raise_for_status()  # raise an exception for non-200 status codes
    return resp
```
Teaching points:
- timeout prevents the request from blocking indefinitely.
- raise_for_status() simplifies error handling.

Taobao/Tmall product data is not written directly into HTML tags; it is embedded in a <script> tag as a global variable, window.__ICE_APP_CONTEXT__. We extract it with a regular expression and parse it as JSON.
```python
def extract_ice_context(html):
    patterns = [
        r'window\.__ICE_APP_CONTEXT__\s*=\s*(\{[\s\S]*?\});',
        r'var\s+b\s*=\s*(\{[\s\S]*?\});'  # fallback pattern
    ]
    for pattern in patterns:
        match = re.search(pattern, html)
        if match:
            json_str = match.group(1).rstrip(';')
            try:
                return json.loads(json_str)
            except json.JSONDecodeError:
                continue
    return None
```
Design notes:
- [\s\S]*? matches any character (including newlines) in non-greedy mode, so the match stops at the first closing brace followed by a semicolon.
- A fallback pattern guards against changes in the page structure.
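The non-greedy quantifier is what keeps the pattern from overshooting when several objects appear on the page; a small demonstration:

```python
import re

html = 'window.__ICE_APP_CONTEXT__ = {"a": 1}; var other = {"b": 2};'

# Greedy: runs to the LAST '};' and swallows both objects
greedy = re.search(r'__ICE_APP_CONTEXT__\s*=\s*(\{[\s\S]*\});', html).group(1)
# Non-greedy: stops at the FIRST '};', capturing only the target object
lazy = re.search(r'__ICE_APP_CONTEXT__\s*=\s*(\{[\s\S]*?\});', html).group(1)
```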
safe_get: because the JSON is deeply nested and fields may be missing, we write a safe accessor function:
```python
def safe_get(data, *keys, default=''):
    temp = data
    for key in keys:
        if isinstance(temp, dict):
            temp = temp.get(key)
            if temp is None:
                return default
        else:
            return default
    return temp if temp is not None else default
```
Usage example:

```python
shop_name = safe_get(res, 'seller', 'shopName')
title = safe_get(res, 'item', 'title')
```
Teaching points:
- *keys accepts any number of nested key names.
- It replaces chains of try...except or repeated get() calls.

SKU information lives in res['skuCore']['sku2info'], an object keyed by SKU ID. Each SKU may carry several price fields (post-coupon price, original price, promotion price, and so on). We iterate over every SKU, extract a valid price, and take the minimum.
```python
def parse_sku_min_price(sku2info):
    real_skus = {k: v for k, v in sku2info.items() if k != '0'}  # drop the default SKU
    min_price = None
    for sku_data in real_skus.values():
        price_value = None
        # Priority 1: post-coupon price
        sub_price = sku_data.get('subPrice', {})
        if sub_price:
            price_text = sub_price.get('priceText', '')
            if price_text:
                price_value = _extract_price_from_text(price_text)
        # Priority 2: original price
        if price_value is None:
            price_info = sku_data.get('price', {})
            if price_info:
                price_text = price_info.get('priceText', '')
                if price_text:
                    price_value = _extract_price_from_text(price_text)
        # Priority 3: the raw price field
        if price_value is None:
            direct = sku_data.get('price')
            if direct is not None:
                price_value = _extract_price_from_text(str(direct))
        # Keep the smallest valid price seen so far
        if price_value is not None and price_value > 0:
            if min_price is None or price_value < min_price:
                min_price = price_value
    return min_price if min_price is not None else 0
```
The helper _extract_price_from_text cleans the price string with a regex (e.g. ¥99.00 → 99.00):
```python
def _extract_price_from_text(price_str):
    if not price_str:
        return None
    cleaned = re.sub(r'[^0-9.]', '', str(price_str))
    return float(cleaned) if cleaned else None
```
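A few sample inputs for the cleaning logic (re-defined here as extract_price so the snippet is self-contained; the comma-formatted price is an assumed input the regex happens to handle):

```python
import re

def extract_price(price_str):
    # Same cleaning logic as _extract_price_from_text above
    if not price_str:
        return None
    cleaned = re.sub(r'[^0-9.]', '', str(price_str))
    return float(cleaned) if cleaned else None
```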
Design highlights:
- The default SKU with sku_id = '0' is excluded, because it usually does not represent a real selectable variant.

When SKU information is missing or fails to parse (for example, the product has no SKUs, or the page structure changed), we fall back to priceVO.price.priceText, the price shown at the top right of the page.
```python
rightBarPriceText = safe_get(res, 'componentsVO', 'priceVO', 'price', 'priceText', default='')
if min_price == 0 and rightBarPriceText:
    price_match = re.search(r'(\d+(?:\.\d+)?)', rightBarPriceText)
    if price_match:
        min_price = float(price_match.group(1))
```
Teaching points:
- A fallback price source keeps the script producing useful output even when the primary SKU parsing breaks.

Product parameters live in componentsVO.extensionInfoVO.infos; entries with type 'BASE_PROPS' hold the parameter list. Guarantee information likewise comes in two flavors, 'GUARANTEE' and 'GUARANTEE_NEW'.
```python
def extract_extension_info(infos):
    result = {'guarantee': [], 'guarantee_new': [], 'params': {}}
    for item in infos:
        if item.get('type') == 'GUARANTEE':
            for sub in item.get('items', []):
                result['guarantee'].extend(sub.get('text', []))
        elif item.get('type') == 'GUARANTEE_NEW':
            for sub in item.get('items', []):
                result['guarantee_new'].append({
                    'title': sub.get('title'),
                    'description': sub.get('text', [''])[0]
                })
        elif item.get('type') == 'BASE_PROPS':
            for sub in item.get('items', []):
                name = sub.get('title')
                values = sub.get('text', [])
                if name:
                    result['params'][name] = values[0] if len(values) == 1 else values
    return result
```
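Feeding a hand-built infos list (the field names mirror the structure described above; the sample values are invented) shows the single-value vs multi-value behavior for parameters:

```python
# Hypothetical sample mimicking componentsVO.extensionInfoVO.infos
infos = [
    {'type': 'BASE_PROPS', 'items': [
        {'title': 'Brand', 'text': ['DemoBrand']},
        {'title': 'Color', 'text': ['Red', 'Blue']},
    ]}
]

params = {}
for item in infos:
    if item.get('type') == 'BASE_PROPS':
        for sub in item.get('items', []):
            name = sub.get('title')
            values = sub.get('text', [])
            if name:
                # A single value becomes a scalar, several values stay a list
                params[name] = values[0] if len(values) == 1 else values
```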
Teaching points:
- A single dispatch on the type field handles all three categories, so a new info type is easy to add.

We use csv.DictWriter to write records to the file, serializing complex fields (image lists, parameter objects) to JSON strings.
```python
def append_full_record_to_csv(record, csv_file):
    fieldnames = ['timestamp', 'item_id', 'platform', ...]  # 18 fields in total
    file_exists = os.path.isfile(csv_file)
    with open(csv_file, 'a', newline='', encoding='utf-8-sig') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if not file_exists:
            writer.writeheader()
        # Serialize lists and dicts to JSON strings
        record['images'] = json.dumps(record.get('images', []), ensure_ascii=False)
        record['guarantee'] = json.dumps(record.get('guarantee', []), ensure_ascii=False)
        record['guarantee_new'] = json.dumps(record.get('guarantee_new', []), ensure_ascii=False)
        record['params'] = json.dumps(record.get('params', {}), ensure_ascii=False)
        writer.writerow(record)
```
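The serialization step can be tried with an in-memory buffer standing in for the CSV file:

```python
import csv
import io
import json

record = {'title': 'Sample item', 'images': ['a.jpg', 'b.jpg']}
# Lists must become JSON strings before DictWriter sees them,
# otherwise the CSV cell would hold Python's repr of the list
record['images'] = json.dumps(record['images'], ensure_ascii=False)

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=['title', 'images'])
writer.writeheader()
writer.writerow(record)
```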
Teaching points:
- The utf-8-sig encoding lets Excel display Chinese text correctly.
- os.path.isfile detects the first write, so the header row is added automatically.

In main, the interval argument decides between a single run and a monitoring loop.
```python
def main():
    args = parser.parse_args()
    cookies = load_cookie_from_file(args.cookie)
    while True:
        success = monitor_once(args.url, cookies, args.output)
        if not success:
            print("[ERROR] Monitoring run failed")
        if args.interval <= 0:
            break
        print(f"[INFO] Waiting {args.interval} seconds before the next run...")
        time.sleep(args.interval)
```
Teaching points:
- Network errors are caught as requests.RequestException; the function prints the error and returns False, leaving the retry decision to the caller.
- If __ICE_APP_CONTEXT__ fails to parse, the second pattern is tried; if that also fails, the function returns None and the run aborts.
- Every nested lookup goes through safe_get with a default value, so a missing key never crashes the script with a KeyError.

Building on this script, features you could add next:
This script is a complete, usable teaching example of a scraper, covering:
By working through it, you will learn how to design a scraper for dynamic pages from scratch, and how to write robust, maintainable code.
Next exercise: modify the script to compare each price with the previous record and report the change, or turn it into a Flask API service.
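A possible starting point for the first exercise — a sketch only, not the author's implementation; the helper names last_recorded_price and describe_change are made up here, and the demo CSV uses only a subset of the script's columns:

```python
import csv
import os
import tempfile

def last_recorded_price(csv_file, item_id):
    """Return the most recent min_price recorded for item_id, or None (hypothetical helper)."""
    if not os.path.isfile(csv_file):
        return None
    last = None
    with open(csv_file, 'r', encoding='utf-8-sig') as f:
        for row in csv.DictReader(f):
            if row.get('item_id') == str(item_id):
                last = row  # later rows overwrite earlier ones
    return float(last['min_price']) if last else None

def describe_change(previous, current):
    """Format the difference between two prices (hypothetical helper)."""
    if previous is None:
        return "first record"
    diff = current - previous
    if diff == 0:
        return "no change"
    return f"{'up' if diff > 0 else 'down'} {abs(diff):.2f}"

# Build a tiny demo CSV with the same column names the script writes
tmp = tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False,
                                  newline='', encoding='utf-8-sig')
writer = csv.DictWriter(tmp, fieldnames=['timestamp', 'item_id', 'min_price'])
writer.writeheader()
writer.writerow({'timestamp': '2024-01-01 00:00:00', 'item_id': '123', 'min_price': '99.0'})
writer.writerow({'timestamp': '2024-01-02 00:00:00', 'item_id': '123', 'min_price': '89.5'})
tmp.close()

previous = last_recorded_price(tmp.name, '123')
os.unlink(tmp.name)
```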
Appendix: the full script source. Run it alongside debugging tools (such as the browser developer tools) and inspect each step's output to deepen your understanding.
```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Unified product monitoring script (supports Taobao / Tmall)
Parses request parameters from the product URL automatically and scrapes the
full data set (basic info, minimum SKU price, parameters, guarantees).
Supports a single run or a monitoring loop.
"""
import re
import json
import os
import time
import csv
import argparse
import requests
from urllib.parse import urlparse, parse_qs
from datetime import datetime

# ======================== Configuration ========================
COOKIE_FILE = "cookies.txt"              # Cookie file path
MONITOR_INTERVAL = 3600                  # Default monitoring interval (seconds)
DEFAULT_CSV_FILE = "item_full_data.csv"  # Default output CSV file name

# Per-platform request headers (only the necessary headers, no product parameters)
PLATFORM_HEADERS = {
    "taobao": {
        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
        "accept-language": "zh-CN,zh;q=0.9",
        "cache-control": "no-cache",
        "pragma": "no-cache",
        "priority": "u=0, i",
        "sec-ch-ua": "\"Google Chrome\";v=\"147\", \"Not.A/Brand\";v=\"8\", \"Chromium\";v=\"147\"",
        "sec-ch-ua-mobile": "?0",
        "sec-ch-ua-platform": "\"Windows\"",
        "sec-fetch-dest": "document",
        "sec-fetch-mode": "navigate",
        "sec-fetch-site": "same-origin",
        "sec-fetch-user": "?1",
        "upgrade-insecure-requests": "1",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/147.0.0.0 Safari/537.36"
    },
    "tmall": {
        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
        "accept-language": "zh-CN,zh;q=0.9",
        "cache-control": "no-cache",
        "pragma": "no-cache",
        "priority": "u=0, i",
        "sec-ch-ua": "\"Google Chrome\";v=\"147\", \"Not.A/Brand\";v=\"8\", \"Chromium\";v=\"147\"",
        "sec-ch-ua-mobile": "?0",
        "sec-ch-ua-platform": "\"Windows\"",
        "sec-fetch-dest": "document",
        "sec-fetch-mode": "navigate",
        "sec-fetch-site": "same-origin",
        "sec-fetch-user": "?1",
        "upgrade-insecure-requests": "1",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/147.0.0.0 Safari/537.36",
        "referer": "https://www.tmall.com/",
        "origin": "https://detail.tmall.com"
    }
}

# ======================== Utility functions ========================
def load_cookie_from_file(file_path):
    """Read a Cookie string from a text file and return it as a dict."""
    try:
        with open(file_path, 'r', encoding='utf-8') as f:
            cookie_str = f.read().strip()
        if not cookie_str:
            raise ValueError("Cookie file is empty")
        cookie_dict = {}
        for item in cookie_str.split(';'):
            item = item.strip()
            if not item or '=' not in item:
                continue
            name, value = item.split('=', 1)
            cookie_dict[name] = value
        print("[INFO] Cookie loaded successfully")
        return cookie_dict
    except FileNotFoundError:
        print(f"[ERROR] Cookie file not found: {file_path}")
        raise
    except Exception as e:
        print(f"[ERROR] Failed to load Cookie: {e}")
        raise


def parse_url_and_params(url):
    """
    Parse the platform, base URL, and request parameters from a product link.
    Returns (platform, base_url, params_dict).
    """
    parsed = urlparse(url)
    base_url = f"{parsed.scheme}://{parsed.netloc}{parsed.path}"
    query_params = parse_qs(parsed.query)
    params = {k: v[0] for k, v in query_params.items()}
    host = parsed.netloc.lower()
    if "taobao.com" in host:
        platform = "taobao"
    elif "tmall.com" in host:
        platform = "tmall"
    else:
        platform = "unknown"
    return platform, base_url, params


def fetch_page(url, params, headers, cookies):
    """Request the product page and return the response object."""
    try:
        resp = requests.get(url, headers=headers, cookies=cookies, params=params, timeout=15)
        resp.raise_for_status()
        return resp
    except requests.RequestException as e:
        print(f"[ERROR] Network request failed: {e}")
        raise


def extract_ice_context(html):
    """Extract the __ICE_APP_CONTEXT__ JSON object from the HTML."""
    patterns = [
        r'window\.__ICE_APP_CONTEXT__\s*=\s*(\{[\s\S]*?\});',
        r'var\s+b\s*=\s*(\{[\s\S]*?\});'
    ]
    for pattern in patterns:
        match = re.search(pattern, html)
        if match:
            json_str = match.group(1).rstrip(';')
            try:
                return json.loads(json_str)
            except json.JSONDecodeError:
                continue
    print("[ERROR] Neither __ICE_APP_CONTEXT__ nor var b was found")
    return None


def safe_get(data, *keys, default=''):
    """Safely fetch a value from nested dicts."""
    temp = data
    for key in keys:
        if isinstance(temp, dict):
            temp = temp.get(key)
            if temp is None:
                return default
        else:
            return default
    return temp if temp is not None else default


def parse_sku_min_price(sku2info):
    """
    Extract prices from every SKU in sku2info and return the minimum
    (and only the minimum).
    """
    real_skus = {k: v for k, v in sku2info.items() if k != '0'}
    min_price = None
    for sku_id, sku_data in real_skus.items():
        price_value = None
        # 1. Post-coupon price: subPrice.priceText
        sub_price = sku_data.get('subPrice', {})
        if sub_price:
            price_text = sub_price.get('priceText', '')
            if price_text:
                price_value = _extract_price_from_text(price_text)
        # 2. Original price: price.priceText
        if price_value is None:
            price_info = sku_data.get('price', {})
            if price_info:
                price_text = price_info.get('priceText', '')
                if price_text:
                    price_value = _extract_price_from_text(price_text)
        # 3. Raw price field
        if price_value is None:
            direct_price = sku_data.get('price')
            if direct_price is not None:
                price_value = _extract_price_from_text(str(direct_price))
        # 4. amount field
        if price_value is None:
            amount = sku_data.get('amount')
            if amount is not None:
                price_value = _extract_price_from_text(str(amount))
        # 5. promotionPrice field
        if price_value is None:
            promo = sku_data.get('promotionPrice')
            if promo is not None:
                price_value = _extract_price_from_text(str(promo))
        if price_value is not None and price_value > 0:
            if min_price is None or price_value < min_price:
                min_price = price_value
    if min_price is None:
        print("[WARN] No valid price could be extracted from any SKU")
        min_price = 0
    return min_price


def _extract_price_from_text(price_str):
    """Extract a float from a price string; None means failure."""
    if not price_str:
        return None
    cleaned = re.sub(r'[^0-9.]', '', str(price_str))
    if cleaned:
        try:
            return float(cleaned)
        except ValueError:
            return None
    return None


def extract_extension_info(infos):
    """Extract guarantees and parameters from componentsVO.extensionInfoVO.infos."""
    result = {
        'guarantee': [],
        'guarantee_new': [],
        'params': {}
    }
    for item in infos:
        item_type = item.get('type')
        if item_type == 'GUARANTEE':
            for sub in item.get('items', []):
                texts = sub.get('text', [])
                result['guarantee'].extend(texts)
        elif item_type == 'GUARANTEE_NEW':
            for sub in item.get('items', []):
                result['guarantee_new'].append({
                    'title': sub.get('title'),
                    'icon': sub.get('icon'),
                    'description': sub.get('text', [''])[0] if sub.get('text') else ''
                })
        elif item_type == 'BASE_PROPS':
            for sub in item.get('items', []):
                param_name = sub.get('title')
                param_values = sub.get('text', [])
                if param_name:
                    if len(param_values) == 1:
                        result['params'][param_name] = param_values[0]
                    else:
                        result['params'][param_name] = param_values
    return result


def append_full_record_to_csv(record, csv_file):
    """Append one complete record to the CSV file."""
    fieldnames = [
        'timestamp', 'item_id', 'platform', 'shop_name', 'title', 'spu_id', 'qr_code', 'images',
        'min_price', 'max_price', 'avg_price', 'total_quantity', 'in_stock_sku', 'out_of_stock_sku', 'total_sku',
        'guarantee', 'guarantee_new', 'params'
    ]
    file_exists = os.path.isfile(csv_file)
    with open(csv_file, 'a', newline='', encoding='utf-8-sig') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if not file_exists:
            writer.writeheader()
        record['images'] = json.dumps(record.get('images', []), ensure_ascii=False)
        record['guarantee'] = json.dumps(record.get('guarantee', []), ensure_ascii=False)
        record['guarantee_new'] = json.dumps(record.get('guarantee_new', []), ensure_ascii=False)
        record['params'] = json.dumps(record.get('params', {}), ensure_ascii=False)
        writer.writerow(record)
    print("[INFO] Run succeeded, data saved")


def monitor_once(url, cookies, csv_file):
    """One monitoring pass; parameters are parsed from the URL."""
    platform, base_url, params = parse_url_and_params(url)
    if platform == "unknown":
        print("[ERROR] Unrecognized platform; make sure the link is from taobao.com or tmall.com")
        return False
    headers = PLATFORM_HEADERS.get(platform, PLATFORM_HEADERS["taobao"])
    try:
        resp = fetch_page(base_url, params, headers, cookies)
    except Exception:
        return False
    data = extract_ice_context(resp.text)
    if not data:
        return False
    res = safe_get(data, 'loaderData', 'home', 'data', 'res', default={})
    if not res:
        print("[ERROR] Product data object 'res' not found")
        return False
    # Basic fields
    shopName = safe_get(res, 'seller', 'shopName')
    title = safe_get(res, 'item', 'title')
    itemId = safe_get(res, 'item', 'itemId')
    qrCode = safe_get(res, 'item', 'qrCode')
    spuId = safe_get(res, 'item', 'spuId')
    images = safe_get(res, 'item', 'images', default=[])
    # Fallback price (the price shown at the top right of the page)
    rightBarPriceText = safe_get(res, 'componentsVO', 'priceVO', 'price', 'priceText', default='')
    # Minimum SKU price
    sku2info = safe_get(res, 'skuCore', 'sku2info', default={})
    if not sku2info:
        print("[WARN] No SKU info found; the fallback price will be tried")
        min_price = 0
    else:
        min_price = parse_sku_min_price(sku2info)
    # Price fallback: if min_price == 0, try the fallback price
    use_fallback = False
    if min_price == 0 and rightBarPriceText:
        price_match = re.search(r'(\d+(?:\.\d+)?)', rightBarPriceText)
        if price_match:
            min_price = float(price_match.group(1))
            use_fallback = True
            print(f"[INFO] SKU price invalid, using fallback price: {min_price} yuan")
        else:
            print(f"[WARN] Could not parse fallback price text: {rightBarPriceText}")
    # Extension info
    extension_infos = safe_get(res, 'componentsVO', 'extensionInfoVO', 'infos', default=[])
    if extension_infos:
        extension = extract_extension_info(extension_infos)
    else:
        extension = {'guarantee': [], 'guarantee_new': [], 'params': {}}
    # Build the record (max_price and avg_price are set to 0; the quantity fields
    # are kept in the schema but no longer computed, so they are filled with 0)
    record = {
        'timestamp': datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
        'item_id': itemId,
        'platform': platform,
        'shop_name': shopName,
        'title': title,
        'spu_id': spuId,
        'qr_code': qrCode,
        'images': images,
        'min_price': min_price,
        'max_price': 0,
        'avg_price': 0,
        'total_quantity': 0,
        'in_stock_sku': 0,
        'out_of_stock_sku': 0,
        'total_sku': 0,
        'guarantee': extension.get('guarantee', []),
        'guarantee_new': extension.get('guarantee_new', []),
        'params': extension.get('params', {})
    }
    append_full_record_to_csv(record, csv_file)
    return True


def main():
    parser = argparse.ArgumentParser(description="Unified product monitoring script (parameters extracted from the product URL)")
    parser.add_argument("--url", "-u", required=True, help="Full product URL (Taobao or Tmall)")
    parser.add_argument("--cookie", "-c", default=COOKIE_FILE, help=f"Cookie file path, default {COOKIE_FILE}")
    parser.add_argument("--output", "-o", default=DEFAULT_CSV_FILE, help=f"Output CSV file path, default {DEFAULT_CSV_FILE}")
    parser.add_argument("--interval", "-t", type=int, default=0, help="Monitoring interval in seconds; 0 means run once (default 0)")
    args = parser.parse_args()
    # Load the Cookie
    try:
        cookies = load_cookie_from_file(args.cookie)
    except Exception:
        return
    # Single run or loop
    while True:
        success = monitor_once(args.url, cookies, args.output)
        if not success:
            print("[ERROR] Monitoring run failed")
        if args.interval <= 0:
            break
        print(f"[INFO] Waiting {args.interval} seconds before the next run...")
        time.sleep(args.interval)


if __name__ == "__main__":
    main()
```
Author: 苏皓明
Link:
Copyright: unless otherwise stated, all articles on this blog are released under the BY-NC-SA license. Please credit the source when reposting!