Python in Practice: Building a Steam Review Scraper and Analyzing the Data

Python in Practice: How Do You Build a Steam Review Scraper and Analyze Player Feedback?

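Steam is one of the world's largest digital game distribution platforms. Players can post reviews on a game's page to share their experience, and this review data is valuable to game developers, market analysts, and players themselves. Collecting and analyzing large volumes of reviews by hand is inefficient, however, so we can use Python to build a Steam review scraper that fetches and analyzes the data automatically.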

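This article shows how to build a Steam review scraper with Python libraries such as requests, BeautifulSoup, and selenium, and how to run a preliminary analysis on the collected data.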

Implementing the Steam Review Scraper

1 Preparation

Before starting, install the required Python libraries:

pip install requests beautifulsoup4 selenium pandas jieba

Because Steam's review pages may load content dynamically, we may need selenium to simulate browser behavior.

2 Analyzing the Structure of the Steam Review Page

The URL of a Steam review page usually has the following format:

https://steamcommunity.com/app/{APP_ID}/reviews/?p=1&browsefilter=toprated

APP_ID is the game's unique identifier (for example, the APP_ID of Dota 2 is 570).
p=1 is the page number of the reviews.
browsefilter sets the sort order (for example, toprated or mostrecent).
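
As a minimal sketch of how the three parts described above fit together, the helper below assembles the URL from its parameters. The function name build_review_url and the constant BASE are hypothetical names introduced here for illustration; only the URL pattern itself comes from the article.

```python
from urllib.parse import urlencode

# URL pattern from the article; BASE is a hypothetical name for this sketch
BASE = "https://steamcommunity.com/app/{app_id}/reviews/"

def build_review_url(app_id, page=1, browsefilter="toprated"):
    """Return the review-page URL for one game, page number, and sort order."""
    query = urlencode({"p": page, "browsefilter": browsefilter})
    return BASE.format(app_id=app_id) + "?" + query

# Second page of the most recent Dota 2 reviews
print(build_review_url(570, page=2, browsefilter="mostrecent"))
```

Building the query string with urlencode rather than string concatenation keeps the parameters correctly escaped if other values are added later.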

3 Fetching Review Data with requests and BeautifulSoup

We can first try fetching the HTML with requests and parsing it with BeautifulSoup:

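import requests
from bs4 import BeautifulSoup

def get_reviews(app_id, page=1):
    url = f"https://steamcommunity.com/app/{app_id}/reviews/?p={page}&browsefilter=toprated"
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop early on an HTTP error
    soup = BeautifulSoup(response.text, 'html.parser')
    reviews = []
    for review in soup.select('.apphub_Card'):
        username = review.select_one('.apphub_CardContentAuthorName').text.strip()
        title = review.select_one('.title').text
        # The label text depends on the page language, so both Chinese and
        # English labels are checked. The negative label must be ruled out:
        # '推荐' is a substring of '不推荐' (as 'Recommended' is of
        # 'Not Recommended'), so testing only the positive label would
        # mark every review as recommended.
        negative = '不推荐' in title or 'Not Recommended' in title
        positive = '推荐' in title or 'Recommended' in title
        rating = 'Recommended' if positive and not negative else 'Not Recommended'
        content = review.select_one('.apphub_CardTextContent').text.strip()
        reviews.append({
            'username': username,
            'rating': rating,
            'content': content
        })
    return reviews

# Example: fetch the first page of reviews for Dota 2
reviews = get_reviews(570, page=1)
for review in reviews:
    print(review)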
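
Since the p parameter selects a page, several pages can be collected in a loop. The sketch below is one possible way to do that; collect_reviews, max_pages, and delay are hypothetical names, and the assumption that an empty page marks the end of the list has not been verified against Steam. The page-fetching function is passed in as an argument, so the loop itself makes no network calls.

```python
import time

def collect_reviews(fetch_page, app_id, max_pages=3, delay=1.0):
    """Call fetch_page(app_id, page) for pages 1..max_pages and merge the results."""
    all_reviews = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(app_id, page)
        if not batch:
            # Assumption: an empty page means there is nothing further to fetch
            break
        all_reviews.extend(batch)
        if page < max_pages:
            time.sleep(delay)  # pause between requests to keep the load on the server low
    return all_reviews
```

With the function defined earlier, a call might look like collect_reviews(get_reviews, 570, max_pages=3).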

4 Handling Dynamically Loaded Reviews (with Selenium)

If the review data is loaded dynamically by JavaScript, we can use selenium to drive a real browser:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def get_reviews_selenium(app_id, page=1):
    driver = webdriver.Chrome()
    try:
        url = f"https://steamcommunity.com/app/{app_id}/reviews/?p={page}&browsefilter=toprated"
        driver.get(url)
        # Wait until at least one review card is present
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CLASS_NAME, 'apphub_Card'))
        )
        html = driver.page_source
    finally:
        driver.quit()  # always close the browser, even if the wait times out
    soup = BeautifulSoup(html, 'html.parser')
    reviews = []
    for review in soup.select('.apphub_Card'):
        username = review.select_one('.apphub_CardContentAuthorName').text.strip()
        title = review.select_one('.title').text
        # Rule out the negative label: '推荐' is a substring of '不推荐'
        # (as 'Recommended' is of 'Not Recommended')
        negative = '不推荐' in title or 'Not Recommended' in title
        positive = '推荐' in title or 'Recommended' in title
        rating = 'Recommended' if positive and not negative else 'Not Recommended'
        content = review.select_one('.apphub_CardTextContent').text.strip()
        reviews.append({
            'username': username,
            'rating': rating,
            'content': content
        })
    return reviews

# Example: fetch reviews with Selenium
reviews = get_reviews_selenium(570, page=1)
print(reviews)

Data Storage and Analysis

1 Storing the Review Data (CSV or Database)

We can use pandas to save the review data as a CSV file:

import pandas as pd

def save_to_csv(reviews, filename='steam_reviews.csv'):
    df = pd.DataFrame(reviews)
    # utf-8-sig writes a BOM so that Excel detects the encoding correctly
    df.to_csv(filename, index=False, encoding='utf-8-sig')

# Save the data
save_to_csv(reviews)
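
For the database option named in the heading, one possibility is SQLite through Python's built-in sqlite3 module. This is only a sketch under assumptions: the function name save_to_sqlite, the table name reviews, and the file name steam_reviews.db are hypothetical, and the columns simply mirror the three dictionary keys produced by the scraper.

```python
import sqlite3

def save_to_sqlite(reviews, db_path="steam_reviews.db"):
    """Insert review dicts into a 'reviews' table and return the total row count."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "CREATE TABLE IF NOT EXISTS reviews ("
                "username TEXT, rating TEXT, content TEXT)"
            )
            # Named placeholders are filled from the keys of each review dict
            conn.executemany(
                "INSERT INTO reviews (username, rating, content) "
                "VALUES (:username, :rating, :content)",
                reviews,
            )
        return conn.execute("SELECT COUNT(*) FROM reviews").fetchone()[0]
    finally:
        conn.close()
```

Unlike the CSV file, which is overwritten on each call, this version appends rows, so repeated runs accumulate reviews in the same table.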

2 Simple Data Analysis

We can compute the share of recommended reviews or run text analysis such as word-frequency counts:

from collections import Counter
import jieba  # Chinese word segmentation (for English reviews, nltk can be used instead)
import pandas as pd

# Count recommended vs. not recommended reviews
ratings = [review['rating'] for review in reviews]
rating_counts = pd.Series(ratings).value_counts()
print(rating_counts)

# Word-frequency analysis (Chinese example)
all_text = ' '.join([review['content'] for review in reviews])
words = jieba.lcut(all_text)
word_counts = Counter(words).most_common(20)
print(word_counts)
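
Raw token counts tend to be dominated by whitespace, single-character punctuation, and function words. The sketch below shows one way to turn the counts into a recommendation ratio and to filter the token list before counting. The names recommend_ratio, top_words, positive_label, and min_len are hypothetical, and the stop-word set is left for the caller to supply.

```python
from collections import Counter

def recommend_ratio(reviews, positive_label):
    """Share of reviews whose 'rating' equals positive_label (0.0 for an empty list)."""
    if not reviews:
        return 0.0
    positive = sum(1 for r in reviews if r["rating"] == positive_label)
    return positive / len(reviews)

def top_words(tokens, stopwords=frozenset(), min_len=2, n=20):
    """Count tokens after dropping whitespace, short tokens, and stop words."""
    cleaned = (t.strip() for t in tokens)
    # min_len=2 also removes single punctuation marks; longer runs such as '...' remain
    kept = [t for t in cleaned if len(t) >= min_len and t not in stopwords]
    return Counter(kept).most_common(n)
```

Here positive_label is whatever string the scraper stored for a positive review, and tokens can be the list returned by jieba.lcut.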