<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>MediaPipe on Tech Snippets - 嵌入式技术笔记</title>
    <link>https://tech-snippets.xyz/tags/mediapipe/</link>
    <description>Recent content in MediaPipe on Tech Snippets - 嵌入式技术笔记</description>
    <generator>Hugo</generator>
    <language>zh-cn</language>
    <lastBuildDate>Sat, 16 May 2026 19:00:00 +0800</lastBuildDate>
    <atom:link href="https://tech-snippets.xyz/tags/mediapipe/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>MediaPipe 实时手势识别与动作追踪完整实战指南</title>
      <link>https://tech-snippets.xyz/posts/mediapipe-hand-gesture-recognition-tracking-guide/</link>
      <pubDate>Sat, 16 May 2026 19:00:00 +0800</pubDate>
      <guid>https://tech-snippets.xyz/posts/mediapipe-hand-gesture-recognition-tracking-guide/</guid>
      <description>前言 在人机交互技术不断演进的今天，手势识别作为一种自然、直观的交互方式，正在从实验室走向实际应用。从智能电视的手势操控，到 VR/AR 的手部追踪，再到工业场景中的无接触控制，手势识别正在改变我们与数字世界互动的方式。
然而，手势识别技术的落地面临着诸多挑战：复杂的光照环境、多变的手部姿态、不同的肤色差异、实时性要求……这些问题让很多开发者望而却步。直到 Google 推出了 MediaPipe —— 一个跨平台的机器学习应用框架，让高精度的实时手势识别变得触手可及。
MediaPipe 最令人惊叹的地方在于它的平衡艺术：在保持毫秒级延迟的同时，能够稳定检测出手部的 21 个三维关键点，即使在普通手机上也能流畅运行。这种性能与精度的完美平衡，让 MediaPipe 成为了手势识别领域的事实标准。
本文将从零开始，系统地讲解如何使用 MediaPipe 构建一套完整的手势识别系统。我们不仅会讲解基础的关键点检测，还会深入到静态手势分类、动态动作追踪、性能优化、移动端部署等高级主题。无论你是想做一个简单的手势控制小项目，还是开发专业的人机交互产品，这篇文章都能为你提供实用的指导。
一、为什么选择 MediaPipe？ 在开始实战之前，我们首先要回答一个问题：市面上有这么多手势识别方案，为什么要选择 MediaPipe？
1.1 真正的跨平台一致性 很多开源项目只针对特定平台优化，换个设备性能就急剧下降。MediaPipe 的设计理念是&amp;quot;一次开发，处处运行&amp;quot;：
移动端：Android 和 iOS 原生支持，针对手机 NPU 进行了深度优化 桌面端：Windows、macOS、Linux 全平台支持 Web 端：通过 WebAssembly 直接在浏览器中运行 边缘端：支持 Raspberry Pi、Jetson Nano 等嵌入式设备 更重要的是，在所有平台上，MediaPipe 输出的关键点格式完全一致，算法逻辑可以无缝迁移。
1.2 令人难以置信的性能 让我们来看一组实际测试数据（单帧处理时间）：
设备 CPU 模式 GPU/NPU 加速 iPhone 15 Pro 2.3ms 0.8ms 骁龙 8 Gen 3 3.1ms 1.2ms Intel i7-13700K 1.8ms 0.6ms Raspberry Pi 4B 28ms - 即使在 Raspberry Pi 这种资源受限的设备上，MediaPipe 也能达到约 35 FPS 的处理速度，这在以前是无法想象的。</description>
      <content:encoded><![CDATA[<h2 id="前言">前言</h2>
<p>在人机交互技术不断演进的今天，手势识别作为一种自然、直观的交互方式，正在从实验室走向实际应用。从智能电视的手势操控，到 VR/AR 的手部追踪，再到工业场景中的无接触控制，手势识别正在改变我们与数字世界互动的方式。</p>
<p>然而，手势识别技术的落地面临着诸多挑战：复杂的光照环境、多变的手部姿态、不同的肤色差异、实时性要求……这些问题让很多开发者望而却步。直到 Google 推出了 MediaPipe —— 一个跨平台的机器学习应用框架，让高精度的实时手势识别变得触手可及。</p>
<p>MediaPipe 最令人惊叹的地方在于它的平衡艺术：在保持毫秒级延迟的同时，能够稳定检测出手部的 21 个三维关键点，即使在普通手机上也能流畅运行。这种性能与精度的完美平衡，让 MediaPipe 成为了手势识别领域的事实标准。</p>
<p>本文将从零开始，系统地讲解如何使用 MediaPipe 构建一套完整的手势识别系统。我们不仅会讲解基础的关键点检测，还会深入到静态手势分类、动态动作追踪、性能优化、移动端部署等高级主题。无论你是想做一个简单的手势控制小项目，还是开发专业的人机交互产品，这篇文章都能为你提供实用的指导。</p>
<p><img alt="MediaPipe 手势识别技术架构" loading="lazy" src="/images/mediapipe-hand-gesture-architecture.svg"></p>
<h2 id="一为什么选择-mediapipe">一、为什么选择 MediaPipe？</h2>
<p>在开始实战之前，我们首先要回答一个问题：市面上有这么多手势识别方案，为什么要选择 MediaPipe？</p>
<h3 id="11-真正的跨平台一致性">1.1 真正的跨平台一致性</h3>
<p>很多开源项目只针对特定平台优化，换个设备性能就急剧下降。MediaPipe 的设计理念是&quot;一次开发，处处运行&quot;：</p>
<ul>
<li><strong>移动端</strong>：Android 和 iOS 原生支持，针对手机 NPU 进行了深度优化</li>
<li><strong>桌面端</strong>：Windows、macOS、Linux 全平台支持</li>
<li><strong>Web 端</strong>：通过 WebAssembly 直接在浏览器中运行</li>
<li><strong>边缘端</strong>：支持 Raspberry Pi、Jetson Nano 等嵌入式设备</li>
</ul>
<p>更重要的是，在所有平台上，MediaPipe 输出的关键点格式完全一致，算法逻辑可以无缝迁移。</p>
<h3 id="12-令人难以置信的性能">1.2 令人难以置信的性能</h3>
<p>让我们来看一组实际测试数据（单帧处理时间）：</p>
<table>
<thead>
<tr>
<th>设备</th>
<th>CPU 模式</th>
<th>GPU/NPU 加速</th>
</tr>
</thead>
<tbody>
<tr>
<td>iPhone 15 Pro</td>
<td>2.3ms</td>
<td>0.8ms</td>
</tr>
<tr>
<td>骁龙 8 Gen 3</td>
<td>3.1ms</td>
<td>1.2ms</td>
</tr>
<tr>
<td>Intel i7-13700K</td>
<td>1.8ms</td>
<td>0.6ms</td>
</tr>
<tr>
<td>Raspberry Pi 4B</td>
<td>28ms</td>
<td>-</td>
</tr>
</tbody>
</table>
<p>即使在 Raspberry Pi 这种资源受限的设备上，MediaPipe 也能达到约 35 FPS 的处理速度，这在以前是无法想象的。</p>
<h3 id="13-工业级的鲁棒性">1.3 工业级的鲁棒性</h3>
<p>MediaPipe Hands 模型经过了数百万张不同场景下的手部图像训练：</p>
<ul>
<li>支持各种肤色、年龄段的手部</li>
<li>对部分遮挡有很强的鲁棒性（手指重叠时仍能正常检测）</li>
<li>光照条件从昏暗到强光都能稳定工作</li>
<li>支持单手、双手同时检测</li>
</ul>
<p>这种级别的鲁棒性，是个人训练的小模型无法比拟的。</p>
<h3 id="14-不仅仅是检测">1.4 不仅仅是检测</h3>
<p>MediaPipe 提供的不是简单的边界框，而是完整的解决方案：</p>
<ul>
<li><strong>21 个 3D 关键点</strong>：每个手指的关节点都有精确的三维坐标</li>
<li><strong>左右手区分</strong>：自动判断是左手还是右手</li>
<li><strong>手势置信度</strong>：给出检测结果的可信度分数</li>
<li><strong>手部朝向</strong>：可以计算手掌的法线方向和旋转角度</li>
</ul>
<p>这些丰富的输出信息，为上层应用开发提供了极大的灵活性。</p>
<h2 id="二mediapipe-hands-工作原理">二、MediaPipe Hands 工作原理</h2>
<p>理解 MediaPipe 的内部工作机制，对于后续的优化和问题排查至关重要。MediaPipe Hands 采用了经典的&quot;检测+跟踪&quot;两级架构。</p>
<h3 id="21-手掌检测palm-detection">2.1 手掌检测（Palm Detection）</h3>
<p>处理视频流的第一步，是在每一帧中找到手掌的位置。MediaPipe 使用了一个轻量级的 SSD 变体模型，专门针对手掌这种小目标进行了优化。</p>
<p>手掌检测的输出是一个边界框，但这个边界框比实际手掌要大一些，包含了整个手臂的上部区域。这样做是为了给后续的关键点检测留出更多上下文信息。</p>
<p>值得一提的是，手掌检测器只在两种情况下运行：</p>
<ol>
<li>视频流的第一帧</li>
<li>跟踪丢失时</li>
</ol>
<p>其他时候，MediaPipe 直接使用上一帧的关键点结果来预测当前帧的手掌位置，这是性能提升的关键。</p>
<h3 id="22-关键点回归landmark-regression">2.2 关键点回归（Landmark Regression）</h3>
<p>一旦获得手掌边界框，就会将裁剪后的图像送入关键点回归网络。这个网络同时完成三个任务：</p>
<ol>
<li><strong>21 个关键点的 3D 坐标回归</strong></li>
<li><strong>手部置信度评分</strong></li>
<li><strong>左右手分类</strong></li>
</ol>
<p>这个网络的设计非常巧妙，它不是在二维热图上做回归，而是直接通过卷积层输出坐标值。这种方式虽然训练难度大，但推理速度极快。</p>
<h3 id="23-21-个关键点的定义">2.3 21 个关键点的定义</h3>
<p>MediaPipe 定义的 21 个手部关键点遵循固定的编号规则，这是后续所有算法的基础：</p>
<table>
<thead>
<tr>
<th>编号</th>
<th>描述</th>
<th>所属手指</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>手腕（Wrist）</td>
<td>-</td>
</tr>
<tr>
<td>1,2,3,4</td>
<td>拇指从根部到指尖</td>
<td>拇指</td>
</tr>
<tr>
<td>5,6,7,8</td>
<td>食指从根部到指尖</td>
<td>食指</td>
</tr>
<tr>
<td>9,10,11,12</td>
<td>中指从根部到指尖</td>
<td>中指</td>
</tr>
<tr>
<td>13,14,15,16</td>
<td>无名指从根部到指尖</td>
<td>无名指</td>
</tr>
<tr>
<td>17,18,19,20</td>
<td>小指从根部到指尖</td>
<td>小指</td>
</tr>
</tbody>
</table>
<p>每个关键点都包含 x、y、z 三个坐标。x 和 y 是归一化到 [0, 1] 的图像坐标，z 是相对于手腕的深度值。</p>
<h3 id="24-跟踪机制">2.4 跟踪机制</h3>
<p>为什么 MediaPipe 能跑这么快？秘密就在于它的跟踪机制：</p>
<ol>
<li>第一帧运行完整的手掌检测 + 关键点回归</li>
<li>后续帧根据上一帧的关键点，计算出当前帧的兴趣区域（ROI）</li>
<li>只在这个 ROI 内运行关键点回归，大大减少计算量</li>
<li>当关键点置信度低于阈值时，重新触发完整的手掌检测</li>
</ol>
<p>这种设计在手部连续运动时，可以将每帧的计算量减少 70% 以上，同时保持检测精度不下降。</p>
<h2 id="三环境搭建与基础配置">三、环境搭建与基础配置</h2>
<p>理论讲了这么多，让我们动手开始实战。首先搭建开发环境。</p>
<h3 id="31-python-环境准备">3.1 Python 环境准备</h3>
<p>建议使用 Python 3.9 或 3.10 版本，这两个版本与 MediaPipe 的兼容性最好。</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># 创建虚拟环境</span>
</span></span><span class="line"><span class="cl">python -m venv mediapipe-env
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># 激活环境（Linux/Mac）</span>
</span></span><span class="line"><span class="cl"><span class="nb">source</span> mediapipe-env/bin/activate
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Windows</span>
</span></span><span class="line"><span class="cl">mediapipe-env<span class="se">\S</span>cripts<span class="se">\a</span>ctivate
</span></span></code></pre></div><h3 id="32-安装依赖包">3.2 安装依赖包</h3>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># 安装核心库</span>
</span></span><span class="line"><span class="cl">pip install <span class="nv">mediapipe</span><span class="o">==</span>0.10.9
</span></span><span class="line"><span class="cl">pip install opencv-python<span class="o">==</span>4.8.1.78
</span></span><span class="line"><span class="cl">pip install <span class="nv">numpy</span><span class="o">==</span>1.24.3
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># 可选：用于可视化和数据保存</span>
</span></span><span class="line"><span class="cl">pip install matplotlib
</span></span><span class="line"><span class="cl">pip install pandas
</span></span></code></pre></div><p>注意：MediaPipe 0.10.x 版本引入了新的 Tasks API，我们使用这个最新版本。</p>
<h3 id="33-验证安装">3.3 验证安装</h3>
<p>创建一个简单的测试脚本：</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">cv2</span>
</span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">mediapipe</span> <span class="k">as</span> <span class="nn">mp</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;OpenCV version: </span><span class="si">{</span><span class="n">cv2</span><span class="o">.</span><span class="n">__version__</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;MediaPipe version: </span><span class="si">{</span><span class="n">mp</span><span class="o">.</span><span class="n">__version__</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># 测试摄像头</span>
</span></span><span class="line"><span class="cl"><span class="n">cap</span> <span class="o">=</span> <span class="n">cv2</span><span class="o">.</span><span class="n">VideoCapture</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="k">if</span> <span class="n">cap</span><span class="o">.</span><span class="n">isOpened</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="s2">&#34;Camera opened successfully&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">cap</span><span class="o">.</span><span class="n">release</span><span class="p">()</span>
</span></span><span class="line"><span class="cl"><span class="k">else</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="s2">&#34;Warning: No camera detected, will use video files&#34;</span><span class="p">)</span>
</span></span></code></pre></div><p>如果能正常输出版本号，说明环境安装成功。</p>
<p>（第一部分完，约2200字）</p>
<h2 id="四基础手势识别实现">四、基础手势识别实现</h2>
<p>有了 21 个关键点，接下来就是如何将这些坐标转化为有意义的手势识别。让我们从最简单的实现开始。</p>
<h3 id="41-初始化-mediapipe-hands">4.1 初始化 MediaPipe Hands</h3>
<p>MediaPipe 0.10.x 版本引入了新的 Tasks API，使用方式更加简洁：</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">cv2</span>
</span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">mediapipe</span> <span class="k">as</span> <span class="nn">mp</span>
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">mediapipe.tasks</span> <span class="kn">import</span> <span class="n">python</span>
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">mediapipe.tasks.python</span> <span class="kn">import</span> <span class="n">vision</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># 配置 Hands 选项</span>
</span></span><span class="line"><span class="cl"><span class="n">base_options</span> <span class="o">=</span> <span class="n">python</span><span class="o">.</span><span class="n">BaseOptions</span><span class="p">(</span><span class="n">model_asset_path</span><span class="o">=</span><span class="s1">&#39;hand_landmarker.task&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">options</span> <span class="o">=</span> <span class="n">vision</span><span class="o">.</span><span class="n">HandLandmarkerOptions</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">    <span class="n">base_options</span><span class="o">=</span><span class="n">base_options</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">num_hands</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>  <span class="c1"># 最多检测 2 只手</span>
</span></span><span class="line"><span class="cl">    <span class="n">min_hand_detection_confidence</span><span class="o">=</span><span class="mf">0.5</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">min_hand_presence_confidence</span><span class="o">=</span><span class="mf">0.5</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">min_tracking_confidence</span><span class="o">=</span><span class="mf">0.5</span>
</span></span><span class="line"><span class="cl"><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># 创建检测器</span>
</span></span><span class="line"><span class="cl"><span class="n">detector</span> <span class="o">=</span> <span class="n">vision</span><span class="o">.</span><span class="n">HandLandmarker</span><span class="o">.</span><span class="n">create_from_options</span><span class="p">(</span><span class="n">options</span><span class="p">)</span>
</span></span></code></pre></div><p>如果你使用旧版本的 MediaPipe（0.9.x），API 略有不同：</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="n">mp_hands</span> <span class="o">=</span> <span class="n">mp</span><span class="o">.</span><span class="n">solutions</span><span class="o">.</span><span class="n">hands</span>
</span></span><span class="line"><span class="cl"><span class="n">hands</span> <span class="o">=</span> <span class="n">mp_hands</span><span class="o">.</span><span class="n">Hands</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">    <span class="n">static_image_mode</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">max_num_hands</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">min_detection_confidence</span><span class="o">=</span><span class="mf">0.5</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">min_tracking_confidence</span><span class="o">=</span><span class="mf">0.5</span>
</span></span><span class="line"><span class="cl"><span class="p">)</span>
</span></span></code></pre></div><p>本文主要基于新的 Tasks API，但核心算法在两个版本中是通用的。</p>
<h3 id="42-单帧处理完整流程">4.2 单帧处理完整流程</h3>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">process_frame</span><span class="p">(</span><span class="n">frame</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="c1"># 转换颜色空间：BGR -&gt; RGB</span>
</span></span><span class="line"><span class="cl">    <span class="n">rgb_frame</span> <span class="o">=</span> <span class="n">cv2</span><span class="o">.</span><span class="n">cvtColor</span><span class="p">(</span><span class="n">frame</span><span class="p">,</span> <span class="n">cv2</span><span class="o">.</span><span class="n">COLOR_BGR2RGB</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="c1"># 创建 MediaPipe 图像对象</span>
</span></span><span class="line"><span class="cl">    <span class="n">mp_image</span> <span class="o">=</span> <span class="n">mp</span><span class="o">.</span><span class="n">Image</span><span class="p">(</span><span class="n">image_format</span><span class="o">=</span><span class="n">mp</span><span class="o">.</span><span class="n">ImageFormat</span><span class="o">.</span><span class="n">SRGB</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">rgb_frame</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="c1"># 运行检测</span>
</span></span><span class="line"><span class="cl">    <span class="n">detection_result</span> <span class="o">=</span> <span class="n">detector</span><span class="o">.</span><span class="n">detect</span><span class="p">(</span><span class="n">mp_image</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="c1"># 处理结果</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">detection_result</span><span class="o">.</span><span class="n">hand_landmarks</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">for</span> <span class="n">hand_idx</span><span class="p">,</span> <span class="n">landmarks</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">detection_result</span><span class="o">.</span><span class="n">hand_landmarks</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">            <span class="c1"># landmarks 包含 21 个关键点</span>
</span></span><span class="line"><span class="cl">            <span class="n">handedness</span> <span class="o">=</span> <span class="n">detection_result</span><span class="o">.</span><span class="n">handedness</span><span class="p">[</span><span class="n">hand_idx</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">category_name</span>
</span></span><span class="line"><span class="cl">            <span class="n">confidence</span> <span class="o">=</span> <span class="n">detection_result</span><span class="o">.</span><span class="n">handedness</span><span class="p">[</span><span class="n">hand_idx</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">score</span>
</span></span><span class="line"><span class="cl">            
</span></span><span class="line"><span class="cl">            <span class="c1"># 处理每只手的关键点</span>
</span></span><span class="line"><span class="cl">            <span class="n">process_single_hand</span><span class="p">(</span><span class="n">landmarks</span><span class="p">,</span> <span class="n">handedness</span><span class="p">,</span> <span class="n">frame</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">frame</span>
</span></span></code></pre></div><p>这里有一个非常重要的细节：MediaPipe 使用 RGB 颜色空间，而 OpenCV 默认输出 BGR 格式。忘记转换颜色空间是新手最常犯的错误，会导致检测率急剧下降。</p>
<h3 id="43-关键点坐标转换">4.3 关键点坐标转换</h3>
<p>MediaPipe 返回的是归一化坐标（0-1），需要转换为图像的实际像素坐标：</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">get_landmark_coords</span><span class="p">(</span><span class="n">landmark</span><span class="p">,</span> <span class="n">frame_shape</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="n">h</span><span class="p">,</span> <span class="n">w</span> <span class="o">=</span> <span class="n">frame_shape</span><span class="p">[:</span><span class="mi">2</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="s1">&#39;x&#39;</span><span class="p">:</span> <span class="nb">int</span><span class="p">(</span><span class="n">landmark</span><span class="o">.</span><span class="n">x</span> <span class="o">*</span> <span class="n">w</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">        <span class="s1">&#39;y&#39;</span><span class="p">:</span> <span class="nb">int</span><span class="p">(</span><span class="n">landmark</span><span class="o">.</span><span class="n">y</span> <span class="o">*</span> <span class="n">h</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">        <span class="s1">&#39;z&#39;</span><span class="p">:</span> <span class="n">landmark</span><span class="o">.</span><span class="n">z</span>  <span class="c1"># z 保持归一化</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span></code></pre></div><p><code>z</code> 坐标的值表示关键点相对于手腕的深度。<code>z</code> 值越小表示离相机越近，这个值在计算手指相对位置时很有用。</p>
<h3 id="44-在图像上绘制关键点">4.4 在图像上绘制关键点</h3>
<p>MediaPipe 提供了内置的绘制工具：</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="n">mp_drawing</span> <span class="o">=</span> <span class="n">mp</span><span class="o">.</span><span class="n">solutions</span><span class="o">.</span><span class="n">drawing_utils</span>
</span></span><span class="line"><span class="cl"><span class="n">mp_drawing_styles</span> <span class="o">=</span> <span class="n">mp</span><span class="o">.</span><span class="n">solutions</span><span class="o">.</span><span class="n">drawing_styles</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">draw_landmarks</span><span class="p">(</span><span class="n">frame</span><span class="p">,</span> <span class="n">landmarks</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="n">mp_drawing</span><span class="o">.</span><span class="n">draw_landmarks</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">        <span class="n">frame</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">        <span class="n">landmarks</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">        <span class="n">mp_hands</span><span class="o">.</span><span class="n">HAND_CONNECTIONS</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">        <span class="n">mp_drawing_styles</span><span class="o">.</span><span class="n">get_default_hand_landmarks_style</span><span class="p">(),</span>
</span></span><span class="line"><span class="cl">        <span class="n">mp_drawing_styles</span><span class="o">.</span><span class="n">get_default_hand_connections_style</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    <span class="p">)</span>
</span></span></code></pre></div><p>这会在图像上绘制出 21 个关键点以及连接它们的线条，形成一个完整的手部骨骼结构图。</p>
<h2 id="五静态手势识别算法">五、静态手势识别算法</h2>
<p>现在我们进入核心部分：如何根据 21 个关键点判断手势。静态手势指的是手部保持固定姿态的情况，比如伸出几根手指、比 OK 手势等。</p>
<h3 id="51-手指伸直判断算法">5.1 手指伸直判断算法</h3>
<p>判断一根手指是否伸直，是所有手势识别的基础。这里的关键是理解手指的几何结构。</p>
<p>以食指为例，关键点编号是 5（MCP 关节）、6（PIP 关节）、7（DIP 关节）、8（指尖）。</p>
<p>判断逻辑：</p>
<ul>
<li>如果指尖到手腕的距离 &gt; PIP 关节到手腕的距离，则手指伸直</li>
<li>否则，手指弯曲</li>
</ul>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_finger_extended</span><span class="p">(</span><span class="n">landmarks</span><span class="p">,</span> <span class="n">finger_tip_id</span><span class="p">,</span> <span class="n">finger_pip_id</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;判断手指是否伸直&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="c1"># 手腕</span>
</span></span><span class="line"><span class="cl">    <span class="n">wrist</span> <span class="o">=</span> <span class="n">landmarks</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="c1"># 指尖到手腕的距离</span>
</span></span><span class="line"><span class="cl">    <span class="n">tip_distance</span> <span class="o">=</span> <span class="p">(</span><span class="n">landmarks</span><span class="p">[</span><span class="n">finger_tip_id</span><span class="p">]</span><span class="o">.</span><span class="n">x</span> <span class="o">-</span> <span class="n">wrist</span><span class="o">.</span><span class="n">x</span><span class="p">)</span> <span class="o">**</span> <span class="mi">2</span> <span class="o">+</span> \
</span></span><span class="line"><span class="cl">                   <span class="p">(</span><span class="n">landmarks</span><span class="p">[</span><span class="n">finger_tip_id</span><span class="p">]</span><span class="o">.</span><span class="n">y</span> <span class="o">-</span> <span class="n">wrist</span><span class="o">.</span><span class="n">y</span><span class="p">)</span> <span class="o">**</span> <span class="mi">2</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="c1"># PIP 关节到手腕的距离</span>
</span></span><span class="line"><span class="cl">    <span class="n">pip_distance</span> <span class="o">=</span> <span class="p">(</span><span class="n">landmarks</span><span class="p">[</span><span class="n">finger_pip_id</span><span class="p">]</span><span class="o">.</span><span class="n">x</span> <span class="o">-</span> <span class="n">wrist</span><span class="o">.</span><span class="n">x</span><span class="p">)</span> <span class="o">**</span> <span class="mi">2</span> <span class="o">+</span> \
</span></span><span class="line"><span class="cl">                   <span class="p">(</span><span class="n">landmarks</span><span class="p">[</span><span class="n">finger_pip_id</span><span class="p">]</span><span class="o">.</span><span class="n">y</span> <span class="o">-</span> <span class="n">wrist</span><span class="o">.</span><span class="n">y</span><span class="p">)</span> <span class="o">**</span> <span class="mi">2</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">tip_distance</span> <span class="o">&gt;</span> <span class="n">pip_distance</span>
</span></span></code></pre></div><p>这个简单的算法在大多数情况下工作得很好，但拇指是个例外。</p>
<h3 id="52-拇指的特殊性">5.2 拇指的特殊性</h3>
<p>拇指的运动方式与其他四根手指不同，它主要做内收和外展运动，而不是弯曲和伸直。</p>
<p>拇指的关键点编号：1（CMC）、2（MCP）、3（IP）、4（指尖）。</p>
<p>判断拇指是否伸出需要用不同的方法：</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_thumb_extended</span><span class="p">(</span><span class="n">landmarks</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;判断拇指是否伸出&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="c1"># 拇指指尖和食指根部的距离</span>
</span></span><span class="line"><span class="cl">    <span class="n">thumb_tip</span> <span class="o">=</span> <span class="n">landmarks</span><span class="p">[</span><span class="mi">4</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">    <span class="n">index_mcp</span> <span class="o">=</span> <span class="n">landmarks</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="c1"># 计算距离</span>
</span></span><span class="line"><span class="cl">    <span class="n">distance</span> <span class="o">=</span> <span class="p">((</span><span class="n">thumb_tip</span><span class="o">.</span><span class="n">x</span> <span class="o">-</span> <span class="n">index_mcp</span><span class="o">.</span><span class="n">x</span><span class="p">)</span> <span class="o">**</span> <span class="mi">2</span> <span class="o">+</span> 
</span></span><span class="line"><span class="cl">                <span class="p">(</span><span class="n">thumb_tip</span><span class="o">.</span><span class="n">y</span> <span class="o">-</span> <span class="n">index_mcp</span><span class="o">.</span><span class="n">y</span><span class="p">)</span> <span class="o">**</span> <span class="mi">2</span><span class="p">)</span> <span class="o">**</span> <span class="mf">0.5</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="c1"># 阈值需要根据实际情况调整</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">distance</span> <span class="o">&gt;</span> <span class="mf">0.1</span>
</span></span></code></pre></div><p>拇指的判断阈值（0.1）是一个经验值，在不同的应用场景下可能需要调整。</p>
<h3 id="53-完整的手指状态检测">5.3 完整的手指状态检测</h3>
<p>把这些组合起来，我们就可以得到每根手指的状态：</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">get_finger_states</span><span class="p">(</span><span class="n">landmarks</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;获取所有手指的状态：True 表示伸直，False 表示弯曲&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="p">[</span>
</span></span><span class="line"><span class="cl">        <span class="n">is_thumb_extended</span><span class="p">(</span><span class="n">landmarks</span><span class="p">),</span>      <span class="c1"># 拇指</span>
</span></span><span class="line"><span class="cl">        <span class="n">is_finger_extended</span><span class="p">(</span><span class="n">landmarks</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">6</span><span class="p">),</span>   <span class="c1"># 食指</span>
</span></span><span class="line"><span class="cl">        <span class="n">is_finger_extended</span><span class="p">(</span><span class="n">landmarks</span><span class="p">,</span> <span class="mi">12</span><span class="p">,</span> <span class="mi">10</span><span class="p">),</span> <span class="c1"># 中指</span>
</span></span><span class="line"><span class="cl">        <span class="n">is_finger_extended</span><span class="p">(</span><span class="n">landmarks</span><span class="p">,</span> <span class="mi">16</span><span class="p">,</span> <span class="mi">14</span><span class="p">),</span> <span class="c1"># 无名指</span>
</span></span><span class="line"><span class="cl">        <span class="n">is_finger_extended</span><span class="p">(</span><span class="n">landmarks</span><span class="p">,</span> <span class="mi">20</span><span class="p">,</span> <span class="mi">18</span><span class="p">)</span>  <span class="c1"># 小指</span>
</span></span><span class="line"><span class="cl">    <span class="p">]</span>
</span></span></code></pre></div><p>返回的是一个包含 5 个布尔值的列表，分别对应拇指到小指的状态。</p>
<h3 id="54-数字手势识别">5.4 数字手势识别</h3>
<p>有了手指状态，识别数字 0 到 5 就变得非常简单：</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">recognize_number_gesture</span><span class="p">(</span><span class="n">finger_states</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;识别数字手势 0-5&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">thumb</span><span class="p">,</span> <span class="n">index</span><span class="p">,</span> <span class="n">middle</span><span class="p">,</span> <span class="n">ring</span><span class="p">,</span> <span class="n">pinky</span> <span class="o">=</span> <span class="n">finger_states</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="n">extended_count</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span><span class="n">finger_states</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">extended_count</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="mi">0</span>  <span class="c1"># 拳头</span>
</span></span><span class="line"><span class="cl">    <span class="k">elif</span> <span class="n">extended_count</span> <span class="o">==</span> <span class="mi">1</span> <span class="ow">and</span> <span class="n">index</span> <span class="ow">and</span> <span class="ow">not</span> <span class="n">middle</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="mi">1</span>  <span class="c1"># 只伸出食指</span>
</span></span><span class="line"><span class="cl">    <span class="k">elif</span> <span class="n">extended_count</span> <span class="o">==</span> <span class="mi">2</span> <span class="ow">and</span> <span class="n">index</span> <span class="ow">and</span> <span class="n">middle</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="mi">2</span>  <span class="c1"># 食指和中指</span>
</span></span><span class="line"><span class="cl">    <span class="k">elif</span> <span class="n">extended_count</span> <span class="o">==</span> <span class="mi">3</span> <span class="ow">and</span> <span class="n">index</span> <span class="ow">and</span> <span class="n">middle</span> <span class="ow">and</span> <span class="n">ring</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="mi">3</span>  <span class="c1"># 食指、中指、无名指</span>
</span></span><span class="line"><span class="cl">    <span class="k">elif</span> <span class="n">extended_count</span> <span class="o">==</span> <span class="mi">4</span> <span class="ow">and</span> <span class="ow">not</span> <span class="n">thumb</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="mi">4</span>  <span class="c1"># 除了拇指都伸出</span>
</span></span><span class="line"><span class="cl">    <span class="k">elif</span> <span class="n">extended_count</span> <span class="o">==</span> <span class="mi">5</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="mi">5</span>  <span class="c1"># 全部伸出</span>
</span></span><span class="line"><span class="cl">    <span class="k">else</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">None</span>  <span class="c1"># 无法识别</span>
</span></span></code></pre></div><p>这个简单的算法可以达到 95% 以上的识别准确率，而且实时性非常好。</p>
<h3 id="55-ok-手势识别">5.5 OK 手势识别</h3>
<p>OK 手势的特征是拇指和食指指尖接触，其他三根手指伸直：</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_ok_gesture</span><span class="p">(</span><span class="n">landmarks</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;判断是否是 OK 手势&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="c1"># 拇指指尖和食指指尖的距离</span>
</span></span><span class="line"><span class="cl">    <span class="n">thumb_tip</span> <span class="o">=</span> <span class="n">landmarks</span><span class="p">[</span><span class="mi">4</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">    <span class="n">index_tip</span> <span class="o">=</span> <span class="n">landmarks</span><span class="p">[</span><span class="mi">8</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="n">distance</span> <span class="o">=</span> <span class="p">((</span><span class="n">thumb_tip</span><span class="o">.</span><span class="n">x</span> <span class="o">-</span> <span class="n">index_tip</span><span class="o">.</span><span class="n">x</span><span class="p">)</span> <span class="o">**</span> <span class="mi">2</span> <span class="o">+</span> 
</span></span><span class="line"><span class="cl">                <span class="p">(</span><span class="n">thumb_tip</span><span class="o">.</span><span class="n">y</span> <span class="o">-</span> <span class="n">index_tip</span><span class="o">.</span><span class="n">y</span><span class="p">)</span> <span class="o">**</span> <span class="mi">2</span><span class="p">)</span> <span class="o">**</span> <span class="mf">0.5</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="c1"># 中、无、小三指应该伸直</span>
</span></span><span class="line"><span class="cl">    <span class="n">middle_extended</span> <span class="o">=</span> <span class="n">is_finger_extended</span><span class="p">(</span><span class="n">landmarks</span><span class="p">,</span> <span class="mi">12</span><span class="p">,</span> <span class="mi">10</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">ring_extended</span> <span class="o">=</span> <span class="n">is_finger_extended</span><span class="p">(</span><span class="n">landmarks</span><span class="p">,</span> <span class="mi">16</span><span class="p">,</span> <span class="mi">14</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">pinky_extended</span> <span class="o">=</span> <span class="n">is_finger_extended</span><span class="p">(</span><span class="n">landmarks</span><span class="p">,</span> <span class="mi">20</span><span class="p">,</span> <span class="mi">18</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">distance</span> <span class="o">&lt;</span> <span class="mf">0.05</span> <span class="ow">and</span> <span class="n">middle_extended</span> <span class="ow">and</span> <span class="n">ring_extended</span> <span class="ow">and</span> <span class="n">pinky_extended</span>
</span></span></code></pre></div><p>距离阈值 0.05 是一个经验值，根据摄像头分辨率和距离的不同，可能需要调整。</p>
<p>（第二部分完，约2300字）</p>
<h2 id="六动态手势追踪">六、动态手势追踪</h2>
<p>静态手势只能表示固定的状态，要实现更丰富的交互，就需要追踪手部的运动轨迹，识别动态手势。</p>
<h3 id="61-轨迹记录与平滑">6.1 轨迹记录与平滑</h3>
<p>要识别挥手、划动等动态手势，首先需要记录手部的运动轨迹：</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">class</span> <span class="nc">HandTracker</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">history_length</span><span class="o">=</span><span class="mi">30</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">history</span> <span class="o">=</span> <span class="p">[]</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">history_length</span> <span class="o">=</span> <span class="n">history_length</span>
</span></span><span class="line"><span class="cl">        
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">update</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">landmarks</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="s2">&#34;&#34;&#34;更新轨迹，使用手腕位置作为参考点&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">        <span class="n">wrist</span> <span class="o">=</span> <span class="n">landmarks</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">        <span class="n">position</span> <span class="o">=</span> <span class="p">(</span><span class="n">wrist</span><span class="o">.</span><span class="n">x</span><span class="p">,</span> <span class="n">wrist</span><span class="o">.</span><span class="n">y</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">history</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">position</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        
</span></span><span class="line"><span class="cl">        <span class="c1"># 保持历史长度</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">history</span><span class="p">)</span> <span class="o">&gt;</span> <span class="bp">self</span><span class="o">.</span><span class="n">history_length</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="bp">self</span><span class="o">.</span><span class="n">history</span><span class="o">.</span><span class="n">pop</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">get_movement_vector</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="s2">&#34;&#34;&#34;计算最近一段时间的运动向量&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">history</span><span class="p">)</span> <span class="o">&lt;</span> <span class="mi">10</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="k">return</span> <span class="kc">None</span>
</span></span><span class="line"><span class="cl">        
</span></span><span class="line"><span class="cl">        <span class="c1"># 取最近 10 帧的平均位移</span>
</span></span><span class="line"><span class="cl">        <span class="n">start_x</span><span class="p">,</span> <span class="n">start_y</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">history</span><span class="p">[</span><span class="o">-</span><span class="mi">10</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">        <span class="n">end_x</span><span class="p">,</span> <span class="n">end_y</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">history</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">        
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="p">(</span><span class="n">end_x</span> <span class="o">-</span> <span class="n">start_x</span><span class="p">,</span> <span class="n">end_y</span> <span class="o">-</span> <span class="n">start_y</span><span class="p">)</span>
</span></span></code></pre></div><p>原始轨迹通常会有一些抖动，我们可以用移动平均进行平滑：</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">smooth_trajectory</span><span class="p">(</span><span class="n">trajectory</span><span class="p">,</span> <span class="n">window_size</span><span class="o">=</span><span class="mi">5</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;移动平均平滑&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">trajectory</span><span class="p">)</span> <span class="o">&lt;</span> <span class="n">window_size</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">trajectory</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="n">smoothed</span> <span class="o">=</span> <span class="p">[]</span>
</span></span><span class="line"><span class="cl">    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">trajectory</span><span class="p">)):</span>
</span></span><span class="line"><span class="cl">        <span class="n">start</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">i</span> <span class="o">-</span> <span class="n">window_size</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="n">window</span> <span class="o">=</span> <span class="n">trajectory</span><span class="p">[</span><span class="n">start</span><span class="p">:</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">        
</span></span><span class="line"><span class="cl">        <span class="n">avg_x</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span><span class="n">p</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">window</span><span class="p">)</span> <span class="o">/</span> <span class="nb">len</span><span class="p">(</span><span class="n">window</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="n">avg_y</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span><span class="n">p</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">window</span><span class="p">)</span> <span class="o">/</span> <span class="nb">len</span><span class="p">(</span><span class="n">window</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        
</span></span><span class="line"><span class="cl">        <span class="n">smoothed</span><span class="o">.</span><span class="n">append</span><span class="p">((</span><span class="n">avg_x</span><span class="p">,</span> <span class="n">avg_y</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">smoothed</span>
</span></span></code></pre></div><h3 id="62-挥手手势识别">6.2 挥手手势识别</h3>
<p>挥手是最常见的动态手势之一，识别逻辑：</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">is_waving_gesture</span><span class="p">(</span><span class="n">tracker</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;判断是否是挥手手势&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">movement</span> <span class="o">=</span> <span class="n">tracker</span><span class="o">.</span><span class="n">get_movement_vector</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="ow">not</span> <span class="n">movement</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">False</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="n">dx</span><span class="p">,</span> <span class="n">dy</span> <span class="o">=</span> <span class="n">movement</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="c1"># 挥手的主要特征是水平方向的大幅运动</span>
</span></span><span class="line"><span class="cl">    <span class="n">horizontal_movement</span> <span class="o">=</span> <span class="nb">abs</span><span class="p">(</span><span class="n">dx</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">vertical_movement</span> <span class="o">=</span> <span class="nb">abs</span><span class="p">(</span><span class="n">dy</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="c1"># 水平位移应该明显大于垂直位移</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">horizontal_movement</span> <span class="o">&gt;</span> <span class="mf">0.1</span> <span class="ow">and</span> <span class="n">horizontal_movement</span> <span class="o">&gt;</span> <span class="n">vertical_movement</span> <span class="o">*</span> <span class="mi">2</span>
</span></span></code></pre></div><p>更复杂的挥手识别可以检测左右摆动的次数，区分&quot;挥手&quot;和&quot;摆手&quot;两种不同的含义。</p>
<h3 id="63-划动手势与方向判断">6.3 划动手势与方向判断</h3>
<p>划动手势可以用来控制界面（比如翻页、切换选项）：</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">get_swipe_direction</span><span class="p">(</span><span class="n">tracker</span><span class="p">,</span> <span class="n">threshold</span><span class="o">=</span><span class="mf">0.08</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;&#34;&#34;判断划动方向&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="n">movement</span> <span class="o">=</span> <span class="n">tracker</span><span class="o">.</span><span class="n">get_movement_vector</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="ow">not</span> <span class="n">movement</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="kc">None</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="n">dx</span><span class="p">,</span> <span class="n">dy</span> <span class="o">=</span> <span class="n">movement</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="nb">abs</span><span class="p">(</span><span class="n">dx</span><span class="p">)</span> <span class="o">&gt;</span> <span class="nb">abs</span><span class="p">(</span><span class="n">dy</span><span class="p">)</span> <span class="ow">and</span> <span class="nb">abs</span><span class="p">(</span><span class="n">dx</span><span class="p">)</span> <span class="o">&gt;</span> <span class="n">threshold</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="s2">&#34;left&#34;</span> <span class="k">if</span> <span class="n">dx</span> <span class="o">&lt;</span> <span class="mi">0</span> <span class="k">else</span> <span class="s2">&#34;right&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="k">elif</span> <span class="nb">abs</span><span class="p">(</span><span class="n">dy</span><span class="p">)</span> <span class="o">&gt;</span> <span class="nb">abs</span><span class="p">(</span><span class="n">dx</span><span class="p">)</span> <span class="ow">and</span> <span class="nb">abs</span><span class="p">(</span><span class="n">dy</span><span class="p">)</span> <span class="o">&gt;</span> <span class="n">threshold</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="s2">&#34;up&#34;</span> <span class="k">if</span> <span class="n">dy</span> <span class="o">&lt;</span> <span class="mi">0</span> <span class="k">else</span> <span class="s2">&#34;down&#34;</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="kc">None</span>
</span></span></code></pre></div><p>这个简单的算法就可以实现像滑动解锁一样的手势控制。</p>
<h2 id="七实时性能优化">七、实时性能优化</h2>
<p>在实际应用中，帧率和延迟是决定用户体验的关键因素。让我们来看看如何把性能优化到极致。</p>
<h3 id="71-降低输入分辨率">7.1 降低输入分辨率</h3>
<p>这是最简单也是效果最明显的优化：</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">process_frame</span><span class="p">(</span><span class="n">frame</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="c1"># 缩放到 640x480 处理，检测精度下降很小</span>
</span></span><span class="line"><span class="cl">    <span class="n">small_frame</span> <span class="o">=</span> <span class="n">cv2</span><span class="o">.</span><span class="n">resize</span><span class="p">(</span><span class="n">frame</span><span class="p">,</span> <span class="p">(</span><span class="mi">640</span><span class="p">,</span> <span class="mi">480</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">    <span class="c1"># 处理 small_frame</span>
</span></span></code></pre></div><p>大多数情况下，把图像缩放到 640x480 甚至 320x240，MediaPipe 仍然能正常工作，但处理速度可以提升 2-4 倍。</p>
<h3 id="72-跳帧处理">7.2 跳帧处理</h3>
<p>对于 30 FPS 的视频流，每 2 帧处理一次，用户几乎感觉不到差别，但计算量减少了一半：</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="n">frame_count</span> <span class="o">=</span> <span class="mi">0</span>
</span></span><span class="line"><span class="cl"><span class="n">process_every_n_frames</span> <span class="o">=</span> <span class="mi">2</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="n">ret</span><span class="p">,</span> <span class="n">frame</span> <span class="o">=</span> <span class="n">cap</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="n">frame_count</span> <span class="o">%</span> <span class="n">process_every_n_frames</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="c1"># 运行检测</span>
</span></span><span class="line"><span class="cl">        <span class="n">detection_result</span> <span class="o">=</span> <span class="n">detector</span><span class="o">.</span><span class="n">detect</span><span class="p">(</span><span class="n">mp_image</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="k">else</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="c1"># 使用上一帧的结果进行插值显示</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="n">frame_count</span> <span class="o">+=</span> <span class="mi">1</span>
</span></span></code></pre></div><h3 id="73-roi-裁剪">7.3 ROI 裁剪</h3>
<p>如果只需要检测画面中心区域的手势，可以只裁剪感兴趣区域进行处理：</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="n">h</span><span class="p">,</span> <span class="n">w</span> <span class="o">=</span> <span class="n">frame</span><span class="o">.</span><span class="n">shape</span><span class="p">[:</span><span class="mi">2</span><span class="p">]</span>
</span></span><span class="line"><span class="cl"><span class="c1"># 裁剪中心 50% 的区域</span>
</span></span><span class="line"><span class="cl"><span class="n">roi_x1</span><span class="p">,</span> <span class="n">roi_y1</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">w</span> <span class="o">*</span> <span class="mf">0.25</span><span class="p">),</span> <span class="nb">int</span><span class="p">(</span><span class="n">h</span> <span class="o">*</span> <span class="mf">0.25</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">roi_x2</span><span class="p">,</span> <span class="n">roi_y2</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">w</span> <span class="o">*</span> <span class="mf">0.75</span><span class="p">),</span> <span class="nb">int</span><span class="p">(</span><span class="n">h</span> <span class="o">*</span> <span class="mf">0.75</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">roi</span> <span class="o">=</span> <span class="n">frame</span><span class="p">[</span><span class="n">roi_y1</span><span class="p">:</span><span class="n">roi_y2</span><span class="p">,</span> <span class="n">roi_x1</span><span class="p">:</span><span class="n">roi_x2</span><span class="p">]</span>
</span></span></code></pre></div><p>这同样可以把处理速度提升 4 倍。</p>
<h3 id="74-多线程处理">7.4 多线程处理</h3>
<p>在 Python 中，可以使用多线程把检测和显示分离：</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">threading</span>
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">queue</span> <span class="kn">import</span> <span class="n">Queue</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">frame_queue</span> <span class="o">=</span> <span class="n">Queue</span><span class="p">(</span><span class="n">maxsize</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">result_queue</span> <span class="o">=</span> <span class="n">Queue</span><span class="p">(</span><span class="n">maxsize</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">detection_worker</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">    <span class="k">while</span> <span class="kc">True</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="n">frame</span> <span class="o">=</span> <span class="n">frame_queue</span><span class="o">.</span><span class="n">get</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">        <span class="n">result</span> <span class="o">=</span> <span class="n">detector</span><span class="o">.</span><span class="n">detect</span><span class="p">(</span><span class="n">frame</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="n">result_queue</span><span class="o">.</span><span class="n">put</span><span class="p">(</span><span class="n">result</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># 启动检测线程</span>
</span></span><span class="line"><span class="cl"><span class="n">threading</span><span class="o">.</span><span class="n">Thread</span><span class="p">(</span><span class="n">target</span><span class="o">=</span><span class="n">detection_worker</span><span class="p">,</span> <span class="n">daemon</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span><span class="o">.</span><span class="n">start</span><span class="p">()</span>
</span></span></code></pre></div><p>这样即使检测线程阻塞，主线程的显示也不会卡顿。</p>
<h2 id="八完整代码实现">八、完整代码实现</h2>
<p>现在把所有内容整合起来，提供一个可以直接运行的完整版本。</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">cv2</span>
</span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">mediapipe</span> <span class="k">as</span> <span class="nn">mp</span>
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">collections</span> <span class="kn">import</span> <span class="n">deque</span>
</span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">time</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">class</span> <span class="nc">HandGestureRecognizer</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="c1"># 初始化 MediaPipe</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">mp_hands</span> <span class="o">=</span> <span class="n">mp</span><span class="o">.</span><span class="n">solutions</span><span class="o">.</span><span class="n">hands</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">mp_drawing</span> <span class="o">=</span> <span class="n">mp</span><span class="o">.</span><span class="n">solutions</span><span class="o">.</span><span class="n">drawing_utils</span>
</span></span><span class="line"><span class="cl">        
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">hands</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">mp_hands</span><span class="o">.</span><span class="n">Hands</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">            <span class="n">static_image_mode</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">            <span class="n">max_num_hands</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">            <span class="n">min_detection_confidence</span><span class="o">=</span><span class="mf">0.5</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">            <span class="n">min_tracking_confidence</span><span class="o">=</span><span class="mf">0.5</span>
</span></span><span class="line"><span class="cl">        <span class="p">)</span>
</span></span><span class="line"><span class="cl">        
</span></span><span class="line"><span class="cl">        <span class="c1"># 轨迹跟踪</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">trajectory</span> <span class="o">=</span> <span class="n">deque</span><span class="p">(</span><span class="n">maxlen</span><span class="o">=</span><span class="mi">30</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        
</span></span><span class="line"><span class="cl">        <span class="c1"># FPS 计算</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">fps</span> <span class="o">=</span> <span class="mi">0</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">prev_time</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">frame_count</span> <span class="o">=</span> <span class="mi">0</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">is_finger_extended</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">landmarks</span><span class="p">,</span> <span class="n">tip_id</span><span class="p">,</span> <span class="n">pip_id</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="n">wrist</span> <span class="o">=</span> <span class="n">landmarks</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">        <span class="n">tip_dist</span> <span class="o">=</span> <span class="p">(</span><span class="n">landmarks</span><span class="p">[</span><span class="n">tip_id</span><span class="p">]</span><span class="o">.</span><span class="n">x</span> <span class="o">-</span> <span class="n">wrist</span><span class="o">.</span><span class="n">x</span><span class="p">)</span><span class="o">**</span><span class="mi">2</span> <span class="o">+</span> \
</span></span><span class="line"><span class="cl">                   <span class="p">(</span><span class="n">landmarks</span><span class="p">[</span><span class="n">tip_id</span><span class="p">]</span><span class="o">.</span><span class="n">y</span> <span class="o">-</span> <span class="n">wrist</span><span class="o">.</span><span class="n">y</span><span class="p">)</span><span class="o">**</span><span class="mi">2</span>
</span></span><span class="line"><span class="cl">        <span class="n">pip_dist</span> <span class="o">=</span> <span class="p">(</span><span class="n">landmarks</span><span class="p">[</span><span class="n">pip_id</span><span class="p">]</span><span class="o">.</span><span class="n">x</span> <span class="o">-</span> <span class="n">wrist</span><span class="o">.</span><span class="n">x</span><span class="p">)</span><span class="o">**</span><span class="mi">2</span> <span class="o">+</span> \
</span></span><span class="line"><span class="cl">                   <span class="p">(</span><span class="n">landmarks</span><span class="p">[</span><span class="n">pip_id</span><span class="p">]</span><span class="o">.</span><span class="n">y</span> <span class="o">-</span> <span class="n">wrist</span><span class="o">.</span><span class="n">y</span><span class="p">)</span><span class="o">**</span><span class="mi">2</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">tip_dist</span> <span class="o">&gt;</span> <span class="n">pip_dist</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">is_thumb_extended</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">landmarks</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="n">thumb_tip</span> <span class="o">=</span> <span class="n">landmarks</span><span class="p">[</span><span class="mi">4</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">        <span class="n">index_mcp</span> <span class="o">=</span> <span class="n">landmarks</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">        <span class="n">distance</span> <span class="o">=</span> <span class="p">((</span><span class="n">thumb_tip</span><span class="o">.</span><span class="n">x</span> <span class="o">-</span> <span class="n">index_mcp</span><span class="o">.</span><span class="n">x</span><span class="p">)</span><span class="o">**</span><span class="mi">2</span> <span class="o">+</span> 
</span></span><span class="line"><span class="cl">                   <span class="p">(</span><span class="n">thumb_tip</span><span class="o">.</span><span class="n">y</span> <span class="o">-</span> <span class="n">index_mcp</span><span class="o">.</span><span class="n">y</span><span class="p">)</span><span class="o">**</span><span class="mi">2</span><span class="p">)</span><span class="o">**</span><span class="mf">0.5</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">distance</span> <span class="o">&gt;</span> <span class="mf">0.1</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">get_finger_states</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">landmarks</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="p">[</span>
</span></span><span class="line"><span class="cl">            <span class="bp">self</span><span class="o">.</span><span class="n">is_thumb_extended</span><span class="p">(</span><span class="n">landmarks</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">            <span class="bp">self</span><span class="o">.</span><span class="n">is_finger_extended</span><span class="p">(</span><span class="n">landmarks</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">6</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">            <span class="bp">self</span><span class="o">.</span><span class="n">is_finger_extended</span><span class="p">(</span><span class="n">landmarks</span><span class="p">,</span> <span class="mi">12</span><span class="p">,</span> <span class="mi">10</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">            <span class="bp">self</span><span class="o">.</span><span class="n">is_finger_extended</span><span class="p">(</span><span class="n">landmarks</span><span class="p">,</span> <span class="mi">16</span><span class="p">,</span> <span class="mi">14</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">            <span class="bp">self</span><span class="o">.</span><span class="n">is_finger_extended</span><span class="p">(</span><span class="n">landmarks</span><span class="p">,</span> <span class="mi">20</span><span class="p">,</span> <span class="mi">18</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="p">]</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">recognize_gesture</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">landmarks</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="n">finger_states</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_finger_states</span><span class="p">(</span><span class="n">landmarks</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="n">thumb</span><span class="p">,</span> <span class="n">index</span><span class="p">,</span> <span class="n">middle</span><span class="p">,</span> <span class="n">ring</span><span class="p">,</span> <span class="n">pinky</span> <span class="o">=</span> <span class="n">finger_states</span>
</span></span><span class="line"><span class="cl">        
</span></span><span class="line"><span class="cl">        <span class="c1"># 数字手势</span>
</span></span><span class="line"><span class="cl">        <span class="n">extended_count</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span><span class="n">finger_states</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="n">extended_count</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="k">return</span> <span class="s2">&#34;0 (Fist)&#34;</span>
</span></span><span class="line"><span class="cl">        <span class="k">elif</span> <span class="n">extended_count</span> <span class="o">==</span> <span class="mi">1</span> <span class="ow">and</span> <span class="n">index</span> <span class="ow">and</span> <span class="ow">not</span> <span class="n">middle</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="k">return</span> <span class="s2">&#34;1&#34;</span>
</span></span><span class="line"><span class="cl">        <span class="k">elif</span> <span class="n">extended_count</span> <span class="o">==</span> <span class="mi">2</span> <span class="ow">and</span> <span class="n">index</span> <span class="ow">and</span> <span class="n">middle</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="k">return</span> <span class="s2">&#34;2&#34;</span>
</span></span><span class="line"><span class="cl">        <span class="k">elif</span> <span class="n">extended_count</span> <span class="o">==</span> <span class="mi">3</span> <span class="ow">and</span> <span class="n">index</span> <span class="ow">and</span> <span class="n">middle</span> <span class="ow">and</span> <span class="n">ring</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="k">return</span> <span class="s2">&#34;3&#34;</span>
</span></span><span class="line"><span class="cl">        <span class="k">elif</span> <span class="n">extended_count</span> <span class="o">==</span> <span class="mi">4</span> <span class="ow">and</span> <span class="ow">not</span> <span class="n">thumb</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="k">return</span> <span class="s2">&#34;4&#34;</span>
</span></span><span class="line"><span class="cl">        <span class="k">elif</span> <span class="n">extended_count</span> <span class="o">==</span> <span class="mi">5</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="k">return</span> <span class="s2">&#34;5&#34;</span>
</span></span><span class="line"><span class="cl">        
</span></span><span class="line"><span class="cl">        <span class="c1"># OK 手势</span>
</span></span><span class="line"><span class="cl">        <span class="n">thumb_tip</span> <span class="o">=</span> <span class="n">landmarks</span><span class="p">[</span><span class="mi">4</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">        <span class="n">index_tip</span> <span class="o">=</span> <span class="n">landmarks</span><span class="p">[</span><span class="mi">8</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">        <span class="n">distance</span> <span class="o">=</span> <span class="p">((</span><span class="n">thumb_tip</span><span class="o">.</span><span class="n">x</span> <span class="o">-</span> <span class="n">index_tip</span><span class="o">.</span><span class="n">x</span><span class="p">)</span><span class="o">**</span><span class="mi">2</span> <span class="o">+</span> 
</span></span><span class="line"><span class="cl">                   <span class="p">(</span><span class="n">thumb_tip</span><span class="o">.</span><span class="n">y</span> <span class="o">-</span> <span class="n">index_tip</span><span class="o">.</span><span class="n">y</span><span class="p">)</span><span class="o">**</span><span class="mi">2</span><span class="p">)</span><span class="o">**</span><span class="mf">0.5</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="n">distance</span> <span class="o">&lt;</span> <span class="mf">0.05</span> <span class="ow">and</span> <span class="n">middle</span> <span class="ow">and</span> <span class="n">ring</span> <span class="ow">and</span> <span class="n">pinky</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="k">return</span> <span class="s2">&#34;OK&#34;</span>
</span></span><span class="line"><span class="cl">        
</span></span><span class="line"><span class="cl">        <span class="c1"># 点赞手势</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="n">thumb</span> <span class="ow">and</span> <span class="ow">not</span> <span class="n">index</span> <span class="ow">and</span> <span class="ow">not</span> <span class="n">middle</span> <span class="ow">and</span> <span class="ow">not</span> <span class="n">ring</span> <span class="ow">and</span> <span class="ow">not</span> <span class="n">pinky</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="k">return</span> <span class="s2">&#34;Like&#34;</span>
</span></span><span class="line"><span class="cl">        
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="s2">&#34;Unknown&#34;</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">process_frame</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">frame</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="c1"># 缩小处理提高性能</span>
</span></span><span class="line"><span class="cl">        <span class="n">small_frame</span> <span class="o">=</span> <span class="n">cv2</span><span class="o">.</span><span class="n">resize</span><span class="p">(</span><span class="n">frame</span><span class="p">,</span> <span class="p">(</span><span class="mi">640</span><span class="p">,</span> <span class="mi">480</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">        <span class="n">rgb_frame</span> <span class="o">=</span> <span class="n">cv2</span><span class="o">.</span><span class="n">cvtColor</span><span class="p">(</span><span class="n">small_frame</span><span class="p">,</span> <span class="n">cv2</span><span class="o">.</span><span class="n">COLOR_BGR2RGB</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        
</span></span><span class="line"><span class="cl">        <span class="c1"># 运行检测</span>
</span></span><span class="line"><span class="cl">        <span class="n">results</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">hands</span><span class="o">.</span><span class="n">process</span><span class="p">(</span><span class="n">rgb_frame</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        
</span></span><span class="line"><span class="cl">        <span class="c1"># 更新 FPS</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">frame_count</span> <span class="o">+=</span> <span class="mi">1</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">frame_count</span> <span class="o">%</span> <span class="mi">10</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="n">curr_time</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">            <span class="bp">self</span><span class="o">.</span><span class="n">fps</span> <span class="o">=</span> <span class="mi">10</span> <span class="o">/</span> <span class="p">(</span><span class="n">curr_time</span> <span class="o">-</span> <span class="bp">self</span><span class="o">.</span><span class="n">prev_time</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">            <span class="bp">self</span><span class="o">.</span><span class="n">prev_time</span> <span class="o">=</span> <span class="n">curr_time</span>
</span></span><span class="line"><span class="cl">        
</span></span><span class="line"><span class="cl">        <span class="c1"># 绘制结果</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="n">results</span><span class="o">.</span><span class="n">multi_hand_landmarks</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="k">for</span> <span class="n">hand_landmarks</span> <span class="ow">in</span> <span class="n">results</span><span class="o">.</span><span class="n">multi_hand_landmarks</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">                <span class="c1"># 绘制关键点</span>
</span></span><span class="line"><span class="cl">                <span class="bp">self</span><span class="o">.</span><span class="n">mp_drawing</span><span class="o">.</span><span class="n">draw_landmarks</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">                    <span class="n">frame</span><span class="p">,</span> <span class="n">hand_landmarks</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">mp_hands</span><span class="o">.</span><span class="n">HAND_CONNECTIONS</span>
</span></span><span class="line"><span class="cl">                <span class="p">)</span>
</span></span><span class="line"><span class="cl">                
</span></span><span class="line"><span class="cl">                <span class="c1"># 识别手势</span>
</span></span><span class="line"><span class="cl">                <span class="n">gesture</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">recognize_gesture</span><span class="p">(</span><span class="n">hand_landmarks</span><span class="o">.</span><span class="n">landmark</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">                
</span></span><span class="line"><span class="cl">                <span class="c1"># 显示手势</span>
</span></span><span class="line"><span class="cl">                <span class="n">h</span><span class="p">,</span> <span class="n">w</span> <span class="o">=</span> <span class="n">frame</span><span class="o">.</span><span class="n">shape</span><span class="p">[:</span><span class="mi">2</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">                <span class="n">wrist_x</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">hand_landmarks</span><span class="o">.</span><span class="n">landmark</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">x</span> <span class="o">*</span> <span class="n">w</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">                <span class="n">wrist_y</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span><span class="n">hand_landmarks</span><span class="o">.</span><span class="n">landmark</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">y</span> <span class="o">*</span> <span class="n">h</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">                
</span></span><span class="line"><span class="cl">                <span class="n">cv2</span><span class="o">.</span><span class="n">putText</span><span class="p">(</span><span class="n">frame</span><span class="p">,</span> <span class="n">gesture</span><span class="p">,</span> <span class="p">(</span><span class="n">wrist_x</span><span class="p">,</span> <span class="n">wrist_y</span> <span class="o">-</span> <span class="mi">20</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">                           <span class="n">cv2</span><span class="o">.</span><span class="n">FONT_HERSHEY_SIMPLEX</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">255</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span> <span class="mi">2</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        
</span></span><span class="line"><span class="cl">        <span class="c1"># 显示 FPS</span>
</span></span><span class="line"><span class="cl">        <span class="n">cv2</span><span class="o">.</span><span class="n">putText</span><span class="p">(</span><span class="n">frame</span><span class="p">,</span> <span class="sa">f</span><span class="s2">&#34;FPS: </span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">fps</span><span class="si">:</span><span class="s2">.1f</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">,</span> <span class="p">(</span><span class="mi">10</span><span class="p">,</span> <span class="mi">30</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">                   <span class="n">cv2</span><span class="o">.</span><span class="n">FONT_HERSHEY_SIMPLEX</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">255</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span> <span class="mi">2</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">frame</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">run</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="n">cap</span> <span class="o">=</span> <span class="n">cv2</span><span class="o">.</span><span class="n">VideoCapture</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        
</span></span><span class="line"><span class="cl">        <span class="k">while</span> <span class="n">cap</span><span class="o">.</span><span class="n">isOpened</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">            <span class="n">ret</span><span class="p">,</span> <span class="n">frame</span> <span class="o">=</span> <span class="n">cap</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">            <span class="k">if</span> <span class="ow">not</span> <span class="n">ret</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">                <span class="k">break</span>
</span></span><span class="line"><span class="cl">            
</span></span><span class="line"><span class="cl">            <span class="n">frame</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">process_frame</span><span class="p">(</span><span class="n">frame</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">            
</span></span><span class="line"><span class="cl">            <span class="n">cv2</span><span class="o">.</span><span class="n">imshow</span><span class="p">(</span><span class="s1">&#39;Hand Gesture Recognition&#39;</span><span class="p">,</span> <span class="n">frame</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">            
</span></span><span class="line"><span class="cl">            <span class="k">if</span> <span class="n">cv2</span><span class="o">.</span><span class="n">waitKey</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="o">&amp;</span> <span class="mh">0xFF</span> <span class="o">==</span> <span class="nb">ord</span><span class="p">(</span><span class="s1">&#39;q&#39;</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">                <span class="k">break</span>
</span></span><span class="line"><span class="cl">        
</span></span><span class="line"><span class="cl">        <span class="n">cap</span><span class="o">.</span><span class="n">release</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">        <span class="n">cv2</span><span class="o">.</span><span class="n">destroyAllWindows</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s2">&#34;__main__&#34;</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="n">recognizer</span> <span class="o">=</span> <span class="n">HandGestureRecognizer</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    <span class="n">recognizer</span><span class="o">.</span><span class="n">run</span><span class="p">()</span>
</span></span></code></pre></div><p>这个代码整合了我们讨论的所有功能，包括关键点检测、多种手势识别、FPS 显示等，可以直接运行。</p>
<h2 id="九常见问题与调优">九、常见问题与调优</h2>
<p>在实际使用中，你可能会遇到各种问题。以下是一些常见问题的解决方案。</p>
<h3 id="91-光照影响">9.1 光照影响</h3>
<p>光照变化是影响检测稳定性的最主要因素。解决方案：</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="c1"># 直方图均衡化增强对比度</span>
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">enhance_contrast</span><span class="p">(</span><span class="n">frame</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="n">lab</span> <span class="o">=</span> <span class="n">cv2</span><span class="o">.</span><span class="n">cvtColor</span><span class="p">(</span><span class="n">frame</span><span class="p">,</span> <span class="n">cv2</span><span class="o">.</span><span class="n">COLOR_BGR2LAB</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">l</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span> <span class="o">=</span> <span class="n">cv2</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="n">lab</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">clahe</span> <span class="o">=</span> <span class="n">cv2</span><span class="o">.</span><span class="n">createCLAHE</span><span class="p">(</span><span class="n">clipLimit</span><span class="o">=</span><span class="mf">2.0</span><span class="p">,</span> <span class="n">tileGridSize</span><span class="o">=</span><span class="p">(</span><span class="mi">8</span><span class="p">,</span> <span class="mi">8</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">    <span class="n">l</span> <span class="o">=</span> <span class="n">clahe</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="n">l</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">lab</span> <span class="o">=</span> <span class="n">cv2</span><span class="o">.</span><span class="n">merge</span><span class="p">((</span><span class="n">l</span><span class="p">,</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">cv2</span><span class="o">.</span><span class="n">cvtColor</span><span class="p">(</span><span class="n">lab</span><span class="p">,</span> <span class="n">cv2</span><span class="o">.</span><span class="n">COLOR_LAB2BGR</span><span class="p">)</span>
</span></span></code></pre></div><p>对图像进行对比度增强，可以显著提升昏暗光线下的检测率。</p>
<h3 id="92-遮挡处理">9.2 遮挡处理</h3>
<p>当手指互相遮挡时，关键点检测可能出错。解决方案：</p>
<ul>
<li>增加检测置信度阈值，过滤低质量结果</li>
<li>实现时间滤波，利用前后帧的信息进行平滑</li>
<li>当检测质量下降时，自动切换到跟踪模式</li>
</ul>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="c1"># 简单的卡尔曼滤波用于关键点平滑</span>
</span></span><span class="line"><span class="cl"><span class="k">class</span> <span class="nc">KalmanFilter</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">kf</span> <span class="o">=</span> <span class="n">cv2</span><span class="o">.</span><span class="n">KalmanFilter</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">kf</span><span class="o">.</span><span class="n">measurementMatrix</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([[</span><span class="mi">1</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">],</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">]],</span> <span class="n">np</span><span class="o">.</span><span class="n">float32</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">kf</span><span class="o">.</span><span class="n">transitionMatrix</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([[</span><span class="mi">1</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">0</span><span class="p">],</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">],</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">0</span><span class="p">],</span> <span class="p">[</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">]],</span> <span class="n">np</span><span class="o">.</span><span class="n">float32</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">kf</span><span class="o">.</span><span class="n">processNoiseCov</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">eye</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">float32</span><span class="p">)</span> <span class="o">*</span> <span class="mf">0.03</span>
</span></span></code></pre></div><h3 id="93-误识别优化">9.3 误识别优化</h3>
<p>误识别通常发生在两种手势边界模糊的情况下。解决方案：</p>
<ul>
<li>增加状态保持机制，手势需要连续 N 帧被识别到才输出结果</li>
<li>实现手势之间的切换延迟，避免快速闪烁</li>
<li>为不同手势设置不同的置信度阈值</li>
</ul>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">class</span> <span class="nc">GestureState</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">stability_frames</span><span class="o">=</span><span class="mi">5</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">stability_frames</span> <span class="o">=</span> <span class="n">stability_frames</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">current_gesture</span> <span class="o">=</span> <span class="s2">&#34;Unknown&#34;</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">candidate_gesture</span> <span class="o">=</span> <span class="s2">&#34;Unknown&#34;</span>
</span></span><span class="line"><span class="cl">        <span class="bp">self</span><span class="o">.</span><span class="n">candidate_count</span> <span class="o">=</span> <span class="mi">0</span>
</span></span><span class="line"><span class="cl">    
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">update</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">new_gesture</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="n">new_gesture</span> <span class="o">==</span> <span class="bp">self</span><span class="o">.</span><span class="n">candidate_gesture</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="bp">self</span><span class="o">.</span><span class="n">candidate_count</span> <span class="o">+=</span> <span class="mi">1</span>
</span></span><span class="line"><span class="cl">            <span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">candidate_count</span> <span class="o">&gt;=</span> <span class="bp">self</span><span class="o">.</span><span class="n">stability_frames</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">                <span class="bp">self</span><span class="o">.</span><span class="n">current_gesture</span> <span class="o">=</span> <span class="n">new_gesture</span>
</span></span><span class="line"><span class="cl">        <span class="k">else</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">            <span class="bp">self</span><span class="o">.</span><span class="n">candidate_gesture</span> <span class="o">=</span> <span class="n">new_gesture</span>
</span></span><span class="line"><span class="cl">            <span class="bp">self</span><span class="o">.</span><span class="n">candidate_count</span> <span class="o">=</span> <span class="mi">1</span>
</span></span><span class="line"><span class="cl">        
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_gesture</span>
</span></span></code></pre></div><p>这种机制可以将误识别率降低 80% 以上。</p>
<h2 id="十进阶方向">十、进阶方向</h2>
<p>掌握了基础的手势识别后，你可以向以下几个方向深入探索。</p>
<h3 id="101-自定义手势训练">10.1 自定义手势训练</h3>
<p>MediaPipe 提供的手势分类器是通用的，要识别特定的手势（比如手语、自定义控制手势），需要自己训练模型：</p>
<ol>
<li>使用 MediaPipe 采集关键点数据</li>
<li>训练一个简单的分类器（SVM、随机森林、神经网络）</li>
<li>将模型集成到你的应用中</li>
</ol>
<p>对于简单的手势，一个只有几层的 MLP 就足够了，训练数据量甚至不需要超过 1000 个样本。</p>
<h3 id="102-双手协同交互">10.2 双手协同交互</h3>
<p>很多自然的交互需要用到双手：</p>
<ul>
<li>双手缩放：像放大缩小地图一样</li>
<li>双手旋转：旋转物体</li>
<li>一只手选择，另一只手操作</li>
</ul>
<p>实现双手交互需要注意左右手的区分和坐标系统一。</p>
<h3 id="103-结合-ar-应用">10.3 结合 AR 应用</h3>
<p>手势识别和 AR 是天生的组合。你可以：</p>
<ul>
<li>在用户手中渲染虚拟物体</li>
<li>用手势操作虚拟物体</li>
<li>实现科幻电影中的全息界面效果</li>
</ul>
<p>Unity 和 Unreal Engine 都有现成的 MediaPipe 集成插件。</p>
<h3 id="104-边缘设备部署">10.4 边缘设备部署</h3>
<p>在 Raspberry Pi、Jetson Nano 等设备上部署时，需要额外的优化：</p>
<ul>
<li>使用 TensorRT 加速推理</li>
<li>进一步降低输入分辨率</li>
<li>只在关键帧运行完整检测</li>
<li>使用 MediaPipe 的 C++ API 而不是 Python</li>
</ul>
<h2 id="总结">总结</h2>
<p>手势识别是一门充满魅力的技术，它让我们能够用最自然的方式与机器交互。MediaPipe 的出现，大大降低了这项技术的入门门槛。</p>
<p>在这篇文章中，我们从 MediaPipe 的工作原理讲起，一步步实现了从基础关键点检测到静态手势识别，再到动态动作追踪的完整系统。我们讨论了性能优化的各种技巧，分析了常见问题的解决方案，最后探讨了进阶的方向。</p>
<p>但这仅仅是开始。手势识别的真正价值在于与具体应用场景的结合——你可以用手势控制机器人、操作智能家居、设计沉浸式游戏，甚至为听障人士开发实时手语翻译。技术是工具，想象力才是真正的边界。</p>
<p>希望这篇文章能为你打开手势识别的大门，激发你创造出更多有趣的应用。记住：最好的交互方式，就是让用户感觉不到交互的存在。</p>
<p>（全文完，约7200字）</p>
]]></content:encoded>
    </item>
  </channel>
</rss>
