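GitLeaks Tutorial#

Overview#

GitLeaks is a tool for detecting secrets that have leaked into Git repositories. It scans the working tree and the commit history for API keys, passwords, certificates, and similar credentials, which makes it a common building block in code-security and DevSecOps workflows.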

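Key Features#

Secret scanning for Git repositories
Pattern matching for many kinds of secrets
Custom rules
Git history scanning
Multiple report formats (JSON, CSV, JUnit, SARIF)
CI/CD integration
Fast scanning
Continuous monitoring through scheduled or polling scripts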

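Use Cases#

Code security audits
Secret leak detection
DevSecOps pipeline integration
Security compliance checks
Repository hardening
Threat intelligence gathering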

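Getting Started#

Installing GitLeaks#

Installing with Go#

# Install Go (if it is not installed yet)
wget https://go.dev/dl/go1.21.5.linux-amd64.tar.gz
sudo tar -C /usr/local -xzf go1.21.5.linux-amd64.tar.gz
# Add the Go toolchain and the default "go install" target directory to PATH
export PATH=$PATH:/usr/local/go/bin:$HOME/go/bin

# Install GitLeaks
go install github.com/zricethezav/gitleaks/v8@latest

# Verify the installation
gitleaks --version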

Installing with Homebrew#

# Install Homebrew (if it is not installed yet)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install GitLeaks
brew install gitleaks

# Verify the installation
gitleaks --version

Running with Docker#

# Run the published image (the image's entrypoint is gitleaks, so pass the subcommand explicitly)
docker run --rm -v "$(pwd)":/path zricethezav/gitleaks:latest detect --source=/path

# Or build a local image
git clone https://github.com/zricethezav/gitleaks.git
cd gitleaks
docker build -t gitleaks .
docker run --rm -v "$(pwd)":/path gitleaks detect --source=/path

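Basic Scanning#

Scanning the Current Directory#

# Scan the current directory (inside a Git repository this walks the commit history)
gitleaks detect --source .

# Scan a specific directory
gitleaks detect --source /path/to/project

# Scan and print each finding in detail
gitleaks detect --source . --verbose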

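Scanning a Git Repository#

# Scan a Git repository (detect reads the commit history by default; gitleaks v8 has no --git flag)
gitleaks detect --source .

# Scan a remote repository: gitleaks v8 only reads local paths, so clone it first
git clone https://github.com/user/repo.git
gitleaks detect --source repo

# Scan a specific branch by passing it to "git log" through --log-opts
gitleaks detect --source . --log-opts="develop"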

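Scanning Git History#

# Scan the full history of all branches (the default when --log-opts is not set)
gitleaks detect --source .

# Scan the most recent N commits (here N = 100)
gitleaks detect --source . --log-opts="-n 100"

# Scan a single commit
gitleaks detect --source . --log-opts="-1 abc123"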

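Beginner Usage#

Scan Options#

Excluding Files and Directories#

# gitleaks v8 has no --exclude flag; path exclusions are regular
# expressions listed under [allowlist] in the configuration file
cat > gitleaks-exclude.toml << 'EOF'
[extend]
useDefault = true    # keep the built-in rules

[allowlist]
description = "Paths to skip"
paths = [
    '''node_modules/''',     # a single directory
    '''vendor/''',           # add one entry per excluded path
    '''.*\.min\.js$''',      # any regular expression works
    '''(^|/)test/''',        # test directories at any depth
]
EOF

gitleaks detect --source . --config gitleaks-exclude.toml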

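Limiting Scan Depth#

# Scan only the files currently on disk and ignore the Git history
gitleaks detect --source . --no-git

# Scan the 50 most recent commits
gitleaks detect --source . --log-opts="-n 50"

# Scan the full history (the default)
gitleaks detect --source .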

Output Formats#

JSON Output#

# Write a JSON report (a report file is only produced when --report-path is set)
gitleaks detect --source . --report-format json --report-path report.json

# Pretty-print the report
jq . report.json

CSV Output#

# Write a CSV report to a file
gitleaks detect --source . --report-format csv --report-path report.csv

SARIF Output#

# Write a SARIF report to a file (the format used by GitHub code scanning)
gitleaks detect --source . --report-format sarif --report-path report.sarif
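
A SARIF report is plain JSON, so it can be inspected without special tooling. The sketch below assumes the standard SARIF 2.1.0 layout (`runs[].results[]` with a `ruleId` and a physical location); the helper name `summarize_sarif` and the sample data are made up for illustration, and a real report would be loaded from `report.sarif` with `json.load`.

```python
def summarize_sarif(sarif):
    """Return (rule id, file, start line) tuples for every result in a SARIF log."""
    findings = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            for location in result.get("locations", []):
                physical = location.get("physicalLocation", {})
                uri = physical.get("artifactLocation", {}).get("uri", "unknown")
                line = physical.get("region", {}).get("startLine", 0)
                findings.append((result.get("ruleId", "unknown"), uri, line))
    return findings


# Hand-written sample in SARIF 2.1.0 shape (not real scanner output)
sample = {
    "version": "2.1.0",
    "runs": [{
        "results": [{
            "ruleId": "aws-access-key",
            "locations": [{
                "physicalLocation": {
                    "artifactLocation": {"uri": "config/settings.py"},
                    "region": {"startLine": 12},
                }
            }],
        }]
    }],
}

for rule_id, uri, line in summarize_sarif(sample):
    print(f"{rule_id}: {uri}:{line}")
```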

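Intermediate Usage#

Custom Rules#

Using a Custom Configuration File#

# Create a custom configuration file (the quoted 'EOF' keeps the shell from altering the regexes)
cat > gitleaks-config.toml << 'EOF'
# Custom GitLeaks configuration
# Without an [extend] section this file replaces the built-in rules.
# Each rule needs a unique id in gitleaks v8.

title = "Custom GitLeaks Config"

[[rules]]
    id = "aws-access-key"
    description = "AWS Access Key"
    regex = '''AKIA[0-9A-Z]{16}'''
    tags = ["key", "AWS"]

[[rules]]
    id = "github-token"
    description = "GitHub Token"
    regex = '''ghp_[a-zA-Z0-9]{36}'''
    tags = ["key", "GitHub"]

[[rules]]
    id = "slack-token"
    description = "Slack Token"
    regex = '''xox[baprs]-[a-zA-Z0-9-]+'''
    tags = ["key", "Slack"]
EOF

# Scan with the custom configuration
gitleaks detect --source . --config gitleaks-config.toml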

Adding Custom Rules#

# Write additional custom rules
cat > custom-rules.toml << 'EOF'
[[rules]]
    id = "custom-api-key"
    description = "Custom API Key"
    regex = '''custom_key_[a-zA-Z0-9]{32}'''
    tags = ["key", "custom"]

[[rules]]
    id = "database-password"
    description = "Database Password"
    regex = '''password\s*=\s*['"]([^'"]+)['"]'''
    tags = ["password", "database"]
EOF

# Scan with the custom rules
gitleaks detect --source . --config custom-rules.toml
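
Before adding a rule to a configuration file, it can help to try its regex against a few sample strings. The sketch below does this with Python's `re` module for the two custom rules above. The rule ids and the helper `match_rules` are hypothetical, and gitleaks itself uses Go's RE2 engine, so this is only a rough check for patterns that both engines support.

```python
import re

# Hypothetical rule table mirroring the custom rules above: rule id -> regex
RULES = {
    "custom-api-key": r"custom_key_[a-zA-Z0-9]{32}",
    "database-password": r"""password\s*=\s*['"]([^'"]+)['"]""",
}


def match_rules(text):
    """Return the ids of all rules whose regex matches somewhere in text."""
    return [rule_id for rule_id, pattern in RULES.items() if re.search(pattern, text)]


samples = [
    "custom_key_" + "a1B2" * 8,   # 32 alphanumeric characters after the prefix
    'password = "hunter2"',       # quoted password assignment
    "custom_key_tooshort",        # fewer than 32 characters after the prefix
]

for sample in samples:
    print(sample, "->", match_rules(sample))
```

Samples that should not match are as useful as samples that should, because overly broad patterns are the usual source of false positives.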

Advanced Scanning#

Scanning Multiple Repositories#

#!/bin/bash
# Scan several Git repositories in sequence

REPOS_DIR="/path/to/repos"
REPORTS_DIR="./reports"
mkdir -p "$REPORTS_DIR"

# Iterate over all repositories
for repo in "$REPOS_DIR"/*/; do
    repo_name=$(basename "$repo")
    echo "Scanning repository: $repo_name"
    
    # Run the scan
    gitleaks detect --source "$repo" \
        --report-format json \
        --report-path "$REPORTS_DIR/${repo_name}.json"
done

echo "Batch scan complete"

Parallel Scanning#

#!/bin/bash
# Parallel scan script (starts one gitleaks process per repository)

REPOS_DIR="/path/to/repos"
REPORTS_DIR="./reports"
mkdir -p "$REPORTS_DIR"

# Scan all repositories in parallel
for repo in "$REPOS_DIR"/*/; do
    repo_name=$(basename "$repo")
    echo "Scanning repository: $repo_name"
    
    # Run the scan in the background
    gitleaks detect --source "$repo" \
        --report-format json \
        --report-path "$REPORTS_DIR/${repo_name}.json" &
done

# Wait for all scans to finish
wait

echo "Parallel scan complete"

Upper-Intermediate Usage#

CI/CD Integration#

GitHub Actions Integration#

# .github/workflows/gitleaks.yml
name: GitLeaks Scan

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
  schedule:
    - cron: '0 0 * * 0'  # run once a week

jobs:
  gitleaks:
    runs-on: ubuntu-latest
    
    steps:
    - uses: actions/checkout@v4
      with:
        fetch-depth: 0  # fetch the full history
    
    # gitleaks-action@v2 is configured through environment variables and
    # uploads its own SARIF report artifact when it finds secrets, so it
    # takes no report "with:" inputs and needs no separate upload step
    - name: Run GitLeaks
      uses: gitleaks/gitleaks-action@v2
      env:
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        GITLEAKS_LICENSE: ${{ secrets.GITLEAKS_LICENSE }}  # only required for organization accounts

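GitLab CI Integration#

# .gitlab-ci.yml
stages:
  - security

gitleaks:
  stage: security
  image:
    name: zricethezav/gitleaks:latest
    entrypoint: [""]  # the image's entrypoint is gitleaks itself; reset it so the script can run
  
  variables:
    GIT_DEPTH: "0"  # fetch the full history instead of a shallow clone
  
  script:
    - gitleaks detect --source . --report-format json --report-path gitleaks-report.json
  
  artifacts:
    when: always  # keep the report even when the job fails because leaks were found
    paths:
      - gitleaks-report.json
    expire_in: 1 week
  
  allow_failure: true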

Jenkins Integration#

// Jenkinsfile (readJSON is provided by the Pipeline Utility Steps plugin)
pipeline {
    agent any
    
    stages {
        stage('GitLeaks Scan') {
            steps {
                script {
                    // Run GitLeaks; --exit-code 0 keeps the step from failing on
                    // findings so that the report can be parsed below
                    sh 'gitleaks detect --source . --report-format json --report-path gitleaks-report.json --exit-code 0'
                    
                    // Parse the report
                    def report = readJSON file: 'gitleaks-report.json'
                    
                    if (report.size() > 0) {
                        error "Found ${report.size()} leaked secrets"
                    }
                }
            }
        }
    }
    
    post {
        always {
            // Archive the report
            archiveArtifacts artifacts: 'gitleaks-report.json', fingerprint: true
        }
    }
}

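Automation Scripts#

Scheduled Scan Script#

#!/bin/bash
# Scheduled scan script (run it from cron or a similar scheduler)

REPOS_DIR="/path/to/repos"
REPORTS_DIR="./reports"
LOG_FILE="gitleaks-scan.log"

# Append a timestamped message to the log
log() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" >> "$LOG_FILE"
}

# Scan one repository
scan_repo() {
    local repo_path=$1
    local repo_name=$(basename "$repo_path")
    
    log "Starting scan of repository: $repo_name"
    
    # Run the scan; --exit-code 2 makes findings exit with 2 so that
    # they can be told apart from errors, which exit with 1
    gitleaks detect --source "$repo_path" \
        --report-format json \
        --report-path "$REPORTS_DIR/${repo_name}_$(date +%Y%m%d).json" \
        --exit-code 2
    local status=$?
    
    # Check the result
    case $status in
        0) log "Repository $repo_name: no leaks found" ;;
        2) log "Repository $repo_name: leaks found, see the report" ;;
        *) log "Repository $repo_name: scan failed (exit code $status)" ;;
    esac
}

# Create the report directory
mkdir -p "$REPORTS_DIR"

# Scan all repositories
for repo in "$REPOS_DIR"/*/; do
    scan_repo "$repo"
done

log "Scheduled scan complete"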
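
The gitleaks README documents exit code 0 for "no leaks" and 1 for "leaks or error", so a plain zero versus non-zero check cannot tell findings from failures. One way around this, sketched below in Python, is to pass `--exit-code` with a value other than 1 and map the status explicitly. The value 2 and the helper names `interpret_exit` and `scan` are assumptions made for this example.

```python
import subprocess

LEAK_EXIT_CODE = 2  # assumption: any value other than 0 and 1 would work


def interpret_exit(status, leak_code=LEAK_EXIT_CODE):
    """Map a gitleaks exit status to 'clean', 'leaks', or 'error'."""
    if status == 0:
        return "clean"
    if status == leak_code:
        return "leaks"
    return "error"


def scan(repo_path, report_path):
    """Run gitleaks on repo_path and return the interpreted outcome."""
    cmd = [
        "gitleaks", "detect",
        "--source", repo_path,
        "--report-format", "json",
        "--report-path", report_path,
        "--exit-code", str(LEAK_EXIT_CODE),
    ]
    completed = subprocess.run(cmd, capture_output=True, text=True)
    return interpret_exit(completed.returncode)
```

Calling `scan("/path/to/repo", "report.json")` requires gitleaks on the PATH; `interpret_exit` can be used on its own wherever an exit status is already available.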

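Monitoring Script#

#!/bin/bash
# Polling monitor script (checks each repository once a minute)

WATCH_DIR="/path/to/repos"
# Absolute paths, because monitor_repo changes directory
REPORTS_DIR="$(pwd)/reports"
LOG_FILE="$(pwd)/gitleaks-monitor.log"

# Append a timestamped message to the log
log() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" >> "$LOG_FILE"
}

# Watch one repository for new commits
monitor_repo() {
    local repo_path=$1
    local repo_name=$(basename "$repo_path")
    local last_commit=""
    local new_commit
    
    log "Monitoring repository: $repo_name"
    
    # Enter the repository
    cd "$repo_path" || return 1
    
    # Poll for new commits
    while true; do
        # Check whether HEAD has moved
        new_commit=$(git rev-parse HEAD)
        
        if [ "$new_commit" != "$last_commit" ]; then
            log "New commit detected: $new_commit"
            
            # Run the scan
            gitleaks detect --source . \
                --report-format json \
                --report-path "$REPORTS_DIR/${repo_name}_$(date +%Y%m%d_%H%M%S).json"
            
            last_commit=$new_commit
        fi
        
        sleep 60  # check once a minute
    done
}

# Create the report directory
mkdir -p "$REPORTS_DIR"

# Monitor all repositories
for repo in "$WATCH_DIR"/*/; do
    monitor_repo "$repo" &
done

log "Monitoring started"

# Stay in the foreground while the background monitors run
wait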

Advanced Usage#

Advanced Analysis#

Classifying Findings#

#!/usr/bin/env python3
import json
from collections import defaultdict

def classify_leaks(report_file):
    """Group the findings of a GitLeaks JSON report by rule."""
    
    with open(report_file) as f:
        report = json.load(f)
    
    classifications = defaultdict(list)
    
    for leak in report:
        # Field names follow the gitleaks v8 JSON report: RuleID, File, StartLine, Secret
        leak_type = leak.get('RuleID', 'unknown')
        file_path = leak.get('File', 'unknown')
        line = leak.get('StartLine', 0)
        
        classifications[leak_type].append({
            'file': file_path,
            'line': line,
            # Keep only a short prefix so the full secret is not copied into logs
            'secret': leak.get('Secret', '')[:4] + '...'
        })
    
    return classifications

# Example usage
classifications = classify_leaks('gitleaks-report.json')

for leak_type, leaks in classifications.items():
    print(f"\n=== {leak_type} ===")
    for leak in leaks:
        print(f"  File: {leak['file']}")
        print(f"  Line: {leak['line']}")
        print(f"  Secret: {leak['secret']}")
        print()

Trend Analysis#

#!/usr/bin/env python3
import json
import os
import re
from datetime import datetime
import matplotlib.pyplot as plt

# Report files are expected to be named <repo>_<YYYYMMDD>.json
DATE_PATTERN = re.compile(r'_(\d{8})\.json$')

def analyze_trend(reports_dir):
    """Collect per-report leak counts, ordered by report date."""
    
    trend_data = []
    
    # Read all reports
    for filename in sorted(os.listdir(reports_dir)):
        match = DATE_PATTERN.search(filename)
        if not match:
            continue  # skip files without a date suffix
        
        filepath = os.path.join(reports_dir, filename)
        
        with open(filepath) as f:
            report = json.load(f)
        
        # Extract the date from the file name
        timestamp = datetime.strptime(match.group(1), '%Y%m%d')
        
        # Count findings; the categories are a rough heuristic based on the rule id
        trend_data.append({
            'timestamp': timestamp,
            'total': len(report),
            'api_keys': len([r for r in report if 'key' in r.get('RuleID', '').lower()]),
            'passwords': len([r for r in report if 'password' in r.get('RuleID', '').lower()]),
            'tokens': len([r for r in report if 'token' in r.get('RuleID', '').lower()])
        })
    
    trend_data.sort(key=lambda d: d['timestamp'])
    return trend_data

def plot_trend(trend_data):
    """Plot the leak counts over time."""
    
    timestamps = [d['timestamp'] for d in trend_data]
    total = [d['total'] for d in trend_data]
    api_keys = [d['api_keys'] for d in trend_data]
    passwords = [d['passwords'] for d in trend_data]
    tokens = [d['tokens'] for d in trend_data]
    
    plt.figure(figsize=(12, 6))
    plt.plot(timestamps, total, label='Total', color='red')
    plt.plot(timestamps, api_keys, label='API Keys', color='blue')
    plt.plot(timestamps, passwords, label='Passwords', color='green')
    plt.plot(timestamps, tokens, label='Tokens', color='orange')
    
    plt.xlabel('Time')
    plt.ylabel('Leak Count')
    plt.title('Sensitive Information Leak Trend')
    plt.legend()
    plt.xticks(rotation=45)
    plt.tight_layout()
    
    plt.savefig('leak-trend.png', dpi=300)
    plt.close()

# Example usage
trend_data = analyze_trend('./reports')
plot_trend(trend_data)

Developing Custom Rules#

Writing More Complex Rules#

# advanced-rules.toml
# Each rule needs a unique id in gitleaks v8.

[[rules]]
    id = "aws-secret-access-key"
    description = "AWS Secret Access Key"
    regex = '''(?i)aws_secret_access_key\s*=\s*['"]([A-Za-z0-9/+=]{40})['"]'''
    entropy = 3.5
    secretGroup = 1
    tags = ["key", "AWS", "secret"]

# No entropy threshold here: the rule matches a fixed JSON structure
[[rules]]
    id = "gcp-service-account-key"
    description = "Google Cloud Service Account Key"
    regex = '''"type":\s*"service_account"[^}]*"private_key":\s*"-----BEGIN PRIVATE KEY-----'''
    tags = ["key", "GCP", "service_account"]

[[rules]]
    id = "database-connection-string"
    description = "Database Connection String"
    regex = '''(?i)(mysql|postgresql|mongodb)://[^:]+:[^@]+@[^/]+/[^\s"']+'''
    entropy = 3.0
    tags = ["database", "connection_string"]

# No entropy threshold here: the header is fixed text with low entropy,
# so a high threshold would suppress every match
[[rules]]
    id = "private-key-file"
    description = "Private Key File"
    regex = '''-----BEGIN (RSA|EC|DSA|OPENSSH) PRIVATE KEY-----'''
    tags = ["key", "private_key"]

[[rules]]
    id = "jwt-token"
    description = "JWT Token"
    regex = '''eyJ[a-zA-Z0-9_-]*\.eyJ[a-zA-Z0-9_-]*\.[a-zA-Z0-9_-]*'''
    entropy = 3.5
    tags = ["token", "JWT"]

# A generic rule like this one is prone to false positives; review its findings manually
[[rules]]
    id = "base64-encoded-secret"
    description = "Base64 Encoded Secret"
    regex = '''[A-Za-z0-9+/]{32,}={0,2}'''
    entropy = 4.0
    tags = ["base64", "secret"]
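
The `entropy` values in these rules are thresholds on how random the matched text looks. The sketch below assumes the threshold is compared against the Shannon entropy of the match, measured in bits per character; the function name `shannon_entropy` is chosen for this example and is not taken from the gitleaks source.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy of text in bits per character (0.0 for an empty string)."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Compare a fixed, repetitive header with random-looking key material
for candidate in (
    "-----BEGIN RSA PRIVATE KEY-----",
    "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",  # AWS's documented example secret key
):
    print(f"{shannon_entropy(candidate):.2f}  {candidate}")
```

Fixed or repetitive text scores low while random-looking key material scores higher, which is why a high threshold suits generic patterns but can suppress every match of a structural rule.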

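Using the Advanced Rules#

# Scan with the advanced rules
gitleaks detect --source . --config advanced-rules.toml --verbose

# Combining rule sets: gitleaks accepts a single --config file, so to keep
# the built-in rules as well, add this section to advanced-rules.toml:
#   [extend]
#   useDefault = true
gitleaks detect --source . --config advanced-rules.toml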

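Expert Usage#

Enterprise Deployment#

Centralized Scan Service#

#!/bin/bash
# Centralized scan service: each file in the queue directory holds one repository path

SCAN_QUEUE="/path/to/scan_queue"
SCAN_RESULTS="/path/to/scan_results"
LOG_FILE="/var/log/gitleaks-service.log"

# Initialize the queue and result directories
mkdir -p "$SCAN_QUEUE" "$SCAN_RESULTS"

# Scan one repository
scan_repo() {
    local repo_path=$1
    local repo_name=$(basename "$repo_path")
    local result_dir="$SCAN_RESULTS/$repo_name"
    
    echo "$(date): starting scan of $repo_name" >> "$LOG_FILE"
    
    # Create the result directory
    mkdir -p "$result_dir"
    
    # Run the scan; --exit-code 2 makes findings exit with 2 so that
    # they can be told apart from errors, which exit with 1
    gitleaks detect --source "$repo_path" \
        --report-format json \
        --report-path "$result_dir/scan_$(date +%Y%m%d_%H%M%S).json" \
        --exit-code 2
    local status=$?
    
    # Record the result
    case $status in
        0) echo "$(date): $repo_name scanned, no leaks found" >> "$LOG_FILE" ;;
        2) echo "$(date): $repo_name scanned, leaks found" >> "$LOG_FILE" ;;
        *) echo "$(date): $repo_name scan failed (exit code $status)" >> "$LOG_FILE" ;;
    esac
}

# Poll the queue
while true; do
    for task in "$SCAN_QUEUE"/*; do
        if [ -f "$task" ]; then
            repo_path=$(cat "$task")
            scan_repo "$repo_path"
            
            # Remove the processed task
            rm "$task"
        fi
    done
    
    sleep 60  # check once a minute
done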
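
The service above picks up any regular file in the queue directory and reads a repository path from it. A producer therefore only has to drop a small file into that directory. The sketch below shows one way to do that safely: it writes to a hidden temporary file first and renames it afterwards, so the polling loop (whose `*` glob skips dotfiles) never sees a half-written task. The function name `enqueue_scan` and the `.task` suffix are hypothetical.

```python
import os
import tempfile
import uuid


def enqueue_scan(queue_dir, repo_path):
    """Atomically create a task file in queue_dir that contains repo_path."""
    os.makedirs(queue_dir, exist_ok=True)

    # Write to a hidden temporary file in the same directory first
    fd, tmp_path = tempfile.mkstemp(dir=queue_dir, prefix=".tmp-")
    with os.fdopen(fd, "w") as f:
        f.write(repo_path + "\n")

    # Rename into place; os.replace is atomic within one filesystem
    task_path = os.path.join(queue_dir, f"{uuid.uuid4().hex}.task")
    os.replace(tmp_path, task_path)
    return task_path
```

A call such as `enqueue_scan("/path/to/scan_queue", "/path/to/repo1")` would queue one repository for the next polling cycle.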

Distributed Scanning#

#!/usr/bin/env python3
import subprocess
import multiprocessing
import os

def scan_repo(repo_path, output_dir):
    """Scan a single repository and return (repo name, exit code, stderr)."""
    repo_name = os.path.basename(os.path.normpath(repo_path))
    output_file = os.path.join(output_dir, f"{repo_name}.json")
    
    cmd = [
        "gitleaks", "detect",
        "--source", repo_path,
        "--report-format", "json",
        "--report-path", output_file,
        "--exit-code", "2",  # findings exit with 2, errors keep exit code 1
    ]
    
    result = subprocess.run(cmd, capture_output=True, text=True)
    return repo_name, result.returncode, result.stderr

def distributed_scan(repos, output_base_dir, workers=4):
    """Spread the scans over a pool of worker processes on this machine."""
    
    os.makedirs(output_base_dir, exist_ok=True)
    
    # Create the process pool
    with multiprocessing.Pool(workers) as pool:
        results = []
        
        for repo in repos:
            result = pool.apply_async(scan_repo, (repo, output_base_dir))
            results.append(result)
        
        # Wait for all tasks to finish
        for result in results:
            repo_name, returncode, stderr = result.get()
            
            if returncode == 0:
                print(f"{repo_name}: no leaks found")
            elif returncode == 2:
                print(f"{repo_name}: leaks found, see the report")
            else:
                print(f"{repo_name}: scan failed: {stderr}")

# Example usage (the guard is required when worker processes are spawned)
if __name__ == "__main__":
    repos = [
        "/path/to/repo1",
        "/path/to/repo2",
        "/path/to/repo3"
    ]
    
    distributed_scan(repos, "./scan_results", workers=4)

Advanced Integrations#

Jira Integration#

#!/usr/bin/env python3
import json
import os
from jira import JIRA

def create_jira_issue(jira, leak, project_key):
    """Create a Jira issue for one finding."""
    
    issue_dict = {
        'project': {'key': project_key},
        'summary': f"Sensitive Information Leak: {leak['RuleID']}",
        # The secret itself is left out so that it is not copied into the tracker
        'description': (
            f"File: {leak['File']}\n"
            f"Line: {leak['StartLine']}\n"
            f"Commit: {leak.get('Commit', 'n/a')}\n"
            f"Rule: {leak['RuleID']}\n\n"
            "Please investigate and remediate this leak immediately."
        ),
        'issuetype': {'name': 'Bug'},
        'priority': {'name': 'High'}
    }
    
    new_issue = jira.create_issue(fields=issue_dict)
    return new_issue.key

def create_jira_issues(report_file, project_key):
    """Create a Jira issue for every finding in the report."""
    
    # Read credentials from the environment instead of hard-coding them
    # (the variable names are examples)
    jira = JIRA(server=os.environ['JIRA_SERVER'],
                basic_auth=(os.environ['JIRA_USER'], os.environ['JIRA_API_TOKEN']))
    
    with open(report_file) as f:
        report = json.load(f)
    
    for leak in report:
        issue_key = create_jira_issue(jira, leak, project_key)
        print(f"Created Jira issue: {issue_key}")

# Example usage
create_jira_issues('gitleaks-report.json', 'SEC')

Slack Integration#

#!/usr/bin/env python3
import json
import requests
from datetime import datetime

def send_slack_notification(webhook_url, message):
    """Send a message to a Slack incoming webhook."""
    
    payload = {
        'text': message,
        'username': 'GitLeaks Bot',
        'icon_emoji': ':shield:'
    }
    
    response = requests.post(webhook_url, json=payload)
    return response.status_code == 200

def create_slack_message(report):
    """Build the Slack message for a GitLeaks JSON report."""
    
    total_leaks = len(report)
    # Rough heuristic: treat rules whose id contains "key" as high severity
    high_severity = len([r for r in report if 'key' in r.get('RuleID', '').lower()])
    
    lines = [
        "🚨 *Sensitive Information Leaks Detected*",
        "",
        f"*Scan Time:* {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}",
        "",
        "*Summary:*",
        f"• Total Leaks: {total_leaks}",
        f"• High Severity: {high_severity}",
        "",
        "*Top Leaks:*",
    ]
    
    for leak in report[:5]:
        lines.append(f"• {leak['RuleID']} in {leak['File']}:{leak['StartLine']}")
    
    if total_leaks > 5:
        lines.append(f"• ... and {total_leaks - 5} more")
    
    return "\n".join(lines)

# Example usage
with open('gitleaks-report.json') as f:
    report = json.load(f)

message = create_slack_message(report)
webhook_url = 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'

send_slack_notification(webhook_url, message)

Case Studies#

Case 1: Enterprise Codebase Security Audit#

Scenario#

Run a full security audit across an organization's repositories and identify every leaked secret.

Implementation#

#!/bin/bash
# Enterprise codebase security audit script

REPOS_DIR="/path/to/repos"
REPORTS_DIR="./audit_reports"

echo "=== Enterprise Codebase Security Audit ==="
echo "Repository directory: $REPOS_DIR"
echo "Report directory: $REPORTS_DIR"

# Create the report directory
mkdir -p "$REPORTS_DIR"

# Audit one repository
audit_repo() {
    local repo_path=$1
    local repo_name=$(basename "$repo_path")
    local report="$REPORTS_DIR/${repo_name}.json"
    local leak_count=0
    
    echo "Auditing repository: $repo_name"
    
    # Run the scan
    gitleaks detect --source "$repo_path" \
        --report-format json \
        --report-path "$report"
    
    # Analyze the result; a clean scan can still write an empty JSON array,
    # so count the entries instead of testing the file size
    if [ -s "$report" ]; then
        leak_count=$(jq length "$report")
    fi
    
    if [ "$leak_count" -gt 0 ]; then
        echo "  Found $leak_count leaked secrets"
        
        # Count findings per rule
        jq -r '.[].RuleID' "$report" | sort | uniq -c | sort -rn > "$REPORTS_DIR/${repo_name}_summary.txt"
    else
        echo "  No leaked secrets found"
    fi
}

# Audit all repositories
for repo in "$REPOS_DIR"/*/; do
    audit_repo "$repo"
done

# Generate the combined report
echo "Generating combined report..."

cat > "$REPORTS_DIR/audit_summary.txt" << EOF
Enterprise Codebase Security Audit Report
=========================================
Audit time: $(date)

Repository statistics:
$(for report in "$REPORTS_DIR"/*.json; do
    repo_name=$(basename "$report" .json)
    leak_count=$(jq length "$report")
    echo "- $repo_name: $leak_count leaks"
done)

See the per-repository report files for details.
EOF

echo ""
echo "Security audit complete!"
echo "Combined report: $REPORTS_DIR/audit_summary.txt"

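Case 2: CI/CD Security Gate#

Scenario#

Add a security gate to the CI/CD pipeline so that only code without leaked secrets can be merged.

Implementation#

#!/bin/bash
# CI/CD security gate script

set -e  # exit immediately on errors

PROJECT_DIR="${1:-.}"
MAX_LEAKS=0

echo "=== CI/CD Secret Leak Check ==="
echo "Project directory: $PROJECT_DIR"
echo "Maximum allowed leaks: $MAX_LEAKS"

# Run the scan; --exit-code 0 keeps findings from aborting the script
# under "set -e" so that the threshold check below decides the outcome
echo "Running GitLeaks scan..."
gitleaks detect --source "$PROJECT_DIR" \
    --report-format json \
    --report-path gitleaks-report.json \
    --exit-code 0

# Analyze the result
echo "Analyzing scan results..."

LEAK_COUNT=$(jq length gitleaks-report.json)

echo "Leak statistics:"
echo "  Total: $LEAK_COUNT"

# Check the threshold
if [ "$LEAK_COUNT" -gt "$MAX_LEAKS" ]; then
    echo ""
    echo "❌ Error: number of leaks ($LEAK_COUNT) exceeds the threshold ($MAX_LEAKS)"
    echo ""
    echo "Leak details:"
    jq -r '.[] | "  - \(.RuleID) in \(.File):\(.StartLine)"' gitleaks-report.json
    echo ""
    echo "Fix the leaks and submit again."
    exit 1
else
    echo ""
    echo "✅ Security check passed!"
    exit 0
fi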
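
A threshold of zero can be hard to adopt on a repository that already carries old findings. gitleaks offers a `--baseline-path` option for that situation; the sketch below only illustrates the underlying idea in Python by comparing the `Fingerprint` field of two JSON reports, so that the gate fails on new findings only. The function name `new_findings` and the sample data are made up for this example.

```python
def new_findings(baseline, current):
    """Return the findings in current whose Fingerprint is not in baseline."""
    known = {finding.get("Fingerprint") for finding in baseline}
    return [f for f in current if f.get("Fingerprint") not in known]


# Hand-written sample findings (real reports would be loaded with json.load)
baseline_report = [
    {"RuleID": "generic-api-key", "File": "old.py", "Fingerprint": "c1:old.py:generic-api-key:3"},
]
current_report = baseline_report + [
    {"RuleID": "aws-access-key", "File": "new.py", "Fingerprint": "c2:new.py:aws-access-key:8"},
]

fresh = new_findings(baseline_report, current_report)
for finding in fresh:
    print(f"new finding: {finding['RuleID']} in {finding['File']}")
```

In a pipeline, the gate would exit with a non-zero status when `fresh` is not empty and leave the baseline findings to a separate cleanup effort.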

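Summary#

GitLeaks is a capable tool for detecting leaked secrets and fits well into code-security and DevSecOps workflows.

Core Strengths#

Thorough scanning: covers secrets in the Git history, not only the current files
Performance: scans large codebases quickly
Flexible configuration: supports custom rules and configuration files
Easy integration: fits into CI/CD pipelines with little effort
Multiple outputs: supports several report formats

Application Scenarios#

Code security audits
Secret leak detection
DevSecOps pipeline integration
Security compliance checks
Repository hardening

Best Practices#

Scan regularly: schedule scans so that leaks are found early
Integrate with CI/CD: run the scan as part of the build pipeline
Customize rules: tailor the rules to the project's needs
Remediate promptly: fix a leak as soon as it is found
Monitor continuously: keep a monitoring mechanism in place

Caveats#

Scanning the full history of a large repository can take considerable time and resources
Keep GitLeaks and its rules up to date
Some findings are false positives and need manual review
Treat scan reports as confidential, since they contain the detected secrets
GitLeaks works best in combination with other security tools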