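Gitleaks Tutorial#
Software overview#
Gitleaks is a tool for detecting secrets leaked in Git repositories. It scans Git history for exposed API keys, passwords, certificates, and other sensitive data, which makes it a common building block in code security and DevSecOps workflows.
Key features#
- Scanning Git repositories for secrets
- Pattern matching for many kinds of secrets
- Custom rules
- Scanning of the full Git history
- Multiple report formats
- CI/CD integration
- Fast scanning
- Ongoing monitoring through scripted or scheduled scans
Use cases#
- Code security audits
- Secret leak detection
- DevSecOps pipeline integration
- Security compliance checks
- Repository security
- Threat intelligence gathering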
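Getting started#
Installing Gitleaks#
Install with Go#
# Install Go (if not already installed)
wget https://go.dev/dl/go1.21.5.linux-amd64.tar.gz
sudo tar -C /usr/local -xzf go1.21.5.linux-amd64.tar.gz
# go install places binaries in $HOME/go/bin by default, so add it to PATH as well
export PATH=$PATH:/usr/local/go/bin:$HOME/go/bin
# Install Gitleaks
go install github.com/zricethezav/gitleaks/v8@latest
# Verify the installation
gitleaks version
Install with Homebrew#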
# Install Homebrew (if not already installed)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# Install Gitleaks
brew install gitleaks
# Verify the installation
gitleaks version
Install with Docker#
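# Run with Docker (the subcommand, here detect, follows the image name)
docker run --rm -v "$(pwd)":/path zricethezav/gitleaks:latest detect --source=/path
# Or build a local image
git clone https://github.com/zricethezav/gitleaks.git
cd gitleaks
docker build -t gitleaks .
# Mount the project you want to scan, not the gitleaks checkout itself
docker run --rm -v /path/to/project:/path gitleaks detect --source=/path
Basic scanning#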
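Scan the current directory#
# Scan the current directory (for a Git repository, the history is scanned by default)
gitleaks detect --source .
# Scan a specific directory
gitleaks detect --source /path/to/project
# Scan and print detailed findings
gitleaks detect --source . --verbose
Scan a Git repository#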
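# Scan a Git repository (Gitleaks v8 scans Git history by default; it has no --git flag)
gitleaks detect --source .
# Scan a remote repository: v8 does not clone for you, so clone it first
git clone https://github.com/user/repo.git
gitleaks detect --source repo
# Scan a specific branch by passing the revision to git log through --log-opts
gitleaks detect --source . --log-opts="develop"
Scan Git history#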
# Scan the full Git history (the default behaviour of detect)
gitleaks detect --source .
# Scan the most recent N commits
gitleaks detect --source . --log-opts="-n 100"
# Scan a single commit
gitleaks detect --source . --log-opts="-1 abc123"
Beginner usage#
Scan options#
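Once several options are combined, it can be easier to assemble the command in code than to hand-edit shell strings. The following is a minimal sketch, not part of Gitleaks itself: `build_detect_command` is a hypothetical helper, and it only uses flags that Gitleaks v8 documents for `detect` (`--source`, `--no-git`, `--log-opts`, `--config`, `--report-format`, `--report-path`, `--redact`).

```python
from typing import List, Optional


def build_detect_command(
    source: str = ".",
    *,
    no_git: bool = False,
    log_opts: Optional[str] = None,
    config: Optional[str] = None,
    report_path: Optional[str] = None,
    report_format: str = "json",
    redact: bool = True,
) -> List[str]:
    """Return an argument list for `gitleaks detect` (hypothetical helper)."""
    if no_git and log_opts:
        # --log-opts is passed to `git log`, so it has no meaning without Git history
        raise ValueError("log_opts only applies when Git history is scanned")
    cmd = ["gitleaks", "detect", "--source", source]
    if no_git:
        cmd.append("--no-git")
    if log_opts:
        cmd.append(f"--log-opts={log_opts}")
    if config:
        cmd += ["--config", config]
    if report_path:
        cmd += ["--report-format", report_format, "--report-path", report_path]
    if redact:
        # Keep secret values out of logs and console output
        cmd.append("--redact")
    return cmd


if __name__ == "__main__":
    # The list can be handed to subprocess.run(cmd) as-is, without shell quoting
    print(build_detect_command(log_opts="-n 50", report_path="report.json"))
```

Returning a list rather than a string avoids shell quoting problems when paths contain spaces.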
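Exclude files and directories#
# Gitleaks v8 has no --exclude flag; exclusions go in the [allowlist] section of a config file
cat > gitleaks-exclude.toml << 'EOF'
[extend]
useDefault = true # Keep the built-in rules
[allowlist]
description = "Paths to skip"
# Each entry is a regular expression matched against the file path
paths = [
  '''node_modules/''',
  '''vendor/''',
  '''.*\.min\.js$''',
  '''(^|/)test/''',
]
EOF
gitleaks detect --source . --config gitleaks-exclude.toml
Set the scan depth#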
# Scan only the current state of the files, ignoring Git history
gitleaks detect --source . --no-git
# Scan the most recent 50 commits
gitleaks detect --source . --log-opts="-n 50"
# Scan the full history
gitleaks detect --source .
Output formats#
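Whatever format is chosen, downstream scripts need to know the shape of a finding. The sketch below parses a hand-written sample in the JSON report layout; the key names (`RuleID`, `File`, `StartLine`, `Secret`, `Fingerprint`, and so on) follow what Gitleaks v8 emits at the time of writing, the values are made up for illustration, and `summarize` is a hypothetical helper.

```python
import json

# Hand-written sample in the shape of a Gitleaks v8 JSON report (values are illustrative)
SAMPLE_REPORT = """
[
  {
    "Description": "AWS Access Key",
    "StartLine": 12,
    "EndLine": 12,
    "Match": "AKIAIOSFODNN7EXAMPLE",
    "Secret": "AKIAIOSFODNN7EXAMPLE",
    "File": "config/settings.py",
    "Commit": "abc123",
    "RuleID": "aws-access-key",
    "Fingerprint": "abc123:config/settings.py:aws-access-key:12"
  }
]
"""


def summarize(report_text: str) -> list:
    """Return one 'rule in file:line' string per finding in a JSON report."""
    findings = json.loads(report_text)
    return [f"{f['RuleID']} in {f['File']}:{f['StartLine']}" for f in findings]


if __name__ == "__main__":
    for line in summarize(SAMPLE_REPORT):
        print(line)
```

A report is a JSON array, so an empty array means a clean scan.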
Output as JSON#
# Write a JSON report (the report is written to the file given by --report-path)
gitleaks detect --source . --report-format json --report-path report.json
# Pretty-print the JSON report
jq . report.json
Output as CSV#
# Write a CSV report
gitleaks detect --source . --report-format csv --report-path report.csv
Output as SARIF#
# Write a SARIF report (for example, for GitHub code scanning)
gitleaks detect --source . --report-format sarif --report-path report.sarif
Intermediate usage#
Custom rules#
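Before a pattern goes into a config file, it helps to try it against sample strings. The sketch below does this with Python's `re` module for three of the patterns used in this section. Gitleaks itself uses Go's regular expression engine, so this is only an approximation that holds for simple patterns like these; `RULES` and `matching_rules` are hypothetical names, and the sample values are placeholders, not real credentials.

```python
import re

# Patterns taken from the custom rules in this section
RULES = {
    "aws-access-key": r"AKIA[0-9A-Z]{16}",
    "github-token": r"ghp_[a-zA-Z0-9]{36}",
    "slack-token": r"xox[baprs]-[a-zA-Z0-9-]+",
}


def matching_rules(text: str) -> list:
    """Return the sorted IDs of every rule whose pattern occurs in text."""
    return sorted(rule_id for rule_id, pattern in RULES.items() if re.search(pattern, text))


if __name__ == "__main__":
    samples = [
        'aws_key = "AKIAIOSFODNN7EXAMPLE"',  # AWS's documented example key
        "token=ghp_" + "a" * 36,              # placeholder with the right length
        "AKIA123",                            # too short, should not match
    ]
    for sample in samples:
        print(sample, "->", matching_rules(sample))
```

Keeping a few positive and negative samples next to each rule makes it easier to spot a pattern that is too broad before it floods a report with false positives.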
Use a custom configuration file#
# Create a custom configuration file
cat > gitleaks-config.toml << 'EOF'
# Custom Gitleaks configuration
title = "Custom Gitleaks Config"
[[rules]]
id = "aws-access-key"
description = "AWS Access Key"
regex = '''AKIA[0-9A-Z]{16}'''
tags = ["key", "AWS"]
[[rules]]
id = "github-token"
description = "GitHub Token"
regex = '''ghp_[a-zA-Z0-9]{36}'''
tags = ["key", "GitHub"]
[[rules]]
id = "slack-token"
description = "Slack Token"
regex = '''xox[baprs]-[a-zA-Z0-9-]+'''
tags = ["key", "Slack"]
EOF
# Scan with the custom configuration (each v8 rule needs a unique id)
gitleaks detect --source . --config gitleaks-config.toml
Add custom rules#
# Define additional custom rules
cat > custom-rules.toml << 'EOF'
[[rules]]
id = "custom-api-key"
description = "Custom API Key"
regex = '''custom_key_[a-zA-Z0-9]{32}'''
tags = ["key", "custom"]
[[rules]]
id = "database-password"
description = "Database Password"
regex = '''password\s*=\s*['"]([^'"]+)['"]'''
tags = ["password", "database"]
EOF
# Scan with the custom rules
gitleaks detect --source . --config custom-rules.toml
Advanced scanning#
Scan multiple repositories#
#!/bin/bash
# Scan several Git repositories in turn
REPOS_DIR="/path/to/repos"
REPORTS_DIR="./reports"
mkdir -p "$REPORTS_DIR"
# Iterate over every repository
for repo in "$REPOS_DIR"/*/; do
    repo_name=$(basename "$repo")
    echo "Scanning repository: $repo_name"
    # Run the scan
    gitleaks detect --source "$repo" \
        --report-format json \
        --report-path "$REPORTS_DIR/${repo_name}.json"
done
echo "Batch scan complete"
Parallel scanning#
#!/bin/bash
# Parallel scan script
REPOS_DIR="/path/to/repos"
REPORTS_DIR="./reports"
mkdir -p "$REPORTS_DIR"
# Scan all repositories in parallel
for repo in "$REPOS_DIR"/*/; do
    repo_name=$(basename "$repo")
    echo "Scanning repository: $repo_name"
    # Run the scan in the background
    gitleaks detect --source "$repo" \
        --report-format json \
        --report-path "$REPORTS_DIR/${repo_name}.json" &
done
# Wait for every scan to finish
wait
echo "Parallel scan complete"
Upper-intermediate usage#
CI/CD integration#
GitHub Actions integration#
# .github/workflows/gitleaks.yml
name: Gitleaks Scan
on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
  schedule:
    - cron: '0 0 * * 0' # Run once a week
jobs:
  gitleaks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # Fetch the full history
      - name: Run Gitleaks
        uses: gitleaks/gitleaks-action@v2
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          GITLEAKS_LICENSE: ${{ secrets.GITLEAKS_LICENSE }} # Needed for organization accounts
        # gitleaks-action v2 is configured through environment variables rather than `with:` inputs.
        # By default it uploads its SARIF results as a workflow artifact when leaks are found,
        # so a separate upload step is not needed.
GitLab CI integration#
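# .gitlab-ci.yml
stages:
  - security
gitleaks:
  stage: security
  image:
    name: zricethezav/gitleaks:latest
    entrypoint: [""] # Override the image entrypoint so GitLab can run the script
  script:
    - gitleaks detect --source . --report-format json --report-path gitleaks-report.json
  artifacts:
    when: always # Keep the report even when the job fails because leaks were found
    paths:
      - gitleaks-report.json
    expire_in: 1 week
  allow_failure: true
Jenkins integration#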
// Jenkinsfile
pipeline {
    agent any
    stages {
        stage('Gitleaks Scan') {
            steps {
                script {
                    // Run Gitleaks; --exit-code 0 keeps the step from failing so the report can be inspected below
                    sh 'gitleaks detect --source . --exit-code 0 --report-format json --report-path gitleaks-report.json'
                    // Parse the report (readJSON comes from the Pipeline Utility Steps plugin)
                    def report = readJSON file: 'gitleaks-report.json'
                    if (report.size() > 0) {
                        error "Found ${report.size()} leaked secrets"
                    }
                }
            }
        }
    }
    post {
        always {
            // Archive the report
            archiveArtifacts artifacts: 'gitleaks-report.json', fingerprint: true
        }
    }
}
Automation scripts#
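Periodic scan script#
#!/bin/bash
# Periodic scan script
REPOS_DIR="/path/to/repos"
REPORTS_DIR="./reports"
LOG_FILE="gitleaks-scan.log"
# Append a timestamped message to the log
log() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" >> "$LOG_FILE"
}
# Scan a single repository
scan_repo() {
    local repo_path=$1
    local repo_name
    repo_name=$(basename "$repo_path")
    log "Starting scan of repository: $repo_name"
    # Run the scan
    gitleaks detect --source "$repo_path" \
        --report-format json \
        --report-path "$REPORTS_DIR/${repo_name}_$(date +%Y%m%d).json"
    # Check the result: by default Gitleaks exits 0 when nothing is found and 1 when leaks are found
    if [ $? -eq 0 ]; then
        log "Repository $repo_name: no leaks found"
    else
        log "Repository $repo_name: leaks found or scan failed"
    fi
}
# Create the report directory
mkdir -p "$REPORTS_DIR"
# Scan every repository
for repo in "$REPOS_DIR"/*/; do
    scan_repo "$repo"
done
log "Periodic scan complete"
Monitoring script#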
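#!/bin/bash
# Polling monitor script
WATCH_DIR="/path/to/repos"
REPORTS_DIR="$(pwd)/reports" # Absolute path, because monitor_repo changes directory
LOG_FILE="$(pwd)/gitleaks-monitor.log"
# Append a timestamped message to the log
log() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" >> "$LOG_FILE"
}
# Watch a single repository
monitor_repo() {
    local repo_path=$1
    local repo_name last_commit="" new_commit
    repo_name=$(basename "$repo_path")
    log "Monitoring repository: $repo_name"
    # Enter the repository directory
    cd "$repo_path" || return 1
    # Poll for new commits
    while true; do
        # Check whether HEAD has moved since the last scan
        new_commit=$(git rev-parse HEAD)
        if [ "$new_commit" != "$last_commit" ]; then
            log "New commit detected: $new_commit"
            # Run the scan
            gitleaks detect --source . \
                --report-format json \
                --report-path "$REPORTS_DIR/${repo_name}_$(date +%Y%m%d_%H%M%S).json"
            last_commit=$new_commit
        fi
        sleep 60 # Check once a minute
    done
}
# Create the report directory
mkdir -p "$REPORTS_DIR"
# Monitor every repository in the background
for repo in "$WATCH_DIR"/*/; do
    monitor_repo "$repo" &
done
log "Monitoring started"
wait # Keep the script alive while the monitors run
Advanced usage#
Advanced analysis#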
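Analysis scripts tend to print or forward findings, which can spread the very secrets they are meant to contain. A minimal sketch of a masking helper is shown below; `mask_secret` and its `visible` parameter are hypothetical names, and the policy (keep a short prefix, hide the rest, hide short values entirely) is one reasonable choice rather than anything Gitleaks prescribes.

```python
def mask_secret(secret: str, visible: int = 4) -> str:
    """Keep the first `visible` characters and replace the rest with asterisks.

    Values that are too short to mask meaningfully are hidden completely.
    """
    if len(secret) <= visible * 2:
        return "*" * len(secret)
    return secret[:visible] + "*" * (len(secret) - visible)


if __name__ == "__main__":
    # AWS's documented example key, used here as a harmless placeholder
    print(mask_secret("AKIAIOSFODNN7EXAMPLE"))
    print(mask_secret("hunter2"))
```

Gitleaks can also mask values at the source with its `--redact` flag; a helper like this is useful when the raw report has to be kept but summaries are shared more widely.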
Secret classification#
#!/usr/bin/env python3
import json
from collections import defaultdict
def classify_leaks(report_file):
    """Group the findings in a Gitleaks JSON report by rule."""
    with open(report_file) as f:
        report = json.load(f)
    classifications = defaultdict(list)
    for leak in report:
        # Gitleaks v8 reports use capitalized keys: RuleID, File, StartLine, Secret
        leak_type = leak.get('RuleID', 'unknown')
        classifications[leak_type].append({
            'file': leak.get('File', 'unknown'),
            'line': leak.get('StartLine', 0),
            # Show only a short prefix so the output does not leak the secret again
            'secret': leak.get('Secret', '')[:4] + '...'
        })
    return classifications
# Example usage
classifications = classify_leaks('gitleaks-report.json')
for leak_type, leaks in classifications.items():
    print(f"\n=== {leak_type} ===")
    for leak in leaks:
        print(f"  File: {leak['file']}")
        print(f"  Line: {leak['line']}")
        print(f"  Secret: {leak['secret']}")
        print()
Trend analysis#
#!/usr/bin/env python3
import json
import os
import re
from datetime import datetime
import matplotlib.pyplot as plt
def analyze_trend(reports_dir):
    """Collect leak counts from reports named <repo>_YYYYMMDD.json."""
    trend_data = []
    # Read every report
    for filename in sorted(os.listdir(reports_dir)):
        # Extract the date stamp from the filename; skip files that do not match
        match = re.search(r'_(\d{8})\.json$', filename)
        if not match:
            continue
        timestamp = datetime.strptime(match.group(1), '%Y%m%d')
        with open(os.path.join(reports_dir, filename)) as f:
            report = json.load(f)
        # Count leaks by keyword in the rule ID
        rule_ids = [r.get('RuleID', '').lower() for r in report]
        trend_data.append({
            'timestamp': timestamp,
            'total': len(report),
            'api_keys': sum('key' in rule for rule in rule_ids),
            'passwords': sum('password' in rule for rule in rule_ids),
            'tokens': sum('token' in rule for rule in rule_ids)
        })
    # Sort by date so the lines are drawn left to right
    trend_data.sort(key=lambda d: d['timestamp'])
    return trend_data
def plot_trend(trend_data):
    """Plot the leak counts over time."""
    timestamps = [d['timestamp'] for d in trend_data]
    total = [d['total'] for d in trend_data]
    api_keys = [d['api_keys'] for d in trend_data]
    passwords = [d['passwords'] for d in trend_data]
    tokens = [d['tokens'] for d in trend_data]
    plt.figure(figsize=(12, 6))
    plt.plot(timestamps, total, label='Total', color='red')
    plt.plot(timestamps, api_keys, label='API Keys', color='blue')
    plt.plot(timestamps, passwords, label='Passwords', color='green')
    plt.plot(timestamps, tokens, label='Tokens', color='orange')
    plt.xlabel('Time')
    plt.ylabel('Leak Count')
    plt.title('Sensitive Information Leak Trend')
    plt.legend()
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.savefig('leak-trend.png', dpi=300)
    plt.close()
# Example usage
trend_data = analyze_trend('./reports')
plot_trend(trend_data)
Custom rule development#
Create complex rules#
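The rules in this section pair a regex with an `entropy` threshold, which Gitleaks documents as a minimum Shannon entropy for the matched secret. As a rough sketch of what that number measures, the function below computes Shannon entropy in bits per character; `shannon_entropy` is a hypothetical name, and Gitleaks' own implementation may differ in detail.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    # Sum -p * log2(p) over the relative frequency p of each distinct character
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    for sample in ["aaaaaaaa", "password", "-----BEGIN RSA PRIVATE KEY-----"]:
        print(f"{shannon_entropy(sample):.2f}  {sample}")
```

Repetitive or fixed strings score low and random-looking strings score high, so a threshold is best checked against real sample matches: a value set too high can silence a rule whose match is mostly fixed text, such as a key header.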
# advanced-rules.toml
[[rules]]
id = "aws-secret-access-key"
description = "AWS Secret Access Key"
regex = '''(?i)aws_secret_access_key\s*=\s*['"]([A-Za-z0-9/+=]{40})['"]'''
entropy = 3.5
secretGroup = 1
tags = ["key", "AWS", "secret"]
[[rules]]
id = "gcp-service-account-key"
description = "Google Cloud Service Account Key"
regex = '''"type":\s*"service_account"[^}]*"private_key":\s*"-----BEGIN PRIVATE KEY-----'''
entropy = 4.0
tags = ["key", "GCP", "service_account"]
[[rules]]
id = "database-connection-string"
description = "Database Connection String"
regex = '''(?i)(mysql|postgresql|mongodb)://[^:]+:[^@]+@[^/]+/[^\s"']+'''
entropy = 3.0
tags = ["database", "connection_string"]
[[rules]]
id = "private-key-file"
description = "Private Key File"
regex = '''-----BEGIN (RSA|EC|DSA|OPENSSH) PRIVATE KEY-----'''
# No entropy threshold here: the match is a fixed header whose entropy is low, so a high threshold would suppress every finding
tags = ["key", "private_key"]
[[rules]]
id = "jwt-token"
description = "JWT Token"
regex = '''eyJ[a-zA-Z0-9_-]*\.eyJ[a-zA-Z0-9_-]*\.[a-zA-Z0-9_-]*'''
entropy = 3.5
tags = ["token", "JWT"]
[[rules]]
id = "base64-encoded-secret"
# This pattern is broad and likely to produce false positives; tune it before wide use
description = "Base64 Encoded Secret"
regex = '''[A-Za-z0-9+/]{32,}={0,2}'''
entropy = 4.0
tags = ["base64", "secret"]
Use the advanced rules#
# Scan with the advanced rules
gitleaks detect --source . --config advanced-rules.toml --verbose
# --config takes a single file. To combine these rules with the built-in ones,
# add the following at the top of advanced-rules.toml:
#   [extend]
#   useDefault = true
Expert usage#
Enterprise deployment#
Centralized scanning service#
#!/bin/bash
# Centralized scanning service
SCAN_QUEUE="/path/to/scan_queue"
SCAN_RESULTS="/path/to/scan_results"
LOG_FILE="/var/log/gitleaks-service.log"
# Initialize the queue and result directories
mkdir -p "$SCAN_QUEUE" "$SCAN_RESULTS"
# Scan a single repository
scan_repo() {
    local repo_path=$1
    local repo_name result_dir
    repo_name=$(basename "$repo_path")
    result_dir="$SCAN_RESULTS/$repo_name"
    echo "$(date): starting scan of $repo_name" >> "$LOG_FILE"
    # Create the result directory
    mkdir -p "$result_dir"
    # Run the scan
    gitleaks detect --source "$repo_path" \
        --report-format json \
        --report-path "$result_dir/scan_$(date +%Y%m%d_%H%M%S).json"
    # Record the result: exit code 0 means no leaks, 1 (by default) means leaks were found
    if [ $? -eq 0 ]; then
        echo "$(date): $repo_name: no leaks found" >> "$LOG_FILE"
    else
        echo "$(date): $repo_name: leaks found or scan failed" >> "$LOG_FILE"
    fi
}
# Watch the queue; each task file contains the path of a repository to scan
while true; do
    for task in "$SCAN_QUEUE"/*; do
        if [ -f "$task" ]; then
            repo_path=$(cat "$task")
            scan_repo "$repo_path"
            # Remove the processed task
            rm "$task"
        fi
    done
    sleep 60 # Check once a minute
done
Distributed scanning#
#!/usr/bin/env python3
import multiprocessing
import os
import subprocess
def scan_repo(repo_path, output_dir):
    """Scan a single repository and return (repo name, exit code, stderr)."""
    # normpath strips a trailing slash so basename is never empty
    repo_name = os.path.basename(os.path.normpath(repo_path))
    output_file = os.path.join(output_dir, f"{repo_name}.json")
    cmd = [
        "gitleaks", "detect",
        "--source", repo_path,
        "--report-format", "json",
        "--report-path", output_file
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return repo_name, result.returncode, result.stderr
def distributed_scan(repos, output_base_dir, workers=4):
    """Scan repositories in parallel using a pool of worker processes."""
    os.makedirs(output_base_dir, exist_ok=True)
    # Create the process pool
    with multiprocessing.Pool(workers) as pool:
        results = [pool.apply_async(scan_repo, (repo, output_base_dir)) for repo in repos]
        # Wait for every task to finish
        for result in results:
            repo_name, returncode, stderr = result.get()
            # Exit code 0 means no leaks; by default 1 means leaks were found (or the scan failed)
            if returncode == 0:
                print(f"{repo_name}: no leaks found")
            else:
                print(f"{repo_name}: leaks found or scan failed\n{stderr}")
# Example usage (the guard is required for multiprocessing on platforms that spawn workers)
if __name__ == "__main__":
    repos = [
        "/path/to/repo1",
        "/path/to/repo2",
        "/path/to/repo3"
    ]
    distributed_scan(repos, "./scan_results", workers=4)
Advanced integrations#
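Integrations that open tickets or send alerts are usually fed by repeated scans, so the same finding can arrive many times. One way to avoid duplicate tickets is to deduplicate on the `Fingerprint` field that Gitleaks v8 includes in each finding. The sketch below assumes that field is present and falls back to rule, file, and line otherwise; `dedupe_findings` is a hypothetical helper.

```python
def dedupe_findings(findings, seen=None):
    """Return findings not seen before, tracking identities in `seen`.

    Pass the same `seen` set across scans to suppress repeats between runs.
    """
    seen = set() if seen is None else seen
    unique = []
    for finding in findings:
        # Prefer the report's Fingerprint; otherwise build a key from rule, file, and line
        key = finding.get("Fingerprint") or (
            finding.get("RuleID"),
            finding.get("File"),
            finding.get("StartLine"),
        )
        if key in seen:
            continue
        seen.add(key)
        unique.append(finding)
    return unique


if __name__ == "__main__":
    scan = [
        {"Fingerprint": "abc123:app.py:aws-access-key:3", "RuleID": "aws-access-key"},
        {"Fingerprint": "abc123:app.py:aws-access-key:3", "RuleID": "aws-access-key"},
    ]
    print(len(dedupe_findings(scan)))
```

Persisting the `seen` set between runs (for example as a text file of fingerprints) lets a notifier report only what is new since the previous scan.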
Jira integration#
#!/usr/bin/env python3
import json
from jira import JIRA
def create_jira_issue(jira, leak, project_key):
    """Create a Jira issue for one finding."""
    issue_dict = {
        'project': {'key': project_key},
        'summary': f"Sensitive Information Leak: {leak['RuleID']}",
        # The secret itself is left out so it is not copied into the tracker
        'description': (
            f"File: {leak['File']}\n"
            f"Line: {leak['StartLine']}\n"
            f"Commit: {leak.get('Commit', 'n/a')}\n"
            f"Rule: {leak['RuleID']}\n"
            "Please investigate and remediate this leak immediately."
        ),
        'issuetype': {'name': 'Bug'},
        'priority': {'name': 'High'}
    }
    new_issue = jira.create_issue(fields=issue_dict)
    return new_issue.key
def create_jira_issues(report_file, project_key):
    """Create a Jira issue for every finding in the report."""
    # Replace the server URL and credentials with your own
    jira = JIRA(server='https://your-jira-instance.com',
                basic_auth=('username', 'password'))
    with open(report_file) as f:
        report = json.load(f)
    for leak in report:
        issue_key = create_jira_issue(jira, leak, project_key)
        print(f"Created Jira issue: {issue_key}")
# Example usage
create_jira_issues('gitleaks-report.json', 'SEC')
Slack integration#
#!/usr/bin/env python3
import json
import requests
from datetime import datetime
def send_slack_notification(webhook_url, message):
    """Send a message to a Slack incoming webhook."""
    payload = {
        'text': message,
        'username': 'Gitleaks Bot',
        'icon_emoji': ':shield:'
    }
    response = requests.post(webhook_url, json=payload, timeout=10)
    return response.status_code == 200
def create_slack_message(report):
    """Build the Slack message from a Gitleaks JSON report."""
    total_leaks = len(report)
    # Rules whose ID contains "key" are treated as high severity
    high_severity = len([r for r in report if 'key' in r.get('RuleID', '').lower()])
    message = (
        "🚨 *Sensitive Information Leaks Detected*\n"
        f"*Scan Time:* {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n"
        "*Summary:*\n"
        f"• Total Leaks: {total_leaks}\n"
        f"• High Severity: {high_severity}\n"
        "*Top Leaks:*"
    )
    for leak in report[:5]:
        message += f"\n• {leak['RuleID']} in {leak['File']}:{leak['StartLine']}"
    if total_leaks > 5:
        message += f"\n• ... and {total_leaks - 5} more"
    return message
# Example usage
with open('gitleaks-report.json') as f:
    report = json.load(f)
message = create_slack_message(report)
webhook_url = 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'
send_slack_notification(webhook_url, message)
Case studies#
Case 1: Enterprise codebase security audit#
Scenario#
Audit every repository in an enterprise codebase and identify all leaked secrets.
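An audit across many repositories usually ends with a roll-up: which kinds of secrets appear most often overall. As a minimal sketch of that aggregation step, the function below counts findings per rule across several already-parsed reports. It assumes the `RuleID` key used in Gitleaks v8 JSON reports; `count_by_rule` and the repository names are hypothetical.

```python
from collections import Counter


def count_by_rule(reports):
    """Count findings per rule across repositories.

    `reports` maps a repository name to its list of findings.
    Returns (rule, count) pairs, most frequent first.
    """
    totals = Counter()
    for findings in reports.values():
        totals.update(f.get("RuleID", "unknown") for f in findings)
    return totals.most_common()


if __name__ == "__main__":
    reports = {
        "billing": [{"RuleID": "aws-access-key"}, {"RuleID": "jwt-token"}],
        "frontend": [{"RuleID": "aws-access-key"}],
    }
    for rule, count in count_by_rule(reports):
        print(f"{count:4d}  {rule}")
```

Sorting by frequency points the remediation effort at the rule that accounts for the most findings first.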
Implementation steps#
#!/bin/bash
# Enterprise codebase security audit script
REPOS_DIR="/path/to/repos"
REPORTS_DIR="./audit_reports"
LOG_FILE="audit.log"
echo "=== Enterprise codebase security audit ==="
echo "Repository directory: $REPOS_DIR"
echo "Report directory: $REPORTS_DIR"
# Create the report directory
mkdir -p "$REPORTS_DIR"
# Audit a single repository
audit_repo() {
    local repo_path=$1
    local repo_name leak_count
    repo_name=$(basename "$repo_path")
    echo "Auditing repository: $repo_name"
    # Run the scan
    gitleaks detect --source "$repo_path" \
        --report-format json \
        --report-path "$REPORTS_DIR/${repo_name}.json"
    # A clean scan can still produce a non-empty file (an empty JSON array),
    # so count the entries instead of testing the file size
    leak_count=$(jq length "$REPORTS_DIR/${repo_name}.json" 2>/dev/null || echo 0)
    if [ "$leak_count" -gt 0 ]; then
        echo "  Found $leak_count leaked secrets"
        # Summarize the findings by rule
        jq -r '.[].RuleID' "$REPORTS_DIR/${repo_name}.json" | sort | uniq -c | sort -rn > "$REPORTS_DIR/${repo_name}_summary.txt"
    else
        echo "  No leaked secrets found"
    fi
}
# Audit every repository
for repo in "$REPOS_DIR"/*/; do
    audit_repo "$repo"
done
# Generate the combined report
echo "Generating combined report..."
cat > "$REPORTS_DIR/audit_summary.txt" << EOF
Enterprise Codebase Security Audit Report
=========================================
Audit time: $(date)
Repository statistics:
$(for report in "$REPORTS_DIR"/*.json; do
    repo_name=$(basename "$report" .json)
    leak_count=$(jq length "$report")
    echo "- $repo_name: $leak_count leaks"
done)
See the per-repository report files for details.
EOF
echo ""
echo "Security audit complete!"
echo "Combined report: $REPORTS_DIR/audit_summary.txt"
Case 2: CI/CD security gate#
Scenario#
Set up a security gate in the CI/CD pipeline so that only code free of leaked secrets can be merged.
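The core of such a gate is a small decision: compare the number of findings with a threshold and turn the result into an exit code the pipeline understands. The sketch below shows that decision in isolation, working on an already-parsed report; `evaluate_gate` and `max_leaks` are hypothetical names, and the `RuleID`, `File`, and `StartLine` keys are those of Gitleaks v8 JSON reports.

```python
def evaluate_gate(findings, max_leaks=0):
    """Return (exit_code, messages) for a list of findings.

    exit_code is 1 when the number of findings exceeds max_leaks, otherwise 0.
    """
    count = len(findings)
    if count <= max_leaks:
        return 0, [f"Gate passed: {count} finding(s), threshold {max_leaks}"]
    messages = [f"Gate failed: {count} finding(s) exceed threshold {max_leaks}"]
    for f in findings:
        # List location only; the secret value is deliberately not printed
        messages.append(f"  - {f.get('RuleID', 'unknown')} in {f.get('File', '?')}:{f.get('StartLine', '?')}")
    return 1, messages


if __name__ == "__main__":
    code, lines = evaluate_gate(
        [{"RuleID": "jwt-token", "File": "api/auth.py", "StartLine": 41}]
    )
    print("\n".join(lines))
    print("exit code:", code)
```

Keeping the decision separate from the scan makes the threshold easy to test without running Gitleaks.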
Implementation steps#
#!/bin/bash
# CI/CD security gate script
set -e # Exit immediately on errors
PROJECT_DIR="${1:-.}"
MAX_LEAKS=0
echo "=== CI/CD secret leak check ==="
echo "Project directory: $PROJECT_DIR"
echo "Maximum allowed leaks: $MAX_LEAKS"
# Run the scan; --exit-code 0 stops findings from tripping set -e before the report is analyzed
echo "Running Gitleaks scan..."
gitleaks detect --source "$PROJECT_DIR" \
    --exit-code 0 \
    --report-format json \
    --report-path gitleaks-report.json
# Analyze the result
echo "Analyzing scan results..."
LEAK_COUNT=$(jq length gitleaks-report.json)
echo "Leak statistics:"
echo "  Total: $LEAK_COUNT"
# Check the threshold
if [ "$LEAK_COUNT" -gt "$MAX_LEAKS" ]; then
    echo ""
    echo "❌ Error: number of leaked secrets ($LEAK_COUNT) exceeds the threshold ($MAX_LEAKS)"
    echo ""
    echo "Leak details:"
    jq -r '.[] | "  - \(.RuleID) in \(.File):\(.StartLine)"' gitleaks-report.json
    echo ""
    echo "Fix the leaks and submit again."
    exit 1
else
    echo ""
    echo "✅ Security check passed!"
    exit 0
fi
Summary#
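Gitleaks is a capable secret detection tool that supports code security and DevSecOps workflows from local scans through to CI/CD gates.
Core strengths#
- Thorough scanning: covers secrets anywhere in the Git history
- Performance: scans large codebases quickly
- Flexible configuration: supports custom rules and settings
- Easy integration: fits into CI/CD pipelines with little effort
- Multiple outputs: supports several report formats
Use cases#
- Code security audits
- Secret leak detection
- DevSecOps pipeline integration
- Security compliance checks
- Repository security
Best practices#
- Scan regularly: schedule scans so leaks are found early
- Integrate with CI/CD: make scanning part of the build pipeline
- Customize rules: tailor the rules to the project
- Fix promptly: rotate and remove a leaked secret as soon as it is found
- Monitor continuously: keep scanning as the code changes
Caveats#
- Scans can take considerable time and resources
- The rule set needs regular updates
- Some false positives require manual review
- Scan results contain secrets, so keep the reports confidential
- Gitleaks works best alongside other security tools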
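For the manual review of false positives, one lightweight approach is to record the fingerprints of reviewed findings and filter them out of later reports. Gitleaks offers a similar mechanism through a `.gitleaksignore` file of fingerprints; the sketch below applies the same idea to an already-parsed report. `filter_reviewed` is a hypothetical helper, and the line format (one fingerprint per line, `#` for comments) is an assumption of this example.

```python
def filter_reviewed(findings, reviewed_lines):
    """Drop findings whose Fingerprint appears in the reviewed list.

    `reviewed_lines` is an iterable of text lines: one fingerprint per line,
    with blank lines and lines starting with '#' ignored.
    """
    reviewed = {
        line.strip()
        for line in reviewed_lines
        if line.strip() and not line.lstrip().startswith("#")
    }
    return [f for f in findings if f.get("Fingerprint") not in reviewed]


if __name__ == "__main__":
    findings = [
        {"Fingerprint": "abc123:tests/fixtures.py:jwt-token:7"},
        {"Fingerprint": "def456:app/config.py:aws-access-key:12"},
    ]
    reviewed = [
        "# Test fixture, confirmed not a real token",
        "abc123:tests/fixtures.py:jwt-token:7",
    ]
    for finding in filter_reviewed(findings, reviewed):
        print(finding["Fingerprint"])
```

Keeping a comment next to each reviewed fingerprint records why it was judged a false positive, which helps when the list is audited later.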