云栈社区»论坛 › 技术文档「 Note & Doc 」 › Shell脚本10大实战技巧提升开发效率

发回帖发新帖

2145 积分	0 好友	281 主题

发消息

Shell脚本10大实战技巧提升开发效率

发表于 2026-1-20 10:21:50 | 查看: 80| 回复: 0

Shell脚本10大实战技巧提升开发效率

终端界面显示bash脚本与shellshock漏洞利用命令

技术特点

Shell脚本有几个其他语言难以替代的优势：

无处不在的执行环境
任何Linux/Unix系统都自带Bash或其他Shell，不需要安装任何运行时。这在生产环境中太重要了——特别是那些网络隔离的环境，装个Python都要层层审批。

管道和重定向的原生支持
说真的，处理文本数据时，cat file | grep pattern | awk '{print $2}' | sort | uniq -c 这种写法，用其他语言写可能要十几行代码。

系统级操作的天然契合
文件操作、进程管理、网络配置，这些系统管理任务用Shell写最直观。毕竟Shell本身就是操作系统的用户接口。

快速原型和一次性任务
有时候就是需要快速写个脚本跑一次，花时间写Python再配虚拟环境完全没必要。

但Shell也有明显的短板：

复杂数据结构支持弱（虽然Bash 4.x后有关联数组，但用起来还是别扭）
错误处理机制原始
调试手段有限
代码可读性容易崩塌

今天要讲的技巧，很大程度上就是在弥补这些短板。

适用场景

根据我的经验，以下场景Shell脚本是首选：

场景	推荐度	说明
系统初始化和配置	极高	装机脚本、环境配置等
日志分析和文本处理	高	结合awk/sed/grep威力巨大
定时任务和cron作业	高	简单直接，依赖少
CI/CD流水线脚本	高	Jenkins、GitLab CI等都是Shell为主
批量运维操作	中高	配合ansible或直接SSH批量执行
监控数据采集	中	简单指标采集足够用
复杂业务逻辑	低	超过500行建议换Python/Go
需要并发控制的任务	中低	Shell的并发控制比较原始

有个我自己的经验法则：脚本超过500行，或者需要处理JSON/YAML等结构化数据，就该考虑换语言了

在开始写脚本之前，我一般会做这些准备：

1. 确认执行环境

# 查看目标服务器的Shell版本和类型
cat /etc/shells
echo $SHELL
bash --version

# 查看是否有必要的工具
which awk sed grep xargs

2. 创建项目结构

对于复杂一点的脚本项目，我会这样组织：

my-scripts/
├── bin/                    # 可执行脚本
│   ├── deploy.sh
│   └── backup.sh
├── lib/                    # 函数库
│   ├── common.sh
│   ├── logging.sh
│   └── utils.sh
├── conf/                   # 配置文件
│   ├── default.conf
│   └── production.conf
├── tests/                  # 测试文件
│   ├── test_common.bats
│   └── test_utils.bats
├── docs/                   # 文档
│   └── README.md
└── Makefile                # 构建和测试入口

3. 设置开发环境

我个人的VS Code配置：

{
  "shellcheck.enable": true,
  "shellcheck.run": "onSave",
  "shellformat.flag": "-i 2 -ci",
  "[shellscript]": {
    "editor.defaultFormatter": "foxundermoon.shell-format",
    "editor.formatOnSave": true,
    "editor.tabSize": 2
  }
}

用vim的话，加上这些配置：

" ~/.vimrc
autocmd FileType sh setlocal tabstop=2 shiftwidth=2 expandtab
autocmd BufWritePost *.sh silent !shellcheck -x %

以下是多年总结的Shell编程10个技巧:

set -euo pipefail 严格模式：避免最常见的Shell陷阱，让脚本fail-fast
函数封装和模块化：提高代码复用性和可维护性
参数解析：让脚本专业、易用
日志记录框架：方便调试和问题追踪
错误处理和trap：优雅地处理异常情况
并行执行：大幅提升批量任务效率
配置文件管理：分离代码和配置，提高灵活性
单元测试：保证代码质量，方便重构
脚本模板：标准化开发，减少重复劳动
shellcheck静态检查：自动发现潜在问题

技巧1: set -euo pipefail 严格模式

这是我认为最重要的技巧，没有之一。

默认行为的坑

先看看Shell默认行为有多危险：

#!/bin/bash
# 危险示例：默认行为

# 假设这个变量没设置
cd $WORKDIR        # 如果WORKDIR为空，会cd到$HOME！
rm -rf *           # 然后删除$HOME下所有文件...

# 这个命令失败了
cp /nonexistent/file /tmp/
echo "继续执行..."  # 居然还会执行这行！

# 管道中的错误被吞掉
cat /nonexistent | grep pattern  # 整个命令返回0（grep的返回码）
echo "上面的错误你根本不知道"

我2015年删库那次，就是类似的问题。$LOG_DIR 变量没设置，导致 rm -rf ${LOG_DIR}/* 变成了 rm -rf /*。

严格模式详解

#!/usr/bin/env bash

# 推荐的严格模式
set -euo pipefail
IFS=$'\n\t'

逐个解释：

set -e (errexit)
命令返回非零状态时立即退出。

set -e

echo "开始"
false               # 返回1，脚本立即退出
echo "这行不会执行"

但要注意，有些情况下 -e 不生效：

set -e

# 这些情况命令失败不会退出
if ! command_that_might_fail; then
    echo "命令失败了"
fi

command_that_might_fail || true

command_that_might_fail || handle_error

# 管道中间的命令失败不会触发（需要pipefail）
failing_command | grep something

set -u (nounset)
使用未定义变量时报错退出。

set -u

echo "$UNDEFINED_VAR"  # 报错：UNDEFINED_VAR: unbound variable

处理可选变量的方式：

set -u

# 提供默认值
echo "${OPTIONAL_VAR:-default_value}"

# 检查变量是否设置
if [[ -v OPTIONAL_VAR ]]; then
    echo "变量已设置: $OPTIONAL_VAR"
fi

# 数组的特殊处理
declare -a my_array=()
echo "${my_array[@]:-}"  # 避免空数组报错

set -o pipefail
管道中任何命令失败，整个管道返回失败。

set -o pipefail

# 现在第一个命令的失败会被检测到
cat /nonexistent | grep pattern
echo $?  # 返回1，而不是0

IFS=$'\n\t'
修改字段分隔符，避免空格引起的问题。

# 默认IFS包含空格
IFS=$' \t\n'
files="file 1.txt file2.txt"
for f in $files; do
    echo "-> $f"
done
# 输出4行，因为空格也是分隔符

# 修改IFS
IFS=$'\n\t'
for f in $files; do
    echo "-> $f"
done
# 输出1行

完整的严格模式模板

#!/usr/bin/env bash
#
# 脚本描述
#

# 严格模式
set -euo pipefail
IFS=$'\n\t'

# 调试模式（取消注释启用）
# set -x

# 错误处理
trap 'echo "Error on line $LINENO. Exit code: $?" >&2' ERR

# 退出清理
cleanup() {
    rm -f "${TEMP_FILE:-}"
}
trap cleanup EXIT

# 主逻辑
main() {
    # ...
}

main "$@"

技巧2: 函数封装和模块化

刚开始写Shell时，我喜欢从头写到尾，一个函数都不用。后来维护这些脚本简直是噩梦。

函数的基本规范

#!/usr/bin/env bash

#######################################
# 检查指定端口是否被监听
# Globals:
#   无
# Arguments:
#   $1 - 端口号
#   $2 - 超时时间（秒），可选，默认5
# Outputs:
#   成功时无输出
#   失败时输出错误信息到stderr
# Returns:
#   0 - 端口正在监听
#   1 - 端口未监听或检查超时
#######################################
check_port() {
    local port="${1:?Port number required}"
    local timeout="${2:-5}"

    if ! [[ "$port" =~ ^[0-9]+$ ]] || (( port < 1 || port > 65535 )); then
        echo "Invalid port number: $port" >&2
        return 1
    fi

    if timeout "$timeout" bash -c "cat < /dev/null > /dev/tcp/localhost/$port" 2>/dev/null; then
        return 0
    else
        echo "Port $port is not listening" >&2
        return 1
    fi
}

# 使用示例
if check_port 8080; then
    echo "服务已启动"
fi

几个关键点：

函数注释：用Google Shell Style Guide的格式，说明Globals、Arguments、Outputs、Returns
参数验证：${1:?'error message'} 确保必填参数存在
local变量：函数内变量一定要用 local 声明，避免污染全局
返回值语义：用返回码表示成功/失败，用stdout输出数据，用stderr输出错误

模块化组织

把通用函数抽到独立文件：

# lib/logging.sh
#!/usr/bin/env bash
#
# 日志记录模块
#

# 防止重复source
[[ -n "${_LOGGING_SH_LOADED:-}" ]] && return
_LOGGING_SH_LOADED=1

# 日志级别
declare -gr LOG_LEVEL_DEBUG=0
declare -gr LOG_LEVEL_INFO=1
declare -gr LOG_LEVEL_WARN=2
declare -gr LOG_LEVEL_ERROR=3

# 当前日志级别，默认INFO
declare -g CURRENT_LOG_LEVEL=${CURRENT_LOG_LEVEL:-$LOG_LEVEL_INFO}

# 颜色定义（仅当输出是终端时启用）
if [[ -t 2 ]]; then
    declare -gr COLOR_RED='\033[0;31m'
    declare -gr COLOR_YELLOW='\033[0;33m'
    declare -gr COLOR_GREEN='\033[0;32m'
    declare -gr COLOR_BLUE='\033[0;34m'
    declare -gr COLOR_RESET='\033[0m'
else
    declare -gr COLOR_RED=''
    declare -gr COLOR_YELLOW=''
    declare -gr COLOR_GREEN=''
    declare -gr COLOR_BLUE=''
    declare -gr COLOR_RESET=''
fi

#######################################
# 内部日志函数
# Arguments:
#   $1 - 日志级别数值
#   $2 - 日志级别名称
#   $3 - 颜色代码
#   $4... - 日志消息
#######################################
_log() {
    local level_num="$1"
    local level_name="$2"
    local color="$3"
    shift 3

    (( level_num < CURRENT_LOG_LEVEL )) && return

    local timestamp
    timestamp=$(date '+%Y-%m-%d %H:%M:%S')

    # 获取调用者信息
    local caller_func="${FUNCNAME[2]:-main}"
    local caller_line="${BASH_LINENO[1]:-0}"

    echo -e "${color}[$timestamp] [$level_name] [$caller_func:$caller_line] $*${COLOR_RESET}" >&2
}

log_debug() { _log $LOG_LEVEL_DEBUG "DEBUG" "$COLOR_BLUE" "$@"; }
log_info()  { _log $LOG_LEVEL_INFO  "INFO "  "$COLOR_GREEN" "$@"; }
log_warn()  { _log $LOG_LEVEL_WARN  "WARN "  "$COLOR_YELLOW" "$@"; }
log_error() { _log $LOG_LEVEL_ERROR "ERROR"  "$COLOR_RED" "$@"; }

# 设置日志级别
set_log_level() {
    local level="${1:-info}"
    case "${level,,}" in
        debug) CURRENT_LOG_LEVEL=$LOG_LEVEL_DEBUG ;;
        info)  CURRENT_LOG_LEVEL=$LOG_LEVEL_INFO ;;
        warn)  CURRENT_LOG_LEVEL=$LOG_LEVEL_WARN ;;
        error) CURRENT_LOG_LEVEL=$LOG_LEVEL_ERROR ;;
        *)     log_warn "Unknown log level: $level, using INFO" ;;
    esac
}

使用方式：

#!/usr/bin/env bash
# main.sh

set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "${SCRIPT_DIR}/lib/logging.sh"

set_log_level debug

log_info "脚本开始执行"
log_debug "调试信息"
log_warn "警告信息"
log_error "错误信息"

技巧3: 参数解析 (getopts和getopt)

命令行参数解析是每个脚本都要面对的问题。我见过太多脚本用一堆 if [ "$1" = "-v" ] 来处理参数，维护起来简直要命。

基础方案：getopts

getopts 是Bash内置的，简单可靠，但只支持短选项：

#!/usr/bin/env bash
#
# getopts示例
# 用法: ./script.sh -e env -v version [-d] [-h]
#

set -euo pipefail

# 默认值
ENVIRONMENT="staging"
VERSION=""
DRY_RUN=false
VERBOSE=false

usage() {
    cat << EOF
Usage: $(basename "$0") [OPTIONS]

Options:
    -e ENV      目标环境 (staging|production)，默认staging
    -v VERSION  部署版本号（必填）
    -d          干运行模式，不实际执行
    -V          详细输出模式
    -h          显示帮助信息

Examples:
    $(basename "$0") -e production -v 1.2.3
    $(basename "$0") -v 1.2.3 -d
EOF
}

while getopts ":e:v:dVh" opt; do
    case $opt in
        e)
            ENVIRONMENT="$OPTARG"
            if [[ ! "$ENVIRONMENT" =~ ^(staging|production)$ ]]; then
                echo "Error: Invalid environment: $ENVIRONMENT" >&2
                exit 1
            fi
            ;;
        v)
            VERSION="$OPTARG"
            ;;
        d)
            DRY_RUN=true
            ;;
        V)
            VERBOSE=true
            ;;
        h)
            usage
            exit 0
            ;;
        :)
            echo "Error: Option -$OPTARG requires an argument" >&2
            exit 1
            ;;
        \?)
            echo "Error: Invalid option -$OPTARG" >&2
            usage
            exit 1
            ;;
    esac
done

shift $((OPTIND - 1))

# 验证必填参数
if [[ -z "$VERSION" ]]; then
    echo "Error: Version is required (-v)" >&2
    usage
    exit 1
fi

# 处理剩余的位置参数
EXTRA_ARGS=("$@")

echo "Environment: $ENVIRONMENT"
echo "Version: $VERSION"
echo "Dry run: $DRY_RUN"
echo "Verbose: $VERBOSE"
echo "Extra args: ${EXTRA_ARGS:-none}"

getopts的注意点：

选项字符串开头的 : 表示静默模式，自己处理错误
选项后的 : 表示该选项需要参数
$OPTARG 是当前选项的参数值
$OPTIND 是下一个要处理的参数索引

进阶方案：getopt (GNU)

要支持长选项（--environment这种），需要用GNU getopt：

#!/usr/bin/env bash
#
# GNU getopt示例
# 支持短选项和长选项
#

set -euo pipefail

# 默认值
ENVIRONMENT="staging"
VERSION=""
DRY_RUN=false
VERBOSE=false
CONFIG_FILE=""

usage() {
    cat << EOF
Usage: $(basename "$0") [OPTIONS] [TARGETS...]

Options:
    -e, --environment ENV   目标环境 (staging|production)
    -v, --version VERSION   部署版本号（必填）
    -c, --config FILE       配置文件路径
    -d, --dry-run          干运行模式
    -V, --verbose          详细输出模式
    -h, --help             显示帮助信息

Examples:
    $(basename "$0") --environment production --version 1.2.3
    $(basename "$0") -e staging -v 1.2.3 -d
    $(basename "$0") -v 1.2.3 server1 server2
EOF
}

# 检查getopt版本（macOS默认的是BSD版本，不支持长选项）
check_getopt() {
    if ! getopt --test > /dev/null 2>&1; then
        if [[ $? -ne 4 ]]; then
            echo "Error: GNU getopt required. On macOS, install with: brew install gnu-getopt" >&2
            exit 1
        fi
    fi
}

parse_arguments() {
    check_getopt

    local short_opts="e:v:c:dVh"
    local long_opts="environment:,version:,config:,dry-run,verbose,help"

    # getopt会重新排列参数，把选项放前面，非选项放后面
    local parsed
    if ! parsed=$(getopt --options "$short_opts" \
                          --longoptions "$long_opts" \
                          --name "$(basename "$0")" \
                          -- "$@"); then
        usage
        exit 1
    fi

    eval set -- "$parsed"

    while true; do
        case "$1" in
            -e|--environment)
                ENVIRONMENT="$2"
                shift 2
                ;;
            -v|--version)
                VERSION="$2"
                shift 2
                ;;
            -c|--config)
                CONFIG_FILE="$2"
                shift 2
                ;;
            -d|--dry-run)
                DRY_RUN=true
                shift
                ;;
            -V|--verbose)
                VERBOSE=true
                shift
                ;;
            -h|--help)
                usage
                exit 0
                ;;
            --)
                shift
                break
                ;;
            *)
                echo "Error: Unexpected option: $1" >&2
                exit 1
                ;;
        esac
    done

    # 剩余的是位置参数
    TARGETS=("$@")
}

validate_arguments() {
    # 验证必填参数
    if [[ -z "$VERSION" ]]; then
        echo "Error: --version is required" >&2
        exit 1
    fi

    # 验证环境值
    if [[ ! "$ENVIRONMENT" =~ ^(staging|production)$ ]]; then
        echo "Error: Invalid environment: $ENVIRONMENT" >&2
        exit 1
    fi

    # 验证配置文件存在
    if [[ -n "$CONFIG_FILE" ]] && [[ ! -f "$CONFIG_FILE" ]]; then
        echo "Error: Config file not found: $CONFIG_FILE" >&2
        exit 1
    fi
}

main() {
    parse_arguments "$@"
    validate_arguments

    echo "=== Configuration ==="
    echo "Environment: $ENVIRONMENT"
    echo "Version: $VERSION"
    echo "Config file: ${CONFIG_FILE:-<not specified>}"
    echo "Dry run: $DRY_RUN"
    echo "Verbose: $VERBOSE"
    echo "Targets: ${TARGETS:-<none>}"
}

main "$@"

我常用的参数解析模板

实际项目中，我会结合两者的优点：

#!/usr/bin/env bash
#
# 参数解析最佳实践模板
#

set -euo pipefail

# 版本信息
readonly VERSION="1.0.0"
readonly SCRIPT_NAME="$(basename "$0")"

# 配置变量
declare -A CONFIG=(
    [environment]="staging"
    [version]=""
    [config_file]=""
    [dry_run]="false"
    [verbose]="false"
    [log_level]="info"
)

# 位置参数
declare -a TARGETS=()

die() {
    echo "Error: $*" >&2
    exit 1
}

usage() {
    cat << EOF
$SCRIPT_NAME v$VERSION - 自动化部署工具

Usage:
$SCRIPT_NAME [OPTIONS] [TARGETS...]

Options:
    -e, --env ENV         目标环境 (staging|production) [默认: staging]
    -v, --version VER     部署版本号（必填）
    -c, --config FILE     配置文件路径
    -n, --dry-run         干运行模式，只打印将执行的操作
    --verbose             详细输出
    --log-level LEVEL     日志级别 (debug|info|warn|error) [默认: info]
    -h, --help            显示帮助信息
    --version             显示版本信息

Environment Variables:
    DEPLOY_ENV           等同于 --env
    DEPLOY_CONFIG        等同于 --config

Examples:
# 部署到staging环境
$SCRIPT_NAME -v 1.2.3

# 部署到production，只处理指定服务器
$SCRIPT_NAME -e production -v 1.2.3 server1 server2

# 干运行模式
$SCRIPT_NAME -v 1.2.3 --dry-run --verbose
EOF
}

parse_args() {
    while [[ $# -gt 0 ]]; do
        case "$1" in
            -e|--env)
                [[ $# -lt 2 ]] && die "Option $1 requires an argument"
                CONFIG[environment]="$2"
                shift 2
                ;;
            -v|--version)
                [[ $# -lt 2 ]] && die "Option $1 requires an argument"
                CONFIG[version]="$2"
                shift 2
                ;;
            -c|--config)
                [[ $# -lt 2 ]] && die "Option $1 requires an argument"
                CONFIG[config_file]="$2"
                shift 2
                ;;
            -n|--dry-run)
                CONFIG[dry_run]="true"
                shift
                ;;
            --verbose)
                CONFIG[verbose]="true"
                shift
                ;;
            --log-level)
                [[ $# -lt 2 ]] && die "Option $1 requires an argument"
                CONFIG[log_level]="$2"
                shift 2
                ;;
            -h|--help)
                usage
                exit 0
                ;;
            --version)
                echo "$SCRIPT_NAME v$VERSION"
                exit 0
                ;;
            --)
                shift
                TARGETS+=("$@")
                break
                ;;
            -*)
                die "Unknown option: $1"
                ;;
            *)
                TARGETS+=("$1")
                shift
                ;;
        esac
    done

    # 从环境变量读取默认值
    : "${CONFIG[environment]:=${DEPLOY_ENV:-staging}}"
    : "${CONFIG[config_file]:=${DEPLOY_CONFIG:-}}"
}

validate_args() {
    # 必填参数检查
    [[ -z "${CONFIG[version]}" ]] && die "Version is required. Use -v or --version"

    # 环境值验证
    [[ ! "${CONFIG[environment]}" =~ ^(staging|production)$ ]] && \
        die "Invalid environment: ${CONFIG[environment]}. Must be 'staging' or 'production'"

    # 版本格式验证（语义化版本）
    [[ ! "${CONFIG[version]}" =~ ^[0-9]+\.[0-9]+\.[0-9]+(-[a-zA-Z0-9.]+)?$ ]] && \
        die "Invalid version format: ${CONFIG[version]}. Expected semantic versioning (e.g., 1.2.3)"

    # 配置文件验证
    if [[ -n "${CONFIG[config_file]}" ]]; then
        [[ ! -f "${CONFIG[config_file]}" ]] && \
            die "Config file not found: ${CONFIG[config_file]}"
    fi

    # 日志级别验证
    [[ ! "${CONFIG[log_level]}" =~ ^(debug|info|warn|error)$ ]] && \
        die "Invalid log level: ${CONFIG[log_level]}"
}

print_config() {
    echo "=== Configuration ==="
    for key in "${!CONFIG[@]}"; do
        printf "  %-15s: %s\n" "$key" "${CONFIG[$key]}"
    done
    echo "  targets        : ${TARGETS:-<all>}"
}

main() {
    parse_args "$@"
    validate_args

    if [[ "${CONFIG[verbose]}" == "true" ]]; then
        print_config
    fi

    # 主逻辑...
}

main "$@"

技巧4: 日志记录框架

前面在函数模块化部分展示过一个简单的日志模块。这里给出一个更完整的版本，支持文件输出和日志轮转：

#!/usr/bin/env bash
#
# lib/logging.sh - 生产级日志框架
#

[[ -n "${_LOGGING_SH:-}" ]] && return
readonly _LOGGING_SH=1

# 日志配置
declare -g LOG_FILE="${LOG_FILE:-}"
declare -g LOG_MAX_SIZE="${LOG_MAX_SIZE:-10485760}"  # 10MB
declare -g LOG_BACKUP_COUNT="${LOG_BACKUP_COUNT:-5}"
declare -gi LOG_LEVEL=1  # 0=DEBUG, 1=INFO, 2=WARN, 3=ERROR

# 颜色（仅终端）
if [[ -t 2 ]]; then
    readonly _C_RED=$'\033[31m'
    readonly _C_YELLOW=$'\033[33m'
    readonly _C_GREEN=$'\033[32m'
    readonly _C_BLUE=$'\033[34m'
    readonly _C_GRAY=$'\033[90m'
    readonly _C_RESET=$'\033[0m'
    readonly _C_BOLD=$'\033[1m'
else
    readonly _C_RED='' _C_YELLOW='' _C_GREEN='' _C_BLUE=''
    readonly _C_GRAY='' _C_RESET='' _C_BOLD=''
fi

# 日志级别设置
log_set_level() {
    case "${1,,}" in
        debug|0) LOG_LEVEL=0 ;;
        info|1)  LOG_LEVEL=1 ;;
        warn|2)  LOG_LEVEL=2 ;;
        error|3) LOG_LEVEL=3 ;;
        *) echo "Unknown log level: $1" >&2 ;;
    esac
}

# 设置日志文件
log_set_file() {
    local file="$1"
    local dir
    dir="$(dirname "$file")"

    if [[ ! -d "$dir" ]]; then
        mkdir -p "$dir" || {
            echo "Failed to create log directory: $dir" >&2
            return 1
        }
    fi

    LOG_FILE="$file"
}

# 日志轮转
_log_rotate() {
    [[ -z "$LOG_FILE" ]] && return
    [[ ! -f "$LOG_FILE" ]] && return

    local size
    size=$(stat -f%z "$LOG_FILE" 2>/dev/null || stat -c%s "$LOG_FILE" 2>/dev/null || echo 0)

    if (( size > LOG_MAX_SIZE )); then
        # 轮转旧文件
        for ((i = LOG_BACKUP_COUNT - 1; i >= 1; i--)); do
            [[ -f "${LOG_FILE}.$i" ]] && mv "${LOG_FILE}.$i" "${LOG_FILE}.$((i + 1))"
        done
        mv "$LOG_FILE" "${LOG_FILE}.1"
        touch "$LOG_FILE"
    fi
}

# 核心日志函数
_log() {
    local level=$1 level_name=$2 color=$3
    shift 3

    (( level < LOG_LEVEL )) && return 0

    local timestamp caller_info
    timestamp=$(date '+%Y-%m-%d %H:%M:%S.%3N' 2>/dev/null || date '+%Y-%m-%d %H:%M:%S')

    # 获取调用位置（跳过内部函数）
    local frame=1
    while [[ "${FUNCNAME[$frame]:-}" == _log* ]] || [[ "${FUNCNAME[$frame]:-}" == log_* ]]; do
        ((frame++))
    done
    local func="${FUNCNAME[$frame]:-main}"
    local line="${BASH_LINENO[$((frame - 1))]:-0}"
    caller_info="${func}:${line}"

    local message="$*"
    local log_line="[$timestamp] [$level_name] [$caller_info] $message"

    # 输出到终端（带颜色）
    echo -e "${color}${log_line}${_C_RESET}" >&2

    # 输出到文件（无颜色）
    if [[ -n "$LOG_FILE" ]]; then
        echo "$log_line" >> "$LOG_FILE"
        _log_rotate
    fi
}

# 公共日志函数
log_debug() { _log 0 "DEBUG" "$_C_GRAY" "$@"; }
log_info()  { _log 1 "INFO "  "$_C_GREEN" "$@"; }
log_warn()  { _log 2 "WARN "  "$_C_YELLOW" "$@"; }
log_error() { _log 3 "ERROR"  "$_C_RED" "$@"; }

# 结构化日志（JSON格式）
log_json() {
    local level="${1:-info}"
    local message="${2:-}"
    shift 2

    local timestamp
    timestamp=$(date -u '+%Y-%m-%dT%H:%M:%SZ')

    local extra=""
    while [[ $# -gt 0 ]]; do
        local key="${1%%=*}"
        local value="${1#*=}"
        extra+=", \"$key\": \"$value\""
        shift
    done

    echo "{\"timestamp\": \"${timestamp}\", \"level\": \"${level}\", \"message\": \"${message}\"${extra}}"
}

# 执行命令并记录
log_exec() {
    local cmd="$*"
    log_debug "Executing: $cmd"

    local output exit_code
    output="$("$@" 2>&1)" && exit_code=$? || exit_code=$?

    if (( exit_code == 0 )); then
        log_debug "Command succeeded"
        [[ -n "$output" ]] && log_debug "Output: $output"
    else
        log_error "Command failed with exit code $exit_code"
        [[ -n "$output" ]] && log_error "Output: $output"
    fi

    return $exit_code
}

# 计时日志
log_timer_start() {
    local name="${1:-default}"
    declare -g "_timer_${name}=$(date +%s%N)"
}

log_timer_end() {
    local name="${1:-default}"
    local start_var="_timer_${name}"
    local start="${!start_var:-}"

    if [[ -z "$start" ]]; then
        log_warn "Timer '$name' was not started"
        return
    fi

    local end duration_ms
    end=$(date +%s%N)
    duration_ms=$(( (end - start) / 1000000 ))

    log_info "Timer '$name': ${duration_ms}ms"
    unset "$start_var"
}

使用示例：

#!/usr/bin/env bash

source lib/logging.sh

# 配置
log_set_level debug
log_set_file "/var/log/myapp/deploy.log"

# 使用
log_info "部署开始"

log_timer_start "download"
# ... 下载操作
log_timer_end "download"

log_exec curl -sS "https://api.example.com/health"

# JSON格式日志（适合ELK等日志系统）
log_json info "User login" user_id=12345 ip="192.168.1.1"

技巧5: 错误处理和trap信号捕获

这是把Shell脚本从玩具变成生产工具的关键。

trap基础

trap 可以捕获信号并执行指定命令：

#!/usr/bin/env bash

# 基本语法
trap 'command' SIGNAL

# 常用信号
# EXIT  - 脚本退出时（正常或异常）
# ERR   - 命令返回非零时（需要set -e）
# INT   - Ctrl+C
# TERM  - kill命令
# HUP   - 终端关闭
# DEBUG - 每个命令执行前

完整的错误处理框架

#!/usr/bin/env bash
#
# 错误处理最佳实践
#

set -euo pipefail

# 临时文件跟踪
declare -a TEMP_FILES=()
declare -a TEMP_DIRS=()

# 创建临时文件（自动追踪）
make_temp_file() {
    local template="${1:-tmp.XXXXXX}"
    local tmpfile
    tmpfile=$(mktemp -t "$template")
    TEMP_FILES+=("$tmpfile")
    echo "$tmpfile"
}

# 创建临时目录（自动追踪）
make_temp_dir() {
    local template="${1:-tmpdir.XXXXXX}"
    local tmpdir
    tmpdir=$(mktemp -d -t "$template")
    TEMP_DIRS+=("$tmpdir")
    echo "$tmpdir"
}

# 清理函数
cleanup() {
    local exit_code=$?

    # 清理临时文件
    for f in "${TEMP_FILES[@]:-}"; do
        [[ -f "$f" ]] && rm -f "$f"
    done

    # 清理临时目录
    for d in "${TEMP_DIRS[@]:-}"; do
        [[ -d "$d" ]] && rm -rf "$d"
    done

    # 恢复终端设置（如果需要）
    # stty sane 2>/dev/null || true

    # 杀死后台进程
    jobs -p | xargs -r kill 2>/dev/null || true

    exit $exit_code
}

# 错误处理函数
error_handler() {
    local exit_code=$?
    local line_no=$1
    local bash_lineno=$2
    local last_command=$3
    local func_trace=$4

    echo ""
    echo "========== ERROR REPORT ==========" >&2
    echo "Exit code: $exit_code" >&2
    echo "Failed command: $last_command" >&2
    echo "Line number: $line_no" >&2
    echo "Function trace: $func_trace" >&2
    echo "" >&2

    # 打印调用栈
    echo "Call stack:" >&2
    local frame=0
    while caller $frame; do
        ((frame++))
    done 2>/dev/null | while read -r line sub file; do
        echo "  $file:$line in $sub()" >&2
    done
    echo "==================================" >&2
}

# 注册trap
trap cleanup EXIT
trap 'error_handler ${LINENO} "${BASH_LINENO}" "$BASH_COMMAND" "${FUNCNAME
:-main}"' ERR

# 优雅处理Ctrl+C
trap 'echo ""; echo "Interrupted by user"; exit 130' INT

# 处理TERM信号
trap 'echo "Received TERM signal, cleaning up..."; exit 143' TERM

# 示例：安全的文件操作
process_file() {
    local input_file="$1"

    # 使用临时文件进行原子操作
    local tmp_output
    tmp_output=$(make_temp_file "output.XXXXXX")

    # 处理文件
    # 如果失败，临时文件会被自动清理
    some_processing < "$input_file" > "$tmp_output"

    # 原子替换
    mv "$tmp_output" "${input_file}.processed"
}

# 重试机制
retry() {
    local max_attempts="${1:-3}"
    local delay="${2:-5}"
    shift 2
    local cmd=("$@")

    local attempt=1
    while true; do
        echo "Attempt $attempt/$max_attempts: ${cmd}" >&2

        if "${cmd[@]}"; then
            return 0
        fi

        if (( attempt >= max_attempts )); then
            echo "All $max_attempts attempts failed" >&2
            return 1
        fi

        echo "Failed, retrying in ${delay}s..." >&2
        sleep "$delay"
        ((attempt++))
    done
}

# 带超时的命令执行
run_with_timeout() {
    local timeout_secs="$1"
    shift

    # 使用timeout命令（GNU coreutils）
    if command -v timeout >/dev/null; then
        timeout --signal=TERM --kill-after=10 "$timeout_secs" "$@"
    else
        # macOS上使用perl实现
        perl -e '
            alarm shift @ARGV;
            exec @ARGV;
        ' "$timeout_secs" "$@"
    fi
}

# 使用示例
main() {
    echo "Starting process..."

    # 创建临时工作目录
    local work_dir
    work_dir=$(make_temp_dir "work.XXXXXX")
    echo "Working in: $work_dir"

    # 使用重试机制
    retry 3 5 curl -sS --fail "https://example.com/api/data"

    # 带超时执行
    run_with_timeout 30 long_running_command

    echo "Process completed"
}

main "$@"

常见错误处理模式

# 模式1: 检查命令是否存在
require_command() {
    local cmd="$1"
    if ! command -v "$cmd" >/dev/null; then
        echo "Error: Required command '$cmd' not found" >&2
        exit 1
    fi
}

require_command curl
require_command jq
require_command aws

# 模式2: 检查文件/目录
require_file() {
    local file="$1"
    if [[ ! -f "$file" ]]; then
        echo "Error: File not found: $file" >&2
        exit 1
    fi
    if [[ ! -r "$file" ]]; then
        echo "Error: File not readable: $file" >&2
        exit 1
    fi
}

# 模式3: 检查root权限
require_root() {
    if [[ $EUID -ne 0 ]]; then
        echo "Error: This script must be run as root" >&2
        exit 1
    fi
}

# 模式4: 互斥锁（防止重复执行）
acquire_lock() {
    local lock_file="${1:-/var/lock/$(basename "$0").lock}"

    exec 200>"$lock_file"

    if ! flock -n 200; then
        echo "Error: Another instance is running" >&2
        exit 1
    fi

    # 写入PID
    echo $$ >&200
}

# 模式5: 确认操作
confirm() {
    local prompt="${1:-Are you sure?}"
    local default="${2:-n}"

    local yn_prompt
    case "$default" in
        y|Y) yn_prompt="[Y/n]" ;;
        *)   yn_prompt="[y/N]" ;;
    esac

    read -rp "$prompt$yn_prompt " response
    response="${response:-$default}"

    [[ "$response" =~ ^[Yy]$ ]]
}

# 使用
if confirm "Delete all files in /tmp?"; then
    rm -rf /tmp/*
fi

技巧6: 并行执行

单线程处理大量任务效率太低。Shell有几种并行执行的方式。

方式1: 后台进程 + wait

#!/usr/bin/env bash
#
# 基础并行执行
#

set -euo pipefail

# 要处理的服务器列表
servers=(
    "server1.example.com"
    "server2.example.com"
    "server3.example.com"
    "server4.example.com"
)

# 记录每个后台任务的PID
declare -A pids

# 检查服务器健康状态
check_server() {
    local server="$1"
    echo "Checking $server..."

    if ssh -o ConnectTimeout=5 "$server" 'systemctl is-active nginx' >/dev/null; then
        echo "[OK] $server: nginx is running"
        return 0
    else
        echo "[FAIL] $server: nginx is not running"
        return 1
    fi
}

# 并行执行
for server in "${servers[@]}"; do
    check_server "$server" &
    pids["$server"]=$!
done

# 等待所有任务完成并收集结果
failed=0
for server in "${servers[@]}"; do
    if ! wait "${pids[$server]}"; then
        ((failed++))
    fi
done

echo ""
echo "Summary: $((${#servers[@]} - failed))/${#servers[@]} servers healthy"
exit $((failed > 0 ? 1 : 0))

方式2: xargs并行

#!/usr/bin/env bash
#
# xargs并行处理
#

set -euo pipefail

# 处理函数（需要export才能在xargs子进程中使用）
process_file() {
    local file="$1"
    echo "Processing: $file"
    gzip -9 "$file"
    echo "Done: $file"
}
export -f process_file

# 并行压缩文件（4个并发）
find /var/log -name "*.log" -mtime +7 -print0 | \
    xargs -0 -P 4 -I {} bash -c 'process_file "$@"' _ {}

# 更简单的写法（如果不需要复杂处理）
find /data -name "*.csv" -print0 | \
    xargs -0 -P 8 -n 1 gzip -9

# 处理命令输出
cat servers.txt | xargs -P 10 -I {} ssh {} 'uptime'

xargs参数说明：

-P N: 最大并行数
-n N: 每次处理N个参数
-I {}: 指定替换字符串
-0: 以null字符分隔输入

方式3: GNU Parallel（推荐）

parallel 是更强大的并行工具：

#!/usr/bin/env bash
#
# GNU Parallel 高级用法
#

set -euo pipefail

# 基础用法
cat servers.txt | parallel -j 10 ssh {} 'uptime'

# 从文件读取参数
parallel -a servers.txt ssh {} 'df -h'

# 多参数
parallel echo {1} {2} ::: A B C ::: 1 2 3
# 输出: A 1, A 2, A 3, B 1, B 2, B 3, C 1, C 2, C 3

# 显示进度
parallel --progress -j 4 gzip ::: *.log

# 带重试和超时
parallel --retries 3 --timeout 60 curl -sS {} ::: "${urls[@]}"

# 远程执行
parallel -S server1,server2,server3 'hostname; uptime'

# 保持输出顺序（默认是谁先完成谁先输出）
parallel --keep-order echo ::: {1..10}

# 输出到单独文件
parallel --results output_dir/ command ::: arg1 arg2 arg3

# 限制CPU使用
parallel --load 80% --noswap heavy_task ::: "${inputs[@]}"

# 显示执行时间
parallel --joblog job.log command ::: "${args[@]}"

实战：并行部署脚本

#!/usr/bin/env bash
#
# 并行部署到多台服务器
#

set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "${SCRIPT_DIR}/lib/logging.sh"

# 配置
MAX_PARALLEL="${MAX_PARALLEL:-5}"
DEPLOY_TIMEOUT="${DEPLOY_TIMEOUT:-300}"
SERVERS_FILE="${SERVERS_FILE:-servers.txt}"
DEPLOY_PACKAGE="${1:?Deploy package required}"

# 单台服务器部署函数
deploy_to_server() {
    local server="$1"
    local package="$2"
    local log_file="/tmp/deploy_${server//[^a-zA-Z0-9]/_}.log"

    {
        echo "=== Deploy to $server started at $(date) ==="

        # 1. 上传包
        echo "Uploading package..."
        if ! scp -o ConnectTimeout=10 "$package" "$server:/tmp/"; then
            echo "Failed to upload package"
            return 1
        fi

        # 2. 执行部署脚本
        echo "Running deploy script..."
        if ! ssh -o ConnectTimeout=10 "$server" "
            set -e
            cd /opt/app
            tar -xzf /tmp/$(basename "$package")
            ./scripts/restart.sh
            rm /tmp/$(basename "$package")
        "; then
            echo "Deploy script failed"
            return 1
        fi

        # 3. 健康检查
        echo "Health check..."
        sleep 5
        if ! ssh "$server" 'curl -sf http://localhost:8080/health'; then
            echo "Health check failed"
            return 1
        fi

        echo "=== Deploy to $server completed at $(date) ==="

    } > "$log_file" 2>&1

    local exit_code=$?
    if (( exit_code == 0 )); then
        echo "[SUCCESS] $server"
    else
        echo "[FAILED] $server - check $log_file"
    fi
    return $exit_code
}
export -f deploy_to_server

main() {
    if [[ ! -f "$SERVERS_FILE" ]]; then
        log_error "Servers file not found: $SERVERS_FILE"
        exit 1
    fi

    if [[ ! -f "$DEPLOY_PACKAGE" ]]; then
        log_error "Package not found: $DEPLOY_PACKAGE"
        exit 1
    fi

    local server_count
    server_count=$(wc -l < "$SERVERS_FILE")
    log_info "Starting parallel deploy to $server_count servers (max $MAX_PARALLEL concurrent)"

    # 使用parallel执行部署
    local failed=0
    if ! parallel \
        --timeout "$DEPLOY_TIMEOUT" \
        --jobs "$MAX_PARALLEL" \
        --joblog /tmp/deploy_joblog.txt \
        --results /tmp/deploy_results/ \
        deploy_to_server {} "$DEPLOY_PACKAGE" \
        < "$SERVERS_FILE"; then
        failed=1
    fi

    # 汇总结果
    echo ""
    log_info "=== Deploy Summary ==="

    local success_count fail_count
    success_count=$(grep -c "SUCCESS" /tmp/deploy_results/*/stdout 2>/dev/null || echo 0)
    fail_count=$(grep -c "FAILED" /tmp/deploy_results/*/stdout 2>/dev/null || echo 0)

    log_info "Success: $success_count"
    log_info "Failed: $fail_count"

    if (( fail_count > 0 )); then
        log_warn "Failed servers:"
        grep -l "FAILED" /tmp/deploy_results/*/stdout | while read -r f; do
            cat "$f"
        done
    fi

    return $failed
}

main "$@"

技巧7: 配置文件管理

硬编码配置是新手常犯的错误。我见过太多脚本写死了IP地址、密码，换个环境就不能用。

方式1: Shell格式配置文件

最简单直接的方式：

# conf/default.conf
# 默认配置

# 应用配置
APP_NAME="myapp"
APP_PORT=8080
APP_ENV="development"

# 数据库配置
DB_HOST="localhost"
DB_PORT=3306
DB_NAME="myapp"
DB_USER="app"
# DB_PASS应该通过环境变量传入，不要写在配置文件里

# 日志配置
LOG_LEVEL="info"
LOG_DIR="/var/log/myapp"

# 路径配置
DEPLOY_DIR="/opt/myapp"
BACKUP_DIR="/opt/backups"

#!/usr/bin/env bash
#
# 加载配置文件
#

set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
CONFIG_DIR="${SCRIPT_DIR}/../conf"

# 加载配置函数
load_config() {
    local config_file="$1"

    if [[ ! -f "$config_file" ]]; then
        echo "Config file not found: $config_file" >&2
        return 1
    fi

    # 安全检查：确保配置文件只包含变量赋值
    if grep -qvE '^\s*(#.*)?$|^[a-zA-Z_][a-zA-Z0-9_]*=' "$config_file"; then
        echo "Config file contains invalid content: $config_file" >&2
        return 1
    fi

    # shellcheck source=/dev/null
    source "$config_file"
}

# 加载默认配置
load_config "${CONFIG_DIR}/default.conf"

# 根据环境覆盖配置
if [[ -n "${APP_ENV:-}" ]] && [[ -f "${CONFIG_DIR}/${APP_ENV}.conf" ]]; then
    load_config "${CONFIG_DIR}/${APP_ENV}.conf"
fi

# 环境变量优先级最高
APP_NAME="${APP_NAME_OVERRIDE:-$APP_NAME}"

方式2: INI格式配置

# conf/app.ini
[database]
host = localhost
port = 3306
name = myapp
user = app

[server]
host = 0.0.0.0
port = 8080
workers = 4

[logging]
level = info
file = /var/log/myapp/app.log

解析脚本：

#!/usr/bin/env bash
#
# INI配置文件解析器
#

set -euo pipefail

# 解析INI文件到关联数组
# 用法: parse_ini config.ini
# 结果存储在 CONFIG 关联数组中，key格式为 "section.key"
declare -gA CONFIG

parse_ini() {
    local file="$1"
    local section=""
    local line key value

    while IFS= read -r line || [[ -n "$line" ]]; do
        # 去除前后空白
        line="${line#"${line%%[![:space:]]*}"}"
        line="${line%"${line##*[![:space:]]}"}"

        # 跳过空行和注释
        [[ -z "$line" ]] && continue
        [[ "$line" =~ ^[#\;] ]] && continue

        # 解析section
        if [[ "$line" =~ ^\[([^\]]+)\]$ ]]; then
            section="${BASH_REMATCH[1]}"
            continue
        fi

        # 解析key=value
        if [[ "$line" =~ ^([^=]+)=(.*)$ ]]; then
            key="${BASH_REMATCH[1]}"
            value="${BASH_REMATCH[2]}"

            # 去除key/value的空白
            key="${key%"${key##*[![:space:]]}"}"
            value="${value#"${value%%[![:space:]]*}"}"

            # 去除值的引号
            if [[ "$value" =~ ^\"(.*)\"$ ]] || [[ "$value" =~ ^\'(.*)\'$ ]]; then
                value="${BASH_REMATCH[1]}"
            fi

            if [[ -n "$section" ]]; then
                CONFIG["${section}.${key}"]="$value"
            else
                CONFIG["$key"]="$value"
            fi
        fi
    done < "$file"
}

# 获取配置值（带默认值）
config_get() {
    local key="$1"
    local default="${2:-}"
    echo "${CONFIG[$key]:-$default}"
}

# 使用示例
parse_ini "conf/app.ini"

db_host=$(config_get "database.host" "localhost")
db_port=$(config_get "database.port" "3306")
log_level=$(config_get "logging.level" "info")

echo "Database: $db_host:$db_port"
echo "Log level: $log_level"

方式3: YAML/JSON配置

对于复杂配置，用YAML或JSON更合适，但需要借助外部工具：

#!/usr/bin/env bash
#
# YAML/JSON配置解析
#

set -euo pipefail

# 检查yq是否安装
require_yq() {
    if ! command -v yq >/dev/null; then
        echo "Error: yq is required for YAML parsing" >&2
        echo "Install: brew install yq  OR  pip install yq" >&2
        exit 1
    fi
}

# 读取YAML配置
yaml_get() {
    local file="$1"
    local path="$2"
    local default="${3:-}"

    local value
    value=$(yq eval "$path // \"$default\"" "$file" 2>/dev/null)

    # yq返回null时使用默认值
    if [[ "$value" == "null" ]]; then
        echo "$default"
    else
        echo "$value"
    fi
}

# 检查jq是否安装
require_jq() {
    if ! command -v jq >/dev/null; then
        echo "Error: jq is required for JSON parsing" >&2
        exit 1
    fi
}

# 读取JSON配置
json_get() {
    local file="$1"
    local path="$2"
    local default="${3:-}"

    local value
    value=$(jq -r "$path // \"$default\"" "$file" 2>/dev/null)

    if [[ "$value" == "null" ]]; then
        echo "$default"
    else
        echo "$value"
    fi
}

# 使用示例
# YAML
require_yq
db_host=$(yaml_get "config.yaml" ".database.host" "localhost")
servers=$(yaml_get "config.yaml" ".servers[]" | tr '\n' ' ')

# JSON
require_jq
api_key=$(json_get "config.json" ".api.key" "")
endpoints=$(json_get "config.json" ".endpoints | keys[]" | tr '\n' ' ')

配置验证

#!/usr/bin/env bash
#
# 配置验证框架
#

set -euo pipefail

# 验证必填配置
validate_required() {
    local var_name="$1"
    local var_value="${!var_name:-}"

    if [[ -z "$var_value" ]]; then
        echo "Error: Required config '$var_name' is not set" >&2
        return 1
    fi
}

# 验证数字
validate_number() {
    local var_name="$1"
    local var_value="${!var_name:-}"
    local min="${2:-}"
    local max="${3:-}"

    if [[ ! "$var_value" =~ ^[0-9]+$ ]]; then
        echo "Error: Config '$var_name' must be a number, got: $var_value" >&2
        return 1
    fi

    if [[ -n "$min" ]] && (( var_value < min )); then
        echo "Error: Config '$var_name' must be >= $min, got: $var_value" >&2
        return 1
    fi

    if [[ -n "$max" ]] && (( var_value > max )); then
        echo "Error: Config '$var_name' must be <= $max, got: $var_value" >&2
        return 1
    fi
}

# 验证枚举值
validate_enum() {
    local var_name="$1"
    local var_value="${!var_name:-}"
    shift
    local valid_values=("$@")

    local valid
    for valid in "${valid_values[@]}"; do
        if [[ "$var_value" == "$valid" ]]; then
            return 0
        fi
    done

    echo "Error: Config '$var_name' must be one of: ${valid_values}, got: $var_value" >&2
    return 1
}

# 验证路径存在
validate_path() {
    local var_name="$1"
    local var_value="${!var_name:-}"
    local type="${2:-file}"  # file, dir, or any

    case "$type" in
        file)
            if [[ ! -f "$var_value" ]]; then
                echo "Error: Config '$var_name' file not found: $var_value" >&2
                return 1
            fi
            ;;
        dir)
            if [[ ! -d "$var_value" ]]; then
                echo "Error: Config '$var_name' directory not found: $var_value" >&2
                return 1
            fi
            ;;
        any)
            if [[ ! -e "$var_value" ]]; then
                echo "Error: Config '$var_name' path not found: $var_value" >&2
                return 1
            fi
            ;;
    esac
}

# 使用示例
validate_config() {
    local errors=0

    validate_required "DB_HOST"    || ((errors++))
    validate_required "DB_NAME"    || ((errors++))
    validate_number "DB_PORT" 1 65535 || ((errors++))
    validate_enum "LOG_LEVEL" debug info warn error || ((errors++))
    validate_path "CONFIG_FILE" file || ((errors++))
    validate_path "LOG_DIR" dir || ((errors++))

    if (( errors > 0 )); then
        echo "Configuration validation failed with $errors error(s)" >&2
        return 1
    fi

    echo "Configuration validation passed"
}

技巧8: 单元测试 (bats框架)

说实话，刚开始写Shell脚本时我从来不写测试。直到有一次改了一个"小bug"，结果把生产环境的备份脚本搞坏了，三天没有备份。从那以后我就老实了。

bats框架介绍

bats (Bash Automated Testing System) 是最流行的Shell测试框架：

# 安装
# macOS
brew install bats-core

# Ubuntu/Debian
sudo apt install bats

# 从源码安装
git clone https://github.com/bats-core/bats-core.git
cd bats-core
./install.sh /usr/local

基本测试结构

#!/usr/bin/env bats
#
# tests/test_utils.bats
#

# 测试前的setup
setup() {
    # 加载要测试的脚本
    load '../lib/utils.sh'

    # 创建测试用的临时目录
    TEST_TEMP_DIR="$(mktemp -d)"
}

# 测试后的cleanup
teardown() {
    # 清理临时文件
    rm -rf "$TEST_TEMP_DIR"
}

# 测试用例
@test "is_number returns 0 for valid numbers" {
    run is_number 123
    [ "$status" -eq 0 ]

    run is_number 0
    [ "$status" -eq 0 ]

    run is_number -456
    [ "$status" -eq 0 ]
}

@test "is_number returns 1 for non-numbers" {
    run is_number "abc"
    [ "$status" -eq 1 ]

    run is_number "12.34"  # 如果只支持整数
    [ "$status" -eq 1 ]

    run is_number ""
    [ "$status" -eq 1 ]
}

@test "trim removes leading and trailing whitespace" {
    result=$(trim "  hello world  ")
    [ "$result" = "hello world" ]
}

@test "file_exists detects existing files" {
    touch "$TEST_TEMP_DIR/testfile"

    run file_exists "$TEST_TEMP_DIR/testfile"
    [ "$status" -eq 0 ]
}

@test "file_exists returns 1 for non-existing files" {
    run file_exists "$TEST_TEMP_DIR/nonexistent"
    [ "$status" -eq 1 ]
}

运行测试：

# 运行所有测试
bats tests/

# 运行单个测试文件
bats tests/test_utils.bats

# 显示详细输出
bats --verbose-run tests/

# TAP格式输出（CI集成）
bats --formatter tap tests/

高级测试技巧

#!/usr/bin/env bats
#
# tests/test_deploy.bats - 部署脚本测试
#

# 加载bats辅助库
load 'test_helper/bats-support/load'
load 'test_helper/bats-assert/load'

# 全局setup（整个测试文件执行一次）
setup_file() {
    export TEST_DIR="$(mktemp -d)"
    export ORIG_PATH="$PATH"

    # 创建mock命令
    mkdir -p "$TEST_DIR/bin"
    cat > "$TEST_DIR/bin/ssh" << 'EOF'
#!/bin/bash
echo "MOCK SSH: $*"
exit 0
EOF
    chmod +x "$TEST_DIR/bin/ssh"

    export PATH="$TEST_DIR/bin:$PATH"
}

teardown_file() {
    export PATH="$ORIG_PATH"
    rm -rf "$TEST_DIR"
}

setup() {
    source ./deploy.sh
}

# 测试输出内容
@test "usage displays help message" {
    run usage
    assert_success
    assert_output --partial "Usage:"
    assert_output --partial "--help"
}

# 测试参数解析
@test "parse_args sets environment correctly" {
    parse_args --env production --version 1.0.0
    assert_equal "${CONFIG[environment]}" "production"
}

@test "parse_args fails without required version" {
    run parse_args --env staging
    assert_failure
    assert_output --partial "Version is required"
}

# 测试验证逻辑
@test "validate_args rejects invalid environment" {
    CONFIG[environment]="invalid"
    CONFIG[version]="1.0.0"

    run validate_args
    assert_failure
    assert_output --partial "Invalid environment"
}

# Mock外部命令
@test "deploy uses ssh to connect to servers" {
    # 这里用的是setup_file中创建的mock ssh
    run deploy_to_server "server1.example.com" "/tmp/package.tar.gz"
    assert_success
    assert_output --partial "MOCK SSH: server1.example.com"
}

# 测试文件操作
@test "backup creates timestamped backup file" {
    local test_file="$TEST_DIR/test.conf"
    echo "test content" > "$test_file"

    run create_backup "$test_file"
    assert_success

    # 验证备份文件存在
    local backup_files
    backup_files=$(ls "$TEST_DIR"/test.conf.backup.* 2>/dev/null | wc -l)
    assert_equal "$backup_files" "1"
}

# 测试错误处理
@test "handles network timeout gracefully" {
    # 创建一个会超时的mock
    cat > "$TEST_DIR/bin/curl" << 'EOF'
#!/bin/bash
sleep 10
EOF
    chmod +x "$TEST_DIR/bin/curl"

    run timeout 2 health_check "http://localhost:8080"
    assert_failure
}

# 参数化测试
@test "validate_version accepts semantic versions" {
    local valid_versions=("1.0.0" "1.2.3" "0.0.1" "10.20.30" "1.0.0-alpha" "1.0.0-rc.1")

    for v in "${valid_versions[@]}"; do
        run validate_version "$v"
        assert_success "Expected $v to be valid"
    done
}

@test "validate_version rejects invalid versions" {
    local invalid_versions=("1.0" "v1.0.0" "1.0.0.0" "abc" "")

    for v in "${invalid_versions[@]}"; do
        run validate_version "$v"
        assert_failure "Expected $v to be invalid"
    done
}

测试覆盖率

#!/usr/bin/env bash
#
# 简单的覆盖率统计脚本
#

# 使用kcov进行覆盖率分析（需要安装kcov）
run_coverage() {
    local script="$1"
    local output_dir="coverage"

    mkdir -p "$output_dir"

    kcov --include-pattern=.sh \
         --exclude-pattern="tests/,/tmp/" \
         "$output_dir" \
         bats tests/

    echo "Coverage report: $output_dir/index.html"
}

# 或者使用bashcov
run_bashcov() {
    bashcov bats tests/
}

技巧9: 脚本模板和代码片段

经过多年积累，我有一套自己常用的模板和代码片段。

完整的生产级脚本模板

#!/usr/bin/env bash
#===============================================================================
#
#          FILE: template.sh
#
#         USAGE: ./template.sh [options] [arguments]
#
#   DESCRIPTION: 脚本功能描述
#
#     OPTIONS: ---
#  REQUIREMENTS: bash >= 4.0, curl, jq
#         BUGS: ---
#        NOTES: ---
#       AUTHOR: Your Name (your.email@example.com)
# ORGANIZATION: Your Company
#       CREATED: 2025-01-06
#      REVISION: 1.0.0
#
#===============================================================================

#-------------------------------------------------------------------------------
# 严格模式和基础设置
#-------------------------------------------------------------------------------

set -euo pipefail
IFS=$'\n\t'

# 脚本版本
readonly VERSION="1.0.0"

# 脚本路径信息
readonly SCRIPT_NAME="$(basename "${BASH_SOURCE[0]}")"
readonly SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
readonly SCRIPT_PATH="${SCRIPT_DIR}/${SCRIPT_NAME}"

# 运行时信息
readonly RUN_TIMESTAMP="$(date '+%Y%m%d_%H%M%S')"
readonly RUN_ID="${RUN_TIMESTAMP}_$$"

#-------------------------------------------------------------------------------
# 默认配置
#-------------------------------------------------------------------------------

# 可通过命令行参数或环境变量覆盖
declare -A CONFIG=(
    [log_level]="${LOG_LEVEL:-info}"
    [log_file]=""
    [dry_run]="false"
    [verbose]="false"
    [config_file]=""
)

# 位置参数
declare -a ARGS=()

#-------------------------------------------------------------------------------
# 颜色定义（仅终端）
#-------------------------------------------------------------------------------

if [[ -t 1 ]]; then
    readonly RED=$'\033[0;31m'
    readonly GREEN=$'\033[0;32m'
    readonly YELLOW=$'\033[0;33m'
    readonly BLUE=$'\033[0;34m'
    readonly MAGENTA=$'\033[0;35m'
    readonly CYAN=$'\033[0;36m'
    readonly WHITE=$'\033[0;37m'
    readonly BOLD=$'\033[1m'
    readonly RESET=$'\033[0m'
else
    readonly RED='' GREEN='' YELLOW='' BLUE=''
    readonly MAGENTA='' CYAN='' WHITE='' BOLD='' RESET=''
fi

#-------------------------------------------------------------------------------
# 日志函数
#-------------------------------------------------------------------------------

_log() {
    local level="$1"
    local color="$2"
    shift 2

    local timestamp
    timestamp="$(date '+%Y-%m-%d %H:%M:%S')"

    # 检查日志级别
    local -A levels=([debug]=0 [info]=1 [warn]=2 [error]=3)
    local current_level="${levels[${CONFIG[log_level]}]:-1}"
    local msg_level="${levels[$level]:-1}"

    (( msg_level < current_level )) && return

    local message="[$timestamp] [${level^^}] $*"

    # 输出到终端
    echo -e "${color}${message}${RESET}" >&2

    # 输出到文件
    if [[ -n "${CONFIG[log_file]}" ]]; then
        echo "$message" >> "${CONFIG[log_file]}"
    fi
}

log_debug() { _log debug "$BLUE"   "$@"; }
log_info()  { _log info  "$GREEN"  "$@"; }
log_warn()  { _log warn  "$YELLOW" "$@"; }
log_error() { _log error "$RED"    "$@"; }

# 带前缀的输出
print_header() {
    echo ""
    echo "${BOLD}${CYAN}=== $* ===${RESET}"
    echo ""
}

print_step() {
    echo "${BLUE}>>> $*${RESET}"
}

print_success() {
    echo "${GREEN}[OK] $*${RESET}"
}

print_warning() {
    echo "${YELLOW}[WARN] $*${RESET}"
}

print_error() {
    echo "${RED}[ERROR] $*" >&2
}

#-------------------------------------------------------------------------------
# 工具函数
#-------------------------------------------------------------------------------

# 检查命令是否存在
require_command() {
    local cmd="$1"
    local install_hint="${2:-}"

    if ! command -v "$cmd" >/dev/null; then
        log_error "Required command not found: $cmd"
        [[ -n "$install_hint" ]] && log_error "Install hint: $install_hint"
        exit 1
    fi
}

# 确认操作
confirm() {
    local prompt="${1:-Continue?}"
    local default="${2:-n}"

    [[ "${CONFIG[dry_run]}" == "true" ]] && return 0

    local yn_hint
    [[ "$default" == "y" ]] && yn_hint="[Y/n]" || yn_hint="[y/N]"

    read -rp "${YELLOW}${prompt}${yn_hint}${RESET} " response
    response="${response:-$default}"

    [[ "$response" =~ ^[Yy]$ ]]
}

# 执行命令（支持dry-run）
run_cmd() {
    if [[ "${CONFIG[dry_run]}" == "true" ]]; then
        log_info "[DRY-RUN] Would execute: $*"
        return 0
    fi

    if [[ "${CONFIG[verbose]}" == "true" ]]; then
        log_debug "Executing: $*"
    fi

    "$@"
}

# 创建目录（如果不存在）
ensure_dir() {
    local dir="$1"
    if [[ ! -d "$dir" ]]; then
        run_cmd mkdir -p "$dir"
    fi
}

# 检查文件是否存在且可读
require_file() {
    local file="$1"
    local desc="${2:-File}"

    if [[ ! -f "$file" ]]; then
        log_error "$desc not found: $file"
        exit 1
    fi

    if [[ ! -r "$file" ]]; then
        log_error "$desc not readable: $file"
        exit 1
    fi
}

#-------------------------------------------------------------------------------
# 清理和信号处理
#-------------------------------------------------------------------------------

declare -a CLEANUP_TASKS=()
declare -a TEMP_FILES=()

# 注册清理任务
register_cleanup() {
    CLEANUP_TASKS+=("$*")
}

# 创建临时文件
create_temp_file() {
    local template="${1:-tmp.XXXXXX}"
    local tmp
    tmp="$(mktemp -t "$template")"
    TEMP_FILES+=("$tmp")
    echo "$tmp"
}

# 清理函数
cleanup() {
    local exit_code=$?

    log_debug "Running cleanup tasks..."

    # 执行注册的清理任务
    for task in "${CLEANUP_TASKS[@]:-}"; do
        eval "$task" || true
    done

    # 删除临时文件
    for tmp in "${TEMP_FILES[@]:-}"; do
        [[ -f "$tmp" ]] && rm -f "$tmp"
    done

    exit $exit_code
}

# 错误处理
on_error() {
    local exit_code=$?
    local line_no="$1"
    local command="$2"

    log_error "Command failed with exit code $exit_code"
    log_error "  Line: $line_no"
    log_error "  Command: $command"

    # 打印调用栈
    local frame=0
    log_error "  Call stack:"
    while caller $frame; do
        ((frame++))
    done 2>/dev/null | while read -r line sub file; do
        log_error "$file:$line in $sub()"
    done
}

# 注册信号处理
trap cleanup EXIT
trap 'on_error ${LINENO} "$BASH_COMMAND"' ERR
trap 'log_warn "Interrupted by user"; exit 130' INT
trap 'log_warn "Received TERM signal"; exit 143' TERM

#-------------------------------------------------------------------------------
# 参数解析
#-------------------------------------------------------------------------------

usage() {
    cat << EOF
${BOLD}${SCRIPT_NAME}${RESET} v${VERSION} - 脚本功能描述

${BOLD}USAGE:${RESET}
${SCRIPT_NAME} [OPTIONS] [ARGUMENTS]

${BOLD}OPTIONS:${RESET}
    -c, --config FILE     配置文件路径
    -l, --log-level LEVEL 日志级别 (debug|info|warn|error)
        --log-file FILE   日志文件路径
    -n, --dry-run         干运行模式，不实际执行
    -v, --verbose         详细输出模式
    -h, --help            显示帮助信息
        --version         显示版本信息

${BOLD}ARGUMENTS:${RESET}
    argument1             第一个参数说明
    argument2             第二个参数说明

${BOLD}ENVIRONMENT:${RESET}
    LOG_LEVEL             等同于 --log-level
    CONFIG_FILE           等同于 --config

${BOLD}EXAMPLES:${RESET}
    # 基本用法
    ${SCRIPT_NAME} argument1

    # 使用配置文件
    ${SCRIPT_NAME} -c config.conf argument1

    # 干运行模式
    ${SCRIPT_NAME} --dry-run --verbose argument1

${BOLD}FILES:${RESET}
    /etc/${SCRIPT_NAME%.sh}/config.conf  默认配置文件
    /var/log/${SCRIPT_NAME%.sh}/         默认日志目录

EOF
}

parse_args() {
    while [[ $# -gt 0 ]]; do
        case "$1" in
            -c|--config)
                [[ $# -lt 2 ]] && { log_error "Option $1 requires an argument"; exit 1; }
                CONFIG[config_file]="$2"
                shift 2
                ;;
            -l|--log-level)
                [[ $# -lt 2 ]] && { log_error "Option $1 requires an argument"; exit 1; }
                CONFIG[log_level]="$2"
                shift 2
                ;;
            --log-file)
                [[ $# -lt 2 ]] && { log_error "Option $1 requires an argument"; exit 1; }
                CONFIG[log_file]="$2"
                shift 2
                ;;
            -n|--dry-run)
                CONFIG[dry_run]="true"
                shift
                ;;
            -v|--verbose)
                CONFIG[verbose]="true"
                shift
                ;;
            -h|--help)
                usage
                exit 0
                ;;
            --version)
                echo "${SCRIPT_NAME} v${VERSION}"
                exit 0
                ;;
            --)
                shift
                ARGS+=("$@")
                break
                ;;
            -*)
                log_error "Unknown option: $1"
                echo "Use --help for usage information" >&2
                exit 1
                ;;
            *)
                ARGS+=("$1")
                shift
                ;;
        esac
    done
}

validate_args() {
    # 验证日志级别
    if [[ ! "${CONFIG[log_level]}" =~ ^(debug|info|warn|error)$ ]]; then
        log_error "Invalid log level: ${CONFIG[log_level]}"
        exit 1
    fi

    # 验证配置文件
    if [[ -n "${CONFIG[config_file]}" ]]; then
        require_file "${CONFIG[config_file]}" "Config file"
    fi

    # 验证必要参数
    # if (( ${#ARGS[@]} < 1 )); then
    #     log_error "At least one argument is required"
    #     exit 1
    # fi
}

load_config() {
    local config_file="${CONFIG[config_file]}"

    [[ -z "$config_file" ]] && return

    log_debug "Loading config from: $config_file"

    # shellcheck source=/dev/null
    source "$config_file"
}

#-------------------------------------------------------------------------------
# 主要业务逻辑
#-------------------------------------------------------------------------------

# 在这里实现你的业务逻辑
do_something() {
    local arg="${1:-}"

    print_step "Doing something with: $arg"

    # 示例：执行命令
    run_cmd echo "Hello, $arg"

    print_success "Done with: $arg"
}

#-------------------------------------------------------------------------------
# 主函数
#-------------------------------------------------------------------------------

main() {
    # 解析参数
    parse_args "$@"

    # 加载配置
    load_config

    # 验证参数
    validate_args

    # 检查依赖
    require_command bash "Should be pre-installed"

    # 打印运行信息
    print_header "Starting ${SCRIPT_NAME} v${VERSION}"
    log_info "Run ID: ${RUN_ID}"
    log_info "Working directory: $(pwd)"

    if [[ "${CONFIG[dry_run]}" == "true" ]]; then
        log_warn "Running in DRY-RUN mode"
    fi

    if [[ "${CONFIG[verbose]}" == "true" ]]; then
        log_debug "Configuration:"
        for key in "${!CONFIG[@]}"; do
            log_debug "$key = ${CONFIG[$key]}"
        done
    fi

    # 执行主逻辑
    for arg in "${ARGS[@]:-default}"; do
        do_something "$arg"
    done

    print_header "Completed successfully"
}

# 执行入口
main "$@"

常用代码片段

#===============================================================================
# 代码片段集合
#===============================================================================

#-------------------------------------------------------------------------------
# 字符串处理
#-------------------------------------------------------------------------------

# 去除首尾空白
trim() {
    local var="$*"
    var="${var#"${var%%[![:space:]]*}"}"
    var="${var%"${var##*[![:space:]]}"}"
    echo "$var"
}

# 转换为小写
to_lower() {
    echo "${1,,}"
}

# 转换为大写
to_upper() {
    echo "${1^^}"
}

# 字符串包含检查
contains() {
    local string="$1"
    local substring="$2"
    [[ "$string" == *"$substring"* ]]
}

# 字符串开头检查
starts_with() {
    local string="$1"
    local prefix="$2"
    [[ "$string" == "$prefix"* ]]
}

# 字符串结尾检查
ends_with() {
    local string="$1"
    local suffix="$2"
    [[ "$string" == *"$suffix" ]]
}

# URL编码
urlencode() {
    local string="$1"
    python3 -c "import urllib.parse; print(urllib.parse.quote('$string'))"
}

#-------------------------------------------------------------------------------
# 数字处理
#-------------------------------------------------------------------------------

# 检查是否为整数
is_integer() {
    local var="$1"
    [[ "$var" =~ ^-?[0-9]+$ ]]
}

# 检查是否为正整数
is_positive_integer() {
    local var="$1"
    [[ "$var" =~ ^[1-9][0-9]*$ ]]
}

# 数字范围检查
in_range() {
    local num="$1"
    local min="$2"
    local max="$3"
    (( num >= min && num <= max ))
}

# 人类可读的大小转换
human_readable_size() {
    local bytes="$1"
    if (( bytes < 1024 )); then
        echo "${bytes}B"
    elif (( bytes < 1048576 )); then
        echo "$(( bytes / 1024 ))KB"
    elif (( bytes < 1073741824 )); then
        echo "$(( bytes / 1048576 ))MB"
    else
        echo "$(( bytes / 1073741824 ))GB"
    fi
}

#-------------------------------------------------------------------------------
# 数组操作
#-------------------------------------------------------------------------------

# 数组是否包含元素
array_contains() {
    local needle="$1"
    shift
    local hay
    for hay in "$@"; do
        [[ "$hay" == "$needle" ]] && return 0
    done
    return 1
}

# 数组去重
array_unique() {
    printf '%s\n' "$@" | sort -u
}

# 数组交集
array_intersect() {
    local -n arr1=$1
    local -n arr2=$2
    local result=()

    for item in "${arr1[@]}"; do
        if array_contains "$item" "${arr2[@]}"; then
            result+=("$item")
        fi
    done

    printf '%s\n' "${result[@]}"
}

#-------------------------------------------------------------------------------
# 文件操作
#-------------------------------------------------------------------------------

# 安全的文件备份
backup_file() {
    local file="$1"
    local backup_dir="${2:-$(dirname "$file")}"
    local timestamp="$(date '+%Y%m%d_%H%M%S')"

    if [[ -f "$file" ]]; then
        local backup_file="${backup_dir}/$(basename "$file").${timestamp}.bak"
        cp "$file" "$backup_file"
        echo "$backup_file"
    fi
}

# 原子写入文件
atomic_write() {
    local file="$1"
    local content="$2"
    local tmp_file
    tmp_file="$(mktemp)"

    echo "$content" > "$tmp_file"
    mv "$tmp_file" "$file"
}

# 递归查找文件
find_files() {
    local dir="$1"
    local pattern="${2:-*}"
    local type="${3:-f}"

    find "$dir" -type "$type" -name "$pattern" 2>/dev/null
}

# 获取文件的绝对路径
abs_path() {
    local path="$1"
    if [[ -d "$path" ]]; then
        (cd "$path" && pwd)
    else
        echo "$(cd "$(dirname "$path")" && pwd)/$(basename "$path")"
    fi
}

#-------------------------------------------------------------------------------
# 网络操作
#-------------------------------------------------------------------------------

# 检查端口是否开放
is_port_open() {
    local host="$1"
    local port="$2"
    local timeout="${3:-5}"

    timeout "$timeout" bash -c "cat < /dev/null > /dev/tcp/$host/$port" 2>/dev/null
}

# 等待端口开放
wait_for_port() {
    local host="$1"
    local port="$2"
    local timeout="${3:-60}"
    local interval="${4:-2}"

    local elapsed=0
    while (( elapsed < timeout )); do
        if is_port_open "$host" "$port"; then
            return 0
        fi
        sleep "$interval"
        (( elapsed += interval ))
    done

    return 1
}

# 简单的HTTP GET
http_get() {
    local url="$1"
    local timeout="${2:-30}"

    if command -v curl >/dev/null; then
        curl -sS --fail --max-time "$timeout" "$url"
    elif command -v wget >/dev/null; then
        wget -qO- --timeout="$timeout" "$url"
    else
        echo "Error: Neither curl nor wget is available" >&2
        return 1
    fi
}

# 下载文件
download_file() {
    local url="$1"
    local output="$2"
    local timeout="${3:-300}"

    if command -v curl >/dev/null; then
        curl -sS --fail --max-time "$timeout" -o "$output" "$url"
    elif command -v wget >/dev/null; then
        wget -q --timeout="$timeout" -O "$output" "$url"
    else
        echo "Error: Neither curl nor wget is available" >&2
        return 1
    fi
}

#-------------------------------------------------------------------------------
# 日期时间
#-------------------------------------------------------------------------------

# 当前时间戳
timestamp() {
    date '+%Y-%m-%d %H:%M:%S'
}

# ISO 8601格式
timestamp_iso() {
    date -u '+%Y-%m-%dT%H:%M:%SZ'
}

# 计算时间差（秒）
time_diff() {
    local start="$1"
    local end="$2"

    local start_sec end_sec
    start_sec=$(date -d "$start" '+%s')
    end_sec=$(date -d "$end" '+%s')

    echo $(( end_sec - start_sec ))
}

# 格式化持续时间
format_duration() {
    local seconds="$1"
    local hours=$(( seconds / 3600 ))
    local minutes=$(( (seconds % 3600) / 60 ))
    local secs=$(( seconds % 60 ))

    if (( hours > 0 )); then
        printf '%dh %dm %ds' "$hours" "$minutes" "$secs"
    elif (( minutes > 0 )); then
        printf '%dm %ds' "$minutes" "$secs"
    else
        printf '%ds' "$secs"
    fi
}

#-------------------------------------------------------------------------------
# 进度显示
#-------------------------------------------------------------------------------

# 简单的进度条
progress_bar() {
    local current="$1"
    local total="$2"
    local width="${3:-50}"
    local prefix="${4:-Progress}"

    local percent=$(( current * 100 / total ))
    local filled=$(( current * width / total ))
    local empty=$(( width - filled ))

    printf '\r%s: [' "$prefix"
    printf '%*s' "$filled" '' | tr ' ' '#'
    printf '%*s' "$empty" '' | tr ' ' '-'
    printf '] %3d%%' "$percent"

    if (( current == total )); then
        echo ""
    fi
}

# 旋转等待指示器
spinner() {
    local pid="$1"
    local message="${2:-Working}"
    local delay=0.1
    local spinchars='|/-\'

    while kill -0 "$pid" 2>/dev/null; do
        for (( i=0; i<${#spinchars}; i++ )); do
            printf '\r%s %s' "$message" "${spinchars:$i:1}"
            sleep "$delay"
        done
    done

    printf '\r%s done\n' "$message"
}

#-------------------------------------------------------------------------------
# 版本比较
#-------------------------------------------------------------------------------

# 比较语义化版本
# 返回: 0 (等于), 1 (大于), 2 (小于)
version_compare() {
    local v1="$1"
    local v2="$2"

    if [[ "$v1" == "$v2" ]]; then
        return 0
    fi

    local IFS='.'
    local i
    local -a ver1=($v1) ver2=($v2)

    for (( i=0; i<${#ver1[@]} || i<${#ver2[@]}; i++ )); do
        local num1="${ver1[i]:-0}"
        local num2="${ver2[i]:-0}"

        if (( num1 > num2 )); then
            return 1
        elif (( num1 < num2 )); then
            return 2
        fi
    done

    return 0
}

# 检查版本是否满足要求
version_gte() {
    local current="$1"
    local required="$2"

    version_compare "$current" "$required"
    local result=$?

    (( result == 0 || result == 1 ))
}

技巧10: shellcheck静态检查

shellcheck是Shell脚本的救星。它能发现很多隐蔽的bug，我现在写任何脚本第一时间都会跑一遍shellcheck。

安装和基本用法

# 安装
# macOS
brew install shellcheck

# Ubuntu/Debian
sudo apt install shellcheck

# CentOS/RHEL
sudo yum install epel-release
sudo yum install ShellCheck

# 基本用法
shellcheck script.sh

# 检查多个文件
shellcheck *.sh

# 指定shell类型
shellcheck --shell=bash script.sh

# 排除某些检查
shellcheck --exclude=SC2086,SC2034 script.sh

# 输出格式
shellcheck --format=gcc script.sh      # GCC风格
shellcheck --format=json script.sh     # JSON格式
shellcheck --format=checkstyle script.sh  # CI集成

# 从标准输入读取
echo 'echo $undefined' | shellcheck -

常见警告及修复

SC2086: 变量未加引号

# 警告示例
file=$1
rm $file  # SC2086: Double quote to prevent globbing and word splitting

# 修复
rm "$file"

# 或者如果确实需要分词
# shellcheck disable=SC2086
rm $file

SC2034: 变量已赋值但未使用

# 警告示例
my_var="hello"  # SC2034: my_var appears unused

# 修复1: 如果确实要用，确保后面使用了
my_var="hello"
echo "$my_var"

# 修复2: 如果是导出给其他脚本用
export my_var="hello"

# 修复3: 如果是故意的（比如设置bash选项）
# shellcheck disable=SC2034
HISTCONTROL=ignoredups

SC2155: 声明和赋值应该分开

# 警告示例
local my_var=$(some_command)  # SC2155

# 原因：如果some_command失败，$?会被local的返回值覆盖

# 修复
local my_var
my_var=$(some_command)

SC2164: cd失败后继续执行

# 警告示例
cd /some/dir
rm *  # 如果cd失败会在错误的目录执行rm

# 修复1
cd /some/dir || exit 1

# 修复2
if ! cd /some/dir; then
    echo "Failed to cd" >&2
    exit 1
fi

SC2181: 检查上一个命令的退出码

# 警告示例
some_command
if [[ $? -ne 0 ]]; then
    # SC2181: 直接用命令做条件更好
    echo "failed"
fi

# 修复
if ! some_command; then
    echo "failed"
fi

SC2046: 命令替换未加引号

# 警告示例
files=$(find . -name "*.txt")
process $files  # SC2046: 如果文件名有空格会出问题

# 修复1: 用数组
mapfile -t files < <(find . -name "*.txt")
process "${files[@]}"

# 修复2: 用read循环
find . -name "*.txt" | while read -r file; do
    process "$file"
done

# 修复3: 用-print0和xargs -0
find . -name "*.txt" -print0 | xargs -0 process

SC2129: 多次重定向到同一个文件

# 低效写法
echo "line1" >> file.txt
echo "line2" >> file.txt
echo "line3" >> file.txt

# 高效写法
{
    echo "line1"
    echo "line2"
    echo "line3"
} >> file.txt

shellcheck配置文件

在项目根目录创建 .shellcheckrc：

# .shellcheckrc
# 全局禁用某些检查
disable=SC1090,SC1091

# 设置默认shell
shell=bash

# 启用所有可选检查
enable=all

# 使用外部源文件时的路径
source-path=SCRIPTDIR
source-path=lib

在脚本中使用指令

#!/usr/bin/env bash

# 文件级别禁用
# shellcheck disable=SC2034

# 下一行禁用
# shellcheck disable=SC2086
echo $unquoted_var

# 禁用多个检查
# shellcheck disable=SC2034,SC2086
unused_var=$unquoted

# source文件提示
# shellcheck source=lib/common.sh
source "$SCRIPT_DIR/lib/common.sh"

# 指定shell类型
# shellcheck shell=bash

# 启用可选检查
# shellcheck enable=check-unassigned-uppercase

CI集成示例

GitHub Actions:

# .github/workflows/shellcheck.yml
name: ShellCheck

on: [push, pull_request]

jobs:
  shellcheck:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4

    - name: Run ShellCheck
      uses: ludeeus/action-shellcheck@master
      with:
        scandir: './scripts'
        severity: warning
      env:
        SHELLCHECK_OPTS: -e SC1090 -e SC1091

GitLab CI:

# .gitlab-ci.yml
shellcheck:
  stage: lint
  image: koalaman/shellcheck-alpine:stable
  script:
  - find . -name "*.sh" -exec shellcheck {} +
  allow_failure: false

最佳实践和注意事项

性能优化

1. 减少fork/exec调用

# 低效：每次循环都fork
for f in *.txt; do
    wc -l "$f"  # 每次调用外部命令
done

# 高效：单次调用
wc -l *.txt

# 或者用内置命令
for f in *.txt; do
    local count=0
    while IFS= read -r _; do
        ((count++))
    done < "$f"
    echo "$count$f"
done

2. 使用内置命令代替外部命令

# 低效
result=$(echo "$string" | tr '[:upper:]' '[:lower:]')

# 高效
result="${string,,}"

# 低效
dir=$(dirname "$path")
base=$(basename "$path")

# 高效
dir="${path%/*}"
base="${path##*/}"

# 低效
length=$(echo -n "$string" | wc -c)

# 高效
length="${#string}"

3. 避免无谓的子shell

# 创建子shell（有开销）
result=$(cat file.txt)

# 不创建子shell
result=$(<file.txt)

# 创建子shell
echo "$(pwd)"

# 不创建子shell
echo "$PWD"

4. 管道vs进程替换

# 管道（子shell中变量修改不影响外部）
cat file.txt | while read -r line; do
    ((count++))  # count修改在子shell中
done
echo "$count"  # 可能是0

# 进程替换（推荐）
while read -r line; do
    ((count++))
done < <(cat file.txt)
echo "$count"  # 正确的值

5. 批量操作

# 低效：逐行处理
while read -r line; do
    echo "$line" >> output.txt
done < input.txt

# 高效：批量重定向
{
    while read -r line; do
        echo "$line"
    done < input.txt
} > output.txt

# 更高效：直接用命令
cat input.txt > output.txt

安全加固

1. 输入验证

# 验证用户输入
validate_input() {
    local input="$1"
    local pattern="$2"

    if [[ ! "$input" =~ $pattern ]]; then
        echo "Invalid input: $input" >&2
        return 1
    fi
}

# 使用示例
user_input="$1"
validate_input "$user_input" '^[a-zA-Z0-9_-]+$' || exit 1

2. 路径安全

# 不安全：用户可以输入../../etc/passwd
file_path="$1"
cat "/data/$file_path"

# 安全：规范化并验证路径
file_path="$1"
safe_path="$(realpath -m "/data/$file_path")"
if [[ "$safe_path" != /data/* ]]; then
    echo "Invalid path: outside of /data" >&2
    exit 1
fi
cat "$safe_path"

3. 避免命令注入

# 危险：用户输入直接拼接到命令
user_input="$1"
eval "ls $user_input"  # 用户可以输入"; rm -rf /"

# 安全：使用数组
user_input="$1"
ls -- "$user_input"

# 安全：使用--分隔选项和参数
rm -- "$file"  # 即使$file是-rf也不会被当作选项

4. 临时文件安全

# 不安全：可预测的文件名
temp_file="/tmp/myapp_$$"  # PID可预测，可能被symlink攻击

# 安全：使用mktemp
temp_file="$(mktemp)"
temp_dir="$(mktemp -d)"

# 更安全：使用私有目录
private_dir="$(mktemp -d)"
chmod 700 "$private_dir"
temp_file="${private_dir}/data"

5. 权限检查

# 检查文件权限
check_file_permissions() {
    local file="$1"
    local expected_perms="$2"

    local actual_perms
    actual_perms=$(stat -c '%a' "$file" 2>/dev/null || stat -f '%Lp' "$file" 2>/dev/null)

    if [[ "$actual_perms" != "$expected_perms" ]]; then
        echo "Warning: $file has permissions $actual_perms, expected $expected_perms" >&2
        return 1
    fi
}

# 配置文件不应该world-readable
check_file_permissions "$CONFIG_FILE" "600"

常见错误

1. 变量比较错误

# 错误：字符串比较用单个=
if [[ $var = "value" ]]; then  # 可能误写为赋值

# 正确
if [[ "$var" == "value" ]]; then

# 数字比较用-eq
if [[ "$num" -eq 5 ]]; then

2. 数组处理错误

# 错误：丢失空元素
arr=("one" "" "three")
for item in ${arr[@]}; do  # 空元素被跳过
    echo "[$item]"
done

# 正确
for item in "${arr[@]}"; do
    echo "[$item]"
done

3. 函数返回值误用

# 错误：函数中的echo被当作返回值
get_value() {
    echo "debug info" >&2
    echo "actual value"
}

# 调用
result=$(get_value)  # result包含"actual value"，debug info到stderr

4. 管道中的exit


# 错误：exit只退出子shell
cat file | while read -r line; do
    [[ "$line" == "stop" ]] && exit 1  # 只退出while的子shell
done

# 正确：使用进程替换
while read -r line; do
    [[ "$line" == "stop" ]] && exit 1
done < <(cat file)

上一篇：AIVD边云协同框架：基于MLLM与动态调度优化工业视觉检测
下一篇：扣子2.0发布：四大核心能力解析与AI智能体开发平台演进

Bash, Linux, 运维自动化, DevOps, 系统管理

Shell脚本10大实战技巧提升开发效率

Shell脚本10大实战技巧提升开发效率

技术特点

适用场景

1. 确认执行环境

2. 创建项目结构

3. 设置开发环境

技巧1: set -euo pipefail 严格模式

默认行为的坑

严格模式详解

完整的严格模式模板

技巧2: 函数封装和模块化

函数的基本规范

模块化组织

技巧3: 参数解析 (getopts和getopt)

基础方案：getopts

进阶方案：getopt (GNU)

我常用的参数解析模板

技巧4: 日志记录框架

技巧5: 错误处理和trap信号捕获

trap基础

完整的错误处理框架

常见错误处理模式

技巧6: 并行执行

方式1: 后台进程 + wait

方式2: xargs并行

方式3: GNU Parallel（推荐）

实战：并行部署脚本

技巧7: 配置文件管理

方式1: Shell格式配置文件

方式2: INI格式配置

方式3: YAML/JSON配置

配置验证

技巧8: 单元测试 (bats框架)

bats框架介绍

基本测试结构

高级测试技巧

测试覆盖率

技巧9: 脚本模板和代码片段

完整的生产级脚本模板

常用代码片段

技巧10: shellcheck静态检查

安装和基本用法

常见警告及修复

shellcheck配置文件

在脚本中使用指令

CI集成示例

最佳实践和注意事项

性能优化

1. 减少fork/exec调用

2. 使用内置命令代替外部命令

3. 避免无谓的子shell

4. 管道vs进程替换

5. 批量操作

安全加固

1. 输入验证

2. 路径安全

3. 避免命令注入

4. 临时文件安全

5. 权限检查

常见错误

1. 变量比较错误

2. 数组处理错误

3. 函数返回值误用

4. 管道中的exit

相关帖子