Python Databases Explained: 10 Pure-Python Options from Basics to Practice

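Thanks to its concise syntax and rich ecosystem, Python is used not only for application development but also to build database systems implemented in Python itself. These range from lightweight key-value stores to object databases and relational databases, giving projects of different sizes and needs a variety of choices. They are usually easy to install, use, and understand, which makes them well suited to rapid prototyping, small projects, teaching, and domain-specific data management.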

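This article walks through 10 databases written entirely or mainly in Python. For each one it covers the key features, suitable use cases, and a basic code example, to help you pick a storage solution for your next project.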

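1. PickleDB: A Minimal Key-Value Store

Features
PickleDB is an ultra-lightweight key-value store whose API resembles a Python dictionary and which persists data to a JSON file. Its classic releases have no external dependencies and work out of the box, making it a good fit for simple configuration or temporary data.

Use Cases

Configuration management for applications.
Quick data persistence for small projects or scripts.
Learning and teaching basic key-value storage concepts.
Temporary data storage during development and testing.

Code Example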

import pickledb

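# Create or load the database file
# (classic pickledb 0.9.x API; newer 1.x releases use a different, class-based API)
db = pickledb.load('my_config.db', auto_dump=True)

# Use it much like a dictionary
db.set('api_endpoint', 'https://api.example.com')
db.set('retry_count', 3)

# Read a value
endpoint = db.get('api_endpoint')
print(f"API endpoint: {endpoint}")

# List all keys
all_keys = db.getall()
print(f"All config keys: {all_keys}")

# Delete a key
db.rem('retry_count')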
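To show how a dictionary-style store with auto_dump can work, here is a minimal standard-library sketch. It is not PickleDB's implementation: the `JsonKV` class is hypothetical, and it assumes keys are strings and values are JSON-serializable. With `auto_dump=True`, every write rewrites the whole file.

```python
import json
import os
import tempfile


class JsonKV:
    """Hypothetical JSON-backed key-value store (illustration only, not PickleDB's code)."""

    def __init__(self, path, auto_dump=True):
        self.path = path
        self.auto_dump = auto_dump
        # Load existing data if the file exists; otherwise start empty
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.data = json.load(f)
        else:
            self.data = {}

    def dump(self):
        # Write a temp file, then atomically swap it in to avoid a half-written file
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
        os.replace(tmp_path, self.path)

    def set(self, key, value):
        self.data[key] = value
        if self.auto_dump:
            self.dump()  # persist immediately, like auto_dump=True

    def get(self, key, default=None):
        return self.data.get(key, default)

    def rem(self, key):
        self.data.pop(key, None)
        if self.auto_dump:
            self.dump()


path = os.path.join(tempfile.mkdtemp(), "my_config.json")
db = JsonKV(path)
db.set("retry_count", 3)

reopened = JsonKV(path)  # a fresh instance reads the persisted JSON file
print(reopened.get("retry_count"))  # → 3
```

Rewriting the whole file on every change keeps the design trivial but makes each write cost proportional to the data size, which is one reason stores like this suit small datasets.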

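2. TinyDB: A Document-Oriented NoSQL Database

Features
TinyDB is a document-oriented database written in pure Python. It stores data in JSON files and offers richer querying than PickleDB. Its API is intuitive and supports an ORM-like query style built from Query objects.

Use Cases

Small applications that need structured storage but do not want a heavyweight database such as MongoDB.
Data storage for desktop applications or embedded systems.
Rapid prototyping.

Code Example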

from tinydb import TinyDB, Query

db = TinyDB('users.json')

# Insert documents
db.insert({'name': 'Alice', 'age': 28, 'role': 'Engineer'})
db.insert({'name': 'Bob', 'age': 35, 'role': 'Designer'})

# Create a query object
User = Query()

# Query data
engineers = db.search(User.role == 'Engineer')
print(f"Engineers: {engineers}")

# Update data
db.update({'age': 29}, User.name == 'Alice')

# Combine conditions with logical operators for more complex queries
young_users = db.search((User.age < 30) & (User.role == 'Engineer'))
print(f"Engineers under 30: {young_users}")
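Queries such as `(User.age < 30) & (User.role == 'Engineer')` work because comparison operators can return predicate objects instead of booleans. The sketch below shows that idea with hypothetical classes `Q`, `Field`, and `Predicate`; it is not TinyDB's code and supports only `==`, `<`, `&`, and `|`.

```python
class Predicate:
    """Wraps a function doc -> bool so predicates can be combined with & and |."""

    def __init__(self, fn):
        self.fn = fn

    def __call__(self, doc):
        return self.fn(doc)

    def __and__(self, other):
        return Predicate(lambda doc: self(doc) and other(doc))

    def __or__(self, other):
        return Predicate(lambda doc: self(doc) or other(doc))


class Field:
    """Comparing a field returns a Predicate instead of a bool."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, value):
        return Predicate(lambda doc: doc.get(self.name) == value)

    def __lt__(self, value):
        # Documents missing the field never match
        return Predicate(lambda doc: self.name in doc and doc[self.name] < value)


class Q:
    """Attribute access builds a Field: Q().age -> Field('age')."""

    def __getattr__(self, name):
        return Field(name)


docs = [
    {"name": "Alice", "age": 28, "role": "Engineer"},
    {"name": "Bob", "age": 35, "role": "Designer"},
    {"name": "Carol", "age": 41, "role": "Engineer"},
]

User = Q()
query = (User.age < 30) & (User.role == "Engineer")
young_engineers = [doc for doc in docs if query(doc)]
print([doc["name"] for doc in young_engineers])  # → ['Alice']
```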

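3. ZODB (Zope Object Database): An Object-Oriented Database

Features
ZODB is a mature object-oriented database with ACID transactions. It can store and retrieve any picklable Python object and provides "transparent persistence": developers barely notice the database layer.

Use Cases

Applications that need to persist complex Python object graphs (such as content management systems and workflow engines).
Python applications that require transactional consistency.
Projects that want to avoid the complexity of object-relational mapping (ORM).

Code Example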

import ZODB, ZODB.FileStorage
import persistent
import persistent.list  # PersistentList lives in this submodule
import transaction

# Define a persistent class
class Employee(persistent.Persistent):
    def __init__(self, name, salary):
        self.name = name
        self.salary = salary

# Set up the database
storage = ZODB.FileStorage.FileStorage('company_data.fs')
db = ZODB.DB(storage)
connection = db.open()
root = connection.root()

# Use it: changes to persistent objects are tracked automatically
if 'engineering' not in root:
    root['engineering'] = persistent.list.PersistentList()
root['engineering'].append(Employee('Charlie', 85000))
transaction.commit()  # commit the transaction

# Query
for emp in root['engineering']:
    print(f"{emp.name}: ${emp.salary}")

connection.close()
db.close()  # also close the database and its file storage
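Transparent persistence relies on objects noticing when they change. The sketch below imitates that idea with a hypothetical `TrackedObject` base class whose `__setattr__` sets a dirty flag, and a `commit()` helper that saves only dirty objects into a plain dict standing in for storage. It is only an analogy to ZODB's `_p_changed` mechanism, but it shows why mutating a plain list attribute in place goes unnoticed, which is why the example uses `PersistentList`.

```python
import copy


class TrackedObject:
    """Hypothetical base class: any attribute assignment marks the object dirty."""

    def __init__(self):
        object.__setattr__(self, "_dirty", False)

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        object.__setattr__(self, "_dirty", True)  # this object must be saved on commit


class Employee(TrackedObject):
    def __init__(self, name, salary):
        super().__init__()
        self.name = name
        self.salary = salary


def commit(objects, storage):
    """Copy only dirty objects into `storage` (a dict standing in for a database file)."""
    for key, obj in objects.items():
        if obj._dirty:
            state = {k: v for k, v in vars(obj).items() if not k.startswith("_")}
            storage[key] = copy.deepcopy(state)  # snapshot, like pickling the object
            object.__setattr__(obj, "_dirty", False)


storage = {}
emp = Employee("Charlie", 85000)
emp.skills = []
commit({"charlie": emp}, storage)

emp.skills.append("python")  # in-place mutation: __setattr__ never runs
missed = emp._dirty          # False: the change went unnoticed
commit({"charlie": emp}, storage)
print(storage["charlie"]["skills"])  # → []

emp._dirty = True            # explicit flag, analogous to ZODB's _p_changed = True
commit({"charlie": emp}, storage)
print(storage["charlie"]["skills"])  # → ['python']
```

In real ZODB code, the usual remedies are persistent containers such as `PersistentList` and `PersistentMapping`, or setting `obj._p_changed = True` after mutating a plain mutable attribute.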

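4. Durus: Lightweight Object Persistence

Features
Durus is another object persistence system, designed to be simpler and lighter than ZODB. It also supports transactions and stores Python objects directly, but with a smaller codebase and a more compact API.

Use Cases

Small projects that need simple object persistence.
An introductory tool for learning object database concepts.
A lightweight alternative to ZODB.

Code Example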

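from durus.connection import Connection
from durus.file_storage import FileStorage
from durus.persistent import Persistent
from durus.persistent_list import PersistentList

# Define a persistent class
class Product(Persistent):
    def __init__(self, name, price):
        self.name = name
        self.price = price

# Connect through a file storage (Connection wraps a storage object, not a file path)
storage = FileStorage('durus_data.durus')
connection = Connection(storage)
root = connection.get_root()

# Store objects; a PersistentList is used so that appends are tracked
if 'products' not in root:
    root['products'] = PersistentList()
root['products'].append(Product('Laptop', 999.99))
connection.commit()  # commit the changes

# Retrieve objects
for p in root['products']:
    print(f"{p.name}: ${p.price}")

storage.close()  # close the underlying storage file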
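Like ZODB, Durus groups changes into transactions that become durable only on `commit()`. The hypothetical `TxStore` class below sketches the staging behavior behind a commit/abort pair: writes go to a pending buffer, reads see pending values first, `commit()` publishes the buffer, and `abort()` discards it. It is a single-process, in-memory illustration, not Durus's implementation.

```python
class TxStore:
    """Hypothetical transactional store: writes are staged until commit()."""

    def __init__(self):
        self._committed = {}
        self._pending = {}

    def __setitem__(self, key, value):
        self._pending[key] = value  # staged, not yet durable

    def __getitem__(self, key):
        # Reads see this transaction's own uncommitted writes first
        if key in self._pending:
            return self._pending[key]
        return self._committed[key]

    def commit(self):
        self._committed.update(self._pending)
        self._pending.clear()

    def abort(self):
        self._pending.clear()  # discard every uncommitted change


store = TxStore()
store["laptop"] = 999.99
store.commit()

store["laptop"] = 1.00    # staged change
store["tablet"] = 499.00  # staged new key
store.abort()             # roll both back

print(store["laptop"])  # → 999.99
```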

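5. Buzhug: A Pure-Python Database Engine with Pythonic Queries

Features
Buzhug is a database engine implemented in pure Python. Data is organized into bases (tables) with declared fields, but queries are written in ordinary Python syntax rather than SQL, so there is no separate query language to learn. It aims to be a simple, self-contained database solution.

Use Cases

Small projects that want simple table-and-record storage inside a Python program.
Teaching environments, to explain basic database principles.
Quickly validating a data model.

Code Example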

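from buzhug import Base

# Define the fields and create the base
# (buzhug has not been updated in many years; check compatibility with your Python version)
ProductDB = Base('products').create(
    ('id', str),
    ('category', str),
    ('price', float),
    ('stock', int)
)

# Insert records
ProductDB.insert(id='A1', category='Book', price=29.99, stock=50)
ProductDB.insert(id='B2', category='Toy', price=15.50, stock=120)

# Queries are plain Python: iterate over the base and filter records
print("All products:")
for record in ProductDB:
    print("  %s: %s" % (record.id, record.category))

print("\nProducts priced above 20:")
for record in [r for r in ProductDB if r.price > 20.0]:
    print("  %s - $%s" % (record.id, record.price))

# Update: pass the record to update() together with the new field values
for record in [r for r in ProductDB if r.id == 'A1']:
    ProductDB.update(record, stock=45)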
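The hypothetical `TypedTable` class below sketches one thing a schema declaration like buzhug's `create()` call makes possible: checking inserted values against the declared field types. In this sketch, missing fields default to `None` and queries are plain Python comprehensions; it illustrates the idea and is not buzhug's implementation.

```python
class TypedTable:
    """Hypothetical in-memory table whose declared field types are checked on insert."""

    def __init__(self, *fields):
        self.fields = dict(fields)  # e.g. {'id': str, 'price': float}
        self.records = []

    def insert(self, **values):
        unknown = set(values) - set(self.fields)
        if unknown:
            raise KeyError(f"unknown fields: {sorted(unknown)}")
        record = {}
        for name, field_type in self.fields.items():
            value = values.get(name)  # missing fields are stored as None
            if value is not None and not isinstance(value, field_type):
                raise TypeError(f"{name} expects {field_type.__name__}, got {type(value).__name__}")
            record[name] = value
        self.records.append(record)
        return len(self.records) - 1  # position doubles as a record id


products = TypedTable(("id", str), ("category", str), ("price", float), ("stock", int))
products.insert(id="A1", category="Book", price=29.99, stock=50)
products.insert(id="B2", category="Toy", price=15.50, stock=120)

# Queries are ordinary Python expressions over the records
expensive = [r["id"] for r in products.records if r["price"] > 20.0]
print(expensive)  # → ['A1']

try:
    products.insert(id="C3", price="cheap")  # wrong type for price
except TypeError as err:
    print(err)  # → price expects float, got str
```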

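6. Gadfly: An Embedded SQL Database

Features
Gadfly is a lightweight relational database management system (RDBMS) implemented in Python that supports a large subset of standard SQL. It is embedded, so no separate server process is needed.

Use Cases

Environments that need basic SQL but cannot use a compiled database engine (where that constraint does not apply, the standard library's sqlite3 module is usually the more practical choice today).
Education, to understand how an RDBMS works.
Maintaining legacy projects or meeting specific compatibility requirements.

Code Example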

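# Note: Gadfly predates Python 3 and may need extra work in modern environments
from gadfly import gadfly

# First run: create database "mydb" in the existing directory ./dbdata
connection = gadfly()
connection.startup("mydb", "./dbdata")
# Later runs reconnect instead with: connection = gadfly("mydb", "./dbdata")
cursor = connection.cursor()

# Stick to Gadfly's simple column syntax; SQLite-style clauses such as
# IF NOT EXISTS or PRIMARY KEY may not be part of its SQL subset
cursor.execute("CREATE TABLE orders (order_id INTEGER, customer VARCHAR, amount FLOAT)")
cursor.execute("INSERT INTO orders (order_id, customer, amount) VALUES (1, 'Acme Corp', 250.75)")
cursor.execute("INSERT INTO orders (order_id, customer, amount) VALUES (2, 'Globex', 99.50)")

cursor.execute("SELECT * FROM orders WHERE amount > 100")
for row in cursor.fetchall():
    print(row)

connection.commit()
connection.close()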
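For comparison, here is the same orders workflow with the standard library's `sqlite3` module, which needs no separate installation. The table definition uses SQLite syntax (IF NOT EXISTS, INTEGER PRIMARY KEY), and an in-memory database is assumed for the demo.

```python
import sqlite3

# In-memory database for the demo; pass a file path instead to persist data
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE IF NOT EXISTS orders (
        order_id INTEGER PRIMARY KEY,
        customer TEXT,
        amount REAL
    )
""")
# INTEGER PRIMARY KEY aliases the rowid, so order_id is assigned automatically
cur.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("Acme Corp", 250.75), ("Globex", 99.50)],
)
conn.commit()

cur.execute("SELECT order_id, customer, amount FROM orders WHERE amount > ?", (100,))
rows = cur.fetchall()
print(rows)  # → [(1, 'Acme Corp', 250.75)]
conn.close()
```

Using `?` placeholders instead of string formatting lets sqlite3 handle quoting and guards against SQL injection.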

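7. PyTables: A Scientific Database Built on HDF5

Features
PyTables is a library for efficiently managing large amounts of structured data, built on the HDF5 file format. It is a Python package backed by the HDF5 C library, uses NumPy for data exchange to achieve fast I/O, and supports advanced compression and in-kernel queries that filter data without loading whole tables into memory.

Use Cases

Scientific computing and numerical analysis (such as physics simulations and financial time series).
Storing and processing datasets far larger than available memory.
Archiving data that needs a complex hierarchical organization.

Code Example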

import tables as tb
import numpy as np

# Create an HDF5 file
with tb.open_file('sensor_data.h5', 'w') as h5file:
    # Create a group (similar to a folder)
    sensor_group = h5file.create_group('/', 'sensors', 'Sensor Data')

    # Define the table structure
    sensor_dtype = np.dtype([
        ('timestamp', 'f8'),  # timestamp as a 64-bit float
        ('temperature', 'f4'),
        ('pressure', 'f4')
    ])
    table = h5file.create_table(sensor_group, 'readings', sensor_dtype, title='Sensor Readings')

    # Write many rows
    row = table.row
    for i in range(1000):
        row['timestamp'] = 1609459200.0 + i
        row['temperature'] = 20.0 + np.random.randn()
        row['pressure'] = 1013.25 + np.random.randn() * 10
        row.append()
    table.flush()

    # Efficient query that does not load the whole table into memory
    high_temp_readings = table.read_where('temperature > 22.0')
    print(f"Number of high-temperature readings: {len(high_temp_readings)}")
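`read_where` avoids loading the whole table by evaluating the condition over the data in buffered chunks. The standard-library sketch below illustrates only that general idea: fixed-size binary records (an assumed layout packed with `struct`) are read `CHUNK_RECORDS` at a time, so memory use stays bounded regardless of file size. PyTables does this in optimized compiled code; the helper names here are hypothetical and this is a conceptual stand-in, not its algorithm.

```python
import os
import random
import struct
import tempfile

# Assumed record layout: timestamp (float64), temperature (float32), pressure (float32)
RECORD = struct.Struct("<dff")
CHUNK_RECORDS = 256  # rows held in memory per read; tune to available memory


def write_readings(path, n, seed=0):
    """Write n synthetic sensor readings as fixed-size binary records."""
    rng = random.Random(seed)
    with open(path, "wb") as f:
        for i in range(n):
            f.write(RECORD.pack(1609459200.0 + i,
                                20.0 + rng.gauss(0, 1),
                                1013.25 + rng.gauss(0, 10)))


def count_where(path, predicate):
    """Count records whose temperature matches predicate, one chunk at a time."""
    matches = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(RECORD.size * CHUNK_RECORDS)
            if not chunk:
                break
            # Reads are multiples of RECORD.size, so each chunk holds whole records
            for timestamp, temperature, pressure in RECORD.iter_unpack(chunk):
                if predicate(temperature):
                    matches += 1
    return matches


path = os.path.join(tempfile.mkdtemp(), "readings.bin")
write_readings(path, 1000)
high = count_where(path, lambda temperature: temperature > 22.0)
print(f"High-temperature readings: {high}")
```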

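8. CodernityDB: A Pure-Python NoSQL Database

Features
CodernityDB is a serverless, schema-less NoSQL database implemented in pure Python. It emphasizes speed and simplicity, supports custom indexes, and performs all data operations inside the Python process.

Use Cases

Python applications that need fast key-value or document storage.
An embedded database engine built into an application.
Projects with strict restrictions on third-party database dependencies.

Code Example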

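# Install first: pip install CodernityDB3 (a Python 3 port of CodernityDB)
from CodernityDB3.database import Database
from CodernityDB3.hash_index import HashIndex

class CustomIndex(HashIndex):
    def __init__(self, *args, **kwargs):
        kwargs['key_format'] = 'I'  # keys are unsigned integers (struct format 'I')
        super(CustomIndex, self).__init__(*args, **kwargs)

    def make_key_value(self, data):
        # Index each document by its user_id; no extra value is stored in the index
        return data.get('user_id'), None

    def make_key(self, key):
        return key

# Create the database and add the index (or open it if it already exists)
db = Database('/tmp/my_codernity_db')
if db.exists():
    db.open()
else:
    db.create()
    db.add_index(CustomIndex(db.path, 'user_id'))

# Insert a document
doc = {'user_id': 123, 'name': 'Eve', 'email': 'eve@example.com'}
db.insert(doc)

# Query through the index; with_doc=True also returns the full document
result = db.get('user_id', 123, with_doc=True)
print(f"Query result: {result}")

db.close()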
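A hash index like `CustomIndex` maps a key extracted from each document (here `user_id`, via `make_key_value`) to the stored document, so lookups avoid scanning every record. The hypothetical `DocStore` below sketches that mapping with plain dictionaries; its `key_fn` argument plays the role of `make_key_value`, and in this sketch documents whose key is `None` are not indexed. It is not CodernityDB's on-disk implementation.

```python
import uuid


class DocStore:
    """Hypothetical document store with named hash indexes (dict-based, in memory)."""

    def __init__(self):
        self.docs = {}     # _id -> document
        self.indexes = {}  # index name -> (key_fn, {key: set of _ids})

    def _index_doc(self, key_fn, table, _id, doc):
        key = key_fn(doc)
        if key is not None:  # documents without a key are not indexed
            table.setdefault(key, set()).add(_id)

    def add_index(self, name, key_fn):
        table = {}
        for _id, doc in self.docs.items():  # index documents inserted earlier
            self._index_doc(key_fn, table, _id, doc)
        self.indexes[name] = (key_fn, table)

    def insert(self, doc):
        _id = uuid.uuid4().hex
        self.docs[_id] = dict(doc, _id=_id)
        for key_fn, table in self.indexes.values():
            self._index_doc(key_fn, table, _id, doc)
        return _id

    def get(self, index_name, key):
        """Return one matching document via a dict lookup instead of a full scan."""
        _, table = self.indexes[index_name]
        ids = table.get(key)
        if not ids:
            raise KeyError(key)
        return self.docs[next(iter(ids))]


store = DocStore()
store.add_index("user_id", lambda doc: doc.get("user_id"))
store.insert({"user_id": 123, "name": "Eve", "email": "eve@example.com"})
store.insert({"name": "no-id"})  # lacks user_id, so it is stored but not indexed

print(store.get("user_id", 123)["name"])  # → Eve
```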

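9. Shelve: Object Persistence in the Python Standard Library

Features
shelve is part of the Python standard library. It combines a dbm backend with the pickle module to provide a simple persistent dictionary that stores arbitrary picklable Python objects under string keys. In essence, it is an interface for serializing Python objects into a key-value store.

Use Cases

Quickly adding persistence to scripts or small tools.
Storing program state or caches.
Relying on the standard library to avoid third-party dependencies.

Code Example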

import datetime
import shelve

# Open a shelf file (it behaves like a dictionary)
with shelve.open('app_cache.db') as cache:
    # Store structured objects
    cache['preferences'] = {'theme': 'dark', 'language': 'en', 'notifications': True}
    cache['recent_searches'] = ['python db', 'shelve tutorial', 'data persistence']

    # Any picklable object works, e.g. a datetime
    cache['last_updated'] = datetime.datetime.now()

    # Read data back
    prefs = cache.get('preferences', {})
    print(f"Current theme: {prefs.get('theme')}")

# The shelf is closed automatically when the with block exits. Manual handling also works:
# db = shelve.open('app_cache.db')
# ... operations ...
# db.close()
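One shelve behavior worth knowing: reading `cache[key]` returns a freshly unpickled copy, so mutating it in place (for example `cache['recent_searches'].append(...)`) is silently lost. The sketch below demonstrates the pitfall and two standard fixes: assigning the modified value back, or opening the shelf with `writeback=True`, which keeps accessed entries in memory and writes them back on `sync()` or close. A temporary directory is assumed for the shelf file.

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "app_cache")

with shelve.open(path) as cache:
    cache["recent_searches"] = ["python db"]
    # Pitfall: cache[...] returns an unpickled copy, so this append is never saved
    cache["recent_searches"].append("shelve tutorial")

with shelve.open(path) as cache:
    lost = cache["recent_searches"]
    print(lost)  # → ['python db']

    # Fix 1: read, modify, then assign the value back
    searches = cache["recent_searches"]
    searches.append("shelve tutorial")
    cache["recent_searches"] = searches

# Fix 2: writeback=True caches accessed entries and writes them back on close
with shelve.open(path, writeback=True) as cache:
    cache["recent_searches"].append("data persistence")

with shelve.open(path) as cache:
    final = cache["recent_searches"]
print(final)  # → ['python db', 'shelve tutorial', 'data persistence']
```

`writeback=True` costs memory and slows down closing, because every accessed entry is written back; reassigning the value is often the lighter option.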

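10. Klepto: Dictionary-Style Storage for Memoization and Caching

Features
Klepto is a library focused on function memoization and caching. It can persist dictionary contents to various backends (memory, files, databases), and its API is highly compatible with the standard dict.

Use Cases

Caching and memoizing function results to speed up compute-intensive tasks.
Caching scenarios that need a flexible choice of storage backend (memory, disk, SQL, etc.).
Managing intermediate results in scientific computing or machine learning.

Code Example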

import hashlib
import time

from klepto import lru_cache
from klepto.archives import dir_archive

# 1. Memoization backed by an on-disk archive (cached=False writes entries straight to disk)
@lru_cache(cache=dir_archive('hash_cache', cached=False))
def compute_expensive_hash(data: str):
    """A simulated expensive computation"""
    time.sleep(1)  # simulate slow work
    return hashlib.sha256(data.encode()).hexdigest()

print(compute_expensive_hash("Hello"))  # first call: computed, takes about a second
print(compute_expensive_hash("Hello"))  # second call: served from the cache

# 2. Dictionary-style API for general storage
archive = dir_archive('my_archive', serialized=True, cached=False)
archive['dataset_1'] = {'features': [[1, 2], [3, 4]], 'labels': [0, 1]}
archive['config'] = {'epochs': 10, 'lr': 0.001}

# In a later session, reopen the same directory archive and read from it
archive = dir_archive('my_archive', serialized=True, cached=False)
print(f"Stored config: {archive['config']}")
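Disk-backed memoization boils down to three steps: derive a key from the function arguments, look it up in a persistent store, and compute only on a miss. The sketch below implements that with the standard library's `shelve`. The `disk_memoize` decorator is hypothetical, not Klepto's API, and it assumes arguments are picklable with a stable pickled form within one Python version (true for strings and numbers, not guaranteed for every type).

```python
import functools
import hashlib
import os
import pickle
import shelve
import tempfile


def disk_memoize(path):
    """Hypothetical decorator: cache results in a shelve file keyed by the arguments."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Stable string key from the function name and its (picklable) arguments
            raw = pickle.dumps((func.__name__, args, sorted(kwargs.items())))
            key = hashlib.sha256(raw).hexdigest()
            with shelve.open(path) as cache:
                if key in cache:
                    return cache[key]  # hit: skip the computation
                result = func(*args, **kwargs)
                cache[key] = result    # miss: compute once, then persist
                return result
        return wrapper
    return decorator


calls = []  # records real executions, for demonstration


@disk_memoize(os.path.join(tempfile.mkdtemp(), "hash_cache"))
def expensive_hash(data):
    calls.append(data)
    return hashlib.sha256(data.encode()).hexdigest()


first = expensive_hash("Hello")
second = expensive_hash("Hello")  # served from the shelve file
print(len(calls))  # → 1
```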