performance - How to implement database-style table in Python -
i implementing class resembles typical database table:
- has named columns , unnamed rows
- has primary key can refer rows
- supports retrieval , assignment primary key , column title
- can asked add unique or non-unique index of columns, allowing fast retrieval of row (or set of rows) have given value in column
- removal of row fast , implemented "soft-delete": row kept physically, marked deletion , won't show in subsequent retrieval operations
- addition of column fast
- rows added
- columns deleted
i decided implement class directly rather use wrapper around sqlite.
what data structure use?
just example, 1 approach thinking dictionary. keys values in primary key column of table; values rows implemented in 1 of these ways:
as lists. column numbers mapped column titles (using list 1 direction , map other). here, retrieval operation first convert column title column number, , find corresponding element in list.
as dictionaries. column titles keys of dictionary.
not sure pros/cons of two.
the reasons want write own code are:
- i need track row deletions. is, @ time want able report rows deleted , "reason" (the "reason" passed delete method).
- i need reporting during indexing (e.g., while non-unique index being built, want check conditions , report if violated)
you might want consider creating class uses in-memory sqlite table under hood:
import sqlite3 class mytable(object): def __init__(self): self.conn=sqlite3.connect(':memory:') self.cursor=self.conn.cursor() sql='''\ create table foo ... ''' self.execute(sql) def execute(self,sql,args): self.cursor.execute(sql,args) def delete(self,id,reason): sql='update table set softdelete = 1, reason = %s tableid = %s' self.cursor.execute(sql,(reason,id,)) def verify(self): # check conditions true # report (or raise exception?) if violated def build_index(self): self.verify() ...
soft-delete can implemented having softdelete
column (of bool type). similarly, can have column store reason deletion. undeleting involve updating row , changing softdelete
value. selecting rows have not been deleted achieved sql condition where softdelete != 1
.
you write verify
method verify conditions on data satisfied. , call method within build_index
method.
another alternative use numpy structured masked array.
it's hard fastest. perhaps sure way tell write code each , benchmark on real-world data timeit.
Comments
Post a Comment