![]() This string's lexicographical order gives sane results with sorts and inequalities but takes up lots of space on disk and prevents you from doing math on dates in queries. The default adapter for the Python datetime class stores an ISO 8601 formatted string in the database. It's best to keep as much work in SQLite as possible as both a performance optimization and to minimize errors like this one. You must take care not to mutate that object in Python code or you will break every object and every future object that tries to reference that value. One consequence of memoization and Python's reference types is that a single instance of a value will be used for every instance of the value. Memoization successfully moved the bottleneck in some parts of the report such as loading data out of an Oracle database into SQLite from being limited on CPU usage in Python to being limited by the Oracle database's I/O speed. The increased memory usage was negligible for our data set because it had lots of identical values. Both reading and writing adapters for both types benefitted from memoization decorators that cached the instantiation of those types. By default it ships with adapters for converting timestamps to the Python datetime class and converting arbitrary precision decimal types to Decimal objects. Python's sqlite3 module has the ability to convert values to and from Python objects as they are read and written to the database. SET blade = tmp_table.blade, port = tmp_table.port As an example, here’s what a typical SQL Server query looked like: It is very fast to read and write queries keyed on the rowid and I made extensive use of it. Internally SQLite keeps a b-tree index to store and retrieve the physical rows on disk and the key to this index is exposed as the hidden rowid column. The original database’s reliance on its single, giant clustered index thrashed the performance of virtually all queries queries across the entire run so I wound up rewriting many queries to use SQLite’s hidden rowid feature. The first, naive port over to SQLite had quite a few performance problems which I will get into shortly. Thanks to the on-disk ordering provided by the clustering this was sequential, predictable I/O and SQL Server did a pretty good job with it, often beating or tying SQLite which was running on a modern i7 with an SSD. A typical query might have a WHERE clause that matched on the first 4 or 5 indexed columns and the remaining data would be read in a partial table scan. This sounds strange but the report was synthesizing data across multiple internal company databases whose data was populated by devices from many different vendors so it needed the flexibility while it was normalizing data from all of these sources. Most of the data was in two tables with around 50 sparsely populated columns each, a single clustered index on the leading 7 columns or so and no primary keys or UNIQUE constraints. Unfortunately my reporting database relied heavily on SQL Server’s clustered indexes. ![]() The underlying storage of the table does whatever it wants. In SQLite there is no way to cluster on a given column. SQL Server supports clustered indexes in a normal way. If a table is clustered on an index or column it means that the underlying table on disk has its data in the collation order for that index or column and updates to the table will keep that order. Some background information: in DBA speak a clustered index/table/column refers to the order of data on disk. SQLite is case sensitive and the only way to change this is by declaring every text column as COLLATE NOCASE. The default collation for SQL Server 2005 is something called SQL_Latin1_General_CP1_CI_AS. I chose Python and sqlite3 as the replacements for a mix of SQL Server queries and jobs and Visual Basic macros embedded in Excel templates for a bunch of reasons that aren't relevant here but I do have some notes on problems I ran into while moving everything over: Collations ![]() After three months of wrangling with the report I was able to get it off of the Microsoft stack just before the Windows sysadmins decided it was time to pull the plug on this long forgotten and neglected Windows Server 2003 system. ![]() ![]() We're a company full of Linux and a few legacy Solaris systems but the DBAs who built this thing almost 15 years ago must have insisted on SQL Server 2005 so I was the proud owner of one of the few Windows servers on our side of the business. After a sudden reduction in force at the day job I inherited a crufty old reporting system from a team of DBAs. ![]()
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |