45
duckWit
6y

A recent project actually taught me how HORRIBLY STUPID it is to store large bodies of text in a SQL Server database. There were millions of records with pages of compressed text each.

More and more text records pile on every single day. Needless to say it was becoming super slow and backups were taking WAY too long.

After refactoring them out as compressed files to disk storage (I love you, micro-services) and dropping them completely from the database, the backup size went from 90gb to 3gb!

It's not every day you get to see a dramatic result like that from a refactor.

Lesson learned, and yes it was quite cool.

Comments
  • 25
    BLOB
    Big Load Of Bullshit
  • 3
    Well what if you want to do a fulltext search?
  • 4
    @scroach in my case, the text was compressed before being inserted into the db to save space. A full-text search would poorly perform in that regard because it would have to decompress each entry just to be able to read it and to see if it matched hits from a search. If you do that one row at a time, that's called RBAR: "Row By Agonizing Row".

    Any design that does RBAR is poor, which is why you don't loop in SQL one record at a time (cursors, I'm looking at you). If you do, you are denying the query optimizer from choosing an execution plan leveraging the full power of SQL and your code will be slow.

    Second point, it's best to use a full-text search engine for that. At the very least, a schema design previously mapping out all documents' words to match in an index would be better. Then there's a one-to-many relationship between the documents and all the index words they contain. Searches would be faster that way b/c entire doc contents are not scanned each time, just the index.
  • 3
    Oh lol I didn't understand that right XD of course then there's no good way of searching in db.
    So you had 90gig compressed data? How did you reduce that so drastically? Was the compression shit?

    I just don't see a generic reason why saving that much data in a database would be bad. If you save them in files you have to tackle them separately in backups and may run into inconsistencies.. but as always it mainly depends on use case and environment
  • 0
    @scroach the size of the backup files of the database is what dropped dramatically. The backup files no longer have those giant transaction logs reflecting all those huge document insertions.

    Also, saving them off to compressed files on disk is much better for us because we can leverage a third-party cloud storage provider who is cheaper than consuming our own server resources. Then, when one of the files is needed, we recall it back down to our server temporarily. It's a much better design.
  • 0
Add Comment