Welcome Guest, Not a member yet? Register   Sign In
softDelete and database indexing
#11

I thought that the new softDelete was using datetime values now, or can you use both?
What did you Try? What did you Get? What did you Expect?

Joined CodeIgniter Community 2009.  ( Skype: insitfx )
Reply
#12

(08-23-2019, 07:58 AM)InsiteFX Wrote: I thought that the new softDelete was using datetime values now, or can you use both?

I thought you could still use both. FWIW, the docs still say DATETIME or INTEGER.
Reply
#13

(08-23-2019, 07:55 AM)dave friend Wrote: Cardinality matters for indexing. A very small number of distinct values among a very large number of rows will not produce noticeable efficiency gains. Indexes are B-tree structures. There is little point in searching against a B-Tree when the size of the tree is small. Use indexes with fields with high cardinality to achieve search performance gains versus sequential scans. With that in mind it might make sense to consider multi-column indexes that include the soft delete column. Perhaps even to the point of including soft deletes into the primary key.

I definitely agree with this statement.
Reply
#14

Soft deletes now use a time representation for “deleted at” - this can be a datetime or an integer seconds-from-epoch. Given that it is now an ordered column an index makes more sense - BUT a key piece of info is that it is nullable and standard find() queries will be “IS NOT NULL”, so it is sort of still functioning like a boolean.

I’m not a DBA but I was told to index “anything that is searched against regularly” and this field will probably be hit even more than primary for any models using these tables - hence the question!
Reply
#15

(This post was last modified: 08-24-2019, 06:28 PM by dave friend.)

(08-24-2019, 06:32 AM)MGatner Wrote: Soft deletes now use a time representation for “deleted at” - this can be a datetime or an integer seconds-from-epoch. Given that it is now an ordered column an index makes more sense - BUT a key piece of info is that it is nullable and standard find() queries will be “IS NOT NULL”, so it is sort of still functioning like a boolean.

If "standard find() queries" means returning the not-deleted rows then I think you meant to say a standard find() will use "IS NULL". The presence of a value indicates a soft-delete right?

Yes, sort of boolean. But, for some databases or with certain engines in MySQL columns containing NULL cannot be stored in an index. Or, even if allowed, the index will be ignored if it contains NULL values.

Earlier I surmised that a multi-column PK that included the 'deleted_at' column might be useful. However, because of the use of NULL, it is impossible because primary keys cannot have null values in any part of the key.
This makes me wonder if null is the right answer for "not deleted".

BTW, I don't find any documentation explaining that to remove a soft delete the record should update the 'deleted_at' field with NULL.

(08-24-2019, 06:32 AM)MGatner Wrote: I’m not a DBA but I was told to index “anything that is searched against regularly” and this field will probably be hit even more than primary for any models using these tables - hence the question!

Yes, in general, that's true. But understanding the effect of cardinality needs to be considered. For every rule there are exceptions.
Reply
#16

Yes, I switched it, you’re very right: find() queries whatever you have searched for plus “deleted_at IS NULL”

That’s interesting about NULL being in-indexable, I definitely never knew that. So far I’ve been adding (`deleted_at`, `id`) indexes to all my CI4 tables for models with soft deletes, and often one other like (`deleted_at`, `name`). Does the fact that deleted_at can be null mean these keys are useless? (I’m using MySQL but would like an agnostic solution)
Reply
#17

(This post was last modified: 08-25-2019, 11:27 AM by dave friend.)

(08-25-2019, 03:56 AM)MGatner Wrote: That’s interesting about NULL being in-indexable,

NULL being "in-indexable" is not the case for all databases/engines. MySQL/MariaDB using the InnoDB engine apparently does put NULL values in an index and clusters them together at the top. Perhaps that is why WHERE 'column' IS NULL is generally super quick?

You still might run up against "low cardinality" when indexing on 'deleted_at'. Indexing a column that has a small number of values is almost always useless. The optimizer prefers to do a table scan rather than to bounce between the index BTree and the data.

(08-25-2019, 03:56 AM)MGatner Wrote: So far I’ve been adding (`deleted_at`, `id`) indexes to all my CI4 tables for models with soft deletes, and often one other like (`deleted_at`, `name`). Does the fact that deleted_at can be null mean these keys are useless? (I’m using MySQL but would like an agnostic solution)

I think those compound indexes might still be useful. However, you might want to rethink the column order. In the case of (`deleted_at`, `id`) if 'id' is the primary index and you're using InnoDB you have actually created a duplicate index. That's because InnoDB "clusters" the PRIMARY KEY with the data. This page of MariaDB documentation explains it better than I. It's a very useful page in other respects.

Another fabulous page packed with useful database info is Rick's RoTs -- Rules of Thumb for MySQL. You might notice a lot of redundancy between that page a lot of the MySQL and MariaDB documentation. That's because Rick's work is the basis for much of the official documentation.

Wish I could offer a more agnostic solution but that's outside my knowledge base.
Reply
#18

Wow Rick’s RoTs are amazing! Thanks for this, I’ve learned a ton already. This gives me lots to think over.
Reply




Theme © iAndrew 2016 - Forum software by © MyBB