Id | Source | Type | Entry | ParentID |
---|---|---|---|---|
1 | FBI | Name | Charles Smith | NULL |
2 | FBI | Alias | Chuck Smith | 1 |
3 | FBI | Eye Color | Brown | 1 |
4 | Interpol | Name | Richard Jones | NULL |
4 | Interpol | Alias | Dick Jones | 4 |
6 | Interpol | Nationality | UK | 4 |
And so on, for about 12MM rows. (These are "wanted" lists -- and we get different attributes on a per-list or per-entry basis.) So what we want to do is let someone search by name, in which case we want to match any name or alias. We have a lot of lists, so we end up parallelizing the query like this:
select * from watch_list where source = 'FBI' and similarity(name, 'Robert Cooper') > 0.5
But we were not getting the performance that we needed out of it that we needed. Enter "selective indexes", where we can limit the number of rows in the index, thus limiting the number of rows we have to search:
create index name_and_alias on watch_list (entry) where type in ('Name','Alias');
And boom!, we get the performance we need.
No comments:
Post a Comment