![]() |
|
![]() |
|
|
Thread Tools | Display Modes |
|
|
#1 |
|
Newbie
Join Date: Jul 2008
Posts: 2
Rep Power: 0
![]() |
Hi,
I have a code where I need to search in a set of non-overlapping intervals; lets say { (1-10), (20-25), (30-40) } is my set. I need to search for a particular item, say 22, and determine which interval it lies in, (20-25) for this example. I need to do this fast, as I have a large number of lookups. I am using a splay tree (basically a binary search tree which brings last accessed node to the root, to make searching tree fast). I was wondering if there was some way I could go down to O(1) using hash tables. But I can't figure out how I could hash intervals. Any ideas? ~gattu~ |
|
|
|
|
|
#2 |
|
Battle Programmer
Join Date: Feb 2006
Location: Bellevue, WA, USA
Posts: 751
Rep Power: 3
![]() |
Re: Q: Search in intervals
Assuming your intervals are constant (e.g. it will always be {[1,10],[20,25],[30,40]} for a given dataset), you could create a hashing function which takes in a value and returns a hash code for the appropriate interval. This would get your hash lookup to O(1) for each <item, interval> mapping.
Regarding your choice of a splay tree: do you know that you'll be searching for certain data more often than other data? If not, then it's quite possible for a splay tree to be non-performant, since you always have to restructure the tree to bring the latest search to the top. This works out well if you search for a subset of the data often as those nodes will congretate towards the top, but if you are going over your data set fairly evenly then another tree structure might be just as good or better.
__________________
<insert disclaimer here> <insert shameless plug for Visual Studio here> |
|
|
|
|
|
#3 |
|
Programming Guru
![]() Join Date: Apr 2005
Posts: 1,827
Rep Power: 5
![]() |
Re: Q: Search in intervals
A trivial way to get a query down to O(1) would be to precompute all of the values. You wouldn't need a hash function. A simple 1D array would suffice.
lookup = []
for i in num_intervals:
for j in intervals[i][0] ... intervals[i][1]:
lookup[j] = (intervals[i][0], intervals[i][1])
for i in num_queries:
if lookup[queries[i]]:
print lookup[queries[i]][0], lookup[queries[i]][1]
else:
print "Not In Any Interval"This would be an <O(N), O(1)> algorithm (<Precomputing, Querying>), where N is the end point of the largest interval. But if N could be as large as a couple billion, I doubt you can get an O(1) query without doing any such massive precomputations. ---------- Here is another really easy solution which does not require a splay tree. Just another 1D array. An O(logN) query is usually fast enough for any situation (unless there are 10,000,000 queries). Assuming logn is okay, why not just binary search the interval. This assumes your intervals are already sorted in increasing order. for i in num_queries:
low = 0
high = num_intervals-1
if queries[i] < intervals[0][0] or
queries[i] > intervals[num_intervals-1][1]:
print "Not In Any Interval"
continue
while low < high:
mid = (low+high)/2
if intervals[mid][0] < queries[i] and
intervals[mid][1] < queries[i]:
low = mid + 1
else if intervals[mid][0] > queries[i] and
intervals[mid][1] > queries[i]:
high = mid - 1
else:
low = high
print intervals[low][0], intervals[low][1]This would be an <O(N), O(logN)> algorithm (<Precomputing, Querying>), where N is the number of intervals. ---------- Another very fast algorithm would be a line sweep. After all of the intervals are sorted in increasing order, sort the queries in increasing order as well. Do a horizontal sweep of the queries, keeping the sweep aligned with the intervals. The intervals are found for you. Then resort, output, and you're done. Assuming queries are already sorted in increasing order.
j = 0
for i in num_queries:
while j<num_intervals and intervals[j][0] <= queries[i]:
j++
if queries[i] < intervals[0][0] or
queries[i] > intervals[num_intervals-1][1]:
print "Not In Any Interval"
else:
print intervals[j-1][0], intervals[j-1][1]If the overall query time is QlogQ+N, then the average time per query is logQ+N/Q. Therefore, this would be an <O(1), O(logQ+N/Q)> algorithm (<Precomputing, Querying>), where N is the number of intervals and Q is the number of queries. If the queries are already sorted, then this would be <O(1), O(1)>, which is by far the fastest possible! And again, it is very easy to code (much easier than a splay tree). ---------- Lastly, if you wanted your data structure to support more functions related to intervals, a range tree should be much easier to code than a splay tree. Here is a good link related to what you're doing. http://www.topcoder.com/tc?module=St...CommonAncestor
__________________
Waterloo's Canadian Computing Competition (CCC) - Stage 2 Problems, Solutions, and Test Data Last edited by Sane; Jul 17th, 2008 at 9:17 AM. |
|
|
|
|
|
#4 |
|
Newbie
Join Date: Jul 2008
Posts: 2
Rep Power: 0
![]() |
Re: Q: Search in intervals
Thank you for your replies.
Unfortunately, the data structure is not static. I need to do around a billion lookups, with around a thousand insertions/deletions in between. Also there is huge temporal locality in my access patterns. Any tree structures that take advantage of the cache that come to your mind, in case hashing is not an option? ~gattu~ |
|
|
|
![]() |
| Bookmarks |
| Currently Active Users Viewing This Thread: 1 (0 members and 1 guests) | |
| Thread Tools | |
| Display Modes | |
|
|
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| Preforming a binary(SQL) search on a datagrid. | SydneyMcConnell | Visual Basic .NET | 4 | Jan 10th, 2008 12:40 PM |
| Better Search Engine Design? | bigguy | Coder's Corner Lounge | 4 | Aug 13th, 2007 7:16 PM |
| How to save all search engine results urls in a text file | abojan | ASP | 0 | Dec 27th, 2006 8:33 AM |
| Backup Search | grimpirate | C | 3 | Jul 3rd, 2006 7:49 PM |
| An alternative to 'Windows Search'... | SaturN | Coder's Corner Lounge | 9 | Jul 24th, 2005 2:30 PM |