Microsoft Interview Question

Find the 20 longest strings in a text file.

Interview Answers

Anonymous

Aug 28, 2012

Another way can be to just traverse the strings and keep a min b-tree to keep the minimum element on the top. When size of tree + 20, then only insert (replace) in tree, if the current string length > root of tree(min) . This will take O(log20) for each insertion and O(N) for traversal.

3

Anonymous

Oct 11, 2012

In the hash map approach described in the first answer, instead of having the word as the key, we can have the length as the key and value as the list of words with the length pointed by the key. You can use LinkedHashSet if you want sorting. Or you can maintain the largest 20 lengths in a separate data structure. This obviously is not a really optimal solution in terms of space complexity. Instead of that we can maintain a min-heap. Add first 20 words as it is. And then, if the new word has a larger length than the word at the top of the heap (smallest word as it's a min-heap) you remove the word from the heap and insert the new word into the heap. The heap size is always going to be 20 so you don't take much space and don't take much time to insert the word in the heap as well.

1

Anonymous

Oct 28, 2012

Correction: The min-heap version takes n.log(k) only which is O(n) only. -This looks the best solution so far.

1

Anonymous

Jan 16, 2018

Use selection rank algorithm. Expected time complexity o(n)

Anonymous

May 25, 2019

Call for the intern!

Anonymous

Oct 28, 2012

The min-heap solutions looks nice..However, it would take O(n.logn) right? On worst case, we do have to insert/delete in min-heap (log n) for every n words. Any better solution exists?? Can we use dynamic programming to find the longest word and then remove the longest word and repeat the routine for 19 more times? That would be 20times O(N) , which is better than O(n.log n). Any comments?

Anonymous

Aug 28, 2012

If you don't have to worry about space, you could use a hash with each word being the "key" and its length being the "value". Then sort bu hash by value, and choose the top 20.