• En
  • Signup
  • 刷题上岸
  • Today 0
  • Post 12
  • Followed 1
  • 捐助

    请在小工具里添加二维码

    刷题上岸 刷题上岸 关注:1 内容:12

    Amazon | OA 2020 | Top K Frequently Mentioned Keywords

  • 查看作者
  • 打赏作者
  • 拉黑名单
    • 刷题上岸
    • 等级6
      土豪

      Given a list of reviews, a list of keywords and an integer k. Find the most popular k keywords in order of most to least frequently mentioned.

      The comparison of strings is case-insensitive.
      Multiple occurances of a keyword in a review should be considred as a single mention.
      If keywords are mentioned an equal number of times in reviews, sort alphabetically.

      Example 1:

      Input:
      k = 2keywords = ["anacell", "cetracular", "betacellular"]
      reviews = [  "Anacell provides the best services in the city",  "betacellular has awesome services",  "Best services provided by anacell, everyone should use anacell",
      ]
      
      Output:
      ["anacell", "betacellular"]
      
      Explanation:"anacell" is occuring in 2 different reviews and "betacellular" is only occuring in 1 review.

      Example 2:

      Input:
      k = 2keywords = ["anacell", "betacellular", "cetracular", "deltacellular", "eurocell"]
      reviews = [  "I love anacell Best services; Best services provided by anacell",  "betacellular has great services",  "deltacellular provides much better services than betacellular",  "cetracular is worse than anacell",  "Betacellular is better than deltacellular.",
      ]
      
      Output:
      ["betacellular", "anacell"]
      
      Explanation:"betacellular" is occuring in 3 different reviews. "anacell" and "deltacellular" are occuring in 2 reviews, but "anacell" is lexicographically smaller.

      Related problems:

      k = 2
      keywords = ["anacell", "betacellular", "cetracular", "deltacellular", "eurocell"]
      reviews = [
        "I love anacell Best services; Best services provided by anacell",
        "betacellular has great services",
        "deltacellular provides much better services than betacellular",
        "cetracular is worse than anacell",
        "Betacellular is better than deltacellular.",
      ]
      import re
      from collections import Counter
      import heapq
      import collections
      class Element:
          def __init__(self, word, freq):
              self.word = word
              self.freq = freq
              
          def __lt__(self, other):
              if self.freq == other.freq:
                  return self.word > other.word
              return self.freq < other.freq
      def topKFrequent(k, keywords, reviews):
          '''
          k: int
          keywwords: list of string
          reviews: list of string
          '''
          word_list = []
          
          for review in reviews:
              word_list += set(review.lower().replace('[^a-z0-9]', '').split())
          table = collections.defaultdict(int)
          for word in word_list:
              table[word] += 1
          
          heap = []
          
          for word, freq in table.items():
              if word in keywords:
                  heapq.heappush(heap, Element(word, freq))
                  if len(heap) > k:
                      heapq.heappop(heap)
          
          return [heapq.heappop(heap).word for i in range(k)][::-1]
          
      print(topKFrequent(k, keywords, reviews))

      请登录之后再进行评论

      登录
    • Tasks
    • Current Activities
    • Back to Top