Friday, March 15, 2013

Implementation of hashcode for a more efficient String hash


Unicode uses 16 bits per character, allowing for 2^16 possibilities. A String with 2147483647 (Integer.MAX_VALUE) characters has 2^34359738352 possibilities. What a large number! Thus, it is of no doubt that the hashCode method of the String class is less efficient than that of Long, since the number of all unique possibilities of a String is by far larger than Long's.

However, regarding Strings that match certain patterns, the number of unique possibilities is much smaller. For example, the number of English vocabularies with length below is lower than 2^64, the number of a Long's possibilities. For a class with a single non-static field "long test", the hashcode is simply "31 + (int) (test ^ (test >>> 32));", which is much more efficient than that of String, "s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]".

The Java code below is an implementation of a more efficient hashcode for "English vocabularies with length below 12." (The first character can be upper-cased, and other characters should be lower-cased. No whitespace, hyphens, or other character are allowed.)


package com.packa9e;

import java.util.HashMap;

public class Vocabulary  {

 private final StringBuilder vocabulary;
 private long[] digits = new long[12];
 
 public Vocabulary(String vocabulary){
  this.vocabulary = new StringBuilder(vocabulary);
  if (hasInputError())
   new Exception("Not a valid vocabulary").printStackTrace();
 }
 
 private boolean hasInputError(){
  if (vocabulary.length() > 12 )
   return true;
  
  if (vocabulary.charAt(0) >= 65 && vocabulary.charAt(0) <= 90) //A-Z
   digits[0] = vocabulary.charAt(0) - 38; // A - 27 ... Z - 52 
  else if (vocabulary.charAt(0) >= 97 && vocabulary.charAt(0) <= 122)
   digits[0] = vocabulary.charAt(0) - 96; //a - 1 .. z - 26
  else
   return true;
   
  for (int i = 1 ; i < vocabulary.length(); i++){
   if (vocabulary.charAt(i) >= 97 && vocabulary.charAt(i) <= 122)
    digits[i] = vocabulary.charAt(0) - 96; //a - 1 ... z - 26
   else
    return true;
  }
  return false;
 }
 
 @Override
 public int hashCode() {
  Long longValue = digits[0] << 55;
  for (int i = 1 ; i < vocabulary.length(); i++){
   longValue += digits[i] << (5 * (11 - i));
  }
  return longValue.hashCode();
 }

 @Override
 public boolean equals(Object obj) {
  if (this == obj)
   return true;
  if (obj == null)
   return false;
  if (getClass() != obj.getClass())
   return false;
  Vocabulary other = (Vocabulary) obj;
  if (vocabulary == null) {
   if (other.vocabulary != null)
    return false;
  } else if (!vocabulary.toString().equals(other.vocabulary.toString()))
   return false;
  return true;
 }
 
 public static void main(String[] args) {
  HashMap hashMap = new HashMap();
  Vocabulary v1 = new Vocabulary("test");
  hashMap.put(v1, "reg4te");
  Vocabulary v2 = new Vocabulary("test");
  System.out.println(hashMap.get(v2));
 }
}

No comments:

Post a Comment