Sunday, September 11, 2011

Anagram Checker–To check if 2 strings are anagrams

Problem 

Wap to find out if 2 strings are anagrams or not.


Anagrams
Two words are anagrams if one of them has exactly same characters as that of the another word.

Example : Anagram & Nagaram are anagrams (case-insensitive).Similarily, abcd and abdc are anagrams, but abcd and abdce is not.

Solution

A simple way to check if two strings are anagrams is to find out if the numeric sum of the characters in the strings is equal. The approach below uses a hash function that returns the sum of the characters in the string. The running time is O(n) but has a space overhead proportional to the size of the strings.


Approach 1 - Using the custom hash function

The obvious solution is to map each character to a prime number and multiply the prime numbers. So if 'a'' -> 2 and 'b' -> 3, then
  • 'ab' ->5
  • 'ba' ->5
  • 'bab' ->8
  • 'abba' ->10
  • 'baba' ->10
To minimise the chance of overflow, the smallest primes could be assigned to the more frequent letters (e,t,i,a,n). Note: The 26th prime is 101.


In simple terms,write a hash function with as low collision as possible. So 2 strings should generate same hash, if they have same characters.

package strings;

/**
 * Finds if two strings are anagrams using a simple hash function
 *
 */
public class Anagram {

 int getHash(String str){

  char[] chars=str.toCharArray();
  int sum=0;
  for(int val:chars){
   sum+=val;

  }
  return sum;
 }

 public boolean areAnagram(String str1,String str2,boolean isCaseSensitive){
  str1=(isCaseSensitive)?str1:str1.toUpperCase();
  str2=(isCaseSensitive)?str2:str2.toUpperCase();

  return (getHash(str1)==getHash(str2));
 }

 public static void main(String[] a){
  Anagram anagram=new Anagram();
  System.out.println(anagram.isAnagram(“test”,”TeST”,false));
  System.out.println(anagram.isAnagram(“test”,”TeST”,true));
 }
}

Output:
true
false

Time Complexity - This is fast approach as it takes O(n) complexity.
Space Complexity - Space taken by hashmap - O(1)
Caution : But problem with this approach above is hash returning function is not unique. So it may throw out wrong result...so it is highly advisable to have very good hash function.

Approach 2 - Counting Characters
Rather than writing your own hash function, you can use in-built hashmap and then store the first string with count values in that hashmap, and then apply the same logic to do the same, and check if hashmap are same or not. This again takes O(n) time and is better than above approach.

boolean areAnagrams(String s1, String s2){
  char str1[] = s1.toCharArray();
  char str2[] = s2.toCharArray();

  Map<Character, Integer> map = new HashMap<Character, Integer>();
  //process first string
  for(char c : str1)
    if(map.containsKey(c)
          map.put(c, map.getValue(c)++)
    else 
         map.put(c, 1)

  //process 2nd string
  for(char c : str2)
    if(map.containsKey(c) && map.getValue(c)!=0)
         map.put(c,map.getValue(c)--)
    else
        return false
  //check map
  for(int i : map.values())
    if(i!=0)
      return false
    else
      return true;
}

Though the above approach is cleaner, the same can be achieved by the array as well, with array being of 26 letters to denote 26 alphabets.
This is essentially asking you to compare if two sets are equivalent, thinking of the strings as a set of characters where the order doesn't matter.

Time Complexity - O(n)
Space Complexity - O(constant)
Here constant is actually 26.
For an O(n) runtime algorithm, you could iterate through the first string, and count the number of instances of each letter. Then iterate over the second string and do the same. Afterwards, make sure the counts match.

To save a little bit of storage, use the same array for both counts; while iterating the first string, increment the count once for each letter. When iterating the second, decrement. Afterwards, make sure each letter count is zero.

When running this algorithm over generic sets of objects, one might store the counts in a dictionary keyed off the object's hashcode. However, because we're using a relatively small alphabet, a simple array would do. Either way, storage cost is O(constant) (where constant is 26) .

boolean are_anagrams(string1, string2){

    let counts = new int[26];

    for each char c in lower_case(string1)
        counts[(int)c]++

    for each char c in lower_case(string2)
        counts[(int)c]--

    for each int count in counts
        if count != 0
            return false

    return true
}


Approach 3 - Sorting the string and comparing
1) Sort both strings
2) Compare the sorted strings

import java.util.Arrays;
... 

public boolean areAnagrams(String s1, String s2) {
    char[] ch1 = s1.toCharArray();
    char[] ch2 = s2.toCharArray();
    Arrays.sort(ch1);
    Arrays.sort(ch2);
    return Arrays.equals(ch1,ch2);
}

Time complexity : So this approach has O(nlogn)+n complexity.

1 comment:

  1. An anagram is a type of word, the result of rearranging the letters of a word or phrase to produce a new word or phrase, using all the original letters exactly once.

    For example: orchestra can be rearranged into carthorse or cat can be rearranged into act.

    We can find out the anagram strings using below algorithm:

    public static boolean isAnagram(String str1, String str2) {
    if (str1 == null || str2 == null) {
    return false;
    } else if (str1.length() != str2.length()) {
    return false;
    }

    Map map = new HashMap();

    for (int i = 0; i < str1.length(); i++) {
    char characters = str1.charAt(i);
    int charInStr1 = map.containsKey(characters) ? map.get(characters) : 0;
    map.put(characters, ++charInStr1);
    char charFromStr2 = str2.charAt(i);
    int charsInRight = map.containsKey(charFromStr2) ? map.get(charFromStr2) : 0;
    map.put(charFromStr2, --charsInRight);
    }

    for (int occurrences : map.values()) {
    if (occurrences != 0) {
    return false;
    }
    }
    return true;
    }

    I found below useful links for more information

    Write program to find if Strings are anagram

    ReplyDelete