Saturday, November 30, 2013

Algorithms: Prime Number Revisited, The Sieve of Eratosthenes

The last time we played with prime numbers, we created an algorithm that would start with the first prime as a seed, and then discover each subsequent prime in the sequence by testing it against all known primes. The problem with this algorithm is that although it works, it needs a large number of computations once the prime list gets high enough.

What if we could eliminate many of those computations, and use a boolean array to immediately tell if a number was prime or not?

The good news is, we can. And this method has been known since the time of the ancient Greeks. It's easy enough to do by hand, but when working with very large numbers, computers can do a much faster job than a human. The method is known as the Sieve of Eratosthenes.

In order to use the method, you must first have a target. Let's say that you want to find all the prime numbers up to 10. You start by making a list of all numbers from 2 to 10.

2, 3, 4, 5, 6, 7, 8, 9, 10

The first number on your list is 2, and that's because 2 is the first prime number. I will underline it to mark it as prime.

2, 3, 4, 5, 6, 7, 8, 9, 10

From here, you cross off all the numbers that are multiples of 2.

2, 3, 4, 5, 6, 7, 8, 9, 10

The next number that is not crossed off is the next prime in the series, which is 3. We underline 3 to mark it as prime, and cross off the multiples of 3.

2, 34, 5, 6, 7, 8, 910

The next number not crossed off is 5, which is prime. We repeat the process with 7, and we are left with our list of primes.

234, 56, 78910

The algorithm is straightforward, with a base case and a finite number of steps. We can easily code this to accept a target number, generate an array of the numbers between 2 and the target, and then run the sieve method on the list to generate all primes. We can even optimize it so that we are not working with a number type at all, but rather a boolean type using a hash-table.

In my implementation, I create a hash-table as a simple array of booleans. The index of the array represents the actual number. If the value is true, then the number is prime. If the value is false, then the number is non-prime. Once the hash-table has computed, the algorithm will walk the table and add each prime to a list of numbers, returning it to the calling method.


        static List<int> SieveOfEratosthenes_Full(int x)
        {
            //The sieve will generate a list of primes between 0 and x.

            //Array of booleans to determine whether or not a given index is a prime number.
            //For example, primes[2] == true because 2 is a prime number while primes[4] == false because 4 is not prime.
            bool[] sieve = new bool[(int)(x + 1.0)];

            //Seed the starting values. 0 and 1 are not considered prime and are seeded as false.
            sieve[0] = false;
            sieve[1] = false;

            //2 is the first prime number and is seeded as true. Each number after 2 is considered prime until ruled out by the sieve
            //and is seeded as true.
            for (int y = 2; y <= x; y++)
                sieve[y] = true;

            //The algorithm walks the array. If the number is prime, then all multiples of that number are marked as false.
            //The algorithm continues ruling out numbers based on their status as a multiple of a prime.
            //The algorithm starts at 2, the first prime.
            for (int y = 2; y <= x; y++)
                if (sieve[y])
                    for (int z = 2 * y; z <= x; z += y)
                        sieve[z] = false;

            List<int> primes = new List<int>();

            for (int y = 2; y < sieve.Length; y++)
                if (sieve[y])
                    primes.Add(y);

            return primes;
        }




The algorithm is in place and runs quite fast. While discussing ways to improve it, several programmers suggested to only test odd numbers as we know all even numbers except 2 are non-prime. By using a hash-table with no gaps, we would cut the physical memory required down by half.

The second implementation I came up with does just that. Simple formulas will convert a number to an index and back again. The hash-table holds all odd values from 3 to the target number. The issue with the implementation is that the code becomes more complex. Even with comments, it's not immediately clear why some of the in-line calculations are happening.


        static List<int> SieveOfEratosthenes_Odd(int x)
        {
            //The sieve will generate a list of primes between 3 and x, only odd numbers.

            bool[] sieve = new bool[(int)(Math.Ceiling(x / 2.0) - 1)];

            for (int z = 0; z < sieve.Length; z++)
                sieve[z] = true;

            //The sieve only needs the odd numbers from 3 to x. The prime's index in the sieve is: Math.floor(n / 2) - 1
            //The index can be converted to the prime it represents by the formula: index * 2 + 3
            for (int z = 0; z < sieve.Length; z++)
                if (sieve[z])
                {
                    int number = z * 2 + 3;

                    //When incrementing, only investigate the odd numbers.
                    for (int y = 3 * number; y <= x; y += (2 * number))
                        sieve[(int)Math.Floor(y / 2.0) - 1] = false;
                }
                        
            //Create the list of primes. Seed with 2 and pull the odd primes from the sieve.
            List<int> primes = new List<int>();

            primes.Add(2);

            for (int z = 0; z < sieve.Length; z++)
                if (sieve[(int)z])
                    //The index can be converted to the prime it represents by the formula: index * 2 + 3
                    primes.Add(z * 2 + 3);

            return primes;
        }

The first implementation is much easier to read and follow than the second. The second implementation uses half the memory, but an array of over 2,000,000 boolean values still takes up less space than the average video or music file. The footprint is theoretically 1.9MB for the full hash implementation and 0.95MB for the odds-only implementation. What about speed? I ran a simple test harness to determine the time used to run both implementations. The speed saved by only testing odd numbers is then lost during the additional calculations. Here are results from a few runs:






The odds-only implementation saves a small fraction of the time. In one case above, it even took longer than the full implementation. Unless processing speed and memory footprint of the hash-table are tight constraints, I'd opt to use the full implementation as it would be easier to read and maintain in a team-environment.

Friday, July 19, 2013

Algorithms: Prime Numbers and Prime Factorizations

EDITED: Removed a bug discovered by a reader regarding very large prime numbers. Any input over 1,000,000 will take a while to run.

I was recently introduced to Project Euler. This site contains many problems for programmers to test both their math skills and coding skills. One of the problems dealt with finding the largest prime factor of a very large number. I enjoyed this problem, and decided to make a small program in C# to find the prime factorization of ANY number between 2 and 9,223,372,036,854,775,807 (the highest value stored in long). The speed for the high end isn't great and would be improved if the program used an existing list of prime numbers. The program does just fine for reasonable input values, however, and the algorithm is a good tool for those wishing to have a quick prime factorization tool.

I broke the problem down into two parts. The first part is how to generate a sequence of prime numbers. The second part is finding the prime factorization using the list. Since my program does not know what the highest prime number is until the process is completed, I only generate new prime numbers as factorization candidate as needed.

Generating a sequence of primes is the first step. A prime number, be definition, is only divisible by 1 and itself. It has no other factors. All non-prime numbers can be broken out into a factorization of primes. For example, 10 is not prime as it is divisible by 1, 2, and 5. The prime factorization of 10 is 2 * 5. Since all non-prime numbers have a prime factorization, any non-prime number that is a factor of another number ensures that the primes in the prime factorization of the lower number are also factors of the higher number. For example, 10 is a factor of 20 and 10 is not prime. Since the prime factorization of 10 is 2 * 5, we know that 2 and 5 are both factors of 20. Using these deductions, we can generate a list of primes merely by testing numbers against all known primes that are lower than the given number.

I wrote a method that does exactly that. Given a list, it will populate that list with the next prime number in the sequence. See the code below, the method is named generateNextPrime. It takes in a list as the only . argument. The only prime number that is hard-coded is 2. When an empty list is passed in, 2 is seeded as the first prime. All subsequent primes in the list are discovered. When the list is not empty, the method determines the last prime in the list and then starts at the very next number. For example, if the list contains { 2 }, then the method will start searching for the next prime at 3. The method compares the number to all discovered primes. Since 2 is greater than the square root of 3, we know that 3 is prime. The list now contains { 2, 3 } as prime numbers. The next time the method is invoked, it will start at 4. 2 is a factor of 4, so 4 is not prime and is not added to the list. The method then moves to 5. Both 2 and 3 are not factors of 5, so we add 5 to the list as prime. The list now contains { 2, 3, 5 }.

This method is known as trial division. It is a lot of work if done by hand, but computers are great at doing monotonous, high-volume calculations like this quickly.

The next part of the problem is to find the prime factorization of a given number. I used trial division again to accomplish this. A list is generated that contains prime numbers to use as candidate factors. A second list is used to store numbers that are actual prime factors of the target. A prime number is generated, in sequence, one at a time and tested against the target. When a prime factor is found, the target is reduced by dividing it by the given factor and the factor is added to the list of prime factors. The prime is checked again to see if it appears in the prime factorization multiple times (such 20, which has a prime factorization of 2 * 2 * 5). Each time the target is reduced as a prime factor is found, and when the prime fails as a candidate, the program generates the next prime in sequence to test as a prime factor candidate.

Here are some screenshots of the console program in action, using inputs 10, 20, 65, and 1,234,567,890





The code, with comments:

    class Program
    {
        static void Main(string[] args)
        {
            //The target number to generate a prime factorization for.
            long target;

            //List to store the prime factors of the target.
            List<long> primeFactorList = new List<long>();

            //List to store all primes. Generated as needed.
            List<long> candidateList = new List<long>();

            //Seed the candidateList with 2 as the first prime.
            generateNextPrime(candidateList);

            //Get the target number.
            Console.Write("Please enter a number to generate a prime factorization for: ");
            string userInput = Console.ReadLine();

            //Validate input is a valid long number. If not, get new input.
            while (long.TryParse(userInput, out target) == false || target < 2)
            {
                Console.Write("\nPlease enter a positive number greater than 1: ");
                userInput = Console.ReadLine();
            }

            //Main while loop. Once the target is 1, we know that we have found all the prime factors.
            while (target > 1)
            {
                //The value of the index at the last prime discovered in our candidate list.
                int lastIndex = candidateList.Count - 1;

                Console.WriteLine("\nChecking against prime number " + candidateList[lastIndex]);

                if (target % candidateList[lastIndex] == 0)
                {
                    //Our target is divisible by the prime number candidate. Add this to the prime factorization list and reduce
                    //our target by dividing it by this prime factor.
                    target = target / candidateList[lastIndex];
                    primeFactorList.Add(candidateList[lastIndex]);
                    Console.WriteLine("\nPrime number " + candidateList[lastIndex] + " added as a prime factor.");
                }
                else
                {
                    //Our target was not divisible by the current prime number candidate. Find the next prime candidate to test.
                    generateNextPrime(candidateList);
                }
            }

            //All prime factors have been found by this point in the program. Now we just need to output the prime factorization.
            if (primeFactorList.Count == 1)
            {
                //1 prime factor.
                Console.WriteLine("\nThe number is prime.");
            }

            else
            {
                //Multiple prime factors
                Console.Write("\nThe prime factors are: ");
                foreach (long primeFactor in primeFactorList)
                {
                    Console.Write(primeFactor);
                    if (primeFactor < primeFactorList[primeFactorList.Count - 1]) Console.Write(" * ");
                }

            }

            Console.Write("\n\nPlease press any key to close...");
            Console.ReadKey();
        }

        static private void generateNextPrime(List<long> primeList)
        {
            //This method will generate the next prime number in the sequence of all primes stored in primeList. The method uses trial division by comparing
            //the candidate number to all previous primes.

            //If the list is empty, seed it with 2 as the first prime in the sequence.
            if (primeList.Count == 0)
            {
                primeList.Add(2);
                return;
            }

            //Find the last prime in the discovered sequence. Start looking for a new prime at the very next number.
            long lastPrime = primeList[primeList.Count - 1] + 1;
            bool foundPrime = false;

            //Loop until the next prime is found.
            while (foundPrime == false)
            {
                bool notPrime = false;

                //Compare the candidate number to all previous discovered primes. If the candidate is not divisible by any of the primes, then its prime factorization
                //will be 1 and itself, which means it is indeed prime.
                for (int x = 0; x < primeList.Count; x++)
                {
                    if (lastPrime % primeList[x] == 0)
                    {
                        //This number is divisible by a previously discovered prime number. It is not prime.
                        notPrime = true;
                        continue;
                    }
                }

                if (notPrime == false)
                {
                    //The number is prime. Add it to the prime list and flag the while loop to stop.
                    foundPrime = true;
                    primeList.Add(lastPrime);
                }
                else
                {
                    //The number was not prime. Try the next number.
                    lastPrime++;
                }
            }
        }
    }




Sunday, February 24, 2013

Project Examples: 3D Lightsaber Model

I apologize to anyone who wants to read this blog regularly as I have not had time to update it with a proper coding post in a few weeks. I am in the home stretch of my undergraduate work. All of my classes will be completed by the end of the spring term. Once that is done, I have two projects to complete for my degree, which will both be games. I am excited and cannot wait to complete them.

In the meantime, I wanted to show off a small project I did this week for a 3D modeling class. We are using Maya 2011 for the class. The project was to model a weapon using primitive polygon modeling. Since primitive polygons are great for modeling man-made objects, and since natural curvatures take a lot more time, I decided to model a lightsaber from the Star Wars series. Bonus points to anyone who can guess which movie and hilt I used as reference, but please note that some of the finer details are missing from the model.





The next stage of this modeling project will be to work on textures, modeling a character to use the weapon, and then finally to animate the scene. I'm not an artist, but I'm happy on how this one turned out.




Thursday, February 14, 2013

Miscellaneous: Videogame Violence: A Second Look from the Perspective of the Gaming Culture

I was going through some old files on my PC to decide what needs to be moved to archive and off my main hard drive. I found this while looking through some of the folders. It was a fun little positional essay I wrote concerning videogame violence for one of my composition classes. I thought I'd repost it here for fun:




Videogame Violence: A Second Look from the Perspective of the Gaming Culture

The party of warriors crashed through the underbrush into the clearing. An elf, tall and lean, was carrying an ornate golden bow. A man, strong and fearless, was sporting armor and broadsword that seemed more decorative than functional. And a dwarf, stout and sturdy, yelled a battle-cry through his thick beard, swinging his large axe in wide arcs. In front of the trio was a clearing in the dense forest where five goblins had set up a small camp. The fire in the center was burning bright, and the goblins scrambled to pick up their own crude weapons for the coming fight. The elf, man and dwarf made short work of the group of goblins, their weapons hewing through flesh and bone. The corpses lay in a pile, and the trio began looking through the camp for anything valuable. They would carry the spoils with them, food and healing herbs, to the next small camp only a short walk ahead. Perhaps there they would find a weapon better than what they currently possessed; A bow that made its bearer shoot faster and farther or a sword with cryptic runes that would slowly heal its bearer’s wounds. They would continue this pattern throughout the night. They would never tire, for these three were not living in the conventional sense, but in the virtual sense. They were avatars, or a virtual representation of a person, for a popular game known in gaming culture as an “MMORPG,” or, “Massively Multiplayer-Online Role-Playing Game.” When the users who created them would log off the virtual world, they would not move. Their world would persist, but they would wait motionless and lifeless where they stood until their user logged back on for more carnage and mayhem. The world of fantasy violence awaits.
            Since their humble inception, videogames have become a significant part of American culture. The stereotypical gamer, anti-social and brooding, has been portrayed in the media in a negative light for many years. Only recently has some of the stigma of playing videogames lifted as they have become more mainstream and widely accepted by society. Some of the stigma still lingers, however. Images still haunt communities around Littleton, Colorado and Paducah, Kentucky, famous for their school shooting incidents. Various psychologists perform studies on videogames and their effects on children and their conclusions are often the subject of talk of parents. When looked at by other professionals who are self-professed “gamers,’ the results of these studies are sometimes interpreted very differently. Videogames have been unfairly criticized and vilified for a number of societal problems and incidents due to various misconceptions about gaming culture. Often, benefits from playing videogames are not studied or overlooked. Stereotypes concerning people who play videogames devoutly are often false and results from studies on videogames can often be seen as skewed by the media.
            Concern over videogame violence can be seen as far back as the 1970s with an obscure arcade cabinet game called Death Race. Like all early videogames, the graphics were crude and left a lot to the imagination. A block was to represent the car, and the player earned points for running down stick-figures scattered in the road. The stick-figures were to represent “Gremlins,” but this style of graphic violence caused the game to be banned in some areas of the country (Kushner, 2003, p. 79). Even with very little graphical detail, and a small set of instructions that clearly stated the car was not running over people but rather monsters, society let their imagination run wild. Unfortunately, they overreacted and assumed the worst. Children would be corrupted and tempted to run down pedestrians in real-life.
The controversy in Death Race was the first major issue to strike the burgeoning industry concerning violence. Videogames were a brand-new medium, captivating the minds of many computer enthusiasts at the time. By the time the first videogames were shown to the public, society did not know how to react. Chaplin and Ruby write in their 2005 book Smartbomb: The Quest for Art, Entertainment, and Big Bucks in the Videogame Revolution, “Considering that most people in America had never seen a computer in 1962, to actually be able to play with one – to control a visual representation on screen – left them speechless” (p. 46). The backlash to videogames in the early years is easy to understand since many in the general public, in fact, did not understand videogames or computers. One of the very first videogames, Spacewar, was discontinued from mass-production due to general confusion from the large instruction manual that came with it. This was something that very few people had seen or even thought of. The games were created by brilliant programmers working in labs at MIT or Berkeley on the very first computer systems available to the academic and business world. They viewed the programs they created as a way to expand their problem-solving skills. They were pioneers in an industry that did not yet exist and were misunderstood by the general public.
Videogames gradually gained acceptance and popularity in American culture. The industry crashed and recovered in the 1980s. A Japanese company called Nintendo succeeded in gaining a large share of the market and producing a game console that was affordable enough to be placed in many homes. Many programmers and computer enthusiasts challenged themselves to create games on their own systems, inspired by games and characters such as Mario. Two of these enthusiasts were John Carmack and John Romero, founders of id software and creators of several violent games such as Wolfenstein, Doom, and Quake. These games would go on to be linked to the tragic school shooting incidents that took place in Littleton, Colorado and Peducah, Kentucky.
The two teen shooters from the Columbine shootings were said to be warped by their favorite music and their favorite game, Doom. Doom was shown in a negative light in the media, portrayed as sick fantasy and ultra violent. Today, many writers look back on the incident as an example of how an industry is portrayed when conclusions are rushed. Fantasy violence was not new and had existed before in other forms of media and imagination. Kushner, an American writer and Journalist, stated in his book Masters of Doom:
Violent fantasy, of course, had an ancient history. Readers had been fascinated by the gore in Beowulf for over a thousand years (“The demon clutched a sleeping thane in his swift assault, tore him in pieces, bit through the bones, gulped the blood, and gobbled the flesh”). Kids played cops and robbers, brandishing their guns and flying backwards in imagined bursts of blood. (2003, p. 79)
Had the shooting incidents not occurred, the fantasy violence in games such as Doom may have been overlooked or unnoticed. The outrage over the incidents fueled public rage and led to a scramble to make sense out of a senseless act. It is argued that the music and videogames that these teens played were the scapegoat for the incidents because they had not yet reached mainstream acceptance. Doom was distributed not through retailers, but freely through the internet. The new forms of metal played by acts such as Nine Inch Nails and Marilyn Manson were demonized and also associated with the same games. Kushner explained that Trent Reznor of Nine Inch Nails had agreed to record music for Carmack’s next violent game, titled Quake (2003, p. 213).
Some proponents of the videogame industry view fantasy violence as essential for a child’s normal development, teaching him or her to be able to distinguish fantasy from reality much better than a child who has not had any experience with fantasy violence. Charles Herold, a writer for the New York Times and videogame critic, challenged parents to attend a boxing match with their child and watch as the crowd cheers and a man is beaten until he can no longer compete. He stated, “Perhaps then people will realize that an animated spear is better than a real fist and they will e-mail me with their gratitude for luring the world away from true violence. They are most welcome” (2005, para. 21). It’s true that sporting events today are not the same violent spectacles the Romans put on for their public, but real violence in sports can still have an effect on a child. A favorite star gets physical with another player, and a child may assume that this sort of physicality is acceptable. A child who plays a violent game will know the level of violence on screen is fantasy and is not acceptable in the real world.
Studies on how videogames affect children are numerous. One of the most active in the field of study is Anderson, a PhD and journalist for the American Psychological Association. His report on violent videogame studies state that children who routinely play violent videogames are subject to a moderate but predictable rise in aggressive behavior and thought (2003, para. 4). A moderate increase in aggression may serve as proof to some that videogames are responsible for all aggressive behavior. The very same conclusion was drawn in the past when studying the effects of contact sports on college and high school students. Goleman, a journalist for the New York Times, wrote in 1985, “Since football athletes might be more aggressive to start with, the telling point in this research is that, as the season goes on, these athletes grow more hostile and aggressive, and remain so in the off-season. That result was not found with swimmers” (para. 12). Aggressive behavior in individuals has always been present, and experts and researchers will continue to do studies on activities and other areas of our culture that contribute to aggressiveness. Videogames were not the first scapegoat of aggressive behavior, and will not be the last. Organized sports such as football and hockey are an accepted part of modern culture and serve many benefits to the players. These sports are not banned because of an increase in aggressive behavior in the participants. Videogames, similarly, have many benefits not reported in the aggression studies.
Skills that can benefit from videogames vary. Some, in common with organized sports, let the player build upon his teamwork skills. Others are not readily apparent but are now being studied, such as the effects of the player’s problem solving and critical thinking skills. Young, an American journalist, studied research by Steinkuehler. Young reported on Steinkuehler’s argument, “’People were actually—no kidding—gathering data on things like the game monster's behavior, putting it in an Excel spreadsheet, and building little mathematical models to try to beat the monster,’ she told me recently. The game teaches complex problem solving and collaborative learning, Ms. Steinkuehler argues” (2010, para. 4). If these studies continue and are able to ascend to the same level as the aggression studies, many may see all the benefits that videogames may offer and stop focusing on just the behavioral studies that indicate videogames instill aggressive behavior and anti-social tendencies in players.
As a testament to the skills that videogames can teach and hone, the government has been using videogames extensively. In the mid 1990s, when Senator Lieberman and President Clinton were calling for regulation of the videogame industry due to its violence, the military had licensed the Doom engine built by Carmack, and built a new version called Marine Doom. This version of the game was used to teach soldiers teamwork in an urban combat environment since eight players were able to cooperatively conquer the many levels (Kushner, 2003, pp. 193-194). More recently, the armed forces have released their own game series entitled America’s Army. The games can be downloaded and played for free from the web-site www.americasarmy.com. Inside the games, the player must go through virtual boot camp that includes conquering an obstacle course, learning basic and advanced rifle skills, and even sit inside a virtual classroom to learn other vital areas of warfare, such as combat first aid. Once the player finished boot camp, he is allowed to play with others on various locations that pit two squads against each other, much like war games played at real boot camps. The game is hugely popular and is widely known to be one of the Army’s best recruitment tools. From within the game, the player can request to have information on joining the armed forces sent to his address or be contacted directly by a real recruiter over the phone.
Videogames have slowly been becoming accepted as part of mainstream culture over its history. The violence in videogames has often been cited as a societal problem that contributes to delinquency and violent behavior in youths. In much the same way the music industry and motion picture industry were targeted years before, videogames became a scapegoat for the public and for politicians. With the help of greater societal acceptance and studies that reveal benefits from videogames, the industry is slowly shedding the stigma associated with fantasy violence and aggressive behavior. Even the government uses videogames as a training and recruitment tool for the military. The next time a zombie is shown being dismembered on a television or computer monitor, remember that violence is not the cause for societal behavioral problems and that the game may be teaching the player critical thinking and problem solving skills. He must figure out a way past all obstacles whether they are a part of the environment or a monster standing in his way to the goal. He may be working as part of a larger group, tackling a large obstacle that he would not be able to handle on his own. His culture is not one of endless violence, but of teamwork, problem solving, and critical thinking. The violence depicted is no different than the popular violence in major motion pictures, television, and sports. Videogames will receive fair judgment once all of society is able to participate and understand the concepts and challenges behind their design.


References
Anderson, C. A. (2003, October). Violent videogames: Myths, facts, and unanswered questions. American Psychological Association. Retrieved April 12, 2010 from: http://www.apa.org/about/psa/2003/10/anderson.aspx
Chaplin, H. & Ruby, A. (2005). Smartbomb: The quest for art, entertainment, and big bucks in the videogame revolution. Chapel Hill, North Carolina: Algonquin Books of Chapel Hill.
Goleman, D. (1985, August 13). Brutal sports and brutal fans. The New York Times. Retrieved April 18, 2010 from: http://www.nytimes.com/1985/08/13/science/brutal-sports-and-brutal-fans.html
Herold, C. (2005, March 24). Fighting on the screen, out of harm’s way. The New York Times. Retrieved April 12, 2010 from: http://www.nytimes.com/2005/03/24/technology/circuits/24game.html
Kushner, D. (2003). Masters of doom: How two guys created and empire and transformed pop culture. New York: Random House.

Young, J. (2010, January 24). Five teaching tips for professors—from video games. The Chronicle of Higher Education. Retrieved April 19, 2010 from: http://chronicle.com.bakerezproxy.palnet.info/article/5-Lessons-Professors-Can-Le/63708/

Monday, February 11, 2013

Algorithms: Replace all ' ' characters in a string with "%20"

In McDowell's (2012) Cracking the Coding Interview, we arrive at problem 1.4.

1.4 - Write a method to replace all spaces in a string with "%20". You may assume that the string has sufficient space at the end of the string to hold the additional characters, and that you are given the "true" length of the string. (Note: If implementing in Java, please use a character array so that you can perform this operation in place.)
EXAMPLE
Input: "Mr John Smith    " 
Output: "Mr%20John%20Smith"

This problem was pretty straightforward, and I'd like to alter it a bit. I am not going to assume that the string has enough spaces at the end, because somewhere down the line, some engineer will not read the documentation and toss in a string with no trailing spaces. Since I assumed that there were trailing spaces, that's a nice crash that was unanticipated. Instead, my method will trim any space characters that are leading or trailing the string. That way, we are sure that every string that is passed to this method DOES NOT have enough space to store the extra characters. Every string is passed through the Trim() method to be equal. We can avoid having to throw an exception when we have some strings with trailing spaces and some strings without.

The original problem was also pretty clear that in-place operations are preferred. I wrote my method in C#. This means that the strings are immutable, and I cannot swap characters in place. Instead, I tried to do the next best thing. Convert the string to a character array, swap the characters in place in that array, then convert the array back into a string. I understand that new memory is allocated for the character array, and then memory is allocated again for the new string, so the method isn't quite as in-place as it would be in Java, but that's OK for the exercise.

The first thing I do (after the Trim()) is to count the number of spaces in the string. For each space, we need two more empty characters added to the string to make sure there is enough room to store all the "%20"'s. After that is done, we convert the string to a character array.

From here, we begin moving characters from the end of the original string to the end of the lengthened string. For all non-space characters, it's a simple move. When we get to a space character, we replace it with "%20". In either case, we decrement our indices carefully until we get to the beginning of the string. From here, we cast the array back into a string and store it as the original variable, which is passed back to the caller as it was given as reference.

My goal for this method was not the shortest code, but the shortest time complexity. The time complexity is O(n), which means it will scale well when given large strings. Somehow, I have a feeling somebody will try to pass in a string that contains the entire manuscript of "War and Peace" so that they have an HTML version whenever they feel the need to read the classic...

Here is the code:


//Brandon Theolet
//
//1.4 Write a method to replace all spaces in a string with '%20'. If implementing in Java, please use a character array so that you can perform this operation in place (McDowell, 2012).
//
//McDowell, G. L. (2012). Cracking the coding interview: 150 programming questions and solutions. Palo Alto, CA: Career Cup, LLC.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace Problem1_4
{
    class Program
    {
        static void Main(string[] args)
        {
            //Get a string from the user.
            string myString;

            Console.Clear();
            Console.Out.Write("Please enter a string: ");
            myString = Console.In.ReadLine();

            //Call the method and display a before and after
            //comparison showing the transformation.
            Console.Clear();
            Console.Out.WriteLine("Given string:\t\t\"" + myString + "\"");
            insertHTMLSpaces(ref myString);
            Console.Out.WriteLine("\nTransformed string:\t\"" + myString + "\"\n");
         
            //Pause to display results before closing.
            Console.Out.Write("Please press any key to exit...");
            Console.ReadKey();
        }

        static void insertHTMLSpaces(ref string myString)
        {
            //The original problem has been modifed for
            //this exercise. It stated that we can assume
            //there is enough space at the end of the
            //string to insert the characters. Instead,
            //we are not assuming there are enough spaces
            //to do the in-place operations. This means we
            //have to create the space first. For every
            //space present in the string, we need to add
            //two more.
         
            //If extra spaces were given at the beginning or end, remove them.
            myString = myString.Trim();

            //The true length of the string.
            int stringLength = myString.Length;

            if (stringLength < 1) return;

            for (int myIndex = 0; myIndex < stringLength; myIndex++)
            {
                //Add additional spaces for in-place operations.
                if (myString[myIndex] == ' ') myString += "  ";
            }

            //The last character index in the lengthened string.
            int lastCharIndex = myString.Length - 1;

            //The last character from the original string.
            int partitionIndex = stringLength - 1;

            //Convert the string to a char array for in-place swaps.
            char[] myCharArray = myString.ToCharArray();

            while (partitionIndex >= 0)
            {
                if (myString[partitionIndex] != ' ')
                {
                    //Character at partitionIndex is not a space.
                    //Move it to the rear of the string.
                    myCharArray[lastCharIndex] = myCharArray[partitionIndex];
                    lastCharIndex--;
                    partitionIndex--;
                }
                else
                {
                    //Char at partitionIndex is a space. Replace
                    //it as neccessary.
                    myCharArray[lastCharIndex] = '0';
                    myCharArray[--lastCharIndex] = '2';
                    myCharArray[--lastCharIndex] = '%';
                    lastCharIndex--;
                    partitionIndex--;
                }
            }

            //Allocate the transformed string back to the original
            //variable since it was passed by reference.
            myString = new string(myCharArray);

            //O(n) time complexity, in-place swaps in the char array,
            //method ensures there is enough memory to insert extra
            //characters. Return control to caller.
            return;
        }
    }
}


Here are some screen shots of the program in action:



Reference:

McDowell, G. L. (2012). Cracking the coding interview: 150 programming questions and solutions. Palo Alto, CA: Career Cup, LLC.

Tuesday, February 5, 2013

There is a Fine Line Between Learning and Failure

I was talking with some of my fellow students today and the discussion turned to academic projects. Which academic projects had us learn the most?

Most of the examples that I heard were projects that were overly easy. Very little thought process was put in place for these. The student was able to complete it way ahead of the deadline by applying an established design principle or design pattern. Basically, a problem was given to a student who already knew the answer.

This isn't learning. This is merely practice.

Problems like these definitely have their place in a college curriculum. Practice is essential to hone any skill and keep it sharp. Major League ballplayers don't stop hitting balls in batting practice once they've learned the exact technique their hitting coaches want them to learn. Likewise, programmers shouldn't stop applying existing design patterns to new problems that merit the pattern once they've learned it.

Practicing skills was not the topic of the original question. We asked which project had us learn the most. My answer was a project that I came very close to failing. Some of the guys in the conversation had the same class and the same project, and they shuddered at the thought of it. Many of them had failed that project in at least one requirement of the assignment.

What was the project? It was in a class called Game Console Development and involved writing a working video game in the XNA 4.0 framework that would run on both a PC and an X-Box 360. The course itself was six weeks long, with five weeks given to the project, full of all the standard reading and assignments given in any college course.

This project was not a greenfield project. A greenfield project, for those not familiar with the term, is a code project created from scratch. Those projects tend to be easier for college undergrads because that individual has no questions about the framework because he or she wrote all of it. This project used the XNA 4.0 framework, which is vast and documented (quite well) over at the MSDN web-site. Not only did we have to contend with this framework that was brand new to most of us, but we were also given another framework inside of that one to build around.

Because of the length of the course, our instructor wanted us to spend most of our time on programming game logic and not setting up an infrastructure to display menus and options and load the game assets. At first, this was a welcome blessing. The menu system seemed easy enough to understand with the example code at first glance.

Then we were asked to change it. Because we couldn't leave the default settings for these menus in place since they had nothing to do with our game. The blessing quickly became a curse.

If you would like to check out the code in question, titled Game State Management Example, it is available for everyone here.

Now, looking at this production quality code, written by some great software developers who are loads smarter than I am, made me feel inferior. Not only had I never written anything close to this elegant, beautiful sample framework, but at first I couldn't tell you what many of the lines actually did. It was all Greek to me. I got the overall picture, but instead of using this framework as a black box, I was expected to open it up and tinker with the fragile components. What does this do? I had no idea where to start. Changing some of the code often left me with a bunch of code that no longer compiled.

Now, compare this to the projects discussed earlier, where the students had a clear idea of what was needed and knew every line of code by heart because they authored it. Those projects were practice, and this was learning.

I knew what the results of the sample were. I had run the code on both my PC and X-Box 360. There was an abstract data type that defined what a screen was, and then there were classes that used these defined screen properties to define specific types of screens. That much was clear. I could also find the one screen that proudly displayed the message, "Game code goes here..." So, my game would be coded there, but for the rest of it...I had no idea.

I wanted to be angry with my instructor for working up the assignment this way. Why would he give us code that was so advanced compared to our study of .NET and C# that we would have no idea how to work with it? I tried to bring this up to him as politely as I could, and he gave me an answer. His answer was so important to how things actually worked, and I felt like an idiot for not realizing it.

My instructor, Andy Dunn, formerly of Microsoft and now senior game programmer extraordinaire at Exato Game Studios, told me that rarely in the real world would I be implementing code from scratch. More likely, I will debug, maintain, and add to existing code much like we were doing in this assignment. I was floored, like a boxer who is staring at the stadium lights and can't find the will or energy to stand back up for another round. I was ready to tell myself that if I couldn't understand this, then I shouldn't be in software development.

Good thing I didn't listen to myself about quitting.

There was a lot of trial and error. There was a lot of back-and-forth discussion about the fine details of the framework with Andy. And there was a lot of frustration. But, I ended the course with a passing project. I had not been able to implement all of the features that I wanted to, but I was able to implement all of the features required for the assignment and then some. I was also able to secure some positive feedback from Andy, who told me it looked very polished, you know, for a student project.

I came very close to failing this project because I was ready to quit. I was frustrated with not knowing how all of this code worked. To me, it was pure voodoo. The Microsoft team that wrote it was using arcane PC wizardry to get XNA and .NET to do things I had never been exposed to. I saw little value in using this framework as a foundation for my game, until my mentor stepped in and got me to do it. I forced myself to continue failing at this project until I got it working.

And then I no longer felt like a failure, but rather like a ragged and tattered survivalist who was lost at sea. The design patterns and challenges about sorting small arrays and comparing drop-down list items and writing classes to do binary arithmetic were simply the kiddie-pool. I now saw the actual danger of the open sea and how easy it was to get swept away and drown. To avoid drowning, I would need to be able to dive into huge waves and swim for my life by picking through unfamiliar code and somehow making it work.

Microsoft Research published a paper on some of the skills that fresh graduates lack upon obtaining their first software developer position. Not surprisingly, working on a team with a large established code base was one of the top issues highlighted in the paper. I still feel inferior when comparing myself to some of the best that Microsoft has to offer, but I feel a bit better after reading that new hires in that company experienced some of the same problems that I did on my academic project.

What did I learn during this course? I learned that existing codebases take a lot of time to get familiar with when they are new to you. I learned that trial and error does not mean I'm a total idiot. I learned that three years of study lets me get out of the kiddie pool and see firsthand how much knowledge is actually present in the software development industry. But, the most important lesson of all, I learned that there is a fine line between learning and failure.

Algorithms: Determine if Two Strings Are Permutations of Each Other

In McDowell's (2012) set of problems, we come across problem 1.3: Given two strings, write a method to decide if one is a permutation of the other. If two strings are permutations, then they will have the same characters. Given "ABCDEF" and "FEDCBA", we can determine that these strings are permutations. They contain the same characters, just in different orders.

To go about solving this problem, I decided to use a similar approach to previous string algorithm problems and use an array to produce behavior that is similar to a hash table. An array is declared with spots equal to the number of possible characters. In extended ASCII, this is 256. My array has indices from 0 to 255.

There are two basic checks that I came up with to determine if the strings are permutations. The first is to check the length. If the strings are not the same length, they cannot be permutations. Next, I simply count the characters in each string. If the counts do not match, the strings are not permutations.

I did the counts by first adding the counts from the first string to the count array, and then subtracting the counts from the second string from the count array. At this point, each index in the count array should be 0. If any spot does not equal 0, then the strings are not permutations.

Here is the complete source code. I've written this one in C# in order to brush up on the basic syntax instead of C++.

//Author: Brandon Theolet
//
//McDowell, G. L. (2012). Cracking the coding interview: 150 programming questions and solutions (5th ed.). Palo Alto, CA: Career Cup, LLC.
//Problem 1-3: Given two strings, write a method to decide if one is a permutation of the other.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace Problem1_3
{
    class Problem1_3
    {
        static void Main(string[] args)
        {
            //Testing harness. Prompts the user for two strings
            //and invokes the method that compares for permuatations.

            string myString1, myString2;

            Console.Clear();
            Console.Out.Write("Please enter any string: ");
            myString1 = Console.In.ReadLine();

            Console.Out.Write("Please enter another string: ");
            myString2 = Console.In.ReadLine();

            if (isPermutation(myString1, myString2)) Console.Out.WriteLine("These strings are permutations.\n");
            else Console.Out.WriteLine("These strings are not permutations.\n");

            Console.Out.Write("Please press enter to continue...");
            Console.In.ReadLine();
        }

        static private bool isPermutation(string myString1, string myString2)
        {
            //If the strings are different lengths, they are not
            //permutations.
            if (myString1.Length != myString2.Length) return false;

            //Create an array to count the number of each specific
            //character in the strings.
            int[] characterCount = new int[256];
            int charIndex;

            //Populate the array with default value 0.
            for (int index = 0; index < 256; index++)
            {
                characterCount[index] = 0;
            }

            //Count the number of each character in the first
            //string. Add the count to the array.
            foreach (char myChar in myString1.ToCharArray())
            {
                charIndex = (int)myChar;
                characterCount[charIndex]++;
            }

            //Count the number of each character in the second
            //string. Subtract the count from the array.
            foreach (char myChar in myString2.ToCharArray())
            {
                charIndex = (int)myChar;
                characterCount[charIndex]--;
            }

            //If the strings are permutations, then each character
            //would be added to our character count array and then
            //subtracted. If all values in this array are not 0
            //then the strings are not permutations of each other.
            for (int index = 0; index < 256; index++)
            {
                if (characterCount[index] != 0) return false;
            }

            //The strings are permutations of each other.
            return true;
        }
    }
}


Reference:

McDowell, G. L. (2012). Cracking the coding interview: 150 programming questions and solutions (5th ed.). Palo Alto, CA: Career Cup, LLC.

Friday, January 18, 2013

Project Examples: Sudoku Solver with a GUI

Phase two of the Sudoku project is completed. The solver now has its own GUI, or graphical user interface. The interface was created in Adobe Flash and the Sudoku solver algorithm I wrote in C++ was ported to Actionscript 3.0. If you would like to see the C++ application in action, or view the source code for that application, the posts are available here:

Project Examples: A Sudoku Solver Implementation

Code Examples: SudokuPuzzle.h and SudokuPuzzle.cpp

Some minor changes were made to the code in order to accommodate the new input system. The grid that exists in the GUI is separate from the Sudoku puzzle that exists in memory as it was written for the C++ command-line application. There are simple methods in the Actionscript code that syncs the grid with the puzzle in memory.

The interface is very simple. Click on any box to enter a number. The solver will only accept valid entries. If an entry is not valid, the puzzle will not accept it. Any box that you click on will generate a list of valid candidates to assist the solver.

There are limitations that were carried over from the C++ code to the Actionscript version. For starters, if the solver is executed with too few given clues, then the solver has the potential to run for hours. I've added a check to the solver. If there are too many potential solutions, the solver will stop running and allow the user to keep editing the puzzle.

I've had successful test cases where 17 clues were given and over 4,000 solutions were found. The solver did not stop. In a case where 16 clues were given, and this was a very poorly designed puzzle, the solver stopped immediately.

This tool was designed to assist players in completing difficult Sudoku puzzles that are well-designed. Poorly designed puzzles may be unsolvable or contain too many potential solutions in order to let the solver generate them in a reasonable amount time (less than 3 minutes). If a poorly designed puzzle is entered, do not expect a solution to be found.

Please, feel free to test the new GUI for Sudoku Master. Still on my wish list is an algorithm that will generate new puzzles for users to try and solve, and the ability to save and load puzzles. For now, the existing framework will do for a small applet that will assist in solving puzzles.

 

Sunday, January 13, 2013

Code Examples: SudokuPuzzle.h and SudokuPuzzle.cpp

My latest project, a command-line interface Sudoku solver, used the class SudokuPuzzle. Here is the source code for the header file and implementation file for SudokuPuzzle. The class will be re-used in a project that creates a GUI (graphical user interface) for the solver and adds new features, such as the ability to save and load puzzles.

SudokuPuzzle.h

#ifndef SudokuPuzzle_Header
#define SudokuPuzzle_Header

#include <vector>

using namespace std;

class SudokuPuzzle
{
public:

    //Default constructor.
    SudokuPuzzle();

    //Returns a list of candidate values for a spot at (x, y).
    void getPossibleValuesAt(int possibleSolutions[], int x, int y);

    //Returns the number of possible candidates at (x, y).
    int getNumberOfPossibleValuesAt(int x, int y);

    //Attempts to add a number to (x, y). As long as the number
    //is a valid candidate, the method returns true.
    bool addNumberAt(int addNumber, int x, int y);

    //Removes a number at (x, y)
    void removeNumberAt(int x, int y);

    //Resets the puzzle grid and solution list.
    void newPuzzle();

    //Prints the puzzle.
    void printPuzzle();

    //Returns a vector container of all the possible solutions
    //found for the current puzzle.
    void solvePuzzle();

    //Determines whether or not the current puzzle is solved.
    bool isSolved();

    //Copies the puzzle into newPuzzle with a deep copy of the
    //grid.
    void copyPuzzle(SudokuPuzzle &newPuzzle);

    //Shows a solution in the list of solutions at the
    //specified index.
    void showSolution(int index);

    //Returns a solution in the list of solutions at the
    //specified index.
    SudokuPuzzle getSolution(int index);

    //Returns the number of solutions in the list of solutions.
    int getNumberOfSolutionsFound();

private:

    //Two-dimensional grid to store the puzzle.
    int myGrid[9][9];

    //Dynamically expanding vector to store all possible solutions
    //to the puzzle. Most puzzles only have 1 solution, but poorly
    //designed puzzles may have many.
    vector<SudokuPuzzle> possibleSolutions;
};

#endif

SudokuPuzzle.cpp

#include <iostream>
#include <cmath>
#include <vector>
#include "SudokuPuzzle.h"

using namespace std;

SudokuPuzzle::SudokuPuzzle()
{
    //Default constructor. This constructor creates an empty Sudoku grid.

    for (int x = 0; x < 9; x++)
    {
        for (int y = 0; y < 9; y++)
        {
            myGrid[x][y] = 0;
        }
    }
}

void SudokuPuzzle::getPossibleValuesAt(int possibleSolutions[], int x, int y)
{
    //This function will use the array possibleSolutions as a hash table to
    //determine the value of all candidates for the Sudoku grid at (x, y).
    //The hash table is then used by the calling method to determine which
    //candidates are allowed for that spot.

    for (int index = 0; index < 9; index++)
    {
        possibleSolutions[index] = 1;
    }

    for (int xPos = 0; xPos < 9; xPos++)
    {
        if (xPos != x && myGrid[xPos][y] > 0) possibleSolutions[myGrid[xPos][y] - 1] = 0;
    }

    for (int yPos = 0; yPos < 9; yPos++)
    {
        if (yPos != y && myGrid[x][yPos] > 0) possibleSolutions[myGrid[x][yPos] - 1] = 0;
    }

    int xGroup = floor((double)(x / 3));
    int yGroup = floor((double)(y / 3));
   
    for (int xPos = 0 + (3 * xGroup); xPos < 3 + (3 * xGroup); xPos++)
    {
        for (int yPos = 0 + (3 * yGroup); yPos < 3 + (3 * yGroup); yPos++)
        {
            if (yPos != y && xPos != x && myGrid[xPos][yPos] > 0) possibleSolutions[myGrid[xPos][yPos] - 1] = 0;
        }
    }

    return;
}

int SudokuPuzzle::getNumberOfPossibleValuesAt(int x, int y)
{
    //This function will return the number of candidates available
    //for the Sudoku grid at (x, y).

    if (myGrid[x][y] != 0) return 0;

    int possibleSolutions[9];

    getPossibleValuesAt(possibleSolutions, x, y);

    int candidateCount = 0;

    for (int x = 0; x < 9; x++)
    {
        if (possibleSolutions[x] == 1) candidateCount++;
    }

    return candidateCount;
}

bool SudokuPuzzle::addNumberAt(int addNumber, int x, int y)
{
    //This function will put in addNumber in the grid at (x, y) as long
    //as addNumber is a valid candidate for that spot. A boolean is returned
    //so the caller knows whether or not the number was successfully added.

    if (addNumber == 0) return true;

    if (myGrid[x][y] != 0)
    {
        cout << "The spot at (" << x << "," << y << ") is currently occupied.\n\n";
        system("PAUSE");
        return false;
    }

    int possibleSolutions[9];

    getPossibleValuesAt(possibleSolutions, x, y);

    if (possibleSolutions[addNumber - 1] == 0)
    {
        cout << addNumber << " is not a valid candidate for (" << x << "," << y << ").\n\n";
        system("PAUSE");
        return false;
    }

    myGrid[x][y] = addNumber;
    return true;
}

void SudokuPuzzle::removeNumberAt(int x, int y)
{
    //Remove one number.

    myGrid[x][y] = 0;
}

void SudokuPuzzle::newPuzzle()
{
    //Clears the grid.

    for (int x = 0; x < 9; x++)
    {
        for (int y = 0; y < 9; y++)
        {
            myGrid[x][y] = 0;
        }
    }
}

void SudokuPuzzle::printPuzzle()
{
    //Display the puzzle for implementation
    //in a command-line environment.

    cout << "\t0 1 2  3 4 5  6 7 8\n\n";

    for (int y = 0; y < 9; y++)
    {
        cout << y << "\t";

        for (int x = 0; x < 9; x++)
        {
            cout << myGrid[x][y] << " ";
            if (x == 2 || x == 5) cout << " ";
        }

        if (y == 2 || y == 5 || y == 8) cout << "\n\n";
        else cout << "\n";
    }
}

bool SudokuPuzzle::isSolved()
{
    //Determine if all spots are filled with valid candidates.

    for (int x = 0; x < 9; x++)
    {
        for (int y = 0; y < 9; y++)
        {
            if (myGrid[x][y] == 0) return false;
        }
    }

    return true;
}

void SudokuPuzzle::solvePuzzle()
{
    //The puzzle solving algorithm. The algorithm scans each spot
    //and starts with the spot that has the lowest number of
    //possible candidates. Each candidate is tried one at a time
    //and the rest of the spots are filled in the same manner.
    //If a spot is found that is not yet filled, but there are no
    //valid candidates, then the puzzle is unsolvable and a new
    //candidate is tried in a previous spot.

    int candidateX, candidateY, candidateNumber = 100, tempCandidate;
    int candidateList[9];

    for (int x = 0; x < 9; x++)
    {
        for (int y = 0; y < 9; y++)
        {
            if (myGrid[x][y] == 0)
            {
                tempCandidate = getNumberOfPossibleValuesAt(x, y);

                if (tempCandidate < candidateNumber)
                {
                    candidateNumber = tempCandidate;
                    candidateX = x;
                    candidateY = y;
                    getPossibleValuesAt(candidateList, x, y);
                }
            }
        }
    }

    //All candidates have been determine for (candidateX, candidateY) where
    //that spot has the least amount of candidates. Loop through those candidates
    //and attempt to solve the next resulting puzzle.

    for (int index = 0; index < 9; index++)
    {
        if (candidateList[index] == 1)
        {
            addNumberAt(index + 1, candidateX, candidateY);

            if (isSolved())
            {
                SudokuPuzzle tempPuzzle;
                copyPuzzle(tempPuzzle);
                possibleSolutions.push_back(tempPuzzle);
            }
            else solvePuzzle();
           
            removeNumberAt(candidateX, candidateY);
        }
    }
}

void SudokuPuzzle::copyPuzzle(SudokuPuzzle &newPuzzle)
{
    //Deep copy the grid into newPuzzle.

    for (int x = 0; x < 9; x++)
    {
        for (int y = 0; y < 9; y++)
        {
            newPuzzle.addNumberAt(myGrid[x][y], x, y);
        }
    }
}

void SudokuPuzzle::showSolution(int index)
{
    //Show the solution in possibleSolutions at the specified index.
   
    if (index < 0 || index >= possibleSolutions.size()) return;

    possibleSolutions[index].printPuzzle();
}

SudokuPuzzle SudokuPuzzle::getSolution(int index)
{
    //Return the solution in possibleSolutions at the specified index.

    if (index < 0 || index >= possibleSolutions.size())
    {
        SudokuPuzzle newPuzzle;
        return newPuzzle;
    }
       
    return possibleSolutions[index];
}

int SudokuPuzzle::getNumberOfSolutionsFound()
{
    //Return the number of solutions in possibleSolutions.

    return possibleSolutions.size();
}