TIL: Algorithm for Random Numbers that sum to a value.

So I was on Reddit today, and someone had posted in the learning python subreddit about his random number function was running slow…

Specifically, he was creating a list of 9 random numbers between 0 and 100 (inclusive) where the sum of those 9 numbers totaled 100. I’ve never had a situation that called for that before, and the code he gave was a simple while statement saying that while the sum of the list didn’t equal 100, empty the list, and generate 9 numbers between 0 and 100 and add them to the list, and keep going until you got a total of 100…

So I commented that of course that would take ages, as the loop probably had to run tens of thousands of times to randomly get 9 different numbers between 0 and 100 whose sum totaled 100.

My original suggestion had been to get your first number, then generate a second number between 0 and 100 – first number, and then keep doing that…

When someone else posted the following quite elegant solution… I asked where he learned it from, and he couldn’t remember where, but after further investigating, its one of those quaint mathematical formulae that always works!!!

So, in our example, we need 9 numbers between 0 and 100 (inclusive), so we take 0, and 100, and then 8 random numbers between 0 and 100, and then sort them all, so we end up with 0, 8 numbers from smallest to largest, 100…

Then you simply take the difference between each set…

So for example, let’s say our 8 random numbers were 85, 24, 58, 5, 31, 46, 94, and 7. We then take those 8 numbers, add 0 and 100 as well, and then sort them, giving us:
[0, 5, 7, 24, 31, 46, 58, 85, 94, 100]

We then simply calculate the difference between each number (5-0, 7-5, 24-7, 31-24, etc) which, by doing so would give us:
[5, 2, 17, 7, 15, 12, 27, 9, 6]

Which, in turn, sum up to 100!!

I was like… HOLY CRAP!!! I’m keeping that one in my back pocket!!!

Never B Flat, Sometimes B Sharp, Always B Natural

Call Me… I mean… Collatz…

So – I was working on a Project Euler problem (#14 specifically), that deals with Collatz numbers. For those that don’t know, a Collatz number is a sequence by which you get a number to 1. If the number is even, you divide by 2, if its odd, you multiply by 3, and add 1, and the number is how many steps it takes to get to 1. So for example, 13, has a Collatz number of 10, as it takes 10 steps to get to 1 ( 13 > (3*13 +1) 40 > 20 > 10 > 5 > (3*5+1) 16 > 8 > 4 > 2 > 1). Naturally, as you get bigger numbers, it takes more steps to calculate the number of steps…

Now, not only do I do this on Project Euler, I also do this on the HackerRank site… Because while the problem is basically the same, they always throw a twist in… Using this example, Project Euler wants to know what number below 1 million has the most steps to get to 1… HackerRank on the other hand will have anywhere from 3 – 15 ‘test’ cases you need to solve, that follow certain rules, and your program only has so much time to run to pass. So for this problem, a single test case will contain anywhere from 1 to 10000 integers, that range anywhere from 1 to 5 million, and the program has 10 seconds to figure out what the maximum collatz sequence is for any given number. As an example, if the number give was 10, you would have to respond with 9 (since the number 9 has the most collatz sequences of any number between 1 and 10), and then if you are given 15, you would have to respond with 9 (since again, it has the most sequences to get to 1 between 1 and 15) and then you are given 20, and have to respond with 19 (technically 18 and 19 both have the same number of steps (22) to get to 1 – however the constraint dictates that if there are multiple ‘max’ paths, take the higher number – thus 19)… So this is what a simple test case of 3 numbers is… now handle 15 test cases, each with a different set of up to 10000 integers, ranging from 1 to 5 million… and do it in 10 seconds…

So that was the task…

So first – we realize we want to use memoization… What’s that you ask? In simple words, its caching calculations so you don’t have to do them again… Using our 13 example from above, what if, when we calculated 13, we stored each step… so that when 13 was done, we knew that 13 takes 10 steps, but we also know that 40 takes 9 steps, and 20 takes 8 steps, and 10 takes 7 steps, and 5 takes 6 steps, and 16 takes 5, 8 takes 4, 4 takes 3, 2 takes 2, and 1 is 1 step. By keeping this cached, we then are able to save lots of steps. For example, if we get to the number 80, we start our function, and step 1 takes us to 40… now instead of doing the other 9 steps, we look in our cache and see that 40 takes 9 steps, so then we can say in 2 steps, that 80 takes 10 steps… So by doing that and storing the values, we can then simply look up a given number and see how many steps it takes once they are all calculated…

So I created a recursive function that checks our cache, and if our number exists, it returned the # of steps, otherwise, it return the number of steps of either the even calculation or the odd calculation, depend on whether the number was even or odd… so I create that, and add the function for reading the input sequence, and let her rip… My little program goes through the first 2 test cases… and then starts to time out… because she is taking longer than 10 seconds to go through all of the thousands of possible numbers… hmm….

So then I think, well since I’m already using memoization, what if I prebuild all of the values from 1 to 5 million, so then all I need to do is simply look up the max value for a given number…

Except building the initial list from 1 to 5 million takes 13 seconds (on my computer)… hmm…

So now its time to start thinking outside the box…

So first, I think, well, if I start at 5 million and work my way down, lots of things will have been created by the time I got to them (as an example, when 5 million process, it process 2.5 million, which in turn processes 1.25 million, which in turn processes 625000, which processes 312500, etc, etc… So I change my prebuild to go from 5 million to 1 down… This actually knocks 2 seconds of my prebuild…

So… more efficiency… I think and realize that at some point building downward, I’ll reach a number that afterwards, every number below it has already been processed… So I add some variables to my Collatz function, and after processing, it tells me that moving downwards, 125001 is the last uncached number… so now, I only have to build down to 125001, thus saving me 125000 function calls, plus all of those recursive function calls…

However, this only knocked about 2 seconds off as well… so I’m sitting here at 8+ seconds… failing my test cases because it still takes, even with the cache build of 8 seconds, 14 seconds to process all 5 million numbers… So now what…

I start timing individual numbers… how long does it take to calc the 3 million? So I run that in quarter million increments to see how much these higher numbers take, and then noticed something… 3.75, 4, 4.25, 4.5, 4.75, and 5 million all return the same maximum… 3732423 at 579 steps. And then the light bulb goes off… If I ‘hardcode’ a list with that value, I don’t need to test any value above 3732423, because that is the max… that will save 1.27 million function calls + all of the additional recursive calls… So I write a function that sees if the value is greater than or equal to my list value, and if so, report the list value, otherwise, do the cacluations…

That one step shaved literally a second of my test time…

So, to make a long store short (too late), I ended up hand calculating (using my function), the top 50 #s… to understand how fast that made the program, when you see that #50 is the number 97, you realize that for all 4999903 numbers above 97 that could be in the list of integers, there are only 50 possible answers for the maximum collatz… which of course, means my program ran in under 1 second for all 10000 entries, no matter what they were…

So there, that is the story of how I didn’t use a function in the normal way, to gain some major efficiency in a program! If you want to see my program, you can see it on my github page here

Call Me – Blondie
My Way – Frank Sinatra
Never B Flat, Sometimes B Sharp, Always B Natural

So very disappointed…

So, I’ve been using HackerRank quite a bit recently, and as I’ve been delving deeper into the Python ‘domain’, I’ve been getting more and more disappointed by it. Not by python, but by the site and it’s ‘testing problems’.

By that I mean the problem explanations can be somewhat vague or even sparse in the explanation of what they are looking for, and the test cases are for the simplest answer, and while I don’t expect them to cover every test case (especially in the samples), 2 or 3 different ones that cover some of the variants to me are expected, as to me, this site is about the learning process.

And once you do solve it, they don’t do well at explaining why they did what they did, just that they did.

To me, it makes the learning process tedious…

But I digress…

But not really…

Because I was then like, well, its a new week, I’ll jump on edX and work on this weeks lessons for MITs Python course (as discussed here and here), so I log in, and what do I behold?!

Week 4
No new materials out this week. Catch up on your problem sets!

So no luck there either…

Luckily for me, I was listening to my current favorite python podcast Talk Python to Me… more specifically, this podcast about CheckIO.org, a python ‘game’ testing site. I’m liking this a lot as well, and currently, more so than HackerRank.

Tomorrow, I will post a programming update on where I rank on the various sites!

Take It to the Limit – The Eagles
Never B Flat, Sometimes B Sharp, Always B Natural

Prim and Proper…

So HackerRank ‘tricked’ me with an email, and I ended up gettung into their algorithm section before I felt I was ready for it. (Keep in mind, this was not a mean trick, or something dishonest or shady on their part – they just sent an email that said ‘You should try our Algorithm domain’, and like a dummy, I did.)

So to get the first badge under Algorithms, I had to complete something on graph theory (which I haven’t gone over since high school geometry – 30-ish years ago), and in a new language (python)… but I’m nothing if not dedicated, so I take the plunge and give it a try…

Here is the problem I had to work on:

Given a graph which consists of several edges connecting the nodes in it. It is required to find a subgraph of the given graph with the following properties:

  • The subgraph contains all the nodes present in the original graph.
  • The subgraph is of minimum overall weight (sum of all edges) among all such subgraphs.
  • It is also required that there is exactly one, exclusive path between any two nodes of the subgraph.

One specific node is fixed as the starting point of finding the subgraph. Find the total weight of such a subgraph (sum of all edges in the subgraph)

So, given that, I came up with a basic solution… First Solution and the thing about this is that it worked. The catch was for a test case that had 10000 vertices, it took almost 3 minutes to run, so the test case timed out, and it failed me!! 🙁

So, cracking my knuckles, and rising to the challenge, I employed my google-fu and began to track down some more efficient solutions. Since this discussed Prim in the title, I researched Prim and MST’s and came up with this algorithm to do this calculation, and after some more searching found a python version. After reviewing, I noticed that it wasn’t all that different from my solution, just that it used some python imports that I didn’t know about (mainly heapq and using dictionaries from collections) – apparently using the dictionary stuff from collections speeds that process up, and heapq speeds vertex jumping up significantly… my new solution basically did the same thing, but much more efficiently. Thus solving the problem in a few seconds.

What I take from this is that my original solution was right (and it worked, just not as efficient), and the only real changes the new solution made were to use python modules that were basically unknown to me to make what I was doing more efficient.

So all in all, I’m happy with the results.

I got me 3rd Percentile algorithm (which I’ve since increased to the 4th Percentile) – so I’ve got a ways to go in the Algorithms domain. But I’m going back to the python domain to finish it before i wander off into any of the other domains!

Under Pressure – David Bowie & Queen
Never B Flat, Sometimes B Sharp, Always B Natural

Programming Update

So as I stated in an earlier post, I’ve been taking the edX MITx 6.00.1x Intro to Computer Science and Programming w/ Python to aid in brushing up on modern computer science (I last attended school in the early 1990’s). At the moment, I’m just ‘auditing’ the course, but I think I’ll end up paying to get the certificate of verification…

So, during week 2, I made a mistake when turning in one of my assignments… I rushed to get it done, and did it in my IDE (PyCharm) so that I could test my results and such, I just copied my coded answer from PyCharm to the site’s code input… But because I was doing it in PyCharm, and rushing, I turned it in without any comments… shame on me! As I was peer reviewing others code – I felt stupid for rushing something that didn’t need rushed and not even leaving a basic comment… But, hopefully, I turn this into a learning process to slow down and comment…

I think part of it is I’m also using HackerRank and CodeWars and ProjectEuler to practice python with… and of course, none of the answers on these sites involve comments, so I tend to not use them…

I’m a little disappointed with myself regarding this, and I think I need to adjust my style regarding that… But it is a learning experience…

I’m also going to start listing my ‘scores’ with these sites ‘challenge’ sites, as I’ll have a way to measure my progress – and be publicly accountable (to anyone who actually reads)…

So, as of September 9th:

  • HackerRank: I have 375 points, 367 Hackos, and am ranked # 5531
  • CodeWars: I have 50 Honor, am at level 6 kyu (a decreasing scale – so lower is better), and have an Honor Position of # 28252
  • Project Euler: No ‘ranking or scoring’ system, I’ve completed problem #1 (I’ve been spending my time on HackerRank and CodeWars, as well as Reddit’s DailyProgrammer subreddit)
  • Reddit’s Daily Programmer – also non-ranking/scoring. I will be posting my results to my GitHub page. I’ve currently done challenge #281.

So again – I do this so I can update myself and see how I’m progressing!!! I’m going to work towards doing at least one item per week for each site, if not more (I might try at least 1 per day on some of them)…

But for now, that should cover it!

Play That Funky Music – Wild Cherry
Never B Flat, Sometimes B Sharp, Always B Natural

Yay!! edX started… wait, What?!?

So the edX Python course started 14 hours late…

Anyways, I’m going through it, and one of the first main loop concepts talks about while loops… And the 3rd programming exercise on while loops says that they provide the variable end, use a while loop to calculate the sum of all the integers leading up to end (meaning, if end = 6, we expect 21 [1 + 2 + 3 + 4 + 5 + 6])…

Well, of course I already know a better way to do this… I just blogged about this particular thing a few days ago when talking about Project Euler #1 so I went to use that answer, and of course, it didn’t like it (it was looking for a while statement), so I cheated and wrote:

while True:

print(end * (end + 1) / 2)

which solved it…

I posted on the site as well that I was disappointed with that particular exercise, as I would think that MIT would actually have an exercise that taught you the proper way to solve the problem (meaning there are lots of good valid reasons to use a while statement, as the prior exercises had done, I felt that they shouldn’t use one that had a better solution than the one they wanted)…

But hey… onwords and upwards!

Home Sweet Home – Motley Crue
Never B Flat, Sometimes B Sharp, Always B Natural

edX, CodeWars, and CriticalRoleRenamer

So, I was supposed to be starting a Python course through edX, but apparently it’s starting late (was supposed to start about 12 hours ago, so I was going to give my first thoughts after reading the syllabus and such, but alas)… so that sounds like something I’ll have to do in the future…

Yesterday I started with CodeWars and completed 11 exercises (codewars calls them Katas), although 2 of them were simple parts of the sign up process. I’m going to try to do at least 2-3 a week, when not focusing on my own projects.

Speaking of my own projects, over the weekend, I wrote my first Python program for personal use. As I’ve stated before, I’m a gamer, and I’ve been trying to watch this show called Critical Role, which involves a group of voice actors playing Dungeons & Dragons. It’s actually a good show to see the game in action… But I don’t have 3 hours an episode to sit their and watch… but what I do have is that time when I’m walking, and driving to work, etc…

So I used a program called ClipGrab and downloaded the episodes from YouTube, and converted them to mp3 files, and then put them on my phone, so that my podcast player (Pocket Casts) would see them. Now I can listen (as admittedly, there isn’t a great need to ‘watch’ the program) while I’m exercising, or driving, etc. Plus Pocket Cast lets me increase the speed and jumps sound pauses (silence in the conversation), so that 3 hour episode only takes me about 2 hours…

But I digress…

When being downloaded and converted, the youtube naming convention (which admittedly is probably just the name the uploader used) isn’t the most consistent, or friendly, especially when you have things like Plex or Kodi for media serving content… so I wrote a python program that grabbed the episode number and title from the file name, and renamed the file… Taking it from, for example, “Enter Vasselheim – Critical Role RPG Show: Episode 16.mp3” to “Critical Role – S01E16 – Enter Vasselheim.mp3”, which is a common naming convention. The neat thing was, by using regex (again, another thing I’m learning), I was able to grab the title, and then episode number, but with code, helped normalize them… because sometimes they used a pipe (|) instead of a hyphen to separate, sometimes they had the colon after show, and sometimes not, and sometimes they had the show title uppercased… But I was able to rename the whole thing (which literally took less than a second of time to run) so that the episode title was standard Capital Case, and based on thetvdb.com, had the season numbers properly meshed out…

While its a simple program, all in all, i’m rather proud of it… My next step will be using it for other shows I listen to! You can see it in its current form on my GitHub page!

Rocky Mountain Way – Joe Walsh
Never B Flat, Sometimes B Sharp, Always B Natural

Project Euler #1

So I’ve been threatening to do this basically since day 1…

To aid in my programming learning, I’m doing various programming puzzles… via Project Euler, Exercism.io, Coding Bat and others…

But I think another part of the learning process is explaining things, or sometimes at least why I did what I did…

So I want to discuss my solutions… Instead of boring you here with code, I’ll link my GitHub Gist, as well as my JSFiddle solutions.

First of all, the problem:

If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.
Find the sum of all the multiples of 3 or 5 below 1000.

Now, first, the key word here is under. But we’ll get to that in a second.

At first, this looks like it can be solved with a simple loop, and really, for this exact problem, it can be, with little concern about cost. However, what if this was changed to 1 million, or 10 million, or 100 million…

Well, on most modern-day computers, looping through 1 to 1,000,000 takes about 2 seconds… 1 to 10,000,000 takes about 20 seconds… while 1 to 100,000,000 takes over 3 minutes… while 2 seconds is not horrendous, 20 seconds is bad, and 200 seconds is down right insufferable – at least from today’s web standards…

The solution for this is simple, and usually overlooked… Its taking a simple equation 3 times and your done.

Wait, what?!

All this is looking for is the sum of all multiples of 3 under 1000 (so 3+6+9+12…+996+999) and the sum of all multiples of 5 under 1000 (important to note since 1000 is divisible by 5, if we use it, we’ll get the wrong result) (so 5+10+15+20…+990+995). Now, if you remember your high school math, there was this equation that allows you to get the sum off all the consecutive integers that made up a number… so, for example, the sum of the consecutive integers for 10 is 1+2+3+4+5+6+7+8+9+10=55. The formula is n(n+1)/2… or 10(11)/2 >> 110/2 = 55. So having that, all we have to do is realize that the sum of 3+6+9+12 (quick example) = 30 is the same as (1+2+3+4)*3 >> 10 * 3…

Now, if that’s the case, then the sum of consecutive multiples of three would be 3 * ((n/3)(n/3+1) / 2), or in this case, we know that 999 is the last, and 333 is 1/3 of that, so 3 * 333 * 334 / 2 = 166833, and the same would be true of for 5 (5 * 199 * 200 / 2 = 99500). So we would think, at first glance, that our answer is 266333. However, we’d be wrong!

Wait, what?!

Well, we forgot one thing. In using this equation, every multiple of 15 (3×5) is being added in twice… once for the 3 side of it, and once for the five side of it… so all we have to do is subtract out one of them… so 15 * 66 * 67 / 2 = 33165… Now, if we take 166833 + 99500 - 33165 = 233168, which would be the exact same value our loop gives us…

So if we did this to get the results for 100,000,000 instead of 1000, we’d get our solution in under 1/100th of a second, vs 3 1/3 minutes, and at way fewer CPU cycles, which could be very important for some algorithms you write!

So that is my first foray into Project Euler, and the programming puzzles!

Young and Innocent -Elefante
Never B Flat, Sometimes B Sharp, Always B Natural

How can I… #1 – Find a max value

So, one of the things I wanted to do was to help others with Python as I was learning… and I came across the following reddit in the learnpython subreddit…

If you follow the link, you can see my response… but basically someone was trying to go through an array of numbers (called a list in python) and come up with the max value (I know you can use the max method provided – but I think the exercise was to see how you’d write it)…

My answer, was the following:

def max_number(nums):
    largest = 0
    for num in nums:
        largest = num if num > largest else largest

    return largest

Which for me, was neat as it was one of the first times I used boolean expression from javascript in python (in javascript, it would have been largest = (num > largest) ? num : largest;) but even though its a simple exercise, I felt good that I came up with the pythonic solution relatively easy!

More to come… Until next time…

Never B Flat, Sometimes B Sharp, Always B Natural

50 Ways To Leave Your Lover…

A song written by the american singer-songwriter Paul Simon.  The song was originally released in December 1975.  Its also the first thing that popped in my head when I read my assignment for today…

My assignment?  Well, part of learning is learning from others with experience… Part of why this blog exists is I started following John Somnez at Simple Programmer and signed up for his FREE “Create A Blog to Boost Your Career” [which, for the record, I suggest you try, if you have even the smallest inkling of wanting to be a programmer…]

But I digress…

In today’s lesson, [which I don’t mind talking about, since he gives this away if you sign up… and I’m not repeating the whole lesson verbatim, just focusing on the one concept in it] it was discussed that I need to make a list of of blog post ideas… and he says to put 50 ideas on the list… which immediately caused the title song to pop into my head… which brings us to here…

[By the way, you’ll notice I love using ellipses… if it offends you, i’m sorry that your offended by it, but I won’t stop using them…]

Of course, the idea of brainstorming 50 ideas isn’t about filtering what is good or bad, just making the effort to come up with 50 ideas… and for me, the first idea was to blog about creating the list… which is what this post is about! Now, the nice part is the course comes with 21 ‘generic’ topics… the idea is to use those generic ideas and come up with specific post ideas… for example, one of the topics is “Tutorials and how-to’s”… so the idea is not to use that as 1 entry, but think about writing a how to on setting up my old linux computer as a Retro Game Console box (to use with only legal roms, of course!)

So here is my initial list of 50 ideas:

  1. Creating my initial 50 ideas {being used right this moment}
  2. Maintaining 50 items (every so often coming up with new ideas and blogging about those)
  3. How To’s set up KODI on my old linux machine
  4. Book review on Soft Skills
  5. Course review on Learning Python
  6. How to on retro gaming rig
  7. Programming discussion on my table manager program as I design it
  8. Programming discussion on my amatuer author new story finder
  9. Solving and discussing my solution to Project Euler challenges (although, techinically, this could be broken out into 556 posts currently)
  10. Solving and discussing my solutions to Code Wars Kata Challenges (similar to project euler, this could be hundreds of possible posts)
  11. Solving and discussing my solutions to Daily Programmer subreddit challenges (see above 😉 )
  12. Progress discussion on freecodecamp course
  13. Discussing my opinions on the Solus linux project
  14. Discussing my opinions on the Ubuntu-Mate linux project
  15. Discuss my programming workflow
  16. Solving non-propietary problems with my customers
  17. Converting my DJ computer to linux
  18. Fixing and updating personal coding solutions
  19. Reviews of programs I personally use
  20. Discussion of current blog posts of other bloggers that I find inspiring and/or useful
  21. Discussion my solutions to stack overflow questions I answer
  22. Discussion of my solutions to reddit javascript questions I answer
  23. Discussion of my solutions to reddit python questions I answer
  24. Discussion of various solutions to reddit python questions I’ll be asking
  25. How to set up my site with Let’s Encrypt
  26. any interesting stories involving my band shows
  27. any interesting stories about songs i’m writing
  28. any interesting stories about songs I’m learning that other people have written or performed
  29. any interesting stories about the creation of a new musical project
  30. Review of Automate the Boring Stuff in python
  31. Top # favorite linux apps
  32. Answering any computer/linux/music related questions brought before me
  33. Building vs Buying a computer
  34. Why I run Linux
  35. Product reviews of my personal computer items (like monitor, printer, etc)
  36. Discuss various other blogs I read
  37. Discuss various websites I use
  38. Setting up my Raspberry PI as a spare download machine
  39. How I clean up my music collection
  40. How I clean up my movie collection
  41. Gaming Tool designs
  42. Diary of updating Trish’s Computer and coverting her to Linux
  43. Interviews with Trish about starting to use Linux
  44. Commenting about anyone else I convert to Linux
  45. Personal health
  46. My computer workflow
  47. Why I hate Windows
  48. Comments/Reviews on various VLogs and web tutorials I see
  49. Life Lessons
  50. What I’ve learned this week

So there are the 50 things that my ‘initial’ blog post ideas contain…  As suggested, once I use some of these up, I’ll have come up with more ideas (and some of these are generic topics themselves, which can generate multiple blog posts as well)

Next time I’ll talk about my experiences with the Learning Python course I took and review it (as I’ve finished it) and how I’m currently implementing what I’ve learned!