Regular Expressions, Part III (Phone Number – basic)

As I promised, here is another discussion about regular expressions, and more specific, phone numbers. The first part of this, I’m going to discuss the HackerRank Python Regex problem on validating phone numbers with regular expressions, but as its a very simplistic case, I’m going to expand on it.

Because, it’s pretty simplistic, I’m simply going to show you the code for the Hackerrank problem.

^[789]\d{9}$

Since it’s so simplistic, I figure I’ll just go over the whole thing. First, notice the ^ and $ anchors. Again, these refer to the fact that everything is occurring on the evaluation. I’ll explain this better last…

The next piece is [789]. This simply tells us that we are looking for a 7, 8, or 9 as the first digit.

Next is \d{9} which is 2 parts, the first \d meaning we are looking for any digit, and the second {9} meaning we are looking for exactly 9 of them.

All that combined, along with the anchors I mentioned before, means that we are searching the input we are using regex on for 10 numbers, starting with a 7, 8, or 9. Nothing else will generate a match.

Now, suppose we didn’t care about starting with a 7, 8, or 9. The first thing we would do is drop the [789] and change {9} to {10}. Simple enough!

However, lets make this a bit more realistic. This makes no allusion to formatting… it doesn’t quantify the real formatting for a North American phone number. So let’s sat that formatting (which I’m simply going to refer to North American Phone Numbers as phone numbers).

A little education for us here. Phone numbers are divided into 3 sections. The first 3 digits are referred to as the area code. The next 3 digits are referred to as the Central Office code, frequently called the exchange, and then the last 4 digits are the subscriber number. Now there are some rules for the area code and the exchange.

For both the area code and exchange, the first digit must be between 2 and 9 (thus no 0’s or 1’s). The next 2 digits can be anything, however, for the exchange, they can’t be 1 and 1 (as those are for special services such as 911 or 411, etc). For simplicity, I’m going to leave it at that, and not worry about the extra formatting or extensions and such… this is more for a basic understanding…

We know we will use our anchors still, so ^$ go in immediately. Also, the last 4 digits are simple as well. I’m going to use parenthesis in the regex to simply group everything, so we know which sections are for which. We know that \d{4} will work for our last for digits, so we’ll add those.

So we currently have ^(\d{4})$. Now, the area code is pretty simple. It’s 3 digits, but can’t be 0 or 1, so that means [2-9]\d{2}, so we’ll add that in the front, giving us ^([2-9]\d{2})(\d{4})$. Now comes the hard part.

The exchange is a bit more difficult, because it can’t be #11. It can be #1#, or ##1, so it has to be broken down into multiple pieces.
The first piece is if the 2nd digit is a 1. That would be [2-9]1[02-9]. Remember the first digit can’t be 0 or 1 (always, from the above rule). We already stated that in this piece, the second digit is 1, which means the third piece can’t be 1, thus the [02-9].
Now the next piece will be the 3rd digit being 1. The nice thing is all the other rules apply, so all we need to do is switch the 2nd and 3rd numbers, giving us [2-9][02-9]1.
Now the last piece is the one that makes logical sense. If we’ve covered the case where the 2nd digit is 1, and the case where the 3rd digit is 1, then all we need to do is cover the case where neither digit is 1. We already have that rule [02-9], so all we do is put that in both spots, giving us [2-9][02-9][02-9]. There are all 3 pieces. Now we just put them together.

If you remember back in the Roman Numerals post, we talked about alternation, or the pipe |. That is basically a way to say this or that, and would look like this|that. All we have to do is combine the 3 pieces, putting that alternation between each piece, giving us [2-9]1[02-9]|[2-9][02-9]1|[2-9][02-9][02-9].

Which gives us our final result of ^([2-9]\d{2})([2-9]1[02-9]|[2-9][02-9]1|[2-9][02-9][02-9])(\d{4})$. All done.

For the record, I tested the above on regexr.com, adding the following phone numbers: 5123456789 and 5123114567. It only matched the first one! Yay!

Until next time!

Never B Flat, Sometimes B Sharp, Always B Natural

Regular Expressions, Part II (Roman Numerals)

So, lately with HackerRank, I’ve been going over their regex section in python. I thought I’d review some of what I’ve done, for my own sake to review later on down the road!

Here are some of the regex problems I’ve solved… For the record, I’ve modified the original post of this to what it is now – before, I just listed the regex expression, and what the whole expression did… Now, however, I’m explaining what each bit does for my own understanding! I’m also reducing how many I do per post so there will be more posts about it as I go into a deeper detail for them…

The one that I will be discussing today is a regular expression to validate roman numerals.

For roman numerals, the expression is:
^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$

We start of with ^ which is called an anchor and it is used to say we are starting from the beginning of the string we are comparing…

Next is M{0,3} which tells us we are expecting the letter M between 0 and 3 times. Which covers any thousands digit.

The next piece is (CM|CD|D?C{0,3}) and it involves our first grouping. The parenthesis defines our grouping which can be used to later to pull out this specific information. We next have letters between pipes (|), which is called alternation. It’s saying we are next looking for CM or CD or D?C, which the question mark here means that the D is optional – meaning we are looking for C alone, or DC together, and that is followed by {0,3} which, like before, means we are looking for 0-3 of… so that whole grouping says we are for CM (900), CD (400), or D (500) with up to 3 C’s… which covers hundreds portion of any number over 99.

The piece after this is (XC|XL|L?X{0,3}) and it is also a grouping (which again, we can pull from later). This section covers our 10’s digit, and just like the last grouping, is looking for XC (90) or XL (40) or L (50) and/or up to 3 X’s (10 each).

The last piece is (IX|IV|V?I{0,3})$ and covers our 1’s digit. It’s also grouped so we can pull out this information later. Just like the prior 2 groups, it’s looking for IX (9) or IV (4) or V (5) and/or up to 3 I’s (1 each). The last piece, the $, is referring to the end of the line.

So if we put this all together, we are saying in the line we pass in, if there is a character that isn’t an M, D, C, L, X or I, this will fail. Also, if proper characters aren’t in the proper order, this will fail, and if there is anything after the lowest number, it will fail.

All in all, a pretty neat expression!

Next time, I’ll go over phone numbers.

Never B Flat, Sometimes B Sharp, Always B Natural

JACU II

Once again, it’s another installment of JACU (or Just Another Computing Update)!

HackerRank: I currently have 1577.91 points, and am ranked 793. That means I’ve earned 570 points, and risen 698 ranks! Not bad. I’m almost through regex in python (of which there will be a couple of posts for)

CodeWars: I’m currently at 5kyu with 154 honor points, and am ranked # 10923 [93%]. I’ve increased 1 kyu, and 63 honor, as well as risen 8374 ranks (and 10%). I’m going to work on these a bit more as well.

CheckIO: I’m currently level 9, with 543/750. I completed the last 2 puzzles in Home, as well as finished Elementary, and have started on O’Reilly. There are some puzzles that are locked, since I’m not a paying member… I’ll think about remedying that…

CodeEval: I’m currently at 70 points – but haven’t completed the challenge, as they don’t tell you what is wrong, so I’m not sure where the bug is in my submission (I get the correct answer on my system), so I’ve sort of stalled on this one… I’ll be coming back to it soon.

Project Euler: Using HackerRank’s Project Euler contest, I’m ranked 1307th/48187, so that’s a good increase of 1608 ranks, and about 5000 more contestants joined, so I’m going strong here. I’m currently working on Project #23 (which has been solved on the PE site, just having an issue solving 1 test case, and not getting any help on the details…

So that is where I’m at! I will be blogging about my daily programmer exercises, once they are closed!

The More I Drink – Blake Shelton
Never B Flat, Sometimes B Sharp, Always B Natural

Call Me… I mean… Collatz…

So – I was working on a Project Euler problem (#14 specifically), that deals with Collatz numbers. For those that don’t know, a Collatz number is a sequence by which you get a number to 1. If the number is even, you divide by 2, if its odd, you multiply by 3, and add 1, and the number is how many steps it takes to get to 1. So for example, 13, has a Collatz number of 10, as it takes 10 steps to get to 1 ( 13 > (3*13 +1) 40 > 20 > 10 > 5 > (3*5+1) 16 > 8 > 4 > 2 > 1). Naturally, as you get bigger numbers, it takes more steps to calculate the number of steps…

Now, not only do I do this on Project Euler, I also do this on the HackerRank site… Because while the problem is basically the same, they always throw a twist in… Using this example, Project Euler wants to know what number below 1 million has the most steps to get to 1… HackerRank on the other hand will have anywhere from 3 – 15 ‘test’ cases you need to solve, that follow certain rules, and your program only has so much time to run to pass. So for this problem, a single test case will contain anywhere from 1 to 10000 integers, that range anywhere from 1 to 5 million, and the program has 10 seconds to figure out what the maximum collatz sequence is for any given number. As an example, if the number give was 10, you would have to respond with 9 (since the number 9 has the most collatz sequences of any number between 1 and 10), and then if you are given 15, you would have to respond with 9 (since again, it has the most sequences to get to 1 between 1 and 15) and then you are given 20, and have to respond with 19 (technically 18 and 19 both have the same number of steps (22) to get to 1 – however the constraint dictates that if there are multiple ‘max’ paths, take the higher number – thus 19)… So this is what a simple test case of 3 numbers is… now handle 15 test cases, each with a different set of up to 10000 integers, ranging from 1 to 5 million… and do it in 10 seconds…

So that was the task…

So first – we realize we want to use memoization… What’s that you ask? In simple words, its caching calculations so you don’t have to do them again… Using our 13 example from above, what if, when we calculated 13, we stored each step… so that when 13 was done, we knew that 13 takes 10 steps, but we also know that 40 takes 9 steps, and 20 takes 8 steps, and 10 takes 7 steps, and 5 takes 6 steps, and 16 takes 5, 8 takes 4, 4 takes 3, 2 takes 2, and 1 is 1 step. By keeping this cached, we then are able to save lots of steps. For example, if we get to the number 80, we start our function, and step 1 takes us to 40… now instead of doing the other 9 steps, we look in our cache and see that 40 takes 9 steps, so then we can say in 2 steps, that 80 takes 10 steps… So by doing that and storing the values, we can then simply look up a given number and see how many steps it takes once they are all calculated…

So I created a recursive function that checks our cache, and if our number exists, it returned the # of steps, otherwise, it return the number of steps of either the even calculation or the odd calculation, depend on whether the number was even or odd… so I create that, and add the function for reading the input sequence, and let her rip… My little program goes through the first 2 test cases… and then starts to time out… because she is taking longer than 10 seconds to go through all of the thousands of possible numbers… hmm….

So then I think, well since I’m already using memoization, what if I prebuild all of the values from 1 to 5 million, so then all I need to do is simply look up the max value for a given number…

Except building the initial list from 1 to 5 million takes 13 seconds (on my computer)… hmm…

So now its time to start thinking outside the box…

So first, I think, well, if I start at 5 million and work my way down, lots of things will have been created by the time I got to them (as an example, when 5 million process, it process 2.5 million, which in turn processes 1.25 million, which in turn processes 625000, which processes 312500, etc, etc… So I change my prebuild to go from 5 million to 1 down… This actually knocks 2 seconds of my prebuild…

So… more efficiency… I think and realize that at some point building downward, I’ll reach a number that afterwards, every number below it has already been processed… So I add some variables to my Collatz function, and after processing, it tells me that moving downwards, 125001 is the last uncached number… so now, I only have to build down to 125001, thus saving me 125000 function calls, plus all of those recursive function calls…

However, this only knocked about 2 seconds off as well… so I’m sitting here at 8+ seconds… failing my test cases because it still takes, even with the cache build of 8 seconds, 14 seconds to process all 5 million numbers… So now what…

I start timing individual numbers… how long does it take to calc the 3 million? So I run that in quarter million increments to see how much these higher numbers take, and then noticed something… 3.75, 4, 4.25, 4.5, 4.75, and 5 million all return the same maximum… 3732423 at 579 steps. And then the light bulb goes off… If I ‘hardcode’ a list with that value, I don’t need to test any value above 3732423, because that is the max… that will save 1.27 million function calls + all of the additional recursive calls… So I write a function that sees if the value is greater than or equal to my list value, and if so, report the list value, otherwise, do the cacluations…

That one step shaved literally a second of my test time…

So, to make a long store short (too late), I ended up hand calculating (using my function), the top 50 #s… to understand how fast that made the program, when you see that #50 is the number 97, you realize that for all 4999903 numbers above 97 that could be in the list of integers, there are only 50 possible answers for the maximum collatz… which of course, means my program ran in under 1 second for all 10000 entries, no matter what they were…

So there, that is the story of how I didn’t use a function in the normal way, to gain some major efficiency in a program! If you want to see my program, you can see it on my github page here

Call Me – Blondie
My Way – Frank Sinatra
Never B Flat, Sometimes B Sharp, Always B Natural

J.A.C.U.

So – I’m going to name these Program Testing updates (or coding updates) JACU – for Just Another Coding Update… in honor of Blue Thunder’s JAFO… {google it}

As I start participating in any other sites, I’ll be logging information here… I’ve done reddit’s daily programming once or twice, and I might be signing up for stack exchange’s Code Golf soon… but I’ll have to review it as it seems like something I might not be interested in…

But for now… I digress…

So, we’ll just go down the list… Since my last update, here is where I stand…

    HackerRank:

  • I’m in the 4th Percentile in Algorithm’s – Haven’t done any of these since I completed the initial trek – focusing on…
  • Python: 1007.91 points (boosted by doing 1 partial contest), with a rank of 1491… I’ve completed Introduction, Basic Data Types, Strings, Sets, Math, Itertools, & Collections
  • Week of Code #23: I took part, and did some, but it was a bit beyond me… but still placed 2642nd/10494 [or top 25%]… I’ll probably try a few of their simpler contests to get a better feel, then do more of these!
    CodeWars:

  • 6 kyu (still) – although I’m about 90% finished with level 6
  • 91 Honor points
  • Honor Position # 19297 [83%]
    CheckIO.org:

  • Level 6: 177/235
  • Home: 110 pts – 53% completed… The last 2 are moderate and challenging rated, and moderate is re-writing min/max functions… for me, that is a couple of hours spent on this 1 test… so I’ll wait until I feel more comfortable with args. However, I’ve unlocked O’Reilly, Scientific, Elementary, & Electronic Station.
  • Elementary: 60 pts – 50% completed
    CodeEval:

  • I literally just confirmed my account – so I don’t have anything here yet… will start either later today or tomorrow!
    Project Euler:

  • I’ve finished the first 13 puzzles
  • using HackerRank’s ProjectEuler contest, I’m ranked 2925th/43394

So there is my progress since I last checked in with my coding challenges!

Until next time!

Eye of the Tiger – Survivor
Never B Flat, Sometimes B Sharp, Always B Natural

Prim and Proper…

So HackerRank ‘tricked’ me with an email, and I ended up gettung into their algorithm section before I felt I was ready for it. (Keep in mind, this was not a mean trick, or something dishonest or shady on their part – they just sent an email that said ‘You should try our Algorithm domain’, and like a dummy, I did.)

So to get the first badge under Algorithms, I had to complete something on graph theory (which I haven’t gone over since high school geometry – 30-ish years ago), and in a new language (python)… but I’m nothing if not dedicated, so I take the plunge and give it a try…

Here is the problem I had to work on:

Given a graph which consists of several edges connecting the nodes in it. It is required to find a subgraph of the given graph with the following properties:

  • The subgraph contains all the nodes present in the original graph.
  • The subgraph is of minimum overall weight (sum of all edges) among all such subgraphs.
  • It is also required that there is exactly one, exclusive path between any two nodes of the subgraph.

One specific node is fixed as the starting point of finding the subgraph. Find the total weight of such a subgraph (sum of all edges in the subgraph)

So, given that, I came up with a basic solution… First Solution and the thing about this is that it worked. The catch was for a test case that had 10000 vertices, it took almost 3 minutes to run, so the test case timed out, and it failed me!! 🙁

So, cracking my knuckles, and rising to the challenge, I employed my google-fu and began to track down some more efficient solutions. Since this discussed Prim in the title, I researched Prim and MST’s and came up with this algorithm to do this calculation, and after some more searching found a python version. After reviewing, I noticed that it wasn’t all that different from my solution, just that it used some python imports that I didn’t know about (mainly heapq and using dictionaries from collections) – apparently using the dictionary stuff from collections speeds that process up, and heapq speeds vertex jumping up significantly… my new solution basically did the same thing, but much more efficiently. Thus solving the problem in a few seconds.

What I take from this is that my original solution was right (and it worked, just not as efficient), and the only real changes the new solution made were to use python modules that were basically unknown to me to make what I was doing more efficient.

So all in all, I’m happy with the results.

I got me 3rd Percentile algorithm (which I’ve since increased to the 4th Percentile) – so I’ve got a ways to go in the Algorithms domain. But I’m going back to the python domain to finish it before i wander off into any of the other domains!

Under Pressure – David Bowie & Queen
Never B Flat, Sometimes B Sharp, Always B Natural

Programming Update

So as I stated in an earlier post, I’ve been taking the edX MITx 6.00.1x Intro to Computer Science and Programming w/ Python to aid in brushing up on modern computer science (I last attended school in the early 1990’s). At the moment, I’m just ‘auditing’ the course, but I think I’ll end up paying to get the certificate of verification…

So, during week 2, I made a mistake when turning in one of my assignments… I rushed to get it done, and did it in my IDE (PyCharm) so that I could test my results and such, I just copied my coded answer from PyCharm to the site’s code input… But because I was doing it in PyCharm, and rushing, I turned it in without any comments… shame on me! As I was peer reviewing others code – I felt stupid for rushing something that didn’t need rushed and not even leaving a basic comment… But, hopefully, I turn this into a learning process to slow down and comment…

I think part of it is I’m also using HackerRank and CodeWars and ProjectEuler to practice python with… and of course, none of the answers on these sites involve comments, so I tend to not use them…

I’m a little disappointed with myself regarding this, and I think I need to adjust my style regarding that… But it is a learning experience…

I’m also going to start listing my ‘scores’ with these sites ‘challenge’ sites, as I’ll have a way to measure my progress – and be publicly accountable (to anyone who actually reads)…

So, as of September 9th:

  • HackerRank: I have 375 points, 367 Hackos, and am ranked # 5531
  • CodeWars: I have 50 Honor, am at level 6 kyu (a decreasing scale – so lower is better), and have an Honor Position of # 28252
  • Project Euler: No ‘ranking or scoring’ system, I’ve completed problem #1 (I’ve been spending my time on HackerRank and CodeWars, as well as Reddit’s DailyProgrammer subreddit)
  • Reddit’s Daily Programmer – also non-ranking/scoring. I will be posting my results to my GitHub page. I’ve currently done challenge #281.

So again – I do this so I can update myself and see how I’m progressing!!! I’m going to work towards doing at least one item per week for each site, if not more (I might try at least 1 per day on some of them)…

But for now, that should cover it!

Play That Funky Music – Wild Cherry
Never B Flat, Sometimes B Sharp, Always B Natural

Really, I can have any() thing?!?

So, I’ve been using HackerRank, working my way up (at this writing, I’ve got 205 points in Python [309 Hackos points] and I’m ranked 10505), and one of the python items I just finished doing is on string validators…

These let you test if a string is just alpha, alphanumeric, just digits, is upper or is lower case…

But I digress…

The point of the exercise was given a string, determine if any of it was each of these (alpha, alphanumeric, digit, upper or lower), and print if each was true or not.

This is where, looking around online, I discovered the python any() command…

I was able to take this:

results = [False, False, False, False, False]
S = input().strip()
for c in S:
    if c.isalnum():
        results[0] = True
    if c.isalpha():
        results[1] = True
    if c.isdigit():
        results[2] = True
    if c.islower():
        results[3] = True
    if c.isupper():
        results[4] = True
print(*results, sep='\n')

which I thought was a decent solution prior to my discovery, into this:

S = input().strip()
print(any(s.isalnum() for s in S))
print(any(s.isalpha() for s in S))
print(any(s.isdigit() for s in S))
print(any(s.islower() for s in S))
print(any(s.isupper() for s in S))

which I thought was pretty useful… I’m liking python more and more every day I use it!

Anyway You Want It – Journey
Never B Flat, Sometimes B Sharp, Always B Natural

Yay!! edX started… wait, What?!?

So the edX Python course started 14 hours late…

Anyways, I’m going through it, and one of the first main loop concepts talks about while loops… And the 3rd programming exercise on while loops says that they provide the variable end, use a while loop to calculate the sum of all the integers leading up to end (meaning, if end = 6, we expect 21 [1 + 2 + 3 + 4 + 5 + 6])…

Well, of course I already know a better way to do this… I just blogged about this particular thing a few days ago when talking about Project Euler #1 so I went to use that answer, and of course, it didn’t like it (it was looking for a while statement), so I cheated and wrote:

while True:
    break;

print(end * (end + 1) / 2)

which solved it…

I posted on the site as well that I was disappointed with that particular exercise, as I would think that MIT would actually have an exercise that taught you the proper way to solve the problem (meaning there are lots of good valid reasons to use a while statement, as the prior exercises had done, I felt that they shouldn’t use one that had a better solution than the one they wanted)…

But hey… onwords and upwards!

Home Sweet Home – Motley Crue
Never B Flat, Sometimes B Sharp, Always B Natural

edX, CodeWars, and CriticalRoleRenamer

So, I was supposed to be starting a Python course through edX, but apparently it’s starting late (was supposed to start about 12 hours ago, so I was going to give my first thoughts after reading the syllabus and such, but alas)… so that sounds like something I’ll have to do in the future…

Yesterday I started with CodeWars and completed 11 exercises (codewars calls them Katas), although 2 of them were simple parts of the sign up process. I’m going to try to do at least 2-3 a week, when not focusing on my own projects.

Speaking of my own projects, over the weekend, I wrote my first Python program for personal use. As I’ve stated before, I’m a gamer, and I’ve been trying to watch this show called Critical Role, which involves a group of voice actors playing Dungeons & Dragons. It’s actually a good show to see the game in action… But I don’t have 3 hours an episode to sit their and watch… but what I do have is that time when I’m walking, and driving to work, etc…

So I used a program called ClipGrab and downloaded the episodes from YouTube, and converted them to mp3 files, and then put them on my phone, so that my podcast player (Pocket Casts) would see them. Now I can listen (as admittedly, there isn’t a great need to ‘watch’ the program) while I’m exercising, or driving, etc. Plus Pocket Cast lets me increase the speed and jumps sound pauses (silence in the conversation), so that 3 hour episode only takes me about 2 hours…

But I digress…

When being downloaded and converted, the youtube naming convention (which admittedly is probably just the name the uploader used) isn’t the most consistent, or friendly, especially when you have things like Plex or Kodi for media serving content… so I wrote a python program that grabbed the episode number and title from the file name, and renamed the file… Taking it from, for example, “Enter Vasselheim – Critical Role RPG Show: Episode 16.mp3” to “Critical Role – S01E16 – Enter Vasselheim.mp3”, which is a common naming convention. The neat thing was, by using regex (again, another thing I’m learning), I was able to grab the title, and then episode number, but with code, helped normalize them… because sometimes they used a pipe (|) instead of a hyphen to separate, sometimes they had the colon after show, and sometimes not, and sometimes they had the show title uppercased… But I was able to rename the whole thing (which literally took less than a second of time to run) so that the episode title was standard Capital Case, and based on thetvdb.com, had the season numbers properly meshed out…

While its a simple program, all in all, i’m rather proud of it… My next step will be using it for other shows I listen to! You can see it in its current form on my GitHub page!

Rocky Mountain Way – Joe Walsh
Never B Flat, Sometimes B Sharp, Always B Natural