Saturday, July 9, 2022

off the cuff: Rogers outage 2022-07-08

The story of the outage that rocked Canada on 2022-07-08, which knocked Rogers off the internet for most of a day, isn't over. There will surely be more information published, a root-cause analysis, and further inquiry into how this happened. However, I don't want to speculate too much about why it happened; I want to talk about emergency preparedness.


For those coming into this blind (maybe you're not Canadian), one of the effects of the outage was that the national debit system, Interac, went completely offline. It didn't matter if you had a different service provider; if you tried to use debit, it simply would not work. Another effect was the 911 outage for all Rogers-connected devices, which I'd also like to speak about.


On Interac and the state of the network: This should not have been allowed to happen. Period. This is a massive failure by Interac in providing its services. From what I've heard (some of this is third-party information, so don't shoot me), they did have multiple providers; however, their primary provider was Rogers, and their backup provider relied on Rogers to work. Rogers may have been providing transit for the backup, or it may have been providing the last-mile connectivity for it. I don't know, but a dependency like this should be obvious to Interac's technical staff. This service has become critical financial infrastructure, connecting ATMs to each other (even between banks) and businesses to banks for payment processing.

My opinion is that Interac should look at their providers (banks) and clients (businesses), see which ISPs the majority of them use, and link directly into those networks through internet exchanges. That way, as long as a bank or business has an internet connection at its premises into one of those networks, it should also have a connection into Interac's network. I am certain that a non-trivial number of their providers are using Bell or Telus, probably both, so Interac having no presence in either of those networks is a critical failing of their networking and (again, in my opinion) needs to be addressed. Failures like this should not be allowed to happen.
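As a rough illustration of the "see what the majority use" step, here is a minimal Python sketch, assuming you have a list recording the ISP each client uses. The function name `networks_to_peer_with`, the coverage target, and the sample data are all hypothetical; real peering decisions weigh far more than a head count.

```python
from collections import Counter


def networks_to_peer_with(client_isps: list[str], target_coverage: float) -> list[str]:
    """Greedily pick the most common client ISPs until together they cover
    target_coverage (0.0 to 1.0) of all clients. Hypothetical helper."""
    if not client_isps:
        return []
    counts = Counter(client_isps)
    total = len(client_isps)
    chosen: list[str] = []
    covered = 0
    # most_common() yields ISPs ordered from most clients to fewest.
    for isp, n in counts.most_common():
        if covered / total >= target_coverage:
            break
        chosen.append(isp)
        covered += n
    return chosen


# Hypothetical sample: the ISP used by each of ten client businesses.
clients = ["Bell"] * 5 + ["Rogers"] * 3 + ["Telus", "Cogeco"]
print(networks_to_peer_with(clients, 0.8))  # ['Bell', 'Rogers']
```

The greedy pick is just the simplest way to express "cover the majority"; it says nothing about whether each of those links quietly depends on the same carrier, which is exactly the backup problem described above.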

On the other side, I am fairly certain most of their providers (the banks) are already redundantly linked into multiple ISP networks. On the client end, however, the businesses they serve need to take a long, hard look at their internet connectivity and decide whether one ISP is enough, or whether they should have a failover ISP just in case.

To anyone looking to set up failover: do some research. Keep in mind that anything delivered over coax (TV cable) is probably supported at the last mile by the same company. So getting a "cable" internet package from a third party such as Teksavvy, plus a backup connection from the local cable provider, leaves you with the same vulnerability. Generally, cable and DSL are provided by different companies, so if your primary connection is cable, look to DSL for the backup (and vice versa). Teksavvy is a good example here, since they can typically provide either cable or DSL to the same location. There are plenty of these wholesale resellers around: Teksavvy, start.ca, VMEDIA, Distributel... and that's just for Ontario. Look for local ISPs and get a DSL or cable package (whichever is the opposite of what you currently have). Combine that with a fairly inexpensive dual-WAN router, and you're all set.
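The "do some research" step boils down to one question: do your two links share any company between you and the internet? Below is a minimal Python sketch of that check, assuming you can find out who owns the last mile (and, ideally, the upstream transit) for each link. The `Connection` type, the `shared_dependencies` function, and the carrier assignments in the examples (a Rogers cable plant, a Bell DSL line, a made-up `SomeISP`) are illustrative assumptions, not facts about any particular address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """One internet link. Field values in the examples are hypothetical."""
    name: str
    retail_isp: str  # who bills you; does not matter for independence
    last_mile: str   # who owns the physical line to your premises
    upstream: frozenset[str] = frozenset()  # carriers relied on further upstream, if known


def shared_dependencies(a: Connection, b: Connection) -> set[str]:
    """Return carriers both links depend on. A non-empty result means a single
    carrier outage could take out both links at once."""
    deps_a = {a.last_mile} | a.upstream
    deps_b = {b.last_mile} | b.upstream
    return deps_a & deps_b


# Hypothetical examples: a resold cable line as the primary, three candidate backups.
primary = Connection("primary", "Teksavvy", "Rogers")        # cable, resold
backup_cable = Connection("backup", "Rogers", "Rogers")      # cable, direct
backup_dsl = Connection("backup", "Teksavvy", "Bell")        # DSL, resold
backup_hidden = Connection("backup", "SomeISP", "Bell", frozenset({"Rogers"}))

print(shared_dependencies(primary, backup_cable))   # {'Rogers'}
print(shared_dependencies(primary, backup_dsl))     # set()
print(shared_dependencies(primary, backup_hidden))  # {'Rogers'}
```

The third case is the Interac situation described above in miniature: a backup on a different last mile that still leans on the same carrier upstream.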

I'd like to point out that the advice above isn't new; it's just more relevant to get it out there right now. If you run a business where debit/credit transactions make up more than 40% or 50% of your revenue, you should be thinking about redundancy in that system and how it connects to the internet, since if it goes down, all that revenue disappears. Sure, some people will go to the trouble of finding an ATM, getting cash, and returning for what they wanted, but it's also quite likely that the customer will simply find a competitor's store where debit/credit is not affected and make their purchase there. If the cost of a second internet connection and a fairly basic failover device is less than the profit lost from a day without debit, it pays for itself the first time you have a failure, since most internet outages, especially in the last mile, take upwards of a day, usually more, to fix.
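To put rough numbers on that comparison, here is a small Python sketch. Every input is a made-up placeholder (daily sales, card share, how many card customers walk away, margin, router price, monthly fee), and the function names are hypothetical; the point is the shape of the calculation, not the figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Business:
    """All figures are hypothetical inputs; replace them with your own."""
    daily_revenue: float  # average sales per day, in dollars
    card_share: float     # fraction of sales paid by debit/credit (0.0 to 1.0)
    walkaway_rate: float  # fraction of card customers who go elsewhere instead of fetching cash
    margin: float         # fraction of revenue that is profit


def profit_lost_per_outage_day(b: Business) -> float:
    """Profit lost for each day card payments are unavailable."""
    return b.daily_revenue * b.card_share * b.walkaway_rate * b.margin


def yearly_backup_cost(device_cost: float, monthly_fee: float) -> float:
    # Assumes the failover router's full price is charged against the first year.
    return device_cost + 12 * monthly_fee


def outage_days_to_break_even(b: Business, device_cost: float, monthly_fee: float) -> float:
    """Days of outage per year at which the backup connection has paid for itself."""
    return yearly_backup_cost(device_cost, monthly_fee) / profit_lost_per_outage_day(b)


shop = Business(daily_revenue=4000, card_share=0.5, walkaway_rate=0.75, margin=0.25)
print(profit_lost_per_outage_day(shop))  # 375.0
print(outage_days_to_break_even(shop, device_cost=150, monthly_fee=50))  # 2.0
```

With these placeholder numbers, two days of outage in a year (not unusual for a last-mile failure) covers a full year of the second connection. A higher card share or margin shrinks that number quickly.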

Banks already know this and are very likely doing it already.

Short story about Interac being down: I was at work yesterday, and for those who don't know me personally, I do not carry cash. I hadn't brought a lunch; I planned on buying something during my lunch break the usual way at a nearby shopping area, which typically means debit. So, with no food in hand and knowing that Interac was down, I was left with very few choices. I asked the restaurant whether they took credit (which should not have been affected by the outage), and they did not; it's likely that the restaurant's ISP was Rogers. In any case, there's a branch of my bank nearby, so I went specifically to their ATM to get some cash, because I knew that almost everything else was going to rely on the Interac network to function. While I was waiting in line for the ATM, I noticed several people who did not have a debit card from that bank; their cards were from competing banks. Every single person who tried to withdraw money using a competing bank's card at my bank's ATM was met with failure. A few tried several times, holding up the line. The irony was that I work in an area with no fewer than three banks within a 5-minute walk. Several of the people who failed to take out money were less than 150 m from a bank where they would have been met with success.

The moral of this story is that when things fail, you have to start treating entities like banks as islands unto themselves. If it's not your bank, it won't work. Period.

I was able to get cash and feed myself, so that was helpful, for me at least. I'm not sure what happened to all those cross-bank ATM users. I don't like to carry cash, and the change from my purchase now lives in my house, along with a collection of money I never use at this point. If I had known about the issue before leaving for work, I surely would have simply taken some of it with me.


Regardless, the people I feel for most are residential users. Apart from paying far too much for redundant internet, what can be done here? Not much. Personally, I have a DSL line from a Bell wholesaler and a cellular line from Rogers. Even that small a diversification can keep you connected through a crisis like this: if Rogers goes down, I have WiFi at my house; if Bell has a similar failure, I can at least reach people via my phone. This is a difference most people had not even considered until this outage. I've now seen several people moving to switch providers because of this issue, and I would recommend a moment of pause before you rashly transfer all of your services to a different provider. This type of outage isn't impossible for any provider, and "putting all your eggs in one basket" will leave you vulnerable to the same problems. Diversify.

At the very least, get a pay-as-you-go SIM card for your cellphone from a different provider (make sure it actually runs on a different network, not just under a different brand), along with a prepaid airtime voucher for that provider. Put them somewhere they will always be available to you, such as your wallet or your car. You don't have to activate the SIM until you need it; just have it in case something goes completely sideways. That way you'll have access to cellular calling, texting, and data in an emergency. This relies on having an unlocked device. You can check by simply inserting the SIM card from the other provider: if you get an error saying it can't be used in your device, the phone is locked. If so, talk to your cellular provider and have them unlock it, something the CRTC has mandated that they must be able to do for their devices. It's a bit of a pain in the butt, but once your device is unlocked, you're all set to go.

One thing I've done is sign up with a third party for my main telephone number; in my case, that's VoIP.ms. You don't need to know a lot about VoIP to make this work for you. It does cost a bit more, since you're billed twice for the "same" line, but it protects you in other ways. What I did was port my cellular number over to VoIP.ms and set it up to forward all calls to my cellphone, which was assigned a new number since my old one had moved to VoIP.ms. I still get all my calls. The downside is that when you text someone, it shows up as your new number, but calls will complete to whatever number you forward to via VoIP.ms's website. Coupled with the backup pay-as-you-go SIM card, this lets you redirect all of your calls to the pay-as-you-go number (whatever they assign you) in an instant through VoIP.ms's website. You won't miss any calls, and everything will simply work. It can be done in minutes and ensures that anyone needing to call you can get through.

That solution isn't right for everyone, and VoIP.ms does charge a monthly fee, so it's an additional expense people may not want to take on. You also add a point of failure: if VoIP.ms is down in any way for any length of time, you may miss calls. So it's a risk either way.

Unfortunately, I wasn't prepared: I did not have a pay-as-you-go SIM waiting in the wings. I considered buying one midway through the outage (though how would I have paid for it? heh).


For preparedness, I will be pursuing my ham radio license later this year, since simplex radio doesn't require any fancy cell towers to operate. There's also AREDN, a wireless data network run by ham radio operators that can serve as emergency communication. There's a lot to know in this arena and I won't clog up this post with it, but those are viable options for continuing to communicate while things are down, as they were yesterday. Luckily, I was in places I knew, around people I knew; I didn't need Google Maps to show me where to go, and I didn't have any sort of emergency that required communication. I'm sure most people were in the same boat: we survived the outage without incident (this time) and came out the other side no worse for wear. My concern is, what happens when that's not the case?

Take care of yourselves, and be prepared for exactly this kind of thing. If anyone wants to take the ham radio course with me and get certified, please reach out; I'd love to have some nearby ham friends I can QRP with. Cheers.