Saturday, November 11, 2023

WiFi and the smart home.

Hello again, everybody. If you've ever used any smart home products, then you know how much they can enhance your life. If not, I'll include a quick primer here on some of the home automations I've used and how they've enhanced mine. After that, I want to discuss how these things impact my network, the decisions I've made, and some of the struggles along the way.

For the uninitiated, smart home products start with something as simple as a smart speaker, such as an Amazon Alexa or Google Home device. These are great for quickly checking the weather or asking questions while you're otherwise busy or have your hands full; as an example, I usually ask my Google Home about the weather while I'm getting ready in the morning, so I know whether to reach for a sweater, something warmer than a sweater, or an umbrella. I've also asked about the closing times of local businesses, or travel times to destinations before I depart. Smart home gadgets don't stop there, and one of the most popular categories is smart lights. Depending on which smart lights you buy and how you have them set up, you may even be able to ask your smart speaker to turn on the lights, and it will do so. In addition, there's a wide variety of sensors available, such as motion, presence, temperature, and humidity, among others. These can help you get a better understanding of your environment and what may be happening within it. In some cases these can also trigger other smart gadgets, such as lights, so the lights come on when you enter a room.

These all communicate over a variety of protocols; you may have heard buzzwords like Matter or Thread, as well as other keywords like Zigbee or Z-Wave. If you're looking at getting started with smart home stuff, all of this can be a bit overwhelming, so I'll do my best to briefly demystify it. Matter and Thread are both universal protocols designed to standardize how smart things communicate. To my understanding of both, they regulate what messages each device uses, not how it transmits them; e.g., devices use the same words to convey standard commands and reports, but otherwise they can be speaking whatever "language" gets that message to its intended recipient. The "language" analogy here is about how those commands and reports reach their intended destination, such as a smart hub or target device. That language is usually WiFi, Bluetooth, Zigbee, or Z-Wave; these are physical communication protocols. Thread and Matter communicate on top of them to get the message across.

The next thing to understand is that in the majority of cases, these are hub/spoke networks, where everything is relayed through a central controller. Unless you're using association groups (which are a topic unto themselves), all communication must be processed by a hub. You've probably heard the term "mesh" used in conjunction with Zigbee/Z-Wave, and while both can operate as a communications mesh, they don't use the mesh for control - again, unless association groups are involved, as previously mentioned. This means that when you press a smarthome button, it relays that the button has been pressed to the controller over WiFi/Bluetooth or the Z-Wave/Zigbee mesh, and the controller then takes whatever action it has been programmed to take when that event is received, such as turning on a light or playing some music.
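To make the hub/spoke idea concrete, here's a tiny Python sketch of what a controller is doing under the hood. The device names and actions are made up for illustration; a real hub speaks Zigbee/Z-Wave/WiFi to the devices, but the logic is the same: map incoming events to programmed actions.

```python
# Minimal hub/spoke sketch: the hub maps device events to programmed actions.
# Device names and actions here are invented for illustration only.

AUTOMATIONS = {
    ("bedroom_button", "pressed"): [("bedroom_lamp", "turn_on")],
    ("hallway_motion", "detected"): [("hallway_light", "turn_on")],
}

def handle_event(device, event):
    """All traffic flows through the hub: look up the event, issue commands."""
    commands = AUTOMATIONS.get((device, event), [])
    for target, action in commands:
        # In a real hub, this is where the command goes back out over the mesh.
        print(f"hub -> {target}: {action}")
    return commands

handle_event("bedroom_button", "pressed")
```

The important takeaway is that the button never talks to the lamp directly; if the hub is down, pressing the button does nothing.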

Briefly, association groups allow some devices, such as switches and buttons, to send commands directly to their associated lights or other smart devices. This is uncommon, so I won't touch on it too much; there's a bit more nuance and complexity in getting something like that configured and working, and it's not relevant to my point.

The key take-away from all of this is that some physical protocol (Zigbee/Z-Wave/WiFi/Bluetooth) carries some logical protocol (Zigbee/Z-Wave/IP/Thread/Matter) to do things. You can almost always also control all of this through an app or website for your smarthome. In the case of something like LIFX smart bulbs, the bulbs themselves connect over WiFi to the cloud, and can be controlled with an app or website to toggle the lights on and off. In this context, the hub is the cloud server that LIFX provides for use with their products, and commands are communicated over IP. The benefit here is that something cloud-controlled like this ties in nicely with other cloud-based smart home devices, such as Google Home, which can accept a command like "turn on the living room lights", send it off to the LIFX cloud, which relays it to the lights to execute the "lights on" command.
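As a concrete sketch of what "commands over IP" looks like, here's how an HTTP request to a LIFX-style cloud API might be assembled in Python. The endpoint shape is modeled loosely on LIFX's public HTTP API, but treat the exact URL, selector format, and auth scheme as assumptions to verify against their current documentation; the token is a placeholder.

```python
import json

# Sketch of cloud control over IP, modeled loosely on LIFX's HTTP API.
# Endpoint/auth details are assumptions; the token below is a placeholder.
API_BASE = "https://api.lifx.com/v1"

def build_power_request(selector, power, token):
    """Assemble (but don't send) the HTTP request asking the cloud hub to set power."""
    return {
        "method": "PUT",
        "url": f"{API_BASE}/lights/{selector}/state",
        "headers": {
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"power": power}),
    }

req = build_power_request("group:Living Room", "on", "YOUR_TOKEN_HERE")
print(req["method"], req["url"])
```

Actually sending it is one `urllib.request` or `requests` call away; the point is that the "hub" here is just an HTTPS endpoint, and services like Google Home talk to that same endpoint on your behalf.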

There's plenty of clever engineering and technology that exists behind the scenes to make this work, and it's all very easy, but for many, this is a problem.

Many smart home enthusiasts want local control.  What this means is that the controller and all associated components are located locally, in your home.  There are some products that do this, but none more widespread than Home Assistant.  Home Assistant is essentially server software that you run on a system inside your home, which handles communication between it and all of your smart home hubs and devices.  It is not, in and of itself, a hub, but it can communicate with hubs over USB/direct connection or over IP. By directly connecting the hub to Home Assistant, either over your local IP network or via a USB device, you gain local control over all of your devices; some devices can also be added directly over IP (WiFi/Ethernet). This comes with its own caveats. Most notably, you need a fast, reliable way to enable that communication, and often that means a more expensive home network.  In the case of Zigbee/Z-Wave, you need additional devices to enable the Home Assistant server to communicate with them.

Ideally, you would want to pick a physical communication protocol and stick with it for the most part, since otherwise you need a flurry of dongles and adapters or hubs to communicate to all of your smarthome things.  So, as we lead into the point of this post, let's review each of those physical communication methods and go over some very brief pros and cons of each.

1. WiFi.  This is easy.  Everyone has WiFi, so it's extremely simple to plug in WiFi smart home gadgets and get them working; this is why most common smart speakers are WiFi.  It will work for everyone.  The issue is that when you start to add a large number of WiFi devices - for example, replacing your lightbulbs en masse with WiFi smart bulbs - you can quickly and easily overwhelm your $100 best-buy WiFi router, leading to inconsistent behavior and sometimes loss of connectivity.  Unless you like rebooting your router multiple times a week, you'll probably need to upgrade your WiFi to get enough coverage and reliability before you can finish replacing all of your non-smart devices with smart ones.  The upside here is that WiFi is readily available and generally doesn't require anything beyond what you already have; the downside is that once you get to dozens of devices, you may need that upgrade.  Additionally, if your WiFi goes out, you're pretty stuck: you cannot control your smart home things, and you can't really use the internet either, so I hope you're good enough at networking to fix it without help from Google.

2. Zigbee.  I'll list Z-Wave separately since, while similar in function, how Zigbee and Z-Wave perform their functions is quite different.  The plus side here is that Zigbee is an open standard; this can also be considered a downside, but I'll get into that in a minute.  Open or not, it is still a standard, and all Zigbee devices need to be validated before they can be labelled as Zigbee.  The open part is that anyone can make Zigbee-compatible microchips or devices with little to no oversight, apart from certification for the official Zigbee logo at the end of the process.  This translates to a massive number of Zigbee devices, rivalled only by the number of WiFi devices out there.  However, Zigbee operates primarily on the 2.4GHz ISM band, which is the same band as WiFi and your microwave oven; this means there's a high probability of interference.  Interference can come in many forms, and there's no good way to determine that it's happening, or that it was responsible for any problems you may have experienced... at least not without rather expensive equipment and specialized understanding of how to use it.  The same applies to WiFi and Bluetooth devices on that band.  Still, the plethora of devices makes Zigbee a compelling option, and since it doesn't rely on WiFi to work, it won't overload your WiFi router; it generally maintains its own mesh.

3. Z-Wave.  This is very similar in most respects to Zigbee; however, it uses frequencies in the 900MHz range instead, which keeps it away from WiFi, Bluetooth, and Zigbee.  It is, however, a closed standard.  Only one company, Silicon Labs, if I have my information correct, is allowed to produce Z-Wave standard chips.  Due to regulatory differences, the actual operating frequency of Z-Wave devices varies around the world, so devices are generally programmed for your local region, most notably EU, US, etc.  Many of the regional areas involve more than one country; for example, the US region includes Canada and, I believe, Mexico, while EU is fairly self-explanatory.  However, devices are usually not region-locked, so buying a Z-Wave device from an EU seller for use in the US usually only requires applying a US region update file, and the device will operate on the correct frequency.  Since Z-Wave is a more closed standard than Zigbee, the range of devices is a bit more limited.  Very common devices such as smart switches and smart lightbulbs have a good number of options, but other things like bulb-free smart light fixtures or some sensor arrangements are more difficult to come by.  So there's a fairly large upside when it comes to frequency use, since 900MHz is mostly unused by consumer electronics now (unlike 2.4GHz), but due to the single-supplier chip source, it's a bit rarer to find a Z-Wave option for a smart home device that you like.

4. Bluetooth.  Last but not least is Bluetooth.  The uses for Bluetooth in the smart home are usually fairly limited; most such devices are WiFi with Bluetooth as an option, though some Bluetooth-only devices exist, and getting those connected to a smart home hub such as Home Assistant is a bit more involved.  Anyone who has paired their cellphone to the stereo in their car will understand the issue: while sometimes this can be downright painless, other times it's a nightmare of issues you don't fully understand.  The connection can be unreliable at times and is generally very limited in range.  Because of this, most smart home device makers avoid Bluetooth in general, but it is an option in some cases.  If you're unaware, Bluetooth operates on the same 2.4GHz spectrum as Zigbee and WiFi, with all the caveats previously mentioned about that band.  It uses a clever frequency-hopping technique that generally avoids ongoing connectivity problems, but as you may have experienced, it's not perfect... at times it's not even what I would consider good.  It's also relatively low power compared to the other protocols, so range is more of an issue than with the other options.

So those are the protocols.  "What's the point?", you may be asking.  I've certainly taken a long enough time to get around to it.  As readers of my blog will already know, I'm a network technician/engineer with a strong understanding of WiFi and wireless stuff.  So the primary point here is to choose carefully. Almost everything "smarthome" is wireless.  There are very few options for an ethernet or wired connection to a smart device; it's far rarer than even Bluetooth.  I've heard of ethernet-connected lighting for businesses (sort of like smart lights, but over a wire, getting power over ethernet), but no such device has been made available for consumers as of the time of writing; at least, none that I'm aware of - and if anyone is aware of such a device, please let me know.  Almost all smart home devices are some variant of WiFi, Zigbee, Z-Wave, or Bluetooth (in descending order of popularity).

Choosing WiFi sounds good on the surface, but you may find that as you install devices, your WiFi isn't good enough to handle that many of them connected all at once, all the time; this may lead to very expensive upgrades to your WiFi system to support them.  Choosing Zigbee may sound great as an alternative, with all the device options available and keeping everything off the WiFi, but you may find that the 2.4GHz spectrum in your area simply has too much interference for it to work reliably.  So then what? Z-Wave? Is that the right answer?  Well, maybe.  It solves a lot of the interference problems of Bluetooth, WiFi, and Zigbee, but it adds another: you may not be able to get the devices you really want with a Z-Wave option, so you may be limited in which sensors and other devices you're able to find and buy for your Z-Wave smarthome.

So if all the options are varying levels of bad, what's the solution?  We've reached my critical point. All options are varying levels of bad.  Having a smarthome is wonderful, you can turn off the lights after you've gotten comfortable in bed, you can check the outside temperature from anywhere.  You can do all sorts of interesting and amazing things with home automation, but it's important to pick the right devices and set your own expectations of how well they're going to operate.  Sometimes, they will fail, and that should be expected. Sometimes they'll go offline or stop functioning, and that should be expected.  Sometimes they'll get so frustrating that you'll want to throw them out of your window, and yeah, sometimes that too.

By no means am I trying to convince anyone not to buy into smart home stuff; all the benefits of the smart home devices I've tried and used have far outweighed the problems.  The day-to-day benefits are immense.  I'm not here to tell you what to buy, or which protocol to use.  Do what you like, but know, dear reader, that there are tradeoffs the smart home makers are not going to tell you about, and you should be prepared for that.  Right now, the smart home is a minefield of problematic devices operating on problematic frequencies that are already overcrowded as-is; and where it's not, the choices are so limited that you may regret the decisions you make along the way.  No matter what, plan ahead.  Plan for when the smart home stuff stops working: have a backup to turn off the lights if they're stuck on, and a way to reset the lights so they turn on when they won't otherwise illuminate.  Have options.

At the end of the day, I hope you're all healthy and happy and you can achieve all the goals you want in life.  Take care now.

Thursday, August 3, 2023

Off the cuff: Hear me, Internet.

This post is for the non-networking folks, as a word of caution above all else.  This is a very unplanned post and will likely be more of a rant.


The internet has quickly and easily become the foundation for most of our lives and social interactions.  Many companies now deliver voice services over the same wire as the internet to most locations, and certainly to newly built areas.  Internet access has become the key tenet on which all other communication, whether personal or business, is done.  To that end, I would like to express how important it is to do things in a way that makes sense.

It's impossible for me to advise every person on what to do, and I hope these posts are finding their way to the people that need them.  While I might not be able to help every individual directly, I'm trying to: I'm on reddit and lemmy, answering questions almost daily, and the number one concern in nearly every case is how expensive a solution is.  Cost is not the thing you need to be concerned about here, folks.  Reliability and longevity are vastly better arguments.  So many people get caught up on not spending over x dollars on their network that they completely miss the point of it all.

It seems to be a common misconception that a router is a router is a router.  The mindset being: if it provides you with WiFi internet, then nothing else really matters, so just buy the newest WiFi number, find the cheapest option, and be done with it.  But as many have found out the hard way, that's not how this works.  It's not how any of this works.  I cannot count the horror stories of people buying cheap, off-the-shelf wireless routers that do not serve their needs.  This is prevalent, and companies will sell you anything they can market in a way that makes it enticing to buy, while cutting every corner, making the product little more than ewaste right from the factory.  This is a plague, and it's not getting better.  It's perpetuated by ISPs too, as so many are still handing out half-baked, garbage router/modems, not suitable for anybody except the smallest of connected homes.

When putting these devices up to any scrutiny, regardless of manufacturer, they fall apart quickly when you start loading them up with users.  I can hear it already, the chorus of people saying "but there's only x people in my family, I don't need much"... yes you do.  Sure, you may only have 2 people in your household (as an example), but each person has a cellphone, WiFi-connected watch, tablet, laptop, and any number of various IoT things, including TVs, smart speakers, lights, switches, buttons... even your damned washing machine and fridge have WiFi now.  So no, you're not "just 2 people"; you're 2 people with a buttload of WiFi-connected crap.

Once upon a time, you could easily head-count and estimate how much capacity you need, but with everything that historically wasn't smart becoming smart, and IoT stuff being a household staple, you're loaded for bear with random WiFi devices you don't give any thought to.  Yes, your Apple TV counts.  Historically, one of my local ISPs used an infamous modem/router combo called the "2wire", aka model 2701HG-G.  This unit was INFAMOUSLY bad, yet most home users at the time didn't notice.  The model has since been decommissioned by everyone.  It was released at a time when "IoT" wasn't even a term anyone used; it hadn't been coined yet.  The unit was fine for a couple of wireless devices and one or two wired devices, but would fall over around the 8-10 user mark; more would just make it fall over faster.  Clients would experience slow or unresponsive connections, if they could connect at all. Moreover, this problem only seemed to get worse over time: the longer the unit was powered on, the more likely it was to fail.  This unit, and others like it, are why the first bit of advice from your ISP has always been to "restart the modem" for troubleshooting.  Rebooting the unit would generally fix the problems with it, for a while.

This is not a solution. The problem was so common that some of the first IoT-class devices were literally smart plugs that would sit between your wall and the modem; if the WiFi stopped working, they would automatically disconnect the modem from power and reconnect it a few seconds later, forcing a restart.  This is a bad solution to a problem that shouldn't exist.  It was only the corner-cutting in these modem/routers that made such a product necessary.  These devices only made me grumpy.  Their mere existence flew in the face of what should be, and the fact that they were made, and even popular, angers me.  I'm not angry at the people who invented them, nor the people who used them; I'm angry at the ISPs and companies that allowed such ewaste to be sold as a product that requires this kind of third-party fix to work reliably, in any form.

Modem/routers have gotten better. ISPs have stepped up their game here, but not nearly by enough, and even the third-party consumer-focused routers haven't really improved much.  Sure, newer modem/routers can support many more clients than the 2wire could, but they're no match for a proper solution.... Historically, routers didn't have WiFi - nothing built in, anyway.  These all-in-one routers, especially the cheap AIOs, are still garbage; they're just better garbage.  They're still ewaste in my eyes.  Most modem/routers cannot do what they imply on paper.  They don't have the horsepower, and the programming is usually half-baked.  They have memory leaks (which is what I suspect was the problem with the 2wire), cheap hardware, and they're underpowered and under-spec'd; all of this so the ISP can save a buck providing you with something they try to convince you is "good enough".  I've long given up on ISPs providing any devices of any value.

The good news is that most modem/routers can operate in "bridged mode", which disables the router functions and limits them to modem operations only - that is, converting from whatever connection the ISP uses, whether fiber, DSL, or cable (DOCSIS), to ethernet.  In this mode, most ISP-provided equipment is perfectly sufficient, with more than enough power and capability to do that job; it is honestly the only saving grace of the uplift in equipment that ISPs have made.  Beyond bridged-mode operation, these units are effectively paperweights, unable to do the basic tasks required of the modern home network.  I expect this will not change, at least not anytime soon.

Moving on: even if you're keen enough to buy your own router and put the modem into bridged mode, you're probably still using hot garbage for your network.  People tend to buy cheap, off-the-shelf devices (from their local best buy or something) which are marginally better, but still closer to ewaste than to a useful piece of equipment.  Cheap network device manufacturers know this, and they know that for consumers, the key requirement is cost.  Because of this, most inexpensive routers are in the same boat; this boat has fewer holes in it, but it's still sinking.  Most people will balk at consumer wireless routers that cost more than $100-200.  I understand it - you don't want to waste your hard-earned money - but by buying cheap, that's exactly what you're doing.  The under-$100 market is rife with old, outdated, and otherwise underpowered devices.  Many won't stand up to the requirements of 1Gbps internet, never mind anything faster, and as internet service gets faster and faster, you're going to be left very far behind in capability and performance, dramatically shortening the useful life of your bargain router.

WiFi is getting faster and faster, with some pretty serious caveats that are slowly going to ruin the airspace in populated areas; the solution so far has been to absorb more and more airspace for WiFi to meet the needs for speed and diversity, which is horrible, but a whole other matter.  The fact is, the right way to fix the problem is to have many low-powered wireless access points strewn around your home.  This reduces the overall impact and footprint of the wireless networks, avoiding conflicts with neighbors' networks.  Unfortunately, everyone would need to be on board with this to make it viable, and since woody next door doesn't even want to talk to you without accusing you of something, or yelling, it would be near impossible to get everyone on board with changing how we do WiFi so it can be better for everyone.

The key thing holding this future back is the fact that nearly zero homes built more than 10 years ago have any ethernet built into the structure.  There's a non-trivial number of homes still being built without it, and even those that come with ethernet are not getting the right ethernet.  Most new-build homes that get ethernet have a handful of drops placed in the walls around the home, with nothing in the ceiling.  Most access points worth their salt are ceiling-mounted.  So where the heck are the APs supposed to go?  Some companies have tried to bridge this gap by making wall-mounted APs, which is a good step, but builders need to get with the times.  Ceiling ethernet for APs is the most important thing to have in modern builds, allowing common folks with little to no prior networking or construction experience to install fairly good, well-suited wireless networks in their homes with little more than a screwdriver and some ethernet patch cables.

I have to commend Ubiquiti here.  Their UDR product has an access point built in, with a controller for it; it functions as a consumer router, and has 2x PoE ports for additional access points (or cameras, etc.). It's not expensive, around $200, and additional access points start at $130 for a U6 Lite at the time of this post.  Provided you have the required cabling, you can have a three-AP system for under $500, which will be good to go for the next 10 years.  One mark against them is the limit of a single 1Gbps ethernet link for WAN; as available internet speeds exceed 1Gbps, these become less useful.  I know many are still stuck in the sub-1Gbps space, including me... but this is slowly but surely changing.  Hopefully they will release a new UDR-type unit with much the same capabilities but a multi-gig (2.5/5G) or 10G WAN connection sometime in the future.  This is the way things should go: with a few well-placed APs running relatively low transmit power, and one central main unit that's also an AP, you can quickly, easily, and very reliably serve your home's connectivity needs.  Low power means that as the signal attenuates outside the bounds of your home, it will have very little, if any, impact on your neighbors, and if all the inexpensive consumer networking vendors had a similar product, which all your neighbors started using, then WiFi would be improved for all.

All of this circles back to the main problem, which is having a spot where you can put the access points to best serve the needs of your home.  Most condos do ethernet wiring by default now, and newly built homes at the very least have the option to, but again, not the right wiring.  I have yet to see any pre-built home with a single ethernet drop in the ceiling.  The builder could easily take their floor plans and designs to a professional wireless networking person - such as myself, but not necessarily me - for planning.  That pro could then run the plans through WiFi planning software and find the optimal placements to best serve the home with the least transmit power.  At that point, it would be fairly trivial for the builder to simply put ethernet drops for access points right where it would be most ideal to place one.  This takes all the thinking out of the process for anyone buying these homes, and long-term it would ensure that neighborhoods, apartments, and condos built with this in mind would be good places to be, with great WiFi options.

The whole process would be to move in, install the APs in the pre-designated locations, and that's it. You as the occupant of such a property would need to do little more than buy the equipment, get up on a ladder and install it by screwing the mounting plate to the ceiling, plugging in the AP, and securing it to the mounting plate.  Once that's done, plug in the required cables at the network box, and configure it to your liking, or by following a quick-start program.  The system would otherwise set itself up, and you, the end user, would in turn, get very good, very reliable, and very fast WiFi, with very little effort.  It would be about the same effort as installing a smoke detector, something I think most of us have done, or at least know how to do.

So my advice to anyone reading: if you're planning on buying a new build, ask for this.  If needed, find a professional to do the mapping, give the placements to the builder, and ask them to put ethernet in the ceiling in the designated locations.  If you're already in a place that's finished and you have no ethernet in the ceiling, then if you have access to an attic, you can either do it yourself or hire a low-voltage contractor to install the required connections.  If you don't have access to the ceiling via an attic, then you will need to cut open walls, etc., to run the cables, and it only gets more difficult from there - thus, getting this idea into the hands of builders is the most important thing to do, so this becomes a thing of the past.

Look, as you may know from my previous posts, I'm not a fan of WiFi.  Wired when you can, wireless when you have to; but not everyone subscribes to that, or can do it in a practical sense.  So the ability to quickly and easily install WiFi that serves your living space well is going to become very important in the coming years.  Getting your hands on good routers that support APs, such as the UDR I've mentioned, will be the other half of this.  Everyone has a preference for which consumer networking vendor they like, and not everyone is a fan of how Ubiquiti does things, so having options in that market is going to be important.  Right now, most alternatives rely on some form of wireless router plus a PoE switch to make all this work, and it's embarrassing: even very inexpensive options on this front can easily cost more than the UDR, and they can easily be worse, as most consumer wireless routers don't support adding APs, so you may end up with two disjointed systems, or have to turn off the wireless on your router entirely.

The hardware will come; it will catch up.  But without the wiring in place, and the opportunity for people to use a system like the one the UDR represents, this idea will die.  IMO, the system the UDR represents - what it could be, and how prevalent it could be - is the idea worth fighting for.  If we, the consumers, are not asking for it and/or buying into it, then it will not make it past this point.  If the UDR or similar systems become a popular option for homeowners, then the future isn't as interference-prone as it currently appears.  Right now, people are buying a single wireless router, placing it wherever they can (usually not in any ideal location), and cranking the power to the max just to get signal to the furthest reaches of their home; this creates significant interference due to the massive transmit power of these units.  There are mesh-based solutions trying to fix this problem by putting small "pods" or similar devices on power outlets in distant corners of your home, but no mesh system will ever compare to a properly wired access point system in terms of reliability and speed.

To everyone thinking $500 is too much, let me put it this way.  Say you own the system for 5 years; if you spread the cost of the system over those 5 years, it's a grand total of about $8.33/mo.  Now consider how much you're spending monthly for internet - at all - and how much of that cost is the modem/router rental.  Yeah, exactly; and your ISP is making PROFIT in that time, for every month you are "renting" their equipment.  If anyone is saying "but I don't have a modem rental fee, it's free" - no, it's not, and yes, you're paying for it; they've just moved that cost from one you can see into the service fees for the internet.  You're paying more for internet so you can have a "free" modem.  Modem fees are generally $5-10/month, so even with a $500 system, as long as it lasts 5+ years, you're spending about the same or less for your own system, one that is more reliable, more robust, and better in almost every way than whatever the ISP could provide you.  If a good system like this serves your needs for more than 5 years, that cost only goes down.  And if you spend that time putting aside $10/month into a savings account, or hiding it in your mattress or something, you can fully replace the system in 5 years with one that's just as good, or better.  Considering most people pay more per month for something like coffee, that's a downright bargain for what it provides: Netflix, FaceTime, reddit (or other social media of choice), chat with everyone you know, games... the value is enormous.  Isn't having that reliable and good worth more than you're spending on Folgers?
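The napkin math above, spelled out: amortize the one-time hardware cost over the months you own the system, and compare it to a typical $5-10/month equipment rental fee.

```python
# Amortize a one-time hardware cost over its service life in years.
def monthly_cost(price, years):
    return round(price / (years * 12), 2)

print(monthly_cost(500, 5))   # ~$8.33/mo over 5 years
print(monthly_cost(500, 10))  # ~$4.17/mo if the same system lasts 10
```

At $8.33/mo the system is already competitive with a rental fee; every extra year it survives past year five pushes the effective cost below it.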

When you think about it, you'd be an idiot not to want this.  Save your money by buying something better.  It will last longer and you'll be happier for it.

Monday, July 24, 2023

Hot Take: IT is the lifeblood of companies.

I have wanted to write about this for some time; it's an idea that, when you really think about it, is both true and almost scary.  Simply put, modern business does not function without I.T.; as IT, I am the lifeblood of the company.  Horrifically, most business owners don't understand this, so I will explain.

As a preamble, this is a long, rant-y post, and it's off-the-cuff, so please take it as it is, and forgive my (I'm sure, plenty of) grammar and spelling transgressions.

In the past, before computerization, companies largely relied on paper; forms filled out by various individuals along the chain, adding to each document as things went along, creating a "paper trail" of activities, overseen by managers and decision-makers to make things go.  The sales cycle would go from telephones and in-person discussions with clients, to a quote, to an order, to fulfillment, which may or may not go through manufacturing steps, each with their own related paperwork, then to final delivery, which then gets passed over to accounts receivable for final collection.  At the end of the day, the money collected goes over to finance, who pays everyone for their work, and pays the bills to companies that provided materials, all of which have this same process happening internally.  The cycle continues, more sales are made, and more products are delivered.  This is the business process.

Presently, every step of this, aside from perhaps manufacturing, has been digitized, and relies on IT to work.  Sales are done by a combination of phones (often over VoIP), e-mail, and in-person discussion (usually coordinated by the first two); the quote is prepared digitally in some sales software, sent electronically, and often approved/signed digitally.  That order is then sent, digitally, over to fulfillment, and if there's manufacturing involved, every step of it is often tracked, and sometimes even performed, by or on computers.  Final delivery is generally arranged online for physical deliveries, whether through your parcel service of choice or by scheduling in-house drivers to get the product to the client; additionally, for parcels, if you're not scheduling the pickup online, you're probably doing it by phone, which is VoIP, which is ALSO I.T. DRIVEN.  Final collection is usually done with a combination of email and phones; payment is shifting away from physical cheques to EFT, so that process is now entirely on the computer, and accounting/payroll is done entirely on the computer as well.

This is how most companies are going "paperless"; even product resellers are using this process, in addition to inventory tracking, etc., to the point at which there are WiFi-connected devices scanning and tracking the inventory and ordering more.  The entire process, top to bottom, at every step, has I.T. involvement, and it's not slowing down; steps that are not computer-driven are becoming computer-driven.  Management has instantaneous access to workers' punch-in and punch-out data, knowing who is on-site and working at all times; they're easily able to look up the status of any order and where it is in the chain of delivery to the client, from the deliverable being manufactured on the shop floor to where the delivery driver is physically located right this second.  Management has an unparalleled amount of transparency into what workers are currently doing, how things are moving along, and when delivery to the client's door will happen.

This relies on I.T. entirely.  Companies have slowly transformed from their specific market (a company selling x product) into data management organizations that make money by selling x.  Everything from CRM, marketing, sales, accounting/finance, management, and communication in all its forms is driven by the network and servers.

Make no mistake, any modern business IS A DATA MANAGEMENT COMPANY; they just happen to make money by selling a specific product or set of products, either through manufacturing it, or reselling it.

Every step of that process is I.T. driven.  Networking is the core of the information systems.  As a networker, I know this intimately. As an associate of mine (also a networker) has said previously:

"Networking is the bottom of the stack; no one cares about you, but everyone depends on you.  You will be the first point anyone looks to when there is a problem and be expected to provide input and advice about everything running on the network whether that's your job or not.  You need to know maths (subnets/latency), be able to visualize data flows in your head, and be able to build solutions with sub-second resiliency.  If you are ready for that, then come aboard. I'll pour the scotch."  - Laz Pereira

Laz is absolutely correct here; he's an excellent networker, and a company could only be so lucky to land him as a technical resource.  The fact is, when I.T. started, it was a convenience; it has since mutated into a core competency for any thriving modern business.  Every sector, seemingly without exception, is computerized, relying on servers which rely on networking.  The problem is, since it was historically seen as a convenience, many (especially older) business execs still see it that way: as a convenience, not a requirement.  Meanwhile, without I.T., productivity would grind to a halt.  Imagine if Amazon, the largest product reseller out there, at least for their front-end business, suddenly had a major I.T. outage at a distribution center.  The workers, who are massively over-worked (my heart goes out to them), have their entire day dictated by the whims of their I.T. system.  They're expected to collect all the orders in a timely fashion, and are timed on their collections based on where they are and how long the system thinks it should take to get to the next item.  This extends to their drivers, who are equally pushed to complete a number of deliveries per hour, all governed by a device they carry.  Whether they're inside the warehouse or in a truck, a small device connected to a large system of servers via a network infrastructure dictates everything they do.  If that system, or the network on which it resides, were interrupted for even a few hours, their business would GRIND TO A HALT during that time.

Amazon is an extreme example; they have the infrastructure to ensure that instances of down-time are minimized into irrelevance.  Other companies have the same problem, but do not have the same protections of redundancy.  There are huge swaths of industries that are ill-equipped for a major IT outage.  I'll give you a real-world example; I heard this third-hand, but I have high confidence that the important details are correct.

A local business-owner, whose business manufactured plants, of all things (mainly growing them from seed and distributing either fully-grown plants or clippings for sale), suffered a major loss and significant down-time when their primary office complex caught fire, and what happened taught the owner a powerful lesson.  To be clear, they own acres of land and have several manufacturing, processing and distribution buildings; they run their own delivery service for their products, and they're very vertically integrated.  As with most businesses, they had tracking for manufacturing, using WiFi and small devices (like hand-held scanners) for inventory, VoIP, and an entire office structure based on a paperless system, so 90% of internal communication for orders, sales, invoicing, accounting, etc. was done entirely digitally.  During the fire, as I was told, they had IT support remoted into servers, WHILE THE BUILDING WAS ON FIRE, trying to back up all the data they could.  Unfortunately, up until that point, the owner had viewed IT as a cost center rather than the critical business asset it is.  The key point of this story is that when they rebuilt the building and got back to business-as-usual, IT was a key focus of the rebuild.  I worked with the business after the fact to help with redundancy in their IT infrastructure.

The lessons learned drove the owner to set up on-premise (different building) network-attached backups, off-site sync for those backups, and a backup compute cluster for the primary servers.  He got about 80% of the way to full redundancy: literally able to shut down the main servers, fire up the backups (which were pre-emptively restored to the backup cluster every hour), and be up and running with little more than one hour of lost data.  The issue I saw, as a networker, was that their network was severely vulnerable.  They had dual internet connections on a high-availability dual-firewall set-up, geographically located in different buildings.  The problem was that all of their inter-building connections ran through a single structure.  They were using largely a single VLAN for most of their connectivity, and since all of the ISP connections, while redundant, terminated in out-buildings, if the primary building were lost, ALL connectivity to the ISPs and other buildings would be lost.  The data would be intact, and the servers would operate, but with no internet connection and no connection to any other building on their campus.  I tried to warn them of this and propose solutions, but the discussion fell flat, and I never got to the point where I could convince them to invest in the new data connectivity required to make the system fully redundant.  Another technician took over, who was not a networker, and I don't think he understood that a problem existed; as far as I know, the system is still vulnerable to this, which is why I won't mention the company name.  I don't want to call them out for making this error.  At the end of the day, the company owner is still responsible for the oversight, and for hiring someone who can't visualize the data flow and understand where the data is moving to and from, which would make the problem very obvious.

Fact is, I.T. is still seen as a cost center, and it's not.  It has become a critical business asset which needs to be taken as seriously as worker productivity.  Companies like Amazon understand this, but it's unsurprising that a multi-national monster like Amazon would.  They built up AWS, in no small part, to support themselves; until AWS came around, there were not many cloud-based systems with the breadth and depth to support Amazon's growth.  Not all companies understand how critical it can be to have highly available, reliable and fast access to their data.  The contrast between company owners wanting to save money on the "cost center" that is I.T., and what I.T. is actually doing for them and how critical it is for modern business, is staggering.

From personal experience, as some of you may know, I currently work IT support, mainly at an MSP, which basically inserts me into companies as their de-facto I.T. guy.  It's hard to express the frustration of trying to do the best thing for my customer and getting push-back from every side; whether it's a new network switch which fits the customer's needs but the sales team thinks is "too expensive" and the client would never approve, or server infrastructure and systems that my coworkers think are "unsupportable" because they don't understand them and are not willing to learn.  I'm constantly pushing for better equipment, knowing that a company's trajectory can turn on a dime; explosive growth could obsolete the capabilities of a system within a few years, well within the warranty period.  Planning for that potential growth and over-shooting what actually happens is rarely a bad thing.  Additionally, investing $1,000, or even $10,000, per switch in a data-critical environment is not even expensive for most businesses; it's just more than they would like, so they often opt for inexpensive solutions that barely meet their needs today, or are worse than their current equipment.  I've witnessed several high-end systems removed and replaced with newer, but worse, solutions, all because sales and company leadership don't understand the impact they're having on the infrastructure that underpins their business.  It's incredibly frustrating to see a company hamstrung by the "upgrade" they paid far too little for, thinking they were investing in their future I.T. capability and stability.  All of this because some salesperson took it upon themselves to decide that what I spec out for the client is too expensive.

Here's some advice for sales and my fellow technical resources alike: ask the company for their "burn rate".  To explain, a burn rate is the amount of money that will be lost, per hour, if the IT infrastructure were completely lost; aka, network down.  This figure should include the cost of all labor now standing around unable to work, since the entirety of their work is done on computer-backed systems (phones, email, wifi, internet, etc.), plus any electrical, property, or other intermediate costs while down (paying rent on a building which isn't able to do anything, paying to keep the lights on, etc.), plus the estimated value of sales lost during the outage, since sales can't do their work either.  They don't have to tell you this number; they just need to know it themselves.  Fair warning: most don't know what this number is.  I have yet to find a company that goes back to the drawing board and figures it out during a sales discussion on the solution to buy; but it's an important thing for a business to know.

If the solution provided enhances their up-time/availability and reduces outage time from ~1% of the year to 0.1%, that reduces their overall lost productivity from I.T. outages from roughly 3.65 days per year to roughly 8.76 hours.  That's a lot of extra productivity; even losing two days, at 8 hours a day, to an outage is 16 hours of burn rate avoided.
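A quick sketch of that arithmetic. The availability figures (99% vs 99.9%) come from the discussion above; the hourly dollar amounts are invented purely for illustration:

```python
# Back-of-the-envelope burn-rate math. Availability figures (99% vs
# 99.9%) are from the text above; the hourly costs are assumptions.

HOURS_PER_YEAR = 365 * 24  # 8760

def downtime_hours(availability: float) -> float:
    """Expected hours of outage per year at a given availability."""
    return (1 - availability) * HOURS_PER_YEAR

def burn_rate(idle_labor: float, overhead: float, lost_sales: float) -> float:
    """Dollars lost per hour of total IT outage."""
    return idle_labor + overhead + lost_sales

rate = burn_rate(idle_labor=2000, overhead=300, lost_sales=1500)  # $/h, assumed

before = downtime_hours(0.99)   # ~87.6 h/year (~3.65 days)
after = downtime_hours(0.999)   # ~8.76 h/year

print(f"downtime at 99%:   {before:.1f} h/yr")
print(f"downtime at 99.9%: {after:.1f} h/yr")
print(f"burn avoided: ${rate * (before - after):,.0f}/yr")
```

Plug in a real burn rate and the "expensive" switch often pays for itself in a single avoided outage.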

I understand that estimating the down-time for a given solution is a bit of a minefield, but there's no doubt in my mind that higher-rated, higher-grade, more costly equipment from a reputable manufacturer is simply more reliable than a cut-rate budget solution from a sketchy company (or a sketchy division of one), or a hodgepodge of various interdependent systems hacked together.  Take the difference between buying an HP server with NBD support versus a custom-built "server" from off-the-shelf components: the HP server is known, predictable, and supported by a third party with the availability and components to repair any malfunction in a day or less, while the custom-built server is unknown to anybody who didn't build it, and nobody can fix it immediately.  To me, that's the difference between a surgeon, armed with the knowledge of full X-ray, MRI, and other scans, performing a precise laparoscopic surgery, and just cutting the patient open in an exploratory fashion, rooting around in their innards to find the problem.  The latter will do more damage, take longer under the knife, and have a longer recovery time than the former, by a large margin.

Doctors don't go in blind; why should we?  Buy the HP server (or Dell, or Lenovo, or whatever brand) with support/warranty, and you basically have an expert on-call for when surgery is required, able to read the system logs and diagnostics to find the issue (troubleshooting; or, in our medical example, performing the required X-rays, MRIs, and ultrasounds), then go in and precisely and quickly fix the problem without even touching anything that isn't broken.

This is just one of many, MANY examples.

To conclude this very long, very rant-y post: the perception that I.T. is a cost center is antiquated, and any business owner who still sees things this way is short-sighted at best.  I.T. is the lifeblood of your business and should be treated as a critical resource in the grand scheme of its operation.  Anything less is irrational, short-sighted, and downright dangerous to the continuity of your company, your workers, their livelihoods, and yours alike, as well as a disservice to your clients and partners.  Pay your I.T. workers well; they keep the proverbial lights on when it comes to operations.  Listen to them when they raise concerns; pay attention and take it seriously.  We don't really gain or lose anything when the company operates smoothly, and we will find work even if your business goes under.  Bluntly, most of us don't care if you want to run the business into the I.T. stone age with bad gear.  At the end of the day, it's your company; we typically don't have any stake in whether or not you succeed.  Our only driving force for even suggesting improvement is that job hunting sucks.  Make our lives easier, and we'll reward you with years of dedicated service.

Do the right thing for you, your company, your workers, your clients and your I.T. folks.

Saturday, July 9, 2022

off the cuff: Rogers outage 2022-07-08

 The outage that rocked Canada on 2022-07-08 and removed Rogers from the internet for the majority of a day isn't over.  There will surely be more information published, a root-cause analysis, and further inquiry as to how this happened; I am positive of that.  However, I don't want to speculate too much about why it happened; I want to talk about emergency preparedness.


For those coming into this blind (maybe you're not Canadian), one of the effects of the outage was that the national debit system, Interac, went completely offline.  It didn't matter if you had a different service provider; if you tried to use debit, things simply would not work.  Another effect was the 911 outage for all Rogers-connected devices, which I'd also like to speak about.


On Interac and the state of the network: this should not have been allowed to happen. Period.  This is a massive failure by Interac in providing services.  From what I've heard (some of this is third-party information, so don't shoot me), they did have multiple providers; however, their primary provider was Rogers, and their backup provider relied on Rogers to work.  It may be that Rogers was providing transit for the backup, or last-mile connectivity for it; I don't know.  But issues like this should be obvious to the technical staff at Interac.  This service has become critical financial infrastructure, connecting ATMs together (even between banks), and companies to banks for payment processing.

My opinion is that Interac should look at their providers (banks) and clients (businesses), see what the majority of them are using for their ISP, and directly link into those networks through internet exchanges.  This way, as long as the bank/company has an internet connection at their premises into the provider's network, they should also have a connection into Interac's network.  I am certain that a non-trivial number of their providers are using Bell or Telus, probably both; so Interac having no presence in either of those networks is a critical failing of their networking, and (again, in my opinion) needs to be addressed.  Failures like this should not be allowed to happen.

On the other side, I am fairly certain most of their providers (the banks) are already redundantly linked into multiple ISP networks.  However, on the client end, the businesses they serve need to take a long, hard look at their internet connectivity and decide if they should have only one ISP, or a failover ISP just in case.

To anyone looking to do failover: do some research.  Keep in mind that anything delivered by coax (TV cables) is probably supported at the last mile by the same company.  So getting a "cable" internet package from a third party, such as Teksavvy, and a backup connection from the local cable provider will result in the same vulnerability.  Generally, cable and DSL are provided by different companies, so if you're on cable, look at DSL for the backup.  Teksavvy is a good example here, since they can typically provide cable or DSL connections to the same location.  There are plenty of these wholesale connection resellers around: Teksavvy, Start.ca, VMedia, Distributel... and that's just for Ontario.  Look for local ISPs and get a DSL/cable package (whichever is the opposite of what you currently have).  Combine that with a fairly inexpensive dual-WAN router, and you're all set.
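For the curious, the core of what a dual-WAN router does can be modeled in a few lines. This is a toy sketch of the failover decision only (real routers drive it with periodic ping/DNS health probes and route changes; here reachability is just passed in as booleans, and the link names are made up):

```python
# Toy model of dual-WAN failover: prefer the primary link, fall back to
# the secondary when the primary's health check fails. Link names are
# illustrative only.

from dataclasses import dataclass
from typing import Optional

@dataclass
class WanLink:
    name: str
    healthy: bool  # in a real router: result of a ping/DNS health probe

def active_link(primary: WanLink, secondary: WanLink) -> Optional[WanLink]:
    """Return the link traffic should use, or None if both are down."""
    if primary.healthy:
        return primary
    if secondary.healthy:
        return secondary
    return None

cable = WanLink("cable (Rogers)", healthy=False)  # simulated outage
dsl = WanLink("DSL (wholesaler)", healthy=True)

link = active_link(cable, dsl)
print(link.name if link else "no connectivity")  # prints "DSL (wholesaler)"
```

The whole point of diversifying the last mile is that the two `healthy` flags stop being correlated: a coax cut or a provider-wide outage no longer takes both down at once.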

I'd like to point out that the advice above isn't new; it's just more relevant right now.  Anyone running a business where debit/credit transactions comprise more than 40-50% of revenue should be thinking about redundancy in that system and how it connects to the internet, since if it goes down, all that revenue disappears.  Sure, some people will go to the trouble of going to an ATM, getting cash, and returning for their desired product, but it's also quite likely that the customer will simply find a competitor's store where debit/credit is not affected and make their purchase there.  If the cost of a second internet connection and a fairly basic failover device is less than the profit lost from a day of not having debit, it pays for itself the first time you have a failure; most internet outages, especially in the last mile, take upwards of a day, often more, to fix.

Banks already know this and will very likely be doing this already.

Short story about Interac being down: I was at work yesterday, and for those that don't know me personally, I do not carry cash.  I didn't have a lunch; I planned on buying something during my allotted lunch time by the usual methods at a nearby shopping area.  Typically, that means debit.  So with no food in hand, I was left with very few choices.  Knowing that Interac was down, I asked the shop if they took credit (which should not have been affected by the outage), and they did not; it's likely that the restaurant's ISP was Rogers.  In any case, there's a branch of my bank nearby, so I went specifically to their ATM to get some cash, because I knew that almost everything else was going to rely on the Interac network to function.  While waiting in line for the ATM, I noted several people in line that did not have a debit card for the bank in question; they had some competing bank.  Every single person who tried to withdraw money using a competing bank's card in my bank's ATM was met with failure.  A few tried several times, holding up the line.  The irony is that I work in an area where there are no fewer than three banks within a 5-minute walk.  Several of those people who failed at taking out money were less than 150m from a bank where they would have been met with success.

The moral of this story is that when things fail, you have to start treating entities like banks as islands unto themselves.  If it's not your bank, it won't work.  Period.

I was able to get cash and feed myself, so that was helpful, for me at least; I'm not sure what happened to all those cross-bank ATM users.  I don't like to carry cash, and the change I got when I finished my transaction now lives in my house.  I have a collection of money here that I never use at this point.  If I had known about the issue before leaving for work, I surely would have simply taken some money with me.


Regardless, the people I feel for most are residential users.  Apart from paying far too much for redundant internet, what can be done here?  Not much.  Personally, I have a DSL line from a Bell wholesaler and a cellular line from Rogers.  Even that small a diversification can keep you connected through a crisis like this: if Rogers goes down, I have WiFi at my house; if Bell has a similar failure, I can at least connect with people via my phone.  This is a difference most people had not even considered until this outage.  I've seen several people moving to switch providers because of this issue; I would recommend a moment of pause before you rashly transfer all of your services to a different provider.  This type of outage isn't impossible for any provider, and "putting all your eggs in one basket" will leave you vulnerable to the same problems.  Diversify.  At the very least, go out and get a pay-as-you-go SIM card from a different provider (make sure they're using a different network) for your cellphone, along with a pre-paid voucher for airtime on that provider.  Just put it somewhere it will be available to you all the time; maybe that's your wallet, or your car.  You don't have to activate it until you need it; just have it in case something goes completely sideways.  At least you'll have access to cellular calling, texting and data in the event of an emergency.  This relies on having an unlocked device.  You can verify whether your device is unlocked by simply inserting the SIM card from the other provider; you may get an error that it's unable to be used in your device.  If so, talk to your cellular provider and have them unlock your device, something the CRTC mandated they must do for their devices.  It's a bit of a pain in the butt, but once you have your device unlocked, you're all set.

One thing I've done is sign up with a third party for my main telephone number; in my case, it's VoIP.ms.  You don't really need to know a lot about VoIP, or how to use it, to make this work for you.  It does cost a bit more, since you're effectively getting billed twice for the "same" line, but it protects you in other ways.  What I did was port my cellular number over to VoIP.ms and set it up in their system to forward all calls to my cellphone's number, which is now a newly assigned number, since my old number had moved to VoIP.ms.  I still get all my calls.  The downside is that when you text someone, it will show up as your new number; but calls will complete to whatever number you forward to via VoIP.ms's website.  Coupling this with the backup pay-as-you-go SIM card, you can effectively redirect all of your calls to the pay-as-you-go number (whatever they assign you) in an instant via VoIP.ms's website.  You won't miss any calls, and everything will simply work.  It's something that can be completed in minutes and ensures that anyone needing to call you can get through.

That solution isn't right for everyone, and VoIP.ms does charge monthly for service, so it's an additional spend that people may not want to undertake.  You also add a point of failure: if VoIP.ms is down in any way for any length of time, you may miss calls.  So it's a risk either way.

Unfortunately, in my situation, I wasn't prepared; I did not have a pay-as-you-go SIM waiting in the wings, though I considered purchasing one midway through the outage (though, how would I pay for it? heh).


For preparedness, I will be pursuing my ham radio license later this year, since simplex radio doesn't require any fancy cell towers to operate.  There's also AREDN, a wireless data network run by ham radio operators, which can function as emergency communication.  There's a lot to know in this arena and I won't clog up this post with it, but those are viable options for continuing to communicate while things are non-functional, like what happened yesterday.  Luckily, I was in places I knew, around people I knew; I didn't need Google Maps to show me where to go, and I didn't have any sort of emergency that required me to have communication.  I'm sure most were in the same boat; we survived the outage without incident (this time) and came out the other side no worse for wear.  My concern is: what happens when that's not the case?

Take care of yourselves, be prepared for things exactly like this.  If anyone wants to partake in the ham radio course with me, and get certified, please reach out, I'd love to have some nearby ham friends I can QRP with.  Cheers.

Sunday, December 26, 2021

Bell Canada vDSL demystified + GPON/FTTH

 I've been using Bell Canada's DSL lines in some shape or form for the past few decades at least.  The first internet connection I set up and managed myself was a Bell Sympatico DSL line.  I learned a lot back then about DSL line filters and PPPoE, and the nuances of getting that set up on my D-Link router with Sympatico's modem.  This was at a time when modems were just that, modems; it was before the 2Wire was widely distributed as the Bell modem of choice.  Bell's more recent Home Hub series (and the 2Wire that preceded it) are all combined modem/routers.

I've never encouraged or endorsed anyone using a modem/router from any provider; they are all terrible.  They are the digital equivalent of the phrase "jack of all trades, master of none", and that rings true with every provider I've encountered so far.  They're not great routers, but the underlying technology can do the modem tasks pretty damn well if you strip away everything else it's trying to do on top of that.  Most notably, I've found that the DHCP servers on these devices are slow and frequently crash, so your lease times out.  They're also not great at DNS: queries are slower than going to the internet directly.  Not only do they point to Bell's global DNS servers, which are frequently slower to respond than other globally accessible DNS, but they add non-trivial delay in and of themselves.  On top of that, they are not great NAT devices; they frequently forget NAT sessions, and their session limit seems to be quite low, so if you throw any number of clients at one (beyond a very minimal 2-3), you'll frequently get oddities where things just stop working or don't work at all.  This requires that you restart your modem/router constantly, which isn't great.

Very quickly: the world of today is built upon the internet.  This foundation should be infallible, reliable and consistent.  The reality is that, even globally for routing, it's a convoluted mess of policies and protocols that enables us to communicate; but the equipment providing that connection in your home should not be under constant question and scrutiny to ensure it's working as intended.  In my opinion, at least for very simple networks, the connectivity provided by the network should not be the thing you're constantly trying to fix.  It's something that's astonishingly easy to get right, yet so many companies do it wrong, for a litany of reasons.  I'll refrain from commenting further, because my opinions on how a network should operate, regardless of scope (home/business/enterprise/provider), are a whole post in and of themselves.

There are a good number of reasons to put Bell's equipment into bridged mode (operating as a modem only) or remove it entirely.  Either can increase the reliability of your network, whether at home or at work.  The only exception is if they're providing you with something better than the Home Hub series; I have seen a few instances where Bell provided Cisco- or Juniper-class (or similar) equipment.  I believe this is reserved for very specific business use-cases, from medium business up through enterprise connectivity.  Setting that aside for the moment, since those solutions are good and work consistently, I want to talk about the vDSL and GPON provided for home-based and small-business use cases, which usually involves a Home Hub.

The amount of information Bell won't tell you is enormous.  Their usual line is to use the provided gear, and that's the end of the discussion; it can be quite frustrating as a technology enthusiast or networker looking for something a bit more robust to run your network.  It seems to me that Bell's intention is that clients in the consumer and SMB space will use their gateway as the default gateway, never ask questions, and just deal with how horrible the device is.

Let me make this perfectly clear: Bell has a strong, reliable and robust network... until you get to the gateway they provide for you.  I've used a lot of Bell's networks for the purpose of connecting to the internet, both as a consumer and as support for businesses trying to navigate Bell's messaging.  In every case where something is wrong, the problem is, 90% of the time, the provided equipment.  If you're having a hard time with Bell, that is very likely the culprit.  Honorable mention to those in rural areas where the copper lines are horrible; but once you get past the modem/router, through the copper lines to the node, it's smooth sailing out to the internet.  Why they condemn their clients to the horrible products they put out, I'll never know.  I feel that knowledgeable clients should be able to buy their own gear for use on Bell's network, and that Bell should supply options that serve those people specifically.  They do not.

To be VERY CLEAR: Bell, please give us devices that are strictly modems.  A non-trivial number of users would benefit from this, and this message, as far as I'm concerned, has been shouted from the rooftops for years.  When it comes to fiber, give us an ONT that works, and let us figure out the rest.

To be fair to Bell, 80% of the clients they service are home users who don't know networking well enough to do what's required to get things working, and that's fair. I don't think Grandma Smith down the street cares that her internet isn't super reliable, as long as she can play her Facebook games most of the time.  But Bell EXCLUSIVELY caters to those with zero networking knowledge or expertise, and that's what I think should change.


Moving on to more important topics: vDSL on the Bell network is fairly simple and straightforward, at least for anyone on their 50/10 "high speed" packages. These connections are handled by VDSL/VDSL2 (ITU-T G.993.1 or G.993.2), usually topping out around Profile 17a, though evidence suggests they may be moving to Profile 30a in the near future. There's remarkably little information about the DSL profiles available if you examine the provided routers (Home Hubs 1, 2, and 3 - the 4 doesn't have a DSL port); however, some information can be gained from looking at wholesale customers like Teksavvy or Start.ca.  There are a ton of other wholesale clients for Bell's services, but I'm going to focus on Teksavvy since it seems to be the most popular in my area.  For vDSL 50mbps service, the unit they offer is the SmartRG 516AC.  This is from a line of SmartRG modems, which includes everything from the RG501 through the RG516AC and beyond; they all have similar or identical DSL chipsets, with varying features (the 501 only has a single ethernet port, for example, while the 516 has full modem/router + WiFi capabilities). Looking at the spec sheets for the 516AC, it supports Annex A, L and M up to Profile 17a.

So breaking it down: VDSL2 using Profile 17a, on Annex A, L or M, should be sufficient.  I've also done my own research and found that PTM is the mode being used, over VLAN 35.

I recently confirmed this by picking up a Cisco EHWIC-VA-DSL-M (Annex M, supporting Profile 17a).  It's entirely possible you could get everything working using the Annex A version of the same card (EHWIC-VA-DSL-A); however, I have not tested this.  I have every suspicion it will work, but I have no evidence.  I'm subscribed to a wholesale line via Start.ca, who have been very good to me.  I installed the EHWIC into a Cisco ISR G2 1921 for use, which comes with its own caveats.

Relating to Cisco vs DSL: you do not need the EHWIC-VA-DSL module's ATM port; you can disable it with the shutdown command.  The ADSL and vDSL modes are discrete interfaces and controllers in IOS, so the ATM features and functions for ADSL are not required at all.  If you're following in my footsteps, you may want to look up the firmware for the card; however, there isn't an easy way to find it.  This module is the same as the built-in module for the 800 series routers, and the firmware is actually listed on the Cisco website under those routers.  One of the firmware options there explicitly says it's compatible with the EHWIC-VA-DSL modules, so select that one.  If you don't have a service contract with Cisco for the unit, you may be out of luck for downloading the firmware from Cisco - my only suggestion here is that if you manage to acquire it by other means, verify it against the MD5/SHA512 hash from the official download to confirm it is correct and has not been tampered with.
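That verification step is simple to do from any machine with Python on it. A minimal sketch (the firmware filename and hash you feed it would come from the Cisco download page; nothing here is specific to the EHWIC image):

```python
import hashlib

def sha512_of(path):
    """Compute the SHA-512 hex digest of a file, reading in chunks
    so large firmware images don't need to fit in memory."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_fw(path, expected_hex):
    """Return True only if the file matches the published digest."""
    return sha512_of(path) == expected_hex.lower()
```

Only install the image if this returns True; a mismatch means a corrupted or tampered download.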

vDSL will automatically try to connect without additional configuration; this is an L1 link, and the defaults will work.  If you wish, you can go into the controller settings (command: (config)# controller vdsl <unit/slot/port> - for me that was 0/1/0, but it could easily be 0/0/0 depending on your specific configuration) and issue ' operating mode vdsl2 ' to skip mode discovery.  Since it will discover the same mode every time, this can save a bit of time when getting connected.  There's some merit to setting the SRA command here too, for Seamless Rate Adaptation, though it's not strictly required.  After that, you should see an Ethernet interface appear under the same unit/slot/port number - in my case Ethernet 0/1/0.  Get into the configuration mode for this interface and perform a ' no shutdown '; that's all that's needed there.  Next, create a subinterface for VLAN 35 - I selected ethernet 0/1/0.35 for the purpose, though the subinterface number could be anything - and set ' encapsulation dot1Q 35 ' to tag VLAN ID 35 on the interface.  This is also where you enable the PPPoE client (' pppoe enable ' and ' pppoe-client dial-pool-number # '), which in turn requires a dialer interface configured with your username and password, as well as several other options that have been covered at length in other posts/blogs/KB articles.
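Pulling those steps together, a minimal IOS sketch might look like this (the 0/1/0 numbering and dial-pool number 1 are from my setup and may differ on yours):

```
controller VDSL 0/1/0
 operating mode vdsl2
!
interface Ethernet0/1/0
 no shutdown
!
interface Ethernet0/1/0.35
 encapsulation dot1Q 35
 pppoe enable
 pppoe-client dial-pool-number 1
```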

One issue I kept running into was that my 1921 refused to connect, closing out the vDSL connection immediately after it was established.  I tracked this to a debug log entry that said it "failed to add pppoe switching subblock". This appears to be a Cisco bug, and I believe what finally fixed it was the inclusion of the keyword "callin" in the ' ppp authentication ' command (the full resulting configuration was ' ppp authentication pap chap callin ', which appears to do the trick).  Once all that is set, you should have a functioning connection.  All the usual nuances of setting up NAT and routing still need to be done before the connection is useful, but it does indeed work.
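For reference, a hedged sketch of the dialer side that worked for me - the username and password are placeholders you'd replace with your wholesale credentials, and the "callin" keyword is the piece that resolved the disconnects:

```
interface Dialer1
 mtu 1492
 ip address negotiated
 encapsulation ppp
 dialer pool 1
 ppp authentication pap chap callin
 ppp pap sent-username USERNAME password 0 YOURPASSWORD
 ppp chap hostname USERNAME
 ppp chap password 0 YOURPASSWORD
```

The 1492-byte MTU accounts for the 8 bytes of PPPoE overhead on a standard 1500-byte Ethernet frame.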


GPON/FTTH/Fibe:  This was an interesting journey down a rabbit hole for me.  Bell uses GPON very similarly to PTM over DSL, on VLAN 35.  With the recent release of the HH4k, they now have units in the field that are also XGS-PON capable.  So, starting from the beginning: they use GPON at 2.488 Gbit/s downstream and at least 1.244 Gbit/s upstream (ITU G.984). This technology uses a form of wavelength division multiplexing (WDM) to mux 1310nm light for upstream traffic and 1490nm for downstream (optionally video at 1550nm).  These are split using what is essentially a prism, so tx and rx are independent, resulting in full-duplex operation.  The addition of XGS-PON is logical, since it can co-exist with GPON.  XGS-PON (ITU G.9807.1), as far as I know, uses the same wavelengths as XG-PON (ITU G.987), but with increased upstream bandwidth (nearing 10 Gbit/s, at 9.953 Gbit/s); hence XGS-PON - X (for 10) G (gigabit) S (symmetrical), PON (Passive Optical Network).  To my understanding this bandwidth is shared, and Bell will only give you a 'cut' of what's available.  It is likely they are planning to roll out, or have rolled out, XGS-PON in high-demand areas, to avoid having to install more GPON line terminals to handle the user load, and more lines/splitters to divide customers up across more ports on the OLT.  At the head end, they can simply splice off the XG-PON wavelengths and install an XGS-PON line terminal to provide the required bandwidth, while continuing to serve slower committed-rate clients with GPON. This is an economical solution and demonstrates Bell's ingenuity when it comes to their client-handling equipment.

There's a catch with GPON: the transceiver needs to be authorized with the OLT.  Bell can authorize or de-authorize whatever they want on their network, which presents a significant challenge to anyone trying to remove, eliminate or otherwise bypass the homehub equipment.  With the early releases of GPON this was fairly trivial, as Bell included a G-010S-A GPON SFP module with the HH3k; it provided the crossover from GPON to ethernet inside their homehub, and you could remove the module, connect it to whatever you wanted, and get service.  That option was eliminated with the Home Hub 4000, which has a built-in GPON and XGS-PON transceiver array that cannot be removed or changed and must be used to connect, since alternatives are not authorized on the OLT.  There are three possible factors for authorization.  First is the module's MAC address, which is a very common filter ISPs use to classify equipment as authorized or not.  Next is the ONT S/N, which is broken into two parts: the MFR ID, which is the first four letters, and the G.984 serial number, which is an eight-character hexadecimal code.  These are printed on the HH4k or the G-010S-A modules and can be readily accessed.  The last possible factor is the SLID, or Subscriber Location Identifier, which is not printed on the unit nor accessible through the homehub's firmware.  Luckily, with a bit of wizardry, I was able to obtain this information from a G-010S-A, and it turned out to be a string of zeros.  It appears Bell isn't using this factor, but they may in the future.  We simply do not know.
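The S/N format is easy to sanity-check before you try programming anything. A small sketch of a validator for that two-part structure (the example serial below is entirely made up, not a real Bell or Nokia value):

```python
import re

def parse_gpon_serial(serial):
    """Split an ONT serial into its two parts: a four-letter
    manufacturer ID followed by an eight-character hex code.
    Raises ValueError if the string doesn't match that shape."""
    m = re.fullmatch(r"([A-Z]{4})([0-9A-Fa-f]{8})", serial)
    if not m:
        raise ValueError("not a valid ONT serial: %r" % serial)
    return {"vendor_id": m.group(1), "serial_hex": m.group(2).upper()}
```

Feeding it a hypothetical value like "ALCLF0123ABC" would split out "ALCL" as the MFR ID and "F0123ABC" as the hex serial; anything of the wrong length or with non-hex characters in the tail is rejected.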

So if you are pursuing a bypass of the HH3k or HH4k for GPON (the HH4k will tell you if it's in GPON mode on the WAN mode page), you can replace the HH4k with a Nokia G-010S-A module (model 3FE46541AA, or the equivalent from Alcatel/Huawei), which can have its MAC, S/N, and SLID programmed to match, and use that instead.  All PPPoE needs to be done over VLAN 35, and everything should just work from there.

There is a git repository on the subject, so you shouldn't have any trouble getting access to the module for reprogramming, or finding the reprogramming commands.

This, of course, is informational; I offer no guarantee any of it will be valid tomorrow, or work for anyone else. If you choose to pursue replacing the Bell-branded equipment with your own, do so at your own risk.  I am posting all of this because I have been consistently frustrated by Bell's lack of transparency, and rather than have anyone else go through the process of figuring it out, I wanted to put it out there for anyone seeking to do the same, so you can learn from my mistakes (of which there are many) and reach your goals faster, with less effort.  I am certain that Bell will not appreciate alternative devices, modules, or connection methods on their network, and I am entirely positive they will refuse to help anyone running an "unsupported configuration", so beware of issues.  It is handy to keep the homehub that came with your subscription in case of any problems: the first thing to do when something breaks is to revert to Bell's equipment and test whether things work with it before calling them to complain.  IMO, they won't even talk to you about it until you do.

But I will say that since I moved off the homehubs and ISP-provided equipment, my internet is quicker (lower latency) and more reliable than ever before.  Bandwidth is still limited, of course, but I can get what I need done that much faster, because I'm not waiting on their systems to figure out what to do next. I have control over the hardware, and I can troubleshoot intelligently before needing to revert to the provider-approved and supplied gear to determine whether my equipment or their network is at fault.  Simply put, now that I've replaced the garbage modem/router they provided, I haven't had to deal with customer support for internet issues in years.  Outages still happen, but I can determine the cause and wait them out before having to call.

Bear in mind that I do this on a professional level, so troubleshooting network connections is part of my DNA.  If that's not you, then maybe consider something a bit more conservative and hang onto that homehub... just put it into bridged mode and call it a day.

Tuesday, April 13, 2021

HPE 1950 CLI - an undocumented COMWARE mess

 Stuck with an HPE 1950 that won't play nice?  Lost connection because of a VLAN change? Need to add a default route because you forgot to add one before removing the DHCP option?  Look no further.

The HPE 1950 is a wonderful SMB switch for L2 and light L3 duties.  They're robust, relatively cheap, and have PoE options - everything you could want in a switch.  So why are the CLI and GUI such a nightmare?  The web GUI seems to be trying too hard to shoehorn really basic functionality into fancy, unintuitive menus.  Once you get used to the options being all over the place, you then face the lack of guidance from HPE about how to actually make use of this switch.

I've had two run-ins with the HPE 1950 that I'd like to write about.  First, there were some shenanigans with HPE support when the "DHCP Snooping" option was enabled and dropped all DHCP traffic, causing a Sev. 1 outage.  Second, I removed a DHCP interface before adding a default route, causing me to lose all access to the web UI - luckily, I had another way in.

First, what everyone is here for:

How do we make the HPE 1950's CLI actually useful?

Simple.  Connect to the unit through telnet (which appears to be enabled by default), SSH (if it's enabled), or a console cable.  Whichever option you have, connect, log in, and you'll be dropped to a relatively useless prompt of <%hostname%>.

You'll notice rather quickly that this prompt isn't good for much.  You can factory reset the unit using "initialize", which is almost never the best solution to the problem.  You can change the default interface IP configuration using the "ipsetup" command, but as far as I've seen, this only affects the default interface; if you have VLAN 1 (the default) disabled, or otherwise secured, that isn't particularly useful.  The "display" command (aliased as "show") only lets you see PoE information, which again isn't especially useful.

However, there's one command here that will get us to the next level: "xtd-cli-mode", or extended CLI mode.  This allows a lot more options, but again, isn't necessarily enough by itself.  The bonus is that it's password protected.  Luckily, HPE's own forums have the password published for all to use: "foes-bent-pile-atom-ship".  Why this password? Ask HPE.  I really don't know.

Next, we have an "extended" prompt. "show" is useful here, and there are a lot more commands ("?" will bring up what's available); some of them may solve your problem, unless you need to add a route or something similar.

To add a route, change an interface, or similar, you'll want to issue the "system-view" command.  This puts you in the real administrator's seat.  The question mark will be your friend, since this mode is basically undocumented.

Some helpful stuff:

vlan # - add a vlan to the switch config. (brings up a sub-menu to add description and other things)

interface (gig/xge) 1/0/# - enter switchport configuration mode

interface vlan # - enter vlan interface configuration mode

Under interface config mode, you can also shut and unshut ports ("shutdown" / "undo shutdown" in Comware parlance), and issue VLAN commands.  "port" is the keyword (similar to "switchport" on Cisco): "port link-type (access/hybrid/trunk)" is the equivalent of switchport mode access/trunk, and "port (access/hybrid/trunk) vlan # (untagged/tagged)" adds VLANs to an interface.
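As a worked example, here's how that might look when tagging a hypothetical management VLAN 101 onto port 2 (from system-view; the VLAN and port numbers are purely illustrative):

```
vlan 101
 description Management
 quit
interface GigabitEthernet 1/0/2
 port link-type hybrid
 port hybrid vlan 101 tagged
 undo shutdown
 quit
```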

To Summarize:

Connect by available method
enter "xtd-cli-mode"
enter password "foes-bent-pile-atom-ship"
enter "system-view"

Afterwards, use "?" the same way you would on any other router/switch CLI, and you should be able to figure the rest out.
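Put together, a recovery session might look something like this (prompt text is approximate, and the route is just an illustrative example):

```
<HPE1950> xtd-cli-mode
Password: foes-bent-pile-atom-ship
<HPE1950> system-view
[HPE1950] ip route-static 0.0.0.0 0 192.168.1.1
[HPE1950] quit
```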


Story time:

Several years ago, I was implementing HPE 1950s for a client for the first time, checking out all the features.  After deployment, I continued looking into what we could or should implement for security, performance, etc.  They were using the switches entirely in Layer 2 mode; the only L3 interface was for management.  I came across the DHCP snooping option, and enabled it to see what it would do.  I was in for a ride.  I thought: "snooping"? How bad could it be?  Well, apparently HPE's idea of snooping is also DHCP enforcement, where you have to authorize DHCP servers.  At the time, HPE had a bug in the 1950 where this option, once enabled, had no GUI button to turn it off.  That has since been fixed, but at the time I was stuck, so I quickly assigned the LAN DHCP server's IP as an authorized DHCP server and thought that would 'fix it', at least to the point where DHCP would function.  Wrong again.  To this day, I don't know whether the feature was just completely broken, or whether I had done something horribly wrong. DHCP was down for everyone on the LAN, and this network had just gone live.  I tried everything to fix it, even getting into the useless CLI modes I've listed above, with no success; at the time I didn't have the extended CLI mode password, so I was at an impasse.  I called HPE and opened a support ticket - the network was down, so I flagged it at the highest severity - and began working with a technician, while a coworker zipped around the office assigning static IPs to all the workstations they could, to get at least some people working until we fixed the problem.

After several hours of working with HPE we were nowhere, with no resolution in sight - but being keen-eyed helped me here.  I watched the HPE tech get into extended CLI mode, and while I didn't catch the password, I saw him use the "sys" command (aka "system-view").  When he disconnected to arrange a callback from a higher-level team (who calls back on a Sev. 1 issue?), I re-entered system-view, issued the command to disable DHCP snooping ("undo dhcp snooping enable"), and fixed my own issue.  Years later I found the extended CLI mode password, so I could now do the whole thing myself if needed; but to this day, I've never been curious enough about DHCP snooping, and whether it's been fixed, to test it out and actually enable it again.

Second story:

I was prepping a small set of 1950s in Layer 3 static-routed mode for a client.  This is a growing network, but they hadn't yet budgeted for a truly L3/routed setup, much to my dismay, so I proposed an approximation of one that should be relatively simple to adapt to full L3 switching, and to expand, when the time comes.  I drew up an IP subnetting plan, broke down subnets per building, and made each 1950 a gateway/router for its network.  I finally got their switches into my lab for prep: about 6 new switches, only two of which needed routing enabled (the third building had an L3-capable switch already, which was adjusted after the fact).

While prepping the switches, I set up a management VLAN; VLAN 101 was used for the purpose.  The problem I ran into is that the native/default interface (VLAN 1), being DHCP, obtained a default route from the DHCP server I had set up beforehand.  After setting up VLAN 101, I removed the IP from the VLAN 1 interface, and the unit dropped off the face of the earth.  It clicked relatively quickly that I had forgotten to add a default route through VLAN 101's gateway, to route back to my workstation on VLAN 1.  No problem, I'll just switch my workstation to VLAN 101 - except VLAN 101 didn't have DHCP, and I wasn't in physical proximity to my lab.  So I went to my lab router, added DHCP for VLAN 101, changed my VLAN to 101, and my lab workstation dropped off the face of the earth.  Facepalm. I hadn't added VLAN 101 to the port I was connected to.  What now?  I jumped onto the lab router (a Cisco ISR) and started poking around.  I found that I could telnet into the problematic switches, since the lab router had IPs on all relevant VLANs, and local connectivity works without a default gateway being set.  I got into the switch that was missing its default route and, after entering "system-view" mode, was able to issue "ip route-static 0.0.0.0 0 <gatewayIP>" with some extra parameters for priority and comments.
So now that switch was fixed.  I backed out and connected to the switch my workstation was attached to, checked for ports that were up/up, and found my system on gigabit port 2.  Once in the interface config, I issued 'port link-type hybrid' and 'port hybrid vlan 101 tagged', and poof - my workstation popped back up.

I find it ironic that I fixed the IP routing issue before I got my workstation back online.  That happened because I was, once again, trying to remember how to get into what Cisco would call "configure terminal" mode.


I'm not perfect, but I'm pretty proud of my ingenuity on figuring all this out about a platform where the CLI is basically not documented.  I don't want this information to go to waste, and I hope it helps someone else.  Be well.

Tuesday, October 17, 2017

VMware VCSA 6.5U1a accept EULA

Strange error today, I wanted to share because I didn't find a solution on google immediately.

I came across the solution by cleverness alone.

Situation: VCSA updated to 6.5U1, and 6.5U1a is available (build 6671409); trying either the repository or the CD-ROM method of updating cannot be completed - there is no 'accept' button available when trying to install.

For reference and searching: accept grey, unable to accept EULA, cannot accept EULA, cannot update, unable to update VCSA....

The problem: in Chrome there is no check box to accept the EULA, and the button to proceed is grey, so you have two options: read the EULA, or cancel.  Neither results in an install.

The solution: use Firefox.

I'm a huge fan of Chrome; but sometimes, Firefox just works.  This is one of those times.

Good luck out there in the tubes... until next time.