Sunday, December 26, 2021

Bell Canada vDSL demystified + GPON/FTTH

I've been using Bell Canada's DSL lines in some shape or form for the past few decades at least. The first internet connection I set up and managed myself was a Bell Sympatico DSL line. I learned a lot back then about DSL line filters, PPPoE, and the nuances of getting it all working on my D-Link router behind Sympatico's modem. This was at a time when modems were just that: modems. Bell's more recent Home Hub series (and the 2Wire that preceded it) are all combined modem/routers; my first setup predates the 2Wire becoming Bell's modem of choice.

I've never encouraged or endorsed anyone using a modem/router from any provider; they are all terrible. They are the digital equivalent of "jack of all trades, master of none", and that rings true for every provider I've encountered so far. They're not great routers, but the underlying hardware can do the modem tasks pretty damn well if you strip away everything else it's trying to do on top of that. Most notably, I've found that the DHCP servers on these devices are slow and frequently crash, so your lease times out. They also aren't great at DNS: queries are slower than going to the internet directly. Not only do they point to Bell's own DNS servers, which frequently respond more slowly than other publicly accessible resolvers, but they add non-trivial delay in and of themselves. On top of that, they are poor NAT devices. They frequently forget NAT sessions and their session limit seems to be quite low, so if you throw more than a very minimal number of clients (2-3) at one, you'll frequently see oddities where things just stop working or don't work at all. The fix is to restart the modem/router constantly, which isn't great.

Very quickly: the world of today is built upon the internet. This foundation should be infallible, reliable and consistent. The reality is that, even at the level of global routing, it's a convoluted mess of policies and protocols that enable us to communicate; but the equipment providing that connection in your home should not be under constant question and scrutiny to ensure it's working as intended. In my opinion, at least for very simple networks, connectivity should not be the thing you're constantly trying to fix. It's astonishingly easy to get right, yet so many companies get it wrong, for a litany of reasons. I'll refrain from commenting further, because my opinions on how a network should operate, regardless of scope (e.g. home, business, enterprise, or provider), are a whole post in and of themselves.

There are a good number of reasons to put Bell's equipment into bridged mode (operating as a modem only) or to remove it entirely. Both can increase the reliability of your network, whether at home or at work. The only exception is if Bell is providing you with something better than the Home Hub series. I have seen a few instances where Bell provided Cisco or Juniper (or similar) class equipment; I believe this is reserved for very specific business use cases, from medium business up through enterprise connectivity. Setting that aside, since those solutions are good and work consistently, I want to talk about the vDSL and GPON service provided for home-based and small-business use cases, which usually involves a Home Hub.

There is an enormous amount of information Bell won't tell you. Their usual line is to use the provided gear and that's the end of the discussion, which can be quite frustrating for a technology enthusiast or networker looking for something more robust to run their network. It seems to me that Bell's intention is for clients in the consumer and SMB space to use the Bell gateway as their default gateway, never ask questions, and just put up with how horrible the device is.

Let me make this perfectly clear: Bell has a strong, reliable and robust network... until you get to the gateway they provide for you. I've used a lot of Bell's networks to connect to the internet, both as a consumer and while supporting businesses trying to navigate Bell's messaging. When something is wrong, 90% of the time the problem is the provided equipment. If you're having a hard time with Bell, that is very likely the culprit. Honorable mention to those in rural areas where the copper lines are horrible; but once you get past the modem/router, through the copper to the node, it's smooth sailing out to the internet. Why they condemn their clients to the horrible products they put out, I'll never know. I feel that knowledgeable clients should be able to buy their own gear for use on Bell's network, and that Bell should supply options specifically for those people that serve their needs. They do not.

To be VERY CLEAR: Bell, please give us devices that are strictly modems. A non-trivial number of users would benefit from this, and as far as I'm concerned this message has been shouted from the rooftops for years. When it comes to fiber, give us an ONT that works, and let us figure out the rest.

To be fair to Bell, 80% of the clients they service are home users who don't know networking well enough to do what's required to get things working, and that's fair. I don't think Grandma Smith down the street cares that her internet isn't super reliable, as long as she can play her Facebook games most of the time. But Bell EXCLUSIVELY caters to those with zero networking knowledge or expertise, and that's what I think should change.


Moving on to more important topics: vDSL on the Bell network is fairly simple and straightforward, at least for anyone whose line can handle the 50/10 "high speed" packages. These connections are VDSL (ITU-T G.993.1 or G.993.2, the latter being VDSL2), usually topping out at Profile 17a, though evidence suggests they may be moving to Profile 30a in the near future. There's remarkably little information about the DSL profiles in use if you examine the provided routers (Home Hub 1, 2 and 3; the 4 doesn't have a DSL port). However, some information can be gained by looking at wholesale providers like TekSavvy or Start.ca. There are plenty of other wholesale clients for Bell's services, but I'm going to focus on TekSavvy since it seems to be the most popular in my area. For 50 Mbps vDSL service, the unit they offer is the SmartRG 516AC. It's part of a line of SmartRG modems, from the RG501 through the RG516AC and beyond, that all share similar or identical DSL chipsets with varying features (the 501 has only a single Ethernet port, for example, while the 516 is a full modem/router with Wi-Fi). According to its spec sheet, the 516AC supports Annex A, L and M, up to Profile 17a.

So, breaking it down: vDSL2 with Profile 17a, on Annex A, L or M, should be sufficient (strictly speaking, Annex L and M are ADSL2/ADSL2+ annexes; the VDSL2 band plan for North America is Annex A). From my own research, PTM is the transfer mode in use, with PPPoE carried on VLAN 35.
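If you're shopping for third-party gear, those requirements can be sketched as a small checklist. This is only an illustration: the ModemSpec fields and the example values are hypothetical, filled in by hand from whatever spec sheet you're reading, and the profile/mode/VLAN requirements reflect what I observed on my own line rather than anything Bell publishes.

```python
from dataclasses import dataclass, field

# What I observed on my own Bell vDSL line (not an official Bell spec).
REQUIRED_PROFILE = "17a"
REQUIRED_MODE = "PTM"
PPPOE_VLAN = 35


@dataclass
class ModemSpec:
    """Hypothetical summary of a modem's datasheet, filled in by hand."""
    name: str
    vdsl2_profiles: set[str] = field(default_factory=set)
    transfer_modes: set[str] = field(default_factory=set)
    supports_vlan_tagging: bool = False


def missing_features(spec: ModemSpec) -> list[str]:
    """Return the requirements this modem does not meet; an empty list means it should fit."""
    problems = []
    if REQUIRED_PROFILE not in spec.vdsl2_profiles:
        problems.append(f"VDSL2 profile {REQUIRED_PROFILE}")
    if REQUIRED_MODE not in spec.transfer_modes:
        problems.append(f"{REQUIRED_MODE} mode")
    if not spec.supports_vlan_tagging:
        problems.append(f"802.1Q tagging (VLAN {PPPOE_VLAN})")
    return problems


# Usage with a made-up datasheet entry:
example = ModemSpec("example-modem", {"8b", "12a", "17a"}, {"ATM", "PTM"}, True)
print(missing_features(example))
```

An ADSL-only device would come back listing all three gaps, which is a quick way to rule it out before buying.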

I confirmed this after picking up a Cisco EHWIC-VA-DSL-M (Annex M, supporting Profile 17a). It's entirely possible you could get everything working with the Annex A version of the same card (EHWIC-VA-DSL-A); I strongly suspect it will work, but I have not tested it. I'm subscribed to a wholesale line via Start.ca, who have been very good to me. I installed the EHWIC into a Cisco ISR G2 1921, which comes with its own caveats.

Regarding Cisco and DSL: you do not need the EHWIC-VA-DSL module's ATM interface, and you can disable it with the shutdown command. The ADSL and vDSL modes are discrete interfaces and controllers in IOS, so the ATM features used for ADSL are not required at all. If you're following in my footsteps, you may want to update the card's firmware, but it isn't easy to find. This module is the same as the built-in DSL module on the 800 series routers, and the firmware is listed on the Cisco website under those routers; one of the firmware options says it's compatible with the EHWIC-VA-DSL modules, so select that one. If you don't have a Cisco service contract for the unit, you may be out of luck downloading it from Cisco. My only suggestion is that if you acquire it by other means, check it against the MD5/SHA512 hash published on the official download page to confirm it is correct and has not been tampered with.
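Checking a downloaded image against the published hash only takes a few lines. The sketch below uses Python's standard hashlib; the file name in the usage note is a placeholder, and the expected digest should come from the official Cisco download page, not from wherever you found the file.

```python
import hashlib
import hmac
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha512", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks (1 MiB by default) so large images never have to fit in memory."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published_hash(path: Path, published: str, algorithm: str = "sha512") -> bool:
    """True if the file's digest equals the hex digest copied from the download page."""
    expected = published.strip().lower()  # tolerate copy/paste whitespace and upper case
    return hmac.compare_digest(file_digest(path, algorithm), expected)
```

Usage would look like matches_published_hash(Path("firmware.bin"), "<SHA512 from the download page>"), passing algorithm="md5" if only an MD5 is published.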

vDSL will try to connect automatically without additional configuration; this is a Layer 1 link and the defaults work. If you wish, you can enter the controller configuration with '(config)# controller vdsl <slot/subslot/port>' (for me this was 0/1/0, but it could easily be 0/0/0 depending on your hardware) and set 'operating mode vdsl2' so it skips mode auto-detection. Since it will discover the same mode every time, this can save a bit of time when connecting. There's some merit to enabling SRA (Seamless Rate Adaptation) here too, though it's not strictly required. After that, you'll notice an Ethernet interface with the same slot/subslot/port number, in my case Ethernet0/1/0. Enter configuration mode for this interface and issue 'no shutdown'; that's all that's needed there. Next, create a subinterface for VLAN 35. I used Ethernet0/1/0.35, though the subinterface number could be anything, and set 'encapsulation dot1Q 35' to tag it with VLAN ID 35. This is the interface your dialer dials from ('pppoe enable' and 'pppoe-client dial-pool-number #'), which requires a dialer interface configured with your username and password, as well as several other options that have been covered at length in other posts, blogs and KB articles.
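To make those steps concrete, here is a small Python helper that renders them as IOS configuration text you could paste into config mode. It's a sketch under my own assumptions: the card sits at 0/1/0 as in my 1921, the ATM side is shut because only vDSL2/PTM is used, 'sra' is optional, and dial pool 1 is an arbitrary choice. Check each line against your own IOS version before applying it.

```python
def vdsl_wan_config(slot: str = "0/1/0", vlan: int = 35, dial_pool: int = 1) -> str:
    """Render the physical/VLAN side of a Bell vDSL PPPoE setup as IOS config text."""
    lines = [
        f"controller VDSL {slot}",
        " operating mode vdsl2",      # skip mode auto-detection
        " sra",                       # optional: Seamless Rate Adaptation
        "!",
        f"interface ATM{slot}",
        " shutdown",                  # ATM is only needed for ADSL
        "!",
        f"interface Ethernet{slot}",
        " no ip address",
        " no shutdown",               # PTM traffic shows up on this interface
        "!",
        f"interface Ethernet{slot}.{vlan}",
        f" encapsulation dot1Q {vlan}",  # Bell carries PPPoE on VLAN 35
        " pppoe enable group global",
        f" pppoe-client dial-pool-number {dial_pool}",
        "!",
    ]
    return "\n".join(lines)


print(vdsl_wan_config())
```

IOS typically shows a plain 'pppoe enable' as 'pppoe enable group global' in the running config, which is why the longer form is used here.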

One issue I kept running into was that my 1921 refused to connect, tearing down the connection immediately after it was established. I tracked this to a debug log entry that said it "failed to add pppoe switching subblock". This appears to be a Cisco bug, and I believe what finally fixed it was adding the keyword "callin" to the 'ppp authentication' command (the full resulting line was 'ppp authentication pap chap callin', which appears to do the trick). Once all that is set, you should have a functioning connection. The usual NAT and routing setup still needs to be done before the connection is useful, but it does indeed work.
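The dialer side can be sketched the same way. The credentials are placeholders, and the MTU of 1492 with a matching TCP MSS of 1452 are my assumptions (the usual PPPoE values: 8 bytes of PPPoE/PPP header out of a 1500-byte Ethernet payload, then 40 bytes of IPv4 and TCP headers), not values Bell specifies. NAT and the inside interfaces are left out.

```python
def dialer_config(username: str, password: str, dialer: int = 1, dial_pool: int = 1,
                  mtu: int = 1492) -> str:
    """Render a PPPoE dialer interface plus default route as IOS config text (sketch)."""
    lines = [
        f"interface Dialer{dialer}",
        f" mtu {mtu}",                           # assumed: 1500 minus 8 bytes PPPoE + PPP
        f" ip tcp adjust-mss {mtu - 40}",        # minus 20-byte IPv4 + 20-byte TCP headers
        " ip address negotiated",
        " encapsulation ppp",
        f" dialer pool {dial_pool}",             # must match pppoe-client dial-pool-number
        " ppp authentication pap chap callin",   # 'callin' is the workaround described above
        f" ppp chap hostname {username}",
        f" ppp chap password 0 {password}",
        f" ppp pap sent-username {username} password 0 {password}",
        "!",
        f"ip route 0.0.0.0 0.0.0.0 Dialer{dialer}",
    ]
    return "\n".join(lines)


print(dialer_config("your-pppoe-username", "your-pppoe-password"))
```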


GPON/FTTH/Fibe: This was an interesting journey down a rabbit hole for me. Bell uses GPON very similarly to PTM over DSL, with PPPoE on VLAN 35. With the recent release of the HH4k, they now also have XGS-PON units in the field. Starting from the beginning, GPON (ITU-T G.984) runs at 2.488 Gbit/s downstream and at least 1.244 Gbit/s upstream. It uses wavelength-division multiplexing (WDM) on a single fiber: 1310 nm light for upstream traffic and 1490 nm for downstream (optionally with video at 1550 nm). These wavelengths are separated by what is essentially a prism, so TX and RX are independent, resulting in full-duplex operation.

The addition of XGS-PON is logical, since it can coexist with GPON. XGS-PON (ITU-T G.9807.1), as far as I know, uses the same wavelengths as XG-PON (ITU-T G.987), nominally 1577 nm down and 1270 nm up, but with the upstream raised to match the downstream at 9.953 Gbit/s (nearly 10 Gbit/s). Hence XGS-PON: X (for 10), G (gigabit), S (symmetrical), PON (passive optical network). To my understanding this bandwidth is shared, and Bell only gives you a 'cut' of what's available. It is likely they are rolling out (or have rolled out) XGS-PON in high-demand areas to avoid installing more GPON line terminals to handle the user load, along with more lines and splitters to divide customers across more ports on the OLT. At the head end, they can simply split off the XG-PON wavelengths and install an XGS-PON line terminal to provide the required bandwidth, while continuing to serve slower committed-rate clients with GPON. This is an economical solution and demonstrates Bell's ingenuity when it comes to their client-handling equipment.
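To put the 'cut' in perspective, here's a rough worked example of the even share each subscriber gets if everyone on one PON port is busy at once. The 1:32 split ratio is an assumption (Bell doesn't publish theirs), and the figures ignore framing and FEC overhead, so real numbers are lower. In practice subscribers are rarely all busy at the same time, which is how plans above this even share can still work.

```python
# Nominal line rates in Gbit/s: GPON per ITU-T G.984 (2.488/1.244 option),
# XGS-PON per ITU-T G.9807.1 (symmetric).
LINE_RATES_GBPS = {
    "GPON": {"down": 2.48832, "up": 1.24416},
    "XGS-PON": {"down": 9.95328, "up": 9.95328},
}


def even_share_mbps(technology: str, split_ratio: int, direction: str = "down") -> float:
    """Per-subscriber share if every ONT on one PON port is busy at once.

    Ignores framing/FEC overhead, so real throughput will be lower.
    """
    if split_ratio < 1:
        raise ValueError("split_ratio must be at least 1")
    return LINE_RATES_GBPS[technology][direction] * 1000 / split_ratio


# Assumed 1:32 split; Bell doesn't publish its split ratios.
for tech in LINE_RATES_GBPS:
    print(tech, round(even_share_mbps(tech, 32), 2), "Mbit/s down per subscriber")
```

With a 1:32 split, that works out to about 77.76 Mbit/s per subscriber on GPON versus about 311.04 Mbit/s on XGS-PON, which goes some way toward explaining why XGS-PON shows up in high-demand areas.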

There's a catch with GPON: the ONT has to be authorized by the OLT. Bell can authorize or de-authorize whatever they want on their network, which is a significant challenge for anyone trying to remove or bypass the Home Hub. With the early GPON releases this was fairly trivial, because Bell shipped a G-010S-A SFP GPON module with the HH3k to handle the conversion from GPON to Ethernet inside the Home Hub; you could remove that module, plug it into whatever you wanted, and get service. That has been eliminated with the Home Hub 4000, which has a built-in GPON and XGS-PON transceiver array that cannot be removed or changed and must be used to connect, as alternatives are not authorized on the OLT. There are three possible authorization factors. The first is the module's MAC address, which ISPs very commonly filter on to classify equipment as authorized or not. The next is the ONT S/N, which has two parts: the MFR ID (the first four letters) and the G.984 serial number (an eight-character hexadecimal code). These are printed on the HH4k and on the G-010S-A modules and can be readily read off. The last possible factor is the SLID (Subscriber Local Identifier), which is neither printed on the unit nor accessible through the Home Hub's firmware. Luckily, with a bit of wizardry, I was able to read it from a G-010S-A, and it turned out to be a string of zeros. It appears Bell isn't using this factor, though they may in the future; we simply do not know.
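Before programming a replacement module, it helps to sanity-check the values copied off the Home Hub label. This sketch assumes the label serial is the four-letter vendor ID followed by eight hex digits, as described above, and also produces the all-hex form (vendor ID written as its ASCII bytes) that some tools expect; the example serial in the usage note is made up. The MAC normalizer just accepts the common separator styles.

```python
import re

SERIAL_RE = re.compile(r"([A-Za-z]{4})([0-9A-Fa-f]{8})")
MAC_RE = re.compile(r"[0-9A-F]{12}")


def parse_gpon_serial(serial: str) -> dict[str, str]:
    """Split a label-style GPON serial (4-letter vendor ID + 8 hex digits).

    Also returns the all-hex form, where the vendor ID is written as its ASCII bytes.
    """
    m = SERIAL_RE.fullmatch(serial.strip())
    if m is None:
        raise ValueError(f"expected 4 letters + 8 hex digits, got {serial!r}")
    vendor, vendor_serial = m.group(1), m.group(2).upper()
    return {
        "vendor_id": vendor,
        "vendor_serial": vendor_serial,
        "hex": vendor.encode("ascii").hex().upper() + vendor_serial,
    }


def normalize_mac(mac: str) -> str:
    """Accept aa:bb:..., aa-bb-... or aabb.ccdd.eeff and return AA:BB:CC:DD:EE:FF."""
    digits = re.sub(r"[:.\-]", "", mac.strip()).upper()
    if MAC_RE.fullmatch(digits) is None:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))
```

For a made-up serial like "ABCD1a2b3c4d", the vendor ID "ABCD" becomes 41424344 in hex, so the full hex form is 414243441A2B3C4D.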

So if you're pursuing a bypass of the HH3k or HH4k on GPON (the HH4k will tell you on its WAN mode page whether it's in GPON mode), you can replace the Home Hub with a Nokia G-010S-A module, model 3FE46541AA (or the equivalent from Alcatel/Huawei), plugged into an SFP port on your own equipment. This module allows the MAC, S/N and SLID to be reprogrammed, so program it with the values from your Home Hub and use it instead. All PPPoE needs to be done over VLAN 35, and everything should just work from there.

There is a git repository on the subject, so you shouldn't have any trouble getting access to the module for reprogramming, or finding the reprogramming commands.

This, of course, is informational; I offer no guarantee that any of it will be valid tomorrow, or work for anyone else. If you choose to replace the Bell-branded equipment with your own, do so at your own risk. I'm posting all this because I have been consistently frustrated by Bell's lack of transparency, and rather than have anyone else go through the process of figuring it out, I wanted to put it out there for anyone seeking to do the same, so you can learn from my mistakes (of which there are many) and reach your goals faster, with less effort. I am certain Bell will not appreciate customers using alternative devices, modules, or connection methods on their network, and I am entirely positive they will refuse to help anyone with an "unsupported configuration", so beware. It is handy to keep the Home Hub that comes with your subscription in case of issues. The first thing to do when you hit a problem is revert to Bell's equipment and test whether things work with it before calling them to complain. IMO, they won't even talk to you until you do.

But I will say that since I moved away from the Home Hubs and other ISP-provided equipment, my internet is quicker (lower latency) and more reliable than ever before. Bandwidth is still limited, of course, but I can get what I need done that much faster because I'm not waiting on their systems to figure out what to do next. I have control over the hardware, and I can troubleshoot intelligently before reverting to the provider-approved gear to determine whether my equipment or their network is at fault. Simply put, since replacing the garbage modem/router they provided, I haven't had to deal with customer support for internet issues in many years. Outages still happen, but I can determine the cause and wait it out rather than having to call them.

Bear in mind that I do this professionally, so troubleshooting network connections is part of my DNA. If that's not you, consider something a bit more conservative and hang onto that Home Hub... just put it into bridged mode and call it a day.

Tuesday, April 13, 2021

HPE 1950 CLI - an undocumented COMWARE mess

Stuck with an HPE 1950 that won't play nice? Lost connection because of a VLAN change? Need to add a default route because you forgot to add one before removing the DHCP option? Look no further.

The HPE 1950 is a wonderful SMB switch for L2 and light L3 duties. They're robust, relatively cheap, and have PoE options: everything you could want in a switch. So why are the CLI and GUI such a nightmare? The web GUI seems to try too hard to shoehorn really basic functionality into fancy and unintuitive menus. Once you get used to the options being all over the place, you then face HPE's lack of guidance on how to actually make use of this switch.

I've had two run-ins with the HPE 1950 that I'd like to write about. In the first, there were some shenanigans with HPE support after the "DHCP Snooping" option was enabled and dropped all DHCP traffic, causing a Sev 1 outage. In the second, I removed the IP address from a DHCP-configured interface before adding a default route and lost all access to the web UI; luckily, I had another way in.

First, what everyone is here for:

How do we make the HPE 1950's CLI actually useful?

Simple.  Connect to the unit, either through telnet (which appears to be enabled by default) or SSH, if it's enabled, or via a console cable.  Whatever option you have, connect, log in, and you'll be dropped to a relatively useless prompt of <%hostname%>.

You'll notice rather quickly, this prompt isn't really good for anything.  You can factory reset the unit using "initialize", which is almost never the best solution to the problem.  You can change the default interface IP configuration using the "ipsetup" command, but as far as I've seen, this only affects the default interface.  If you have VLAN 1 (default) disabled, or otherwise secured, then this option isn't particularly useful.  The "display" command (aliased as "show"), only allows you to see PoE information, which again, isn't specifically useful.

However, there's one command here that will get us to the 'next level': xtd-cli-mode, or extended CLI mode. This unlocks a lot more options, though again, not all of them are useful. The bonus is that it's password protected. Luckily, HPE's own forums have the password published for all to use: "foes-bent-pile-atom-ship". Why this password? Ask HPE; I really don't know.

Next, we have an "extended" prompt. "display"/"show" is now useful, and there are a lot more commands ("?" will list what's available). Some of them may solve your problem, unless you need to add a route or something similar.

To add a route, change an interface, or similar, issue the "system-view" command. This puts you in the real administrator's seat. The question mark will be your friend, since this mode is basically undocumented.

Some helpful stuff:

vlan # - add a vlan to the switch config. (brings up a sub-menu to add description and other things)

interface (gig/xge) 1/0/# - enter switchport configuration mode

interface vlan # - enter vlan interface configuration mode

Under interface config mode you can also shut and no shut the port, and issue VLAN commands. "port" is the keyword (similar to "switchport" on Cisco): "port link-type (access/hybrid/trunk)" is the equivalent of switchport mode access/trunk, and VLANs are added to an interface with "port access vlan #", "port trunk permit vlan #", or "port hybrid vlan # (tagged/untagged)".
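As a sketch of how those keywords fit together, this hypothetical helper builds the Comware lines for putting one VLAN on one port, to be pasted after "system-view". The interface names are examples, and it only covers the three link types mentioned above.

```python
def comware_port_vlan(interface: str, link_type: str, vlan: int, tagged: bool = True) -> list[str]:
    """Comware lines (entered after system-view) that put one VLAN on one port."""
    cmds = [f"vlan {vlan}", "quit", f"interface {interface}", f"port link-type {link_type}"]
    if link_type == "access":
        cmds.append(f"port access vlan {vlan}")           # access ports carry one untagged VLAN
    elif link_type == "trunk":
        cmds.append(f"port trunk permit vlan {vlan}")     # tagged unless it's the port's PVID
    elif link_type == "hybrid":
        cmds.append(f"port hybrid vlan {vlan} {'tagged' if tagged else 'untagged'}")
    else:
        raise ValueError(f"unsupported link type: {link_type!r}")
    cmds.append("quit")
    return cmds


for line in comware_port_vlan("GigabitEthernet1/0/2", "hybrid", 101):
    print(line)
```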

To Summarize:

Connect by available method
enter "xtd-cli-mode"
enter password "foes-bent-pile-atom-ship"
enter "system-view"

Afterwards, use "?" the same way you would on any other router/switch CLI, and you should be able to figure the rest out.


Story time:

Several years ago, I was implementing HPE 1950s for a client for the first time, and I was checking out all the features. After deployment, I kept looking into what we could or should implement for security, performance, and so on. They were using the switches entirely in Layer 2 mode; the only L3 interface was for management. I came across the DHCP snooping option and enabled it to see what it would do. I was in for a ride. I thought, "snooping"? How bad could it be? Well, apparently HPE's idea of snooping also includes DHCP enforcement, where you have to authorize DHCP servers. At the time, the 1950 had a bug where, once this option was enabled, there was no GUI button to turn it off. This has since been fixed, but I was stuck, so I quickly added the LAN DHCP server's IP as an authorized DHCP server, thinking that would 'fix it' at least to the point where DHCP would function. Wrong again. I don't know, to this day, whether the feature was completely broken or whether I had done something horribly wrong. DHCP was down for everyone on the LAN, and this network had just gone live. I tried everything to fix it, even getting into the useless CLI modes listed above, with no success. At the time I didn't have the extended CLI mode password, so I was at an impasse. I called HPE and opened a support ticket at the highest severity, since the network was down, and began working with a technician while a coworker zipped around the office setting static IPs on every workstation they could, to get at least some people working until we fixed the problem.

After several hours of working with HPE we were nowhere, with no resolution in sight, but being keen-eyed helped me here. I watched the HPE tech get into extended CLI mode, and while I didn't catch the password, I saw him use the "sys" command (short for "system-view"). When he disconnected to have a higher-level team call back (who calls back on a Sev 1 issue?), I re-entered system-view, issued the "no dhcp snooping enable" command, and fixed my own issue. Years later I found the extended CLI mode password and could have done the whole thing myself, but to this day I've never been curious enough about DHCP snooping, or whether it's been fixed, to actually enable it again.

Second story:

I'm prepping a small set of 1950s in Layer 3 static-routed mode for a client. This is a growing network, but they haven't yet budgeted for a truly L3/routed setup, much to my dismay, so I proposed an approximation of one that should be relatively simple to adapt to full L3 switching and expand exponentially when the time comes. I drew up an IP subnetting plan and broke down subnets per building, making each 1950 the gateway/router for its building's network. I finally got their switches into my lab for prep. There were about six new switches, and only two needed routing enabled (the third building already had an L3-capable switch, which was adjusted after the fact). While prepping the switches, I set up a management VLAN, VLAN 101. The problem was that the native/default interface (VLAN 1) was using DHCP and had obtained a default route from the DHCP server I had set up beforehand. After setting up VLAN 101, I removed the IP from the VLAN 1 interface, and the unit dropped off the face of the earth. It clicked relatively quickly that I had forgotten to add a default route through VLAN 101's gateway to route back to my workstation on VLAN 1. No problem, I'll just switch my workstation to VLAN 101... except VLAN 101 doesn't have DHCP, and I'm not physically near my lab. So I go to my lab router, add DHCP to VLAN 101, change my VLAN to 101, and my lab workstation drops off the face of the earth. Facepalm. I hadn't added VLAN 101 to the port I was connected to. What now? I jumped onto the lab router (a Cisco ISR) and started poking around. I found that I could telnet into the problematic switches, since the lab router had IPs on all relevant VLANs, and local connectivity works without a default gateway being set. I got into the switch that didn't have a default route and, after entering "system-view" mode, issued "ip route-static 0.0.0.0 0 <gatewayIP>" with some extra parameters for priority and comments.
So now that switch was fixed. I backed out and connected to the switch my workstation was plugged into, checked for ports that were up/up, and found my system on gigabit port 2. Once in the interface config, I issued 'port link-type hybrid' and 'port hybrid vlan 101 tagged' and, poof, my workstation popped up.
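The root mistake in that story was removing VLAN 1's DHCP address (and with it the learned default route) before a static default route existed. A small pre-flight check like the sketch below would have caught it: it refuses a gateway that isn't on-link for any VLAN interface that will remain, then renders the Comware route. The addresses are hypothetical, and "preference 60" is, to my knowledge, Comware's default for static routes; set your own value and description.

```python
import ipaddress


def default_route_command(gateway: str, remaining_svis: list[str], preference: int = 60,
                          description: str = "default-via-mgmt") -> str:
    """Return a Comware static default route, but only if the gateway stays reachable.

    remaining_svis lists the VLAN interface addresses (CIDR) that will still exist
    after the change, e.g. VLAN 101's address once VLAN 1's IP is removed.
    """
    gw = ipaddress.ip_address(gateway)
    if not any(gw in ipaddress.ip_interface(svi).network for svi in remaining_svis):
        raise ValueError(f"{gateway} is not on-link for any of {remaining_svis}")
    return f"ip route-static 0.0.0.0 0 {gateway} preference {preference} description {description}"


# Hypothetical addressing: VLAN 101 SVI 10.1.101.2/24, gateway 10.1.101.1.
print(default_route_command("10.1.101.1", ["10.1.101.2/24"]))
```

Adding the route first and removing the VLAN 1 address second would have avoided the whole detour through the lab router.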

I find it ironic that I was able to fix the IP routing issue before I got my workstation online again.  This happened because I was trying to remember how to get into what Cisco would call "configure terminal" mode, again.


I'm not perfect, but I'm pretty proud of my ingenuity on figuring all this out about a platform where the CLI is basically not documented.  I don't want this information to go to waste, and I hope it helps someone else.  Be well.