So we're going to blame it all on the router, are we?
Tateru has a blog up today about efforts by Linden Lab's new "Project Shining" team to beat lag, and a discussion of the Belkin router. That links to Modem World with more info.
What she doesn't realize is that I'm the one that precipitated this entire thing around the Belkin.
Why? Because I repeatedly, persistently kept complaining about viewer 2.x and viewer 3.x shutting down my Internet connection.
Lindens never believed me; they all dismissed this as nonsense, as they always do, with that aggravating tekkie arrogance that so stops progress on this rickety over-patched world.
But now they believe me -- except they want to blame it on the router, not their software.
Oz Linden was even heard to make an admission of sorts in an office meeting, according to a comment from Inara Pey on Modem World:
Monty’s tests appear to have been in connection with the wi-fi side of the routers.
The comments are made at 24 minutes into the recording, with Monty broadly commenting on testing, specifically mentioned Linksys WRT and Belkin G series, prompting Oz to comment: “So the reports of Second Life ‘killing’ people’s wi-fi are, in fact, legit, in some sense,” a comment that prompted some TPV developers to agree – so one assumes the wi-fi issues have been widely reported / seen as a cause of problems for some people using SL.
Well, in fact, I have a DSL hook-up, with a router that creates the wi-fi network for three computers in our family. However, I could point out that I've also tried hooking SL directly into the Verizon DSL and not through the router, and I still had problems of visibility, jerking, stopping, crashing, etc. although not Internet cutoff.
What I'm told by Monty Linden from his preliminary findings is that the Belkin is one part of the issue, and another is this JIRA issue, "error 1004".
And that he also seems to be finding that the Belkin has deliberately programmed in a five-minute interruptive interval for when it is overloaded -- with stuff like the heavy load of a virtual world.
That makes sense. Wouldn't the manufacturer of a gadget make sure that it has a kind of circuit-breaker on it to avoid having it overloaded and burned out? Perhaps the world of Belkin router engineers doesn't envision heavy loads of virtual world coming through their product -- small wonder.
Now, you can look at all this and say, "These routers are to blame. Evil! Like evil telecoms! They deliberately throttle and deliberately choke innovative new media net traffic. Evil! They should have net neutrality!" Blah blah.
That's if you are a geek and a technocommunist and see things that way.
I don't.
I see the Belkin router as performing just fine, and as being what is available in Best Buy, which is what there is in my neighbourhood in terms of reasonably-priced computer products. In fact, it might be the only brand, I'd have to check. I've repeatedly bought Belkins. Now, I could go out now and spend another $30 or $50 and buy some other brand but how do I know it won't have the exact same issue? Others may, too, although I'm told that Cisco would be a good option for SL. So I may try that, as I have never adopted the adversarial geeky hatred of Cisco I've found in some quarters.
But what I think is more to the point is to get curious about WHY this happens with routers, not THAT the router "is the problem". And if I understood the Modem World discussion, Gwyn and others are remaining curious about this.
What I urge Lindens to do always and everywhere, in my never-ending quest to end opensource cultism and try to have an open society in our virtuality, is to become *more curious about their own software, and why it does things like this*. What their problem has always been from the get-go is this superior notion that the consumer and his shoddy products from Best Buy are the problem, instead of their esoteric software.
Remember when Doug Linden famously told us that "our computers were ready for kindergarten," as if they were all five years old? Maybe he was ready for kindergarten because he doesn't work there any more. I had a brand-new computer not playing SL then, despite being exactly what their web page specs were, and I really bristled at being told that my computer, which wasn't on solids yet, was going to kindergarten.
The problem is a very wide cultural gap. Most Lindens build their own rigs from parts from New Egg or their garages or whatever, and they have no idea what it is like to be a consumer. They have endless scorn for people reliant on store-bought items because part of their very manhood is wrapped up in their ability to build machines.
To be sure, Monty, who seems a phlegmatic and persistent sort, went out and bought a Belkin router to try to duplicate the bugs I kept reporting to him. Why? Because in his life, he uses a self-built router that he says could "survive a nuclear war". I don't doubt it. Come the heat death of the sun, my SL connection will naturally fry, along with myself, but I have no doubt that Monty's router will still be ticking. Still, he was willing to see "how the other half lives" in that vast continent of America and the world that shops at Wal-Mart and doesn't live in San Francisco.
Meanwhile, my immediate fix is not to go running out to buy yet another thing related to trying to make Second Life run, especially a nuclear-blast-reinforced sort of gadget, since I've already burned through desktops, graphic cards, fans, etc. over the course of my Second Life.
Instead, I'll just throttle down my game -- I'll uncheck the http textures and http whatsis on the "developer" or "advanced" menus -- that was a recommendation made to me by a Linden some time ago to get their game working when it crashed and shut off your Internet. That works pretty well, although lately, I find that even with those things unchecked, I still crash sometimes. Or I have another problem, which is that the pages increasingly needed to see stuff in SL just don't load.
Example: I usually contact tenants by pulling up their profiles right from their furniture. That's the quickest way, faster than even 1.23 search. Or if I see some product I want to buy, I pull up the merchant's store in her picks out of the object menu "creator". That was the heart of SL for me -- finding the people through the inworld objects.
Now that's all dead. Now I see CONNECTION CLOSED on a pink error screen, and I'm forced to use that awful 3.x search to pull up people, which is slow and strange.
But I'll live -- I'm a masochist and a misfit, remember? -- and I'll dumb down the draw distance to 98 or lower -- it's always best on the Mainland to keep a near drug-induced haze of visibility anyway with the stuff out there.
And I'll make sure all of those completely frivolous things like shadows, more ripply water, etc. are turned off. It occurred to me as I looked at my rental box throwing a shadow today that this aesthetic was really not a major requirement for me to enjoy the verisimilitude of Second Life. Avatar foot shadows, as is well known, also get a pass from me.
But...Typical of this exercise was the disbelief that there was even a message that suggested you check your Internet connection, and that this couldn't have possibly come from Linden Lab, but had to have come from Microsoft. I publish it as exhibit A above, and now I've persuaded them.
I do have to point out that this "discovery" came about the hard way -- I kept blogging about it over and over and jumping up and down about it ever since 2.x was first rolled out several years ago, and finally an ex-Linden told another Linden that he really should get in touch with me and hear me out. That Linden (Monty) was then persistent through many messages, trials, errors, etc. etc. in dealing with a non-tekkie (me) to get the story. He remained curious and patient. That's the way one should always be about software: curious and patient. Not dogmatic and impatient. I'm grateful to Monty. But Monty is in a culture, a system, a place where there are people who are close-minded dominating tyrants and truth has a hard time being seen in that culture.
Because... there was still Oz to convince. Sigh. Oz is still knee-deep in User Stories from the Scrum Cult.
How's this for a User Story?
"Your virtual world software cuts off my Internet connection and it's taken me two years to get you to become a little more curious about that."
Sorry, but I'm not going to listen to that arrogant boor Latifa Khalifa, either, ranting about "crappy home routers".
Like I said, I'm a consumer. I go to Best Buy. I buy what's on the shelf that isn't $100. I don't buy to withstand nuclear attack. I buy to do normal stuff. The question isn't why I do this; the question is why Second Life doesn't work with these consumer realities, after 10 years being in beta. Yanno?
Tateru, of course, takes a different route, like the male dancer blaming his bad ballet on the fact of his testicles:
I’ll start by pointing out that the IEEE 802.11 networking standards aren’t the easiest set of networking protocols to implement correctly in device firmware. It’s all too easy to get them almost right, resulting in wireless access points and routers that work just fine for some kinds of workloads and that fall down spectacularly for others. As a bonus, RFC-2663 IPv4 Network Address Translation is also a commonly poorly-implemented feature in many network devices.
See, it's all those firmware makers that are to blame and these shoddy protocols, you know?
Continues Tateru:
In my experiences from 2000 to 2005, the vast majority of wireless access-points/routers of that period – while just fine for Aunt Tilley and her Facebook habit – turned out to be duds (ranging from extraordinary failure to far more subtle symptoms) once you hooked up a power-user or a small three or four person office. They might choke and die, they might toss stations off at random, they might just run some connections very slowly, or some might unpredictably stall, leaving you wondering.
But that's not true. How many offices has Tateru actually worked in, in her life? Facebook hardly existed from 2000-2005 and that is an anachronism, because Facebook is a heavier load with its pictures and links and movies than Tateru lets on. I've worked in dozens of offices here and overseas. I've even had to serve as a rough-and-ready "network administrator" in a small NGO that didn't have its own dedicated IT guy. Routers would have to be reset sometimes, sure. Like all machines. Every consumer knows that when a gadget doesn't work, you just reset it. That fixes it, most of the time. So? I've never seen routers "be the problem" in the overwhelming majority of offices I've worked in, some with hundreds of people, some with 10 or 12. What I've seen "be the problem" way more often is the geek in charge of the machines.
What's amazing is how fast this router alibi has translated into an Official Pronouncement from the Lab -- in one of its stunningly rare communications (they haven't updated their blog in months and months, unless you knew to look under "tools and technology", and who knows what they're doing). Tateru wrested this out of Peter Linden, who we've learned now is a "shredder" in a garage band that Shamlet has written about:
“The Viewer can be hard on some home routers – a single Viewer instance can look like dozens of browser sessions (e.g. a small office) in certain aspects all by itself. In the worst cases, it can sometimes crash home routers or cause them to drop packets, which the user experiences as a brief network interruption while the router reboots or poor network performance.
There is nothing about Second Life that is intrinsically incompatible with any router, but we are working to make our network usage less hard on all routers; that’s a central element of the HTTP Library component of the Project Shining that we announced in our recent Tools & Technology post.”
Home routers! Aren't we quaint, we "home router" users! So...homey! So amateur and stupid! And again, that instant *lack of curiosity* and *willingness to pass the buck* with the "nothing intrinsically incompatible" stuff.
Tateru solves this the geek way: "Rather than messing with consumer network appliances, I’m going to grab an old PC and a Linux install disk and throw my own router together. It’s really what I should have been doing already."
Because the "home routers" open up a lot of connections and behave like a veritable office instead of a home.
Um, is there an "office router" for sale at Best Buy?
I want to point out another interesting thing about all this.
All these years I've rooted around in the inner files of The Sims Online and Second Life and such, I never knew that the app data/roaming/secondlife files had in them something called "Second Life Log". Look for this inside "logs" not "avatar name" -- that's why you may have missed it. We all know that a way to fix your game is "settings.xml" in the avatar's file, right?
Well, another thing to look at with growing curiosity is this log. I was warned not to hand this over to even a Linden lightly, because it contains possibly private information. I'm not sure what that could be, really, but it does in a sense contain a total track of everything you do in Second Life. Everywhere you go. Every thing you do. And even people who come up to you while you're AFK and try to talk to you.
And when you read those files, you see the enormous amount of things in Second Life that just plain go wrong. Region handshakes. Fonts. Avatar baking. Whatever. Warnings, failures, problems. It's almost like the norm for the operation of this software is always to be out on the edge warning, failing, falling short. At least that's the impression an uninitiated person like myself gets reading it.
As I'm blocked from the JIRA (banned for "edit wars" merely by persisting that my bug was not a feature with the awful Soft Linden, remember?), I can't even read the pages about "error 1004" so I look forward to someone giving me the headlines on that one. Meanwhile, check out your own "Second Life Log" and see what you find in there.
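For anyone who wants a quick headline count before reading the log line by line, here is a minimal sketch in Python. It assumes only that problem lines contain the word WARNING or ERROR somewhere in them; the sample lines are made up for illustration and are not copied from a real Second Life log, and the names `tally_levels` and `tally_file` are hypothetical.

```python
import re
from collections import Counter

# Assumption: a problem line contains the bare word WARNING or ERROR.
LEVEL_PATTERN = re.compile(r"\b(WARNING|ERROR)\b")


def tally_levels(lines):
    """Count how many lines mention each severity word."""
    counts = Counter()
    for line in lines:
        match = LEVEL_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def tally_file(path):
    """Tally a log file on disk; undecodable bytes are replaced, not fatal."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return tally_levels(handle)


# Made-up sample lines, only to show the shape of the input.
SAMPLE_LINES = [
    "2012-07-20T04:12:00Z INFO: login complete",
    "2012-07-20T04:12:01Z WARNING: region handshake retry",
    "2012-07-20T04:12:02Z WARNING: font fallback used",
    "2012-07-20T04:12:03Z ERROR: texture fetch failed",
]

if __name__ == "__main__":
    print(tally_levels(SAMPLE_LINES))
```

To run it on a real log, call `tally_file` with the path to your own log file. Keep in mind the warning above about private information before sharing the file or its contents.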




Comment I put on Tateru's blog that ended up in moderation:
The back story, as I'm the one who pushed this for several years and finally got the Lindens' attention, and finally Monty went out and bought a Belkin to test, God bless him:
http://secondthoughts.typepad.com/second_thoughts/2012/07/so-now-were-to-blame-it-all-on-the-belkin-router.html
And the issue here is to remain patient and curious about software, and resist that geeky culture that always wants to blame the user and their crappy home routers and this or that brand which "everyone knows" is crappy. Because the real question is why this software, which has been in beta for 10 years, can't work with the regular consumer products available in the regular normal computer stores, yanno?
Loki flags a problem that any of us can experience even just buying a Belkin: that the ISP, like Verizon, then requires new settings, a personal phone call, jiggering, etc. etc. And that can take half a day or more to accomplish.
Wolf may have his finger on the heart of the problem, "Second Life has to change how it uses bandwidth. Some of the Project Shining elements (The cache operation: did they never notice?) will pay off. But I do sometimes wonder if the Lindens have ever understood the Internet. HTTP textures, for instance: how often were we told to try turning that off?"
The Lindens need to change their software and stop torturing consumers.
Posted by: Prokofy Neva | July 20, 2012 at 04:12 AM
beta or obsolete...
but get the money first.
never gonna end..
virtualized futures awate the stupid.
BTW- sad MIPS day... 20 something reported going comic book violent at the batman premiere in colorado...
guns, video game media, comicbooks, and adulthood.
Posted by: c3 | July 20, 2012 at 05:50 AM
I gave up on the Linden Lab SL clients about six months ago, when both the released version and the beta version were crashing my machine every five minutes. I am sadly not exaggerating here.
I reluctantly moved to Firestorm, accepting the trade off of not being able to trust the developers for a client that actually ran. So far I've been happy with it.
Unfortunately, I can't recommend Firestorm for you, because you have a SL business to run and a number of enemies who would happily take advantage of access to Firestorm code to screw with you.
I used to think LL might have made a mistake in open sourcing their client. Now, I have to wonder if someone at the company realized that making a client that consistently worked for the majority of their users was either beyond their abilities--or of no interest to them.
Posted by: Carl Metropolitan | July 20, 2012 at 05:51 AM
@Carl
I think they open sourced the client to get free labor from all those developers who gladly provided SL a variety of viewers. Open Source also fit the ideology of many of the Lindens back then. LL's corporate culture seems to be changing and becoming more professional but they (and we) have to live with all those early screwed up decisions.
Posted by: Amanda Dallin | July 20, 2012 at 10:11 PM
Just to add my two cents... Prokofy told me about this story yesterday. I had read Inara Pey's article which mentions the "strange behaviour" with some routers that Linden Lab admits that will be incompatible with future versions of their viewer, and I was quite curious about what exactly causes that incompatibility. I still remain curious, even more so these days, because it's not so easy to crash a router by piping traffic — HTTP traffic, at that! — through it. But apparently that's exactly what happens!
Now, I'm apparently not geeky enough to have anything against Cisco: I do own a Linksys router (now a Cisco brand), a WAG200G which has a label saying "Manufactured 05/2006" and another saying it's a "Home Gateway" — even though at the time I bought it, it used to be on the shelf for "small office/home office" appliances. I had previous good experiences with similar routers from Cisco/Linksys for the small office environment and never had any problem with them. The WAG200G is so old that Linksys doesn't offer more upgrades to its firmware (I think they stopped doing so in 2008 or 2009) but there is a small community of developers who continue to improve the firmware — which is open source, and, yes, as you guessed, it's just Linux, like the majority of routers these days. So in theory the difference between assembling your own self-made router and a shelf-based one is not the operating system: it's just that you'll have a larger box :) And of course you'll be able to buy whatever parts you wish for it. On the other hand, it's understandable that companies like Cisco or Belkin will assemble a router using whatever parts fit best into a tiny enclosure, and they're supposed to know what they're doing: I'm hardly convinced that they are going to pick broken parts for it! Also, I believe that most "self-built" small routers will use OpenWRT for their software (https://openwrt.org/) which can be installed on pretty much every small router out there — yes, even Belkin and Linksys ones! Larger hardware — i.e. a PC — can just run "normal" Linux on them, but I'd need an expert to tell me why "normal" Linux would avoid the kind of problems that "specialised, router-specific" Linux has. Tateru's comments regarding the difficulty of implementing certain protocols on routers might have applied to the ancient days when router manufacturers would develop their own operating systems, but, these days, everybody uses Linux. Everybody. The reason? 
Because Linux is free and *does* (allegedly) implement everything just right — and you can fit it into little memory (I think my Linksys has just 16 MBytes).
So, hmm, there is something lacking in this kind of argumentation which I'm probably missing. Again, I would find it very surprising that a router manufacturer would grab a copy of Linux — normal, specialised, embedded, whatever — deliberately change its networking subsystem to introduce "errors" and "badly supported protocols", and sell the package as a "home router", so that they could sell more expensive routers for "office users", using a flawless copy of Linux. I mean, it's not theoretically impossible to assume that, but I just find it hard to believe.
Nevertheless, my ancient WAG200G from Linksys also has problems connecting to SL. They're not as serious as what Prokofy reports, but they nevertheless exist: teleports failing, the connection being reset when logging in to a busy region, sudden spikes in traffic even though "nothing" is happening, way slow performance from the in-world browsers, and similar small (but annoying) faults. I never attributed this to the router, but just to something in SL. What exactly goes beyond me. I *can* look at the router's logs by logging in to it, and have caught it several times during the times it "hangs" when using SL, but never found anything remotely suspicious. Whatever is happening doesn't register on the logs. Strange, but true.
On the other hand, as Prokofy so well described, the SecondLife.log is crammed full with all sorts of errors. It has always been that way. It's the application with the most errors in its log -- even though Chrome comes a close second. And, yes, Chrome also crashes the router sometimes, when loading some Flash games which I'm fond of playing.
So mmh even though this goes against my better judgement, I can "believe" that the combination of certain software and the kind of traffic patterns it has can, indeed, crash a router (or reset it, or just make it drop connections, etc.). Oh, and it's not just a question of Wi-Fi -- except for my old laptop, the rest of my home computers are all directly plugged into the router itself. So whatever is happening is not to blame on Wi-Fi, but on the combination of router and software. But I still find it very, very strange. There is not really much that can "go wrong" on a router to make it *crash* -- especially if we're just talking about retrieving textures via HTTP. Which is 99% of what most routers do: retrieve web pages via HTTP. Lots of them. Especially on Javascript/AJAX-based, intensive websites like Facebook, Google+ and the like. This is what is supposed to be "normal behaviour": retrieving things via HTTP is what routers *do*. I find it incredibly strange that LL "suggests" to turn off HTTP retrieval and rely on UDP connections to avoid problems with the router: common sense would suggest exactly the reverse approach!
And it's a pity, really. The latest versions of the LL viewer have increased FPS performance on my outdated hardware threefold (like Prokofy, I use very low settings as well — no shadows, no fancy water, not even fancy WindLight sky, unless I really want to take a picture). I was relieved to think that I could continue to use my old computers for a few more years. Now apparently I will have to replace my router if I still wish to enjoy SL! That's... well, absurd :) The only good thing is that even a "small office" router will be cheaper than a new computer, even if it costs me US$200...
Posted by: Gwyneth Llewelyn | July 23, 2012 at 02:28 PM
Well, Gwyn, that's all very enlightening. I'd love to hear Belkin's side of the story. And some other Linux developer's side of the story outside the Lab.
Do tell me what kind of new router you buy.
Posted by: Prokofy Neva | July 23, 2012 at 08:49 PM
i use whatever modem/router my ISP gives me as part my package
that way if is a problem then when i call them up then they will tell me how to fix. they tell me that they optimise their end only for the modem/router they provide. if i use any other brand then they say sorry cant help. talk to manufacturer
at the moment i got a thomson speedtouch off them. it goes ok. is not a wireless one tho. just cable
Posted by: elizabeth (16) | July 23, 2012 at 10:32 PM
None of the problem routers run Linux. Linux needs a reasonable amount of RAM and Flash memory, in particular about 4MB of Flash and 8 to 16MB of RAM.
When trying to make routers cheaper to manufacture, the manufacturer is inclined to save on those components and give the router a lot less memory. Typical for such low end configurations is for example VxWorks RTOS. It's hard to say what is the culprit - i consider it unlikely to be VxWorks itself, but it could be the custom software written on top of it, or it could be just the memory contention from being crammed into too tight fit of a device.
Saying that the routers don't have a problem is obviously wrong. If you need to reset or power-cycle a router once a week in normal use, something is very distinctly wrong with its software, and it's very probable to make it behave two and a half orders of magnitude worse just by subjecting it to a higher load. Most likely, it simply leaks memory. In home use, this is usually of little regard because the connection is reset every 24h from the network end (at least it is in my country), allowing the router to free its memory. Newer routers can still be buggy but not require manual resetting, because they can restart themselves if they ever lock up.
A simple recipe to make an average router suffer temporary failure: get a bittorrent client. Disable uTP protocol, if supported. Crank up number of connections from default (usually around 20). Try to download something. Many routers start failing at 50 connections or less, better ones fail at 200.
It's possible to make many routers run more stably by disabling stateful firewall and other features. Also not putting heavy use on WLAN can improve router stability - though i'm not sure why - memory? CPU time?
Also, handshake when establishing a connection is a heavy process, much more difficult than maintaining a connection. Both routers and operating systems have a limit on a number of "half-open" connections, i.e. connections which are in the state of being established. Windows XP will cut down connections if the number of half-open exceeds, i think, 10. The limits have since been increased.
For this reason, in Singularity i tweaked texture fetch, allowing it to keep up to 32 connections, but only within bandwidth envelope (approximate), and never try to start more than a handful of connections simultaneously, staggering the process of establishing connections over time. This hasn't proven particularly perfect, but for many users better than LL's strategy. Again, there is a bit of bad layering between the systems, which means the head doesn't really know what the tail is doing, so we have to guesstimate how not to go over the limit. Unfortunately we got extra issues evident in recent release, i suspect from merging recent LL changes.
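The general idea of capping open connections while only letting a handful be in the handshake phase at once can be sketched as follows. This is an illustrative Python sketch, not Singularity's actual code: `fetch_all`, `connect`, `transfer`, and the default of 4 simultaneous handshakes are assumptions (only the figure of 32 comes from the paragraph above), the bandwidth envelope is left out, and the fake connect and transfer functions merely sleep so the demo needs no network.

```python
import asyncio


async def fetch_all(items, connect, transfer, max_open=32, max_starting=4):
    """Fetch every item while capping open and half-open connections."""
    open_slots = asyncio.Semaphore(max_open)       # total connections in use
    start_slots = asyncio.Semaphore(max_starting)  # connections mid-handshake

    async def worker(item):
        async with open_slots:
            # Only a handful of handshakes may run at the same time, so
            # connection setup is spread out over time instead of bursting.
            async with start_slots:
                conn = await connect(item)
            return await transfer(conn)

    return await asyncio.gather(*(worker(item) for item in items))


def run_demo(n_items=100):
    """Drive fetch_all with fake connections and record the peaks reached."""
    stats = {"starting": 0, "open": 0, "peak_starting": 0, "peak_open": 0}

    async def fake_connect(item):
        stats["starting"] += 1
        stats["open"] += 1
        stats["peak_starting"] = max(stats["peak_starting"], stats["starting"])
        stats["peak_open"] = max(stats["peak_open"], stats["open"])
        await asyncio.sleep(0.001)  # stands in for the handshake
        stats["starting"] -= 1
        return item

    async def fake_transfer(conn):
        await asyncio.sleep(0.005)  # stands in for the download
        stats["open"] -= 1
        return f"texture-{conn}"

    results = asyncio.run(fetch_all(range(n_items), fake_connect, fake_transfer))
    return results, stats


if __name__ == "__main__":
    results, stats = run_demo()
    print(len(results), stats["peak_starting"], stats["peak_open"])
```

Using two separate limits reflects the point made above that handshakes are the expensive part: the router never sees more than `max_starting` half-open connections from this client, however long the queue of pending fetches is.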
Also a bug has been found that SL viewer would do frequent DNS requests (aka name requests) which translate names of services or websites into IP addresses, for the names that the viewer should already know and has a right to cache. This also requires routers to reserve memory for a reply, and in known cases has led to router crashes.
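What caching name lookups means in practice can be shown with a small Python sketch. It is not the viewer's code: `CachingResolver`, the fixed 300-second lifetime, and the example hostname are all assumptions made for illustration. A real client would ideally honour the lifetime published with the DNS record, which `socket.getaddrinfo` does not expose.

```python
import socket
import time


class CachingResolver:
    """Remember hostname lookups for a fixed time instead of re-asking."""

    def __init__(self, ttl_seconds=300.0, lookup=None, clock=time.monotonic):
        self._ttl = ttl_seconds  # assumed lifetime, not taken from DNS
        self._lookup = lookup or self._system_lookup
        self._clock = clock
        self._cache = {}  # hostname -> (address, expiry time)

    @staticmethod
    def _system_lookup(hostname):
        # Ask the operating system resolver and keep the first address.
        infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
        return infos[0][4][0]

    def resolve(self, hostname):
        now = self._clock()
        hit = self._cache.get(hostname)
        if hit is not None and hit[1] > now:
            return hit[0]  # still fresh: no DNS traffic at all
        address = self._lookup(hostname)
        self._cache[hostname] = (address, now + self._ttl)
        return address


if __name__ == "__main__":
    resolver = CachingResolver()
    # "localhost" keeps the demo off the network.
    print(resolver.resolve("localhost"))
    print(resolver.resolve("localhost"))  # answered from the cache
```

With a cache like this, a burst of requests to the same service costs one name lookup through the router rather than one per request.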
However, chances are, weak routers can be made work in the long run. The whole process of establishing and tearing down connections for every single inventory bit or texture is unnecessary. It's a leftover from the ancient standard which was already declared obsolete last century, HTTP 1.0. Most of the Internet supports newer (finalised 1999) HTTP 1.1, which allows to retain the same connection to satisfy successive requests, and allows the client to place a new request even before the old one has finished. Monty Linden is currently working on upgrading the complete SL infrastructure to support the newer protocol. The result will be that a much smaller number of connections will be sufficient for a high degree of bandwidth utilisation, and the problematic process of establishing and tearing down connections is eliminated.
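The difference a persistent connection makes can be seen in a small, self-contained Python sketch. It is a local illustration only and says nothing about how Linden Lab's servers or viewer are built: `DemoHandler`, `fetch_over_one_connection`, and `run_demo` are hypothetical names, and the server is a throwaway one on 127.0.0.1. It shows connection reuse only; the standard `http.client` module does not send a new request before the previous response has been read.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep the connection open

    def do_GET(self):
        body = f"payload for {self.path}".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        # A length is required so the client knows where each reply ends.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_over_one_connection(host, port, paths):
    """Request several paths in turn over a single TCP connection."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    bodies = []
    local_ports = set()
    try:
        for path in paths:
            conn.request("GET", path)
            # The local port stays the same as long as the socket is reused.
            local_ports.add(conn.sock.getsockname()[1])
            bodies.append(conn.getresponse().read())
    finally:
        conn.close()
    return bodies, local_ports


def run_demo():
    server = ThreadingHTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return fetch_over_one_connection("127.0.0.1", port, ["/a", "/b", "/c"])
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    bodies, local_ports = run_demo()
    print(bodies, "connections used:", len(local_ports))
```

Three requests travel over one connection here, so a router in between would track one connection and one handshake instead of three.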
Another major issue with HTTP fetch in Second Life is the service blatantly lying and when an error occurs, often times responding with junk and a success result code, and otherwise responding with invalid or wrong result codes. This is also something they might fix eventually, i hope. At least Monty said something along the lines of "i was just appalled as you are when i discovered this" about a particular bug we discovered independently, so it looks like he has his eye on these things.
Before, it didn't even make sense to say something, because no-one would listen, or have the expertise or inclination to listen. I'm not sure what to make of Oz Linden's CV, because it puts him as a network solutions professional, and yet, all he seemingly did at LL was trying to herd nekos, make horrible colour palettes, and write trivial scripts.
Posted by: Siana Gearz | August 02, 2012 at 10:48 AM
But I'm a bit appalled that LL implemented HTTP 1.0. Ugh! I would have thought that to replace a mostly-UDP-based infrastructure, the first thing to do is to handle HTTP in the most persistent way possible. There are so many techniques for doing that, like long polling... but to the best of my knowledge, all these require the almost-15-years-old HTTP 1.1 protocol to be implemented. I hope Monty is seriously planning to upgrade their code to work with HTTP 1.1 well before the new 'developer version' becomes the 'stable' version :-P
Posted by: Gwyneth Llewelyn | August 04, 2012 at 06:06 AM
Gah! I lost almost everything from my other comment... hehe.
Oh well, it was very techy, and basically just saying that my Linksys router most definitely runs Linux, it has 16 MB of RAM (split among 14 MB for "memory", of which 1-1.5 MB are always free) and 2 MB of "disk" (100% full). Load average is between 0.1 and 0.2, briefly spiking to 0.4 when SL is launched, but then dropping back to 0.2 after a few minutes. There are few "extra" features turned on the router, except for a basic firewall (no complex filtering turned on). The processes taking the most amount of memory are those from UPnP, but I'm afraid to turn UPnP off — that usually gives some NAT problems with some specific applications, like video streaming and such.
So from the software side of things, the Linksys router is not catastrophically bad. Of course, it might have that problem with the VxWorks chip — which, no matter how good the software is, there might be no way to go around it. I actually mentioned this issue to some hardware geeks, running a shop for 25 years, and, pretty much like me, they were fond of recommending Linksys routers for "customers who don't want to have router problems, ever". They were as much surprised as I was to learn that there is now a replicable way of making a Linksys router drop an ADSL connection just by using software — SL in this case — which opens multiple HTTP connections. Surprised... and a bit incredulous.
But, alas, the evidence seems to be clear...
Posted by: Gwyneth Llewelyn | August 04, 2012 at 06:13 AM