20150603 /transparency/

published 03 Jun 2015

In 2007, when I first started publicly releasing osu!, I would release daily changelog updates and daily builds, with hugely visible changes and massive new features. Fast-forward to today, when finding the time and (more importantly) motivation to write a blog post is near impossible.

It’s not that I don’t want to. I just feel my time is better spent elsewhere. Writing is something I enjoy, but it’s also a distraction from the never-ending list of tasks I have floating around my brain. I end up prioritising whatever feels most beneficial to the osu! user base.

But it’s time to take a step back.

In recent months, a group of users seems to be trying to destroy the osu! team’s image. They make it seem like we ignore support requests, break the game and are not interested in the community at large. While I could probably address this by meeting these people on their level (reddit seems to be the central campfire for bitching these days), I would rather stay away from such negativity.

At the same time, I want to give myself and the rest of the osu! team a stronger voice. I want to assure the osu! user base that we are making forward progress, addressing issues as they arise and in general doing a lot behind the scenes to keep things running smoothly that is (intentionally) hidden from players.

I’m going to start updating this blog with shorter posts focusing mostly on what I and the rest of the osu! team are up to on a day-to-day basis. I’ll set aside around 20 minutes a day to get something posted - even if it is very brief.

  • Updated username changing code to only free up usernames after all other checks have been completed (disallowed usernames etc.).
  • Installed new SSL certificate for *.ppy.sh (old one was expiring in 2 months).
  • Fixed “recent activity” not updating correctly for score events. This broke in a rewrite of the score submission code I completed yesterday.
  • Finished removing all usage of mdb2 in favour of mysqli.
  • Finished integration of a new payment system “¥Coins” which makes it much easier for Japanese users to support osu!.
  • Added a new column to all score tables to store whether a replay is available or not. Should make things a bit more straightforward than the current system of storing this information in a separate table. It’s also a step towards sending this out to the client to allow for more than top-50 replays. Adding a single tinyint to the scores tables took a total of 9 hours! For people interested in how I’m doing it with no downtime, this is the answer.
  • Fixed beatmap playcounts not being updated anywhere near as often as they should have been. This was actually a really silly one, and also means that our overall playcount (currently sitting at 3 billion) is heavily deflated. Here I was thinking we didn’t grow much recently, since going from 1 to 2 billion took the same time as going from 2 to 3, when in fact it was likely half or less.
  • Continued investigation into ongoing MySQL performance issues (I’ll wall-of-text about this after I solve it), this time focusing on network IO. Added some extra config settings to bancho to allow dynamic disabling of the most network-intensive queries. Didn’t help.
  • Created a new monitoring dashboard for real-time monitoring. My default one provides a great overview of the last day, but I needed something I could have up on my second monitor while performing system maintenance or applying what could be breaking-changes.
  • Corrected some issues with username change history storage. This is something I recently added to chain into the ability to change usernames more than once. Coming soon!
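For those curious about the no-downtime column addition mentioned above: the linked answer isn’t reproduced here, but the standard trigger-based online ALTER goes roughly like this. This is a sketch in the spirit of tools like pt-online-schema-change - the table and column names are illustrative, not the actual osu! schema, and column lists are elided.

```python
def online_add_column(table, column_def):
    """Return the ordered SQL steps for adding a column with no downtime."""
    shadow = f"_{table}_new"
    return [
        # 1. Build an empty shadow table that already has the new column.
        f"CREATE TABLE {shadow} LIKE {table}",
        f"ALTER TABLE {shadow} ADD COLUMN {column_def}",
        # 2. Triggers mirror live writes into the shadow table while the
        #    backfill runs (one each for insert/update/delete; real tools
        #    list every column explicitly since the shadow has one extra).
        f"CREATE TRIGGER osc_ins AFTER INSERT ON {table} FOR EACH ROW "
        f"REPLACE INTO {shadow} VALUES (NEW.id, ...)",
        f"CREATE TRIGGER osc_upd AFTER UPDATE ON {table} FOR EACH ROW "
        f"REPLACE INTO {shadow} VALUES (NEW.id, ...)",
        f"CREATE TRIGGER osc_del AFTER DELETE ON {table} FOR EACH ROW "
        f"DELETE FROM {shadow} WHERE id = OLD.id",
        # 3. Backfill existing rows in small chunks so locks stay short -
        #    on tables this size, this is where the 9 hours go.
        f"INSERT IGNORE INTO {shadow} SELECT * FROM {table} "
        f"WHERE id BETWEEN ? AND ?  -- repeated per chunk",
        # 4. Atomic swap: readers never see a missing table.
        f"RENAME TABLE {table} TO _{table}_old, {shadow} TO {table}",
        f"DROP TABLE _{table}_old",
    ]

steps = online_add_column("osu_scores", "replay TINYINT NOT NULL DEFAULT 0")
```

The key property is step 4: the swap is a single atomic rename, so the only “cost” of the change is copy time, never downtime.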

Along with these kinds of bulleted lists, I will try to gradually expose more details on things going on behind the scenes that I believe you guys may find interesting. Any suggestions or questions are welcome in the comments section. I will either reply to your comments or address them in the next day’s post.

Oh, and for those of you that enjoyed my osu!weekly posts focusing on news around the community, make sure you are following along with the osu!weekly news post series. This weekly posting by Tasha has basically replaced what I was trying to do previously.


the last week of fail

published 29 Sep 2014

The last week has been quite a roller-coaster ride as far as keeping osu! above water goes. In the interest of transparency, I am writing up the various problems that arose over this period.

this kind of sums things up...

Issue #1

The internal IP address of the main web server was revealed, resulting in an onslaught of DDoS attacks hitting the box directly rather than being stopped at Cloudflare. DigitalOcean has a null-route policy, which means the box is inaccessible for three hours after any DDoS hits. The attackers also managed to find the bancho server’s IP, likely because it hasn’t changed in many years and was revealed in the distant past.

Daily DDoS attacks knocking osu! off the internet (and null-routing the web server).

null routes are annoying, but unavoidable


The first step was to figure out how the IP was revealed. I found a few methods and patched them all:

  • PTR records (DigitalOcean makes these public based on hostname, so a scan of the DigitalOcean IP range searching for relevant hostnames is feasible). Moral of the story: never name your DigitalOcean droplets anything distinguishable (use random names).
  • Postfix. This is probably the silliest of all: the IP address was in mail headers, as all mail was sent from that server. To fix this, I made a new relay server for mail which removes sensitive header information before performing the final send operation.
  • phpBB. When posting a forum post, phpBB was checking image dimensions with a call to getimagesize. This ran even on remote URLs, which meant the box’s IP was revealed to whoever hosted the image. I removed the dimensions lookup as it wasn’t even being enforced.
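The relay fix can be sketched like this. Postfix can do the same thing natively (a `header_checks` table with an IGNORE rule), but this Python version shows which headers get scrubbed and why. The sample message and hostnames are made up.

```python
from email import message_from_string

# A made-up message as it would leave the origin box: the Received:
# header records the internal server's address.
RAW = """\
Received: from web1.internal (unknown [192.0.2.10])
From: osu! <no-reply@ppy.sh>
To: player@example.com
Subject: hi

hello
"""

def scrub(raw):
    """Drop hop-tracing headers before the relay performs the final send."""
    msg = message_from_string(raw)
    # "Received:" headers record every hop, including the origin server's
    # IP - deleting the name removes all occurrences at once.
    del msg["Received"]
    return msg.as_string()
```

After `scrub()`, the only server a recipient (or attacker) can see in the headers is the relay itself, which is expendable.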

To combat the DDoS attacks quickly, I spun up around 10 web servers from a snapshot and had them waiting as hot backups should one get null-routed. This allowed for minimal downtime.

This also meant assigning new IP addresses to all services whose addresses may have previously been revealed.

I’m quite glad this came up, as patching the above security flaws makes me feel a lot more at ease, going forward.

Issue #2

Over the past few weeks, many of my DigitalOcean droplets were experiencing sudden IO starvation, during which their IOPS would drop to zero for up to 5 minutes at a time.


DigitalOcean support suggested I redeploy the droplets, as my “old” droplets were running on an old version of their hypervisor code (and on older hardware). So I did - which may have been my biggest, yet unavoidable, mistake.

Issue #3

Redeployed droplets were seeing very high and spiky steal% (yellow in the graph below), suggesting high host contention. I took the redeployment as an opportunity to upgrade the master database (32GB -> 64GB RAM, 12 -> 20 cores), but even so it was performing so badly that the site would often come to a halt during peak.

yellow is stolen cpu time


After a 10+ page back-and-forth with DigitalOcean I really didn’t get anywhere. The final solution was simply to keep redeploying until performance was satisfactory.

Even after redeploying I am still noticing spikes of steal%, which result in sudden CPU starvation. This is an ongoing issue to which I have no solution yet (although I do have some leads which I am actively investigating).

Keep in mind that switching master database servers is not an easy task either. It requires synchronising multiple things at once: ensuring all slave servers are stopped at the same point in time (while the old master is in read-only mode); switching the slaves to the new master and ensuring they stay in sync with it; updating the configuration of all services reliant on the database; and switching monitoring over to the new database layout.
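The switchover sequence above can be sketched as the raw MySQL statements run on each box. Host names, binlog file and position here are placeholders, and a real run also waits for each slave’s replication lag to hit zero between steps.

```python
SWITCHOVER = [
    # 1. Freeze writes on the old master so every slave can catch up
    #    to one well-defined binlog position.
    ("old-master", "SET GLOBAL read_only = 1"),
    ("old-master", "SHOW MASTER STATUS"),  # note the File / Position
    # 2. Block until each slave has executed up to that position.
    ("each-slave", "SELECT MASTER_POS_WAIT('binlog.000042', 1234)"),
    # 3. Promote the new master, then repoint the remaining slaves at it.
    ("new-master", "STOP SLAVE"),
    ("new-master", "RESET SLAVE ALL"),
    ("each-slave", "STOP SLAVE"),
    ("each-slave", "CHANGE MASTER TO MASTER_HOST='new-master', "
                   "MASTER_LOG_FILE='binlog.000001', MASTER_LOG_POS=4"),
    ("each-slave", "START SLAVE"),
    # 4. Outside MySQL: update bancho/web configs and monitoring to point
    #    at the new master, then re-enable writes.
    ("new-master", "SET GLOBAL read_only = 0"),
]
```

The ordering is the whole game: writes must stay frozen from step 1 until every slave is confirmed on the new master, which is why this eats so much time when done by hand - and why partially automating it pays off.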

I had to do this master switch twice this week, which was a huge time-sink. I was able to partly automate the process along the way, which is a nice bonus.

Issue #4

A new kind of DDoS arose which was not being blocked by Cloudflare. Someone was making use of a WordPress botnet to flood HTTP requests at osu.ppy.sh. This totalled around 300mbit of incoming requests, which is enough to bring the most powerful of servers to a halt.

that's megabytes, not bits.


The initial resolution was to switch Cloudflare to “I’m under attack” mode, which forces every visitor’s browser to perform JavaScript computations before allowing access to the site. This required adding special rules to allow bancho and other services (which can’t perform JavaScript) through.

The longer-term solution was to add filtering rules at the nginx level to avoid passing such bogus requests on to PHP workers. This removes the bulk of the stress on the server, allowing it to continue operating even while under such an attack.
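The post doesn’t show the actual nginx rules, but a common filter for this particular attack keys off the distinctive User-Agent that WordPress pingback requests send, rejecting them before they ever reach a PHP worker. A minimal sketch of that check:

```python
def should_block(user_agent):
    """Mimics an nginx rule along the lines of:
       if ($http_user_agent ~* ^WordPress) { return 403; }
    Pingback requests from a WordPress botnet identify themselves with a
    "WordPress/x.y; http://compromised-blog" User-Agent string."""
    return user_agent.lower().startswith("wordpress/")
```

Matching on a header is cheap for nginx but would otherwise cost a full PHP worker per request, which is exactly why moving the check up the stack relieves the pressure.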

Update: I found out that this blocking can be done at the Cloudflare level using a specific WAF rule.

Issue #5

When I was finally happy with a database deploy at DigitalOcean (low-ish steal%), it went radio silent during peak one night, without any notice. Upon following up with DigitalOcean support, I was told there was a problem with the hypervisor it was running on, which in turn triggered a reboot of all droplets running on it.

Not only that, but mine rebooted without a kernel, and thus had no networking (apparently). This happened due to another bug at DO’s end involving deploying from snapshots, but that is unimportant.


This was completely out of my control. I’m waiting for follow-up on exactly how this happened from DO support, while also looking at my options to switch key infrastructure away (back?) to dedicated hosting, while leaving a hot backup at DO in case of failure. I will likely post an update if/when I decide to migrate to somewhere else.

So let me clarify: DigitalOcean are an amazing host. They offer computing power at prices which make sense, rather than the inflated server rental rates that are oh-so-common in today’s market. They have been working very closely with me to overcome the aforementioned issues, going out of their way to do what they can. Their support is so far the most personal and expedient of any datacenter I have tried (and I’ve been around...).

Much of what has occurred has not been their direct fault; mostly a series of unlucky events which happened to overlap. If I do decide to move away, it will not be moving away from DigitalOcean, but from cloud hosting in general (returning to self-managed infrastructure). I would not even consider another cloud provider due to the unrealistic costs.

At the end of the day, I would still recommend DigitalOcean. If you decide to give them a try, you’re welcome to use my referral link to help offset the costs of osu!’s servers (and it gives you free credit too).

Let me also mention that while I felt my infrastructure was robust enough to withstand such failures, I have identified a few areas which can be improved. And you can count on me to improve them.


A quick update

published 28 Aug 2014

And so another month has passed. I probably don’t have as much to tell you as usual since I have been super-busy in real life, but it is all for a good cause! I have been working towards improving my working environment - and also allowing for expanding the osu! team - by renting a small office. This takes quite a bit of paperwork in Japan (especially as a foreigner), so it was quite a celebration for me to actually succeed in this.

I’m still in the process of moving stuff and getting everything set up, but afterwards I should be able to livestream a whole heap more. We actually have a live camera in the office which you can view here. Keep in mind it will only be set to public some of the time. I’ll likely tweet about it if we’re doing something interesting.

As for things on the osu! front:

  • The new update system is completed and mostly live. You can switch to it from the existing test build by clicking the little popup at the main menu.
  • Due to this system going live, the old test build is now officially decommissioned (even though you can still use it for the time being).
  • This new system brings the ability to publish multiple update streams, including experimental ones (like an OpenGL-only build, which may be the future of osu!) for testing purposes. Look out for these in the near future!
  • I also managed to get osu! to “install” and update from a single executable, removing the need for the “osume.exe” updater. The result is quite magical!
  • Smoogipoo is beginning the rather huge rewrite of osu!mania to fix all the small issues that exist in the current implementation. He has made good progress on the key binding system and is currently working on skinning. The end result will be a new editor, a better-working play mode and an overall better experience.
  • The new osu! website has gone through another iteration and is being designed actively. All I can say for now is that it looks amazing; if you saw the “old” new design then this one is just going to blow your mind!
  • I’ve spent a lot of my time fighting infrastructure issues that are mostly not my fault and very hard to resolve. Things are still pretty stable, so I’d say I’m doing a good job even though you will never hear about it ;).
  • We have begun restructuring the team to resemble how the modding environment will be further down the line. This should hugely streamline the ranking process even before the complete new system is implemented.
  • The osu!idol karaoke contest is running again this year. Huge interest in this, so hurry if you want to take part in it!
  • I’m making good progress on restoring the osu!store and a stock of tablets. Expect to see availability again in early November, all going well. At least in time for the holiday season!
  • RBRat3 made a cool 3D version of the new osu! logo.
  • I learned that osu! can help with hearing loss.
  • Thanks to Rev3Games for an enjoyable trip along the history of the iNiS tapping series and the transition to osu!.

For those that missed it, I also answered quite a few questions in my previous post. Feel free to post more questions there if you have anything sensible to ask!


ask me things

published 15 Aug 2014

I have an aversion to ask.fm, but I do know a lot of people out there have a lot of things they would like to ask. I plan on doing an AMA on reddit some day, but until then I’d like to leave this post here to gather questions in the comments which people would like answers to. I will post follow-up entries here answering the top-voted questions (or any I feel deserve an answer).

You’re welcome to ask questions with no limits on scope. That said:

  • I will not answer questions that are stupid.
  • I will not answer questions from anonymous or invalid email addresses.

A Quick Update

published 01 Aug 2014

What has happened in the last two months?

  • Had an amazing time at Japan Expo, meeting a huge number of French players and hopefully introducing just as many new ones to the game!
  • Gave an updated talk about the history of osu!, along with Q&A and live play in dual language (French and English). Watch it here!
  • I answered even more questions on episode 8 of osu!talk. Thanks to ztrot for having me on the show!
  • I received some open source mini-keyboard controllers for osu! from some Chinese users. You can already make them yourself, but I also hope at some point we can offer these for sale at a low price.
  • osu! saw more development activity over the last month from people who aren’t me than ever before. This is very exciting to see, and makes me a little more confident that the osu! codebase isn’t in as bad a state as I perceive it to be!
  • Progress is being made towards an open source osu!. Piece by piece I am separating git repositories of various components so they can be released separately from each other as required.
  • We hit 200k likes on facebook. Hooray!
  • I completely overhauled the banning system behind the scenes to allow for more automation, as it was getting out of hand for us to handle manually. The results are promisingly good (or bad, in a sad way).
  • Work continues on a new update/release system which will allow for multiple release streams to exist. Users will be able to switch between stable/beta/cutting-edge. This will also allow for migration to dotnet40 while keeping a compatibility branch on dotnet20 while people migrate across.
  • A new game intro is in the works, including a long-overdue theme song. You may also notice that sound effects have been improved. This is all already live on test build but won’t be available on public for a while.
  • Download and update mirrors are centrally managed and traffic is automatically shifted as servers become available/unavailable. DNS changes are also automated via the Cloudflare API when server issues are detected, reducing downtime to only a couple of minutes.
  • I have been a bit busy with boring stuff like restructuring the way osu! is run as a business to make sure I can keep up with the ever-increasing workload. Trying to get more hands on board to get new features out to you guys faster than ever.
  • Huge kudos to Tom94 for taking my lead and overhauling most of the song select code. The result is a more performant and slicker song select screen than ever before. And it’s only going to get better from here!
  • My sister made me an osu! stamp!
  • Someone used my design documents to make their own version of the osu!arcade unit!
  • We are running another fanart contest aiming to create a bunch of stickers, which may be used around the place in the future (both digitally and physically)!
  • Someone made a programming problem based on CtB.
  • People continue to be dishonest and unbelievably abusive.
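The automated DNS shifting mentioned in the list above can be sketched as follows. The post predates Cloudflare’s current v4 API, so the endpoint shape here follows today’s documentation rather than whatever was in use at the time; the zone/record IDs are placeholders.

```python
import json

API = "https://api.cloudflare.com/client/v4"

def build_dns_update(zone_id, record_id, new_ip):
    """Build the PUT request that repoints an A record at a healthy mirror."""
    url = f"{API}/zones/{zone_id}/dns_records/{record_id}"
    payload = {
        "type": "A",
        "name": "osu.ppy.sh",   # the record being failed over
        "content": new_ip,      # the healthy server's IP
        "proxied": True,        # keep Cloudflare in front of the new box
    }
    return url, json.dumps(payload)

# A health-check loop would send this (with an API token header) whenever
# a mirror stops responding, shifting traffic within a couple of minutes.
```

With Cloudflare proxying enabled, the change propagates as soon as Cloudflare’s edge picks it up, rather than waiting on resolver TTLs - which is what keeps the downtime to minutes.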

I’m sure I’ve missed quite a bit here, but that’s all until next time. Follow me on Twitter for more regular updates!