20150605 /progress/

published 05 Jun 2015

I’m writing this for the second time, since it somehow disappeared the first time :(. That’ll teach me for trusting some Chrome-based markdown editor. stubbornly returns to sublime text 3

In recent times I look at the user-facing changes and am often disappointed in what I see. My own role in osu! has changed from being able to rapidly implement new features to only just being able to keep up with what is going on each day. The rare occasions I’m able to implement something new are therefore very enjoyable, but I digress.

The ongoing tasks which I choose to complete myself rather than delegate remain that way for a reason. I believe each and every one of them requires my own attention. Delegating tasks is great and all, but there are two issues with it:

  • Delegating things that come up infrequently is often less efficient. An example would be emails about conventions, which are always different in nature. Sometimes people will want merchandise, sometimes they will want to promote osu! by presenting a panel. Teaching someone how to respond to these emails would require writing a manual of every possible scenario, or being able to trust their judgement, which leads on to the second issue..
  • I find it hard to trust people. Not trust as in confidentiality, but trust as in the ability to diligently handle a scenario in the same way I would myself. I am pretty sure I have yet to meet someone with the same values and level of attention-to-detail that I have. While this may sound like I’m bragging, if you ever run a company of your own you will understand this feeling.

Today was one of those days I spent catching up on mostly mundane run-of-the-mill tasks. You’re still interested? Okay then!

Today’s Worklog:

  • Fixed a loophole in the account recovery process that was abused overnight. Accounts can now be set into a state where users cannot initiate password recovery and must contact support.
  • Purged my inbox backlog down from 188 to 55. I read emails as they come in and when necessary reply within a minute or two, but less important emails are snoozed until a later date. I find it easier to process these in one large batch rather than as-they-come. This is what my inbox looks like now. Might take a few beers tomorrow to get through the remainder ;).
  • Brought company accounts up-to-date. Basically distributing money into the correct accounts so people are paid and servers are kept online and purring. I have an accountant to handle things at the end of the year, but daily financials are still my own responsibility. I need to attend to them at least once a week, else they pile up quickly.
  • Got acceptance email for AWS Aurora preview. Not sure if I will use this going forward, but couldn’t resist the opportunity to test it for free! Spun up two beefy 16 core 108gb instances which I will be deploying the full osu! database to for performance testing over the coming weeks.
  • Figured out osu!keyboard stock issues. We needed to reserve a few keyboards for Taiko World Cup winners, but due to a number of faulty units being found during initial quality checking, we ran a bit short.
  • Approved and finalised payment for the next osu!tablet production run, this time for 2,000 units! We still have stock from the last run too. I think I have finally got my head around the convoluted processes involved in international manufacturing and shipping, so stock shortages should be a thing of the past.
  • Our knowledge base is now fully stocked with content, hopefully helping users before they unnecessarily post to the forum!

So let’s briefly revisit something that happened last week: OVH had an outage which lasted several hours as the result of a cable being cut. This meant the sudden loss of all our beatmap download mirror traffic! As I was “enjoying” an Owl City concert at the time, I had @nekodex - who you may also know from his musical contributions to osu! - help me get a new mirror up and running.

osu! beatmap mirrors are deployed as a single-file PHP script with no external dependencies. They use an S3 endpoint to retrieve maps, cache them locally and serve them to users using a secure checksum system. The script is not only self-contained, but updates to it can be pushed out in an automated process. Each mirror is monitored by the central osu! web server, and mirrors which are unavailable or underperforming are automatically removed from the available list.
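The checksum scheme itself isn’t detailed here, but the general idea can be sketched: the central server signs each download link with a secret shared with the mirrors, so a mirror can verify a request without phoning home. Everything below (the HMAC construction, the secret, the expiry field) is an assumption for illustration, not the actual osu! implementation:

```python
import hashlib
import hmac
import time

# Hypothetical shared secret; the real scheme and key handling are not public.
SECRET = b"example-mirror-secret"

def sign_request(beatmap_id: int, expires: int) -> str:
    """Central server: produce the checksum attached to a download link."""
    payload = f"{beatmap_id}:{expires}".encode()
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()

def verify_request(beatmap_id: int, expires: int, checksum: str) -> bool:
    """Mirror side: the signature must match and the link must not have expired."""
    if time.time() > expires:
        return False
    return hmac.compare_digest(sign_request(beatmap_id, expires), checksum)
```

Including an expiry timestamp in the signed payload also gives each link a natural lifetime, so a leaked URL can’t be reused forever.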

We had things back up and running in under 20 minutes as a result, but when things like this go haywire it is always wise to add countermeasures to ensure the same thing doesn’t happen again. I have started investigating where to place some additional mirrors in order to give the system geographical redundancy. As I don’t want to introduce any more complexity to the accounting side of osu!’s infrastructure setup (already across four datacenters!!), they will likely be placed in OVH’s European datacenters.

Going forward, I would love to open the mirroring system up to the public again, letting people lend their spare server bandwidth to the good of the osu! community. This would require some logic to ensure users were linked up with the closest mirror to their physical location, of course. Think a mini-CDN just for osu!. Alternatively maybe I should activate the dormant p2p code that has been sitting inside osu! for several years… ^^;
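The “link users to the closest mirror” logic could be as simple as a great-circle distance check against each mirror’s known coordinates. A minimal sketch, with entirely made-up mirror names and locations:

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 6371 * 2 * asin(sqrt(a))

# Hypothetical mirror list: name -> (latitude, longitude).
MIRRORS = {
    "eu-1": (48.8, 2.3),    # Paris
    "us-1": (40.7, -74.0),  # New York
    "jp-1": (35.7, 139.7),  # Tokyo
}

def closest_mirror(user_lat, user_lon):
    """Pick the mirror with the smallest distance to the user."""
    return min(MIRRORS, key=lambda m: haversine_km(user_lat, user_lon, *MIRRORS[m]))
```

A real deployment would also need to weigh in mirror load and availability, not just raw distance.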

Have a good weekend. Play lots of osu!.


20150604 /vocal/

published 04 Jun 2015

Many people say that it’s the vocal minorities that express the most negativity, and while this is definitely the case, I still aim for the unattainable 100% satisfaction. I hold myself to high standards that I have learned over the years I cannot expect from anyone else. This is actually one reason I am aiming to move osu! to an open-source model, but that story is for another day!

I go out of my way to answer every single email that comes to my personal address. The answers may be brief, but I take in every single word of feedback that users have to offer. I also check the forums, reddit, BBS and basically anywhere people are discussing osu!.

While some may say this is a complete waste of time and focus - and this may be true - I feel it’s something I cannot avoid if I want to keep up-to-date with the temperature of the community. Unique things happen almost every day, often needing multiple team members’ input to resolve, so it’s important to stay on the watch for ongoing and upcoming events.

Remember that osu! is actually a very simple game at its core. The gameplay has rarely changed over the years, while the ecosystem around that gameplay is constantly developing. The drive for more accessibility, more stats to track, better ways to communicate with friends is strong. The social and community aspects of osu! really make it feel to me less game, more web service.

Not to say we don’t have stuff in the pipeline to spice up the gameplay too ;).

Today’s Changes:

  • The forum search system now defaults to matching all keywords, rather than any. The sorting of results returned by the back-end (elasticsearch, for those playing along at home) has also been changed to relevance rather than freshness. Sorting from newest to oldest is still applied at the forum front-end as a last step. This should hopefully provide much more relevant results. Once the new site is up, search will be overhauled and provided in a similar way to the facebook search box. For now, let’s make do!
  • The help forum now shows only the “Chat with Support” button when someone is available. There are way too many posts in the help forum that could be answered in a single line (profile being in the wrong mode, pp not updating etc.). This is a band-aid measure until we get a full knowledge base system ready, but it should help.
  • Began work on the osu!web (2.0) site for the first time in a long while. I have made a promise to myself not to add any new features to the old web, so the new username changing system is going to have to be done here.
  • Spent a good portion of last night getting my dev environment set up correctly for osu!web development. The new site is being written completely from scratch in Laravel 5. I’m learning the framework as I go, although I did already write the web store in it quite some months back.
  • In order to support username changes, which are going to be presented as an item on the store, we first need support for store items with infinite stock and custom display implementations. I made good progress on building the framework for this.
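For the curious, the forum search change in the first bullet boils down to a small difference in the query sent to elasticsearch. This is a rough sketch of the kind of query body involved, not the actual forum code (the `post_text` field name is an assumption):

```python
def build_search_query(keywords):
    """Build an elasticsearch query body matching *all* keywords."""
    return {
        "query": {
            "match": {
                # operator "and" requires every keyword to match,
                # replacing elasticsearch's default "or" behaviour.
                "post_text": {"query": " ".join(keywords), "operator": "and"}
            }
        }
        # No explicit sort: elasticsearch's default is _score (relevance).
        # The forum front-end then re-sorts the returned hits by date.
    }
```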

For those freaking out about the username change being a store item, here’s how it’ll work:

  • First Change: free (with supporter)
  • Second Change: $8 (equiv. 2 months supporter)
  • Third Change: $16 (equiv. 4 months supporter)
  • Fourth Change: $32 (equiv. 8 months supporter)

I’m sure you can see the pattern here. It will continue to double, likely with a cap at around $100. This may seem like an expensive purchase, but the idea here is to stop people from recklessly changing their name too many times. We discussed the cost amongst team members and decided doubling each time was the best approach to achieve this.
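The pricing above follows a simple formula. Here is a small sketch of it (the exact cap is still undecided, so the $100 used here is an assumption):

```python
def rename_cost_usd(change_number: int, cap: int = 100) -> int:
    """Cost of the nth username change: first free (with supporter),
    then $8 doubling each time, capped. The cap value is an assumption."""
    if change_number <= 1:
        return 0
    return min(8 * 2 ** (change_number - 2), cap)
```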

As for history, you will be able to see all previous usernames that a user was known as in their profile, as you’d expect.

early design mock-up


20150603 /transparency/

published 03 Jun 2015

In 2007 when I first started publicly releasing osu!, I would release daily changelog updates and daily builds, with hugely visible changes and massive new features. Fast-forward to today, where finding the time and (more importantly) motivation to write a blog post is near impossible.

It’s not that I don’t want to. I just feel my time is better spent elsewhere. Writing for me is something I enjoy, but it’s also a distraction from the never-ending list of tasks I have floating around my brain. I end up prioritising what to me feels most beneficial to the osu! user base.

But it’s time to take a step back.

In the recent months, there seems to be a group of users trying to destroy the osu! team’s image. They make it seem like we ignore support requests, break the game and are not interested in the community at large. While I could probably address this by meeting these people on their level (reddit seems to be the central campfire for bitching these days), I would rather stay away from such negativity.

At the same time, I want to give myself and the rest of the osu! team a stronger voice. I want to assure the user-base of osu! that we are making forward progress, addressing issues as they arise and in general doing a lot behind the scenes to keep things running smoothly that is (intentionally) hidden from players.

I’m going to start updating this blog with shorter posts focusing mostly on what I and the rest of the osu! team are up to on a day-to-day basis. Setting aside around 20 minutes a day to get something posted - even if it is very brief.

  • Updated username changing code to only free up usernames after all other checks have been completed (disallowed usernames etc.).
  • Installed new SSL certificate for *.ppy.sh (old one was expiring in 2 months).
  • Fixed “recent activity” not updating correctly for score events. This broke in a rewrite of the score submission code I completed yesterday.
  • Finished removing all usage of mdb2 in favour of mysqli.
  • Finished integration of a new payment system “¥Coins” which makes it much easier for Japanese users to support osu!.
  • Added a new column to all score tables to store whether a replay is available or not. Should make things a bit more straightforward than the current system of storing this information in a separate table. It’s a step toward sending this out to the client to allow for more than top50 replays. Adding a single tinyint to the scores tables took a total of 9 hours! For people interested in how I’m doing it with no downtime, this is the answer.
  • Fixed beatmap playcounts not being updated anywhere near as often as they should have. This was actually a really silly one, and also means that our overall playcount (currently sitting at 3 billion) is heavily deflated. Here I was thinking we didn’t grow much recently as the time it took from 1 to 2 billion was the same as from 2 to 3, when in fact it was likely half or less.
  • Continued investigation into ongoing mysql performance issues (I’ll wall-of-text about this after I solve it), this time focusing on network IO. Added some extra config settings to bancho to allow dynamic disabling of the most network-intensive queries. Didn’t help.
  • Created a new monitoring dashboard for real-time monitoring. My default one provides a great overview of the last day, but I needed something I could have up on my second monitor while performing system maintenance or applying what could be breaking-changes.
  • Corrected some issues with username change history storage. This is something I recently added to chain into the ability to change usernames more than once. Coming soon!
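On the zero-downtime column addition: the general trick is to avoid any single long-running statement by working in small batches, so no lock is ever held for long. Below is a toy illustration of batched backfilling, using SQLite in place of MySQL; the table shape and the backfill condition are made up for the example:

```python
import sqlite3

# Toy stand-in for a large scores table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (score_id INTEGER PRIMARY KEY, user_id INTEGER)")
conn.executemany("INSERT INTO scores (user_id) VALUES (?)", [(i % 7,) for i in range(1000)])

# Adding the column itself is cheap; backfilling existing rows is the slow part.
conn.execute("ALTER TABLE scores ADD COLUMN replay TINYINT NOT NULL DEFAULT 0")

BATCH = 100
last_id = 0
while True:
    # Walk the primary key in small chunks instead of one giant UPDATE.
    rows = conn.execute(
        "SELECT score_id FROM scores WHERE score_id > ? ORDER BY score_id LIMIT ?",
        (last_id, BATCH),
    ).fetchall()
    if not rows:
        break
    first_id, last_id = rows[0][0], rows[-1][0]
    # Backfill this chunk only (the user_id = 0 condition is a placeholder);
    # in MySQL each batch would be its own short transaction.
    conn.execute(
        "UPDATE scores SET replay = 1 WHERE score_id BETWEEN ? AND ? AND user_id = 0",
        (first_id, last_id),
    )
    conn.commit()
```

Tools like online schema change utilities automate the same idea (shadow table, triggers, chunked copy, atomic rename) for production MySQL.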

Along with these kinds of bulleted lists, I will try to gradually expose more details on things going on behind the scenes that I believe you guys may find interesting. Any suggestions or questions are welcome in the comments section. I will either reply to your comments directly or address them in the next day’s post.

Oh, and for those of you that enjoyed my osu!weekly posts focusing on news around the community, make sure you are following along with the osu!weekly news post series. This weekly posting by Tasha has basically replaced what I was trying to do previously.


the last week of fail

published 29 Sep 2014

The last week has been quite a roller-coaster ride as far as keeping osu! above the water. In the interest of transparency I am writing up the various problems that arose over this period.

this kind of sums things up...

Issue #1

The internal IP address of the main webserver was revealed, leading to an onslaught of DDoS attacks hitting the box directly rather than getting stopped at cloudflare. Digitalocean has a null route policy, which means that box is inaccessible for three hours after any DDoS hits. They also managed to find the bancho server IP, likely as it hasn’t changed for many years and was revealed in the distant past.

Daily DDoS attacks knocking osu! off the internet (and null-routing the web server).

null routes are annoying, but unavoidable

Resolution

Figure out how the IP was revealed. Found a few methods and patched them all:

  • PTR records (digitalocean makes these public based on hostname, so a scan of the digitalocean IP range searching for relevant hostnames is feasible). Moral of the story: never name your digitalocean droplets anything distinguishable (use random names).
  • Postfix. This is probably the silliest of all: the IP address was in mail headers, as all mail was sent from that server. To fix this I made a new relay server for mail which removes sensitive header information before performing the final send operation.
  • phpbb. When posting a forum post, phpbb was checking image dimensions with a call to getimagesize. This would run even on remote URLs, which meant the box’s IP was revealed. I removed this dimensions lookup as it wasn’t even being enforced.
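As a concrete illustration of the mail fix: Postfix can strip sensitive headers on a relay via a header_checks table. A minimal sketch only - the paths and pattern here are illustrative, not the production config:

```text
# /etc/postfix/main.cf on the relay:
#   header_checks = regexp:/etc/postfix/header_checks

# /etc/postfix/header_checks - drop Received headers that would
# otherwise leak the originating server's IP address:
/^Received:/    IGNORE
```

A real deployment would likely match more selectively (only the internal hop), since Received headers are also useful for debugging delivery.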

To combat the DDoS attacks quickly, I ramped up around 10 web servers from a snapshot and had them waiting as hot backups should one of them get null-routed. This allowed for minimal downtime.

This meant reassigning new IP addresses to all services which may have previously been revealed.

I’m quite glad this came up, as patching the above security flaws makes me feel a lot more at ease, going forward.

Issue #2

Over the past few weeks, many of my digitalocean droplets were having sudden IO starvation, during which their IOPS would drop to zero for sometimes up to 5 minutes at a time.

Resolution

Digitalocean support suggested I redeploy the droplets, as my “old” droplets were running on an old version of their hypervisor code (and on older hardware). So I did this, which may have been my biggest - yet unavoidable - mistake.

Issue #3

Redeployed droplets were seeing very high and spiky steal% (yellow in the graph below), suggesting high host contention. I took the redeployment as an opportunity to upgrade the master database (32gb -> 64gb RAM, 12 -> 20 cores), but regardless of this it was performing so badly that the site would often come to a halt during peak.

yellow is stolen cpu time

Resolution

After a 10+ page back-and-forth with digitalocean I really didn’t get anywhere. The final solution was to keep redeploying until performance was satisfactory.

Even after redeploying I am still noticing spikes of steal%, which results in sudden CPU starvation. This is an ongoing issue to which I have no solution (although I do have some leads which I am investigating actively).

Keep in mind switching master database servers is not an easy task either. It requires synchronising multiple things happening at once: ensuring all slave servers are stopped at the same point in time (while the old master is in read-only mode); switching slaves to the new master and ensuring they are still in sync with it; updating the configuration of all services reliant on the database; and switching monitoring over to understand the new database layout.
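One of the more error-prone steps - confirming every slave stopped at the same binlog coordinates before pointing them at the new master - is easy to sanity-check in code. A tiny sketch; the dict shape is hypothetical, and in practice the values would come from each replica’s SHOW SLAVE STATUS output:

```python
def replicas_in_sync(statuses):
    """True if every replica reports identical binlog coordinates.

    statuses: list of dicts with 'file' and 'position' keys, one per replica.
    """
    coords = {(s["file"], s["position"]) for s in statuses}
    return len(coords) == 1

# Example: two replicas stopped at the same point are safe to switch over.
statuses = [
    {"file": "mysql-bin.000123", "position": 98765},
    {"file": "mysql-bin.000123", "position": 98765},
]
```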

I had to do this master switch twice this week, which was a huge time-sink. I was able to partly automate the process along the way, which is a nice bonus.

Issue #4

A new kind of DDoS arose which was not being blocked by cloudflare. Someone was making use of a wordpress botnet to flood http requests at osu.ppy.sh. This totaled around 300mbit of incoming requests, which is enough to bring the most powerful of servers to a halt.

that's megabytes, not bits.

Resolution

Initial resolution was to switch cloudflare to “I’m under attack” mode, which forces every visitor’s browser to perform javascript computations before allowing access to the site. This required adding special rules to allow bancho and other services (which can’t perform javascript).

Longer term solution was to add filtering rules at an nginx level to avoid passing such bogus requests on to php workers. This reduces the bulk of the stress on the server allowing it to continue operation even under such an attack.
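The exact rules aren’t shown here, but as an illustration of the idea: if the flood is identifiable by a characteristic header (wordpress pingback floods famously announce themselves via their User-Agent), nginx can refuse the request before a PHP worker ever sees it. A sketch only, assuming User-Agent-based detection:

```nginx
# Drop suspected wordpress pingback flood traffic before it reaches
# the php workers. Assumes the attack announces itself in User-Agent.
if ($http_user_agent ~* "WordPress/") {
    return 444;  # nginx-specific: close the connection with no response
}
```

Returning 444 is cheaper than serving an error page, since nginx simply closes the connection without building a response.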

Update: I found out that this blocking can be done at a cloudflare level using a specific WAF rule.

Issue #5

When I was finally happy with a database deploy at digitalocean (lowish steal%), it went radio silent during peak one night, without any notice. Upon following up with digitalocean support, they said there was a problem with the hypervisor it was running on which in turn triggered a reboot of all droplets running on it.

But not only that, mine rebooted without a kernel, and thus had no networking (apparently). This happened due to another bug at DO’s end involving deploying from snapshots, but that is unimportant.

Resolution

This was completely out of my control. I’m waiting for follow-up on exactly how this happened from DO support, while also looking at my options to switch key infrastructure away (back?) to dedicated hosting, while leaving a hot backup at DO in case of failure. I will likely post an update if/when I decide to migrate to somewhere else.

So let me clarify: DigitalOcean are an amazing host. They offer computing power at prices which make sense, rather than the inflated server rental rates that are oh-so-common in today’s market. They have been working very closely with me to overcome the aforementioned issues, going out of their way to do what they can. Their support is so far the most personal and expedient of any datacenter I have tried (and I’ve been around…).

Much of what has occurred has not been their direct fault; mostly a series of unlucky events which happened to overlap. If I do decide to move away, it will not be moving away from DigitalOcean, but from cloud hosting in general (returning to self-managed infrastructure). I would not even consider another cloud provider due to the unrealistic costs.

At the end of the day, I would still recommend DigitalOcean. If you decide to give them a try, you’re welcome to use my referral link to help offset the costs of osu! servers (and gives you free credit too).

Let me also mention that while I felt my infrastructure was robust enough to withstand such failures, I have identified a few areas which can be improved. And you can count on me to improve those areas.


A quick update

published 28 Aug 2014

And so another month has passed. I probably don’t have as much to tell you as usual since I have been super-busy in real life, but it is all for a good cause! I have been working towards improving my working environment – and also allowing for expanding the osu! team – by renting a small office. This takes quite a bit of paperwork in Japan (especially as a foreigner) so it was quite a celebration for me to actually succeed in this.

Still in the process of moving stuff and getting everything set up properly, but afterwards I should be able to livestream a whole heap more. We actually have a live camera of the office which you can view here. Keep in mind it will only be set to public at some times. I’ll likely tweet about it if we’re doing something interesting.

As for things on the osu! front:

  • The new update system is completed and mostly live. You can switch to it from the existing test build by clicking the little popup at the main menu.
  • Due to this system going live, the old test build is now officially decommissioned (even though you can still use it for the time being).
  • This new system brings the ability to publish multiple update streams, including experimental ones (like an OpenGL-only build which may be the future of osu!) for testing purposes. Look out for these in the near future!
  • I also managed to get osu! to “install” and update from a single executable, removing the need for the “osume.exe” updater. The result is quite magical!
  • Smoogipoo is beginning the rather huge rewrite of osu!mania to fix all the small issues that exist in the current implementation. He has made good progress on the key binding system and is working on skinning currently. The end result will be a new editor, a better working play mode and an overall better experience.
  • The new osu! website has gone through another iteration and is being designed actively. All I can say for now is that it looks amazing; if you saw the “old” new design then this one is just going to blow your mind!
  • I’ve spent a lot of my time fighting infrastructure issues that are mostly not my fault and very hard to resolve. Things are still pretty stable, so I’d say I’m doing a good job even though you will never hear about it ;).
  • We have begun restructuring the team to resemble how the modding environment will be further down the line. This should hugely streamline the ranking process even before the complete new system is implemented.
  • The osu!idol karaoke contest is running again this year. Huge interest in this, so hurry if you want to take part in it!
  • I’m making good progress on restoring the osu!store and a stock of tablets. Expect to see availability again in early November, all going well. At least in time for the holiday season!
  • RBRat3 made a cool 3D version of the new osu! logo.
  • I learned that osu! can help with hearing loss.
  • Thanks to Rev3Games for an enjoyable trip along the history of the iNiS tapping series and the transition to osu!.

For those that missed it, I also answered quite a few questions in my previous post. Feel free to post more questions there if you have anything sensible to ask!
