Gradwell Blog

Monthly Archive October, 2007

Service Update for w/c 29th October 2007

We would like to take the opportunity to update our customers on service delivery for the week commencing 29th October 2007.

In general, the level of service we provided was high and this is reflected in customer satisfaction. There are a few ongoing developments to update on.

Call Drops
We worked on the problem of intermittent OSPF flaps between our Juniper routers, which cause network traffic to re-route internally and appear to cause audio to drop. We have isolated this problem to intermittently missing multicast traffic on one of our inter-site network links, and we have deployed a second link between Telehouse and Sovereign House to work around the affected link.

We are continuing to monitor this issue.

Mail Delivery
We improved mail capacity, replaced our mail logging database and also increased outbound capacity on our relay.gradwell.net cluster. We have observed spare capacity during our peak running hours on several consecutive days.

We are continuing to work on our mail file servers, as we still experience intermittent crashes on lon-file-5 and lon-file-6. We have completed our initial deployment and testing of a clustered NFS file system using Red Hat’s GFS, and hope to put this into production shortly.

Customer Support
Response times from customer support have been good this week and we have begun our new support rota, with our office now manned 8am to 8pm on weekdays and 9am to 1pm on Saturdays, with increased oversight on Sundays. This has improved our ticket handling and response rate.

Conclusion
Overall service has been of a good quality, and we have made good progress on our outstanding issues.

Technical Manager / CTO

Gradwell dot com Limited has doubled in size every year for the last 3-4 years. This growth, in terms of finance, staff and customer base, has been driven by a well-defined strategy of sales & marketing, good quality customer support and service delivery, and a strong focus on product development and technical innovation.

The company now has approximately 20 staff, of which 7 are devoted to technical functions – software development, systems administration and 2nd line support. The need now arises for a Technical Manager/CTO.

Service Update for w/c 15th October

Customers using our email and phone services have experienced a number of faults in the last week, and we want to update customers on our progress in resolving them. Many of these issues are related to our connections to both the internet and the traditional phone network (PSTN). We are committed to fixing those supplier issues and, if necessary, changing supplier.

Call Drops
Customers are seeing phone calls dropping out and disconnecting mid-call. Unfortunately this problem is occurring randomly, and despite exhaustive investigations we have been unable to find a way to replicate the call drops so that we can start to narrow down the cause.

However, we have changed one of our internet transit suppliers and identified a number of ways to reduce the latency on our core network, and since that work was completed on Tuesday evening we believe the number of call drops has significantly reduced.

The only knock-on effect was that we caused some capacity issues on part of our new internet transit supplier’s network on Friday morning, and they will have that fully resolved by Sunday.

Echo
On Monday morning we started to get reports of echo on calls. The timing matches a change made to one of our PSTN carriers, which had resolved problems with DTMF recognition and voicemail recording quality. We have reversed that change to fix the echo problem and are continuing to work with our carrier to resolve the other issues.

Phones De-Registering

On Monday we started to see reports of phones randomly de-registering from our servers, which appears to be due to DNS lookups failing. We believe these failures are, in turn, related to the network latency problems. As well as fixing network latency, we have added additional DNS servers for our outbound proxy, and this seems to have resolved all error messages on our proxy servers.
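For readers curious why extra DNS servers help, here is a minimal Python sketch of the general idea: try each server in turn and accept the first answer. It is an illustration only, not a description of how our proxy is configured. The function name `resolve_with_fallback` and the `Resolver` type are hypothetical, and each "resolver" is simply a callable standing in for a real DNS server.

```python
from typing import Callable, Iterable, Optional

# A resolver is any callable that maps a hostname to an IP address string
# and raises OSError on failure. These are hypothetical stand-ins for
# real DNS servers, not part of any actual proxy configuration.
Resolver = Callable[[str], str]


def resolve_with_fallback(hostname: str, resolvers: Iterable[Resolver]) -> str:
    """Try each resolver in order and return the first successful answer."""
    last_error: Optional[Exception] = None
    for resolver in resolvers:
        try:
            return resolver(hostname)
        except OSError as exc:
            # Remember the failure and move on to the next server.
            last_error = exc
    # Every resolver failed (or none were supplied).
    raise OSError(f"all resolvers failed for {hostname}") from last_error
```

With only one server, a single slow or failed lookup is enough to fail the request; with several, a lookup fails only if every server fails.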

Switch Crash
We also experienced a crash of one of our main PSTN switches on Thursday afternoon. This was resolved within about 20 minutes and we are working with the manufacturer to understand the nature of these occasional crashes. They have also begun to commission our second PSTN switch, which we hope to bring into service in the next few weeks.

Mail Delivery
Mail delivery has been suffering large delays due to an increase in spam, and particularly in bounce notifications generated by other people sending spam, and this has led to overloading on our mail servers. We have, as previously announced, begun a refresh of the hardware in our hosting platform, and this week migrated spam and virus scanning onto some of our new quad-core servers.

This did improve mail throughput, but it has also just moved the bottleneck along. We are continuing our program of upgrades, although it will take several weeks to complete the entire refresh.

Specifically, this week we’ve added two more quad-core servers to handle outgoing mail on relay.gradwell.net and incoming mail for customers. We are also adding extra servers to store and process bounce messages, so that those can be removed from the main mail queues more quickly. Finally, we’ve made a few further improvements to our queue management software, including reducing the levels of unnecessary disk access on our mail servers, which has speeded up the mail flow.

For the last couple of days our monitoring has shown that, under normal operation, we are managing the load well, and problems only occur when a queue builds up, so we are proactively monitoring to prevent that from happening.
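As a rough illustration of what watching for a queue build-up can look like, here is a minimal Python sketch that flags a queue only when its depth stays above a threshold for several consecutive samples. The function name `queue_is_building`, the threshold and the sample counts are all hypothetical, chosen for the example; they are not taken from our monitoring system.

```python
from typing import Sequence


def queue_is_building(depths: Sequence[int], threshold: int, sustained: int = 3) -> bool:
    """Return True if the last `sustained` samples are all above `threshold`.

    `depths` is a series of queue-depth readings, oldest first.
    Requiring several consecutive high readings avoids alerting on a
    single short-lived spike.
    """
    if sustained <= 0 or len(depths) < sustained:
        return False
    return all(depth > threshold for depth in depths[-sustained:])
```

For example, `queue_is_building([100, 600, 700, 800], threshold=500)` reports a build-up, while a series that has already dropped back below the threshold does not.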

Customer Support
During this time our customer support staff have been extremely busy and we have therefore experienced delays in dealing with enquiries. It would be very helpful if customers reporting faults could provide as much specific information as possible, as this greatly reduces the delay in identifying and resolving the problem.

Conclusion
We’d like to thank customers for their forbearance during this challenging week. We have made good progress in getting on top of the various issues that have arisen.

We look forward to next week, when we will continue to work on our systems to ensure they are all operating correctly.

MySQL 5 Server

We are pleased to announce that we now have a MySQL 5 server available for customers. You can create databases via the control panel: https://hosting.gradwell.net/login/mysqladd?menu_req=86