How the world's largest OpenStack cloud patched the Meltdown and Spectre flaws

The European Organization for Nuclear Research (CERN) operates one of the world's largest OpenStack clouds, and it had to restart its entire cloud infrastructure to patch the Meltdown and Spectre flaws.

On January 3, the Meltdown and Spectre CPU security flaws were disclosed, causing an uproar among IT users and cloud operators worldwide. On May 24, at the OpenStack Summit in Vancouver, operators described how they patched the flaws and explained why the process took so long.

When it comes to OpenStack, no operator in the world is bigger than CERN, which runs the Large Hadron Collider and an OpenStack cloud infrastructure with 300,000 compute cores. Arne Wiebalck is in charge of the overall operation of the CERN OpenStack cloud. When flaws such as Meltdown and Spectre appeared, he had to respond and deploy the relevant fixes.

He said: "CERN usually closes for two weeks over the winter holidays, so everybody found out about this while they were at home."

CERN has a dedicated network security team. Wiebalck's operations team worked with it to determine the actions and measures needed to mitigate the risks posed by Meltdown and Spectre.

"In the end, we decided to shut down the whole cloud to apply the fixes."

Given the size of the CERN OpenStack cloud, shutting it down to patch it was bound to be a painful process. Wiebalck's team had to restart more than 30,000 virtual machines and notify thousands of CERN cloud users that the restart was coming.

"Our cloud has been in production for about five years, and this was the first time we really had to shut everything down."

CERN did not, of course, shut everything down at once. Instead, it carried out the patching, shutdown and restart process in stages over several days. CERN took an iterative approach, first shutting down about 200 hypervisors to check for errors and confirm that they came back up smoothly.
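The staged approach described above can be sketched in a few lines of Python. This is only an illustration, not CERN's tooling: the function name `plan_reboot_waves`, the host names, and the size of the later waves are assumptions. Only the first wave of about 200 hypervisors comes from the account above.

```python
from typing import List, Sequence


def plan_reboot_waves(hosts: Sequence[str], canary_size: int = 200,
                      wave_size: int = 1000) -> List[List[str]]:
    """Split hosts into a small first wave followed by larger waves.

    canary_size mirrors the roughly 200 hypervisors restarted first;
    wave_size is a made-up value for the later stages.
    """
    if canary_size <= 0 or wave_size <= 0:
        raise ValueError("wave sizes must be positive")
    waves: List[List[str]] = []
    canary = list(hosts[:canary_size])
    if canary:
        waves.append(canary)
    rest = hosts[canary_size:]
    # Everything after the first wave is restarted in fixed-size stages.
    for start in range(0, len(rest), wave_size):
        waves.append(list(rest[start:start + wave_size]))
    return waves


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    hypervisors = [f"hv-{i:05d}" for i in range(2500)]
    for number, wave in enumerate(plan_reboot_waves(hypervisors), start=1):
        print(f"wave {number}: {len(wave)} hosts")
```

In practice each wave would only start once the previous one had been checked, which is the point of restarting a small group first.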

Like most large IT shops, CERN relies on automated processes. Even so, patching and restarting for Meltdown and Spectre involved many manual steps that people had to carry out and monitor.

"It really was people. Of course we have tools that can talk to hundreds of machines, but my colleagues and I really did carry out these processes more or less by hand."

OpenStack infrastructure

Clark Boylan is the project technical lead of the OpenStack Infrastructure project, which runs the systems used to build the OpenStack software that powers clouds around the world. Like Wiebalck at CERN, he also had to restart a large number of systems to patch the Meltdown and Spectre flaws.

Boylan said members of the OpenStack infrastructure team took part in the patching work and used the Ansible configuration management tool to make sure the patched kernels were deployed.

"We also had people watching closely to make sure the services behaved as expected when they came back online."
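One way to confirm that a patched kernel is actually running on a host is to read the status files that Linux kernels with the fixes expose under `/sys/devices/system/cpu/vulnerabilities`. The sketch below is not the infrastructure team's Ansible setup. The helper names `read_mitigation_status` and `unpatched` are made up for illustration, and the check assumes a kernel new enough to provide that directory.

```python
from pathlib import Path
from typing import Dict, List

# Directory provided by Linux kernels that report CPU vulnerability status.
SYSFS_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_mitigation_status(base: Path = SYSFS_DIR) -> Dict[str, str]:
    """Return {flaw name: kernel status line}, or {} if the directory is absent."""
    status: Dict[str, str] = {}
    if not base.is_dir():
        return status
    for entry in sorted(base.iterdir()):
        if entry.is_file():
            status[entry.name] = entry.read_text().strip()
    return status


def unpatched(status: Dict[str, str]) -> List[str]:
    """List the flaws the kernel reports as still vulnerable."""
    return [name for name, text in status.items()
            if text.startswith("Vulnerable")]


if __name__ == "__main__":
    current = read_mitigation_status()
    for name, text in current.items():
        print(f"{name}: {text}")
    print("still vulnerable:", unpatched(current))
```

An empty result means the kernel does not report this information at all, which on an older kernel is itself a sign that the host still needs attention.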

Beyond the Meltdown and Spectre flaws themselves, there was concern about potential performance degradation, which Boylan's team also tried to monitor. The top priority for the OpenStack infrastructure team was to deploy the Linux kernel patches as quickly as possible.

In addition, developers of the OpenStack Nova compute project added a new feature that gives more control over CPU flags. It lets cloud operators limit access to the more sensitive parts of the CPU and reduces the performance impact of the patches.

Lessons learned

For people in the OpenStack community such as Dave McCowan, former project technical lead of the OpenStack Barbican secrets management project and now an engineer at Cisco, Meltdown and Spectre taught cloud operators a lesson.

"The lesson is to be prepared for any unexpected event. When you think about building your cloud and planning your tools, you should know that you may need to patch or replace anything in the system, from the hardware up."
