WFTO – Weekly Falcon Test Overview 2009-04-17

WFTO – mid-April report

In my next reports I will interview my MySQL/Falcon team members. This week we start with
Vladislav Vaintroub, who is Russian but lives in Germany. He is well known for speaking at least
twenty-two foreign languages, many of them English.


WFTO interview with Vladislav “Wlad” Vaintroub

Q: Hi Wlad. Could you introduce yourself in two or
three sentences? Personally, I am wondering why
your nick is Wlad instead of Vlad.

Wlad:
That’s simple: I use the IRC nick “wlad” because, when I joined MySQL, “vlad”
was already taken by another MySQLer called Vladislav Safronov. In the past,
German government agencies gave me the name Wladislav, and people knew me
as Wlad. During naturalization, they changed the transliteration from the Cyrillic
alphabet to Vladislav, and now people know me as Vlad.

For myself I still like Владислав/Влад better and frankly do not care that much
about how it is latinized.

Q: In recent months we have seen a lot of bugfixes
from you in the recovery area. Can you explain
Falcon’s recovery strategy a bit? I heard that
recovery in Falcon has several distinct stages.

Wlad:
I am probably not the best person to ask about Falcon’s recovery, as I did not design it.
All I have done is fix glitches in it for about three months, which is not really a
long time.

But if you want a simplistic view on it, then …

Falcon recovery is REDO recovery; it has no UNDO phase. Falcon uses deferred updates:
until a transaction commits and is completely recorded in the log, nothing is changed
in its tablespaces.
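
To illustrate why deferred updates make an UNDO phase unnecessary, here is a minimal
sketch in Python. It is a toy model, not Falcon code: the class ToyEngine and all of
its method names are hypothetical, and the tablespace is just a dictionary.

```python
# Toy model of deferred updates. All names are hypothetical, not Falcon's.

class ToyEngine:
    def __init__(self):
        self.log = []          # redo log: one entry per committed transaction
        self.tablespace = {}   # stands in for on-disk pages: key -> value

    def begin(self):
        # Pending changes live only in memory until commit.
        return {}

    def write(self, txn, key, value):
        txn[key] = value       # the tablespace is not touched here

    def commit(self, txn):
        # 1. Record the complete transaction in the log first.
        self.log.append(dict(txn))
        # 2. Only then may the changes reach the tablespace
        #    (a background thread does this in Falcon; done inline here).
        self.tablespace.update(txn)

    def rollback(self, txn):
        txn.clear()            # nothing to undo: the tablespace never changed


engine = ToyEngine()

t1 = engine.begin()
engine.write(t1, "row1", "a")
engine.rollback(t1)            # uncommitted work simply disappears

t2 = engine.begin()
engine.write(t2, "row2", "b")
engine.commit(t2)
```

Because an uncommitted transaction never reaches the log or the tablespace in this
model, a crash before commit leaves nothing behind that would have to be undone.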

There are two types of database log records: “physical” records, which log the creation of
tablespaces, page allocations and similar things, and “logical” records, which hold the
actual records and index entries.

The recovery is divided into 3 stages.

1) Determine what changes need to be reapplied.
An interesting detail in this process is something called “incarnations” in Falcon.
An illustration for this concept:

If a page N was allocated as an index page, then freed during a drop index, and then
reallocated and reused as a data page, then there are 3 different states of the
same page stored in the log (index, free, and data). Only the last state needs
to be considered, and some efficiency can be gained by skipping prior
“incarnations” of this page.

2) Re-apply “physical” changes.
Pages are reallocated, and tablespaces recreated. Index pages are stored in and
restored from the log in “after split” state: this guarantees that Btrees are
valid after this phase. Pages with the last incarnation before the last checkpoint
are not recreated – it is a performance tweak.

3) Re-apply “logical” changes.
Store records and index entries in pages.

This stage is very similar to what Falcon also does during normal processing:
on commit, records and index entries found in different in-memory caches are
written to the log, and a background thread brings them into tablespace pages.

It is interesting to note that recovery itself does not write log entries, which
means that the changes made in stages 2 and 3 need to be idempotent to allow
recovery after a crashed recovery.
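
The three stages and the idempotency requirement can be sketched in a few lines of
Python. This is only an illustration under simplifying assumptions: the record layout
(dictionaries with "kind", "page", "type", "key", "value") and the function names are
invented here, and checkpoints and “after split” index pages are not modelled.

```python
# Toy redo recovery. The record layout and function names are assumptions.

def last_incarnations(log):
    """Stage 1: for each page, find the position of its latest physical record."""
    latest = {}
    for pos, rec in enumerate(log):
        if rec["kind"] == "physical":
            latest[rec["page"]] = pos
    return latest


def recover(log, pages):
    latest = last_incarnations(log)

    # Stage 2: re-apply only the last incarnation of each page.
    for page, pos in latest.items():
        page_type = log[pos]["type"]
        if page_type == "free":
            pages.pop(page, None)
        else:
            pages[page] = {"type": page_type, "rows": {}}

    # Stage 3: re-apply logical records that belong to the last incarnation.
    for pos, rec in enumerate(log):
        if rec["kind"] != "logical":
            continue
        page = rec["page"]
        if pos < latest.get(page, -1) or page not in pages:
            continue  # written to an earlier incarnation: skip it
        # Assigning by key (instead of appending) keeps this step idempotent.
        pages[page]["rows"][rec["key"]] = rec["value"]
    return pages


# Page 7 is an index page, then freed, then reused as a data page.
log = [
    {"kind": "physical", "page": 7, "type": "index"},
    {"kind": "logical",  "page": 7, "key": "k1", "value": "index entry"},
    {"kind": "physical", "page": 7, "type": "free"},
    {"kind": "physical", "page": 7, "type": "data"},
    {"kind": "logical",  "page": 7, "key": "r1", "value": "record"},
]

once = recover(log, {})
twice = recover(log, recover(log, {}))   # recovery run again on its own result
```

In this sketch, running recovery a second time on already recovered pages yields the
same state as running it once, which is the property needed to survive a crash in the
middle of recovery.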

Q: I understand that you and Philip Stoev developed
an error injection mechanism to forcefully crash mysqld
at interesting parts of the code. You put some error
injections into Falcon’s recovery code to make critical
code paths even more robust. Can you explain to us how the
error injection works, show us some code, and give us an
example of how one could crash Falcon at home?

Wlad:
Unfortunately, I could not convince Philip to use debugger-based scripting/automation:
set breakpoints at interesting places and then crash when the breakpoints are reached.
So error injection was born. There is already a similar mechanism within the
MySQL core server, based on DBUG macros (DBUG_EXECUTE_IF?).

Falcon’s version is somewhat more flexible: it adds a parameter and an iteration count to
injection points.

To see it in action, try this in the mysql client:

  CREATE TABLE t (i int) Engine Falcon;
  SET GLOBAL falcon_error_inject='type=SerialLogAppend,param=2,iterations=2';
  INSERT INTO t VALUES (0);
  INSERT INTO t VALUES (0);

The last query will crash the server, and the client reports:

  ERROR 2013 (HY000): Lost connection to MySQL server during query

The cryptic “type=SerialLogAppend,param=2,iterations=2” means: crash when a serial
log entry with id 2 (srlCommit) is written for the second time. In the test we
insert twice in autocommit mode, causing two srlCommits, and the second one kills
the server.

Behind the scenes, an injection point is a simple counter that is decremented on
each iteration if the parameter matches, and the server crashes when the counter reaches 0.
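
The counter mechanism described here can be sketched as follows. This is a Python
illustration, not Falcon’s actual C++ implementation: the class ErrorInjector, its
check method and the replaceable crash callback are hypothetical; only the format of
the falcon_error_inject string and the srlCommit id 2 come from the example above.

```python
import os


class ErrorInjector:
    """Hypothetical counter-based injection point, modelled on the description."""

    def __init__(self, spec, crash=lambda: os._exit(1)):
        # Parse e.g. "type=SerialLogAppend,param=2,iterations=2".
        fields = dict(item.split("=", 1) for item in spec.split(","))
        self.type = fields["type"]
        self.param = int(fields["param"])
        self.remaining = int(fields["iterations"])
        self.crash = crash  # replaceable so the sketch can be tried safely

    def check(self, point_type, param):
        """Call at an injection point; crashes once the counter reaches 0."""
        if point_type != self.type or param != self.param:
            return
        self.remaining -= 1
        if self.remaining == 0:
            self.crash()


SRL_COMMIT = 2  # id of srlCommit, as stated in the interview

crashes = []
injector = ErrorInjector("type=SerialLogAppend,param=2,iterations=2",
                         crash=lambda: crashes.append("crash"))

injector.check("SerialLogAppend", 1)            # other log record id: ignored
injector.check("SerialLogAppend", SRL_COMMIT)   # first commit: counter 2 -> 1
after_first_commit = list(crashes)
injector.check("SerialLogAppend", SRL_COMMIT)   # second commit: counter 1 -> 0
```

Here the crash is replaced by a callback that records the event, so the sketch can
be run without killing the process; the default callback terminates it immediately.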

Q: Thank you very much for your detailed answers!
Wlad: You’re welcome.

Now back to our usual report: it is mid-April and our team is focused on fixing
recovery bugs for the next two months.

Since my last report from 2009-04-03 we have fixed around 11 Falcon-related bugs.
Compared to that report, we have three new tests and we deleted
one obsolete test.

Our failed/passed test ratio has developed over time as follows:

  • 0.40% – 1/253 (this report)
  • 1.20% – 3/249 (last report)
  • 0.80% – 2/249 (report before last one)


News for this week:

  • Ann Harrison fixed
    • Bug#39130 Unbounded serial log growth with online ALTER
    • Bug#44096 Exception: Recovery failed: corrupted serial log
  • Kevin Lewis fixed
    • Bug#38568 Falcon assertion in Record::release – useCount > 0
    • Bug#42185 Falcon crashes during large transaction, assert(source->useCount >= 2)
    • Bug#44161 Falcon crashes in MemMgr::blockSize(data.record)
    • Bug#44224 Falcon crash in Table::fetchNext
    • Bug#44233 Crash in Table::validateAndInsert when updating cardinalities
  • Jim Starkey fixed
    • Bug#39890 Falcon Error: page 0/1 wrong page type, expected 7 got 1
  • Vladislav Vaintroub fixed
    • Bug#42208 Falcon’s ORDER BY ..LIMIT gives wrong/inconsistent results on NULL values
    • Bug#43765 falcon_bug_22173a crashes in IndexPage::findNodeInLeaf
    • Bug#44114 Falcon recovery Error: page 5342/1 wrong page type, expected 5 got 511

The failing test case is

  • falcon_bug_22154-big (305 out of record cache memory issue)

What about you?

We are interested in you! Where do you use Falcon? What do you do with Falcon? Are there any features you want to see in Falcon? You can test Falcon and get famous by providing valuable bug reports or even test cases for Falcon!
