Monday, October 17, 2005

Two Challenging Database problems

Here are two interesting problems using databases. I have given the table structures and the requirements too.

1. Cumulative Statistics Problem

In the game of cricket, we have several statistics. One of them is the runs scored per over. We have a table in a database in which we store two pieces of information.
  1. Over
  2. Runs
Now, we need to display 2 pieces of information
a. Runs scored per over
b. Runs scored in the last 5 overs.
We need data in 3 columns
  1. Over
  2. Runs this over
  3. Runs last 5 overs
How do we do this?
It is possible using only one SQL SELECT command. You may call the table Stats.

2. Insert cum update requirement

In the same situation above, 2 people do the task of recording the game statistics. One person manages the main database. The other person gives online information, which is stored in a temporary database (can be a temporary table instead - implementation is arbitrary).

For example, the DB manager stores information about 7 overs - overs 1-7. The recorder collects information about last few overs. The recorder may submit this information at any point of time, even between an over. But once he submits the runs of that over, he starts afresh.

For example, if 3 runs were scored in the first 3 balls of the 7th over, he sends this information - 3 runs in the 7th over.

Now, there is a drinks break. What he has already sent is of the over till now. Next, he starts collecting information again for the same (seventh) over.

Now, there are 5 runs in the last 3 balls of the 7th over. He notes down this information - 5 runs in the 7th over (the first 3 balls are already sent so he ignores the runs scored then). Next, he may start recording the 8th over. Now, if 6 runs are scored in this over, he will have this stat - 6 runs in the 8th over.

So he sends this information to the database manager - 6 runs in the 8th over. At this point, he sends information of the latest collection since the last sending. This is 5 runs in the remainder of the 7th over + 6 runs in the 8th over.

The database manager should be able to keep the database up to date. New records may be added for each new over. For the overs already recorded, the remainder scores may be aggregated and updated in the main database.

How would you manage this?

Tables in both database have the same format - (Over, Runs)

This is possible using the standard SQL commands like SELECT, INSERT, UPDATE

Answers:

1. The problem seems deceptively simple because it is stated so easily. The challenge is to do it in just one statement. In my solution, I have taken the same table twice, joining it with itself.

SELECT
s1.Over,
s1.Runs,
SUM(s2.Runs)
FROM
Stats s1
INNER JOIN Stats s2
ON s2.Over <= s1.Over AND s2.Over > s1.Over - 5
GROUP BY
s1.Over,
s1.Runs

2. Suppose the database names were DBMain and DBAux and the table names were Stats, one concept I would use is to aggregate the Runs values using SUM and GROUP BY Over to find the resultant table.

BEGIN TRANSACTION

INSERT INTO Temp(Over, Runs)
SELECT
Main.Over,
Main.Runs + Aux.Runs
FROM
DBAux.Stats Aux
INNER JOIN DBMain.Stats Main
ON Main.Over = Aux.Over

DELETE FROM DBMain.Stats
WHERE
Over
IN (
SELECT
Over
FROM
Temp
)

INSERT INTO DBMain.Stats
SELECT
Over,
Runs
FROM
Temp

END TRANSACTION

Of course, it may be possible that there is a less expensive way of doing the above. I tested the above using a MySQL database.

No comments: