"I See A Great Need": November 2005

2005-11-10

Why MS SQL Server is preferable to Oracle

Required background: RDBMS:1, ADO.NET:1

MS SQL Server (hereafter referred to as MSSQL) only runs on Windows, which means it shares the scalability limitations of Windows. It certainly isn't the best at utilizing the hardware resources at its disposal. Its bundled "Enterprise Manager" is mediocre at best (this may have improved in 2005, but it just came out and I haven't tried it). It doesn't even have an interactive SQL commandline tool.

Oracle runs on everything from Windows to Linux to Solaris. It scales from a single-processor workstation box to a cluster of multiprocessor Sun big iron servers. It doesn't really have bundled management tools, but there are lots of powerful third-party tools out there. It does have an interactive SQL commandline tool, and it squeezes performance out of hardware. It is astoundingly tunable, allowing a trained DBA to squeeze even greater performance out of it. So why would anyone prefer MSSQL to the powerhouse that is Oracle?

Of course, I'm deliberately neglecting the most important part: ease of use. What I mean by that is ease of use in development, rather than ease of deployment or administration. While it's valuable to be able to get the most out of hardware, and to be able to improve performance by adding hardware, it does no good if you don't have the applications to work with the data. As a database designer and developer who has worked with both Oracle and MSSQL, I can comfortably say that Oracle is a pain. As I said, I'm not talking about deployment or administration, I'm talking about design and development of DB-backed applications.

Consider this table (the relevant columns are .NET Framework, SQL Server, and Oracle). The MSSQL column types map pretty directly to the .NET types. Java or even C/C++ can be substituted for .NET, though it's unsurprising that MS designed .NET to integrate well with MSSQL. Oracle, on the other hand, doesn't even have a Boolean column type. Or an integer column type, actually, since with ODP.NET (Oracle's own ADO.NET driver) one must retrieve any numerical value (and INT is a synonym for NUMBER(10) in Oracle) as a .NET decimal.

From a DB designer's perspective, not having a Boolean column type is crippling. It is always preferable to have tables with column names and types that are as self-documenting as possible. Having to choose either NUMBER(1) or CHAR(1) to represent a Boolean value defeats that. First off, it's easy to choose inconsistently as enhancement and maintenance proceed through the years and through many hands. This choice can and should be documented somewhere along with other conventions of the system, but such external documents are often ignored or even lost. Second, what values represent true and false? Is it one and zero for a NUMBER(1)? Non-zero and zero? Positive and negative? Null and non-null (heaven forbid, and what if you actually need a null)? How about for CHAR(1)? 'Y' and 'N'? 'T' and 'F'? How about 'y' and 'n' or 't' and 'f'? '1' and '0'? Non-null and null (same issues as with NUMBER(1))? Sure, you document it and, ideally, wrap it behind a data access layer in code (which is what I did when I wound up working with an Oracle DB), but someone will eventually mess up ("I used the COUNT for the value in that stored procedure." "But we use positive and negative to represent true and false." "D'oh!") and cause obscure bugs that are difficult to track down.

From a developer's perspective, there's an important distinction between monetary or arbitrary precision values (decimal), integer values, and floating point values (double). Having these all represented in the database as NUMBER makes code harder to understand and, therefore, harder to enhance and maintain. Similar issues apply to the considerations about Booleans above.

I don't feel the need to cover integration of transactions with code (MSSQL has it, not sure if Oracle does), quality of errors messages (Oracle's are abysmal, MSSQL's are decent), or debugging integration (VS.NET has it with MSSQL, I don't know of any integration with Oracle). I also won't discuss the relative merits of PL/SQL (Oracle's procedural SQL extensions for stored procedures and such) and T-SQL (MSSQL's equivalent) other than to mention the ease with which one can return datasets (i.e. sets of rows just like an ordinary query) from a stored procedure in MSSQL and the hoops one must jump through to do the same in Oracle. (I've done it. It ain't pretty, and it requires still more of those externally documented conventions.) The lesson from all this is that MSSQL deserves a certain popularity due to its friendliness to developers. Likewise, Oracle's hostility to developers should give one pause before choosing it as part of a system solution.

Oracle's reputation for performance has kept it going for many years, but that same success has led them to concentrate their efforts on performance rather than on making it a better development platform. Java stored procedures are nice, sure, but .NET stored procedures are just as nice (supported in MSSQL 2005) and both .NET and Java have data types Oracle doesn't support. Of course, I would prefer to work with either DB2 or PostgreSQL over Oracle or MSSQL any day.

permalink posted by Gregory @ 07:40 0 comments

2005-11-04

The problem with upcoming video codecs

Required background: Video codecs:1, Algorithms:1

An important aspect of computer algorithms is resource tradeoffs. In general, algorithms are about processing input data to produce output data. The obvious resources are the time/effort it takes to process, the accuracy of the output data, and the storage space required for the input data, the running algorithm, and the output data. Of course, tradeoffs only make sense in the context of comparing algorithms.

Consider the simple example of two sorting algorithms, merge sort and quicksort. Merge sort requires twice the runtime storage that quicksort does. On the other hand, it produces slightly more accurate results in that it is "stable" (i.e. the input order of items with equal key values is preserved in the output). Mergesort also has a guaranteed running time whereas quicksort's running time is data-dependent.

One chooses an algorithm based on what tradeoffs are acceptable/desirable for the particular application. If sorting stability is important in a given application, quicksort's tradeoffs are not acceptable. If sorting stability is not a requirement but the data to be sorted is nearly as large as available storage, merge sort's tradeoffs are not acceptable. This brings us to codecs in general and video codecs in particular.

Even more specifically, let's consider DVDs. The audio and video on a DVD is encoded as MPEG-2 (we'll ignore the encryption scheme for now). It is a lossy compression codec, meaning the accuracy of the output is guaranteed to be lower than the input. In fact, the encoding algorithm has a tunable tradeoff between accuracy of output and size of output.

For encoding, the time/effort, input storage, and runtime storage it takes are largely irrelevant since it is done once to prepare the DVD master and never again. Decoding is constrained by the available storage on a DVD (though, certainly, many movies are released to DVD with supplemental discs containing extras and such, it is pretty important that the movie itself fits on a single disc) and the processing power that can be made available in a consumer electronics component (the more processing power that is required, the higher the price of the device, not to mention power consumption and heat dissipation, thus economic realities set limits on the processing power available for decoding). The MPEG2 codec (and the DVD media format) had to be developed to have tradeoffs that satisfied these constraints.

Economic constraints change, however, and they have. The DVD media format is showing its age and there are new formats on the horizon (HD-DVD and Blu-Ray, in particular). Furthermore, there is reason to believe that in the near future video will be stored on hard drives rather than optical discs and will be delivered there over an internet connection. Typical bandwidth to the American home isn't up for speedy delivery of even MPEG2-encoded video right now, but this can be expected to improve as WiMax networks, fiber to the home, and other technologies (maybe broadband over power lines) become widely available. Meanwhile, though the cost of processing power has gone down (as it always does), there is increasing desire to view video on battery-operated devices (e.g. video iPod, cellphones, laptops, portable all-in-one DVD players). Processing power translates directly into electrical power consumption, and portable energy density isn't improving quickly. Storage, however, even storage that can be run energy efficiently (e.g. flash memory rather than 7200 or 5400 RPM hard disks), is getting cheaper and larger.

This points to a need for a different set of tradeoffs. The accuracy of MPEG2-encoded video is almost certainly sufficient for portable applications, but there is less need to restrict the encoded size of the video and more need to restrict the processing requirements for decoding it. This does not seem to be the direction of video codecs, however.

The newer video codecs, such as MPEG4, H.264 (which is part 10 of MPEG4), and Windows Media (which also involves part of MPEG4), are all about better output accuracy with lower output storage at the expense of higher processing requirements for decoding (as compared to MPEG2). This makes sense when considering yesterday's economic realities, but not tomorrow's. The video iPod plays H.264 videos. It's battery life when playing videos is advertised as 2 hours, as opposed to 14 hours when just playing music. It is unsurprising that decoding video takes more effort than decoding audio, and changing the LCD screen 30 times/second while keeping its backlight on certainly accounts for some of the power consumption, but should it really take seven times the power?

This isn't the only time that technology has been developed that does not serve the needs of the future or even the present, and it won't be the last. It is something to learn from, however.

permalink posted by Gregory @ 09:30 0 comments

"I See A Great Need"