<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Temporal Keys, Part 1</title>
	<atom:link href="http://thoughts.j-davis.com/2009/11/01/temporal-keys-part-1/feed/" rel="self" type="application/rss+xml" />
	<link>http://thoughts.j-davis.com/2009/11/01/temporal-keys-part-1/</link>
	<description>Ideas on Databases, Logic, and Language by Jeff Davis</description>
	<lastBuildDate>Sat, 10 Jul 2010 14:36:49 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
	<item>
		<title>By: Jeff Davis</title>
		<link>http://thoughts.j-davis.com/2009/11/01/temporal-keys-part-1/comment-page-1/#comment-179</link>
		<dc:creator>Jeff Davis</dc:creator>
		<pubDate>Sun, 07 Mar 2010 19:02:26 +0000</pubDate>
		<guid isPermaLink="false">http://thoughts.j-davis.com/?p=171#comment-179</guid>
		<description>&lt;blockquote&gt;I worked on one project where the data load on the servers was so high, they avoided most constraints except maybe for primary keys. &lt;/blockquote&gt;

As I said in the post, constraints are always enforced; it&#039;s just a matter of when, and at what cost. After all, it&#039;s a business constraint, and would be present even if computers didn&#039;t exist.

In that particular application, it&#039;s possible that they knew in advance that many of the constraints would not be violated, or at least the conditions under which they would be violated. That&#039;s a lot of work to prove though -- it&#039;s so much easier to just declare it to the DBMS.

&lt;blockquote&gt;I prefer the DB to do the work.&lt;/blockquote&gt;

I prefer that the DBMS does the work as well, because that usually has the lowest total cost of enforcement.</description>
		<content:encoded><![CDATA[<blockquote><p>I worked on one project where the data load on the servers was so high, they avoided most constraints except maybe for primary keys. </p></blockquote>
<p>As I said in the post, constraints are always enforced; it&#8217;s just a matter of when, and at what cost. After all, it&#8217;s a business constraint, and would be present even if computers didn&#8217;t exist.</p>
<p>In that particular application, it&#8217;s possible that they knew in advance that many of the constraints would not be violated, or at least the conditions under which they would be violated. That&#8217;s a lot of work to prove though &#8212; it&#8217;s so much easier to just declare it to the DBMS.</p>
<blockquote><p>I prefer the DB to do the work.</p></blockquote>
<p>I prefer that the DBMS does the work as well, because that usually has the lowest total cost of enforcement.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Craig S.</title>
		<link>http://thoughts.j-davis.com/2009/11/01/temporal-keys-part-1/comment-page-1/#comment-178</link>
		<dc:creator>Craig S.</dc:creator>
		<pubDate>Sun, 07 Mar 2010 10:08:32 +0000</pubDate>
		<guid isPermaLink="false">http://thoughts.j-davis.com/?p=171#comment-178</guid>
		<description>This is exactly what I’ve been searching for all day. I wish I had found your post sooner.</description>
		<content:encoded><![CDATA[<p>This is exactly what I’ve been searching for all day. I wish I had found your post sooner.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: BillR</title>
		<link>http://thoughts.j-davis.com/2009/11/01/temporal-keys-part-1/comment-page-1/#comment-174</link>
		<dc:creator>BillR</dc:creator>
		<pubDate>Thu, 21 Jan 2010 02:50:46 +0000</pubDate>
		<guid isPermaLink="false">http://thoughts.j-davis.com/?p=171#comment-174</guid>
		<description>I am waiting on this, as I find this kind of issue never stops coming up, no matter the project. A note on doing the checks in the business logic: I worked on one project where the data load on the servers was so high, they avoided most constraints except maybe for primary keys. Even the primary keys were sometimes left undeclared; they preferred to just index those columns, because the time taken to validate uniqueness, given the transaction volume, was prohibitive. And we&#039;re talking DBs running on supercomputers! Just thought someone might find this perspective interesting. Personally I am very glad someone is working on this. I prefer the DB to do the work. :)</description>
		<content:encoded><![CDATA[<p>I am waiting on this, as I find this kind of issue never stops coming up, no matter the project. A note on doing the checks in the business logic: I worked on one project where the data load on the servers was so high, they avoided most constraints except maybe for primary keys. Even the primary keys were sometimes left undeclared; they preferred to just index those columns, because the time taken to validate uniqueness, given the transaction volume, was prohibitive. And we&#8217;re talking DBs running on supercomputers! Just thought someone might find this perspective interesting. Personally I am very glad someone is working on this. I prefer the DB to do the work. <img src='http://thoughts.j-davis.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jeff Davis</title>
		<link>http://thoughts.j-davis.com/2009/11/01/temporal-keys-part-1/comment-page-1/#comment-123</link>
		<dc:creator>Jeff Davis</dc:creator>
		<pubDate>Mon, 02 Nov 2009 18:21:10 +0000</pubDate>
		<guid isPermaLink="false">http://thoughts.j-davis.com/?p=171#comment-123</guid>
		<description>&lt;blockquote&gt;we now need some external mechanism to represent the constraint that ranges may not overlap. The system can’t do it for us without some new machinery. Or at least, not terribly efficiently.&lt;/blockquote&gt;

Be patient, and keep an eye out for the next release of PostgreSQL ;)

&lt;blockquote&gt;We also need a somewhat rarer kind of index to allow us to actually *efficiently* get answers about what name goes with a given location.&lt;/blockquote&gt;

You can already do that in PostgreSQL. For periods of time, you can use my PERIOD data type (which has GiST indexing support), and do searches on things like &quot;contained by&quot; or &quot;overlaps&quot;.

http://pgfoundry.org/projects/temporal/</description>
		<content:encoded><![CDATA[<blockquote><p>we now need some external mechanism to represent the constraint that ranges may not overlap. The system can’t do it for us without some new machinery. Or at least, not terribly efficiently.</p></blockquote>
<p>Be patient, and keep an eye out for the next release of PostgreSQL <img src='http://thoughts.j-davis.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
<blockquote><p>We also need a somewhat rarer kind of index to allow us to actually *efficiently* get answers about what name goes with a given location.</p></blockquote>
<p>You can already do that in PostgreSQL. For periods of time, you can use my PERIOD data type (which has GiST indexing support), and do searches on things like &#8220;contained by&#8221; or &#8220;overlaps&#8221;.</p>
<p><a href="http://pgfoundry.org/projects/temporal/" rel="nofollow">http://pgfoundry.org/projects/temporal/</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jeff Davis</title>
		<link>http://thoughts.j-davis.com/2009/11/01/temporal-keys-part-1/comment-page-1/#comment-121</link>
		<dc:creator>Jeff Davis</dc:creator>
		<pubDate>Mon, 02 Nov 2009 18:16:54 +0000</pubDate>
		<guid isPermaLink="false">http://thoughts.j-davis.com/?p=171#comment-121</guid>
		<description>You bring up a lot of great issues.

&lt;blockquote&gt;Next, what if we represent the range as a pair of columns? ... Now we have a different set of problems.&lt;/blockquote&gt;

You&#039;re absolutely right, this has serious problems. The exclusivity is one, the searching is another, and constraints are the problem I was concerned with in the article.

&lt;blockquote&gt;does anybody know of a good system for handling missing date information&lt;/blockquote&gt;

I recommend C.J. Date&#039;s &quot;Temporal Data and the Relational Model&quot;. I think it will clarify a lot of these issues; I know it did for me.</description>
		<content:encoded><![CDATA[<p>You bring up a lot of great issues.</p>
<blockquote><p>Next, what if we represent the range as a pair of columns? &#8230; Now we have a different set of problems.</p></blockquote>
<p>You&#8217;re absolutely right, this has serious problems. The exclusivity is one, the searching is another, and constraints are the problem I was concerned with in the article.</p>
<blockquote><p>does anybody know of a good system for handling missing date information</p></blockquote>
<p>I recommend C.J. Date&#8217;s &#8220;Temporal Data and the Relational Model&#8221;. I think it will clarify a lot of these issues; I know it did for me.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jeff Davis</title>
		<link>http://thoughts.j-davis.com/2009/11/01/temporal-keys-part-1/comment-page-1/#comment-120</link>
		<dc:creator>Jeff Davis</dc:creator>
		<pubDate>Mon, 02 Nov 2009 18:05:31 +0000</pubDate>
		<guid isPermaLink="false">http://thoughts.j-davis.com/?p=171#comment-120</guid>
		<description>&lt;blockquote&gt;The client could enforce it, but life is so much easier when the database does it for us.&lt;/blockquote&gt;

That&#039;s also a good point. Constraints should be declarative, and in the DBMS, to avoid mistakes and to be another line of defense against application bugs.</description>
		<content:encoded><![CDATA[<blockquote><p>The client could enforce it, but life is so much easier when the database does it for us.</p></blockquote>
<p>That&#8217;s also a good point. Constraints should be declarative, and in the DBMS, to avoid mistakes and to be another line of defense against application bugs.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jeff Davis</title>
		<link>http://thoughts.j-davis.com/2009/11/01/temporal-keys-part-1/comment-page-1/#comment-119</link>
		<dc:creator>Jeff Davis</dc:creator>
		<pubDate>Mon, 02 Nov 2009 18:02:22 +0000</pubDate>
		<guid isPermaLink="false">http://thoughts.j-davis.com/?p=171#comment-119</guid>
		<description>User 1:
 checks for conflict -- no conflict
User 2:
 checks for conflict -- no conflict
User 1:
 inserts [2009-01-01, 2009-01-03)
User 2:
 inserts [2009-01-02, 2009-01-04)

Now you have a problem. If you don&#039;t care much about performance, you can serialize access to the table, so that user 2 has to wait for user 1. But if you do care about performance, you don&#039;t want to make user 2 wait for user 1 unless there is a real potential for conflict.</description>
		<content:encoded><![CDATA[<p>User 1:<br />
 checks for conflict &#8212; no conflict<br />
User 2:<br />
 checks for conflict &#8212; no conflict<br />
User 1:<br />
 inserts [2009-01-01, 2009-01-03)<br />
User 2:<br />
 inserts [2009-01-02, 2009-01-04)</p>
<p>Now you have a problem. If you don&#8217;t care much about performance, you can serialize access to the table, so that user 2 has to wait for user 1. But if you do care about performance, you don&#8217;t want to make user 2 wait for user 1 unless there is a real potential for conflict.</p>
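<p>To make the interleaving above concrete, here is a minimal Python sketch. It is not from the post: the names <code>overlaps</code>, <code>has_conflict</code> and <code>replay_interleaving</code>, and the plain list standing in for the table, are assumptions for illustration only. It replays the four steps in order and shows that both inserts pass their application-level checks even though the two half-open ranges overlap.</p>
<pre>
```python
from datetime import date


def overlaps(a, b):
    """True if half-open ranges a = [a0, a1) and b = [b0, b1) share a point."""
    return a[0] < b[1] and b[0] < a[1]


def has_conflict(table, new_range):
    """Application-level check: does new_range overlap any stored range?"""
    return any(overlaps(existing, new_range) for existing in table)


def replay_interleaving():
    """Replay the four steps against a list that stands in for the table."""
    table = []
    r1 = (date(2009, 1, 1), date(2009, 1, 3))  # user 1: [2009-01-01, 2009-01-03)
    r2 = (date(2009, 1, 2), date(2009, 1, 4))  # user 2: [2009-01-02, 2009-01-04)

    # Both users check before either has inserted, so neither sees a conflict.
    user1_ok = not has_conflict(table, r1)
    user2_ok = not has_conflict(table, r2)

    # Both then insert on the strength of a check that is already stale.
    if user1_ok:
        table.append(r1)
    if user2_ok:
        table.append(r2)
    return table


table = replay_interleaving()
conflict = overlaps(table[0], table[1])
```
</pre>
<p>After the replay, the list holds both ranges and <code>conflict</code> is true. Running the check and the insert as one serialized step would avoid this, at the cost of making user 2 wait even when the ranges could not collide.</p>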
]]></content:encoded>
	</item>
	<item>
		<title>By: J. Prevost</title>
		<link>http://thoughts.j-davis.com/2009/11/01/temporal-keys-part-1/comment-page-1/#comment-116</link>
		<dc:creator>J. Prevost</dc:creator>
		<pubDate>Mon, 02 Nov 2009 15:59:46 +0000</pubDate>
		<guid isPermaLink="false">http://thoughts.j-davis.com/?p=171#comment-116</guid>
		<description>Agh.  Less-than-sign-html-thing bit me.  Re-reading this, I talked way too much, but here is the missing portion (from x in the above comment to the final paragraph that was included there) for completeness.

*less than* max_location).  There&#039;s also a bit of a semantic issue: I&#039;ve assumed a half-open interval here, one that includes the minimum location but does not include the maximum.  This difference is not major for integers... as long as you decide up front which method you&#039;re going to use.  It will be a problem in a moment, though.  Finding out the range of locations for a name is fairly simple (select min_location, max_location from pair where name = x).  And the representation is nicely compact.

The final downside is: we now need some external mechanism to represent the constraint that ranges may not overlap.  The system can&#039;t do it for us without some new machinery.  Or at least, not terribly efficiently.  We also need a somewhat rarer kind of index to allow us to actually *efficiently* get answers about what name goes with a given location.

So we&#039;ve won some efficiencies at the expense of formalism.  We have, however, also lost some other efficiencies.  That&#039;s a pretty reasonable trade-off.  But what did we really lose by going away from the formalism?

Next step: real numbers.

create table real_pair (
    name text unique not null,
    min_location real not null,
    max_location real not null
);

Now we have a more significant problem with the &quot;half-open interval&quot; thing.  With real numbers, if we choose &quot;half-open intervals&quot; as our standard, there is absolutely no way to represent the range from 1.0 to 3.0 inclusive.  It&#039;s got to be 1.0 to 3.0 leaving out 3.0.  That may or may not be a problem depending on your application.

What if we were to try to use the original solution with reals?

create table real_simple (
    name text not null,
    location real unique not null
);

Whoa.  Now here&#039;s a real problem.  You can&#039;t represent ranges here at all, because they contain an infinite number of locations.  That&#039;s... rather not good.



Anyway, what this is all leading up to is this thought: the underlying relation involved in these examples is really one in which there are multiple (even infinite) points in the relation, representing a single conceptual entity.  We represent this as ranges or as points in a constrained space (imagine that you only schedule on hourly intervals, and integers are hours) purely for the purpose of efficiency.

But the most natural way to work with these relations is to imagine a system in which a single row can represent multiple (possibly infinite) points in the relation.  Then you could have this definition:

create table real_multiple (
    name text not null,
    location real unique multiple not null
);

The idea here is that one row in the database represents a set of rows in the underlying relation.  Potentially this could represent a mixture of ranges and points, or ranges with points torn out of the middle, or various kinds of intervals, or...  A lot of things.  Even more, it gives some way to think about geometry as well:

create table rect_multiple (
    name text unique not null,
    x real multiple not null,
    y real multiple not null,
    unique (x, y)
);

Of course it all gets complicated, but it&#039;s interesting.  And it results in additional kinds of constraints.  Like &quot;contiguous&quot;, for example.

Anyway, that&#039;s my take on the underlying problem.  I&#039;ll be very interested to hear the more practical ideas that are actually getting implemented.  ;&gt;</description>
		<content:encoded><![CDATA[<p>Agh.  Less-than-sign-html-thing bit me.  Re-reading this, I talked way too much, but here is the missing portion (from x in the above comment to the final paragraph that was included there) for completeness.</p>
<p>*less than* max_location).  There&#8217;s also a bit of a semantic issue: I&#8217;ve assumed a half-open interval here, one that includes the minimum location but does not include the maximum.  This difference is not major for integers&#8230; as long as you decide up front which method you&#8217;re going to use.  It will be a problem in a moment, though.  Finding out the range of locations for a name is fairly simple (select min_location, max_location from pair where name = x).  And the representation is nicely compact.</p>
<p>The final downside is: we now need some external mechanism to represent the constraint that ranges may not overlap.  The system can&#8217;t do it for us without some new machinery.  Or at least, not terribly efficiently.  We also need a somewhat rarer kind of index to allow us to actually *efficiently* get answers about what name goes with a given location.</p>
<p>So we&#8217;ve won some efficiencies at the expense of formalism.  We have, however, also lost some other efficiencies.  That&#8217;s a pretty reasonable trade-off.  But what did we really lose by going away from the formalism?</p>
<p>Next step: real numbers.</p>
<p>create table real_pair (<br />
    name text unique not null,<br />
    min_location real not null,<br />
    max_location real not null<br />
);</p>
<p>Now we have a more significant problem with the &#8220;half-open interval&#8221; thing.  With real numbers, if we choose &#8220;half-open intervals&#8221; as our standard, there is absolutely no way to represent the range from 1.0 to 3.0 inclusive.  It&#8217;s got to be 1.0 to 3.0 leaving out 3.0.  That may or may not be a problem depending on your application.</p>
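<p>A small Python sketch of that point, with the caveat that the <code>HalfOpen</code> class is hypothetical and introduced only for illustration: under a half-open convention the upper bound is never a member of the interval, so storing 1.0 and 3.0 always leaves 3.0 out.</p>
<pre>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HalfOpen:
    """Half-open interval [lo, hi): includes lo, excludes hi."""
    lo: float
    hi: float

    def contains(self, x: float) -> bool:
        # The lower bound is a member; the upper bound is not.
        return self.lo <= x < self.hi


r = HalfOpen(1.0, 3.0)
at_lower = r.contains(1.0)
inside = r.contains(2.999)
at_upper = r.contains(3.0)
```
</pre>
<p>Here <code>at_lower</code> and <code>inside</code> are true while <code>at_upper</code> is false, which is the gap described above.</p>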
<p>What if we were to try to use the original solution with reals?</p>
<p>create table real_simple (<br />
    name text not null,<br />
    location real unique not null<br />
);</p>
<p>Whoa.  Now here&#8217;s a real problem.  You can&#8217;t represent ranges here at all, because they contain an infinite number of locations.  That&#8217;s&#8230; rather not good.</p>
<p>Anyway, what this is all leading up to is this thought: the underlying relation involved in these examples is really one in which there are multiple (even infinite) points in the relation, representing a single conceptual entity.  We represent this as ranges or as points in a constrained space (imagine that you only schedule on hourly intervals, and integers are hours) purely for the purpose of efficiency.</p>
<p>But the most natural way to work with these relations is to imagine a system in which a single row can represent multiple (possibly infinite) points in the relation.  Then you could have this definition:</p>
<p>create table real_multiple (<br />
    name text not null,<br />
    location real unique multiple not null<br />
);</p>
<p>The idea here is that one row in the database represents a set of rows in the underlying relation.  Potentially this could represent a mixture of ranges and points, or ranges with points torn out of the middle, or various kinds of intervals, or&#8230;  A lot of things.  Even more, it gives some way to think about geometry as well:</p>
<p>create table rect_multiple (<br />
    name text unique not null,<br />
    x real multiple not null,<br />
    y real multiple not null,<br />
    unique (x, y)<br />
);</p>
<p>Of course it all gets complicated, but it&#8217;s interesting.  And it results in additional kinds of constraints.  Like &#8220;contiguous&#8221;, for example.</p>
<p>Anyway, that&#8217;s my take on the underlying problem.  I&#8217;ll be very interested to hear the more practical ideas that are actually getting implemented.  ;&gt;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: J. Prevost</title>
		<link>http://thoughts.j-davis.com/2009/11/01/temporal-keys-part-1/comment-page-1/#comment-115</link>
		<dc:creator>J. Prevost</dc:creator>
		<pubDate>Mon, 02 Nov 2009 15:56:06 +0000</pubDate>
		<guid isPermaLink="false">http://thoughts.j-davis.com/?p=171#comment-115</guid>
		<description>It seems to me that the logical underpinning of schedule conflicts is the idea that a single row can represent multiple items (possibly infinite) in the underlying set.

Let&#039;s think about integers instead of dates for a moment.  Start with a simple table:

create table single (
    name text unique not null,
    location integer unique not null
);

Here we have a relation between one integer and one name.  No name can refer to more than one integer.  No integer can have more than one name.  I am intentionally not picking either one out as a primary key here.

The next step is to allow names to refer to sets of integers:

create table multiple (
    name text not null,
    location integer unique not null
);

Now each integer can belong to only one name, but a name can refer to multiple integers.  Two things to note about this: there is no way to declare a constraint, should we wish to, that all of the integers for a given name be contiguous.  Neither is this representation necessarily efficient for a system that&#039;s trying to do that.

Querying to find the name for a location is very straightforward (select * from multiple where location = x).  Getting a set of all of the locations is straightforward as well (select location from multiple where name = x).  Finding the bounds of the range is not straightforward, if it is represented in this manner.

Next, what if we represent the range as a pair of columns?

create table pair (
    name text not null,
    min_location integer not null,
    max_location integer not null
);

Now we have a different set of problems.  It is now somewhat more complicated to query for the name given a location (select name from pair where min_location &lt;= x 


(A final note on temporal data... does anybody know of a good system for handling missing date information?  I&#039;m not talking about &quot;don&#039;t know the date&quot;, I mean &quot;don&#039;t have full date information&quot;.  Like, for a death date, year of death is known, but month and day are not.  Or year and month are known, but day is not.  Or more strangely, year and day are known but month is not.  Having to fall back on multiple columns for this kind of thing and not having any kind of normal date behavior at all has always kind of gotten to me.)</description>
		<content:encoded><![CDATA[<p>It seems to me that the logical underpinning of schedule conflicts is the idea that a single row can represent multiple items (possibly infinite) in the underlying set.</p>
<p>Let&#8217;s think about integers instead of dates for a moment.  Start with a simple table:</p>
<p>create table single (<br />
    name text unique not null,<br />
    location integer unique not null<br />
);</p>
<p>Here we have a relation between one integer and one name.  No name can refer to more than one integer.  No integer can have more than one name.  I am intentionally not picking either one out as a primary key here.</p>
<p>The next step is to allow names to refer to sets of integers:</p>
<p>create table multiple (<br />
    name text not null,<br />
    location integer unique not null<br />
);</p>
<p>Now each integer can belong to only one name, but a name can refer to multiple integers.  Two things to note about this: there is no way to declare a constraint, should we wish to, that all of the integers for a given name be contiguous.  Neither is this representation necessarily efficient for a system that&#8217;s trying to do that.</p>
<p>Querying to find the name for a location is very straightforward (select * from multiple where location = x).  Getting a set of all of the locations is straightforward as well (select location from multiple where name = x).  Finding the bounds of the range is not straightforward, if it is represented in this manner.</p>
<p>Next, what if we represent the range as a pair of columns?</p>
<p>create table pair (<br />
    name text not null,<br />
    min_location integer not null,<br />
    max_location integer not null<br />
);</p>
<p>Now we have a different set of problems.  It is now somewhat more complicated to query for the name given a location (select name from pair where min_location &lt;= x </p>
<p>(A final note on temporal data&#8230; does anybody know of a good system for handling missing date information?  I&#8217;m not talking about &#8220;don&#8217;t know the date&#8221;, I mean &#8220;don&#8217;t have full date information&#8221;.  Like, for a death date, year of death is known, but month and day are not.  Or year and month are known, but day is not.  Or more strangely, year and day are known but month is not.  Having to fall back on multiple columns for this kind of thing and not having any kind of normal date behavior at all has always kind of gotten to me.)</p>
]]></content:encoded>
	</item>
</channel>
</rss>
