<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Experimental Thoughts &#187; Ruby</title>
	<atom:link href="http://thoughts.j-davis.com/tag/ruby/feed/" rel="self" type="application/rss+xml" />
	<link>http://thoughts.j-davis.com</link>
	<description>Ideas on Databases, Logic, and Language by Jeff Davis</description>
	<lastBuildDate>Fri, 07 Oct 2011 03:05:37 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>SQL: the successful cousin of Haskell</title>
		<link>http://thoughts.j-davis.com/2011/09/25/sql-the-successful-cousin-of-haskell/</link>
		<comments>http://thoughts.j-davis.com/2011/09/25/sql-the-successful-cousin-of-haskell/#comments</comments>
		<pubDate>Sun, 25 Sep 2011 07:10:29 +0000</pubDate>
		<dc:creator>Jeff Davis</dc:creator>
				<category><![CDATA[Databases]]></category>
		<category><![CDATA[Language]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Ruby]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://thoughts.j-davis.com/?p=472</guid>
		<description><![CDATA[Haskell is a very interesting language, and shows up on sites like http://programming.reddit.com frequently. It&#8217;s somewhat mind-bending, but very powerful and has some great theoretical advantages over other languages. I have been learning it on and off for some time, never really getting comfortable with it but being inspired by it nonetheless. But discussion on [...]]]></description>
			<content:encoded><![CDATA[<p>Haskell is a very interesting language, and shows up on sites like <a href="http://programming.reddit.com">http://programming.reddit.com</a> frequently. It&#8217;s somewhat mind-bending, but very powerful and has some great theoretical advantages over other languages. I have been learning it on and off for some time, never really getting comfortable with it but being inspired by it nonetheless.</p>
<p>But discussion on sites like reddit usually falls a little flat when someone asks a question like:</p>
<blockquote><p>If haskell has all these wonderful advantages, what amazing applications have been written with it?</p></blockquote>
<p>The responses to that question usually aren&#8217;t very convincing, quite honestly.</p>
<p>But what if I told you there was a wildly successful language, in some ways the <em>most</em> successful language ever, and it could be characterized by:</p>
<ul>
<li>lazy evaluation</li>
<li>declarative</li>
<li>type inference</li>
<li>immutable state</li>
<li>tightly controlled side effects</li>
<li>strict static typing</li>
</ul>
<p>Surely that would be interesting to a Haskell programmer? Of course, I&#8217;m talking about SQL.</p>
<p><span id="more-472"></span>Now, it&#8217;s all falling into place. All of those theoretical advantages become practical when you&#8217;re talking about managing a lot of data over a long period of time, and trying to avoid making any mistakes along the way. Really, that&#8217;s what relational database systems are all about.</p>
<p>I speculate that SQL is <em>so</em> successful and pervasive that it stole the limelight from languages like haskell, because the tough problems that haskell would solve are <em>already solved</em> in so many cases. Application developers can hack up a SQL query and run it over 100M records in 7 tables, glance at the result, and turn it over to someone else with near certainty that it&#8217;s the right answer! Sure, if you have a poorly-designed schema and have all kinds of special cases, then the query might be wrong too. But if you have a mostly-sane schema and mostly know what you&#8217;re doing, you hardly even need to check the results before using the answer.</p>
<p>In other words, if the query compiles, and the result looks anything like what you were expecting (e.g. the right basic structure), then it&#8217;s probably correct. Sound familiar? That&#8217;s exactly what people say about haskell.</p>
<p>It would be great if haskell folks would get more involved in the database community. It looks like a lot of useful knowledge could be shared. Haskell folks would be in a better position to find out how to apply theory where it has already proven to be successful, and could work backward to find other good applications of that theory.</p>
<p>Competing directly in the web application space against languages like ruby and javascript is going to be an uphill battle even if haskell is better in that space. I&#8217;ve worked with some very good ruby developers, and I honestly couldn&#8217;t begin to tell them where haskell might be a practical advantage for web application development. Again, I don&#8217;t know much about haskell aside from the very basics. But if someone like me who is interested in haskell and made some attempt to understand it and read about it still cannot articulate a practical advantage, clearly there is some kind of a problem (either messaging or technical). And that&#8217;s a huge space for application development, so that&#8217;s a serious concern.</p>
<p>However, the data management space is also huge &#8212; a large fraction of those applications exist primarily to collect data or present data. So, if haskell folks could work with the database community to advance data management, I believe that would inspire a lot of interesting development.</p>
]]></content:encoded>
			<wfw:commentRss>http://thoughts.j-davis.com/2011/09/25/sql-the-successful-cousin-of-haskell/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Building SQL Strings Dynamically, in 2011</title>
		<link>http://thoughts.j-davis.com/2011/07/09/building-sql-strings-dynamically-in-2011/</link>
		<comments>http://thoughts.j-davis.com/2011/07/09/building-sql-strings-dynamically-in-2011/#comments</comments>
		<pubDate>Sat, 09 Jul 2011 16:57:50 +0000</pubDate>
		<dc:creator>Jeff Davis</dc:creator>
				<category><![CDATA[Databases]]></category>
		<category><![CDATA[NULL]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Ruby]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://thoughts.j-davis.com/?p=403</guid>
		<description><![CDATA[I saw a recent post Avoid Smart Logic for Conditional WHERE Clauses which actually recommended, &#8220;the best solution is to build the SQL statement dynamically—only with the required filters and bind parameters&#8221;. Ordinarily I appreciate that author&#8217;s posts, but this time I think that he let confusion run amok, as can be seen in a thread on [...]]]></description>
			<content:encoded><![CDATA[<p>I saw a recent post <em><a href="http://use-the-index-luke.com/sql/where-clause/obfuscation/smart-logic">Avoid Smart Logic for Conditional WHERE Clauses</a></em> which actually recommended, &#8220;the best solution is to build the SQL statement dynamically—only with the required filters and bind parameters&#8221;. Ordinarily I appreciate that author&#8217;s posts, but this time I think that he let confusion run amok, as can be seen in a <a href="http://www.reddit.com/r/programming/comments/ij0px/the_smartest_way_to_make_sql_slow/">thread on reddit</a>.</p>
<p>To dispel that confusion: parameterized queries don&#8217;t have any plausible downsides; always use them in applications. Saved plans have trade-offs; use them sometimes, and only if you understand the trade-offs.</p>
<p>When query parameters are conflated with saved plans, it creates FUD about SQL systems because it mixes the fear around SQL injection with the mysticism around the SQL optimizer. Such confusion about the layers of a SQL system is a big part of the reason that some developers move to the deceptive simplicity of NoSQL systems (I say &#8220;deceptive&#8221; here because it often just moves an even greater complexity into the application, but that&#8217;s another topic).</p>
<p>The confusion started with this query from the original article:</p>
<p><span id="more-403"></span></p>
<pre>SELECT first_name, last_name, subsidiary_id, employee_id
FROM employees
WHERE ( subsidiary_id    = :sub_id OR :sub_id IS NULL )
  AND ( employee_id      = :emp_id OR :emp_id IS NULL )
  AND ( UPPER(last_name) = :name   OR :name   IS NULL )</pre>
<p>[ Aside: In PostgreSQL those parameters should be $1, $2, and $3; but that's not relevant to this discussion. ]</p>
<p>The idea is that one such query can be used for several types of searches. If you want to ignore one of those WHERE conditions, you just pass a NULL as one of the parameters, and it makes one side of the OR always TRUE, thus the condition might as well not be there. So, each condition can either be there and have one argument (restricting the results of the query), or be ignored by passing a NULL argument; thus effectively giving you 8 queries from one SQL string. By eliminating the need to use different SQL strings depending on which conditions you want to use, you reduce the opportunity for error.</p>
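<p>To make the &#8220;8 queries from one SQL string&#8221; point concrete, here is a minimal Ruby sketch. It assumes the PostgreSQL-style placeholders mentioned in the aside; the names <tt>SEARCH_SQL</tt> and <tt>search_params</tt> and the sample values are made up for illustration and do not come from the original article. It only builds the parameter lists and does not talk to a database:</p>
<pre>
```ruby
# Hypothetical constant: the single SQL string from above, written
# with PostgreSQL-style numbered placeholders.
SEARCH_SQL = "SELECT first_name, last_name, subsidiary_id, employee_id " \
             "FROM employees " \
             "WHERE ( subsidiary_id    = $1 OR $1 IS NULL ) " \
             "AND ( employee_id      = $2 OR $2 IS NULL ) " \
             "AND ( UPPER(last_name) = $3 OR $3 IS NULL )"

# Hypothetical helper: map an optional set of filters onto the three
# bind parameters. An omitted filter becomes nil, which a client
# library would send as SQL NULL.
def search_params(filters = {})
  name = filters[:name]
  [filters[:sub_id], filters[:emp_id], name.nil? ? nil : name.upcase]
end

# Every subset of the three filters gives a valid parameter list for
# the same SQL string: 2 * 2 * 2 = 8 different searches.
all_filters = { sub_id: 20, emp_id: 123, name: "smith" }
combos = (0..3).flat_map { |n| all_filters.keys.combination(n).to_a }
combos.each do |keys|
  chosen = all_filters.select { |key, _| keys.include?(key) }
  p search_params(chosen)
end
```
</pre>
<p>The SQL text never changes; only the parameter list does, which is where the reduced opportunity for error comes from.</p>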
<p>Yet the article claims that this kind of query is a problem. The reasoning goes something like this:</p>
<ol>
<li>Using bind parameters forces the plan to be saved and reused for multiple queries.</li>
<li>When a plan is saved for multiple queries, the planner doesn&#8217;t have the actual argument values.</li>
<li>Because the planner doesn&#8217;t have the actual argument values, the &#8220;x IS NULL&#8221; conditions aren&#8217;t constant at plan time, and therefore the planner isn&#8217;t able to simplify the conditions (e.g., if one condition is always TRUE, just remove it).</li>
<li>Therefore it makes a bad plan.</li>
</ol>
<p>However, #1 is simply untrue, at least in PostgreSQL. PostgreSQL <em>can</em> save the plan, but you don&#8217;t have to. See the documentation for <a href="http://www.postgresql.org/docs/9.1/static/libpq-exec.html#LIBPQ-PQEXECPARAMS">PQexecParams</a>. Here&#8217;s an example in ruby using the &#8220;pg&#8221; gem (EDIT: Note: this does not use any magic query-building behind the scenes, it uses a protocol level feature in the PostgreSQL server to bind the arguments):</p>
<pre>require 'rubygems'
require 'pg'

conn = PGconn.connect("dbname=postgres")

conn.exec("CREATE TABLE foo(i int)")
conn.exec("INSERT INTO foo SELECT generate_series(1,10000)")
conn.exec("CREATE INDEX foo_idx ON foo (i)")
conn.exec("ANALYZE foo")

# Query using parameters. Planner sees the real arguments, so it will
# make the same plan as if you inlined them into the SQL string. In
# this case, 3 is not NULL, so it is simplified to just "WHERE i = 3",
# and it will choose to use an index on "i" for a fast search.
res = conn.exec("explain SELECT * FROM foo WHERE i = $1 OR $1 IS NULL", [3])
res.each{ |r| puts r['QUERY PLAN'] }
puts

# Now, the argument is NULL, so the condition is always true, and
# removed completely. It will surely choose a sequential scan.
res = conn.exec("explain SELECT * FROM foo WHERE i = $1 OR $1 IS NULL", [nil])
res.each{ |r| puts r['QUERY PLAN'] }
puts

# Saves the plan. It doesn't know whether the argument is NULL or not
# yet (because the arguments aren't provided yet), so the plan might
# not be good.
conn.prepare("myplan", "SELECT * FROM foo WHERE i = $1 OR $1 IS NULL")

# We can execute this with:
res = conn.exec_prepared("myplan",[3])
puts res.to_a.length
res = conn.exec_prepared("myplan",[nil])
puts res.to_a.length

# But to see the plan, we have to use the SQL string form so that we
# can use EXPLAIN. This plan should use an index, but because we're
# using a saved plan, it doesn't know to use the index. Also notice
# that it wasn't able to simplify the conditions away like it did for
# the sequential scan without the saved plan.
res = conn.exec("explain execute myplan(3)")
res.each{ |r| puts r['QUERY PLAN'] }
puts

# ...and use the same plan again, even with different argument.
res = conn.exec("explain execute myplan(NULL)")
res.each{ |r| puts r['QUERY PLAN'] }
puts

conn.exec("DROP TABLE foo")</pre>
<p>See? If you know what you are doing, and want to save a plan, then save it. If not, do the simple thing, and PostgreSQL will have the information it needs to make a good plan.</p>
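<p>For contrast, the dynamic approach quoted at the top (&#8220;only with the required filters and bind parameters&#8221;) might look roughly like the following Ruby sketch. The helper name <tt>build_search</tt> and the key-to-column mapping are assumptions made for illustration; this is not code from the linked article. It still uses bind parameters, but the SQL text itself now depends on the input:</p>
<pre>
```ruby
# Hypothetical builder: only the requested filters appear in the WHERE
# clause, each with its own numbered placeholder.
def build_search(filters = {})
  columns = { sub_id: "subsidiary_id", emp_id: "employee_id",
              name: "UPPER(last_name)" }
  clauses = []
  params = []
  columns.each do |key, expr|
    value = filters[key]
    next if value.nil?
    params.push(key == :name ? value.upcase : value)
    # The placeholder number is the position of the value just pushed.
    clauses.push("#{expr} = $#{params.length}")
  end
  sql = "SELECT first_name, last_name, subsidiary_id, employee_id FROM employees"
  sql += " WHERE " + clauses.join(" AND ") unless clauses.empty?
  [sql, params]
end

sql, params = build_search(emp_id: 123, name: "smith")
puts sql    # ... FROM employees WHERE employee_id = $1 AND UPPER(last_name) = $2
p params    # [123, "SMITH"]
```
</pre>
<p>The placeholder numbering and the parameter array have to be kept in step by hand, which is exactly the kind of bookkeeping the single-string form avoids.</p>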
<p>My next article will be a simple introduction to database system architecture that will hopefully make SQL a little less mystical.</p>
]]></content:encoded>
			<wfw:commentRss>http://thoughts.j-davis.com/2011/07/09/building-sql-strings-dynamically-in-2011/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>None, nil, Nothing, undef, NA, and SQL NULL</title>
		<link>http://thoughts.j-davis.com/2008/08/13/none-nil-nothing-undef-na-and-sql-null/</link>
		<comments>http://thoughts.j-davis.com/2008/08/13/none-nil-nothing-undef-na-and-sql-null/#comments</comments>
		<pubDate>Wed, 13 Aug 2008 18:00:02 +0000</pubDate>
		<dc:creator>Jeff Davis</dc:creator>
				<category><![CDATA[Databases]]></category>
		<category><![CDATA[NULL]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Ruby]]></category>
		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://davisjeff.wordpress.com/?p=14</guid>
		<description><![CDATA[In my last post, Why DBMSs are so complex, I raised the issue of type mismatches between the application language and the DBMS. Type matching between the DBMS and the application is as important as types themselves for successful application development. If a type behaves one way in the DBMS, and a &#8220;similar&#8221; type behaves [...]]]></description>
			<content:encoded><![CDATA[<p>In my last post, <a title="Why DBMSs are so complex" href="http://thoughts.j-davis.com/2008/08/03/why-dbmss-are-so-complex/">Why DBMSs are so complex</a>, I raised the issue of type mismatches between the application language and the DBMS.</p>
<p>Type matching between the DBMS and the application is as important as types themselves for successful application development. If a type behaves one way in the DBMS, and a &#8220;similar&#8221; type behaves slightly differently in the application, that can only cause confusion. And it&#8217;s a source of unnecessary awkwardness: you already need to define the types that suit your business best in one place, so why do you need to redefine them somewhere else, based on a different basic type system?</p>
<p><span id="more-14"></span></p>
<p>At least we&#8217;re using PostgreSQL, the most extensible database available, where you can define sophisticated types and make them perform like native features.</p>
<p>But there are still problems. Most notably, it&#8217;s a non-trivial challenge to find an appropriate way to model NULLs in the application language. You can&#8217;t <strong>not</strong> use them in the DBMS, because the SQL spec generates them from oblivion, e.g. from an outer join or an aggregate function, even when you have no NULLs in your database. So the only way to model the same semantics in your application is to somehow make your application language understand NULL semantics.</p>
<div>Here&#8217;s how SQL NULL behaves:</p>
<pre>
=&gt; -- aggregate with one NULL input
=&gt; select sum(column1) from (values(NULL::int)) t;
sum
-----

(1 row)

=&gt; -- aggregate with two inputs, one of them NULL
=&gt; select sum(column1) from (values(1),(NULL)) t;
sum
-----
1
(1 row)

=&gt; -- aggregate with no input
=&gt; select sum(column1) from (values(1),(NULL)) t where false;
sum
-----

(1 row)

=&gt; -- + operator
=&gt; select 1 + NULL;
?column?
----------

(1 row)
</pre>
<p>I&#8217;ll divide the &#8220;NULL-ish&#8221; values of various languages into two broad categories:</p>
<ol>
<li>Separate type, few operators defined, error early, no 3VL: Python, Ruby and Haskell fall into this category, because their &#8220;NULL-ish&#8221; types (<tt>None</tt>, <tt><strong>nil</strong></tt>, and <tt><strong>Nothing</strong></tt>, respectively) usually result in an immediate exception, unless the operator to which the NULL-ish value is passed handles it as a special case. Few built-in operators are defined for arguments of these types. These fail to behave like SQL NULL, because they employ no three-valued logic (3VL) at all, and thus fail in the fourth portion of the SQL example.</li>
<li>Member of all types, every operator defined: Perl and R fall into this category. Perl&#8217;s <strong>undef</strong> can be passed through many built-in operators (like +), but doesn&#8217;t ever use 3VL, so it fails the fourth portion of the SQL example. R uses a kind of 3VL for its <tt><strong>NA</strong></tt> value, but it uses it everywhere, so <tt>sum(c(1,<strong>NA</strong>))</tt> results in <tt><strong>NA</strong></tt> (thus failing the second portion of the SQL example). In R, you can omit <tt>NA</tt>s from the sum explicitly (not a very good solution, by the way), but then it will fail the first portion of the SQL example.</li>
</ol>
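<p>The first category is easy to see in Ruby. This is a minimal sketch, assuming Ruby 2.4 or later (for <tt>Array#sum</tt>); the helper <tt>raises?</tt> is a made-up name for illustration:</p>
<pre>
```ruby
# Hypothetical helper: report whether the block raises the given error.
def raises?(error_class)
  yield
  false
rescue error_class
  true
end

p raises?(NoMethodError) { nil + 1 }   # true: NilClass defines no + method
p raises?(TypeError) { 1 + nil }       # true: Integer#+ cannot coerce nil
p raises?(TypeError) { [1, nil].sum }  # true: the same failure inside a sum
p(nil == nil)                          # true: plain two-valued comparison
```
</pre>
<p>Where SQL quietly yields NULL for <tt>1 + NULL</tt>, Ruby stops with an exception, and comparing <tt>nil</tt> with <tt>nil</tt> gives an ordinary <tt>true</tt> rather than a third logical value.</p>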
<p>As far as I can tell (correct me if I&#8217;m mistaken), none of these languages support the third portion of the SQL example: the sum of an empty list in SQL is NULL. The languages that I tested with a built-in <tt>sum</tt> operator (Python, R, Haskell) all return <tt>0</tt> when passed an empty list.</p>
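<p>Ruby&#8217;s <tt>Array#sum</tt> (added in Ruby 2.4) likewise returns <tt>0</tt> for an empty array. Matching SQL means writing the rule out by hand; the sketch below uses a hypothetical helper, <tt>sql_sum</tt>, that covers only the three aggregate cases from the SQL example:</p>
<pre>
```ruby
# Hypothetical helper that mimics SQL's sum(): NULL (nil) inputs are
# skipped, and the result is NULL (nil) when no non-NULL input remains.
def sql_sum(values)
  present = values.compact
  present.empty? ? nil : present.sum
end

p([].sum)             # 0   (Ruby's built-in behavior)
p(sql_sum([nil]))     # nil (first portion of the SQL example)
p(sql_sum([1, nil]))  # 1   (second portion)
p(sql_sum([]))        # nil (third portion)
```
</pre>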
<p>Languages from the first category appear safer, because you will catch the errors earlier rather than later. However, transforming SQL NULLs in these languages to <tt>None</tt>, <tt><strong>nil</strong></tt>, or <tt><strong>Nothing</strong></tt> is actually quite dangerous, because a change in the data you store in your database (inserting NULLs or deleting records that may be aggregated) or even a change in a query (outer join, or an aggregate that may have no input) can produce NULLs, and therefore produce exceptions, that can evade even rigorous testing procedures and sneak into production.</p>
<p>Languages from the second category tend to pass the &#8220;<strong>undef</strong>&#8221; or &#8220;<strong>NA</strong>&#8221; along deeper into the application, which can cause unintuitive and difficult-to-trace problems. Perhaps worse, something will always happen, and usually the result will take the form of the correct answer even if it is wrong.</p>
<p>So where does that leave us? I think the blame here rests entirely on the SQL standard&#8217;s definition of NULL, and the inconsistency between &#8220;not a value at all&#8221; and &#8220;the third logical value&#8221; (both of which can be used to describe NULL in different contexts). Not much can be done about that, so I think the best strategy is to try to interpret and remove NULLs as early as possible. They can be removed from result sets before returning to the client by using COALESCE, and they can be removed after they reach the client with client code. Passing them along as some kind of special value is only useful if your application already must be thoroughly aware of that special value.</p>
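<p>On the client side, removing NULLs early could look something like this Ruby sketch. The helpers <tt>coalesce</tt> and <tt>strip_nulls</tt>, the sample row, and its symbol keys are all assumptions made for illustration; a real client library may hand back rows in a different shape:</p>
<pre>
```ruby
# Hypothetical client-side counterpart to SQL's COALESCE: return the
# first argument that is not nil (or nil if there is none).
def coalesce(*values)
  values.find { |value| !value.nil? }
end

# Hypothetical helper: replace each nil in a row with the default for
# that column. Columns without a default are left unchanged.
def strip_nulls(row, defaults)
  row.each_with_object({}) do |(key, value), out|
    out[key] = coalesce(value, defaults[key])
  end
end

# Illustrative data only: an aggregate over no rows came back as NULL.
row = { item: "widget", total: nil }
clean = strip_nulls(row, { total: 0 })
p clean[:total]   # 0
```
</pre>
<p>Doing this once, right where the result set enters the application, keeps the rest of the code from having to know about <tt>nil</tt> at all.</p>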
<p>Note: Microsoft has defined some kind of &#8220;DBNull&#8221; value, and from browsing the docs, it appears a substantial amount of work went into making them behave as SQL NULLs. This includes a special set of SQL types and operators. Microsoft appears to be making a lot of progress matching DBMS and application types more closely, but I think the definition of SQL NULLs is a lost cause.</p></div>
]]></content:encoded>
			<wfw:commentRss>http://thoughts.j-davis.com/2008/08/13/none-nil-nothing-undef-na-and-sql-null/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>ruby-pg is now the official postgres ruby gem</title>
		<link>http://thoughts.j-davis.com/2007/12/14/ruby-pg-is-now-the-official-postgres-ruby-gem/</link>
		<comments>http://thoughts.j-davis.com/2007/12/14/ruby-pg-is-now-the-official-postgres-ruby-gem/#comments</comments>
		<pubDate>Fri, 14 Dec 2007 18:00:10 +0000</pubDate>
		<dc:creator>Jeff Davis</dc:creator>
				<category><![CDATA[Databases]]></category>
		<category><![CDATA[Language]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Ruby]]></category>

		<guid isPermaLink="false">http://davisjeff.wordpress.com/?p=18</guid>
		<description><![CDATA[ruby-pg is now the official rubyforge project for the &#8220;postgres&#8221; ruby gem. See the project here: http://www.rubyforge.org/projects/ruby-pg or install the gem directly: # gem install --remote postgres The previous project has gone unmaintained for a long time, which led to the fork. This gem includes some important fixes, most notably the ability to compile against [...]]]></description>
			<content:encoded><![CDATA[<div>ruby-pg is now the official rubyforge project for the &#8220;postgres&#8221; ruby<br />
gem. See the project here:</p>
<p><a href="http://www.rubyforge.org/projects/ruby-pg">http://www.rubyforge.org/projects/ruby-pg</a></p>
<p>or install the gem directly:</p>
<p># gem install --remote postgres</p>
<p><span id="more-18"></span></p>
<p>The previous project has gone unmaintained for a long time, which led<br />
to the fork.</p>
<p>This gem includes some important fixes, most notably the ability to<br />
compile against PostgreSQL 8.3.</p>
<div>The gem contains two modules:</p>
<ul>
<li>&#8216;postgres&#8217; &#8212; require this module as before, you can use it without<br />
making any changes to your application. This is essentially just a fork<br />
from version 0.7.1.2006.04.06, but contains some important fixes,<br />
including the ability to build against 8.3.</li>
<li>&#8216;pg&#8217; &#8212; a new interface, designed to offer every feature available in<br />
libpq to Ruby, with a better API. This module is simpler, cleaner, and<br />
more portable. It is still unstable, so test before using.</li>
</ul>
<p>PostgreSQL+Ruby users: please test and report any problems. I&#8217;d like to<br />
make sure this is as stable as possible, and builds on all necessary<br />
platforms.</p></div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://thoughts.j-davis.com/2007/12/14/ruby-pg-is-now-the-official-postgres-ruby-gem/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

