<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: An alternative way of EAV modelling</title>
	<atom:link href="http://www.jasny.net/articles/an-alternative-way-of-eav-modeling/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.jasny.net/articles/an-alternative-way-of-eav-modeling/</link>
	<description>It&#039;s all about me, mysql and Einstein.</description>
	<lastBuildDate>Wed, 08 Sep 2010 17:54:42 +0100</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Bill Karwin</title>
		<link>http://www.jasny.net/articles/an-alternative-way-of-eav-modeling/comment-page-1/#comment-209582</link>
		<dc:creator>Bill Karwin</dc:creator>
		<pubDate>Thu, 29 Oct 2009 17:21:55 +0000</pubDate>
		<guid isPermaLink="false">http://blog.adaniels.nl/?p=88#comment-209582</guid>
		<description>I commented on &quot;EAV multi-value fields&quot; but Arnold asked me to join the discussion here:

EAV is a non-relational design and trying to make it fit in an SQL database is a fool’s errand.

By using EAV, you’ve already sacrificed any benefit using an RDBMS might have given you, so why are you still using an RDBMS?

If EAV really seems like what you need, you should use something like CouchDB, Apache Cassandra, Project Voldemort, or one of the other emerging key/value storage technologies.

I don’t mean to be cranky, but this is really one of those square-peg-in-a-round-hole situations.

Arnold replied: &quot;The issue with document oriented databases is that it is very difficult to get results based on referential info.&quot;

Yes, but neither can you support referential integrity constraints in EAV.  A constraint on a column applies to all rows, not just to the rows where the fid has a certain value.

Arnold: &quot;It would not be a good idea to make a table with millions of fields or make a db with a table for each illness.&quot;

Yes, the seminal use of the EAV data model was in medical databases in the 1970s, where a patient entity has thousands of boolean attributes, forming a sparse matrix of the illnesses and conditions that apply.  EAV was used so that the attributes could be stored as rows instead of columns, because a table with thousands of columns was impractical.

But the relational model for patients-to-conditions is simply a many-to-many relationship.  Yes, we can store rows to represent the boolean association from a patient to a given condition.  And we can store the possible conditions in rows too--in a lookup table also referenced by the many-to-many intersection table.

CREATE TABLE HasCondition (
  patient_id INT REFERENCES Patients, 
  condition_id INT REFERENCES Conditions, 
  PRIMARY KEY (patient_id, condition_id)
);

Arnold: &quot;Wanting to make the DB more flexible is not a valid reason to choose EAV.&quot;

This I agree with.  There are several ways to solve that problem, while keeping the separation between data and metadata.  These solutions therefore support other core relational features such as typing and referential integrity, without employing the &quot;Inner Platform Effect&quot; antipattern.

If you really, *REALLY* need to support custom user-defined attributes, I&#039;d recommend the &quot;Serialized LOB&quot; approach, encoding the custom attributes in XML or JSON or whatever, and stuffing them into a blob.  You aren&#039;t sacrificing any query power, because you&#039;ve already given that up as soon as you consider EAV.</description>
		<content:encoded><![CDATA[<p>I commented on &#8220;EAV multi-value fields&#8221; but Arnold asked me to join the discussion here:</p>
<p>EAV is a non-relational design and trying to make it fit in an SQL database is a fool’s errand.</p>
<p>By using EAV, you’ve already sacrificed any benefit using an RDBMS might have given you, so why are you still using an RDBMS?</p>
<p>If EAV really seems like what you need, you should use something like CouchDB, Apache Cassandra, Project Voldemort, or one of the other emerging key/value storage technologies.</p>
<p>I don’t mean to be cranky, but this is really one of those square-peg-in-a-round-hole situations.</p>
<p>Arnold replied: &#8220;The issue with document oriented databases is that it is very difficult to get results based on referential info.&#8221;</p>
<p>Yes, but neither can you support referential integrity constraints in EAV.  A constraint on a column applies to all rows, not just to the rows where the fid has a certain value.</p>
<p>Arnold: &#8220;It would not be a good idea to make a table with millions of fields or make a db with a table for each illness.&#8221;</p>
<p>Yes, the seminal use of the EAV data model was in medical databases in the 1970s, where a patient entity has thousands of boolean attributes, forming a sparse matrix of the illnesses and conditions that apply.  EAV was used so that the attributes could be stored as rows instead of columns, because a table with thousands of columns was impractical.</p>
<p>But the relational model for patients-to-conditions is simply a many-to-many relationship.  Yes, we can store rows to represent the boolean association from a patient to a given condition.  And we can store the possible conditions in rows too&#8211;in a lookup table also referenced by the many-to-many intersection table.</p>
<p>CREATE TABLE HasCondition (<br />
  patient_id INT REFERENCES Patients,<br />
  condition_id INT REFERENCES Conditions,<br />
  PRIMARY KEY (patient_id, condition_id)<br />
);</p>
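<p>A minimal sketch of how that intersection table might be queried, using Python's built-in sqlite3 module. The Patients and Conditions columns and the sample rows are assumptions made up for illustration; only HasCondition comes from the definition above. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, and other engines may treat inline REFERENCES differently.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Hypothetical parent tables; column names are assumptions.
    CREATE TABLE Patients (patient_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE Conditions (condition_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    -- Intersection table as given in the comment.
    CREATE TABLE HasCondition (
        patient_id INT REFERENCES Patients,
        condition_id INT REFERENCES Conditions,
        PRIMARY KEY (patient_id, condition_id)
    );
    INSERT INTO Patients VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO Conditions VALUES (10, 'asthma'), (20, 'diabetes'), (30, 'migraine');
    INSERT INTO HasCondition VALUES (1, 10), (1, 30), (2, 20);
""")


def conditions_for(patient_id):
    """Return the names of all conditions recorded for one patient."""
    rows = conn.execute(
        """
        SELECT c.name
        FROM HasCondition hc
        JOIN Conditions c ON c.condition_id = hc.condition_id
        WHERE hc.patient_id = ?
        ORDER BY c.name
        """,
        (patient_id,),
    ).fetchall()
    return [name for (name,) in rows]


def rejects_unknown_condition():
    """Try to link a patient to a condition that does not exist."""
    try:
        conn.execute("INSERT INTO HasCondition VALUES (1, 999)")
    except sqlite3.IntegrityError:
        return True
    return False


alice_conditions = conditions_for(1)
```

<p>Each row is one "true" cell of the sparse matrix, and the database itself refuses a link to a condition that is not in the lookup table.</p>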
<p>Arnold: &#8220;Wanting to make the DB more flexible is not a valid reason to choose EAV.&#8221;</p>
<p>This I agree with.  There are several ways to solve that problem, while keeping the separation between data and metadata.  These solutions therefore support other core relational features such as typing and referential integrity, without employing the &#8220;Inner Platform Effect&#8221; antipattern.</p>
<p>If you really, *REALLY* need to support custom user-defined attributes, I&#8217;d recommend the &#8220;Serialized LOB&#8221; approach, encoding the custom attributes in XML or JSON or whatever, and stuffing them into a blob.  You aren&#8217;t sacrificing any query power, because you&#8217;ve already given that up as soon as you consider EAV.</p>
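<p>A minimal sketch of the "Serialized LOB" idea, using Python's built-in sqlite3 and json modules. The `product` table, its columns, and the sample attributes are hypothetical; a TEXT column stands in for the blob. The conventional columns stay typed, while the custom attributes travel as one JSON-encoded value.</p>

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,                  -- ordinary, typed column
        custom_attrs TEXT NOT NULL DEFAULT '{}'  -- serialized user-defined attributes
    )
""")


def save_product(name, custom_attrs):
    """Store a product; custom attributes are serialized into one column."""
    cur = conn.execute(
        "INSERT INTO product (name, custom_attrs) VALUES (?, ?)",
        (name, json.dumps(custom_attrs)),
    )
    return cur.lastrowid


def load_product(product_id):
    """Load a product and merge the deserialized custom attributes back in."""
    name, serialized = conn.execute(
        "SELECT name, custom_attrs FROM product WHERE id = ?",
        (product_id,),
    ).fetchone()
    return {"name": name, **json.loads(serialized)}


pid = save_product("Widget", {"color": "red", "weight_kg": 1.5})
product = load_product(pid)
```

<p>In this sketch the database cannot filter or sort on what is inside `custom_attrs` without parsing it, which is the trade-off described above.</p>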
]]></content:encoded>
	</item>
	<item>
		<title>By: Arnold Daniels</title>
		<link>http://www.jasny.net/articles/an-alternative-way-of-eav-modeling/comment-page-1/#comment-209567</link>
		<dc:creator>Arnold Daniels</dc:creator>
		<pubDate>Thu, 29 Oct 2009 12:05:06 +0000</pubDate>
		<guid isPermaLink="false">http://blog.adaniels.nl/?p=88#comment-209567</guid>
		<description>Ivan: See &lt;a href=&quot;http://www.adaniels.nl/articles/eav-multi-value-fields&quot; rel=&quot;nofollow&quot;&gt;EAV multi-value fields&lt;/a&gt;</description>
		<content:encoded><![CDATA[<p>Ivan: See <a href="http://www.adaniels.nl/articles/eav-multi-value-fields" rel="nofollow">EAV multi-value fields</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ivan</title>
		<link>http://www.jasny.net/articles/an-alternative-way-of-eav-modeling/comment-page-1/#comment-209380</link>
		<dc:creator>Ivan</dc:creator>
		<pubDate>Fri, 23 Oct 2009 12:55:28 +0000</pubDate>
		<guid isPermaLink="false">http://blog.adaniels.nl/?p=88#comment-209380</guid>
		<description>Also, how do you create a SELECT statement to bring back several fields at once?</description>
		<content:encoded><![CDATA[<p>Also, how do you create a SELECT statement to bring back several fields at once?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ivan</title>
		<link>http://www.jasny.net/articles/an-alternative-way-of-eav-modeling/comment-page-1/#comment-209376</link>
		<dc:creator>Ivan</dc:creator>
		<pubDate>Fri, 23 Oct 2009 11:16:15 +0000</pubDate>
		<guid isPermaLink="false">http://blog.adaniels.nl/?p=88#comment-209376</guid>
		<description>Hello, very interesting way of doing EAV modeling. I would like to know how you implemented ranges and multi value fields, maybe it&#039;s time for another article? 

Thanks
Ivan</description>
		<content:encoded><![CDATA[<p>Hello, very interesting way of doing EAV modeling. I would like to know how you implemented ranges and multi value fields, maybe it&#8217;s time for another article? </p>
<p>Thanks<br />
Ivan</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Grant Czerepak</title>
		<link>http://www.jasny.net/articles/an-alternative-way-of-eav-modeling/comment-page-1/#comment-110362</link>
		<dc:creator>Grant Czerepak</dc:creator>
		<pubDate>Fri, 06 Feb 2009 23:24:21 +0000</pubDate>
		<guid isPermaLink="false">http://blog.adaniels.nl/?p=88#comment-110362</guid>
		<description>SQL Databases are not the environment for creating an EAV database.

A correctly functioning EAV database management system has to be built from the ground up.

Check out http://www.lazysoft.com and the Sentences database

Sentences provides the architecture, the schema, data, form and query tools you need to create an EAV database that is powerful, scalable and distributable.

Relational databases are based on lattices, not the scalable networks of EAV.</description>
		<content:encoded><![CDATA[<p>SQL Databases are not the environment for creating an EAV database.</p>
<p>A correctly functioning EAV database management system has to be built from the ground up.</p>
<p>Check out <a href="http://www.lazysoft.com" rel="nofollow">http://www.lazysoft.com</a> and the Sentences database</p>
<p>Sentences provides the architecture, the schema, data, form and query tools you need to create an EAV database that is powerful, scalable and distributable.</p>
<p>Relational databases are based on lattices, not the scalable networks of EAV.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Travis Paxton</title>
		<link>http://www.jasny.net/articles/an-alternative-way-of-eav-modeling/comment-page-1/#comment-66501</link>
		<dc:creator>Travis Paxton</dc:creator>
		<pubDate>Tue, 30 Sep 2008 03:43:23 +0000</pubDate>
		<guid isPermaLink="false">http://blog.adaniels.nl/?p=88#comment-66501</guid>
		<description>I am interested in how you approach the storage of multiple value options and how to retrieve them using your database schema.  I&#039;m also interested in how you group your data into &quot;rows&quot; of information.  Thanks!</description>
		<content:encoded><![CDATA[<p>I am interested in how you approach the storage of multiple value options and how to retrieve them using your database schema.  I&#8217;m also interested in how you group your data into &#8220;rows&#8221; of information.  Thanks!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: MT</title>
		<link>http://www.jasny.net/articles/an-alternative-way-of-eav-modeling/comment-page-1/#comment-51441</link>
		<dc:creator>MT</dc:creator>
		<pubDate>Tue, 26 Aug 2008 00:02:35 +0000</pubDate>
		<guid isPermaLink="false">http://blog.adaniels.nl/?p=88#comment-51441</guid>
		<description>This is very much a topical debate for me, as I am facing a similar conundrum with our own database structure.

Our application offers users the ability to create user defined fields. We have one client who has created 150+!!

The way I originally tackled this was via a simple EAV mechanism like this:

table UDF_DEFINITION
  id (integer) PK
  caption (varchar)
  datatype (varchar) constrained by domain to presets such as &#039;Text&#039;, &#039;Date&#039;, &#039;Number&#039; etc.

table UDF_VALUE
  id (integer) PK
  id_udfdefinition (integer) FK to UDF_DEFINITION table
  id_client (integer) FK to CLIENT table
  data (varchar) i.e. value

So I could do something like this:

  SELECT uv.* FROM udf_value uv JOIN client c on c.id = uv.id_client

This model was OK when we were returning ALL the data into a grid. However, it started to fall apart at the seams when we were asked to return paged search results based on search criteria.

The problem here, as Antonin says, is with sorting and paging. The problems were twofold:

1) Storing all my UDF data as VARCHAR means it is a pain in the *ass to order the data, because the values do not have the correct data type.

2) You cannot apply a sort until you&#039;ve returned the whole result set, due to the pivoted nature of the data. So this becomes a cross-tab-like performance hog on searches. A no-no.

I have rejected altering the schema on the fly because it seems inherently dangerous when you have multiple users connecting concurrently. Also if I have to create a table for each UDF then I&#039;ll end up with 150+ joins in my SQL! Also, for me it&#039;s risky. In my experience, if it can go wrong, it WILL go wrong. :)

I have rejected using a non RDBMS storage type as all the rest of the app works off flat tables. So I refuse to go to that level of complexity for one feature.

I am now leaning towards distributing the data into separate columns. The possibilities I see are:

1) I can then return the whole row and pick off my values in code using a switch by parsing the type for that definition id.
2) I can keep a read-only varchar copy of the data updated via triggers, so that I can still do a search on a plain text field. (I haven&#039;t thought this one through fully, so it may not work.)

Any comments or assistance welcome.</description>
		<content:encoded><![CDATA[<p>This is very much a topical debate for me, as I am facing a similar conundrum with our own database structure.</p>
<p>Our application offers users the ability to create user defined fields. We have one client who has created 150+!!</p>
<p>The way I originally tackled this was via a simple EAV mechanism like this:</p>
<p>table UDF_DEFINITION<br />
  id (integer) PK<br />
  caption (varchar)<br />
  datatype (varchar) constrained by domain to presets such as &#8216;Text&#8217;, &#8216;Date&#8217;, &#8216;Number&#8217; etc.</p>
<p>table UDF_VALUE<br />
  id (integer) PK<br />
  id_udfdefinition (integer) FK to UDF_DEFINITION table<br />
  id_client (integer) FK to CLIENT table<br />
  data (varchar) i.e. value</p>
<p>So I could do something like this:</p>
<p>  SELECT uv.* FROM udf_value uv JOIN client c on c.id = uv.id_client</p>
<p>This model was OK when we were returning ALL the data into a grid. However, it started to fall apart at the seams when we were asked to return paged search results based on search criteria.</p>
<p>The problem here, as Antonin says, is with sorting and paging. The problems were twofold:</p>
<p>1) Storing all my UDF data as VARCHAR means it is a pain in the *ass to order the data, because the values do not have the correct data type.</p>
<p>2) You cannot apply a sort until you&#8217;ve returned the whole result set, due to the pivoted nature of the data. So this becomes a cross-tab-like performance hog on searches. A no-no.</p>
<p>I have rejected altering the schema on the fly because it seems inherently dangerous when you have multiple users connecting concurrently. Also if I have to create a table for each UDF then I&#8217;ll end up with 150+ joins in my SQL! Also, for me it&#8217;s risky. In my experience, if it can go wrong, it WILL go wrong. <img src='http://www.jasny.net/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>I have rejected using a non RDBMS storage type as all the rest of the app works off flat tables. So I refuse to go to that level of complexity for one feature.</p>
<p>I am now leaning towards distributing the data into separate columns. The possibilities I see are:</p>
<p>1) I can then return the whole row and pick off my values in code using a switch by parsing the type for that definition id.<br />
2) I can keep a read-only varchar copy of the data updated via triggers, so that I can still do a search on a plain text field. (I haven&#8217;t thought this one through fully, so it may not work.)</p>
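<p>A rough sketch of the separate-columns idea, using Python's built-in sqlite3 module. The `data_number` column and the sample rows are hypothetical additions to the UDF_VALUE table described earlier; the point is only that a typed column lets the database do the sorting and paging, whereas the text column sorts numbers as strings.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE udf_value (
        id INTEGER PRIMARY KEY,
        id_udfdefinition INTEGER NOT NULL,
        id_client INTEGER NOT NULL,
        data TEXT,         -- the original all-VARCHAR value
        data_number REAL   -- hypothetical typed column for 'Number' fields
    );
    INSERT INTO udf_value (id_udfdefinition, id_client, data, data_number) VALUES
        (1, 101, '9', 9),
        (1, 102, '10', 10),
        (1, 103, '100', 100);
""")

SORTABLE_COLUMNS = ("data", "data_number")


def clients_sorted_by(column, limit, offset):
    """Return one page of client ids for definition 1, ordered by the given column."""
    # Column names cannot be bound as parameters, so check against a whitelist.
    if column not in SORTABLE_COLUMNS:
        raise ValueError(f"cannot sort by {column!r}")
    rows = conn.execute(
        f"SELECT id_client FROM udf_value "
        f"WHERE id_udfdefinition = 1 ORDER BY {column} LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()
    return [client for (client,) in rows]


as_text = clients_sorted_by("data", 3, 0)           # string order: '10', '100', '9'
as_number = clients_sorted_by("data_number", 3, 0)  # numeric order: 9, 10, 100
second_page = clients_sorted_by("data_number", 2, 2)
```

<p>With the typed column, ORDER BY and LIMIT/OFFSET run inside the database, so only the requested page is returned.</p>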
<p>Any comments or assistance welcome.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Arnold Daniels</title>
		<link>http://www.jasny.net/articles/an-alternative-way-of-eav-modeling/comment-page-1/#comment-42319</link>
		<dc:creator>Arnold Daniels</dc:creator>
		<pubDate>Mon, 04 Aug 2008 09:47:06 +0000</pubDate>
		<guid isPermaLink="false">http://blog.adaniels.nl/?p=88#comment-42319</guid>
		<description>Hi Adam,

It&#039;s true it looks somewhat like the OTLT pattern, however it does not have most of the described drawbacks.

&lt;a href=&quot;http://decipherinfosys.wordpress.com/2007/02/01/otlt-one-true-lookup-table&quot; target=&quot;_blank&quot; rel=&quot;nofollow&quot;&gt;http://decipherinfosys.wordpress.com/2007/02/01/otlt-one-true-lookup-table&lt;/a&gt;

a.) Values are only joined to one table `field_option`.
b.) Numeric values are saved as numbers and not as strings.
c.) Strings are saved in a variable sized column. The index makes sure it&#039;s fairly searchable.
d.) Values are only joined using INTEGERS.

It&#039;s true though that you have to write complex and relatively slow queries to filter using the EAV table.

The issue I have with the articles condemning these structures is that they are comparing very simple DB schemas with EAV. In that case there is no reason to use EAV. This structure is useful when the alternative is using either hundreds of tables or thousands of fields. Wanting to make the DB more flexible is not a valid reason to choose EAV.</description>
		<content:encoded><![CDATA[<p>Hi Adam,</p>
<p>It&#8217;s true it looks somewhat like the OTLT pattern, however it does not have most of the described drawbacks.</p>
<p><a href="http://decipherinfosys.wordpress.com/2007/02/01/otlt-one-true-lookup-table" target="_blank" rel="nofollow">http://decipherinfosys.wordpress.com/2007/02/01/otlt-one-true-lookup-table</a></p>
<p>a.) Values are only joined to one table `field_option`.<br />
b.) Numeric values are saved as numbers and not as strings.<br />
c.) Strings are saved in a variable sized column. The index makes sure it&#8217;s fairly searchable.<br />
d.) Values are only joined using INTEGERS.</p>
<p>It&#8217;s true though that you have to write complex and relatively slow queries to filter using the EAV table.</p>
<p>The issue I have with the articles condemning these structures is that they are comparing very simple DB schemas with EAV. In that case there is no reason to use EAV. This structure is useful when the alternative is using either hundreds of tables or thousands of fields. Wanting to make the DB more flexible is not a valid reason to choose EAV.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Adam Machanic</title>
		<link>http://www.jasny.net/articles/an-alternative-way-of-eav-modeling/comment-page-1/#comment-41153</link>
		<dc:creator>Adam Machanic</dc:creator>
		<pubDate>Fri, 01 Aug 2008 18:52:03 +0000</pubDate>
		<guid isPermaLink="false">http://blog.adaniels.nl/?p=88#comment-41153</guid>
		<description>You&#039;ve reinvented what is referred to as OTLT, or the One True Lookup Table.

http://www.google.com/search?q=otlt</description>
		<content:encoded><![CDATA[<p>You&#8217;ve reinvented what is referred to as OTLT, or the One True Lookup Table.</p>
<p><a href="http://www.google.com/search?q=otlt" rel="nofollow">http://www.google.com/search?q=otlt</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Arnold Daniels</title>
		<link>http://www.jasny.net/articles/an-alternative-way-of-eav-modeling/comment-page-1/#comment-41017</link>
		<dc:creator>Arnold Daniels</dc:creator>
		<pubDate>Fri, 01 Aug 2008 13:51:06 +0000</pubDate>
		<guid isPermaLink="false">http://blog.adaniels.nl/?p=88#comment-41017</guid>
		<description>Hi Quinton,

First of all thanks for commenting on my blog and participating in what has become a bit of a discussion.

Using the result of a function in a WHERE condition on a substantial number of records will surely be slow. I don&#039;t think anyone will dispute that. However, I have not found any situation where I need to cast a field on-the-fly. If you do have a situation like that, this method will not work well.

I only need to select based on integers, which is quite fast. Converting the filter values into integers before putting them into an SQL statement does not depend on the number of records and is unlikely to cause any performance hit.

Could you please give an example where you need to cast on-the-fly in a WHERE clause?

I will try to free up some time to benchmark the 3 approaches in order to reach a conclusion.</description>
		<content:encoded><![CDATA[<p>Hi Quinton,</p>
<p>First of all thanks for commenting on my blog and participating in what has become a bit of a discussion.</p>
<p>Using the result of a function in a WHERE condition on a substantial number of records will surely be slow. I don&#8217;t think anyone will dispute that. However, I have not found any situation where I need to cast a field on-the-fly. If you do have a situation like that, this method will not work well.</p>
<p>I only need to select based on integers, which is quite fast. Converting the filter values into integers before putting them into an SQL statement does not depend on the number of records and is unlikely to cause any performance hit.</p>
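<p>Roughly, the idea can be sketched as follows, using Python's built-in sqlite3 module. The table layouts, the `entity_value` name, and the sample rows are assumptions for illustration and not the actual schema from the article; only the `field_option` name is taken from the discussion. The filter value is turned into an integer id once, and the main query then compares integers only.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical layout: one row per selectable option of a field.
    CREATE TABLE field_option (
        id INTEGER PRIMARY KEY,
        fid INTEGER NOT NULL,
        label TEXT NOT NULL,
        UNIQUE (fid, label)
    );
    -- Hypothetical value table: entities point at options by integer id.
    CREATE TABLE entity_value (
        entity_id INTEGER NOT NULL,
        fid INTEGER NOT NULL,
        option_id INTEGER NOT NULL REFERENCES field_option (id)
    );
    CREATE INDEX entity_value_lookup ON entity_value (fid, option_id);
    INSERT INTO field_option VALUES (1, 7, 'red'), (2, 7, 'blue');
    INSERT INTO entity_value VALUES (100, 7, 1), (101, 7, 2), (102, 7, 1);
""")


def option_id_for(fid, label):
    """Convert a filter value into its integer option id (one small lookup)."""
    row = conn.execute(
        "SELECT id FROM field_option WHERE fid = ? AND label = ?",
        (fid, label),
    ).fetchone()
    return row[0] if row else None


def entities_with(fid, label):
    """Find entities by comparing integers only; no casting in the WHERE clause."""
    option_id = option_id_for(fid, label)
    if option_id is None:
        return []
    rows = conn.execute(
        "SELECT entity_id FROM entity_value "
        "WHERE fid = ? AND option_id = ? ORDER BY entity_id",
        (fid, option_id),
    ).fetchall()
    return [entity_id for (entity_id,) in rows]


red_entities = entities_with(7, "red")
```

<p>The lookup touches the small option table once per filter value, so its cost does not grow with the number of value rows.</p>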
<p>Could you please give an example where you need to cast on-the-fly in a WHERE clause?</p>
<p>I will try to free up some time to benchmark the 3 approaches in order to reach a conclusion.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
