Cruise Factory Data Feed Importer

Cruise Factory provides a data feed to the travel industry containing information on cruise products, ships, destinations, and pricing from various cruise lines. Benchmark was tasked with integrating this data into an existing Drupal 7 website.


The challenge with the data feed is that the data set is large and contains many different types of data, all connected via IDs and referenced-ID fields. To render a single deal, for example, you need to combine data from the feeds for ship, destination, ports, itinerary, cruise line, star rating, pricing (there are 5 potential pricing models in use, each in a separate table with its own way of handling pricing), cabins, amenities (such as bed size, televisions, etc.), ship photos, and destination photos.

Keeping all these tables separate in the MySQL database led to massive server load and a very slow site, because rendering a single page required joined database queries across 10 or 12 different tables. On a search results page showing 10 deals at once, the experience was unusable!

The solution I devised was to first run the mass import into separate entities as envisaged by Cruise Factory. Then a Drupal View with plenty of relationships and views-within-views combines all the data required to render a complete deal. Finally, using Views Data Feeds, a subsequent importer imports these complete entities.
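The flattening step can be illustrated outside Drupal with a small Python sketch. This is not the actual View or importer configuration. The table names, field names, and the build_flat_deal helper are all hypothetical, and only a few of the related data types are shown. The idea is to resolve every ID reference once, at import time, so that each deal becomes a single self-contained record.

```python
# Hypothetical, heavily simplified stand-ins for the separate feed tables.
ships = {1: {"name": "Example Ship", "cruiseline_id": 10, "star_rating": 4}}
cruiselines = {10: {"name": "Example Line"}}
destinations = {5: {"name": "Example Destination"}}
prices = {
    100: [
        {"cabin": "Inside", "price": 999},
        {"cabin": "Balcony", "price": 1499},
    ]
}
cruises = [{"id": 100, "ship_id": 1, "destination_id": 5, "nights": 7}]


def build_flat_deal(cruise):
    """Follow each ID reference once and return one self-contained record."""
    ship = ships[cruise["ship_id"]]
    line = cruiselines[ship["cruiseline_id"]]
    destination = destinations[cruise["destination_id"]]
    cabin_prices = prices.get(cruise["id"], [])
    return {
        "id": cruise["id"],
        "nights": cruise["nights"],
        "ship": ship["name"],
        "star_rating": ship["star_rating"],
        "cruiseline": line["name"],
        "destination": destination["name"],
        # Lowest available price, or None when no pricing rows exist.
        "from_price": min((p["price"] for p in cabin_prices), default=None),
        "cabins": [p["cabin"] for p in cabin_prices],
    }


# One flat record per deal; rendering or indexing needs no further lookups.
flat_deals = [build_flat_deal(cruise) for cruise in cruises]
```

The trade-off is duplicated data (the ship name is copied into every deal that uses it) in exchange for pages that render from a single record instead of a 10-to-12-table join.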

This resulted in many benefits. The search index was more robust, since all the data for a deal was together in a single entity, which allowed us to search more meaningfully and faster. Rendering of deals was significantly faster, and search pages were markedly faster as well.

Lastly, since the data feed had no datestamps or high-water marks on anything, the full feed had to be erased and re-imported fresh every day, which could take over an hour. This can now be done behind the scenes, with the actual front-facing entities missing for only a few minutes while the final import is carried out. This is done at midnight, when there is no site traffic, but with clever use of caching and Apache Solr, the downtime isn't even visible any more anyway.
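The "rebuild behind the scenes, then publish" pattern can be sketched in Python as follows. This is a simplification, not the Drupal implementation: the DealStore class and example_feed function are hypothetical, and the final step here is an in-memory swap, whereas the real final import took a few minutes. The point is that the slow work happens in a staging area, and the live data is only touched at the very end.

```python
class DealStore:
    """Holds the deals that the front end reads (hypothetical helper)."""

    def __init__(self):
        self.live = {}

    def rebuild(self, fetch_feed, build_deal):
        """Do the slow full import into staging, then publish in one step."""
        staging = {}
        for record in fetch_feed():
            deal = build_deal(record)
            staging[deal["id"]] = deal
        # The live data is replaced only after the whole feed has been
        # processed; if the feed fails part-way, yesterday's deals remain.
        self.live = staging
        return len(staging)


def example_feed():
    """Stand-in for the daily full feed, which has no change markers."""
    yield {"id": 1, "name": "Example deal A"}
    yield {"id": 2, "name": "Example deal B"}


store = DealStore()
store.live = {99: {"id": 99, "name": "Yesterday's deal"}}
count = store.rebuild(example_feed, lambda record: dict(record))
```

Because the feed carries no change markers, every run is a full replacement, so stale deals such as id 99 above disappear without any explicit delete step.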