How to dump CSV files to MySQL (the easy way)

Sometimes, the only option for dumping data into a database is using CSV files – they’re easy for non-technical people to work with, they’re a fairly efficient way of storing data, and they can be exported straight from Excel.

There’s one huge problem with them, though…they’re a real pain in the arse to work with.

So, I decided to create a reliable, easy to use way for dumping CSV files, no matter how big or small, to a MySQL database. I will, however, show some alternatives along the way.

When it comes to getting CSV files into a MySQL database, you generally have a few options:

  • Writing a custom SQL import using LOAD DATA INFILE
  • Using the PHPMyAdmin import interface
  • Using an existing tool such as BigDump
  • Writing a custom import script
  • Using my tool (CSVDump)

Each method has advantages and disadvantages, but if you’d like to use what I consider the best method, scroll down to the bottom of the post to see my custom CSVDump tool, which gives you the most customisation and an overall smooth experience.

Importing a CSV with a custom MySQL import

This method requires the fewest third-party tools and has the advantage that it can be run from the command line, so it’s easy to do via SSH alone.

Pros:

  • Requires very few third-party tools
  • Very quick for simple imports

Cons:

  • Can be hard to debug if things go wrong
  • No customisation of imports
  • Hard to use if you’re not familiar with SQL

To import data into a table from a CSV file using MySQL alone, first create your table, then you can execute a command similar to this:

LOAD DATA LOCAL INFILE '/file.csv'
INTO TABLE `table`
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(`field1`, `field2`, `field3`);

Simple enough; you should then have your populated table.

In this example, table is the table name and field1, field2, etc. are the column names (note: this won’t automatically skip header rows!).
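
If your file has a header row and quoted values (both common in Excel exports), a slightly fuller command handles both. This is a sketch assuming a comma-separated file with double-quoted fields and a single header line – IGNORE 1 LINES is what skips the header:

LOAD DATA LOCAL INFILE '/file.csv'
INTO TABLE `table`
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(`field1`, `field2`, `field3`);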

Importing a CSV file using the PHPMyAdmin import interface

Using PHPMyAdmin is similar to the SQL method above.

Pros:

  • PHPMyAdmin is installed on virtually every server running MySQL
  • Can be very quick

Cons:

  • No customisation on import
  • Hard to debug if things go wrong

The PHPMyAdmin method is fairly robust and for simple imports is my preferred way of doing things. When you log in and select a table, hit the import button at the top of the screen and you’re presented with the following dialog:

[Screenshot: the PHPMyAdmin import dialog]

You can either upload your file or choose a directory where the file resides. Note that the default ‘Format-Specific Options’ generally don’t match what you’ll find in real CSV files – in most cases you’ll want to change the ‘Columns escaped with’ option to ‘\’.

Using BigDump

BigDump is very good at importing huge files on web servers where space, time, or resources are otherwise limited.

Pros:

  • Can import SQL and CSV files (though the author doesn’t recommend using it for CSV files)
  • Can be very quick
  • Spawns separate import processes and breaks the file into chunks, meaning your server won’t time out or run out of memory

Cons:

  • Extremely unfriendly to use (in my opinion, anyway)
  • No customisation on import
  • It struggles a bit with CSV files

You can read more about using BigDump on their site.

Importing CSV files into MySQL using a custom script

This method is what most of us would use if we needed to process the data at all before inserting it into our MySQL database.

Pros:

  • Highly customisable
  • You’ll know exactly what’s going on

Cons:

  • Not at all friendly to non-developers
  • One-shot usage
  • Can be very slow, especially on large files, if not handled correctly

A custom import script might look something like this (with some parts omitted for brevity):

<?php

// get mysqli connection (omitted)

// open the data file ($fn holds the path to the CSV)
$handle = fopen($fn, "r");

// prepare the insert once, outside the loop - quicker, and it avoids
// broken queries when the data itself contains quotes
$stmt = $mysqli->prepare("INSERT INTO importing (text, number) VALUES (?, ?)");

// loop over rows in the data file
while (($data = fgetcsv($handle, 1000, ",")) !== false) {
    // trim the first column
    $data[0] = trim($data[0]);

    // bind the row values and insert into the database
    $stmt->bind_param("ss", $data[0], $data[1]);
    $stmt->execute() or die($stmt->error);
}

fclose($handle);

Pretty simple, and pretty effective, but if you’re doing this for lots of files it’s a pain to write code for every single one, and you get very little feedback when things go wrong.
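
One easy speed-up if you do go down this route (an aside, not something the script above requires): wrap the loop in a single transaction so MySQL doesn’t flush to disk after every row. On InnoDB tables this can make a large import dramatically faster:

$mysqli->begin_transaction();

// ... the fgetcsv/insert loop from above ...

$mysqli->commit();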

Importing a CSV file into a MySQL table using CSVDump

To counter some of the problems with the methods outlined above, I created a tool that makes importing big CSV files easy and fast, with the added benefit of having a user interface AND being able to process data on a column/row basis.

Pros:

  • Processes data on a column/row basis – this is the real power of CSVDump
  • Creates tables on the fly
  • User interface
  • Can be very fast on simple imports
  • Great for repeated import processes

Cons:

  • Only imports CSVs

You can download CSVDump on GitHub, where there are also usage instructions.

When you have the file in your web directory, navigate to the homepage and you’ll be presented with this screen:

[Screenshot: the CSVDump home screen]

To dump a CSV file into the database, first copy your file into the ‘input’ directory; it should then show up in the file table (you don’t need to refresh the page), along with an estimate of the number of rows it contains.

Once your file has appeared, you can change the database name, table name, and any other settings that appear here. Most settings should be familiar, but I’ll go over the ones that might not be:

Processor Class

This tells the software which pre/post processor to use on the data. We’ll go over how to create a processor class in a moment.

Chunk Size

The size of the chunks to use when importing files. If you run into memory limits, scale this down; if you want to speed things up a bit, scale it up. 100 to 1000 should be good for most applications.
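
To illustrate why chunk size trades memory against speed, here’s a minimal sketch of the general technique (not CSVDump’s actual code – the connection, table, column, and file names are all made up): rows are buffered and written as one multi-row INSERT per chunk, so there are far fewer round trips to MySQL, but each chunk has to fit in memory.

<?php
// assumes an open $mysqli connection
$chunkSize = 500;
$rows = array();

$handle = fopen('input/books.csv', 'r');
while (($data = fgetcsv($handle, 0, ',')) !== false) {
    // escape each value and buffer it as part of a multi-row VALUES list
    $rows[] = "('" . $mysqli->real_escape_string($data[0]) . "')";
    if (count($rows) >= $chunkSize) {
        // one INSERT per chunk instead of one per row
        $mysqli->query("INSERT INTO `books` (`title`) VALUES " . implode(',', $rows));
        $rows = array();
    }
}
// flush the final, partial chunk
if ($rows) {
    $mysqli->query("INSERT INTO `books` (`title`) VALUES " . implode(',', $rows));
}
fclose($handle);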

CSVDump processor classes

Creating a processor is easy. There are two types, however: DB processors and row processors.

Row processors are given a row of data, which they can process and hand back to CSVDump. A row processor extends the abstract Processor_Row_Abstract class and must implement the process method. The file must be saved in the /CSVDump/Processor/Row folder with the filename matching the last portion of the class name, e.g.:

Books.php

<?php

include "Abstract.php";

/**
 * Processor_Row_Books
 */

class Processor_Row_Books extends Processor_Row_Abstract
{

    /**
     * init
     * @param  array|null $params Optional processor parameters
     * @return void
     */
    public function init($params = null)
    {
        parent::init($params);
    }

    /**
     * process a row of data
     * @return array
     */
    public function process($row)
    {
        $row['Title'] = $this->titleCase($row['Title']);
        $row['Slug'] = $this->removeStopwords($row['Title']);
        $row['Description'] = strip_tags($row['Description']);
        $row['Price'] = str_replace('$', '', $row['Price']);
        $row['Rating'] = round((float)$row['Rating'], 2);
        $row['Author'] = $row['AuthorFirstName'] . ' ' . $row['AuthorLastName'];
        return $row;
    }

    /**
     * Converts a string to title case
     * @param  string $str
     * @return string      String, in Title Case
     */
    public function titleCase($str)
    {
        $stopWords = array(
            'for',
            'the',
            'to',
            'of'
        );
        $words = explode(' ', strtolower($str));
        foreach ($words as $i => &$word) {
            // capitalise every word except stop words; the first word is
            // always capitalised, even if it is a stop word
            if ($i === 0 || !in_array($word, $stopWords)) {
                $word = ucfirst($word);
            }
        }
        unset($word); // break the reference left by the foreach
        return implode(' ', $words);
    }

    /**
     * Removes stopwords from $str
     * @param  string $str The string with stopwords
     * @return string      String without stopwords
     */
    public function removeStopwords($str)
    {
        $needles = array(
            '/\bfor\b/i',
            '/\bthe\b/i',
            '/\bto\b/i',
            '/\bof\b/i'
        );
        // strip the stop words, then collapse the leftover double spaces
        $str = preg_replace($needles, '', $str);
        return trim(preg_replace('/\s+/', ' ', $str));
    }

}

Simple!
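
If you want to sanity-check a processor before running a full import, you can exercise it directly. This is a hypothetical snippet – it assumes the class above is saved at /CSVDump/Processor/Row/Books.php, that your CSV headers match the array keys, and that no init parameters are needed; neither the sample data nor this workflow is part of CSVDump itself:

<?php
require 'CSVDump/Processor/Row/Books.php';

// a sample row, keyed by CSV header
$row = array(
    'Title'           => 'the fellowship of the ring',
    'Description'     => '<p>First part of the trilogy.</p>',
    'Price'           => '$12.99',
    'Rating'          => '4.4567',
    'AuthorFirstName' => 'J.R.R.',
    'AuthorLastName'  => 'Tolkien',
);

$processor = new Processor_Row_Books();
print_r($processor->process($row));
// Title comes back title-cased, Price as '12.99', Rating as 4.46
// and Author as 'J.R.R. Tolkien'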

DB processors are similar, but are used to make changes at the table level. Follow the steps above for row processors, but save the file in the DB folder instead.

Here’s an example of a DB processor:

<?php
include "Abstract.php";

/**
 * Processor_DB_Books
 */
class Processor_DB_Books extends Processor_DB_Abstract
{
    public function preImport(){
        // Adding/modifying a column using the built in method
        $options = array(
            'nullable' => true,
            'default'  => null,
        );
        $this->addColumn('Slug', Processor_DB_Abstract::TYPE_VARCHAR, 255, $options);

        $options = array(
            'nullable' => true,
            'default'  => null,
        );
        $this->addColumn('Author', Processor_DB_Abstract::TYPE_VARCHAR, 255, $options);

        // Adding/modifying a column using raw SQL
        $SQL = "ALTER TABLE `" . $this->getTableName() . "` ADD `SomeText` VARCHAR(255) NULL DEFAULT NULL ;";
        try{
            $this->query($SQL, true);
        } catch (\Exception $e){
            // do nothing, this is okay because on multiple imports it might fail on
            // the second import when the column already exists.
        }
        return $this;
    }

    public function postImport(){
        $options = array(
            'nullable' => true,
            'default'  => 1
        );

        // Because of how this works, repeat definitions need to be taken care of...
        $newname = 'Enabled';
        if($this->tableColumnExists($newname)) {
            $this->dropColumn($newname);
        }
        $this->modifyColumn('Active', Processor_DB_Abstract::TYPE_BOOLEAN, null, $options, null, $newname);

        // And a raw SQL query post import
        $SQL = "UPDATE `".$this->getTableName()."` SET `SomeText`='HELLO';";
        $this->query($SQL);
        return $this;
    }
}

Conclusion

I’ve managed to use CSVDump for some very complex and sophisticated imports – not just one-off imports, but ones that made their way into my daily routine.

It’s a typical workplace occurrence to find yourself having to import thousands of malformed entries from some crappy CSV file into a MySQL database, and I find CSVDump indispensable for processing and importing that data because it’s just so easy!

6 Replies to “How to dump CSV files to MySQL (the easy way)”

  1. Thanks man for this awesome product. One question though: I believe that in order to make the tool really accessible to anyone, it would need an upload functionality (so the end user doesn’t have to have an FTP tool to upload the CSV to /CSVDump/input).
    Is that difficult to add to the tool?

    Thanks,

    Julien

    1. It wouldn’t be difficult, but I explicitly didn’t include one because this app is not secure at all, and I’m not prepared to do so.

      By all means, add one, but I don’t want to be held responsible for anything that gets executed on your server 🙂

      H

  2. Quick question: I’m looking to update my existing MySQL database in bulk using CSV as it’s a once-off change – does the CSVDump code you created do this?

    That is, I’m updating existing data on an existing, in-use table and not importing new data.

    1. Yeah – just untick the replace data checkbox under options.

      If you don’t need to do any processing of the data during/before/after though, I might suggest using PHPMyAdmin instead.

        1. Unfortunately, I’m importing because I’m processing the data via Excel. Not a coding wiz, so Excel/CSV for me was the smartest way to do it.

        Awesome tool though! This saves us so much hassle moving forward!

          1. Quick one: I’m formatting my CSV and just wanted to know, for the CSVDump code, do I need to include all the columns in the CSV for it to import properly when updating existing data?

          E.g. if my existing table has 5 columns, do I need all 5 columns in my CSV during the import to successfully update the existing data?
