Create_pages import script, memory leak?

Hi,
I'm trying to populate the system (Solaris, v3.6.1, Oracle) with random pages to perform some tests, but the CLI seems to be running out of memory. With the PHP CLI memory limit set at 32M, I seem to be able to create only about 570 pages or so…



Has anybody managed to create a lot of pages using the create_pages script?

Cheers.

Anyone?

The answer is "No" under PHP4. I also max out around 300 - 400 pages.

Thanks Avi.

You could try setting the memory limit to -1 (which removes the limit) to see how far you get. :) I do NOT recommend this on a Production server for hopefully obvious reasons.
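
For a quick experiment, the limit can also be lifted for a single run instead of in php.ini. This is only a minimal sketch: `createDummyPage()` is a made-up stand-in for the real page-creation call, and the 200-page loop and 50-page reporting interval are arbitrary.

```php
<?php
// Remove the memory limit for this run only (not for production use).
ini_set('memory_limit', '-1');

// Hypothetical stand-in for the real page-creation call.
function createDummyPage($i)
{
    return 'page_' . $i;
}

$pages = array();
for ($i = 1; $i <= 200; $i++) {
    $pages[] = createDummyPage($i);
    // Report memory every 50 pages to see whether usage keeps climbing.
    if ($i % 50 == 0) {
        echo $i . ' pages: ' . memory_get_usage() . " bytes\n";
    }
}
```

The same setting can be passed on the command line with `php -d memory_limit=-1 your_script.php`, which avoids editing the script at all.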

If you have PCNTL compiled into PHP, you could use pcntl_fork() every 50 assets or so. By doing this, you essentially create a new instance of the script, create 50 assets, quit the script (freeing up all memory), and start a new script to create 50 more.


That's the process I use when I write import scripts, and I have created upwards of 4000 assets without running out of memory.


    #!/usr/bin/php
    <?php

    // Fork the current process.
    // Returns 0 in the child, the child's PID in the parent (once the
    // child has exited), or null if the fork failed.
    function fork()
    {
        $child_pid = pcntl_fork();
        switch ($child_pid) {
            case -1:
                trigger_error('Uh-oh, Spaghettios!');
                return null;
            case 0:
                return $child_pid;
            default:
                // Parent: block until the child has finished.
                $status = null;
                pcntl_waitpid($child_pid, $status);
                return $child_pid;
        }
    }


    // This forks for EVERY asset, you would do % (modulus) 50 or something and do it in chunks...

    for ($ii = 0; $ii < 1000; $ii++) {
        $pid = fork();
        if ($pid === null) {
            // Fork failed: stop rather than run the child code in the parent.
            exit(1);
        }
        if ($pid === 0) {
            // Asset creation code goes here
            // ...

            echo 'The current PID of this forked process is ' . posix_getpid() . "\n";

            // Exit from the forked process and return to the parent
            exit;
        }
    }

    ?>

Just a note that you need PCNTL support for the Squiz Server to work (it forks processes as well). So, if your Squiz Server is working, then you have PCNTL support in PHP. :)
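
If you want to double-check from the command line, a small sketch like this should tell you (both functions are standard PHP):

```php
<?php
// Report whether the PCNTL extension is available to this PHP binary.
$hasPcntl = extension_loaded('pcntl') && function_exists('pcntl_fork');
echo $hasPcntl ? "PCNTL is available\n" : "PCNTL is NOT available\n";
```

Run it with the same `php` binary you use for the import script, since the CLI and web server builds of PHP can be compiled with different extensions. `php -m` lists the loaded modules as well.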

That's a great lead, I'll give it a spin.
Thanks Nathan and Avi.

This is what I’m getting as soon as it forks:

    +-------------------------
    | MySource Error
    |-------------------------
    | [ASSERT EXCEPTION] DB Error: unknown error
    | SELECT NEXTVAL('sq_internal_msg_seq') [nativecode=server closed the connection unexpectedly
    |       This probably means the server terminated abnormally
    |       before or while processing the request.] (LINE 279 IN [SYSTEM_ROOT]/core/include/internal_message.inc) [SYS0270]
    +-------------------------


It's actually the first time I've played around with forking, so I wouldn't be surprised if I were doing something wrong.



I took inspiration from Nathan's code but haven't done exactly the same thing, because my import script is based on create_pages and Buggy's migrator script. As Buggy did in his script, I first produce a result.php, which is a sequence of function calls.



e.g.:

    <?php
    myCreateAsset(…);
    myCreateAsset(…);
    ?>


Then in a second script I require_once the result.php which results in the functions being called one after the other.



At the start of the function I have the following:
    function myCreateAsset( $intId, $intPId, $pagename, $webpath)
    {
        global $assetMap;
        global $forkIndex;

        $forkIndex++;
        if( $forkIndex % 2 == 0 ) // Fork every second asset: TODO increase value :P
        {
            $forkIndex    = 0;
            $pid = pcntl_fork();

            // The parent waits for the child and then exits; the child carries on with the import.
            if     ( $pid <  0) { die("error"); }
            else if( $pid >  0) { $status = null; pcntl_waitpid(-1, $status); exit; }
            else                { echo "Child continuing: $pid\n";print_r($assetMap); /*Continue*/ }
        }

        // ... create asset code goes here ...
    }


Another reason I didn't exactly follow Nathan's code is that I don't see how you could easily apply the % for every 50 assets.
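
One way to get the "every 50 assets" behaviour is to let the parent loop over chunks rather than over assets, and have each child loop over the assets in its own chunk. A rough sketch, not tested against Matrix: `chunkRanges()`, `runChunk()` and `importInChunks()` are made-up names, and the asset creation itself is left as a placeholder.

```php
<?php
// Split $total items into array(start, end) ranges of at most $size items.
function chunkRanges($total, $size)
{
    $ranges = array();
    for ($start = 0; $start < $total; $start += $size) {
        $ranges[] = array($start, min($start + $size, $total));
    }
    return $ranges;
}

// Hypothetical placeholder: runs inside the child process.
function runChunk($start, $end)
{
    // Open per-process resources (e.g. the DB connection) here, in the child.
    for ($i = $start; $i < $end; $i++) {
        // Asset creation code goes here
    }
}

// Fork one child per chunk; the parent only forks and waits.
function importInChunks($total, $size)
{
    foreach (chunkRanges($total, $size) as $range) {
        $pid = pcntl_fork();
        if ($pid == -1) {
            trigger_error('Could not fork', E_USER_WARNING);
            return false;
        }
        if ($pid == 0) {
            // Child: do one chunk, then exit so its memory is released.
            runChunk($range[0], $range[1]);
            exit(0);
        }
        // Parent: wait for this child before starting the next chunk.
        $status = null;
        pcntl_waitpid($pid, $status);
    }
    return true;
}
```

With this layout, `importInChunks(1000, 50);` would fork 20 children one after another. The parent never creates assets itself, so its own memory use should stay roughly flat.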

Does anybody know what's happening (cf. the error at the start of the post)?

If I remember correctly, creating the DB connection outside of the pcntl_fork() (i.e. including init.inc before forking) will cause problems. The connection will be broken on the next pcntl_fork().




Just before including init.inc, I forked: it now seems to work.

I'll now try to import a few thousand assets and see whether my memory issues are fixed.



PS: A quick look at top after inserting 10 pages or so shows no PHP process with constantly increasing memory usage.

Ah, sorry. I should have mentioned that in my post. It might take a little longer for each asset to be created considering you're loading up Matrix every time you fork… so perhaps fork every 50 assets.

Sorry, I should have been more specific. I am loading Matrix only once… afaik.

I’m not sure if it is exactly what Greg was suggesting but I did the following (which works):


    // ... more code ...
    $pid = pcntl_fork();

    if     ( $pid <  0) { die("error"); }
    else if( $pid >  0) { $status = null; pcntl_waitpid(-1, $status); exit; }
    else                { echo "Child continuing..."; }

    // Load Matrix (and with it the DB connection) only after the first fork.
    require_once $SYSTEM_ROOT.'/core/include/init.inc';

    function myCreateAsset( $intId, $intPId, $pagename, $webpath)
    {
        // ... see post above for code ...
    }

    require_once($inFile); // Reads in result.php containing calls to myCreateAsset(...)
    // ... more code ...

There is obviously more to forking than what the man page or the PHP manual says. DB connections seem to be lost only if they are created outside any fork. Once in a child process, future generations will not lose them. I don't understand why, but it's OK for now…