Improving speed of a repeat/if loop

Need help with a script? This is the place to discuss how to get your code running!

Improving speed of a repeat/if loop

Postby sctell » Sun Jul 23, 2017 11:33 pm

I am using this script to sum approx 10,000 items in a tab/cr delimited list.

I am finding it very slow.

Any ideas for speeding this up.


Code: Select all
        repeat for each line tRec in tArray
            if (item 6 of tRec = tCategory) and (item 4 of tRec < 0) and (item 1 of tRec >= tStartDate) and (item 1 of tRec < tEndDate) then
                add item 4 of tRec to tSum
            end if
        end repeat



Thanks

Terry
sctell
 
Posts: 1124
Joined: Sun Jul 06, 2008 10:41 am

Re: Improving speed of a repeat/if loop

Postby komouton » Mon Jul 24, 2017 1:20 pm

Hi,

If most of lines are excluded by your conditions, the following script could be slightly rapider.
You can rearrange your four conditions according to the order of the power by which each condition excludes lines. Give the higher priority to the more powerful condition.

If most of lines are included by your conditions, this revision would be little effective.

Code: Select all
  repeat for each line tRec in tArray
    if (item 6 of tRec <> tCategory) then next repeat
    else
      if (item 4 of tRec >= 0) then next repeat
      else
        if (item 1 of tRec < tStartDate) then next repeat
        else
          if (item 1 of tRec >= tEndDate) then next repeat
        end if
      end if
      add item 4 of tRec to tSum
    end if
  end repeat
komouton
 
Posts: 105
Joined: Wed Oct 21, 2009 1:30 pm

Re: Improving speed of a repeat/if loop

Postby sctell » Tue Jul 25, 2017 1:01 am

komouton wrote:If most of lines are excluded by your conditions, the following script could be slightly rapider.


Thanks for your suggestion, it is quicker but not as fast as SQLite.

I have been experimenting with using SC arrays in place of SQLite (FMDB wrapper).

I have found that accessing the array data for display is quicker than SQLite but with regard to summing a column it is slower.

I was hoping that summing a column in a SC array could be quicker but unfortunately it is a lot slower than with SQLite (example below).

Code: Select all
put merge("select sum(amount) as totalValue from '[[tAccountName]]' where amount < 0 and date >= [[tStartDate]] and date < [[tEndDate]] and category = '[[tCategory]]'") into tSQL
put FMDB("executeQueryForFields","MyDB",tSQL,"totalValue") into tAmount


All the best

Terry
sctell
 
Posts: 1124
Joined: Sun Jul 06, 2008 10:41 am

Re: Improving speed of a repeat/if loop

Postby codegreen » Tue Jul 25, 2017 5:23 pm

FWIW while I doubt you'll beat SQLite, at least you should try to minimize the number of chunks you parse and reduce the complexity of the exclusionary tests per-line (since the interpreter won't bail early even if one fails), e.g.:

Code: Select all
repeat for each line tRec in tArray
  if (item 6 of tRec ≠ tCategory) then next repeat
  get item 1 of tRec
  if (it < tStartDate) then next repeat
  if (it >= tEndDate) then next repeat
  add it to tSum
end repeat

HTH,
-Mark
codegreen
 
Posts: 1503
Joined: Mon Jul 14, 2008 11:03 pm

Re: Improving speed of a repeat/if loop

Postby komouton » Tue Jul 25, 2017 7:17 pm

The argument of "add" is not "it (item 1 of tRec)", but "item 4 of tRec".

The condition regarding "item 4" can be removed as follows:

Code: Select all
repeat for each line tRec in tArray
  if (item 6 of tRec ≠ tCategory) then next repeat
  get item 1 of tRec
  if (it < tStartDate) then next repeat
  if (it >= tEndDate) then next repeat
  add min(item 4 of tRec, 0) to tSum
end repeat
komouton
 
Posts: 105
Joined: Wed Oct 21, 2009 1:30 pm

Re: Improving speed of a repeat/if loop

Postby codegreen » Tue Jul 25, 2017 11:08 pm

Oops! My bad.Thanks for the correction.

I actually wrote something very much like that (a line longer I think) the first time around, but then Safari choked while posting and I had to retype it (hurriedly and incorrectly) from memory...
codegreen
 
Posts: 1503
Joined: Mon Jul 14, 2008 11:03 pm

Re: Improving speed of a repeat/if loop

Postby sctell » Tue Jul 25, 2017 11:17 pm

komouton wrote:The argument of "add" is not "it (item 1 of tRec)", but "item 4 of tRec".

The condition regarding "item 4" can be removed as follows:


This is the fastest solution.

It is approaching SQLite but even without measuring I can visibly identify the difference in speed.

codegreen wrote:FWIW while I doubt you'll beat SQLite, at least you should try to minimize the number of chunks you parse and reduce the complexity of the exclusionary tests per-line (since the interpreter won't bail early even if one fails), e.g.:


I assume as the number of records in the array grows the disparity between arrays and SQLite will grow.

Not the solution I was hoping for but a worthwhile exercise nevertheless.

If nothing else, for others searching for an array/SQLite comparison, it demonstrates the way to go.


All the best

Terry
sctell
 
Posts: 1124
Joined: Sun Jul 06, 2008 10:41 am

Re: Improving speed of a repeat/if loop

Postby komouton » Wed Jul 26, 2017 11:59 am

If "item 1 of tRec", tStartDate and tEndDate are always positive integers, and tEndDate is always bigger than tStartDate, the two conditions regarding "item 1 of tRec" can reduce to one.


Code: Select all
repeat for each line tRec in tArray
  if (item 6 of tRec ≠ tCategory) then next repeat
  get item 1 of tRec
  if (it + 1 - tStartDate)*(tEndDate - it) <= 0 then next repeat
  add min(item 4 of tRec, 0) to tSum
end repeat




Last heavy burden may be finding lines that contain tCategory.

I'm sorry I have no knowledge about SQLite, but would you teach me whether the SQLite command you mentioned reads data on disk directly, and whether tArray is a variable of SuperTalk?

Perhaps, the difference of performance between SQLite command and SC array may harbor in the difference of approaching.

From my experience, a task of finding certain lines in a huge variable using repeat-loop per line is likely to be time-cosuming.
komouton
 
Posts: 105
Joined: Wed Oct 21, 2009 1:30 pm

Re: Improving speed of a repeat/if loop

Postby sctell » Thu Jul 27, 2017 7:07 am

komouton wrote:but would you teach me whether the SQLite command you mentioned reads data on disk directly


Yes, SQLite reads directly from a disk based file.

See here:

http://sqlite.org

SQLite is built into MacOS and can be accessed directly from SC using AppleScript or the Command Line.
Forming the syntax by these methods is a little awkward and there is a Objective-C wrapper called FMDB that makes things easier when accessed from Xcode.
I placed FMDB into a SC XFCN some years ago and the example above is a typical method of accessing SQLite from within SC. SC does not have an officially supported XFCN to access SQLite (I wish it did).

See here:

https://github.com/ccgus/fmdb

komouton wrote:and whether tArray is a variable of SuperTalk?


tArray is a tab/return delimited array(list) of SC

komouton wrote:From my experience, a task of finding certain lines in a huge variable using repeat-loop per line is likely to be time-cosuming.


Yes it is but as SQLite requires to access a disk based file rather than in memory array I was hoping that a direct array may be just as quick but alas this is not to be.


All the best

Terry
sctell
 
Posts: 1124
Joined: Sun Jul 06, 2008 10:41 am

Re: Improving speed of a repeat/if loop

Postby komouton » Thu Jul 27, 2017 9:36 am

Thank you for teaching me.

If the purpose is optimizing the approach that sums up values in a certain field of specific records, I will use grep command on disk-based data to extract the specific records.

Your purpose in this thread is, I think, to use SC variables as array with generally expressed (=intuitive) syntax rather than specialized methods.

In this case,

I assume as the number of records in the array grows the disparity between arrays and SQLite will grow.


is a realistic concern, so I think disadvantage of SC Array will be inevitable.
komouton
 
Posts: 105
Joined: Wed Oct 21, 2009 1:30 pm

Re: Improving speed of a repeat/if loop

Postby codegreen » Thu Jul 27, 2017 7:07 pm

I don't know how often you're modifying this table or how big the other items are (or what percentage of records typically meet your search criteria, how often you're performing updates vs searches, and doing this vs other searches) but depending on those things you might want to consider pre-sorting the data in the table.

Since around 4.6 sortFieldByItem has used a well-behaved sort algorithm (heap sort, vs quick sort that was used before) so you can run data through multiple passes to get it sub-sorted by secondary keys. And it's surprisingly fast...

Once the data is pre-sorted you can calculate your sum MUCH more efficiently by using lineOffset to home in on the section of interest, then bail out as soon as you encounter a line that doesn't satisfy your primary search key (e.g., category).

Just thinkin' out loud here...

-Mark
codegreen
 
Posts: 1503
Joined: Mon Jul 14, 2008 11:03 pm


Return to Scripting in SuperTalk

Who is online

Users browsing this forum: Google [Bot] and 2 guests

cron