efficiently counting replications

Don't hold back

efficiently counting replications

PostPosted by drab » Mon Feb 27, 2017 8:16 am

I want to efficiently count the replications of each element in a very long vector.

This simple idiom does exactly what I want

r← +/v∘.=v

but unfortunately the cost of that goes up with the square of the length of the vector, so it is completely unusable for extremely long vectors.

The multiset operator helps.

ru← ∪⍦v

That does efficiently count the replications of the unique elements.

But then I need to redistribute those values back to their positions in the original vector.

Again, there are lots of fairly simple idioms that can do that, but they are usually also very inefficient on very long vectors.

I'm looking for one which is readily usable on very long vectors.

There must be a simple method I haven't yet thought of.
drab
 
Posts: 295
Joined: Thu Oct 09, 2014 6:23 am

Re: efficiently counting replications

PostPosted by forummaster » Mon Feb 27, 2017 8:20 am

What idioms have you tried? Perhaps they can be sped up to be more efficient.
forummaster
 
Posts: 555
Joined: Wed Jan 23, 2013 1:00 pm

Re: efficiently counting replications

PostPosted by drab » Mon Feb 27, 2017 11:11 am

The general problem can get more complicated when trying to deal with both small and large numbers of replications.

I wrote a recursive function to do it. And it can be much more efficient than the original +/v∘.=v idiom. But it's still not as simple or efficient as I would like.

z←reps v;u ru n m m1
u←∪v
n←⌈/ru←∪⍦v
m←v∊(ru=n)/u
z←n×m
→(0∊m)↓0
m1←∼m
z←z+m1\reps m1/v


May the original idiom could be recognized and treated specially.
Or maybe there could be another multiset function.
drab
 
Posts: 295
Joined: Thu Oct 09, 2014 6:23 am

Re: efficiently counting replications

PostPosted by drab » Mon Feb 27, 2017 11:22 am

Actually, come to think of it, my original problem did not need

+/v∘.=v

but instead needed the mask

1<+/v∘.=v

which can be handled fairly nicely and efficiently by multisets.
So my application problem is solved.

But still the puzzle of devising a better idiom remains interesting.
drab
 
Posts: 295
Joined: Thu Oct 09, 2014 6:23 am

Re: efficiently counting replications

PostPosted by drab » Mon Feb 27, 2017 11:33 am

A somewhat limited improvement is

(∪⍦v)+.×(∪v)∘.=v

but that doesn't help much unless there are a LOT of replications.
Also, it only works for numbers.
drab
 
Posts: 295
Joined: Thu Oct 09, 2014 6:23 am

Re: efficiently counting replications

PostPosted by forummaster » Mon Feb 27, 2017 12:16 pm

Here's a quick cut at a solution:

(((⍳⍴v)∊v⍳v)\∪⍦v)[v⍳v]

It can be sped up in various obvious ways such as not calculating v⍳v twice. Also the left arg to expansion can be calculated as

b←(⍴v)⍴0 ⋄ b[v⍳v]←1

Here's a timer function:

    ∇ run;v a t f1 f2 f3 f4
[1] f1←{(((⍳⍴⍵)∊⍵⍳⍵)\∪⍦⍵)[⍵⍳⍵]}
[2] f2←{a←⍵⍳⍵ ⋄ (((⍳⍴⍵)∊a)\∪⍦⍵)[a]}
[3] f3←{a←⍵⍳⍵ ⋄ b←(⍴⍵)⍴0 ⋄ b[a]←1 ⋄ (b\∪⍦⍵)[a]}
[4] v←?10000⍴20
[5] a←+/v∘.=v
[6] :Assert a≡reps v
[7] :Assert a≡f1 v
[8] :Assert a≡f2 v
[9] :Assert a≡f3 v
[10] '+/v∘.=v' ⋄ t←⎕T ⋄ ←+/v∘.=v ⋄ ⎕T-t
[11] 'reps' ⋄ t←⎕T ⋄ ←reps v ⋄ ⎕T-t
[12] 'f1' ⋄ t←⎕T ⋄ ←f1 v ⋄ ⎕T-t
[13] 'f2' ⋄ t←⎕T ⋄ ←f2 v ⋄ ⎕T-t
[14] 'f3' ⋄ t←⎕T ⋄ ←f3 v ⋄ ⎕T-t

and some timings

+/v∘.=v
5.290906805195846
reps
0.5841484741540626
f1
0.17358833315665834
f2
0.11888899287441745
f3
0.1155704146658536


Perhaps this technique will suggest an even faster method.
forummaster
 
Posts: 555
Joined: Wed Jan 23, 2013 1:00 pm

Re: efficiently counting replications

PostPosted by drab » Mon Feb 27, 2017 12:26 pm

That's a very good solution.

Thank you.
drab
 
Posts: 295
Joined: Thu Oct 09, 2014 6:23 am

Re: efficiently counting replications

PostPosted by drab » Mon Feb 27, 2017 12:53 pm

That brings to mind a nifty way to apply an expensive function to a big array with lots of reps ...

z←(f uniquely) r;s u
s←⍴r
u←∪r←,r
z←s⍴(f u)[u⍳r]
drab
 
Posts: 295
Joined: Thu Oct 09, 2014 6:23 am


Return to Open Discussion

Who is online

Users browsing this forum: No registered users and 1 guest

cron