Thursday 23 November 2017

R data.table merge / full outer join with na.fill / nomatch based on formula

itemprop="text">

What I need is to perform full outer
join with some kind of smart na.fill / nomatch in a efficient way. I've already done it
using loop but I would like to use matrix algebra or data.table operations to speed up
the process.



Data below are sample of stock open
orders information, full outer join is performed between datasets of asks open orders
and bids open orders. A dataset are asks, B are bids. Both datasets stores atomic orders
and their cumulative sums. The task is to match all ask orders with bid orders by
cumulative value and vice versa.
Populate example
data:



price =
c(11.25,11.26,11.35,12.5,14.2)
amount = c(1.2,0.4,2.75,6.5,15.2)
A
<- data.table(ask_price = price, ask_amount = amount, ask_cum_amount =
cumsum(amount), cum_value = cumsum(price*amount), ask_avg_price =
cumsum(price*amount)/cumsum(amount))

price =
c(11.18,11.1,10.55,10.25,9.7)
amount = c(0.15,0.6,10.2,3.5,12)
B
<- data.table(bid_price = price, bid_amount = amount, bid_cum_amount =
cumsum(amount), cum_value = cumsum(price*amount), bid_avg_price =
cumsum(price*amount)/cumsum(amount))


regular
full outer join and it's
results:



setkey(A,
cum_value)
setkey(B, cum_value)
C <-
merge(A,B,all=TRUE)

print(C)


na.fill
/ nomatch pseudocode formula, for every row (ask or bid)
where cum_value not matches (please keep in mind that every other field than cum_value
is related to ask OR
bid):



avg_price["current NA"]
<- cum_value["last non NA"]/cum_value["current NA"] * avg_price["last non NA"] +
(1-cum_value["last non NA"]/cum_value["current NA"]) * price["next non
NA"]
cum_amount["current NA"] <- cum_value["current NA"] /
avg_price["current
NA"]


expected
results:




D <-
data.table(
cum_value =
c(1.677,8.337,13.5,18.004,49.2165,115.947,130.4665,151.822,268.222,346.3065),

ask_price = c(NA,NA,11.25,11.26,11.35,NA,12.5,NA,NA,14.2),
ask_amount =
c(NA,NA,1.2,0.4,2.75,NA,6.5,NA,NA,15.2),
ask_cum_amount =
c(0.149066666666667,0.741066666666667,1.2,1.6,4.35,9.66496172396059,10.85,12.3126600707381,20.4097766460076,26.05),

ask_avg_price =
c(11.25,11.25,11.25,11.2525,11.31414,11.9966331281534,12.02456,12.3305605066459,13.1418390633132,13.29392),

bid_price = c(11.18,11.1,NA,NA,NA,10.55,NA,10.25,9.7,NA),
bid_amount =
c(0.15,0.6,NA,NA,NA,10.2,NA,3.5,12,NA),
bid_cum_amount =
c(0.15,0.75,1.23858478466587,1.66517233847558,4.6230572556498,10.95,12.3652404387114,14.45,26.45,NA),


bid_avg_price =
c(11.18,11.116,10.8995364444444,10.8120940902022,10.6458772362927,10.58877,10.5510685899445,10.50671,10.14072,NA)
)
print(D)


Note
that in the expected results the last NA is still as NA, this is because opposite order
could not be matched because the market depth is not enough to fulfill the order at any
price.



Is it possible to get expected results
using matrix algebra or data.table operations or any other efficient way to avoid
looping over full dataset?



Thanks in
advance



itemprop="text">
class="normal">Answer



Merge it
back again with A and B with a
roll to find the last/next non-NA
prices.



E.g. see the output values of
bid_avg_price for these two
merges:



B[merge(A, B, all = T),
roll = Inf]
B[merge(A, B, all = T), roll =
-Inf]


That should give
you all the info you need to compute those quantities.




No comments:

Post a Comment

php - file_get_contents shows unexpected output while reading a file

I want to output an inline jpg image as a base64 encoded string, however when I do this : $contents = file_get_contents($filename); print &q...