Find the nearest neighbors of each point in a dataset, using a variety of algorithms.
findAnnoy(
X,
k,
get.index = TRUE,
get.distance = TRUE,
last = k,
BPPARAM = SerialParam(),
precomputed = NULL,
subset = NULL,
raw.index = NA,
warn.ties = NA,
...
)
findHnsw(
X,
k,
get.index = TRUE,
get.distance = TRUE,
last = k,
BPPARAM = SerialParam(),
precomputed = NULL,
subset = NULL,
raw.index = NA,
warn.ties = NA,
...
)
findKmknn(
X,
k,
get.index = TRUE,
get.distance = TRUE,
last = k,
BPPARAM = SerialParam(),
precomputed = NULL,
subset = NULL,
raw.index = FALSE,
warn.ties = TRUE,
...
)
findVptree(
X,
k,
get.index = TRUE,
get.distance = TRUE,
last = k,
BPPARAM = SerialParam(),
precomputed = NULL,
subset = NULL,
raw.index = FALSE,
warn.ties = TRUE,
...
)
findExhaustive(
X,
k,
get.index = TRUE,
get.distance = TRUE,
last = k,
BPPARAM = SerialParam(),
precomputed = NULL,
subset = NULL,
raw.index = FALSE,
warn.ties = TRUE,
...
)

X 
A numeric matrix where rows correspond to data points and columns correspond to variables (i.e., dimensions). 
k 
A positive integer scalar specifying the number of nearest neighbors to retrieve. 
get.index 
A logical scalar indicating whether the indices of the nearest neighbors should be recorded. 
get.distance 
A logical scalar indicating whether distances to the nearest neighbors should be recorded. 
last 
An integer scalar specifying the number of furthest neighbors for which statistics should be returned. 
BPPARAM 
A BiocParallelParam object indicating how the search should be parallelized. 
precomputed 
A BiocNeighborIndex object of the appropriate class, generated from X.
subset 
A vector indicating the rows of X for which the nearest neighbors should be identified.
raw.index 
A logical scalar indicating whether raw column indices should be returned; see ?"BiocNeighbors-raw-index".
warn.ties 
Logical scalar indicating whether a warning should be raised if tied distances are observed among the nearest neighbors.
... 
Further arguments to pass to the respective build* function (e.g., buildKmknn for findKmknn) when precomputed=NULL.
All of these functions identify the points in X that are the k nearest neighbors of each other point.
findAnnoy and findHnsw perform an approximate search, while findExhaustive, findKmknn and findVptree are exact.
The upper bound for k is the number of points in X minus 1.
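To make the layout of the returned index and distance matrices concrete, here is a minimal base-R sketch of an exhaustive (brute-force) search. It assumes Euclidean distance and uses a hypothetical helper, knn_brute(), which is not part of BiocNeighbors and ignores all other arguments. It only mirrors the behaviour described above: a point is never its own neighbor, each row is sorted by increasing distance, and k may not exceed nrow(X) - 1.

```r
# Hypothetical helper: exhaustive Euclidean k-NN search in base R (illustration only).
knn_brute <- function(X, k) {
  n <- nrow(X)
  # Enforce the documented bound: each point has at most nrow(X) - 1 neighbors.
  if (k < 1 || k > n - 1) stop("'k' must be between 1 and nrow(X) - 1")
  D <- unname(as.matrix(dist(X)))  # all pairwise Euclidean distances
  diag(D) <- Inf                   # a point is never its own neighbor
  index <- matrix(0L, n, k)
  distance <- matrix(0, n, k)
  for (i in seq_len(n)) {
    o <- order(D[i, ])[seq_len(k)]  # k closest points, nearest first
    index[i, ] <- o
    distance[i, ] <- D[i, o]
  }
  list(index = index, distance = distance)
}

set.seed(42)
Y <- matrix(rnorm(200), ncol = 2)  # 100 points in 2 dimensions
out <- knn_brute(Y, k = 3)
dim(out$index)  # 100 3
head(out$distance)
```

This sketch stores the full n-by-n distance matrix, so it is only suitable for small inputs; the exact methods above exist precisely to avoid that cost.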
By default, nearest neighbors are identified for all data points within X.
If subset is specified, nearest neighbors are only detected for the points in the subset.
This yields the same result as (but is more efficient than) subsetting the output matrices after running the same function with subset=NULL.
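The equivalence with post-hoc subsetting follows from each query point being handled independently. The base-R sketch below uses a hypothetical helper, knn_subset(), assuming Euclidean distance and integer row indices in subset. It computes neighbors only for the requested rows (neighbors are still drawn from all of X) and checks the result against subsetting the full output, with rows kept in the order given in subset.

```r
# Hypothetical helper: exact Euclidean k-NN search restricted to the query rows
# in 'subset' (integer indices into X); neighbors are still drawn from all of X.
knn_subset <- function(X, k, subset = seq_len(nrow(X))) {
  index <- matrix(0L, length(subset), k)
  distance <- matrix(0, length(subset), k)
  for (j in seq_along(subset)) {
    i <- subset[j]
    d <- sqrt(colSums((t(X) - X[i, ])^2))  # distances from point i to all points
    d[i] <- Inf                             # exclude the point itself
    o <- order(d)[seq_len(k)]
    index[j, ] <- o
    distance[j, ] <- d[o]
  }
  list(index = index, distance = distance)
}

set.seed(1)
Y <- matrix(rnorm(300), ncol = 3)  # 100 points in 3 dimensions
sub <- c(5L, 2L, 40L)              # output rows follow this order
part <- knn_subset(Y, k = 4, subset = sub)
full <- knn_subset(Y, k = 4)
identical(part$index, full$index[sub, ])  # TRUE
```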
Setting get.index=FALSE or get.distance=FALSE omits the corresponding matrix from the output.
This may provide a slight speed boost when those values are not of interest.
Supplying a BPPARAM with multiple workers splits the search across them, which should (in theory) increase speed in proportion to the number of cores.
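The same per-point independence is what lets the search be split across workers: query points can be divided into chunks, searched separately, and the results stacked back in their original order. In this base-R sketch, lapply() stands in for a parallel map (with BiocParallel, bplapply() and a BPPARAM would play that role); knn_rows() and knn_chunked() are hypothetical helpers assuming Euclidean distance, not BiocNeighbors internals.

```r
# Hypothetical helper: exact Euclidean k-NN search for selected query rows of X.
knn_rows <- function(X, k, rows) {
  res <- lapply(rows, function(i) {
    d <- sqrt(colSums((t(X) - X[i, ])^2))  # distances from point i to all points
    d[i] <- Inf                             # exclude the point itself
    o <- order(d)[seq_len(k)]
    list(index = o, distance = d[o])
  })
  list(index = do.call(rbind, lapply(res, `[[`, "index")),
       distance = do.call(rbind, lapply(res, `[[`, "distance")))
}

# Split the query rows into contiguous chunks, search each chunk separately and
# stack the results. lapply() stands in for a parallel map over workers here.
knn_chunked <- function(X, k, n_workers = 2L) {
  rows <- seq_len(nrow(X))
  chunks <- unname(split(rows, cut(rows, n_workers, labels = FALSE)))
  parts <- lapply(chunks, function(r) knn_rows(X, k, r))
  list(index = do.call(rbind, lapply(parts, `[[`, "index")),
       distance = do.call(rbind, lapply(parts, `[[`, "distance")))
}

set.seed(3)
Y <- matrix(rnorm(150), ncol = 3)  # 50 points in 3 dimensions
serial <- knn_rows(Y, k = 4, rows = seq_len(nrow(Y)))
chunked <- knn_chunked(Y, k = 4, n_workers = 3L)
identical(serial, chunked)  # TRUE: chunking does not change the result
```

Because no chunk depends on another, the ideal speed-up is proportional to the number of workers, minus the overhead of distributing data and collecting results.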
Setting last returns indices and/or distances only for the (k - last + 1)-th through the k-th nearest neighbors.
This can be used to improve memory efficiency, e.g., by returning statistics only for the k-th nearest neighbor with last=1.
Note that this is entirely orthogonal to subset.
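Which columns last keeps can be illustrated by trimming a full k-column result, as in this base-R sketch. keep_last() is a hypothetical helper, and the full result is computed with a brute-force Euclidean search. Unlike a native implementation, the sketch computes all k columns first and then discards some, so it shows which neighbors are returned rather than the memory saving itself.

```r
# Hypothetical helper: keep neighbors (k - last + 1) through k of a sorted result.
keep_last <- function(res, last) {
  k <- ncol(res$index)
  cols <- seq.int(k - last + 1L, k)
  list(index = res$index[, cols, drop = FALSE],
       distance = res$distance[, cols, drop = FALSE])
}

# Full exact result with k = 5 for 30 random points (Euclidean distance).
set.seed(7)
Y <- matrix(rnorm(60), ncol = 2)
D <- unname(as.matrix(dist(Y)))
diag(D) <- Inf                     # exclude each point from its own neighbors
k <- 5
idx <- t(apply(D, 1, function(d) order(d)[seq_len(k)]))
dst <- t(sapply(seq_len(nrow(D)), function(i) D[i, idx[i, ]]))
full <- list(index = idx, distance = dst)

kth <- keep_last(full, last = 1)   # only the 5th nearest neighbor of each point
dim(kth$distance)  # 30 1
```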
If multiple queries are to be performed on the same X, it may be beneficial to build the index from X ahead of time (e.g., with buildKmknn).
The resulting BiocNeighborIndex object can be supplied as precomputed to multiple function calls, avoiding the need to repeat index construction in each call.
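As a toy analogy for "build once, query many times", the base-R sketch below precomputes, for every point, the full ordering of all other points by Euclidean distance, so that later queries for any k only slice that ordering. build_index() and query_index() are hypothetical and use O(n^2) memory; the BiocNeighborIndex classes are dedicated search structures (e.g., for KMKNN or VP trees) and work differently.

```r
# Hypothetical toy "index": cache distances and per-point neighbor orderings once.
build_index <- function(X) {
  D <- unname(as.matrix(dist(X)))  # pairwise Euclidean distances
  diag(D) <- Inf                   # self is sorted last, never a neighbor
  ord <- t(apply(D, 1, order))     # row i: all points ordered by distance from i
  list(D = D, ord = ord)
}

# Reuse the cached orderings for any k without recomputing distances.
query_index <- function(index, k) {
  n <- nrow(index$ord)
  if (k < 1 || k > n - 1) stop("'k' must be between 1 and nrow(X) - 1")
  ord <- index$ord[, seq_len(k), drop = FALSE]
  # Look up the matching distances (row i, neighbor ord[i, j]) in one matrix index.
  dst <- matrix(index$D[cbind(rep(seq_len(n), k), as.vector(ord))], n, k)
  list(index = ord, distance = dst)
}

set.seed(5)
Y <- matrix(rnorm(80), ncol = 4)   # 20 points in 4 dimensions
ix <- build_index(Y)               # built once...
q3 <- query_index(ix, k = 3)       # ...then queried several times
q5 <- query_index(ix, k = 5)
```

The design choice is the usual trade-off: the expensive work is paid once in build_index(), and each query_index() call is then cheap.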
Note that when precomputed is supplied, the value of X is completely ignored.
For exact methods, see the comments in ?"BiocNeighbors-ties" regarding the warnings raised when tied distances are observed.
For approximate methods, see the comments in buildAnnoy and buildHnsw about the (lack of) randomness in the search results.
A list is returned containing:
index, if get.index=TRUE. This is an integer matrix where each row corresponds to a point (denoted here as i) in X. The row for i contains the row indices of X that are the nearest neighbors to point i, sorted by increasing distance from i.
distance, if get.distance=TRUE. This is a numeric matrix where each row corresponds to a point (as above) and contains the sorted distances of the neighbors from i.
Each matrix contains last columns.
If subset is not NULL, each row of the above matrices refers to a point in the subset, in the same order as supplied in subset.
See ?"BiocNeighbors-raw-index" for an explanation of the output when raw.index=TRUE, for the functions that support it.
Aaron Lun
See buildExhaustive, buildKmknn, buildVptree, buildAnnoy, or buildHnsw to build an index ahead of time.
See ?"BiocNeighbors-algorithms" for an overview of the available algorithms.
library(BiocNeighbors)

# 5000 points in 20 dimensions.
Y <- matrix(rnorm(100000), ncol=20)

# Exact searches: brute force, KMKNN and VP tree.
out <- findExhaustive(Y, k=8)
head(out$index)
head(out$distance)
out1 <- findKmknn(Y, k=8)
head(out1$index)
head(out1$distance)
out2 <- findVptree(Y, k=8)
head(out2$index)
head(out2$distance)

# Approximate searches: Annoy and HNSW.
out3 <- findAnnoy(Y, k=8)
head(out3$index)
head(out3$distance)
out4 <- findHnsw(Y, k=8)
head(out4$index)
head(out4$distance)
