sciquence.sequences.max_avg_seq

sciquence.sequences.max_avg_seq(A, L)

Finds the contiguous subsequence of length at least L with the maximum average. For a real sequence of length n, this can be done in O(n log L) time. In other words, the function maximizes the subsequence average while keeping the subsequence length greater than or equal to the given value L.

Parameters:
  • A (ndarray) – Array of real (float) numbers
  • L (int) – Minimum subsequence length
Returns:
  • start (int) – Start index of the found subsequence (inclusive), so that the subsequence is A[start:stop]
  • stop (int) – Stop index of the found subsequence (exclusive)

Examples

>>> from sciquence.sequences import max_avg_seq
>>> import numpy as np
>>> X = np.array([-1, -2, -3, -23, -45, -3, -4, 5, 50, 67, 1, 3, 4, 5])
>>> max_avg_seq(X, 3)
(7, 10)
>>> print(X[7:10])
[ 5 50 67]
>>> # Change 50 to -50
>>> Z = np.array([-1, -2, -3, -23, -45, -3, -4, 5, -50, 67, 1, 3, 4, 5])
>>> max_avg_seq(Z, 3)
(9, 12)
>>> print(Z[9:12])
[67  1  3]
>>> # In the last example, the first -3 is replaced with 600
>>> V = np.array([-1, -2, 600, -23, -45, -3, -4, 5, -50, 67, 1, 3, 4, 5])
>>> max_avg_seq(V, 3)
(0, 3)
>>> print(V[0:3])
[ -1  -2 600]
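
The results above can be cross-checked with a naive search. The sketch below is not the sciquence implementation and not the O(n log L) algorithm from the reference; it is a simple O(n^2) brute force over all slices of length at least L, using prefix sums. The name max_avg_seq_bruteforce is hypothetical, and the choice to return the first best slice on ties is an assumption.

```python
import numpy as np


def max_avg_seq_bruteforce(A, L):
    """Hypothetical O(n^2) reference check (not part of sciquence).

    Returns (start, stop) such that A[start:stop] has the maximum
    average among all contiguous slices of length >= L.
    """
    A = np.asarray(A, dtype=float)
    n = len(A)
    if L < 1 or L > n:
        raise ValueError("L must satisfy 1 <= L <= len(A)")
    # prefix[i] holds sum(A[:i]), so sum(A[i:j]) == prefix[j] - prefix[i]
    prefix = np.concatenate(([0.0], np.cumsum(A)))
    best_avg, best = -np.inf, (0, L)
    for start in range(n - L + 1):
        for stop in range(start + L, n + 1):
            avg = (prefix[stop] - prefix[start]) / (stop - start)
            # Strict comparison keeps the first slice found on ties (assumption)
            if avg > best_avg:
                best_avg, best = avg, (start, stop)
    return best


X = np.array([-1, -2, -3, -23, -45, -3, -4, 5, 50, 67, 1, 3, 4, 5])
print(max_avg_seq_bruteforce(X, 3))  # (7, 10)
```

For the three example arrays with L = 3, this brute force gives the same slices as shown above. It is only practical for short sequences.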

References

Lin Y.L., Jiang T., Chao K.M. (2002). Efficient algorithms for locating the length-constrained heaviest segments with applications to biomolecular sequence analysis. Journal of Computer and System Sciences.

http://www.csie.ntu.edu.tw/~kmchao/papers/2002_jcss.pdf