Automatic segmentation of fish midlines for optimizing robot design

While fish use continuous and flexible bodies to propel themselves, fish robots are often made from interconnected segments. How many segments do robots need to represent fish movements accurately? We propose a new method to automatically determine parsimonious robot models from actual fish data. We first identify key bending points (i.e., joint positions) along the body and then study the concerted movement of the segments so that the difference between actual fish and modelled bending kinematics is minimized. To demonstrate the utility of our method, we analyse the steady swimming kinematics of 10 morphologically distinct fish species. Broadly classified as sub-carangiform (e.g., rainbow trout) and carangiform (e.g., crevalle jack) swimmers, these species exhibit variations in the way they undulate when traditional parameters (including head and tail beat amplitudes, body wavelength and maximum curvature along the body) are considered. We show that five segments are sufficient to describe the kinematics with at least 99% accuracy. For optimal performance, segments should progressively get shorter towards the tail. We also show that locations where bending moments are applied vary among species, possibly because of differences in morphology. More specifically, we find that wider fish have shorter head segments. We discover that once bending points are factored in, the kinematics differences observed in these species collapse into a single undulatory pattern. The amplitude and timing of how body segments move entirely depend on their respective joint positions along the body. Head and body segments are also coupled in a timely manner, which depends on the position of the most anterior joint. Our findings provide a mechanistic understanding of how morphology relates to kinematics and highlight the importance of head control, which is often overlooked in current robot designs.


Introduction
In contrast to man-made propellers, most fish produce thrust by undulating their axial bodies (Bainbridge 1963). In recent years, there has been a significant push to build robots that can swim like fish. However, this is a challenging task. While fish are flexible and have high degrees of freedom at the same time (Wardle et al 1995, Jayne and Lauder 1995, Altringham and Ellerby 1999, fish robots are often made from multiple segments with rigid (to accommodate batteries, electronics, sensors, motors and cables) and compliant parts (to replicate the bending movements of the posterior body and caudal fin). When speed, efficiency and manoeuverability are considered, fish robots are not as good as their biological counterparts and have low technology readiness levels as compared to traditional propeller-driven autonomous or remotely operated underwater vehicles (Eriksen et al 2001, Ribas et al 2011. Given technological limitations, body design is a research area that can benefit from a formalized approach while drawing inspiration from biology. This is of particular significance to address questions like: how do we partition the body and where do we place actuation points so that the movements of a robot resemble those of the fish that it is designed after? What is the most parsimonious robot design (design with the smallest number of segments) that can describe the movements of the fish accurately?
Does one design fit all or how should it vary while modelling different behaviours or fish species?
Actual fish movements are seldomly reported in the literature, and traditional kinematics measurements used by biologists to describe these movements (e.g., head and tail amplitudes, tail beat frequency, body wavelength and maximum curvature) do not lend themselves well to provide pragmatic mechanical engineering guidelines. To mitigate this problem up to a certain extent, many research groups use artificial body midlines generated by a travelling wave equation as a proxy to study fish movements (Liu and Hu 2006, Yu et al 2007, Zhong et al 2017: where t is the time stamp, x is the position of the midline point along the body, w = 2πf (f is the tail beat frequency), k = 2π λ (λ is the body wavelength) and a (x) = C 1 x + C 2 x 2 . The equation parameters (C 1 , C 2 and λ) control how the body bends and can be customized for different swimming modes. The majority of robots are designed after sub-carangiform (e.g., rainbow trout, Oncorhynchus mykiss, in Kruusmaa et al (2014)) and carangiform (e.g., common carp, Cyprinus carpio, in Ozmen ) swimmers; hereinafter we will refer to these two groups as (sub-)carangiform swimmers. In these designs, it is widely assumed that undulatory movements are largely confined to the posterior body, with body amplitudes increasing exponentially towards the tail and body wavelength being around one. Once the artificial midlines are generated according to these criteria, they are used to identify the key bending points to orient robot design.
The travelling wave equation is a good first step in modelling fish swimming, however it has certain limitations. First, there is very little empirical data on how well the midlines generated by the travelling wave equation approximate the actual midlines of fish during steady swimming Hess 1984, Tytell andLauder 2004). Second, it does not specify how the head moves nor where the body wave starts, as both parameters play key roles in swimming performance and vary among species (Lindsey 1978). Third, further studies are needed to evaluate whether the travelling wave equation can be modified to approximate other behaviours observed during turning, C-start, linear acceleration and swimming in unsteady flows, see (Akanyeti and Liao 2013) for modelling Ka´rma´n gaiting.
In this work, we propose a new method which enables roboticists to draw conclusions from real fish data. Assuming that fish movements can be represented with a series of linear segments, the proposed method first identifies the minimum number of segments that can describe fish movements accurately, and then identifies a control architecture which describes the concerted movement of these segments so that the difference between actual and modelled kinematics is minimized. We demonstrate the utility of our method by studying the steady swimming kinematics of 10 representative species from (sub-) carangiform swimmers with distinct body shapes and flexural stiffness.

Problem definition
The input data consists of body midlines which are represented in the form of three-dimensional matrices, M(x, y, t), describing how points along the body (x, y) move in time (t). The data comes from actual fish experiments captured by a high-speed camera with the image plane parallel to the direction of motion (figure 1). Here, we assume that the fish midline points translate and rotate in two dimensions within the imaging plane without rolling, pitching and/or twisting movements. To represent midlines using a multi-segment model we propose two methods (namely, segment growing method and genetic algorithm) to meet the needs of roboticists in two complementary ways. In particular, the segment growing method determines the minimum number of segments required to achieve a certain degree of modelling accuracy (defined by the user a priori), whereas the genetic algorithm is devised to work with a fixed number of segments (defined by the user a priori), calculating the optimal lengths of these segments to maximize accuracy.

Segment growing method
Starting from the most anterior midline point, we create the first segment (S 1 ) with an initial length S init . We grow S 1 with finite increments (ΔS) until Table 1. Segment growing algorithm; where S init and ΔS refer to initial segment length and a finite increment in segment length, respectively. Algorithm inputs (fish midlines and error threshold), output (joint positions) and variables (S init and ΔS) are all normalized to the body length.
Algorithm segmentGrowing() Inputs: fish midlines (M ), error threshold (eTh) Outputs: joint locations (J) Variables: S init , ΔS 1 J = [], S b = 0, S e = S init 2 while S e < 1 3 E = calculateSegmentError (S e , S b , M ) 4 if E < eTh 5 S e += ΔS 6 else 7 S b = S e − ΔS 8 S e = S b + S init 9 add S b to J 10 end 11 end 12 return J the mean difference between S 1 and the actual midline exceeds the error threshold (supplementary figure 1 (https://stacks.iop.org/BB/16/046005/ mmedia)). At this point, we stop growing S 1 and create a joint (J 1 ). We then start growing the second segment (S 2 ) from J 1 until the mean difference between S 2 and the actual midline exceeds the error threshold, and this iterative process continues until we reach the most posterior point along the body. At the end, the segment growing method produces a number of segments with variable lengths, each of which describes a finite portion of the actual midline.

Calculation of individual segment error
For all time frames, t = {1, 2, 3, . . . , T}, we calculate the difference between the segment S i and the corresponding portion of the fish midline M i by measuring the perpendicular distance for each midline point (x j , y j ) that belongs to M i , where S ib = (x ib , y ib ) and S ie = x ie , y ie denote the beginning and end points of the segment S i , respectively and x ib < x j < x ie . We then take the maximum distance (among the midline points) and average it over time to arrive at the final segment error (or mean difference), For each time frame, S ib and S ie are derived from the joint positions (e.g., 0 and J 1 for S 1 , J 1 and J 2 for S 2 , and so on) by transforming the data from one to two dimensions so that they are mapped on to the curvilinear fish midline. The beginning of the first segment is initialized to the most anterior midline point. Similarly, the end of the last segment is fixed at the most posterior midline point.

Overall performance of multi-segment model
From all segments constituting the modelM, we choose the one with a maximum mean difference to summarize the overall performance (supplementary The lower the error, the better the performance. Putting it another way, we equate the overall accuracy of the model to the accuracy of the worst performing segment (i.e., the weakest link).
Segment growing method guarantees parsimonious segment formations through locally optimal decisions, i.e., by finding the longest segment with an error below the desired threshold, for each portion of the fish midline. In this way, it is also guaranteed that the overall model error is kept below the threshold. A greedy algorithm for segment growing method is given in table 1.

Genetic algorithm
The genetic algorithm starts with a fixed number of segments and uses heuristic search to determine the optimal segment lengths so that the overall model performance is maximized, similar to the approach of Bal et al (2016). Beginning with an initial population of random solutions, the genetic algorithm draws principles from the theory of evolution and natural selection where fittest solutions are selected for producing candidate solutions of the next generation. Such an iterative process continues for a fixed number of generations or until an optimality criterion is reached. The genetic algorithm is a popular method in computer science and there are many ways of implementing it (Whitley 1994). Here, we briefly describe the implementation tailored to our problem.

Initialization
In our representation, a solution is a vector of joints connecting the actual segments (J 1 , J 2 , . . . , J S−2 , J S−1 Figure 2. Controlling segments. (a) Illustration of a five-segment model (not drawn to scale) including the head (S 1 ) and body segments (S 2 − S 5 ). Each segment, i, (black line) rotates around an anchor point (filled black circle) where pitch angles are denoted by (θ i ). Segment ends are indicated by empty circles. (b) Pitch angles over three tail beats for S 1 (black), S 3 (dark grey) and S 5 (light grey) are shown as an example where actual measurements and their sine wave approximations are plotted using dashed and solid lines, respectively. Sine wave parameters, pitch amplitude (α i ) and relative timing with respect to the head segment ϕ i − ϕ 1 are also shown.
where S is the number of segments). Each joint has a value between 0 and 1L (indicating the position along the body with L standing for the body length). Each generation consists of a fixed number of candidate solutions (N sol ) that are evaluated together, and the goodness (or fitness) of a solution is calculated using the weakest link approach as previously described in section 2.2.2. In the first generation, we initialize the joint positions (of each solution) quasi-randomly. We apply a constraint to maintain an ascending order in the joint positions (i.e., J 1 < J 2 < · · · < J S−2 < J S−1 ). In the second and following generations, we create candidate solutions (alternatively termed offsprings) from the solutions of the previous generation (parents) using four genetic operators: selection, cloning, crossover and mutation.

Parent selection
During the process of parent selection, we first rank the possible solutions according to their fitness scores and normalize the rankings (r) between 0 (lowest fitness score) and 1 (highest fitness score). Each solution has a probability (P) of being selected as a parent based on its fitness, where z determines the bias towards solutions with high fitness scores. In this approach, solutions are represented as portions in a pie chart and the size of the portion is proportional to the P. We then select parents using the classical roulette wheel method by generating a random number between 0 and 2π and select the solution that the number falls into.

Offspring creation
We select two parents for producing two offsprings, either by (1) cloning, where the resulting offsprings are identical copies of their parents (one offspring per parent), or (2) crossover, where the offsprings carry information from both parents. During crossover, we split parents into two parts (anterior and posterior joint positions) and the offsprings inherit one part from each parent. Offsprings are also allowed to have minute changes in the joint positions through mutations to maintain diversity and explore new solutions locally. The rate and degree of mutations are controlled by two independent parameters. We next filter out those offsprings with invalid joint positions (i.e., removing those joint positions that are either outside of the body limits or not in an ascending order). This iterative process continues until we generate a sufficient number of offsprings to fill the quota, N sol .

Actuating segments
Once a multi-segment model is created (be it through the segment growing method or the genetic algorithm), the next step is to determine the control parameters that actuate these segments. In this study, we focus on the situation of steady swimming. We assume that each segment has one degree of freedom which rotates around a joint (pitching hereinafter) and the motion is periodic, which is a common architecture used in multi-segment fish robots, e.g., (Liu and Hu 2010). To describe the pitch angle, θ, of each segment, i, over time we use a sine equation where two variables α and ϕ controlling the pitch amplitude and timing, respectively, t is the time stamp and ω = 2πf with f being the tail beat frequency. The theoretical basis for this representation comes from previous fish studies where lateral movements of each point along the midline is described using a Fourier series. In this representation, the most significant contribution coming from the first term oscillating at the tail beat frequency (Videler and Hess 1984, Root et al 2007, Akanyeti and Liao 2013. We first estimate f by analyzing the movements of the caudal fin over multiple tail beats. We then estimate α and ϕ (for each segment) using the method of least squares so that the difference between measured and predicted θ is minimized.
At the end, we represent the kinematics of the S-segment model with 2S + 1 motion parameters f , α 1 , ϕ 1 , α 2 , ϕ 2 , . . . , α S−1 , ϕ S−1 , α S , ϕ S and S actuation points (distributed along the body) [r 1 , r 2 , . . . , r S−1 , r S ] which specify the points of rotation for each segment. We divide segments into two groups: head segment (the first segment, S 1 ) and body segments (starting from S 2 ) (figure 2). In this configuration, the head and the most anterior body segments rotate around the same point (r 1 = r 2 ) but they face opposite directions. To standardize pitch timing across datasets, we use the timing of the head movement as a reference to adjust the timing of body movements

Data analysis
Custom-written Octave and Matlab scripts are used to analyse the data. Midline data were normalized with respect to the body length L and all distance measures are reported in L. All results are shown as mean ± standard deviation of the mean, unless stated otherwise.

Parameter selection for segment growing method and genetic algorithm
Because each midline consists of 30 points, we fix S init and ΔS at 0.033 L (for the segment growing method). Our initial investigation with preliminary data suggests that running the genetic algorithm for 10 generations and evaluating 200 solutions per generation (i.e., N sol = 200) is sufficient to produce a meaningful solution in a reasonable amount of time (an example is shown in supplementary figure 3). The probability of crossover over cloning is set to 0.3. During crossover, parents with an even number of joints are divided into two equal halves whereas parents with an odd number of joints are divided into two parts where the anterior part has one extra joint. We equate z to 8. The rate and degree of mutation are fixed at 5% and 0.01L, respectively. During mutation, there is a 50-50 chance that the joint position would move in the anterior (or posterior) direction.

Agreement between segment growing method and genetic algorithm
We first check whether the segment growing method and the genetic algorithm produce similar multisegment models. We fit a linear regression comparing joint positions generated by these two methods and we calculate similarity using the slope of the regression and the coefficient of determination (namely, the R 2 value). We hypothesize that the slope and R 2 values will be close to one (which corresponds to perfect match) indicating strong agreement between the two methods. We also visualize the Bland-Altman plot to evaluate the limit of agreement between two methods.

Change of model performance vs increasing number of segments
To estimate the relationship between the model performance (y) and the number of segments (x), we use a power equation where the coefficients a and b are estimated using the least squares method. Assuming that the model error will decrease with respect to an increasing number of segments (b < 0), the magnitude of b indicates the decay constant. The goodness of the fit is evaluated by the R 2 value. We use both methods to systematically generate models with a different number of segments (up to 10 segments). In the segment growing method, we generate multiple models by systematically increasing the error threshold from 0.001L to 0.1L with 0.001L increments and select the model with the best overall performance (for each segment number). In the genetic algorithm, we simply enter the desired number of segments and the algorithm looks for the best combination of segment lengths to minimize the error.

Comparison with equal-length segment and single-segment models
Both the segment growing method and the genetic algorithm produce models that may have variablelength segments (variable-length segment models hereinafter). For instance, the model shown in the supplementary figure 1 has its first segment longer than the second segment, the second segment longer than the third segment, and so on. How important is it to have variable-length segments? To address this question, we compare the performance of the variable-length segment models to the models with equal-length segments. Equal-length segment models are often used in robot designs for simplicity and very few studies have formally investigated the efficacy of variable length approach (Yu et al 2007, Su et al 2014. We perform the comparison in two different ways. First, we repeat the analysis described in section 2.6.3 for the equal-length segment models. While increasing the number of segments, we predict that the error will decrease faster (i.e., with a larger magnitude of b) in the variable-length segment models than in equal-length segment models. Second, we calculate the relative change in error (Δe) using: where e 1 and e 2 are the errors of the variable-length and equal-length segment models, respectively, and Δe varies within the range of 0% (no change, e 1 = e 2 ) to 100% (e 1 = 0 and e 2 > 0). We also use equation (8) to compare the performance of the variable-length segment models to that of the single-segment model, which represents the entire fish body using one segment like a rigid plate. In swimming studies, rigid plates such as hydrofoils (Anderson et al 1998) are often used as a benchmark while evaluating the contribution of undulatory movements in propulsion.

Species comparison
We obtain a five-segment model to describe the swimming kinematics of every fish in the multi-species dataset. We study how segment lengths and motion parameters vary among the models. We hypothesize that the segment configuration of each species is related to its morphology. To begin to test this hypothesis, we check whether there is a correlation between head segment length (or the position of the most of anterior joint) and the maximum body width by fitting a linear regression using the least squares method.
To evaluate whether there is a common propulsive strategy across fishes, we study a family of equations (e.g., linear regression, polynomials, power and exponential growth) to describe how pitch amplitude and phase varies across body segments (i.e., α i , ϕ i = f (r i ) where i = 2, 3, 4, 5). To minimize the risk of overfitting, we choose the simplest model with R 2 > 0.9. We check how much the head amplitude decreases while increasing segment length. To reconcile head and body kinematics, we also check whether there is a correlation between the head and the tail beat amplitudes.

Traditional kinematics analysis
During steady swimming, midline points oscillate from side to side at the tail beat frequency. For each species, we measure the peak-to-peak amplitude and timing (phase) of these oscillations to evaluate how midline points move with respect to each other. We estimate head and tail amplitudes from the first and last point along the midline, respectively. To calculate the body wavelength, we first estimate the body wave speed from the phase envelope (how phase values change along the body) and then divide the speed by the tail beat frequency. We also study body curvature (defined as the reciprocal of the radius of a circle) and how it changes along the midline. For each midline point, we calculate curvature by taking two adjacent points into account (one anterior and posterior). We average curvature values over time and report the maximum mean curvature and where it occurs along the body. We calculate the Spearman rank correlation to evaluate whether there is a statistical relationship between these kinematics variables and the segment lengths in five-segment multi-species models.

Results
There is a strong agreement between the models generated by the segment growing method and the genetic algorithm. For both datasets, the two algorithms converge on models with the same number of segments and joint positions (supplementary figure 4). Therefore, we only present results generated by the segment growing method to prevent repetition. We find that those models with variable-length segments consistently perform better (i.e., producing a lower error) than the models with equal-length segments ( figure 3(a)). While the error decreases with number of segments in a non-linear fashion (first rapidly and then slowly similar to an exponential decay), the decay rate is faster in the variable-length segment models (b = −1.5 ± 0.1) than in the equallength segment models (b = −1.3 ± 0.1). This suggests that all else being equal, the variable-length approach produces more parsimonious models with fewer segments than the equal-length approach. For instance, a model with four variable-length segments is sufficient to mimic the actual fish midlines with less than 0.01L error (supplementary figure 5), whereas the equal-length approach requires at least two more segments (six segments in total) to achieve a similar result (supplementary figure 6). Otherwise, models with four equal-length segments fail to describe the movements of the posterior body adequately (in particular, towards the caudal fin see supplementary figure 7).
We observe that, for the optimal performance, anterior segments should be longer than the posterior segments regardless of how many segments models may have (supplementary figure 9). Regarding models with up to five segments, the segments gradually become shorter as we move along the body, e.g., 0.39 ± 0.05 L (S 1 ) > 0.28 ± 0.02 L (S 2 ) > 0.18 ± 0.01 L (S 3 ) > 0.15 ± 0.01 L (S 4 ) in the four-segment model and 0.27 ± 0.06 L (S 1 ) > 0.26 ± 0.03 L (S 2 ) > 0.2 ± 0.01 L (S 3 ) > 0.14 ± 0.01 L (S 4 ) > 0.13 ± 0.01 L (S 5 ) in the five-segment model. In models with a bigger number of segments, we see the same trend apart from the first and the last segments, which are used to describe the head and caudal fin movements, respectively.
Our results from the multi-species dataset suggest that using five segments is more than sufficient to describe the kinematics of steady swimming (the average model error is 0.005 ± 0.001 L). Similar to the trout dataset, we find that in all species the  anterior segments are longer than the posterior segments (figure 4). However, there is a substantial variation among species in the way that the anterior body is partitioned. We find that wider fish have shorter head segments (supplementary figure 10). For instance, in the crevalle jack (maximum body width = 0.14 L) the head segment is short and ends at the base of the cranium (0.19 L), whereas in the Florida gar (maximum body width = 0.08 L) the head segment is much longer and spans half of the body (0.48 L). Moving along the body, we see less and less variation in the segment lengths. In particular, we note that the standard deviations from the mean length are 0.09 L (S 1 ), 0.06 L (S 2 ), 0.04 L (S 3 ) and 0.02 L (S 4 ).
Our kinematics analysis shows that the 10 (sub-) carangiform species studied here exhibit large variations in the kinematics parameters traditionally used to study steady swimming: speed (1 L s −1 -3.1 L s −1 ), body wavelength (0.7 L -1.1 L), tail beat amplitude (0.12 L-0.21 L), head amplitude (0.01 L -0.07 L), maximum curvature (3.3 L -5.3 L) and maximum curvature point along the body (0.83 L -0.9 L) (supplementary figure 11 and supplementary table 1). We see that these variations are reflected in the segment formation of our models (supplementary table 2); most notably, fish swimming with longer wavelength have longer segments in the posterior body.
We discover that as diverse as these fishes are one control strategy may unite them all. There exists a simple relationship between the location of a segment and how it moves, and this relationship does not change whether we study barracuda (elongated body), trout (fusiform body) or sheepshead (laterally compressed). For segments representing the fish body, pitch amplitude increases exponentially ( figure 5(a)) and pitch phase increases linearly ( figure 5(b)) in accordance with the joint position. For segments representing the head, pitch amplitude decreases with length: the longer the head, the smaller the pitch (figure 6(a)) and there is a coupling between head and body segments with a phase difference correlated with the position of the most anterior joint.
We also find a positive correlation between the head and tail beat amplitudes, where tail beat amplitude increases with head amplitude ( figure 6(b)).

Discussions
We have described a novel approach which can automatically translate undulatory fish movements into empirical design guidelines for roboticists. In our approach, we first partition the fish body into a series of interconnected segments. We then study the relative movements of these segments with respect to each other. In this study, we have demonstrated the usefulness of this approach by analyzing the steady swimming kinematics of 10 (sub-)carangiform swimmers. We have presented the optimal segment formation and control parameters that allow accurate description of the kinematics of these fishes (table 2), and that can be used as a resource for future robotics work.
Below, we discuss our findings within the context of current fish robots described in the literature which vary in body lengths (from 17 to 66 cm), number of segments (from three to six), relative length of each segment (e.g., head segment length varies from 0.2 to 0.8 L), having rigid versus compliant segments and actuated versus passive joints (figure 7).

Accurate description of steady swimming kinematics with five segments
Our analysis suggests that five-segment models can approximate the undulatory movements of (sub-) carangiform swimmers during steady swimming with at least 99% accuracy (model error <0.01L). For optimal performance, the segments should become shorter moving towards the tail; this pattern is consistent for all species studied here. Although, there are not many fish robots designed this way, several studies have already showed that optimizing segment lengths (using real fish data) can improve the performance of freely swimming robots (Yu et al 2007).
How closely do robots need to mimic the midline kinematics of fishes? We have shown that the accuracy of multi-segment models increases with number of segments. However, we still do not know how segment number affects the actual performance of a robot, which is typically evaluated in terms of swimming speed and power efficiency. In other words, with all else being equal, can a five-segment robot swim faster than a four-segment robot without spending more energy? And if so by how much? Our modelling approach cannot directly address these questions as it does not consider internal and fluid forces while computing optimal segment formation. However, it offers reasonable starting points which may help exploring the design space more effectively. To systematically evaluate the swimming performance and hydrodynamic effects of different segment configurations, further experiments including physical robots (Yu et al 2007) and computational fluid dynamics simulations (Eloy 2013, Liu et al 2017 are required.
We recognize that it is not easy to manufacture robots with many small segments that are actuated independently. Perhaps it is not a coincidence that we see more robot designs which substitute posterior segments with a compliant body and a passive, flexible tail. In recent years, there is a growing amount of evidence suggesting that soft, underactuated fish robots can achieve better swimming performance than multi-segment rigid robots (Zhong et al 2018). However, to design and control these robots effectively, we need more comprehensive theoretical models which can link kinematics to interactions between motor commands, soft body properties and the fluid environment accurately (Zhong et al 2018, Epps et al 2009.

No proto-(sub-)carangiform swimmer
During steady swimming, fishes exhibit a wide range of undulatory kinematics that can be broadly classified into four swimming modes; anguilliform, sub-carangiform, carangiform and thunniform after (Breder 1926, Lindsey 1978. The majority of fish robots are designed after an abstract (sub-) carangiform swimmer with the following general design characteristics: the body has a fusiform shape, body amplitudes increase exponentially towards the tail, and the tail beat amplitude and body wavelength are assumed to be around 0.2 L and 1 L, respectively (Fiazza et al 2010). Here, we have shown that this simplified approach does not encapsulate the morphological and kinematics differences we observe in (sub-)carangiform swimmers. For instance, the body width, the tail beat amplitude and the body wavelength of the 10 species studied here vary greatly; their coefficient of variation (standard deviation divided by mean) are 0.24, 0.18 and 0.13, respectively. Similarly, the key locations where bending moments are applied (especially in the anterior part) are distinct for each species (the coefficient of variation is 0.28) and very likely stemming from differences in morphology. Our preliminary analysis between morphology and kinematics suggest that the maximum body width is a good predictor of the head segment length (wider fish have shorter head segments).

One body control strategy for all fishes
Our findings suggest that as diverse as (sub-) carangiform swimmers are, they may all use the same movement strategy to undulate. How much and when a segment moves is determined simply by its joint's location along the body and this relationship does not Table 2. Optimal actuation points (where segments rotate around) and control parameters of five-segment models (multi-species dataset). Note that S 1 and S 2 have the same actuation points but face the opposite directions. change whether we study the movements of two different segments in the same fish or the same segments in two different fish. What separates fishes, however, are the joint locations where bending moments are applied. Once these locations are factored in, the kinematic diversity collapses into a single swimming pattern that is governed by a simple formula. What this means is that the same control algorithm can generate different swimming styles, simply by changing the segment formation. For instance, a robot can switch from swimming like a trout to swimming like a barracuda by lengthening the most anterior segment.
Fishes are tuned to swim with high swimming efficiency (Anderson et al 1998, Taylor et al 2003, Gazzola et al 2014, Nangia et al 2017. Our kinematics modelling is informative for roboticists as it provides clear guidelines on how to build a multi-segment robot and how body segments should move with respect to each other on the basis of biological evidence. However, it does not tell us how these movements can be realized in a real robot. We still do not know whether each segment should be actuated independently, or passive tail joints would suffice achieving fish-like bending kinematics. For instance, a recent study has shown that partitioning the tail with passive joints can reduce the cost of transport up to 50% in a tuna-inspired robot (White et al 2020). Similarly, in the case of hybrid robots with rigid body segments in the anterior body and compliant segments in the posterior body, our models are still informative while choosing the position of the anterior joints. In addition, once built, a robot will need a dynamic controller, which incorporates mechanical and hydrodynamic forces, to generate timely motor commands that can lead to the desired body movements. A number of control architectures featuring central pattern generators (Bal et al 2019), proportional-integral-derivative (Salumäe and Kruusmaa 2013) and fuzzy controllers (Liu and Hu 2006), and principles from Braitenberg vehicles (Salumäe et al 2012) have been successfully applied to drive fish robots.

Active head control may improve swimming performance
We show here that longer fish rotate their head less (pitch amplitude decreases with increasing head length). In all fishes, head and body movements are coupled where body segments follow the lead of the head in a timely manner. These two relationships, in return, positively correlate with the variations we see in tail beat amplitude (e.g., fish with larger head oscillations have larger tail beat amplitudes).
We have recently observed a similar relationship in accelerating fishes . Our findings suggest the importance of head control in swimming, which is often overlooked when building fish robots. Until now, design efforts in robotics have mainly focussed on the control of the posterior body. In these robots, the head may constitute a large portion of the anterior body and it usually oscillates passively due to reaction forces and torques generated by the undulatory movements of the posterior body. In contrast, there is mounting evidence in the fish literature suggesting that timely head movements improve swimming performance, be it through reducing hydrodynamic drag (Lighthill 1969) or producing thrust (Gemmell et al 2016, Lucas et al 2020. Active head movements can also aid in directional control such as turning (Gray 1933), attacking prey (Nelson and Maciver 1999), feeding (Richard and Wainwright 1995), enhancing flow sensing (Akanyeti et al 2016).

Future work
We recognize that modelling fish movements in twodimensions using a series of interlinked rigid body segments is rather simplistic. In reality, fish move in three-dimensions (e.g., cupping motion of the caudal fin) and modulate the stiffness of the body and fin rays in real-time to improve swimming efficiency (Long and Nipper 1996, Esposito et al 2012, Tytell et al 2010. It remains to be seen how optimal segment formation and control would change if the tail segment is allowed to be compliant, pitching and rotating at the same time. Our comparative analysis on a multi-species dataset is limited to one individual swimming at one speed per species. Our ongoing work focuses on studying the effects of a fish's speed and size upon multiple individuals, and on characterizing the bending kinematics of anguilliform and thunniform swimmers. Our initial investigation from preliminary data suggests that our findings can be extended to describe the bending kinematics of thunniform swimmers accurately. In contrast, different segment formations (i.e., more equal-length segments) are needed to describe the kinematics of anguilliform swimmers; especially when they swim with shorter wavelengths (unpublished data).
We are also looking at how the bending kinematics of a fish change across behaviours (e.g., Ka´rma´n gaiting, accelerating forward to catch a prey, turning and escape responses). We predict that fast behaviours exhibiting large body amplitudes and curvatures (e.g., C-start) will require higher fidelity models with many segments than behaviours with subtle body movements (e.g., gliding). Similarly, the segment formation may vary dynamically during intermittent swimming such as burst and gliding where fish alternate periodically between forward linear acceleration and powerless gliding. Through these further studies, we can begin to build a comprehensive dictionary which translates the behaviour repertoire of fishes into pragmatic design solutions for roboticists.

Data
Fish midline data and our scripts are available on request. Table 1 presents the joint positions, pitch amplitude and pitch phase for each fish in the multispecies dataset (only for five-segment models). Supplementary table 1 presents the kinematics variables (including the body length and swimming speed) of each fish in the multi-species dataset.

Data availability statement
All data that support the findings of this study are included within the article (and any supplementary files).