.. _curve_fit_guide:

***********************************
Curve Fitting in the Stoner Package
***********************************

.. currentmodule:: Stoner

Introduction
============

Many data analysis tasks make use of curve fitting at some point - the process of fitting a model to a set of data
points and determining the coefficients of the model that give the best fit. Since this is such a ubiquitous task,
it will be no surprise that the Stoner package provides a variety of different algorithms. The choice of algorithm
depends to a large part on the features of the model, the constraints on the problem and the nature of the data
points to be fitted.

In order of increasing complexity, the Stoner package supports the following:

-   `Simple polynomial fits`_

    If the model is simply a polynomial function, there are no uncertainties in the data and no constraints on the
    parameters, then this is the simplest and easiest to use. This makes use of the :py:meth:`Data.polyfit` method.

-   `Simple function fitting`_

    If you need to fit to an arbitrary function, have no constraints on the values of the fitting parameters, and
    have uncertainties in the *y* coordinates but not in the *x*, then the simple function fitting is probably the
    best option. The Stoner package provides a wrapper around the standard :py:func:`scipy.optimize.curve_fit`
    function in the form of the :py:meth:`Data.curve_fit` method.

-   `Fitting with limits`_

    If your problem has constrained parameters - that is, there are physical reasons why the parameters in your
    model cannot take certain values - then you probably want to use the :py:meth:`Data.lmfit` method. This works
    well when your data has uncertainties in the *y* values but not in *x*.

-   `Orthogonal distance regression`_

    Finally, if your data has uncertainties in both *x* and *y* you may want to use the :py:meth:`Data.odr` method
    to do an analysis that minimizes the distance of the model function in both *x* and *y*.

-   `Differential Evolution Algorithm`_

    Differential evolution algorithms attempt to find optimal fits by evaluating a population of possible solutions
    and then combining those that were scored by some costing function to be the best fits - thereby creating a new
    population of possible (hopefully better) solutions. In general some level of random fluctuation is permitted
    to stop the minimizer getting stuck in local minima. These algorithms can be effective when there is a large
    number of parameters to search, or when the cost is not a smooth function of the parameters and thus cannot be
    differentiated. The algorithm here uses a standard weighted variance as the cost function - like *lmfit* and
    *curve_fit* do.

Why Use the Stoner Package Fitting Wrappers?
------------------------------------------------

There are a number of advantages to using the Stoner package wrappers around the various fitting algorithms rather
than using them as standalone fitting functions:

#.  They provide a consistent way of defining the model to be fitted. All of the Stoner package functions accept a
    model function of the form f(x, p1, p2, p3), constructing the necessary intermediate model class as needed -
    similarly, they can all take an :py:class:`lmfit.model.Model` class or instance and adapt that as necessary.
#.  They provide a consistent parameter order and keyword argument names as far as possible within the limits of
    the underlying algorithms. Generally these follow the :py:func:`scipy.optimize.curve_fit` conventions.
#.  They make use of the :py:attr:`Data.setas` attribute to identify data columns containing *x*, *y* and
    associated uncertainties. They also provide a common way to select a subset of data to use for the fitting
    through the *bounds* keyword argument.
#.  They provide a consistent way to add the best fit data as a column (or columns) to the :py:class:`Data` object
    and to store the best-fit parameters in the metadata for retrieval later. Since this is done in a consistent
    fashion, the package can also provide a :py:meth:`Data.annotate_fit` method to display the fitting parameters
    on a plot of the data.

Simple Polynomial Fits
========================

Simple least squares fitting of polynomial functions is handled by the :py:meth:`Data.polyfit` method::

    a.polyfit(column_x, column_y, polynomial_order, bounds=lambda x, y: True, result="New Column")

This is a simple pass-through to the numpy routine of the same name. The x and y columns are specified in the first
two arguments using the usual index rules for the Stoner package. The routine will fit multiple columns if
*column_y* is a list or slice. The *polynomial_order* parameter should be a simple integer greater than or equal to
1 to define the degree of polynomial to fit. The *bounds* function follows the same rules as the *bounds* function
in :py:meth:`Data.search` to restrict the fitting to a limited range of rows.

The method returns a list of coefficients with the highest power first. If *column_y* was a list, then a 2D array
of coefficients is returned. If *result* is specified then a new column with the header given by the *result*
parameter will be created and the fitted polynomial evaluated at each point.

Fitting Arbitrary Functions
=============================

Common features of the Function Fitting Methods
--------------------------------------------------

The output of the methods used to fit arbitrary functions depends on the keyword parameters *output*, *result*,
*replace* and *header* in the method call.

-   *output="fit"* The optimal parameters and a variance-covariance matrix are returned.
-   *output="row"* A 1D numpy array of optimal values and standard errors interleaved
    (i.e. p_opt[0], p_error[0], p_opt[1], p_error[1], ...) is returned. This is useful when analysing a large
    number of similar data sets in order to build a table of fitting results.
-   *output="report"* A python object representing the fitting result is returned.
-   *output="data"* A copy of the data file itself is returned - this is most useful when used in conjunction with
    :py:class:`Stoner.DataFolder`.
-   *output="full"* As much information about the fit as can be extracted from the fitting algorithm is returned.

If *result* is not None, then the best fit data points are calculated, and the fitting parameters, their errors and
the :math:`\chi^2` value are added to the metadata of the :py:class:`Data` object. To distinguish between multiple
fits, a *prefix* keyword can be given, otherwise the default is to use a prefix derived from the name of the model
that was fitted.

*result* and *replace* control where the best fit data points are added to the :py:class:`Data` object. If *result*
is **True** then the fitted data is appended as a new column at the end of the :py:class:`Data` object and
*replace* is ignored. Otherwise, *result* is interpreted as something that can be used as a column index and
*replace* determines whether the new data is inserted after that column or overwrites it.

If *residuals* is not None or False then, as well as calculating the best fit values, the difference between these
and the data points is found. The value of *residuals* is either a column index to store the residuals at (or in,
if *replace* is **True**) or, if **True**, the column after the one specified by *result* is used. In all cases
*header* specifies the name of the new column header - if omitted, it will default to 'Fitted with {model name}'.
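Putting these keywords together, a fit might look something like the sketch below. This is only an illustration:
the data file name, the column layout and the straight-line model are assumptions made for the example rather than
anything shipped with the package, and the metadata key names are assumed to follow the ``<prefix>:<parameter>``
pattern used in the examples later in this guide::

    from Stoner import Data

    def linear(x, m, c):
        """Straight line model y = m*x + c (illustrative model function)."""
        return m * x + c

    d = Data("example_data.txt")   # hypothetical data file
    d.setas = "xye"                # mark the x, y and y-uncertainty columns

    # Fit, append the fitted values as a new "Fit" column and store the
    # parameters in the metadata under the prefix "linear".
    popt, pcov = d.curve_fit(linear, p0=[1.0, 0.0], result=True, header="Fit", prefix="linear")

    print(d["linear:m"], d["linear:m err"])   # best fit gradient and its error (assumed key pattern)
    print(d["chi^2"])                         # goodness of fit metadata

    # output="row" interleaves values and errors - useful for building result tables.
    row = d.curve_fit(linear, p0=[1.0, 0.0], output="row")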
The utility method :py:meth:`Data.annotate_fit` is useful for adding appropriately formatted details of the fit to
the plot (in this case for `Simple function fitting`_).

.. plot:: samples/curve_fit_line.py
    :include-source:
    :outname: curve_fit_line

The *bounds* function can be used to restrict the fitting to only a subset of the rows of data. It can be any
callable object that takes a float and an array of floats - representing one x-value and one complete row of data -
and returns True if the row is to be included in the fit and False if not, e.g.::

    def bounds_func(x, row):
        """Keep only data points between (100,5) and (200,20) in x and y.

        x (float): x data value
        row (DataArray): complete row of data."""
        return 100 < x < 200 and 5 < row.y < 20

Fitting with limits
----------------------

The `lmfit <https://lmfit.github.io/lmfit-py/>`_ package provides a framework for non-linear least squares fitting
with bounded and constrained parameters. The :py:meth:`Data.lmfit` method is used to interact with lmfit.

In order to use :py:meth:`Data.lmfit`, one requires a :py:class:`lmfit.model.Model` instance. This describes a
function and its independent and fittable parameters, whether they have limits and what the limits are. The
:py:mod:`Stoner.analysis.fitting.models` package contains a series of :py:class:`lmfit.model.Model` subclasses that
represent various models used in condensed matter physics.

The operation of :py:meth:`Data.lmfit` is very similar to that of :py:meth:`Data.curve_fit`::

    from Stoner.analysis.fitting.models.thermal import Arrhenius

    model = Arrhenius(A=1E7, DE=0.01)
    fit = a.lmfit(model, xcol="Temp", ycol="Cond", result=True, header="Fit")
    print(fit.fit_report())
    print(a["Arrhenius:A"], a["Arrhenius:A err"], a["chi^2"], a["nfev"])

In this example we would be fitting an Arrhenius model to data contained in the 'Temp' and 'Cond' columns. The
resulting fit would be added as an additional column called 'Fit'. In addition, details of the fit are added as
metadata to the current :py:class:`Data` object.

The *model* argument to :py:meth:`Data.lmfit` can be an instance of the model class, the class itself (in which
case it will be instantiated as required), or just a bare callable, in which case a model class will be created
around it. The latter is approximately equivalent to a simple call to :py:meth:`Data.curve_fit`.

The return value from :py:meth:`Data.lmfit` is controlled by the *output* keyword parameter. By default it is the
:py:class:`lmfit.model.ModelResult` instance, which contains all the information about the fit and the fitting
process.

If you pass the model as a subclass of :py:class:`lmfit.model.Model` and do not pass initial values, either via the
*p0* parameter or as keyword arguments, then the model's *guess* method is called (e.g.
:py:meth:`Stoner.analysis.fitting.models.thermal.Arrhenius.guess`) to determine parameters from the data. For
example:

.. plot:: samples/lmfit_example.py
    :include-source:
    :outname: lmfit_example
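Because :py:meth:`Data.lmfit` will accept any :py:class:`lmfit.model.Model` instance, limits can be placed on the
fit parameters before the fit is run by using lmfit's standard parameter hints. The short sketch below is
illustrative only - the data file, the column names and the choice of limits are assumptions made for the
example::

    from Stoner import Data
    from Stoner.analysis.fitting.models.thermal import Arrhenius

    d = Data("thermal_data.txt")   # hypothetical data file

    model = Arrhenius()
    # lmfit parameter hints constrain the values the fit is allowed to explore.
    model.set_param_hint("A", value=1e7, min=0.0)     # prefactor must stay positive
    model.set_param_hint("DE", value=0.01, min=0.0)   # activation energy must stay positive

    fit = d.lmfit(model, xcol="Temp", ycol="Cond", result=True, header="Fit")
    print(fit.fit_report())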
Orthogonal distance regression
---------------------------------

:py:meth:`Data.curve_fit` and :py:meth:`Data.lmfit` are both essentially based on a Levenberg-Marquardt fitting
algorithm, which is a non-linear least squares routine. The essential point is that it seeks to minimize the
**vertical** distance between the model and the data points, taking into account the uncertainty in the vertical
position (i.e. the *y* coordinate) only. If your data has peaks that may change position and/or uncertainties in
the horizontal (*x*) position of the data points, you may be better off using an orthogonal distance regression.

The :py:meth:`Data.odr` method wraps the :py:mod:`scipy.odr` module and tries to make it function as much like
:py:mod:`lmfit` as possible. In fact, in most cases it can be used as a drop-in replacement:

.. plot:: samples/odr_simple.py
    :include-source:
    :outname: odrfit2

The :py:meth:`Data.odr` method allows uncertainties in *x* and *y* to be specified via the *sigma_x* and *sigma_y*
parameters. If either is not specified and a *sigma* parameter is given, then that is used instead. If they are
either explicitly set to **None** or not given, then the :py:attr:`Data.setas` attribute is used instead.

Differential Evolution Algorithm
-----------------------------------

When the number of parameters gets large it can become increasingly difficult to obtain fits using the techniques
above. In these situations the differential evolution approach may be valuable. The
:py:meth:`Stoner.Data.differential_evolution` method provides a wrapper around the
:py:func:`scipy.optimize.differential_evolution` minimizer with the advantage that the model specification and
calling signatures are essentially the same as for the other fitting functions, so there is little programmer
overhead in switching to it:

.. plot:: samples/differential_evolution_simple.py
    :include-source:
    :outname: diffev2

Intrinsically, the differential evolution algorithm does not calculate a variance-covariance matrix since it never
needs to find the gradient of the :math:`\chi^2` of the fit with respect to the parameters. In order to provide
such an estimate, the :py:meth:`Stoner.Data.differential_evolution` method carries out a standard least-squares
non-linear fit (using :py:func:`scipy.optimize.curve_fit`) as a second stage once
:py:func:`scipy.optimize.differential_evolution` has found a likely good set of fitting parameters. This hybrid
approach allows a good fit location to be identified while still providing physically useful estimates of the
fitting errors.

Included Fitting Models
==========================

The :py:mod:`Stoner.analysis.fitting.models` module provides a number of standard fitting models suitable for solid
state physics. Each model is provided as an :py:class:`lmfit.model.Model` subclass and as a callable function; the
former often also provides a means to guess initial values from the data.

Elementary Models
--------------------

Amongst the included models are very generic model functions (in :py:mod:`Stoner.analysis.fitting.models.generic`)
including:

.. currentmodule:: Stoner.analysis.fitting.models.generic

-   :py:class:`Linear` - straight line fit :math:`y=mx+c`
-   :py:class:`Quadratic` - 2nd order polynomial :math:`y=ax^2+bx+c`
-   :py:class:`PowerLaw` - a general power law expression :math:`y=Ax^k`
-   :py:class:`StretchedExponential` - a standard exponential decay with an additional power :math:`\beta`:
    :math:`y=A\exp\left[\left(\frac{-x}{x_0}\right)^\beta\right]`
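To show how these elementary models plug into the fitting methods described above, the following sketch fits a
:py:class:`PowerLaw` model to some synthetic data. The noise level and column names are arbitrary choices made for
the example; passing the class (rather than an instance) lets the model's *guess* method, where provided, pick the
starting values::

    import numpy as np
    from Stoner import Data
    from Stoner.analysis.fitting.models.generic import PowerLaw

    x = np.linspace(1, 10, 50)
    y = 2.5 * x ** 1.5 + np.random.normal(scale=0.5, size=x.size)   # y = A x^k plus noise

    d = Data()
    d.data = np.column_stack((x, y))
    d.column_headers = ["x", "y"]
    d.setas = "xy"

    # No initial values are given, so the model's guess() is used to seed the fit.
    fit = d.lmfit(PowerLaw, result=True, header="Fit")
    print(fit.fit_report())
    # The best-fit values and errors are also stored in d.metadata, keyed by the model name.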
Thermal Physics Models
-------------------------

The :py:mod:`Stoner.analysis.fitting.models.thermal` module supports a range of models suitable for various thermal
physics related expressions:

.. currentmodule:: Stoner.analysis.fitting.models.thermal

-   :py:class:`Arrhenius` - the Arrhenius expression is used to describe processes whose rate is controlled by a
    thermal distribution; it is essentially an exponential decay
    :math:`y=A\exp\left(\frac{-\Delta E}{k_B x}\right)`.
-   :py:class:`ModArrhenius` - the modified Arrhenius expression
    :math:`y=Ax^n\exp\left(\frac{-\Delta E}{k_B x}\right)` is used when the prefactor in the regular Arrhenius
    theory has a temperature dependence. Typically the prefactor exponent ranges over :math:`-1<n<1`.
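As an example of how the thermal models combine with the limit-setting machinery described earlier, the sketch
below fits the modified Arrhenius model with the prefactor exponent held fixed. The data file, the column layout
and the assumption that the exponent parameter is named ``n`` are all illustrative rather than taken from the
package documentation::

    from Stoner import Data
    from Stoner.analysis.fitting.models.thermal import ModArrhenius

    d = Data("conductivity.txt")   # hypothetical data file
    d.setas = "xy"                 # temperature in x, measured quantity in y

    model = ModArrhenius()
    # Hold the prefactor exponent fixed rather than letting it float in the fit
    # (the parameter name "n" is assumed here).
    model.set_param_hint("n", value=0.5, vary=False)

    fit = d.lmfit(model, result=True, header="ModArrhenius fit")
    print(fit.fit_report())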