A Microsoft Office (Excel, Word) forum. OfficeFrustration

If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

Go Back   Home » OfficeFrustration forum » Microsoft Excel » Charts and Charting
Site Map Home Register Authors List Search Today's Posts Mark Forums Read  

Use of excel curves as a valid way to estimate unmeasured data



 
 
Thread Tools Display Modes
  #1  
Old October 13th, 2009, 10:53 AM posted to microsoft.public.excel.charting
EBL
external usenet poster
 
Posts: 3
Default Use of excel curves as a valid way to estimate unmeasured data

Hi--
I am reviewing a scientific manuscript for publication. In the
article, they graph data in excel and use the curve fitting function
to estimate days when nadirs are reached. To me, that seems like a
misuse of trendlines, but I'm appealing to the statistically savvy
people on this group. Here is the type of data--

Day Result
1 114%
4 107%
8 92%
13 81%
28 79%
42 92%
68 84%
91 89%
112 92%

Based on an excel curve, the authors estimated that the actual nadir
occurred on day 24. However, in my mind, that's an inappropriate use
of the curve fitting function. Comments?
Thanks.
  #2  
Old October 13th, 2009, 02:15 PM posted to microsoft.public.excel.charting
Jon Peltier[_2_]
external usenet poster
 
Posts: 386
Default Use of excel curves as a valid way to estimate unmeasured data

What kind of model did they fit their data to? Excel has none that is
valid and comes up with 24.

I get a fit that comes very close (R^2 0.99) to each point if I use a
polynomial fit of order 6. Of course, this sets up an oscillating curve
with minima at about 22 (75%) and 79 (78%), and maxima at 51 (96%) and
105 (111%). When plotted, this is clearly an unrealistic fit and should
be discarded.

I can fit a line to the first four points, and another line to the last
5 or 6 points (doesn't make much difference) and these lines intersect
at 12-13. If I ignore the bump at the 6th point and fit points 5 and
7-9, this intersects the first line at 14. Both of these fits are better
than a poly fit.

The best way to fit this data is to come up with a physical mechanism
that describes the process that produced the data, and use appropriate
coefficients in the related model.

- Jon
-------
Jon Peltier
Peltier Technical Services, Inc.
http://peltiertech.com/



EBL wrote:
Hi--
I am reviewing a scientific manuscript for publication. In the
article, they graph data in excel and use the curve fitting function
to estimate days when nadirs are reached. To me, that seems like a
misuse of trendlines, but I'm appealing to the statistically savvy
people on this group. Here is the type of data--

Day Result
1 114%
4 107%
8 92%
13 81%
28 79%
42 92%
68 84%
91 89%
112 92%

Based on an excel curve, the authors estimated that the actual nadir
occurred on day 24. However, in my mind, that's an inappropriate use
of the curve fitting function. Comments?
Thanks.

  #3  
Old October 13th, 2009, 03:02 PM posted to microsoft.public.excel.charting
EBL
external usenet poster
 
Posts: 3
Default Use of excel curves as a valid way to estimate unmeasured data

Thanks for the reply.

Their M&M just says that curves were fit using excel. I think they
picked an XY chart and the option of scatter with data points
connected with smoothed lines (at least--when I do that, I get a curve
that looks identical to the one that they provided in the draft
manuscript).

I think you're confirming what I thought but couldn't articulate--that
this is a simplistic and scienfitically unacceptable way to estimate a
value at an unmeasured time point because a model to fit the data
needs to be based on measured or estimated parameters/coefficients
describing the biological process that they are attempting to model.

Correct?




On Oct 13, 6:15*am, Jon Peltier wrote:
What kind of model did they fit their data to? Excel has none that is
valid and comes up with 24.

I get a fit that comes very close (R^2 0.99) to each point if I use a
polynomial fit of order 6. Of course, this sets up an oscillating curve
with minima at about 22 (75%) and 79 (78%), and maxima at 51 (96%) and
105 (111%). When plotted, this is clearly an unrealistic fit and should
be discarded.

I can fit a line to the first four points, and another line to the last
5 or 6 points (doesn't make much difference) and these lines intersect
at 12-13. If I ignore the bump at the 6th point and fit points 5 and
7-9, this intersects the first line at 14. Both of these fits are better
than a poly fit.

The best way to fit this data is to come up with a physical mechanism
that describes the process that produced the data, and use appropriate
coefficients in the related model.

- Jon
-------
Jon Peltier
Peltier Technical Services, Inc.http://peltiertech.com/



EBL wrote:
Hi--
I am reviewing a scientific manuscript for publication. *In the
article, they graph data in excel and use the curve fitting function
to estimate days when nadirs are reached. *To me, that seems like a
misuse of trendlines, but I'm appealing to the statistically savvy
people on this group. *Here is the type of data--


Day * * Result
1 *114%
4 *107%
8 *92%
13 81%
28 79%
42 92%
68 84%
91 89%
112 * * * *92%


Based on an excel curve, the authors estimated that the actual nadir
occurred on day 24. *However, in my mind, that's an inappropriate use
of the curve fitting function. *Comments?
Thanks.- Hide quoted text -


- Show quoted text -


  #4  
Old October 13th, 2009, 06:43 PM posted to microsoft.public.excel.charting
Jon Peltier[_2_]
external usenet poster
 
Posts: 386
Default Use of excel curves as a valid way to estimate unmeasured data

Oh yeah, I didn't even consider the "smoothed line" method. This in no
way follows the path of the data. The curves are bezier curves
connecting adjacent points, and can stray far from the position of
actual data.

Using smoothed lines is as invalid as using a polynomial fit.

- Jon
-------
Jon Peltier
Peltier Technical Services, Inc.
http://peltiertech.com/



EBL wrote:
Thanks for the reply.

Their M&M just says that curves were fit using excel. I think they
picked an XY chart and the option of scatter with data points
connected with smoothed lines (at least--when I do that, I get a curve
that looks identical to the one that they provided in the draft
manuscript).

I think you're confirming what I thought but couldn't articulate--that
this is a simplistic and scienfitically unacceptable way to estimate a
value at an unmeasured time point because a model to fit the data
needs to be based on measured or estimated parameters/coefficients
describing the biological process that they are attempting to model.

Correct?




On Oct 13, 6:15 am, Jon Peltier wrote:
What kind of model did they fit their data to? Excel has none that is
valid and comes up with 24.

I get a fit that comes very close (R^2 0.99) to each point if I use a
polynomial fit of order 6. Of course, this sets up an oscillating curve
with minima at about 22 (75%) and 79 (78%), and maxima at 51 (96%) and
105 (111%). When plotted, this is clearly an unrealistic fit and should
be discarded.

I can fit a line to the first four points, and another line to the last
5 or 6 points (doesn't make much difference) and these lines intersect
at 12-13. If I ignore the bump at the 6th point and fit points 5 and
7-9, this intersects the first line at 14. Both of these fits are better
than a poly fit.

The best way to fit this data is to come up with a physical mechanism
that describes the process that produced the data, and use appropriate
coefficients in the related model.

- Jon
-------
Jon Peltier
Peltier Technical Services, Inc.http://peltiertech.com/



EBL wrote:
Hi--
I am reviewing a scientific manuscript for publication. In the
article, they graph data in excel and use the curve fitting function
to estimate days when nadirs are reached. To me, that seems like a
misuse of trendlines, but I'm appealing to the statistically savvy
people on this group. Here is the type of data--
Day Result
1 114%
4 107%
8 92%
13 81%
28 79%
42 92%
68 84%
91 89%
112 92%
Based on an excel curve, the authors estimated that the actual nadir
occurred on day 24. However, in my mind, that's an inappropriate use
of the curve fitting function. Comments?
Thanks.- Hide quoted text -

- Show quoted text -


  #5  
Old October 14th, 2009, 10:34 PM posted to microsoft.public.excel.charting
Herbert Seidenberg
external usenet poster
 
Posts: 1,113
Default Use of excel curves as a valid way to estimate unmeasured data

Excel 2007
Curve Fitting.
A better fit.
http://www.mediafire.com/file/neojzj...10_14_09b.xlsx
  #6  
Old October 15th, 2009, 05:51 PM posted to microsoft.public.excel.charting
Jerry W. Lewis
external usenet poster
 
Posts: 491
Default Use of excel curves as a valid way to estimate unmeasured data

Plus, Excel's Bezier curve seems to have its minimum on day 23, not day 24.

Any idea what Herbert Seidenberg is talking about? He claims a "better fit"
in 2007 but does not say better in what way, and like the authors of the
paper, fails to say what kind of fit. Also his download link is useless to
those of us who do not use 2007.

Jerry

"Jon Peltier" wrote:

Oh yeah, I didn't even consider the "smoothed line" method. This in no
way follows the path of the data. The curves are bezier curves
connecting adjacent points, and can stray far from the position of
actual data.

Using smoothed lines is as invalid as using a polynomial fit.

....
What kind of model did they fit their data to? Excel has none that is
valid and comes up with 24.


  #7  
Old October 16th, 2009, 05:21 AM posted to microsoft.public.excel.charting
Jon Peltier[_2_]
external usenet poster
 
Posts: 386
Default Use of excel curves as a valid way to estimate unmeasured data

Jerry -

I hate the way he just posts an undocumented link to an xlsx file. I
prefer seeing a description in an html page, without having to download
and open a whole workbook. This is the first of his links I've ever
bothered to follow, and I'm in no hurry to follow another.

Anyway, he linked to a site called http://zunzun.com, which I gather he
used to fit the meager data to a function of the form

y = x / (a * exp(-exp(b - c * x))) + d

- Jon
-------
Jon Peltier
Peltier Technical Services, Inc.
http://peltiertech.com/



Jerry W. Lewis wrote:
Plus, Excel's Bezier curve seems to have its minimum on day 23, not day 24.

Any idea what Herbert Seidenberg is talking about? He claims a "better fit"
in 2007 but does not say better in what way, and like the authors of the
paper, fails to say what kind of fit. Also his download link is useless to
those of us who do not use 2007.

Jerry

"Jon Peltier" wrote:

Oh yeah, I didn't even consider the "smoothed line" method. This in no
way follows the path of the data. The curves are bezier curves
connecting adjacent points, and can stray far from the position of
actual data.

Using smoothed lines is as invalid as using a polynomial fit.

...
What kind of model did they fit their data to? Excel has none that is
valid and comes up with 24.


  #8  
Old October 16th, 2009, 10:03 PM posted to microsoft.public.excel.charting
Lori Miller
external usenet poster
 
Posts: 62
Default Use of excel curves as a valid way to estimate unmeasured data

I agree, the analysis is sloppy and more info is needed.

Three common interpolation methods for smooth functions give very
different values for minima around the interval of interest.

Using the formulae below the global minima were found, with the aid
of solver, to be at x = 23.25, 20.65 and 31.74 respectively.

Data range=A2:B10, first is ctrl+shift+entered in two cells, (0 t = 1):

"Smooth line":
=MMULT(t^{0,1,2,3},MMULT({0,2,0,0;-1,0,1,0;2,-5,4,-1;-1,3,-3,1}/2,A4:B7))

Cubic: =TREND(B4:B7,A4:A7^{3,2,1},x^{3,2,1})

Lagrange: =TREND(B2:B10,A2:A10^{8,7,6,5,4,3,2,1},x^{8,7,6,5, 4,3,2,1})



"Jon Peltier" wrote in message
...
Jerry -

I hate the way he just posts an undocumented link to an xlsx file. I
prefer seeing a description in an html page, without having to download
and open a whole workbook. This is the first of his links I've ever
bothered to follow, and I'm in no hurry to follow another.

Anyway, he linked to a site called http://zunzun.com, which I gather he
used to fit the meager data to a function of the form

y = x / (a * exp(-exp(b - c * x))) + d

- Jon
-------
Jon Peltier
Peltier Technical Services, Inc.
http://peltiertech.com/



Jerry W. Lewis wrote:
Plus, Excel's Bezier curve seems to have its minimum on day 23, not day
24.

Any idea what Herbert Seidenberg is talking about? He claims a "better
fit" in 2007 but does not say better in what way, and like the authors of
the paper, fails to say what kind of fit. Also his download link is
useless to those of us who do not use 2007.

Jerry

"Jon Peltier" wrote:

Oh yeah, I didn't even consider the "smoothed line" method. This in no
way follows the path of the data. The curves are bezier curves
connecting adjacent points, and can stray far from the position of
actual data.

Using smoothed lines is as invalid as using a polynomial fit.

...
What kind of model did they fit their data to? Excel has none that is
valid and comes up with 24.




  #9  
Old October 27th, 2009, 11:57 PM posted to microsoft.public.excel.charting
zunzun.com
external usenet poster
 
Posts: 1
Default Use of excel curves as a valid way to estimate unmeasured data

On Oct 16, 4:03*pm, "Lori Miller"
wrote:
I agree, the analysis is sloppy and more info is needed.


There is a rather obvious outlier at;

Day Result
42 92%

which I saw by scatterplotting the raw data. Try removing this point.

James Phillips
http://zunzun.com

  #10  
Old October 30th, 2009, 04:13 PM posted to microsoft.public.excel.charting
Lori Miller
external usenet poster
 
Posts: 62
Default Use of excel curves as a valid way to estimate unmeasured data

The observation on day 42 is only outlying if you're bound by simple
parametric models and shouldn't necessarily be excluded. In fact it
forms a significant part of the basis for the nadir estimate and i
doubt the authors of the paper would have included it otherwise.

In the absence of an underlying theoretical model, any inference on
scarce data is necessarily vague, and the practice of trying arbitrary
data models until you find one that fits is not generally recommended
due to the problems of overfitting and data snooping.

PS. To clarify the remark was directed at the OP: chart-based
estimates should be avoided in any scientific paper, in addition an
inaccurate value is given without a mitigating explanation it seems.


"zunzun.com" wrote:

On Oct 16, 4:03 pm, "Lori Miller"
wrote:
I agree, the analysis is sloppy and more info is needed.


There is a rather obvious outlier at;

Day Result
42 92%

which I saw by scatterplotting the raw data. Try removing this point.

James Phillips
http://zunzun.com

.

 




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Forum Jump


All times are GMT +1. The time now is 07:44 PM.


Powered by vBulletin® Version 3.6.4
Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright ©2004-2024 OfficeFrustration.
The comments are property of their posters.