Any machine learning / stats people here? I need to fit a curve

You are currently reading a thread in /sci/ - Science & Math

Thread replies: 40
Thread images: 5
Any machine learning / stats people here?

I need to fit a curve by manipulating 3 variables. The process that evaluates my guesses is expensive to run (~45 minutes), so I would like to minimize the number of guesses. Are there statistical methods that can give me the optimal numbers to try?
>>
>>7799116
yep
>>
>>7799121
And they are?
>>
>>7799127
Post the problem
>>
>>7799132
This *is* the problem.

I am not going to describe the underlying process much, because it's pretty irrelevant and I'd like to treat it as a black box.

I can make a weak assumption that results from changing the input vars are smooth, or at least smooth most of the time.
>>
>>7799145
If you're going to be so vague I can't help you.
>>
>>7799149
Why? I am trying to solve for the general case, I thought there would be a general solution.

What extra info specifically do you want?
>>
>>7799116
Is it expensive because you have a lot of data points?
Work on samples.
Fit on one sample and validate performance on another.
Decide your model structure that way.
Then allow for a few more degrees of freedom and fit on the whole dataset for maximum accuracy.
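
A minimal sketch of that sample/validate loop in Python; the data and the polynomial "model structures" are invented stand-ins for whatever OP's actual model family is:

```python
# Sketch of the sample/validate workflow. X and y are stand-in data;
# the polynomial degree plays the role of "model structure".
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, 200)
y = 2.0 * X + rng.normal(0, 1, 200)            # toy dataset

idx = rng.permutation(len(X))
train, valid = idx[:100], idx[100:]            # two disjoint samples

for degree in (1, 2, 3):                       # candidate structures
    coeffs = np.polyfit(X[train], y[train], degree)
    resid = y[valid] - np.polyval(coeffs, X[valid])
    print(degree, np.sqrt(np.mean(resid ** 2)))  # validation RMSE

best = np.polyfit(X, y, 1)                     # refit winner on everything
```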
>>
>>7799116
What is the degree of your polynomial? Have you done a Bayesian analysis to compare which of your models has the highest information content?
Note this does not mean that you should pick the model with the best accuracy, as it might overfit.

Here is more: http://jakevdp.github.io/blog/2015/08/07/frequentism-and-bayesianism-5-model-selection/
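
A rough illustration of that kind of model comparison via an information criterion (BIC here, on invented toy data; the linked post covers the fuller Bayesian treatment):

```python
# Compare polynomial degrees by BIC: penalized fit quality, not raw accuracy.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 50)
y = 1 + 2 * x + rng.normal(0, 0.1, 50)         # truth is degree 1

n = len(y)
for degree in (1, 2, 3, 4):
    coeffs = np.polyfit(x, y, degree)
    rss = np.sum((y - np.polyval(coeffs, x)) ** 2)
    k = degree + 1                              # number of fitted parameters
    bic = n * np.log(rss / n) + k * np.log(n)   # Gaussian-noise BIC
    print(degree, round(bic, 1))                # lowest BIC wins
```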
>>
>>7799203
>http://jakevdp.github.io/blog/2015/08/07/frequentism-and-bayesianism-5-model-selection/
Good link. Cross-validation is probably the most important part, as it's the best defense against overfitting.
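
As a sketch, assuming scikit-learn and a generic 3-variable dataset:

```python
# 5-fold cross-validation: every point gets used for held-out scoring once.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.uniform(0, 1, (60, 3))                 # 3 input variables, as in OP
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.05, 60)

scores = cross_val_score(LinearRegression(), X, y, cv=5)
print(scores.mean())                           # held-out R^2
```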
>>
>>7799203
It's not a polynomial. The input vars go into a complex model which spits out outputs after a long calculation. We're basically tweaking a subset of tweakables to study its behavior. I would like to fit the output onto some preexisting curves.
>>
I'll explain a proper parameter selection methodology to you for $250/hr.
>>
>>7799263
Sure. Please post your cv and banking details.
>>
>>7799273
Sorry. My inner smartass took over for a moment.

It is hard to answer your question without more info. The easy answer is grid search, but at 45 mins a pop to test a combination of parameters, that's not what you are looking for and I suspect this is exactly what you are already trying to avoid. Is there any more info you can provide about the parameter search space?
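
The "easy answer" as code, for concreteness; `expensive_model` is a hypothetical stand-in for OP's 45-minute run:

```python
# Brute-force grid search over three variables.
import itertools

def expensive_model(a, b, c):                  # placeholder for the real run
    return (a - 1) ** 2 + (b - 2) ** 2 + (c - 3) ** 2

grid = [0.0, 1.0, 2.0, 3.0]                    # 4 values per axis -> 64 runs
best = min(itertools.product(grid, repeat=3),
           key=lambda p: expensive_model(*p))
print(best)
# At 45 min per run that's ~48 hours serially, hence the reluctance above.
```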
>>
>>7799298
I'm playing around with PETSc. We're studying the collapse of structures (e.g. bridges) under pressure. The curve is stress/strain. We put in some custom physics (don't ask me for details, I'm just one of the unpaid student / slave labor) and would like to fit our output to some curves with exact analytical solutions.
>>
OK. I have almost zero domain specific knowledge about what you are working on, but I believe I understand what you are trying to accomplish.

When you say that you are trying to "fit a curve" to your data, that implies that you actually have data with which to work. Let's go back to the "black box" concept. If I understand correctly, you are feeding a set of input values into your computation engine. After waiting 45 minutes, you get an output value. That is a single data point corresponding to your chosen input values. Then you try another set of input values, wait another 45 minutes, and get another output value. That would be your second data point. Then, lather rinse and repeat... Do I understand your situation properly?

Now, after you do the above you have a bunch of data points to work with and you want to fit a curve to this data. You are hoping to fit a curve with an analytical solution, correct? Is this going to work for you? I don't know. I would need to look at the data to answer that, but if the data has a decently tight pattern to it your chances are good.

How much data do you need to fit the curve? Well, my standard answer is as much as you can get your hands on. But gathering your data is expensive time-wise, so how little can you get away with? Once again, it depends on the nature of the data. How about gathering 50 or 100 data points and starting to fit your curve? Curve-fitting on that amount of data will be lightning fast, so you can start playing around with something. Continue gathering more data while you play, add it to your model as it becomes available, and see where that process takes you.
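
For the fitting step itself, a sketch with SciPy; the exponential form is just an example, substitute whichever analytical curve applies:

```python
# Fit a 3-parameter curve to gathered data points; the fit itself is fast.
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b, c):                         # hypothetical analytical form
    return a * np.exp(-b * x) + c

xdata = np.linspace(0, 4, 80)
rng = np.random.default_rng(3)
ydata = model(xdata, 2.5, 1.3, 0.5) + rng.normal(0, 0.05, 80)

params, cov = curve_fit(model, xdata, ydata, p0=(1.0, 1.0, 0.0))
print(params)                                  # fitted a, b, c
```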
>>
>45 minute objective function runtime
jesus christ man you're fucked

you need to bring that down man or you're never going to get anywhere, any optimization approach is going to need to run that hundreds to thousands of times
>>
>>7801007
Not quite right: we enter three variables, and the output is a curve. We would like this curve to be the same as the analytical curve, so that we can figure out what our numbers actually represent.
>>
>>7799116
Use regression since you already have data; download Minitab... BTW, yield strength cannot be obtained through the hardness number of the sample.
>>
[Image: stress-strain-diagram.jpg]
>>7800889
>The curve is stress/strain.
It should be representable as a splined polynomial, like pic related.
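
A sketch of that with SciPy; the sample points below are invented, roughly mimicking a stress-strain shape:

```python
# Smoothing spline through stress-strain-like data.
import numpy as np
from scipy.interpolate import UnivariateSpline

strain = np.array([0.0, 0.001, 0.002, 0.004, 0.008, 0.015, 0.03, 0.05])
stress = np.array([0.0, 200.0, 390.0, 430.0, 445.0, 450.0, 440.0, 400.0])

spline = UnivariateSpline(strain, stress, k=3, s=100.0)  # cubic pieces
print(spline(0.005))                           # evaluate anywhere on the curve
```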
>>
>>7801049
>use regression!
>download minitab!

That's like telling me to solve the problem using numbers and a keyboard. How about some specifics, yo.
>>
>>7799116
Have you transformed any of the predictors or response variables?
Try the Box-Cox method and transform them.
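
In SciPy that looks like this (toy skewed data; Box-Cox requires strictly positive values):

```python
# Box-Cox picks a power transform by maximum likelihood.
import numpy as np
from scipy.stats import boxcox

y = np.random.default_rng(4).lognormal(0.0, 1.0, 100)  # skewed response
y_t, lam = boxcox(y)                           # transformed data + lambda
print(lam)                                     # lam near 0 ~ log transform
```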
>>
>all these "i took a stats class once" answers
OP, nobody here has any clue how to solve your problem. You're the expert here, unfortunately.

Try seeing if your lab has any collaborators with experience in this shit, or if there are any labs on campus that do a lot of optimization stuff.
>>
>>7799116
Rule of thumb is to always start with a linear model.
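
In code, assuming the three inputs and one output are stacked in arrays:

```python
# Ordinary least squares in the 3 input variables, plus an intercept.
import numpy as np

rng = np.random.default_rng(5)
X = rng.uniform(0, 1, (40, 3))                 # 3 input variables
y = X @ np.array([2.0, -1.0, 0.3]) + rng.normal(0, 0.01, 40)

A = np.column_stack([X, np.ones(len(X))])      # append intercept column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)                                    # 3 slopes + intercept
```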
>>
>>7799149
The question is adequately described.
>>
What tools are you using? R? SAS? SciPy?
>>
>>7801183
I would try to run several of these in parallel, then converge on it à la simulated annealing.

Alternatively, I've been messing around with the Nelder-Mead method. It seems applicable.
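
A Nelder-Mead sketch with SciPy; `expensive_model` is again a hypothetical stand-in that should return the misfit between the black-box output and the target curve:

```python
# Derivative-free simplex search; frugal with evaluations, which matters
# when each one costs 45 minutes.
import numpy as np
from scipy.optimize import minimize

def expensive_model(p):                        # placeholder for the real run
    a, b, c = p
    return (a - 1) ** 2 + (b - 2) ** 2 + (c - 3) ** 2

result = minimize(expensive_model, x0=np.zeros(3),
                  method="Nelder-Mead", options={"maxfev": 60})
print(result.x, result.nfev)                   # best params, evals spent
```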
>>
I don't get it.

Why would it take 45 minutes to compute if there are only 3 variables?

What happens if you simply try to apply some general linear models in R or Python or matlab or whatever package you like?

How inadequate is it then?

If that's still inadequate, then try k-nearest-neighbours regression or some variant thereof.

No idea why it's taking you 45 minutes with 3 variables.
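
The k-NN fallback in code (scikit-learn assumed; the data are invented):

```python
# k-nearest-neighbours regression: predict from the closest completed runs.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(6)
X = rng.uniform(0, 1, (50, 3))                 # 50 completed runs
y = np.sin(X[:, 0]) + X[:, 1] ** 2 - X[:, 2]   # toy responses

knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)
print(knn.predict([[0.5, 0.5, 0.5]]))          # averages the 5 nearest runs
```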
>>
>>7799116
>Are there some statistical methods that can give the optimal numbers to try?
Yes. It's called regression; there are several methods, but with only 3 parameters even the simplest should do.
What's your model?
How many data points do you already have?
>>
Hey guys, how come bootstrapping and cross-validation are said to be used for the same thing, i.e. testing and validating a model? FWIW, I'm coming from a computer science background and this is in the context of neural networks. I don't have much formal stats training.

Maybe I have a super basic misunderstanding of the concepts, but the way I see it, in k-fold CV I pick a K (say 10), split the data into K sections, train the model on K-1 of them, and validate against the remaining one. I get an error from this, repeat it K-1 more times, and take the average error; thus I know (ish) how likely it is for the model to generalize to a population.

Now with bootstrapping, it seems like I simply select a random sample of data points and then calculate some parameters like mean and variance, and maybe try to fit them if it's a regression problem. Then I keep doing this and eventually get a function that solves the problem by combining all the fits I got. Or something.

However, CV seems to VALIDATE an already existing, trained model, while bootstrapping seems to actually TRAIN on a dataset. How do people use bootstrapping for validation, then?
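
One common answer, as a sketch: each bootstrap resample leaves out roughly a third of the points ("out-of-bag"), and those leftovers act as a validation set (toy linear example below):

```python
# Bootstrap validation via out-of-bag points.
import numpy as np

rng = np.random.default_rng(7)
X = rng.uniform(0, 10, 100)
y = 3.0 * X + rng.normal(0, 1, 100)

errors = []
for _ in range(200):                           # 200 bootstrap rounds
    boot = rng.integers(0, len(X), len(X))     # indices, with replacement
    oob = np.setdiff1d(np.arange(len(X)), boot)  # points never drawn
    coeffs = np.polyfit(X[boot], y[boot], 1)   # "train" on the resample
    resid = y[oob] - np.polyval(coeffs, X[oob])
    errors.append(np.mean(resid ** 2))         # "validate" on the leftovers

print(np.mean(errors))                         # bootstrap estimate of test MSE
```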
>>
[Image: Screenshot 2016-01-14 20.48.53.png]
>>7801139
>>7801118
>>7801049
>>7801073


lmao, these fucking answers, holy fuck you guys are making me laugh
>>
>>7801850
sauce
>>
>>7801035
So, your output is more complex than I initially understood, which makes the problem more interesting, but I think the same concepts and approach apply. My next question is this: do you actually know the equation for the analytical curve that you are trying to fit your parameters to? If you do, then you can construct an error measure for each set of parameters you test. Once you have this error measure in hand, you can probably apply a gradient descent approach to the error function to lead you much more quickly to your desired parameter values.
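
A sketch of that error-measure-plus-descent idea; `simulate` and `analytical_curve` are hypothetical stand-ins, and the gradient is finite-difference since the black box provides no derivatives:

```python
# Minimize the squared misfit between simulated and analytical curves.
import numpy as np

x_grid = np.linspace(0, 1, 50)
analytical_curve = np.sin(np.pi * x_grid)      # hypothetical target curve

def simulate(p):                               # stand-in for the 45-min run
    a, b, c = p
    return a * np.sin(np.pi * b * x_grid) + c

def error(p):
    return np.sum((simulate(p) - analytical_curve) ** 2)

p = np.array([0.5, 0.8, 0.1])
for _ in range(200):                           # plain gradient descent
    grad = np.array([(error(p + h) - error(p - h)) / 2e-4
                     for h in 1e-4 * np.eye(3)])  # central differences
    p -= 0.005 * grad
print(p, error(p))                             # should move toward (1, 1, 0)
```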
>>
>>7799298
You can do grid search just fine in 45 minutes if you parallelise enough. If OP is not a cheapskate, he can arrange some time on his local cluster and be done in 45 minutes.
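
A sketch of that parallel grid search with the standard library; the model function is a stand-in, and on a real cluster you'd submit jobs through the scheduler instead of using one machine's cores:

```python
# Run every grid point concurrently; wall-clock cost ~ one batch of runs.
import itertools
from concurrent.futures import ProcessPoolExecutor

def expensive_model(p):                        # placeholder for the real run
    a, b, c = p
    return (a - 1) ** 2 + (b - 2) ** 2 + (c - 3) ** 2

if __name__ == "__main__":
    grid = list(itertools.product([0.0, 1.0, 2.0, 3.0], repeat=3))
    with ProcessPoolExecutor() as pool:        # one worker per core
        losses = list(pool.map(expensive_model, grid))
    print(min(zip(losses, grid)))              # best (loss, params) pair
```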
>>
>>7802208
OP hasn't provided any info re: his computing environment or available resources. I had assumed that if he had access to parallel computation resources then he wouldn't have posted in the first place. But, that might have been a bad assumption...
>>
Step 1: go to a picture website for racist teenagers and ask them.
>>
>>7802296
Thanks for your contribution.
>>
Look into stochastic optimization methods.

You haven't really given enough info for me to recommend anything specific. Can you calculate a gradient?

I think you are basically fucked though.
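
One concrete stochastic option, sketched with SciPy's dual_annealing; the objective is again a stand-in, and the evaluation count it burns is exactly the "fucked" part at 45 minutes apiece:

```python
# Gradient-free stochastic search within bounds.
import numpy as np
from scipy.optimize import dual_annealing

def expensive_model(p):                        # placeholder for the real run
    a, b, c = p
    return (a - 1) ** 2 + (b - 2) ** 2 + (c - 3) ** 2

result = dual_annealing(expensive_model, bounds=[(-5, 5)] * 3, maxiter=50)
print(result.x, result.nfev)                   # nfev is the painful number
```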
>>
>>7801850
why?
>>
[Image: Screenshot 2015-12-31 06.51.17.png]
>>7801983