Any machine learning / stats people here? I need to fit a curve

You are currently reading a thread in /sci/ - Science & Math

Thread replies: 40
Thread images: 5
Any machine learning / stats people here?

I need to fit a curve by manipulating 3 variables. The process that evaluates my guesses is expensive to run (~45 minutes), so I would like to minimize the number of guesses. Are there statistical methods that can give me the optimal numbers to try?
>>
>>7799116
yep
>>
>>7799121
And they are?
>>
>>7799127
Post the problem
>>
>>7799132
This *is* the problem.

I am not going to describe the underlying process much, because it's pretty irrelevant and I'd like to treat it as a black box.

I can make a weak assumption that results from changing the input vars are smooth, or at least smooth most of the time.
>>
>>7799145
If you're going to be so vague I can't help you.
>>
>>7799149
Why? I am trying to solve for the general case, I thought there would be a general solution.

What extra info specifically do you want?
>>
>>7799116
Is it expensive because you have a lot of data points?
Work on samples.
Fit on one sample and validate performance on another.
Decide your model structure that way.
Then allow for a few more degrees of freedom and fit on the whole dataset for maximum accuracy.
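
A minimal sketch of that sample/validate loop in Python; the data and the polynomial "model structures" are invented stand-ins for whatever OP's actual model family is:

```python
# Sketch of the sample/validate workflow. X and y are stand-in data;
# the polynomial degree plays the role of "model structure".
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, 200)
y = 2.0 * X + rng.normal(0, 1, 200)            # toy dataset

idx = rng.permutation(len(X))
train, valid = idx[:100], idx[100:]            # two disjoint samples

for degree in (1, 2, 3):                       # candidate structures
    coeffs = np.polyfit(X[train], y[train], degree)
    resid = y[valid] - np.polyval(coeffs, X[valid])
    print(degree, np.sqrt(np.mean(resid ** 2)))  # validation RMSE

best = np.polyfit(X, y, 1)                     # refit winner on everything
```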
>>
>>7799116
What is the degree of your polynomial? Have you done a Bayesian analysis to compare which of your models has the highest information content?
Note this does not mean that you should pick the model with the best accuracy, as it might overfit.

Here is more: http://jakevdp.github.io/blog/2015/08/07/frequentism-and-bayesianism-5-model-selection/
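
A rough illustration of that kind of model comparison via an information criterion (BIC here, on invented toy data; the linked post covers the fuller Bayesian treatment):

```python
# Compare polynomial degrees by BIC: penalized fit quality, not raw accuracy.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 50)
y = 1 + 2 * x + rng.normal(0, 0.1, 50)         # truth is degree 1

n = len(y)
for degree in (1, 2, 3, 4):
    coeffs = np.polyfit(x, y, degree)
    rss = np.sum((y - np.polyval(coeffs, x)) ** 2)
    k = degree + 1                              # number of fitted parameters
    bic = n * np.log(rss / n) + k * np.log(n)   # Gaussian-noise BIC
    print(degree, round(bic, 1))                # lowest BIC wins
```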
>>
>>7799203
>http://jakevdp.github.io/blog/2015/08/07/frequentism-and-bayesianism-5-model-selection/
Good link. Cross-validation is probably the most important part, as it's the best defense against overfitting.
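
As a sketch, assuming scikit-learn and a generic 3-variable dataset:

```python
# 5-fold cross-validation: every point gets used for held-out scoring once.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.uniform(0, 1, (60, 3))                 # 3 input variables, as in OP
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.05, 60)

scores = cross_val_score(LinearRegression(), X, y, cv=5)
print(scores.mean())                           # held-out R^2
```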
>>
>>7799203
It's not a polynomial. The input vars go into a complex model which spits out outputs after a long calculation. We're basically tweaking a subset of tweakables to study its behavior. I would like to fit the output onto some preexisting curves.
>>
I'll explain a proper parameter selection methodology to you for $250/hr.
>>
>>7799263
Sure. Please post your cv and banking details.
>>
>>7799273
Sorry. My inner smartass took over for a moment.

It is hard to answer your question without more info. The easy answer is grid search, but at 45 mins a pop to test a combination of parameters, that's not what you are looking for and I suspect this is exactly what you are already trying to avoid. Is there any more info you can provide about the parameter search space?
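
The "easy answer" as code, for concreteness; `expensive_model` is a hypothetical stand-in for OP's 45-minute run:

```python
# Brute-force grid search over three variables.
import itertools

def expensive_model(a, b, c):                  # placeholder for the real run
    return (a - 1) ** 2 + (b - 2) ** 2 + (c - 3) ** 2

grid = [0.0, 1.0, 2.0, 3.0]                    # 4 values per axis -> 64 runs
best = min(itertools.product(grid, repeat=3),
           key=lambda p: expensive_model(*p))
print(best)
# At 45 min per run that's ~48 hours serially, hence the reluctance above.
```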
>>
>>7799298
I'm playing around with PETSc. We're studying the collapse of structures (e.g. bridges) under pressure. The curve is stress/strain. We put in some custom physics (don't ask me for details, I'm just one of the unpaid student / slave labor) and would like to fit our output to some curves with exact analytical solutions.
>>
OK. I have almost zero domain specific knowledge about what you are working on, but I believe I understand what you are trying to accomplish.

When you say that you are trying to "fit a curve" to your data, that implies that you actually have data with which to work. Let's go back to the "black box" concept. If I understand correctly, you are feeding a set of input values into your computation engine. After waiting 45 minutes, you get an output value. That is a single data point corresponding to your chosen input values. Then you try another set of input values, wait another 45 minutes, and get another output value. That would be your second data point. Then, lather rinse and repeat... Do I understand your situation properly?

Now, after you do the above you have a bunch of data points to work with and you want to fit a curve to this data. You are hoping to fit a curve with an analytical solution, correct? Is this going to work for you? I don't know. I would need to look at the data to answer that, but if the data has a decently tight pattern to it your chances are good.

How much data do you need to fit the curve? Well, my standard answer is as much as you can get your hands on. But gathering your data is expensive time-wise, so how little can you get away with? Once again, it depends on the nature of the data. How about gathering 50 or 100 data points and starting to fit your curve? Curve-fitting on that amount of data will be lightning fast, so you can start playing around with something. Continue gathering more data while you play, add it to your model as it becomes available, and see where that process takes you.
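
For the fitting step itself, a sketch with SciPy; the exponential form is just an example, substitute whichever analytical curve applies:

```python
# Fit a 3-parameter curve to gathered data points; the fit itself is fast.
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b, c):                         # hypothetical analytical form
    return a * np.exp(-b * x) + c

xdata = np.linspace(0, 4, 80)
rng = np.random.default_rng(3)
ydata = model(xdata, 2.5, 1.3, 0.5) + rng.normal(0, 0.05, 80)

params, cov = curve_fit(model, xdata, ydata, p0=(1.0, 1.0, 0.0))
print(params)                                  # fitted a, b, c
```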
>>
>45 minute objective function runtime
jesus christ man you're fucked

you need to bring that down man or you're never going to get anywhere, any optimization approach is going to need to run that hundreds to thousands of times
>>
>>7801007
Not quite right: we enter three variables, and the output is a curve. We would like this curve to be the same as the analytical curve, so that we can figure out what our numbers actually represent.
>>
>>7799116
Use regression since you already have data; download Minitab... BTW, yield strength cannot be obtained through the hardness number of the sample.
>>
[Image: stress-strain-diagram.jpg]
>>7800889
>The curve is stress/strain.
It should be representable as a splined polynomial, like pic related.
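
A sketch of that with SciPy; the sample points below are invented, roughly mimicking a stress-strain shape:

```python
# Smoothing spline through stress-strain-like data.
import numpy as np
from scipy.interpolate import UnivariateSpline

strain = np.array([0.0, 0.001, 0.002, 0.004, 0.008, 0.015, 0.03, 0.05])
stress = np.array([0.0, 200.0, 390.0, 430.0, 445.0, 450.0, 440.0, 400.0])

spline = UnivariateSpline(strain, stress, k=3, s=100.0)  # cubic pieces
print(spline(0.005))                           # evaluate anywhere on the curve
```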
>>
>>7801049
>use regression!
>download minitab!

That's like telling me to solve the problem using numbers and a keyboard. How about some specifics, yo.
>>
>>7799116
Have you transformed any of the predictors or response variables?
Try the Box-Cox method and transform them.
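
In SciPy that looks like this (toy skewed data; Box-Cox requires strictly positive values):

```python
# Box-Cox picks a power transform by maximum likelihood.
import numpy as np
from scipy.stats import boxcox

y = np.random.default_rng(4).lognormal(0.0, 1.0, 100)  # skewed response
y_t, lam = boxcox(y)                           # transformed data + lambda
print(lam)                                     # lam near 0 ~ log transform
```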
>>
>all these "i took a stats class once" answers
OP, nobody here has any clue how to solve your problem. You're the expert here, unfortunately.

Try seeing if your lab has any collaborators with experience in this shit, or if there are any labs on campus that do a lot of optimization stuff.
>>
>>7799116
Rule of thumb is to always start with a linear model.
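
In code, assuming the three inputs and one output are stacked in arrays:

```python
# Ordinary least squares in the 3 input variables, plus an intercept.
import numpy as np

rng = np.random.default_rng(5)
X = rng.uniform(0, 1, (40, 3))                 # 3 input variables
y = X @ np.array([2.0, -1.0, 0.3]) + rng.normal(0, 0.01, 40)

A = np.column_stack([X, np.ones(len(X))])      # append intercept column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)                                    # 3 slopes + intercept
```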
>>
>>7799149
The question is adequately described.
>>
What tools are you using? R? SAS? SciPy?
>>
>>7801183
I would try to run several of these in parallel, then converge on it à la simulated annealing.

Alternatively, I've been messing around with the Nelder-Mead method. It seems applicable.
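
A Nelder-Mead sketch with SciPy; `expensive_model` is again a hypothetical stand-in that should return the misfit between the black-box output and the target curve:

```python
# Derivative-free simplex search; frugal with evaluations, which matters
# when each one costs 45 minutes.
import numpy as np
from scipy.optimize import minimize

def expensive_model(p):                        # placeholder for the real run
    a, b, c = p
    return (a - 1) ** 2 + (b - 2) ** 2 + (c - 3) ** 2

result = minimize(expensive_model, x0=np.zeros(3),
                  method="Nelder-Mead", options={"maxfev": 60})
print(result.x, result.nfev)                   # best params, evals spent
```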
>>
I don't get it.

Why would it take 45 minutes to compute if there are only 3 variables?

What happens if you simply try to apply some general linear models in R or Python or matlab or whatever package you like?

How inadequate is it then?

If that's still inadequate, then try k-nearest-neighbours regression or some variant thereof.

No idea why it's taking you 45 minutes with 3 variables.
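
The k-NN fallback in code (scikit-learn assumed; the data are invented):

```python
# k-nearest-neighbours regression: predict from the closest completed runs.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(6)
X = rng.uniform(0, 1, (50, 3))                 # 50 completed runs
y = np.sin(X[:, 0]) + X[:, 1] ** 2 - X[:, 2]   # toy responses

knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)
print(knn.predict([[0.5, 0.5, 0.5]]))          # averages the 5 nearest runs
```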
>>
>>7799116
>Are there some statistical methods that can give the optimal numbers to try?
Yes. It's called regression; there are several methods, but with only 3 parameters even the simplest should do.
What's your model?
How many data points do you already have?
>>
Hey guys, how come bootstrapping and cross-validation are said to be used for the same thing, i.e. testing and validating a model? FWIW, I'm coming from a computer science background and this is in the context of neural networks. I don't have much formal stats training.

Maybe I have a super basic misunderstanding of the concepts, but the way I see it, in k-fold CV I pick a K (say 10), split the data into K sections, train the model on K-1 of them, and validate against the remaining one. I get an error from this, repeat it K-1 more times, and take the average error; thus I know (ish) how likely it is for the model to generalize to a population.

Now with bootstrapping, it seems like I simply select a random sample of data points and then calculate some parameters like mean and variance, and maybe try to fit them if it's a regression problem. Then I keep doing this and eventually get a function that solves the problem by combining all the fits I got. Or something.

However, CV seems to VALIDATE an already existing, trained model, while bootstrapping seems to actually TRAIN on a dataset. How do people use bootstrapping for validation, then?
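
One common answer, as a sketch: each bootstrap resample leaves out roughly a third of the points ("out-of-bag"), and those leftovers act as a validation set (toy linear example below):

```python
# Bootstrap validation via out-of-bag points.
import numpy as np

rng = np.random.default_rng(7)
X = rng.uniform(0, 10, 100)
y = 3.0 * X + rng.normal(0, 1, 100)

errors = []
for _ in range(200):                           # 200 bootstrap rounds
    boot = rng.integers(0, len(X), len(X))     # indices, with replacement
    oob = np.setdiff1d(np.arange(len(X)), boot)  # points never drawn
    coeffs = np.polyfit(X[boot], y[boot], 1)   # "train" on the resample
    resid = y[oob] - np.polyval(coeffs, X[oob])
    errors.append(np.mean(resid ** 2))         # "validate" on the leftovers

print(np.mean(errors))                         # bootstrap estimate of test MSE
```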
>>
[Image: Screenshot 2016-01-14 20.48.53.png]
>>7801139
>>7801118
>>7801049
>>7801073


lmao, these fucking answers, holy fuck you guys are making me laugh
>>
>>7801850
sauce
>>
>>7801035
So, your output is more complex than I initially understood, which makes the problem more interesting, but I think the same concepts and approach apply. My next question is this: do you actually know the equation for the analytical curve that you are trying to fit your parameters to? If you do, then you can construct an error measure for each set of parameters you test. Once you have this error measure in hand, you can probably apply a gradient descent approach to the error function to lead you much more quickly to your desired parameter values.
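
A sketch of that error-measure-plus-descent idea; `simulate` and `analytical_curve` are hypothetical stand-ins, and the gradient is finite-difference since the black box provides no derivatives:

```python
# Minimize the squared misfit between simulated and analytical curves.
import numpy as np

x_grid = np.linspace(0, 1, 50)
analytical_curve = np.sin(np.pi * x_grid)      # hypothetical target curve

def simulate(p):                               # stand-in for the 45-min run
    a, b, c = p
    return a * np.sin(np.pi * b * x_grid) + c

def error(p):
    return np.sum((simulate(p) - analytical_curve) ** 2)

p = np.array([0.5, 0.8, 0.1])
for _ in range(200):                           # plain gradient descent
    grad = np.array([(error(p + h) - error(p - h)) / 2e-4
                     for h in 1e-4 * np.eye(3)])  # central differences
    p -= 0.005 * grad
print(p, error(p))                             # should move toward (1, 1, 0)
```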
>>
>>7799298
You can do grid search just fine in 45 minutes if you parallelise enough. If OP is not a cheapskate, he can arrange some time on his local cluster and be done in 45 minutes.
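
A sketch of that parallel grid search with the standard library; the model function is a stand-in, and on a real cluster you'd submit jobs through the scheduler instead of using one machine's cores:

```python
# Run every grid point concurrently; wall-clock cost ~ one batch of runs.
import itertools
from concurrent.futures import ProcessPoolExecutor

def expensive_model(p):                        # placeholder for the real run
    a, b, c = p
    return (a - 1) ** 2 + (b - 2) ** 2 + (c - 3) ** 2

if __name__ == "__main__":
    grid = list(itertools.product([0.0, 1.0, 2.0, 3.0], repeat=3))
    with ProcessPoolExecutor() as pool:        # one worker per core
        losses = list(pool.map(expensive_model, grid))
    print(min(zip(losses, grid)))              # best (loss, params) pair
```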
>>
>>7802208
OP hasn't provided any info re: his computing environment or available resources. I had assumed that if he had access to parallel computation resources then he wouldn't have posted in the first place. But, that might have been a bad assumption...
>>
Step 1: go to a picture website for racist teenagers and ask them.
>>
>>7802296
Thanks for your contribution.
>>
Look into stochastic optimization methods.

You haven't really given enough info for me to recommend anything specific. Can you calculate a gradient?

I think you are basically fucked though.
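
One concrete stochastic option, sketched with SciPy's dual_annealing; the objective is again a stand-in, and the evaluation count it burns is exactly the "fucked" part at 45 minutes apiece:

```python
# Gradient-free stochastic search within bounds.
import numpy as np
from scipy.optimize import dual_annealing

def expensive_model(p):                        # placeholder for the real run
    a, b, c = p
    return (a - 1) ** 2 + (b - 2) ** 2 + (c - 3) ** 2

result = dual_annealing(expensive_model, bounds=[(-5, 5)] * 3, maxiter=50)
print(result.x, result.nfev)                   # nfev is the painful number
```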
>>
>>7801850
why?
>>
[Image: Screenshot 2015-12-31 06.51.17.png]
>>7801983