WinBUGS - Defining a discrete stepwise distribution - frequency-distribution

I want to define a distribution in my model of the form: P(x=10)=0.10, P(x=15)=0.20, P(x=20)=0.70
The WinBUGS FAQ says it is possible to construct my own discrete uniform distribution as a categorical variable with a uniform prior, which can take on the necessary integer values. See the blockerht example in the first part of the manual.
I looked the example up, I think it is this one: "A hierarchical t-distribution with unknown degrees of freedom"
At the model specification they do something like:
for (n in 1:Nbins) {
prior[n] <- 1/Nbins; # Uniform prior on v
}
k ~ dcat(prior[]);
Which does define a discrete uniform. But I don't know how to get to the form I need. Can anyone help me?

If I understand your question correctly, you do not need the loop...
#BUGS script to obtain distribution
m1<-"model{
ind ~ dcat(p[])
pmix <- x[ind]
}"
writeLines(m1,"m1.txt")
#simulate from the distribution
library("R2OpenBUGS")
m1.bug<-bugs(data = list(x=c(10, 15, 20), p=c(0.1,0.2,0.7)),
inits = NULL,
param = "pmix",
model = "m1.txt",
n.iter = 1100, n.burnin = 100, n.chains = 1, n.thin=1, DIC=FALSE)
hist(m1.bug$sims.list$pmix)
should work...
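To see what the dcat-plus-lookup idea does without running BUGS, here is a minimal Python sketch of the same two steps: draw a category index with the given probabilities, then map that index to the value. The names values, probs and draw_pmix are made up for this illustration, and Python indices start at 0 whereas dcat returns 1..N.

```python
import random
from collections import Counter

values = [10, 15, 20]      # plays the role of x in the BUGS data
probs = [0.1, 0.2, 0.7]    # plays the role of p in the BUGS data

def draw_pmix(rng):
    # Draw an index with probabilities `probs` (what dcat does),
    # then look up the corresponding value (what x[ind] does).
    ind = rng.choices(range(len(values)), weights=probs, k=1)[0]
    return values[ind]

rng = random.Random(1)     # fixed seed, only so that the run is repeatable
draws = [draw_pmix(rng) for _ in range(10000)]
freq = {v: n / len(draws) for v, n in sorted(Counter(draws).items())}
print(freq)  # relative frequencies should be close to 0.1, 0.2 and 0.7
```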

I am learning how to do this myself. I wonder if you can do this:
prior[10] <- .1
prior[15] <- .2
prior[20] <- .7
x ~ dcat(prior[])


passing dictionaries of parameters and initial values to scipy.integrate.odeint

I'm trying to integrate a system of differential equations using scipy.integrate.odeint.
First, parameters and initial conditions are sampled and returned in two dictionaries (x0 and p). Then the model is created and written as a function to a file, looking roughly as follows (with dummy equations):
def model(x, t, p):
    xdot = [
        x['rate1'], p["a"]
        x['rate2'], p["b"] * x["state1"] - p["c"] * x["state2"]
        x['rate3'], p["c"] * x["state2"]
        x["state4"], x["rate1"] + x["rate2"]
        x["state5"], - x["rate2"] + x["rate3"]
    ]
    return xdot
This is so that I can easily generate different models from simple inputs. Thus, what might normally be hardcoded variables, are now keys in a dictionary with the corresponding value. I do this because assigning variables dynamically is considered bad practice.
When I try to integrate the system using odeint, as follows
sol = odeint(model, x0, t, args=(p,),
atol=1.0e-8, rtol=1.0e-6)
where, thus, x0 is a dictionary of initial conditions, and p of parameters (and t a list of floats). I get the following error:
TypeError: float() argument must be a string or a number, not 'dict'
Obviously, scipy is not happy with my attempt to pass a dictionary to parameterize and initialize my model. The question is whether there is a way for me to resolve this, or whether I am forced to assign all values in my dictionary to variables with the name of their corresponding key. The latter does not allow me to pass the same set of initial conditions and parameters to all models, since they differ both in states and parameters. Thus, I wish to pass the same set of parameters to all models, regardless of whether the parameters are in the model or not.
For performance reasons scipy functions like odeint work with arrays where each parameter is associated with a fixed position.
A solution to access the values by name is to convert them to a namedtuple, which gives each value both a name and a position. However, the conversion needs to be done inside the function because odeint passes the values as a numpy.array to the model function.
This example should convey the idea:
from scipy.integrate import odeint
from collections import namedtuple
params = namedtuple('params', ['a', 'b', 'c', 'd'])
def model(x, t0):
    x = params(*x)
    xdot = [1,
            x.a + x.b,
            x.c / x.a,
            1/x.d**2] # whatever
    return xdot
x0 = params(a=1, b=0, c=2, d=0.5)
t = [0, 0.5, 1]
sol = odeint(model, x0, t)
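If you want to keep the dictionaries from the question, another option is to fix an order for the state names, pack the initial conditions into a list in that order, and unpack the array again inside the model. The parameter dictionary can stay a dictionary, because odeint just forwards whatever is in args to the function. This is only a sketch: STATE_NAMES and the two equations are made up for illustration, and the extra key "unused" shows that parameters a model does not use are simply ignored.

```python
import numpy as np
from scipy.integrate import odeint

# Hypothetical state names; their order defines the position in the state vector.
STATE_NAMES = ["state1", "state2"]

def model(x, t, p):
    # Unpack the array that odeint passes in back into a name -> value mapping.
    s = dict(zip(STATE_NAMES, x))
    ds = {
        "state1": p["a"] - p["b"] * s["state1"],                # dummy equation
        "state2": p["b"] * s["state1"] - p["c"] * s["state2"],  # dummy equation
    }
    # Return the derivatives in the same fixed order.
    return [ds[name] for name in STATE_NAMES]

x0 = {"state1": 1.0, "state2": 0.0}               # initial conditions by name
p = {"a": 0.5, "b": 1.0, "c": 2.0, "unused": 42}  # superset of the parameters
t = np.linspace(0.0, 5.0, 51)

y0 = [x0[name] for name in STATE_NAMES]           # pack the dict into a list
sol = odeint(model, y0, t, args=(p,), atol=1.0e-8, rtol=1.0e-6)

# Map the columns of the solution back to names.
traj = {name: sol[:, i] for i, name in enumerate(STATE_NAMES)}
print(traj["state1"][-1], traj["state2"][-1])
```

Only the state vector has to be array-like; the parameters are never converted, which is why the same p can be handed to every model.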

Dictionary key from pdb file

I'm trying to go through a .pdb file, calculate distance between alpha carbon atoms from different residues on chains A and B of a protein complex, then store the distance in a dictionary, together with the chain identifier and residue number.
For example, if the first alpha carbon ("CA") is found on residue 100 on chain A and the one it binds to is on residue 123 on chain B I would want my dictionary to look something like d={(A, 100):[B, 123, distance_between_atoms]}
from Bio.PDB.PDBParser import PDBParser
parser=PDBParser()
struct = parser.get_structure("1trk", "1trk.pdb")
def getAlphaCarbons(chain):
    vec = []
    for residue in chain:
        for atom in residue:
            if atom.get_name() == "CA":
                vec = vec + [atom.get_vector()]
    return vec
def dist(a,b):
    return (a-b).norm()
chainA = struct[0]['A']
chainB = struct[0]['B']
vecA = getAlphaCarbons(chainA)
vecB = getAlphaCarbons(chainB)
t={}
model=struct[0]
for model in struct:
    for chain in model:
        for residue in chain:
            for a in vecA:
                for b in vecB:
                    if dist(a,b)<=8:
                        t={(chain,residue):[(a, b, dist(a, b))]}
                        break
print t
It's been running the programme for ages and I had to abort the run (have I made an infinite loop somewhere??)
I was also trying to do this:
t = {i:[((a, b, dist(a,b)) for a in vecA) for b in vecB if dist(a, b) <= 8] for i in chainA}
print t
But it's printing info about residues in the following format:
<Residue PHE het= resseq=591 icode= >: []
It's not printing anything related to the distance.
Thanks a lot, I hope everything is clear.
I would strongly suggest using a library implemented in C for calculating distances. I use mdtraj for this sort of thing, and it works much more quickly than nested for loops in BioPython.
To get all pairs of alpha-Carbons:
import itertools
import mdtraj as md
def get_CA_pairs(pdbfile):
    traj = md.load_pdb(pdbfile)
    topology = traj.topology
    # indices of all alpha-carbon atoms
    CA_index = [atom.index for atom in topology.atoms if atom.name == 'CA']
    pairs = list(itertools.combinations(CA_index, 2))
    return pairs
Then, for quick computation of distances:
def get_distances(pdbfile):
    # returns a dictionary mapping each (CA index, CA index) pair to its distance
    traj = md.load_pdb(pdbfile)
    pairs = get_CA_pairs(pdbfile)
    # array of shape (n_frames, n_pairs); mdtraj reports distances in nanometers
    dist = md.compute_distances(traj, pairs)
    # make the dictionary you desire (first frame only)
    return dict(zip(pairs, dist[0]))
This includes all alpha-Carbons. There should be a chain identifier too in mdtraj to select CA's from each chain.
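As for the dictionary itself: in the loop from the question, t = {...} rebinds t on every hit, so only the last pair survives, and the two outer loops over chain and residue repeat the same vecA/vecB comparison for every residue. A minimal sketch of the structure you describe, using only the standard library, could look like this. The coordinates are made up; with Biopython I would expect the keys to come from chain.id and residue.get_id()[1], but that part is an assumption and not shown here.

```python
import math

# Made-up CA coordinates in angstroms, keyed by (chain id, residue number).
ca_A = {("A", 100): (0.0, 0.0, 0.0), ("A", 101): (3.8, 0.0, 0.0)}
ca_B = {("B", 123): (0.0, 6.0, 0.0), ("B", 124): (30.0, 0.0, 0.0)}

def contacts(ca_first, ca_second, cutoff=8.0):
    """Map each residue of the first chain to its partners within `cutoff`."""
    result = {}
    for key_a, xyz_a in ca_first.items():
        for key_b, xyz_b in ca_second.items():
            d = math.dist(xyz_a, xyz_b)
            if d <= cutoff:
                # Append to the entry instead of rebinding the dictionary,
                # so that earlier hits are kept.
                result.setdefault(key_a, []).append((key_b[0], key_b[1], d))
    return result

t = contacts(ca_A, ca_B)
for key, partners in t.items():
    print(key, partners)
```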

scikit learn GridSearchCV always returns the first parameters as best

I set parameters for GridSearchCV to be:
parameters = {'kernel':['rbf'], 'C':[1, 5, 0.5], 'gamma':[1, 5, 0.5]}
grid = GridSearchCV(SVC(), parameters)
grid.fit(dataset, targets)
Then grid.best_params_ or grid.best_estimator_ always returns the first parameters from the list to be the best (i.e. 1 and 1). If I change the order of the parameters and put 5 at the top of the list for 'C', then the best parameters are 'C'=5 and 'gamma'=1.
What am I doing wrong?
You have to change the scoring parameter to roc_auc. Here's an example:
grid = GridSearchCV(model, param_grid = p, scoring='roc_auc')
grid.fit(self.train_data, self.train_labels)
print('\nThe best hyper-parameter for -- {} is {}, the corresponding mean accuracy through 10 Fold test is {} \n'\
.format(name, grid.best_params_, grid.best_score_))
model = grid.best_estimator_
train_pred = model.predict(self.train_data)
print('{} train accuracy = {}\n'.format(name,(train_pred == self.train_labels).mean()))
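Independently of the scoring choice, it can help to look at the scores of all candidates rather than only the winner. If every combination gets the same mean score, the search has nothing to distinguish them, and as far as I know it then reports the first candidate in the grid, which would match what the question describes. A minimal sketch, assuming synthetic data from make_classification as a stand-in for dataset and targets:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption); replace with your own dataset and targets.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

parameters = {'kernel': ['rbf'], 'C': [1, 5, 0.5], 'gamma': [1, 5, 0.5]}
grid = GridSearchCV(SVC(), parameters, cv=5)
grid.fit(X, y)

# Print every candidate together with its mean cross-validated score.
results = grid.cv_results_
for params, score in zip(results['params'], results['mean_test_score']):
    print(params, round(float(score), 3))
print('best:', grid.best_params_, round(float(grid.best_score_), 3))
```

If the printed scores are all identical on your real data, the grid values themselves (or unscaled features) are worth checking before changing anything else.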

MATLAB: dynamic variable definitions

For a numerical simulation in MATLAB I have parameters defined in an .m file.
%; Parameters as simple definitions
amb.T = 273.15+25; ... ambient temperature [K]
amb.P = 101325; ... ambient pressure [Pa]
combustor.T = 273.15+800; ... [K]
combustor.P = 100000; ... [Pa]
combustor.lambda = 1.1;
fuel.x.CH4 = 0.5; ... [0..1]
fuel.n = 1;
air.x.O2 = 0.21;
%; more complex definitions consisting of other params
air.P = reactor.P;
air.T = amb.T;
air.n = fuel.x.CH4 * 2 * fuel.n * combustor.lambda / air.x.O2;
Consider this set as the 'default' definitions. For running one simulation these definitions work fine.
It gets more complicated if I want to change one of these parameters programmatically for a parameter study (the effect of changing parameters on the results), that is, to perform multiple simulations by using a for loop. In the script performing this I want to change the definition of several parameters beforehand, i.e. overwrite default definitions. Is there a way to do this without touching the default definitions in-code (comment them/overwrite them literally)? It should be possible to change any parameter in the study-performing script and fall back on the default definitions from the listing above (or the other way round).
Let me illustrate the problem with the following example: If I want to vary combustor.lambda (let's say running from 0.9 to 1.3) field air.n has to be evaluated again for the change to take place in the actual simulation. So, I could evaluate the listing again, but this way I would lose the study-defined combustor.lambda for the default one.
I am thinking about these solutions but I cannot get to how to do this:
Use references/handles in a way that the struct fields only hold the definitions, not the actual values. This allows for changing default definitions before 'parsing' the whole struct to get the actual values.
Evaluate the default definition set by a function considering (non-default) definitions defined preliminarily, i.e. skipping these lines of the default definition set during evaluation.
Any OOP approach. Of course, it is not limited to struct data types, but on the other hand, maybe there are useful functions for structs?
Edit:
The purpose of the default set is for the programmer to be as free as possible in choosing the varying parameters with any of the other parameters keeping their default definition which can be independent (= values) as well as dependent (= equations like air.n).
% one default parameter set
S = struct('T', 25, 'P', 101000, 'lambda', .5, 'fuel', .5);
GetNByLambda = @(fuel, lambda) fuel * 2 * lambda;
T = struct('P', S.P, 'n', GetNByLambda(S.fuel, S.lambda));
% add more sets
S(end+1) = struct('T', 200, 'P', 10000, 'lambda', .8, 'fuel', .7);
T(end+1) = struct('P', S(end).P, 'n', GetNByLambda(S(end).fuel, S(end).lambda));
% iterate over parameter sets
for ii = 1:length(S)
    disp(S(ii))
    disp(T(ii))
end
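The first idea from the question (fields hold definitions rather than values) can be sketched language-independently. The version below is in Python only to keep it short and runnable; in MATLAB the same role would be played by function handles stored in the struct. The flat key names such as "air.n" and the helper resolve are made up for this illustration: plain values are independent parameters, callables are dependent ones, and overrides are merged in before anything is evaluated.

```python
# Default definitions: plain values are independent, callables are dependent
# parameters that are evaluated from the already resolved values.
DEFAULTS = {
    "amb.T": 273.15 + 25,
    "combustor.P": 100000,
    "combustor.lambda": 1.1,
    "fuel.x.CH4": 0.5,
    "fuel.n": 1,
    "air.x.O2": 0.21,
    "air.T": lambda p: p["amb.T"],
    "air.n": lambda p: (p["fuel.x.CH4"] * 2 * p["fuel.n"]
                        * p["combustor.lambda"] / p["air.x.O2"]),
}

def resolve(overrides=None):
    """Merge overrides into the defaults, then evaluate dependent entries."""
    definitions = {**DEFAULTS, **(overrides or {})}
    values = {k: v for k, v in definitions.items() if not callable(v)}
    pending = {k: v for k, v in definitions.items() if callable(v)}
    while pending:
        progressed = False
        for key in list(pending):
            try:
                values[key] = pending[key](values)
            except KeyError:
                continue  # depends on a value not resolved yet; retry later
            del pending[key]
            progressed = True
        if not progressed:
            raise ValueError("unresolvable parameters: %s" % sorted(pending))
    return values

# Parameter study: vary lambda, everything else falls back to the defaults.
for lam in (0.9, 1.1, 1.3):
    params = resolve({"combustor.lambda": lam})
    print(lam, params["air.n"])
```

Because the defaults are never modified, each run starts from the same definitions, and a dependent parameter such as "air.n" can itself be overridden with a fixed value.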

OpenBUGS error “expected multivariate node”

I am writing a program in R which uses R2OpenBUGS. The code is given at the bottom. I get the following error when running it:
model is syntactically correct
data loaded
expected multivariate node
model must have been compiled but not updated to be able to change RN generator
BugsCmds:NoCompileInits
BugsCmds:NoCompileInits
BugsCmds:NoCompileInits
model must be compiled before generating initial values
model must be initialized before updating
model must be initialized before monitors used
model must be initialized before monitors used
model must be initialized before monitors used
model must be initialized before monitors used
model must be initialized before DIC can be monitored
model must be initialized before updating
model must be initialized before monitors used
DIC monitor not set
Please help me out to correct the code-
The following is the OpenBUGS code.
## BUGS Model
model {
# model for joint distribution of nuhat
nuhat[1:m]~dmnorm(mean[], B[,])
for(i in 1:m){
mean[i]<-mu
}
B[1:m,1:m]<-SIGMA[1:m,1:m]+tau2*inverse(C[1:m,1:m])
C[1:m,1:m]<-DW[1:m,1:m]-rho*W[1:m,1:m]
# priors on parameters
mu~dunif(-5,5)
rho~dunif (-1,1)
tau2~dunif (0, 1000)
}
## Data
list(m=5, nuhat=c(-0.228352281,-0.290119586,-0.211553228,-0.252395328,-0.263358489),
SIGMA=structure(.Data=c( 1.451677,0,0,0,0,
0,1.578091,0,0,0,
0,0,1.386538,0,0,
0,0,0,1.484578,0,
0,0,0,0,1.500409), .Dim=c(5,5)),
DW=structure(.Data=c(2,0,0,0,0,
0,2,0,0,0,
0,0,3,0,0,
0,0,0,2,0,
0,0,0,0,1), .Dim=c(5,5)),
W=structure(.Data=c(0,1,1,0,0,
1,0,0,1,0,
1,0,0,1,1,
0,1,1,0,0,
0,0,1,0,0), .Dim=c(5,5)))
## Inits
list(mu=-1,tau2=1,rho=0.5)
OpenBUGS does not let you assign matrices for deterministic nodes in the same way it does for random nodes. For example (forgetting for the moment what B, SIGMA and DW in your model actually are),
B[1:m, 1:m] ~ dwish(SIGMA[,], 5)
is okay, but
B[1:m, 1:m] <- SIGMA[,] + DW[,]
does not seem to work. Instead you have to create loops to assign each element of the matrix, e.g.
for(i in 1:m){
for(j in 1:m){
B[i, j] <- SIGMA[i, j] + DW[i, j]
}
}
The inverse transformation in your code (which is part of the calculation of B) cannot be done element-wise, so this has to go outside any loops assigning values to deterministic matrix elements.
With both these rules in mind, your model will work if you re-express as:
model {
# model for joint distribution of nuhat
nuhat[1:m]~dmnorm(mean[], B[,])
for(i in 1:m){
mean[i]<-mu
}
for(i in 1:m){
for(j in 1:m){
B[i,j] <- SIGMA[i,j] + tau2*invC[i,j]
C[i,j] <- DW[i,j] - rho*W[i,j]
}
}
invC[1:m,1:m]<-inverse(C[1:m,1:m])
# priors on parameters
mu~dunif(-5,5)
rho~dunif (-1,1)
tau2~dunif (0, 1000)
}
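If you want to sanity-check the deterministic part outside OpenBUGS, the same calculation can be sketched with NumPy. This is only a check of the arithmetic, using the data and initial values from the question (mu is not needed for it); it says nothing about how OpenBUGS compiles the model.

```python
import numpy as np

# Data and initial values copied from the question.
SIGMA = np.diag([1.451677, 1.578091, 1.386538, 1.484578, 1.500409])
DW = np.diag([2.0, 2.0, 3.0, 2.0, 1.0])
W = np.array([[0, 1, 1, 0, 0],
              [1, 0, 0, 1, 0],
              [1, 0, 0, 1, 1],
              [0, 1, 1, 0, 0],
              [0, 0, 1, 0, 0]], dtype=float)
tau2, rho = 1.0, 0.5

C = DW - rho * W            # element-wise, like the double loop in the model
invC = np.linalg.inv(C)     # whole-matrix operation, like inverse(C[1:m,1:m])
B = SIGMA + tau2 * invC     # element-wise again

print(np.round(B, 3))
```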
