Background: Genome-scale metabolic network models have been widely used to analyse the cellular behaviour of organisms both qualitatively and quantitatively. An iterative, semi-automated procedure for model validation and refinement is important for maintaining the quality of such models.

Methods and results: We have implemented a computational pipeline for validating and optimising genome-scale metabolic models of yeast. The pipeline supports the iterative cycle of hypothesis generation, evaluation and testing through model simulation, mining of literature and bioinformatics evidence, and biological experiments. The resulting tools have been used to refine a recent genome-scale metabolic model of yeast, which further develops a consensus yeast network reconstruction; it comprises 15 cellular compartments and has an improved representation of lipid metabolism and many other pathways. Several constraint-based analysis procedures have been applied to refine this model further. First, flux variability analysis has been carried out to identify essential and blocked reactions. Second, flux balance analysis has been performed under different growth conditions and genetic perturbations. The model predictions are then validated against experimental data from a single-gene deletion study and from our previous investigation of yeast carbon and nitrogen source utilisation. False predictions are subsequently corrected using several optimisation procedures based on mixed integer linear programming (MILP). To restore the link between particular reactions and growth, we identify the biomass constituents or key intermediate metabolites that should be produced through these reactions. A gap-filling algorithm can also be used to find a minimal set of reactions to add to the model so that blocked reactions are unblocked and growth is restored in silico.
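The gap-filling step can be illustrated on a toy network. This is a minimal sketch, not the MILP formulation used in the pipeline: it swaps in a purely topological check (network expansion, which ignores stoichiometry, reaction directionality and flux bounds) and enumerates candidate subsets by increasing size, which is only tractable for a handful of candidates. The reaction and metabolite names and the helpers `reachable` and `minimal_gap_fill` are hypothetical.

```python
from itertools import combinations

# Hypothetical toy model: reaction -> (substrates, products).
# Stoichiometry and directionality are deliberately ignored in this sketch.
MODEL = {
    "R1": ({"glc_ext"}, {"g6p"}),
    "R2": ({"g6p"}, {"pyr"}),
    "R3": ({"acCoA", "atp"}, {"lipid"}),  # blocked: acCoA and atp are never produced
}
# Hypothetical candidate reactions, e.g. drawn from a reference reaction database.
CANDIDATES = {
    "U1": ({"pyr"}, {"acCoA"}),
    "U2": ({"g6p"}, {"atp"}),
    "U3": ({"pyr"}, {"atp"}),
    "U4": ({"glc_ext"}, {"acCoA"}),
}
SEEDS = {"glc_ext"}   # metabolites supplied by the medium
TARGETS = {"lipid"}   # hypothetical biomass constituent that must be produced


def reachable(reactions, seeds):
    """Network expansion: repeatedly fire reactions whose substrates are all available."""
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def minimal_gap_fill(model, candidates, seeds, targets):
    """Return every smallest set of candidates that makes all targets reachable."""
    names = sorted(candidates)
    for size in range(len(names) + 1):
        solutions = []
        for subset in combinations(names, size):
            extended = {**model, **{n: candidates[n] for n in subset}}
            if targets <= reachable(extended, seeds):
                solutions.append(set(subset))
        if solutions:
            return solutions  # stop at the first (smallest) size that has a solution
    return []  # targets unreachable even with all candidates added


if __name__ == "__main__":
    for solution in minimal_gap_fill(MODEL, CANDIDATES, SEEDS, TARGETS):
        print(sorted(solution))
```

Running the script prints four alternative two-reaction fills. Returning every minimal solution rather than only the first reflects the search for multiple revision suggestions, each of which would still need support from the literature or from experiments.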
Bi-level MILP techniques can be exploited to identify a minimal set of reactions whose suppression in the model keeps the predicted flux below a given threshold. These analyses may suggest revisions to gene-protein-reaction (GPR) associations, biomass composition, reaction directionality, metabolite transport, and other constraints implied by regulatory rules. They also guide a targeted search for evidence in the literature and/or bioinformatics databases.

Conclusions: Our computational tools for metabolic model refinement have been implemented in Python, using CPLEX as the MILP solver. These tools can effectively search for multiple revision suggestions based on empirical observations. Such model refinement, supported by manual curation and robot scientist experiments, will help to improve the model's performance in phenotype prediction under various conditions. Future work could focus on developing a probabilistic logic programming framework that integrates the metabolic model with collective evidence, including the results of constraint-based analysis, in order to learn GPR associations and regulatory rules and to suggest experiments automatically, either in silico or in vivo.
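Revising a GPR association changes which reactions a gene deletion switches off before flux balance analysis is run. The sketch below shows that mapping step under stated assumptions: rules are Boolean strings using `and`/`or` (upper-case `AND`/`OR` are normalised), gene identifiers are valid Python identifiers, and a reaction with an empty rule stays active. The gene and reaction names and the helpers `gpr_active` and `disabled_by_deletion` are hypothetical; the rule is parsed with Python's standard `ast` module instead of `eval`, so anything other than gene names combined with `and`/`or` is rejected.

```python
from __future__ import annotations

import ast
import re

# Hypothetical GPR rules (reaction -> Boolean rule over gene identifiers).
GPRS = {
    "R1": "geneA and geneB",             # complex: both subunits required
    "R2": "geneC or geneD",              # isoenzymes: either one suffices
    "R3": "(geneA or geneC) and geneE",
    "R4": "",                            # no gene association (assumed always active)
}


def gpr_active(rule: str, deleted: set[str]) -> bool:
    """Return True if the reaction can still carry flux after `deleted` genes are removed."""
    rule = rule.strip()
    if not rule:
        return True
    # Normalise upper-case operators, then parse without executing anything.
    rule = re.sub(r"\bAND\b", "and", rule)
    rule = re.sub(r"\bOR\b", "or", rule)
    tree = ast.parse(rule, mode="eval")

    def evaluate(node: ast.AST) -> bool:
        if isinstance(node, ast.BoolOp):
            values = [evaluate(v) for v in node.values]
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.Name):
            return node.id not in deleted  # a gene is present unless deleted
        raise ValueError(f"unsupported GPR syntax: {ast.dump(node)}")

    return evaluate(tree.body)


def disabled_by_deletion(gprs: dict[str, str], gene: str) -> list[str]:
    """Reactions whose GPR rule evaluates to False when `gene` is deleted."""
    return sorted(r for r, rule in gprs.items() if not gpr_active(rule, {gene}))


if __name__ == "__main__":
    for gene in ("geneA", "geneC", "geneE"):
        print(gene, disabled_by_deletion(GPRS, gene))
```

Changing `R1` from `geneA and geneB` (a complex) to `geneA or geneB` (isoenzymes) means that deleting `geneA` no longer disables `R1`; this is the kind of GPR revision that could turn a false lethal prediction into a viable one.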