Verifying Permutation Rewritable Hazard Free Loops

By Michal Dobrogost, August 2011

We present an extension to the language of Atomic Verifiable Operation (AVOp) streams that allows the expression of loops which are rewritable via an arbitrary permutation. Inspired by hardware register rotation, which we significantly extend to data buffers in multi-core programs, we hope to achieve similar performance benefits when expressing software-pipelined programs across many cores. Adding loops to AVOp streams achieves significant stream compression, which removes an impediment to the scalability of this abstraction. Furthermore, we present a fast extension to the previous AVOp verification process that ensures no data hazards are present in the program's patterns of communication. The extension verifies loops without completely unrolling them. A proof of correctness for the verification process is presented.

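
The abstract does not spell out how loops are verified without full unrolling. One elementary fact that makes such a shortcut plausible (our illustration, not necessarily the argument used in the thesis) is that if every iteration renames buffers by the same permutation $p$, the renaming in effect at iteration $n$ is $p^n$, and this repeats with period equal to the order of $p$, the least common multiple of its cycle lengths. The helper names below are hypothetical.

```python
from functools import reduce
from math import gcd


def cycle_lengths(perm):
    """Lengths of the disjoint cycles of a permutation given as a list (i -> perm[i])."""
    seen = [False] * len(perm)
    lengths = []
    for start in range(len(perm)):
        if not seen[start]:
            length, j = 0, start
            while not seen[j]:
                seen[j] = True
                j = perm[j]
                length += 1
            lengths.append(length)
    return lengths


def order(perm):
    """Order of the permutation: lcm of its cycle lengths."""
    return reduce(lambda a, b: a * b // gcd(a, b), cycle_lengths(perm), 1)


def power(perm, n):
    """Buffer renaming after n iterations (perm applied n times).

    Only n mod order(perm) applications are needed, because the renaming is periodic.
    """
    result = list(range(len(perm)))
    for _ in range(n % order(perm)):
        result = [perm[r] for r in result]
    return result


# Hypothetical rotation over five buffers: a 3-cycle (0 1 2) and a swap (3 4).
p = [1, 2, 0, 4, 3]
print(cycle_lengths(p), order(p), power(p, 1_000_000))
```

For this permutation only six distinct buffer assignments ever occur, however many times the loop runs.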
Simulation and Optimal Design of Nuclear Magnetic Resonance Experiments

By Zhenghua Nie, July 2011

In this study, we concentrate on spin-1/2 systems. A series of tools based on the Liouville-space method has been developed for simulating NMR with arbitrary pulse sequences. We have calculated steady states symbolically for one- and two-spin systems and numerically for larger systems. The one-spin calculations show how SSFP converges to continuous-wave NMR. A general formula for two-spin systems has been derived for the creation of double-quantum signals as a function of irradiation strength, coupling constant, and chemical shift difference. The formalism is general and can be extended to more complex spin systems. Estimates of the transverse relaxation rate, $R_2$, are affected by frequency offset and field inhomogeneity. We find that, in the presence of the expected $B_0$ inhomogeneity, off-resonance effects can be removed from $R_2$ measurements by fitting exact solutions of the Bloch equations given in Lagrange form, provided that $|\omega| \le 0.5\,\gamma B_1$ in Hahn echo experiments, or $|\omega| \le \gamma B_1$ in CPMG experiments with specific phase variations. Approximate solutions of CPMG experiments show that the specific phase variations can significantly smooth the dependence of measured intensities on frequency offset in the range $\pm\tfrac{1}{2}\gamma B_1$. The effective $R_2$ of CPMG experiments using a phase-variation scheme can be expressed as a second-order formula in the ratio of the offset to the $\pi$-pulse amplitude. Optimization problems using the exact or approximate solutions of the Bloch equations are formulated for designing optimal broadband universal rotation (OBUR) pulses. OBUR pulses are independent of the initial magnetization and can replace any pulse of the same flip angle in a pulse sequence. We show how to calculate the first- and second-order derivatives with respect to the pulses exactly and efficiently. Using these exact derivatives, a second-order optimization method is employed to design pulses. Experiments and simulations show that OBUR pulses can provide more uniform spectra over the designed offset range and offer advantages in CPMG experiments.

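
Off-resonance sensitivity, which both the $R_2$ analysis and the OBUR pulse design address, can be seen in the simplest Bloch-equation case. The sketch below assumes a hard rectangular pulse, ignores relaxation, and rotates the magnetization about the effective field $(\omega_1, 0, \Delta\omega)$ in the rotating frame, where $\omega_1 = \gamma B_1$ and $\Delta\omega$ is the offset; the direction-of-rotation convention is our assumption and does not affect the $z$ component examined here. The function names are hypothetical.

```python
import math


def rotate(v, axis, angle):
    """Rotate vector v about the unit vector axis by angle (Rodrigues' formula)."""
    kx, ky, kz = axis
    vx, vy, vz = v
    c, s = math.cos(angle), math.sin(angle)
    dot = kx * vx + ky * vy + kz * vz
    cross = (ky * vz - kz * vy, kz * vx - kx * vz, kx * vy - ky * vx)
    return tuple(vi * c + ci * s + ki * dot * (1 - c)
                 for vi, ci, ki in zip(v, cross, axis))


def hard_pulse(m, w1, dw, duration):
    """Apply an x-phase pulse with nutation rate w1 at offset dw (no relaxation)."""
    w_eff = math.hypot(w1, dw)                # magnitude of the effective field
    axis = (w1 / w_eff, 0.0, dw / w_eff)      # effective-field direction
    return rotate(m, axis, w_eff * duration)


w1 = 1.0                                      # gamma*B1 in arbitrary units
t_pi = math.pi / w1                           # nominal 180-degree pulse length
on_res = hard_pulse((0.0, 0.0, 1.0), w1, 0.0, t_pi)
off_res = hard_pulse((0.0, 0.0, 1.0), w1, w1, t_pi)   # offset equal to gamma*B1
print(on_res[2], off_res[2])
```

On resonance the pulse inverts $M_z$ to $-1$; at an offset equal to $\gamma B_1$ the same pulse leaves $M_z \approx 0.37$, which is the kind of non-uniformity that broadband pulse design tries to remove.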
A Semi-Definite, Nonlinear Model for Optimizing k-Space Sample Separation in Parallel Magnetic Resonance Imaging

By Qiong Wu, July 2011

Parallel MRI, in which k-space is regularly or irregularly undersampled, is critical for accelerating imaging. In this thesis, we show how to use nonlinear optimization to optimize a regular undersampling pattern for three-dimensional Cartesian imaging in order to achieve faster data acquisition and/or a higher signal-to-noise ratio (SNR). A new sensitivity-profiling approach is proposed to produce better sensitivity maps, which the sampling optimization requires. This design approach is easily adapted to calculate sensitivities for arbitrary planes and volumes. The use of a semi-definite, linearly constrained model to optimize a parallel MRI undersampling pattern is novel. To solve this problem, an iterative trust-region method is applied. When tested on real coil data, the optimal solution shows a significant theoretical improvement in both acquisition speed and noise reduction.

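
Why regular undersampling needs coil sensitivity maps can be seen in one dimension: keeping every $R$-th k-space sample and inverse-transforming yields the image folded onto itself, $y[n] = \frac{1}{R}\sum_{j=0}^{R-1} x[(n + jN/R) \bmod N]$. Parallel MRI uses the differing sensitivities of the receiver coils to unfold these overlapping copies. The sketch below (a naive DFT in plain Python, with hypothetical helper names) only demonstrates the folding; it does not implement the thesis's semi-definite optimization of the sampling pattern.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform; the inverse includes the 1/N factor."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def undersampled_recon(image, r):
    """Keep every r-th k-space sample, zero the rest, and transform back."""
    k = dft(image)
    kept = [v if m % r == 0 else 0 for m, v in enumerate(k)]
    return dft(kept, inverse=True)


def folded(image, r):
    """Predicted aliasing: the average of r copies of the image shifted by N/r."""
    n = len(image)
    return [sum(image[(i + j * n // r) % n] for j in range(r)) / r for i in range(n)]


image = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0]   # N = 8, divisible by r
recon = undersampled_recon(image, 2)
print([round(v.real, 6) for v in recon])
print(folded(image, 2))
```

With several coils, each coil sees a differently weighted version of the same fold, which gives the extra equations needed to separate the overlapping pixels.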
Locating Carbon Bonds from INADEQUATE Spectra using Continuous Optimization Methods and Non-Uniform K-Space Sampling

By Sean Colin Watson, May 2011

The 2-D INADEQUATE experiment is useful for determining the carbon structures of organic molecules, but it is known for its low signal-to-noise ratio. To compensate, we introduce a nonlinear optimization method for solving the low-signal spectra produced by this experiment. The method uses the peak locations defined by the INADEQUATE experiment to place boxes around these regions and measures the signal in each box. By measuring pairs of these boxes and applying penalty functions that represent a priori information, we are able to solve spectra quickly and reliably with an acquisition time under a quarter of that required by traditional methods. Examples are shown using the spectrum of sucrose. The concept of a non-uniform Fourier transform and its potential advantages are introduced, and the possible application of this type of transform to the INADEQUATE experiment and to the optimization program described above is discussed in detail.

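
The box-measurement step can be sketched using the standard INADEQUATE relation that a bond between carbons with offsets $\nu_A$ and $\nu_B$ produces signal at single-quantum frequency $\nu_A$ or $\nu_B$ (F2) and at double-quantum frequency $\nu_A + \nu_B$ (F1). The sketch assumes the spectrum is a grid indexed directly by non-negative integer frequency units, sums absolute intensity (INADEQUATE peaks are antiphase doublets, so signed sums can cancel), and simply ranks candidate pairs; the thesis's penalty functions for a priori information are not modeled, and all names are hypothetical.

```python
from itertools import combinations


def box_signal(spec, f2_center, f1_center, half_width):
    """Sum |intensity| in a square box of spec[f1][f2] around (f2_center, f1_center)."""
    total = 0.0
    for f1 in range(max(0, f1_center - half_width), min(len(spec), f1_center + half_width + 1)):
        row = spec[f1]
        for f2 in range(max(0, f2_center - half_width), min(len(row), f2_center + half_width + 1)):
            total += abs(row[f2])
    return total


def rank_bonds(spec, shifts, half_width=2):
    """Score each carbon pair by the signal in its two expected boxes, best first."""
    scores = {}
    for a, b in combinations(shifts, 2):
        dq = a + b                          # double-quantum (F1) position of the pair
        scores[(a, b)] = (box_signal(spec, a, dq, half_width)
                          + box_signal(spec, b, dq, half_width))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Synthetic spectrum: 200 F1 rows x 100 F2 columns, one bond between shifts 20 and 35.
spec = [[0.0] * 100 for _ in range(200)]
spec[55][20] = 1.0
spec[55][35] = -1.0
ranking = rank_bonds(spec, [20, 35, 60], half_width=2)
print(ranking)
```

In practice the box half-width would have to cover the one-bond C-C coupling splitting and any referencing error, which is one reason box placement, rather than exact peak picking, suits low-signal data.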
Elementary Function Evaluation Using New Hardware Instructions

By Anuroop Sharma, August 2010

In this thesis, we present novel, fast, and accurate hardware/software implementations of the elementary math functions based on range reduction, such as Bemer's multiplicative reduction and Gal's accurate table methods. The software implementations are branch-free because the new instructions we propose internalize the control flow associated with handling exceptional cases. These methods provide an alternative to common iterative methods of computing the reciprocal, square root, and reciprocal square root, and they could be applied to any rational-power operation. They require either the precision available through fused multiply-accumulate instructions or extra working precision in registers. We also extend the range reduction methods to trigonometric and inverse trigonometric functions. The new hardware instructions enable exception handling at no additional cost in execution time, and they scale linearly with increasing superscalar and SIMD widths. Based on the reduced instruction and constant counts and the reduced register pressure, we recommend that optimizing compilers always inline such functions, further improving performance by eliminating function-call overhead. On the Cell/B.E. SPU, we found an overall 234% increase in throughput for the new table-based methods, with increased accuracy. The research reported in the thesis has resulted in a patent application [AES10], filed jointly with IBM.

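
The flavour of table-based multiplicative range reduction can be shown for the reciprocal. In the sketch below (plain Python, with a table size and polynomial degree chosen for illustration rather than taken from the thesis), $x = m \cdot 2^e$ with $m \in [0.5, 1)$; a table entry $c \approx 1/m$ indexed by the leading bits of $m$ reduces the problem to $1/(1 + r)$ with $r = mc - 1$ small, which a short polynomial evaluates. The explicit input check is exactly the kind of exceptional-case branch that the proposed instructions are meant to absorb in hardware.

```python
import math

TABLE_BITS = 6  # illustrative choice: 64 table entries
# Entry i approximates 1/m at the midpoint of the i-th subinterval of [0.5, 1).
TABLE = [1.0 / (0.5 + (i + 0.5) * 0.5 / 2**TABLE_BITS) for i in range(2**TABLE_BITS)]


def reciprocal(x):
    """1/x for positive finite x via table lookup plus a short polynomial."""
    if not (x > 0 and math.isfinite(x)):
        raise ValueError("this sketch handles positive finite x only")
    m, e = math.frexp(x)                       # x = m * 2**e, 0.5 <= m < 1
    idx = int((m - 0.5) * 2 * 2**TABLE_BITS)   # leading bits of m select the entry
    c = TABLE[idx]
    r = m * c - 1.0                            # reduced argument, |r| <= 2**-7
    # 1/(1 + r) = 1 - r + r^2 - ... - r^7, evaluated by Horner's rule.
    p = 1.0
    for _ in range(7):
        p = 1.0 - r * p
    # 1/x = (c / (1 + r)) * 2**-e
    return math.ldexp(c * p, -e)


print(reciprocal(3.0), 1 / 3.0)
```

With 64 entries, $|r| \le 2^{-7}$, so the truncated series error is at most about $2^{-56}$, below double-precision rounding.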
Model-Based Tissue Quantification from Simulated Partial k-Space MRI Data

By Mehrdad Mozafari, July 2008

Pixel values in MR images are linear combinations of contributions from multiple tissue fractions. The tissue fractions can be recovered using the Moore-Penrose pseudo-inverse if the tissue parameters are known, or can be deduced using machine learning. Acquiring sufficiently many source images may be too time-consuming for some applications. In this thesis, we show how tissue fractions can be recovered from partial k-space data, collected in a fraction of the time required for a full set of experiments. The key to achieving significant sample reductions is the use of regularization. As an additional benefit, regularizing the inverse problem for tissue fractions also reduces the sensitivity to measurement noise. Numerical simulations with three tissue types demonstrate the effectiveness of the method. Clinically, this corresponds to liver imaging, in which normal liver, fatty liver, and blood must all be included in the model to obtain an accurate fatty-liver ratio, because all three overlap in liver pixels through partial-volume effects.

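
The per-pixel model can be written as $s = A f$, where column $t$ of $A$ holds tissue $t$'s signal in each experiment and $f$ holds the tissue fractions. The sketch below solves the regularized normal equations $(A^\top A + \lambda I) f = A^\top s$; with $\lambda = 0$ and $A$ of full column rank this coincides with the Moore-Penrose solution $f = A^{+} s$. Tikhonov regularization is our assumed choice (the abstract does not say which regularizer the thesis uses), the matrix entries are made up, and the sketch works on image-domain pixel values rather than partial k-space data.

```python
def solve(M, b):
    """Solve the square system M x = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def tissue_fractions(A, s, lam=0.0):
    """Fractions f minimizing ||A f - s||^2 + lam * ||f||^2."""
    m, n = len(A), len(A[0])
    AtA = [[sum(A[k][i] * A[k][j] for k in range(m)) + (lam if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    Ats = [sum(A[k][i] * s[k] for k in range(m)) for i in range(n)]
    return solve(AtA, Ats)


# Hypothetical signatures: 4 experiments (rows) x 3 tissues (columns).
A = [[1.0, 0.6, 0.2],
     [0.8, 0.9, 0.3],
     [0.3, 0.4, 1.0],
     [0.5, 0.2, 0.7]]
f_true = [0.6, 0.3, 0.1]
s = [sum(a * f for a, f in zip(row, f_true)) for row in A]   # noise-free pixel values
print(tissue_fractions(A, s), tissue_fractions(A, s, lam=0.1))
```

With noise-free data and $\lambda = 0$ the true fractions are recovered; a positive $\lambda$ shrinks the solution, trading a small bias for lower noise sensitivity.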
Explicitly Staged Software Pipelining

By Wolfgang Thaller, August 2006

Software pipelining is a method of instruction scheduling in which loops are scheduled more efficiently by executing operations from more than one iteration of the loop in parallel. Finding an optimal software-pipelined schedule is NP-complete, but many heuristic algorithms exist. In iteration $i$, a software-pipelined loop executes, in parallel, "stage" 1 of iteration $i$, stage 2 of iteration $i-1$, and so on up to stage $k$ of iteration $i-k+1$. We present a new approach to software pipelining that uses a heuristic algorithm to explicitly assign each operation to a stage before the actual scheduling. This explicit assignment allows us to implement control-flow mechanisms that are hard to implement with traditional methods of software pipelining, which do not give direct control over the stage to which each instruction is assigned.

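
The stage/iteration relationship in the abstract can be tabulated directly. In this sketch (hypothetical function name, 0-based iteration indices, 1-based stages), step $t$ runs stage $s$ of iteration $t - s + 1$ whenever that iteration exists; the first steps form the prologue, the full steps the kernel, and the last steps the epilogue. It models only the overlap pattern, not the thesis's heuristic for assigning operations to stages.

```python
def pipeline_schedule(n_iters, n_stages):
    """Return, for each time step, the (stage, iteration) pairs executed in parallel.

    Step t runs stage s of iteration t - s + 1, so stage s of iteration i
    executes at step i + s - 1.
    """
    schedule = []
    for t in range(n_iters + n_stages - 1):
        active = [(s, t - s + 1) for s in range(1, n_stages + 1)
                  if 0 <= t - s + 1 < n_iters]
        schedule.append(active)
    return schedule


for step, active in enumerate(pipeline_schedule(n_iters=4, n_stages=3)):
    print(step, active)
```

With 4 iterations and 3 stages, steps 0 and 1 are the prologue, steps 2 and 3 run all three stages at once, and steps 4 and 5 drain the pipeline.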
HUSC Language and Type System

By Gordon J. Uszkay, June 1, 2006

HUSC is a high-level declarative language designed to capture the relations and properties of a complex system independently of implementation and platform. It is intended for use in Coconut (COde CONstructing User Tool), a project at McMaster University to create a new development environment for safety-critical, computationally intensive domains such as medical imaging. The language is intended to provide an interface that is comfortable for scientists and engineers while providing the benefits of the strong, static type analysis found in functional programming languages. HUSC supports type inferencing using constraint handling rules, including both predefined, parameterized shapes and an arbitrary number of additional properties or constraints called attributes. Each attribute class provides its own type inferencing rules according to the HUSC attribute class definition structure. Multiple implementations of an operator or function can each specify a specialized type context, including attributes, and be selected based on type inferencing. The HUSC system creates a typed code hypergraph, with terms as nodes and, as edges, all of the operator and function implementations that are satisfied in the type context; this hypergraph can be used as the basis for an optimizing compiler back end. We present the HUSC language and type system, a prototype implementation, and an example demonstrating how type inferencing may be used in conjunction with the function joins to provide a good starting point for code graph optimization. The thesis also includes a number of observations and suggestions for making the transition from prototype to usable system. The results of this work are sufficient to demonstrate that this approach is promising, while highlighting some difficulties in producing a robust, practical implementation.

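
The selection rule for multiple implementations can be mimicked in a few lines. In the sketch below, which is plain Python rather than HUSC syntax and uses hypothetical operator, shape, and attribute names, each implementation declares a shape and a set of required attributes, and every implementation whose requirements are satisfied by an argument's inferred attributes becomes a candidate edge from which an optimizing back end could choose. It does not reproduce HUSC's constraint-handling-rule inference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Impl:
    name: str
    shape: str               # hypothetical shape name, e.g. "matrix"
    requires: frozenset      # attributes the argument must carry


# Hypothetical implementations of a single operator, "solve".
SOLVE_IMPLS = [
    Impl("lu_solve", "matrix", frozenset({"square"})),
    Impl("cholesky_solve", "matrix", frozenset({"square", "symmetric", "positive_definite"})),
    Impl("tridiagonal_solve", "matrix", frozenset({"square", "tridiagonal"})),
]


def satisfied_impls(impls, shape, attrs):
    """Names of all implementations valid for an argument with this shape and attributes."""
    return [impl.name for impl in impls
            if impl.shape == shape and impl.requires <= attrs]


spd = frozenset({"square", "symmetric", "positive_definite"})
print(satisfied_impls(SOLVE_IMPLS, "matrix", spd))
```

Returning every satisfied implementation, rather than a single best match, mirrors the abstract's hypergraph, whose edges are all implementations valid in the type context.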