11932 Rosethorn Dr.
Austin, Texas 78758

Education

  Ph.D., Computer Engineering
  Advisor: Prof. Yale Patt
  Research Interests: Chip-multiprocessor architectures, parallel programming, compiler-hardware interactions, and memory system design.
  The University of Texas at Austin
  Master of Science, Computer Engineering, December 2005
  The University of Texas at Austin                            GPA: 3.90/4.00
  Bachelor of Science, Electrical Engineering with Highest Honors, May 2003
  The University of Texas at Austin                            GPA: 3.97/4.00

Industry Experience

Summer 2008 Graduate Intern Technical, Intel Corporation, Austin, TX
Manager: Doug Carmean, Mentor: Eric Sprangle
  • Researched techniques to improve load balancing in multi-threaded applications.
  • Ported simulation infrastructure to the Amazon Elastic Compute Cloud.

Summer 2007 Graduate Intern Technical, Intel Corporation, Austin, TX
Manager: Doug Carmean, Mentor: Eric Sprangle
  • Investigated software techniques to reduce data synchronization in irregular applications.
  • Studied hardware techniques to reduce the impact of critical sections on performance.
  • Developed a distributed job scheduler for Windows XP compute clusters.

Summer 2006 Graduate Intern Technical, Intel Corporation, Austin, TX
Manager: Doug Carmean, Mentor: Eric Sprangle
  • Parallelized applications to evaluate the trade-off between hardware efficiency and software efficiency.
  • Implemented directory-based cache coherence in the existing CMP simulator.
  • Automated the benchmarking of graphics processors.

Summer 2005 Engineering Coop, Advanced Micro Devices, Austin, TX
Manager: Ben Sanders, Mentor: Kevin Lepak
  • Proposed a scheduling policy to reduce cache warm-up effects in TPC-C.
  • Studied memory interference experienced by concurrently running transactions.
  • Added improved checkpointing support for Java benchmarks.

Summer 2004 Engineering Coop, Advanced Micro Devices, Austin, TX
Manager: Ben Sanders, Mentor: Kevin Lepak
  • Developed tools to identify instructions from each transaction in TPC-C.
  • Analyzed instruction and data footprint of each transaction type.
  • Ported a suite of multithreaded benchmarks to the simulation environment.

Spring 2004 Product and Test Engineering Intern, Oasis Silicon Systems, Austin, TX
Manager: David Owmby
  • Maintained and enhanced test solutions to reduce cost and increase test coverage.
  • Supported RMA analysis including verification, failure analysis, corrective action, and documentation.
  • Formulated test plans and strategies.

Spring 2003 Hardware Intern, National Instruments, Austin, TX
Manager: Ron Kubena
  • Conducted a test audit for digital I/O and timing hardware. Verified test coverage and implemented tests to increase coverage.
  • Tested and debugged RMAs and dead-on-arrival (DOA) boards.
  • Reviewed board-level and ASIC designs of products under development.

Research Experience

08/2004-present Graduate Research Assistant, HPS Research Group
  • Investigating hardware/software techniques to reduce the overhead of critical sections and barriers in multithreaded applications [ASPLOS 2009].
  • Developed Feedback-Driven Threading, a framework to control the number of threads at run-time using feedback from hardware [ASPLOS 2008].
  • Studied the Asymmetric Chip Multiprocessor, an architecture paradigm to reduce the effort required to parallelize applications [TR-HPS-2007].
  • Designed a novel framework to increase cache capacity by filtering unused words in cache lines [HPCA 2007].
  • Proposed profiling techniques to predict input-set dependency of program behavior [CGO 2006].

05/2001-05/2003 Undergraduate Research Assistant, Center for Space Research
  • Supported on-orbit calibration of the laser altimeter for the Ice, Cloud, and land Elevation Satellite (ICESat, launched Jan 2003) [http://icesat.gsfc.nasa.gov].
  • Designed and implemented hardware and software for the verification of laser-altimeter pointing and time of measurement [MST 2003].
  • Developed GPS-based airplane navigation software in LabVIEW.
  • Wrote image-analysis software to determine laser altimeter geolocation.

Teaching Experience

Fall 2008-present Graduate Research Mentor, Texas Research Experience
  • Mentored two undergraduate seniors in computer architecture research.

Spring 2008 Teaching Assistant, Graduate Microarchitecture Course (EE382N)

Fall 2004 and 2005 Head Teaching Assistant, Senior-level Computer Architecture Course (EE360N)
  • Introduced a new programming assignment in which students develop an instruction-level simulator.
  • Prepared and graded homework assignments and conducted weekly discussion sections.

Spring 2005 Mentor, Senior Design Project
  • Co-supervised a team of graduating seniors in researching data prefetchers.

Fall 2003 Teaching Assistant, Freshman Chemistry Course (GE206D)
  • Taught two sections of 30 students each.

Conference Publications
  1. M. A. Suleman, O. Mutlu, M. K. Qureshi, Y. N. Patt, "Accelerating Critical Section Execution with Asymmetric Multi-Core Architectures," Proceedings of the 14th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, March 2009.
  2. M. A. Suleman, M. K. Qureshi, Y. N. Patt, "Feedback-Driven Threading: Higher-performance and Power-efficient execution of multithreaded workloads on CMPs," Proceedings of the 13th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, March 2008.
  3. M. K. Qureshi, M. A. Suleman, Y. N. Patt, "LDIS: Increasing Cache Capacity by Filtering Unused Words in Cache Line," Proceedings of the 13th IEEE International Conference on High Performance Computer Architecture, February 2007.
  4. H. Kim, M. A. Suleman, O. Mutlu, Y. N. Patt, "2D-Profiling: Detecting Input-Dependent Branches with a Single Input Data Set," Proceedings of the 4th Annual International Symposium on Code Generation and Optimization, March 2006.
  5. B. Mohammad, M. Rab, K. Mohammad, M. A. Suleman, "Resizeable Cache Architecture for High Yield SOC," Proceedings of the International Conference on IC Design and Technology, May 2009.

Journal Publications
  1. L. A. Magruder, M. A. Suleman, and B. E. Schutz, "ICESat Laser Altimeter Measurement Time Validation System," Measurement Science and Technology, Vol. 14, Issue 11, November 2003.

Non-Refereed Publications
  1. M. A. Suleman, O. Mutlu, M. K. Qureshi, Y. N. Patt, "An Asymmetric Multi-core Architecture for Accelerating Critical Sections," HPS Technical Report, TR-HPS-2008-003, September 2008.
  2. M. A. Suleman, Y. N. Patt (UT-Austin), E. Sprangle, A. Rohillah, A. Ghuloum, D. Carmean (Intel), "ACMP: Balancing Hardware Efficiency and Programmer Efficiency," HPS Technical Report, TR-HPS-2007-001, February 2007.
  3. H. Kim, M. A. Suleman, O. Mutlu, Y. N. Patt, "2D-Profiling: Detecting Input-Dependent Branches with a Single Input Data Set," HPS Technical Report, TR-HPS-2006-001, January 2006.

Presentations
  1. International Conference on Architectural Support for Programming Languages and Operating Systems, 2009. Title: Accelerating Critical Section Execution with Asymmetric Multi-core Architectures.
  2. 2nd Annual Computer Architecture and Embedded Processors Research Review for Industry, The University of Texas at Austin, 2008. Title: An Asymmetric Multi-core Architecture for Accelerating Critical Sections.
  3. Computer Engineering Seminar, Texas A&M University, 2008. Title: High-Performance Execution of Multithreaded Workloads on CMPs.
  4. International Conference on Architectural Support for Programming Languages and Operating Systems, 2008. Title: Feedback-Driven Threading.
  5. Research Review for Industry, The University of Texas at Austin, 2007. Title: ACMP: An Architecture to Handle Amdahl's Law.
  6. Advanced Micro Devices (Austin, Texas), 2005. Title: Transaction Type Aware Affinity Scheduling.

Selected Projects
  • Cache Coherence: Investigated invalidation-based cache coherence protocols and bus interface unit designs for CMPs.
  • OR1200 processor: Collaborated with a team in designing a 1 GHz RISC processor. Optimized the area, timing, and power of the Wishbone bus.
  • x86 processor: Implemented a subset of the x86 ISA in gate-level Verilog. The processor had a 7-stage pipeline and a 10-us cycle time in a 0.35 um process. The design included caches, virtual memory, and branch prediction.
  • Alpha processor: Designed, implemented, and synthesized a subset of the 64-bit Alpha architecture. The processor had a 4-stage pipeline and was synthesized using three standard cell libraries: 0.13, 0.18, and 0.35 um.
  • Operating System: Programmed operating system commands to implement a multithreaded environment and a file system.

Honors & Awards

  • Medal for graduating from UT with Highest Honors, Spring 2003. 
  • Distinguished Scholar Medal, Spring 2002. 
  • UT Engineering Scholar, Spring 2001 to Spring 2003. 
  • TxTEC Scholar, 2002-2003.
  • College Scholar, Spring 2003.
  • Second prize, IEEE Ford Design Contest, Fall 2001.
  • Valedictorian, School Prefect, and several other academic awards in high school.
Extracurricular Activities

  • President, Pakistani Student Support Group, Spring 2007.
  • President, University Cricket Club, Spring 2003.
  • Vice President, Eta Kappa Nu, Electrical Engineering Honor Society, Spring 2003.
  • Publishing Chair, Eta Kappa Nu, Summer and Fall 2002. 
  • Winner of nine poetry recitation competitions, 1994-2000.
  • Volunteer tutor, Eta Kappa Nu, 2002-2004.
Professional Activities

  • Member, IEEE, HKN, and ACM.
Skills
  • Design Packages: Verilog, Cadence, Synopsys, SIS, and VIS.
  • Languages: LabVIEW, C/C++, Java, Assembly (MC6812, x86, and TMS320C30).
  • Scripting: Perl, Bash, Tcsh, Awk, and Visual Basic.