Metrics by Dr. Timothy D. Korson
CPTR 209 Software Engineering

There is Information in the Code of Interest to Testers
- Complexity information
- Memory management information
- Type mismatch information

Complexity Metrics
Prominent in the history of software metrics has been the search for measures of complexity. This search has been inspired by the belief that only by measuring complexity can we truly understand and conquer it. Because complexity is a high-level notion made up of many different attributes, there can never be a single measure of software complexity [Fenton 1992]. Yet hundreds of complexity metrics have been proposed, and most of them are restricted to code. The best known are Halstead's software science and McCabe's cyclomatic number.

Halstead's Software Science

McCabe Metrics

- Cyclomatic Complexity Metric (v(G)): a measure of the complexity of a module's decision structure. It is the number of linearly independent paths and therefore the minimum number of paths that should be tested.
- Essential Complexity Metric (ev(G)): a measure of the degree to which a module contains unstructured constructs. This metric measures the degree of structuredness and the quality of the code. It is used to predict the maintenance effort and to help in the modularization process.
- Module Design Complexity Metric (iv(G)): the complexity of the design-reduced module, reflecting the complexity of the module's calling patterns to its immediate subordinate modules. This metric differentiates between modules which will seriously complicate the design of any program they are part of and modules which simply contain complex computational logic. It is the basis upon which program design and integration complexities (S0 and S1) are calculated.
- Pathological Complexity Metric (pv(G)): a measure of the degree to which a module contains extremely unstructured constructs.
- Design Complexity Metric (S0): measures the amount of interaction between modules in a system.
- Integration Complexity Metric (S1): measures the amount of integration testing necessary to guard against errors.
- Object Integration Complexity Metric (OS1): quantifies the number of tests necessary to fully integrate an object or class into an OO system.
- Global Data Complexity Metric (gdv(G)): quantifies the cyclomatic complexity of a module's structure as it relates to global/parameter data. It can be no less than one and no more than the cyclomatic complexity of the original flowgraph.

Criticism
Despite their widespread use, the Halstead and McCabe metrics have been criticized on both empirical and theoretical grounds. Empirically, it has been claimed that they are no better indicators of complexity than LOC, since they are no better at predicting effort, reliability, or maintainability. Theoretically, it has been argued that the metrics are too simplistic; for example, McCabe's metric is criticized for failing to take account of data-flow complexity or the complexity of unstructured programs.

Code Metrics for Everyone
- Developers should use code analysis tools extensively to help them develop high-quality code.
- Managers will want to see reports and trends from code analysis tools to know what risk reduction measures to take.
- System testers will want to use reports from code metrics tools to help them determine if additional testing is necessary and, if so, which areas of the code need more testing. In some cases the tools can help pinpoint specific additional test cases that should be run.
- Clients may want to include various metrics thresholds as part of the acceptance criteria for delivered systems.

Smoke Test
It is common for a systems test group to have a small smoke test suite which the system must pass before it is accepted by the independent test team for comprehensive system testing.

When a software development organization has a mature metrics program in place, the criteria for passing the smoke test often also require documentation that the system has achieved certain metrics thresholds and passed certain automated static checks.

Standards

GQM
Goal: Optimal allocation of test effort
- Question: What errors would be most damaging to the stakeholders?
  Metrics: Frequency of use; consequence of failure
- Question: Where are errors most likely to occur?
  Metric: Cyclomatic complexity

Program Complexity
McCabe's cyclomatic complexity is equal to the maximum number of linearly independent paths through the program. These are called basis paths, and any other path through the program can be expressed as a combination of some of them. The simplest way to compute cyclomatic complexity is to count the enclosed regions in the flow graph and add one (equivalently, count every region including the outer one).

How Many Paths Through the Program?
[Flow graph: decisions "credit rating >= 4?", "income >= 100,000?", and "children < 3?", each with Y and N branches leading to Approve or Disapprove; regions numbered 1 to 4]

  IF credit rating >= 4 THEN approve
  ELSE IF (income >= 100,000) AND (number of children < 3) THEN approve
  ELSE disapprove
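As a minimal sketch, the decision logic on this slide can be written as a small function with one test case per basis path. The function name `decide`, its parameter names, and the sample inputs are hypothetical; only the thresholds come from the slide's pseudocode. Counting both halves of the AND as separate decisions gives three decisions, so v(G) = 3 + 1 = 4, which agrees with the four numbered regions of the flow graph.

```python
def decide(credit_rating, income, children):
    """Return 'approve' or 'disapprove' following the slide's pseudocode."""
    if credit_rating >= 4:
        return "approve"
    # `and` short-circuits, so this line contains two decisions, not one.
    if income >= 100_000 and children < 3:
        return "approve"
    return "disapprove"


# One hypothetical input per basis path through the flow graph.
BASIS_PATH_CASES = [
    ((5, 0, 0), "approve"),           # credit rating >= 4
    ((3, 50_000, 0), "disapprove"),   # low rating, income below 100,000
    ((3, 150_000, 2), "approve"),     # low rating, high income, fewer than 3 children
    ((3, 150_000, 4), "disapprove"),  # low rating, high income, 3 or more children
]


def run_basis_path_cases():
    """Return the (inputs, expected, actual) triples that did not match."""
    failures = []
    for args, expected in BASIS_PATH_CASES:
        actual = decide(*args)
        if actual != expected:
            failures.append((args, expected, actual))
    return failures


if __name__ == "__main__":
    print(run_basis_path_cases())  # prints [] when all four paths behave as expected
```

Four cases are enough here because the program has no loop, so the basis set happens to cover every path.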

Loop Example
[Flow graph: the same three decisions with Approve and Disapprove outcomes, plus an added "More?" decision; regions numbered 1 to 5]
Now how many paths?
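McCabe's formula v(G) = E - N + 2, for a connected flow graph with E edges and N nodes, gives the same count as the region method. The sketch below applies it to both examples. The edge lists are an assumed reading of the flattened diagrams (in particular, the "More?" node is assumed to loop back to the first decision), and all node names are hypothetical.

```python
def cyclomatic_complexity(edges):
    """Compute v(G) = E - N + 2 for a single connected flow graph."""
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2


# Assumed flow graph for the approval example without a loop.
APPROVAL_EDGES = [
    ("credit", "approve"), ("credit", "income"),
    ("income", "children"), ("income", "disapprove"),
    ("children", "approve"), ("children", "disapprove"),
    ("approve", "exit"), ("disapprove", "exit"),
]

# Assumed flow graph for the loop example: both outcomes reach a
# "more" decision that either returns to the start or exits.
LOOP_EDGES = [
    ("credit", "approve"), ("credit", "income"),
    ("income", "children"), ("income", "disapprove"),
    ("children", "approve"), ("children", "disapprove"),
    ("approve", "more"), ("disapprove", "more"),
    ("more", "credit"), ("more", "exit"),
]

if __name__ == "__main__":
    print(cyclomatic_complexity(APPROVAL_EDGES))  # 8 edges, 6 nodes -> 4
    print(cyclomatic_complexity(LOOP_EDGES))      # 10 edges, 7 nodes -> 5
```

Under these assumptions the loop raises v(G) only from 4 to 5, even though the number of distinct execution paths is now unbounded, since every pass through the loop can take any of the four routes.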

Path vs. Code Coverage
Path (or logic) coverage:
- Requires a significant level of testing resources
- Is often required for safety-critical systems
Many commercial systems struggle to achieve even adequate code coverage levels.

Code Coverage Tools
Example summary report: Cobertura

Coverage
- At the primitive component level, require 100% coverage.
- As integration proceeds, try to keep to 100% until it becomes infeasible.
- 100% coverage at the system level for today's complex distributed systems is nearly impossible. However, 50% coverage at the system level is insufficient, yet it is common.

Defect Ratio in C
If subsystem C has 1000 lines of code and 12 defects have been found in C, then we say that the defect ratio in C is 12/1000 = 1.2%.

Used As Synonyms
- Defect ratio
- Defect density
- Defect rate
- Fault rate
- Fault density

Errors, Faults, and Failures
IEEE Standard 729 vocabulary (Incident)
Systems containing many faults may be very reliable, because the conditions that trigger the faults may be very rare.
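Returning to the defect ratio for subsystem C, the arithmetic can be sketched in a few lines. The function names are hypothetical, and the sketch assumes that size is counted in plain lines of code. It expresses the same figure both as a percentage and per KLOC, the unit used elsewhere in these slides.

```python
def defect_ratio_percent(defects, lines_of_code):
    """Defects as a percentage of lines of code."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return 100 * defects / lines_of_code


def defect_density_per_kloc(defects, lines_of_code):
    """Defects per thousand lines of code (KLOC)."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return defects / (lines_of_code / 1000)


if __name__ == "__main__":
    # Subsystem C from the slide: 12 defects in 1000 lines of code.
    print(defect_ratio_percent(12, 1000))     # 1.2 (percent)
    print(defect_density_per_kloc(12, 1000))  # 12.0 (defects per KLOC)
```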

Hypothesis
Areas of the code with the highest complexity will have the highest defect density.

Defect density may even be more an indicator of testing severity than of quality.

Exercise
Suppose the defect density in component A is 10 defects per KLOC and the defect density in component B is 25 defects per KLOC. B might be more faulty than A.
List at least 5 additional reasons why B might have a higher defect density than A:
1.
2.
3.
4.
5.

What is a Defect?
- In some studies, "defects" means just post-release failures.
- In others, it means all known faults.
- In others, it is the set of faults discovered after some arbitrary fixed point in the software life cycle (e.g. after unit testing).

Types of Defects
- Critical failures
- Non-critical
- Failure that has a workaround
- Issue with performance, scalability
- Vulnerability to attack
- Lack of information security
- Failure to encrypt
- Usability
- Poor GUI design
- Workflow
- Organization
- Inconsistencies
- Missing functionality
- Bad grammar in the GUI
- Misspelled words in the GUI
- GUI standards not followed

Are all Defects Bugs?
- Errors
- Issues
- Anomalies
- Defects
- Bugs
- Crashes

Incident Count Metrics
It is important for developers to measure those aspects of software quality that can be useful for determining:
- how many problems have been found with a product
- how effective the prevention, detection, and removal processes are
- when the product is ready for release to the next development stage or to the customer
- how the current version of a product compares in quality with previous or competing versions

Software Size?
There is no consensus about how to measure software size in a consistent and comparable way. Even when using the most common size measure (LOC or KLOC) for the same programming language, deviations in counting rules can result in variations by factors of one to five.

What Does This Mean?
In the USA and Europe the average defect density (based on the number of known post-release defects) appears to be between 5 and 10 per KLOC.
Reference for this and the next few slides: Quality Assurance and Metrics by Norman Fenton

De-facto Industry Standard
Despite the serious problems in calculating standard values, we accept that defect density has become the de-facto industry standard measure of software quality. Commercial organizations argue that they avoid many problems by having formal definitions which are consistent in their own environment. In other words, it works for them, but you should not try to make comparisons outside of the source environment. This is sensible advice.

Benchmarking and Predicting
It is inevitable that organizations are hungry both for benchmarking data on defect densities and for predictive models of defect density. For both benchmarking and predicting, we do have to make cross-project comparisons and inferences. It is important, therefore, for broader QA issues, that we review what is known about defect density benchmarks.

Industry Numbers
It is widely believed that a (delivered) defect density of below 2 per KLOC is good going. One of the more revealing published papers [Daskalantonakis 1992] reports that Motorola's six sigma quality goal is to have no more than 3.4 defects per million output units from a project. Taking a line of code as the output unit, this translates to an exceptionally low defect density of 0.0034 per KLOC. The paper seems to suggest that the actual defect density lay between 1 and 6 per KLOC on projects in 1990.

Tools that Calculate the Halstead Software Metrics
- Krakatau Professional
- McCabe IQ Developers Edition
- Testwell CMT++ and CMTJava
- JStyle
- npath
- nag_metrics

Halstead Software Science
- The program length (N) is the sum of the total number of operators (N1) and the total number of operands (N2): N = N1 + N2
- The vocabulary size (n) is the sum of the number of unique operators (n1) and the number of unique operands (n2): n = n1 + n2
- The program volume (V) is the information content of the program, measured in mathematical bits. V describes the size of the implementation of an algorithm: V = N * log2(n)
- The volume of a function should be at least 20 and at most 1000. The volume of a parameterless one-line function that is not empty is about 20.
- The volume of a file should be at least 100 and at most 8000. These limits are based on volumes measured for files whose LOCpro and v(G) are near their recommended limits.
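The three definitions above can be sketched as a short function. The function name, its parameter names, and the counts in the usage line are hypothetical. The sketch also assumes the operator and operand counts have already been extracted from the source code, which is the hard part in practice and is not shown.

```python
import math


def halstead_volume(total_operators, total_operands,
                    unique_operators, unique_operands):
    """Return (N, n, V) from already-counted operators and operands."""
    length = total_operators + total_operands        # N = N1 + N2
    vocabulary = unique_operators + unique_operands  # n = n1 + n2
    if vocabulary < 1:
        raise ValueError("vocabulary must contain at least one symbol")
    volume = length * math.log2(vocabulary)          # V = N * log2(n)
    return length, vocabulary, volume


if __name__ == "__main__":
    # Made-up counts: 60 operators and 40 operands in total,
    # drawn from 12 unique operators and 20 unique operands.
    print(halstead_volume(60, 40, 12, 20))  # (100, 32, 500.0)
```

With a vocabulary of 32 symbols each token needs log2(32) = 5 bits, so 100 tokens give a volume of 500 bits, which falls inside the recommended range of 20 to 1000 for a function.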

- The program length (N) is the sum of the total number of operators and operands: N = N1 + N2. Suppose we have 100 symbols to encode.
- The vocabulary size (n) is the sum of the number of unique operators and operands: n = n1 + n2. Suppose we have 32 unique variables and operators. Each program token will then need 5 bits to encode it, since 2^5 = 32, or 5 = log2(32).
- The program volume (V) is the information content of the program, measured in mathematical bits. V describes the size of the implementation of an algorithm: V = N * log2(n). The program would then take 100 * 5 = 500 bits to encode.

In spite of the theoretical popularity of the Halstead Software Science:
- It is not widely supported in tools
- It is not widely used by commercial software developers

COCOMO

Function Points vs. LOC

References

Albrecht A.J, Measuring Application Development, Proceedings of IBM Applications Development joint SHARE/GUIDE symposium, Monterey CA, pp 83-92, 1979.
Barnard J and Price A, Managing code inspection information, IEEE Software, 59-69, March, 1994.
Basili VR and Rombach HD, The TAME project: Towards improvement-oriented software environments, IEEE Transactions on Software Engineering 14(6), pp 758-773, 1988.
Boehm BW, Software Engineering Economics, Prentice-Hall, New York, 1981.
Bollinger TB and McGowan C, A critical look at software capability evaluations, IEEE Software, 25-41, July, 1991.
Cox G, Sustaining a metrics programme in industry, in Software Reliability and Metrics (eds Fenton NE and Littlewood B), Elsevier, pp 1-15, 1991.
Daskalantonakis MK, A practical view of software measurement and implementation experiences within Motorola, IEEE Trans Software Eng, 18(11), 998-1010, 1992.
Fenton NE, Software Metrics: A Rigorous Approach, Chapman and Hall, 1991.
Fenton NE, When a software measure is not a measure, Software Eng J 7(5), 357-362, 1992.
Fenton NE and Pfleeger SL, Software Metrics: A Rigorous and Practical Approach (2nd Edition), International Thomson Computer Press, 1996.
Fenton NE, Littlewood B, and Page S, Evaluating software engineering standards and methods, in Software Engineering: A European Perspective (eds Thayer R, McGettrick AD), IEEE Computer Society Press, pp 463-470, 1993.
Halstead M, Elements of Software Science, North Holland, 1977.
Harel D, Algorithmics, 2nd Edition, Addison Wesley, 1992.
Hatton L and Hopkins TR, Experiences with Flint, a software metrication tool for Fortran 77, in Symposium on Software Tools, Napier Polytechnic, Edinburgh, 1989.

Henry S and Kafura D, The evaluation of software system's structure using quantitative software metrics, Software Practice and Experience 14(6), pp 561-573, June, 1984.
Humphrey WS, Managing the Software Process, Addison-Wesley, Reading, Massachusetts, 1989.
IEEE, Standard 729: Glossary of software engineering terminology, IEEE Computer Society Press, 1983.
IEEE, Software quality metrics methodology Standard P-1061/D20, IEEE Computer Society, 1989.
IEEE, Standard 1061: Software Quality Metrics Methodology, 1992.
IEEE P1044, A standard classification for software anomalies (draft), IEEE Computer Society, 1992.
International Organisation for Standardisation, Quality Management and Quality Assurance Standards - Part 3: Guidelines for the Application of ISO 9001 to the Development, Supply and Maintenance of, ISO/IS 9000-3, 1990.
International Organisation for Standardisation, Information technology - Software product evaluation - Quality characteristics and guidelines for their use, ISO/IEC IS 9126, 1991.
International Standards Organisation, SPICE Baseline Practice Guide, Product Description, Issue 0.03 (Draft), July, 1993.
International Standards Organisation, ISO 9001: Quality Systems - Model for Quality Assurance in Design, Development, Production, Installation and Servicing, International Standards Organisation, 1987.
Jeffery DR, Low GC and Barnes M, A comparison of function point counting techniques, IEEE Trans Software Eng, 19(5), 529-532, 1993.
Juran JM, Gryna FM Jr, Bingham FM (eds), Quality Control Handbook (3rd edn), McGraw Hill, New York, 1979.
Keller T, Measurement's role in providing "error-free" onboard shuttle software, 3rd Intl Applications of Software Metrics Conference, La Jolla, California, pp 2.154-2.166, Proceedings available from Software Quality Engineering, 1992.
Kitchenham BA and de Neumann B, Cost modelling and estimation, in Software Reliability Handbook (ed Rook P), Elsevier Applied Science, 333-376, 1990.
Littlewood B, Forecasting software reliability, in Software Reliability, Modelling and Identification (ed Bittanti S), Lecture Notes in Computer Science 341, Springer-Verlag, 141-209, 1988.

Lyu MR (ed), The Handbook of Software Reliability Engineering, McGraw Hill, 1996.
McCabe T, A Software Complexity Measure, IEEE Trans. Software Engineering SE-2(4), 308-320, 1976.
McCall JA, Richards PK, Walters GF, Factors in Software Quality, RADC TR-77-369, Vols I, II, III, US Rome Air Development Center Reports NTIS AD/A-049 014, 015, 055, 1977.
Oviedo EI, Control flow, data flow, and program complexity, in Proc COMPSAC 80, IEEE Computer Society Press, New York, 146-152, 1980.
Paulk M, Weber CV, Curtis B, The Capability Maturity Model for Software: Guidelines for Improving the Software Process, Addison Wesley, 1994.
Pfleeger SL, Fenton NE, Page P, Evaluating software engineering standards, IEEE Computer, 27(9), 71-79, Sept, 1994.
Riley P, Towards safe and reliable software for Eurostar, GEC Journal of Research 12(1), 3-12, 1995.
Woda H and Schynoll W (eds), Lean Software Development, ESPRIT BOOTSTRAP conference proceedings, Stuttgart, Germany, 1992.
Woodward MR, Hennell MA, Hedley D, A measure of control flow complexity in program text, IEEE Trans Soft. Eng, SE-5(1), 45-50, 1979.
Zuse H, Software Complexity: Measures and Methods, De Gruyter, Berlin, 1991.

Modeling change requests due to faults in a large-scale telecommunication system
Ho-Won Jung, YiKyong Lim, and Chang-Shin Chung
Journal of Systems and Software, Volume 72, Issue 2, July 2004, Pages 235-247

Abstract: It is widely known that a small number of modules in any system are likely to contain the majority of faults. Early identification and consequent attention to such modules may mitigate or prevent many defects. The objective of this study is to use product metrics to build a prediction model of the number of change requests (CRs) that are likely to occur in individual modules during testing. The study first empirically validates eight product metrics, while considering the confounding effects of code size (lines of code). Next, a prediction model of CR outcomes is developed with the validated metrics by utilizing a negative binomial regression that allows over-dispersion. In total, 816 modules written in the Chill programming language were analyzed in a large-scale telecommunication system. There is a positive association between the number of CRs and four product metrics (number of unique operators, unique operands, signals, and library calls) after considering the confounding effect of code size. A prediction model that includes only code size and the number of unique operands provides the best empirical fit.

Author Keywords: Complexity; Metric validation; Negative binomial regression; Overdispersion; Pareto principle; Prediction model; Software metrics

An Investigation into the Functional Form of the Size-Defect Relationship for Software Modules
- A. Güneş Koru, University of Maryland Baltimore County, Baltimore
- Dongsong Zhang, University of Maryland Baltimore County, Baltimore
- Khaled El Emam, University of Ottawa, Ottawa
- Hongfang Liu, Georgetown University, Washington
